THE IMPACT OF CITATION BEHAVIOUR ON CITATION STRUCTURE
Loet Leydesdorff
Department of Science Dynamics
Nieuwe Achtergracht 166
1018 WV AMSTERDAM
The Netherlands
ABSTRACT
Aggregated journal-journal citation relations have been extensively analyzed in terms of structural properties of networks. However, in each year citing can be considered as action. By citing one another the journals reproduce structure in a subsequent year. Since the citation matrix contains information with respect to both the cited and the citing dimension, it should provide us with an opportunity to make predictions about this reproduction of structure.
The question of how the update of the network is affected by action at the local nodes is a well-known problem in artificial intelligence (cf. Pearl 1988). By combining the Bayesian inference of the a posteriori probability distribution with Shannon's (1948) information theory, an algorithm for the prediction of the impact of action (i.e., citing) on structure (i.e., being cited) can be derived. The model is tested empirically for a journal-journal citation network. Some theoretical implications are discussed.
DATA
Thirteen journals (see Table 1) were used for the construction of aggregated journal-journal citation matrices in 1984 and 1985 on the basis of the data listed in the Journal Citation Reports of the Science Citation Index (cf. Leydesdorff 1986; Tijssen et al. 1987). In each year, all volumes under the respective journal titles can be cited, but citing is the running variable that reflects citation behaviour. The matrices are provided in Tables 2 and 3.[1]
The grouping in the right column of Table 1, which will be used as an operationalization of structure below, is based on factor analysis of these matrices (cf. Leydesdorff 1992a).
Table 1
Journal                                            Grouping
Chemical Physics                                   chemical physics
Chemical Physics Letters                           chemical physics
Inorganic Chemistry                                inorganic chemistry
J. of the American Chemical Society                organic chemistry
J. of Chemical Physics                             chemical physics
J. of the Chemical Society - Dalton Transactions   inorganic chemistry
J. of Organic Chemistry                            organic chemistry
J. of Organometallic Chemistry                     inorganic chemistry
J. of Physical Chemistry                           chemical physics
Molecular Physics                                  chemical physics
Physical Review A                                  chemical physics
Tetrahedron                                        organic chemistry
Tetrahedron Letters                                organic chemistry
RESEARCH QUESTION
To what extent could the change in the observed interaction between CITED and CITING journals in 1985 have been predicted on the basis of the 1984 matrix?
If structure is considered as cited, and action as citing, this would lead to a matrix of 3 cited groups of journals versus 13 citing journals. The 3 x 13 matrix is generated from the 13 x 13 matrix (in Tables 2 and 3) by aggregation of the relevant row vectors. Mutatis mutandis, one can compare predictions on the basis of the following assumptions:
I.   In the case of no grouping on either side (13 x 13 matrix).
II.  Assumption of the presence of structure CITED, but not CITING (3 x 13 matrix).
III. Assumption of the presence of structure CITED and CITING (3 x 3 matrix).
(IV. Assumption of the presence of structure CITING, but not CITED (13 x 3 matrix). This model has no obvious interpretation, since in this study citing is considered as action with reference to cited.)
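The aggregation behind models I to IV can be sketched as follows. This is a minimal illustration, not the procedure actually used for the study: it assumes that rows are cited journals and columns are citing journals, that the journals appear in the order of Table 1, and it uses a randomly generated placeholder matrix because the citation counts are not reproduced here. The names GROUPS, aggregate_rows and aggregate_cols are hypothetical.

```python
import numpy as np

# Grouping from Table 1, in the order of that table (assumed matrix order):
# 0 = chemical physics, 1 = inorganic chemistry, 2 = organic chemistry.
GROUPS = np.array([0, 0, 1, 2, 0, 1, 2, 1, 0, 0, 0, 2, 2])


def aggregate_rows(matrix, groups, n_groups=3):
    """Sum the row vectors (cited journals) that belong to the same group."""
    out = np.zeros((n_groups, matrix.shape[1]), dtype=matrix.dtype)
    for g in range(n_groups):
        out[g] = matrix[groups == g].sum(axis=0)
    return out


def aggregate_cols(matrix, groups, n_groups=3):
    """Sum the column vectors (citing journals) that belong to the same group."""
    return aggregate_rows(matrix.T, groups, n_groups).T


# Placeholder counts only; NOT the Journal Citation Reports data.
rng = np.random.default_rng(0)
m13x13 = rng.integers(0, 1000, size=(13, 13))   # model I

m3x13 = aggregate_rows(m13x13, GROUPS)          # model II: structure CITED only
m3x3 = aggregate_cols(m3x13, GROUPS)            # model III: structure on both sides
m13x3 = aggregate_cols(m13x13, GROUPS)          # model IV: structure CITING only

print(m3x13.shape, m3x3.shape, m13x3.shape)
```

Aggregation only sums counts, so the grand total of citations is the same in all four matrices.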
DYNAMIC RELATIONS BETWEEN STRUCTURE AND ACTION
Let A be the cited structure and B the distribution of citing journals. At each moment in time the actors in this network will take action in relation to one another given this structure; thus, B is conditioned by A when it operates, i.e., B|A. Action has an impact on structure A only in instances thereafter. (See Figure 1; cf. Burt 1982.)
Figure 1
Structure (A) can also be considered as a network of which the actors (B) are the nodes. The network is added to the nodes as a communication structure (Luhmann 1984 and 1990; Leydesdorff 1991). At each moment in time, the network conditions all the actors who contribute to its reproduction; if action takes place at any node(s), this conditions the network at the next moment.
DERIVATION OF THE ALGORITHM
Theil (1972) specified on the basis of Shannon (1948) the expected information content of the message that an a priori distribution p_i has changed into an a posteriori distribution q_i, as follows:
I = Σ_i q_i * log( q_i / p_i )
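As a small illustration, the measure can be computed for two discrete distributions as sketched below. The function name expected_information is hypothetical, and the use of the logarithm to base 2 (so that the result is in bits) is an assumption that matches the mbits reported later in this paper.

```python
import numpy as np


def expected_information(q, p):
    """Return I = sum_i q_i * log2(q_i / p_i) in bits.

    q is the a posteriori and p the a priori distribution.
    Terms with q_i = 0 contribute nothing to the sum.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log2(q[mask] / p[mask])))


prior = [0.5, 0.5]
posterior = [0.75, 0.25]

print(expected_information(posterior, prior))  # positive: the distribution changed
print(expected_information(prior, prior))      # 0.0: no change, no information
```

The measure is zero when the two distributions are identical and positive otherwise.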
Thus, the self-referential update of the network A upon action B at the nodes is expected to contain the following information:
I(A|B : A) = Σ q(A|B) * log{ q(A|B) / p(A)}
However, according to Bayes' Rule:
q(A|B) = p(A) * p(B|A) / p(B)
Thus, it follows that:
I(A|B(posterior):A(prior)) = Σ q(A|B) * [log{p(B|A)/p(B)}] (1)
This formula can also be written as an improvement of the prediction (cf. Theil 1972; Leydesdorff 1990b and 1992a):
I(A|B : A) = Σ q(A|B) * log{ q(A|B) / p(B) } - Σ q(A|B) * log{ q(A|B) / p(B|A) }
           = I(A|B : B) - I(A|B : B|A)     (2)
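The step from Theil's formula to formula (1), and the equivalence of formulas (1) and (2), can be checked numerically. The sketch below uses a single hypothetical joint distribution over A and B purely to verify the algebra; it does not reproduce the year-to-year application, and all variable names are illustrative.

```python
import numpy as np

# Hypothetical joint distribution over 3 states of A and 4 states of B.
rng = np.random.default_rng(1)
joint = rng.random((3, 4)) + 0.05
joint /= joint.sum()

p_a = joint.sum(axis=1, keepdims=True)   # p(A), shape (3, 1)
p_b = joint.sum(axis=0, keepdims=True)   # p(B), shape (1, 4)
p_b_given_a = joint / p_a                # p(B|A)
q_a_given_b = p_a * p_b_given_a / p_b    # Bayes' Rule: q(A|B)

# Theil's form: sum of q(A|B) * log{ q(A|B) / p(A) }
theil = np.sum(q_a_given_b * np.log2(q_a_given_b / p_a))

# Formula (1): sum of q(A|B) * log{ p(B|A) / p(B) }
formula_1 = np.sum(q_a_given_b * np.log2(p_b_given_a / p_b))

# Formula (2): I(A|B : B) - I(A|B : B|A)
formula_2 = (np.sum(q_a_given_b * np.log2(q_a_given_b / p_b))
             - np.sum(q_a_given_b * np.log2(q_a_given_b / p_b_given_a)))

print(np.isclose(theil, formula_1), np.isclose(formula_1, formula_2))
```

Both comparisons hold because Bayes' Rule gives q(A|B) / p(A) = p(B|A) / p(B) term by term, and the difference of the two logarithms in formula (2) cancels q(A|B).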
INTERPRETATION
The expected information content of the message that A is conditioned by B (left-hand side of the equation) is equal to an improvement of the prediction of the a posteriori distribution (Σ q(A|B)) if we add to our knowledge of the a priori distribution (Σ p(B)) the information about how the latter distribution was conditioned by the network distribution (Σ p(B|A)).
In other words: if one initially (at t = t) had knowledge only of the distribution of the nodes, i.e. the actors (Σ p(B)), and then became additionally informed of how the actors at the nodes are distributed given the network (Σ p(B|A)), this formula teaches us that the message of this event improves our prediction of the network distribution at the next moment (at t = t + 1). According to formula (2), this improvement is equal to the expected information content of the update relative to the a priori network, i.e., I(A|B : A).
The crucial point is the shift in the systems of reference in formula (1). The right-hand factor of the right-hand term of formula (1), i.e., log{p(B|A)/p(B)}, describes the instantaneous conditioning of action by structure at t = t, while the left-hand factor, i.e., q(A|B), refers to the description of the network after action, at the next moment (t = t + 1). Therefore, the formula explicates how action at the nodes and the network are conditioned mutually and dynamically. Note that one can also use the algorithm to test the assumption of structure on either side symmetrically.
APPLICATION
The 1984 matrix will be considered as the a priori, and the 1985 matrix as the a posteriori probability distribution. In each year, one can compute the interaction between the "cited" and the "citing" side of the matrix in terms of "mutual information" or "transmission" (Theil 1972; see also: Leydesdorff 1990a and 1990b). However, by using formula (1), one can also make a prediction of the (static) transmission in 1985 on the basis of 1984 data.
In other words: the 1984 matrix contains information about both the cited structure in this year, and of citation behaviour. These two kinds of information taken together should suffice to enable us to make a prediction about the reproduction of structure in 1985, i.e. about the impact of citation behaviour on cited structure in the later year. As noted, structure was hypothesized on the basis of factor analysis of these matrices.
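The static "mutual information" or "transmission" between the cited and the citing side of a single matrix can be sketched as follows. This assumes the standard definition T = H(cited) + H(citing) - H(cited, citing), with rows as cited and columns as citing journals, and it covers only this static measure, not the prediction by formula (1). The function names and the two toy matrices are hypothetical.

```python
import numpy as np


def entropy_bits(counts):
    """Shannon entropy in bits of the distribution obtained by normalising counts."""
    p = np.asarray(counts, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]          # zero cells contribute nothing
    return float(-np.sum(p * np.log2(p)))


def transmission_mbits(matrix):
    """Mutual information between rows (cited) and columns (citing), in mbits."""
    m = np.asarray(matrix, dtype=float)
    h_cited = entropy_bits(m.sum(axis=1))    # row marginal
    h_citing = entropy_bits(m.sum(axis=0))   # column marginal
    h_joint = entropy_bits(m)                # all cells
    return 1000.0 * (h_cited + h_citing - h_joint)


# Toy matrices, not citation data.
independent = np.outer([1, 2, 3], [4, 5])   # rows and columns unrelated
diagonal = np.diag([10, 10])                # each column cites exactly one row

print(transmission_mbits(independent))  # close to 0
print(transmission_mbits(diagonal))     # 1000.0, i.e. one bit
```

The transmission is close to zero when the citing distribution carries no information about the cited distribution, and it grows as the two sides become more strongly coupled.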
RESULTS
The results are summarized in Table 4. The various possibilities for grouping in either dimension ("cited" and/or "citing") are compared in terms of the percentage of observed interaction in 1985 predicted on the basis of the data for 1984.
Table 4
Observed and expected values for the "mutual information" between the cited and the citing dimensions of aggregated journal-journal citation matrices in mbits of information.
                              observed    expected    observed    percentage
                                1984        1985        1985      prediction

I. (in the case of no grouping on either side)
                               964.17      969.73      972.48
                                           + 5.56      + 8.31        66.9%

II. (assumption of the presence of structure CITED, but not CITING)
                               726.75      731.81      732.23
                                           + 5.06      + 5.48        92.3%

III. (assumption of the presence of structure CITED and CITING)
                               669.33      673.02      672.43
                                           + 3.69      + 3.10       119.0%

(IV. (assumption of presence of structure CITING, but not CITED)[2]
                               717.03      722.04      721.76
                                           + 5.01      + 4.73       105.9%)
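The percentages in the last column of Table 4 follow from the tabulated values as the expected change relative to 1984 divided by the observed change relative to 1984. The short sketch below reproduces this arithmetic; the names TABLE_4 and percentage_predicted are hypothetical, and the values are copied from the table.

```python
# Values in mbits from Table 4: (observed 1984, expected 1985, observed 1985).
TABLE_4 = {
    "I": (964.17, 969.73, 972.48),
    "II": (726.75, 731.81, 732.23),
    "III": (669.33, 673.02, 672.43),
    "IV": (717.03, 722.04, 721.76),
}


def percentage_predicted(observed_prior, expected_post, observed_post):
    """Expected change as a percentage of the observed change since the prior year."""
    expected_change = expected_post - observed_prior
    observed_change = observed_post - observed_prior
    return 100.0 * expected_change / observed_change


for model, row in TABLE_4.items():
    print(model, round(percentage_predicted(*row), 1))
```

This prints 66.9, 92.3, 119.0 and 105.9 for models I to IV. A value above 100% means that the expected increase in transmission was larger than the increase actually observed.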
CONCLUSIONS AND DISCUSSION
The assumption that the cited patterns are structured, and that citing represents independent actions, leads to a better prediction (92.3%) than assuming independence on both sides (66.9%). Assuming structure on both sides overestimates structure in the data by 19.0%. Thus, the data suggest a degree of coupling in the citing dimension in addition to the coupling on the cited side. Note, however, that this was only a crude application of the model for the purpose of giving an example. Among other things, I did not allow for groupings into more than three groups, and I assumed that the one-year difference was an adequate time-scale.
The derivation of this one algorithm illuminates the power and utility of considering the self-referential network as a communication system separate from the systems which perform the action. In the local event(s) the systems inform one another, and they can use this information for their respective updates. The study of structure/action contingency relations is thus a special case of the analysis of co-evolutions among interacting dynamic systems. Other examples of such co-evolutions in science and technology studies include the mutual shaping of language and knowledge (Leydesdorff 1992b) and the interaction between technological trajectories and selection environments (cf. Leydesdorff 1992c). The algorithm studied above models the function of the window of communication which co-evolving systems maintain between one another.
The possibility of combining theoretical perspectives from Shannon's (1948) mathematical theory of communication and the theory of self-organizing systems is not incidental, since these theories have a common basis in (non-equilibrium) thermodynamics. The crucial point for their understanding is to refrain from the attribution of "citedness" or "citation behaviour" to journals or groups of journals as units of analysis, but to consider them rather as units of operation which may refer to different operating systems. For example, in this case "citedness" referred to a structural citation network, while "citing" was considered as an operation of each journal upon this network.
References
Burt, R. S. (1982). Toward a Structuralist Theory of Action (New York, etc.: Academic Press).
Leydesdorff, L. (1986). "The Development of Frames of References," Scientometrics 9, 103‑25.
Leydesdorff, L. (1990a). "Relations Among Science Indicators I. The Static Model," Scientometrics 18, 281‑307.
Leydesdorff, L. (1990b). "Relations Among Science Indicators II. The Dynamics of Science," Scientometrics 19, 271‑96.
Leydesdorff, L. (1991). "Structure/Action Contingencies and the Model of Parallel Computing," Paper presented at the XVth Annual Meeting of the Society for the Social Studies of Science (Cambridge, Mass.; November 1991).
Leydesdorff, L. (1992a). "The Static and Dynamic Analysis of Network Data Using Information Theory," Social Networks (forthcoming).
Leydesdorff, L. (1992b). "Knowledge Representations, Bayesian Inferences, and Empirical Science Studies," Social Science Information (forthcoming).
Leydesdorff, L. (1992c). "Irreversibilities in Science and Technology Networks: An Empirical and Analytical Approach," Scientometrics (forthcoming).
Luhmann, N. (1984). Soziale Systeme. Grundriß einer allgemeinen Theorie (Frankfurt a.M.: Suhrkamp).
Luhmann, N. (1990). Die Wissenschaft der Gesellschaft (Frankfurt a.M.: Suhrkamp).
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Mateo, Cal.: Morgan Kaufmann).
Shannon, C. E. (1948). "A Mathematical Theory of Communication," Bell System Technical Journal 27, 379‑423 and 623‑56.
Theil, H. (1972). Statistical Decomposition Analysis (Amsterdam/London: North‑Holland).
Tijssen, R., J. de Leeuw, and A. F. J. Van Raan (1987). "Quasi‑Correspondence Analysis on Square Scientometric Transaction Matrices," Scientometrics 11, 347‑61.