The Triple-Helix Indicator and its Extension to Four Dimensions
(Mutual information in more than two dimensions)
Leydesdorff, L., Park, H. W., & Lengyel, B. (2014). A Routine for Measuring Synergy in University-Industry-Government Relations: Mutual Information as a Triple-Helix and Quadruple-Helix Indicator. Scientometrics, 99(1), 27-35. doi: 10.1007/s11192-013-1079-4; preprint available at http://arxiv.org/abs/1211.7230 .
The program th4.exe (v2; 16 Feb. 2021) reads an input file “data.txt” and generates the file th4.dbf (or appends to this file if it already exists), containing probabilistic entropy values and mutual information values for three and/or four nominal variables. (The source code can be found here.) In a number of studies (see the reference list at the end) we used the mutual information in three dimensions as a Triple Helix indicator; for example, to measure the reduction of uncertainty (e.g., Yeung, 2008:59f.; cf. McGill, 1954) in the interactions among the distributions of firms in terms of geography (addresses), organizational size, and technological capacities (Lengyel & Leydesdorff, 2011; Leydesdorff et al., 2006; Leydesdorff & Fritsch, 2006; Leydesdorff & Strand, in press).
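For reference, the mutual information in two, three, and four dimensions can be written as an alternating sum of the (joint) entropies, using the dimension labels w, x, y, and z of the output described below. This is a sketch following the formulation in Leydesdorff, Park, & Lengyel (2014); the use of base-2 logarithms (bits) is an assumption here.

```latex
H_x = -\sum_{x} p_x \log_2 p_x , \qquad
H_{xy} = -\sum_{x,y} p_{xy} \log_2 p_{xy}

T_{xy} = H_x + H_y - H_{xy}

T_{xyz} = H_x + H_y + H_z - H_{xy} - H_{xz} - H_{yz} + H_{xyz}

T_{wxyz} = H_w + H_x + H_y + H_z
         - H_{wx} - H_{wy} - H_{wz} - H_{xy} - H_{xz} - H_{yz}
         + H_{wxy} + H_{wxz} + H_{wyz} + H_{xyz}
         - H_{wxyz}
```

Unlike T_xy, which cannot be negative, T_xyz can be positive, zero, or negative; a negative value is read as a reduction of uncertainty.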
Using publications as units of analysis, the focus can be on university, industry and/or government addresses in co-authorship relations (Leydesdorff, 2003; Park et al., 2005; Ye et al., in preparation). A program for examining TH relations on a case-by-case basis is available at http://www.leydesdorff.net/th/th.exe . (The program th.exe also computes Krippendorff’s (2009a) IABC→AB, AC, BC and the redundancy R; T = I – R (Krippendorff, 2009b; Leydesdorff, 2009, 2010).)
In a number of studies (and in the literature) questions have been raised about extending the Triple Helix to more than three helices (e.g., Carayannis & Campbell, 2009 and 2010; Leydesdorff, 2012). The issue is urgent because the distinction between international and national was found to be an important additional dimension in a number of recent studies (Ye et al., in preparation). One may wish to include international coauthorship as a fourth variable (Leydesdorff & Sun, 2009; Kwon et al., 2010) or “foreign driven investment” in the case of firm data (Lengyel & Leydesdorff, 2011; Strand & Leydesdorff, in press).
This routine (th4.exe) is meant to facilitate the computation of these values in the case of large data sets. Unlike the older routine th.exe, which uses numerical values, this version operates on nominal values; for example, industry codes, names of regions, or classifications. In the case of numerical values, one may wish to bin or dichotomize these first. For example, if three addresses are provided of which two are from universities and one from industry, this U-I relation should be counted as “1”. In other words, numbers are read as character strings by this (!) program.
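Binning or dichotomizing before the input file is written could look as follows. This is a minimal sketch: the function names and the intermediate size thresholds are hypothetical and serve only as an illustration; the values are returned as strings because the program reads every value as a character string.

```python
from bisect import bisect_right


def dichotomize(count):
    """Return "1" if at least one address of a sector is present, else "0"."""
    return "1" if count > 0 else "0"


def size_class(employees, edges=(1, 10, 50, 100, 250, 501)):
    """Bin a number of employees into the classes "0" to "6".

    The edges are hypothetical; only the outer classes ("0" for no
    employees, "6" for more than 500) follow the example in the text.
    """
    return str(bisect_right(edges, employees))


# Hypothetical publication: two university addresses, one industry
# address, and no government address.
address_counts = {"university": 2, "industry": 1, "government": 0}
codes = {sector: dichotomize(n) for sector, n in address_counts.items()}
print(codes)
print(size_class(0), size_class(37), size_class(1200))
```

The two university addresses collapse into a single “1”, so that the U-I relation of this publication is counted once.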
Input file
The input file is a text file with one case (firm, publication, patent, etc.) on each line and at most five variables. It should be named “data.txt”. The first variable is a case identifier; for example, “firm1” or “id0001”. The second to fifth variables are read as four nominal variables (including “0” and “1”). If the fifth variable is missing, all its values are set to zero, and the corresponding dimension (“z”) is not computed. The four dimensions are indicated as w, x, y, and z, respectively. Each variable in the input file has to be enclosed in straight double quotation marks (do not use “curly” quotation marks), and the variables are delimited with commas, as follows:
"id1", "1", "b", "region1", "2"
"id2", "2", "a", "region2", "1"
"id3", "1", "a", "region2", "2"
"id4", "1", "b", "region5", "1"
For example, in the case of address information, the second variable may indicate the presence of a university address (Y/N), the third an industrial address, etc. In the case of firm data, the second variable may be a size category (e.g., zero for firms without employees to six for firms with more than 500 employees), the third variable a technology code (e.g., OECD’s NACE codes), the fourth an indication of the region, and the fifth whether the firm is domestically owned or a subsidiary of a foreign company.
The size of the file is limited only in that it has to remain below 2 GByte. The input file should be named “data.txt”. Do not place a header with variable names on the first line (because these names would be counted as separate categories). Note that typos may lead to the declaration of an additional class because the program indexes on the strings. The program and the input file have to be placed in the same folder.
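An input file in this format can be generated from other software. The following is a minimal sketch, not part of th4.exe: the function names are hypothetical, and the rejection of quotation marks inside a value is a cautious assumption about how the program parses its input. The per-variable counts help to spot typos that would otherwise be counted as additional classes.

```python
from collections import Counter

CURLY_QUOTES = "\u201c\u201d"  # curly double quotation marks


def format_record(case_id, values):
    """Format one case as a line: quoted fields, delimited with commas."""
    fields = [str(case_id)] + [str(v) for v in values]
    if len(fields) not in (4, 5):
        raise ValueError("expected an identifier plus three or four variables")
    for field in fields:
        # Assumption: quotation marks inside a value would confuse the parser.
        if '"' in field or any(q in field for q in CURLY_QUOTES):
            raise ValueError("quotation mark inside value: " + repr(field))
    return ", ".join('"' + field + '"' for field in fields)


def category_counts(records):
    """Count the distinct strings per variable position."""
    columns = zip(*(values for _, values in records))
    return [Counter(str(v) for v in column) for column in columns]


def write_data_file(records, path="data.txt"):
    """Write all cases to the input file, without a header line."""
    with open(path, "w") as handle:
        for case_id, values in records:
            handle.write(format_record(case_id, values) + "\n")


# The four example cases shown above.
records = [
    ("id1", ["1", "b", "region1", "2"]),
    ("id2", ["2", "a", "region2", "1"]),
    ("id3", ["1", "a", "region2", "2"]),
    ("id4", ["1", "b", "region5", "1"]),
]

if __name__ == "__main__":
    for position, counts in enumerate(category_counts(records), start=2):
        print("variable", position, dict(counts))
    write_data_file(records)
```

A category that occurs only once in a large file (for example, “regoin2” next to “region2”) is a likely typo and worth checking before the program is run.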
Output
The program generates the file th4.dbf if it is not yet present in this folder; if the file is present, a new record is appended to it. This file can be read using Excel or a similar program. As said, the variables are denoted “w”, “x”, “y”, and “z”, and the new record contains the uncertainties in these four dimensions (Hw, Hx, Hy, Hz), the joint entropies (such as Hwx, Hwxy, Hwxyz, etc.), and all possible transmissions (Twx, Twxy, Twxyz, etc.) among them.
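To check the output independently, the same quantities can be computed in a few lines. The following is a minimal sketch and not the code of th4.exe: it assumes base-2 logarithms (bits), computes each transmission as the alternating sum of the entropies of all subsets of the dimensions involved, and uses labels in the notation of this page, which may differ from the field names in th4.dbf.

```python
from collections import Counter
from itertools import combinations
from math import log2

LETTERS = "wxyz"  # dimension labels as used on this page


def entropy(rows, dims):
    """Shannon entropy (in bits) of the joint distribution over columns dims."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def transmission(rows, dims):
    """Alternating sum of the entropies of all non-empty subsets of dims."""
    value = 0.0
    for size in range(1, len(dims) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(dims, size):
            value += sign * entropy(rows, subset)
    return value


def summarize(rows):
    """Return all entropies (H...) and transmissions (T...) as a dictionary."""
    n_dims = len(rows[0])
    result = {}
    for size in range(1, n_dims + 1):
        for subset in combinations(range(n_dims), size):
            label = "".join(LETTERS[d] for d in subset)
            result["H" + label] = entropy(rows, subset)
            if size > 1:
                result["T" + label] = transmission(rows, subset)
    return result


# The four example cases from the input section, without the identifiers.
sample_rows = [
    ("1", "b", "region1", "2"),
    ("2", "a", "region2", "1"),
    ("1", "a", "region2", "2"),
    ("1", "b", "region5", "1"),
]

for name, value in summarize(sample_rows).items():
    print(name, round(value, 4))
```

With four cases the values are of course not meaningful; the sketch is meant for comparing its results with the record in th4.dbf on a small test file.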
Notes
The current version is very much a beta version. Please provide feedback for further improvements if you encounter bugs. Carefully check the output for errors! [The source code (written for Flagship v7/Clipper 87) is available from here.]
Matthijs den Besten reimplemented this routine in R at https://github.com/mdbesten/th_n ;
see also: M. den Besten (2014), “Transmission, an indicator of synergy reimplemented”.
Mark Johnson noted a bug which was repaired in this version 2 on February 16, 2021.
I acknowledge Balazs Lengyel, Han Park, and Øivind Strand for help in developing this routine.
References:
Krippendorff, K. (2009b). Information of Interactions in Complex Systems. International Journal of General Systems, 38(6), 669-680.
Kwon, K. S., Park, H. W., So, M., & Leydesdorff, L. (2012). Has Globalization Strengthened South Korea’s National Research System? National and International Dynamics of the Triple Helix of Scientific Co-authorship Relationships in South Korea. Scientometrics, 90(1), 163-175. doi: 10.1007/s11192-011-0512-9
Leydesdorff, L., Park, H. W., & Lengyel, B. (2012). A Routine for Measuring Synergy in University-Industry-Government Relations: Mutual Information as a Triple-Helix and Quadruple-Helix Indicator. Preprint at http://arxiv.org/abs/1211.7230 .
McGill, W. J. (1954). Multivariate information transmission. Psychometrika, 19(2), 97-116.
Yeung, R. W. (2008). Information Theory and Network Coding. New York, NY: Springer.