AI Citation Patterns
https://doi.org/10.1038/s42256-019-0024-5
As artificial intelligence (AI) applications see wider deployment, it becomes increasingly important to study the social and societal
implications of AI adoption. Therefore, we ask: are AI research and the fields that study social and societal trends keeping pace with
each other? Here, we use the Microsoft Academic Graph to study the bibliometric evolution of AI research and its related fields from
1950 to today. Although early AI researchers exhibited strong referencing behaviour towards philosophy, geography and art, modern
AI research references mathematics and computer science most strongly. Conversely, other fields, including the social sciences, do
not reference AI research in proportion to its growing paper production. Our evidence suggests that the growing preference of AI
researchers to publish in topic-specific conferences over academic journals and the increasing presence of industry research pose a
challenge to external researchers, as such research is particularly absent from references made by social scientists.
Today’s artificial intelligence (AI) has implications for the future of work1, the stock market2,3, medicine4,5, transportation6,7, the future of warfare8 and the governance of society9–11. On one hand, AI adoption has the positive potential to reduce human error and human bias12. As examples, AI systems have balanced judges towards more equitable bail decisions13, AI systems can assess the safety of neighbourhoods from images14 and AI systems can improve hiring decisions for board directors while reducing gender bias15. On the other hand, recent examples suggest that AI technologies can be deployed without understanding the social biases they possess or the social questions they raise. Consider the recent reports of racial bias in facial recognition software16,17, the ethical dilemmas of autonomous vehicles6 and income inequality from computer-driven automation18–20.

These examples highlight the diversity of today’s AI technology and the breadth of its application, an observation leading some to characterize AI as a general-purpose technology1,21. As AI becomes increasingly widespread, researchers and policymakers must balance the positive and negative implications of AI adoption. Therefore, we ask: how tightly connected are the social sciences and cutting-edge machine intelligence research?

Here, we employ the Microsoft Academic Graph (MAG) to explore the research connections between AI research and other academic fields through citation patterns. The MAG data offer coverage for both conference proceedings, where AI papers are often published, and academic journals, where other fields prefer to publish. Although early AI research was inspired by several other fields, including some social sciences, modern AI research is increasingly focused on engineering applications, perhaps due to the increasingly central role of the technology industry. Furthermore, the most central research institutions within the AI research community are increasingly based in industry rather than academia.

Modern AI research

The effort to create human-like intelligence has dramatically advanced in recent decades thanks to improvements in algorithms and computers. However, engineering the entirety of human intelligence has proved difficult. Instead, progress has come from engineering specific human capabilities. While we often use the term AI today in reference to machine learning, the meaning of AI has fluctuated in the past 60 years to variably emphasize vision, language, speech and pattern recognition.

To study the nature of AI research, we use the MAG to identify relevant computer science (CS) subfields from the citations of academic publications from 1950 to 2018. The MAG uses natural language processing (NLP), including keyword analysis, to identify the academic field of each publication according to a hierarchy of academic fields. These data have been particularly useful for studying bibliometric trends in CS22–25. Our analysis relies strongly on the MAG’s field of study classifications and, thus, our analysis is potentially limited in its accounting of more specific research areas within CS and within AI-related fields. These data enable us to study the paper production and referencing behaviour of different academic fields. For example, CS has risen to the fourth most productive academic field according to annual paper production (see Supplementary Fig. 1) with AI being the most prominent subfield of CS in recent decades26 (see also Fig. 1d).

To identify the CS subfields that are most relevant to AI research, we construct a citation network using all CS papers published within each decade from 1950 to 2018. We consider CS subfields to represent AI research if they are strongly associated with AI, which is itself a CS subfield, throughout a significant proportion of the time period under analysis. Examples include computer vision, machine learning and pattern recognition. Interestingly, NLP, which is colloquially thought of as a specific problem area in AI27, is strongly associated with AI research before the mid 1980s, after which NLP becomes more strongly associated with information retrieval and data mining for text-based data (Fig. 1a–c,e). In the remainder, we use papers published in AI, computer vision, machine learning, pattern recognition and NLP to approximate AI research from the 1950s to today.
1 Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
2 Kellogg School of Management, Northwestern University, Evanston, IL, USA.
3 Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, USA.
4 Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, MA, USA.
5 Center for Humans and Machines, Max Planck Institute for Human Development, Berlin, Germany.
*e-mail: irahwan@mit.edu
[Fig. 1d,e plots omitted. d: paper production (log scale) versus year, 1950–2010, for the CS subfields NLP, computer vision, data science, machine learning, AI and pattern recognition. e: number of citations to AI (log scale) versus year, 1950–2010, by CS subfield.]
Fig. 1 | Citation patterns among CS subfields identify areas of AI-related research. a–c, We examine the rate of citations between CS subfields based
on journal and conference publications from three different decades: the 1960s (a), the 1980s (b) and the 2010s through 2017 (c). For each network, the
nodes (circles) correspond to CS subfields according to the MAG data, and the node size corresponds to the number of papers published in each subfield
(note, the same paper may belong to multiple subfields). The width of the links connecting the nodes corresponds to the number of references made
between papers published in those subfields. After constructing the complete network, we apply topological clustering45 and report the number of citations
made between these clusters using weighted arrows. Networks with labels for each subfield are provided in Supplementary Section 2. d, Annual paper
production by CS subfield. Subfields related to AI are coloured, as well as data science (black) because of its notable decline in relative paper production.
e, The annual number of references from papers in each CS subfield to papers in the AI subfield, and vice versa (that is, (subfield → AI) + (subfield ← AI)).
The paper production of CS subfields has varied over the past half-century. For example, data science has gradually diminished in relative paper production and theoretical CS has been replaced by increased focus on real-time and distributed computing. However, AI-related research areas have experienced steadily growing paper production since 1950 and account for the largest share of paper production in CS today (Fig. 1d).

Shaping the study of intelligent machines

Just as early myths and parables emphasized the social and ethical questions around human-created intelligence28–30, today’s intelligent machines provide their own interesting social questions. For example, how responsible are the creators, the manufacturers and the users for the outcomes of an AI system? How should regulators handle distributed agency11,31? How will AI technologies reduce instances of human bias? As AI systems become more widespread1,21, it becomes increasingly important to consider these social, ethical and societal dynamics to completely understand the impact of AI systems9–11,32,33. However, the developers of new AI systems are often separate from the scientists who study social questions. Therefore, we might hope to see increasing research interest between these fields of study and AI.

To investigate, we study the association between various academic fields and AI research through the referencing relationship of papers published in each academic field. External fields reference AI research for a number of reasons. Some fields, such as engineering or medicine, reference AI research because they use AI methods for optimization or data analysis. Other fields, such as philosophy, reference AI research because they explore its consequences for society (for example, moral and/or ethical consequences). Similarly, AI researchers reference other fields, such as mathematics or psychology, because AI research incorporates methods and models from these areas. AI researchers may also cite other fields because they use them as application domains to benchmark AI techniques.
[Fig. 2 plots omitted. Horizontal axis: year, 1960–2010. Lines are labelled by academic field: CS, mathematics, psychology, philosophy, sociology, art, business, engineering, geography, geology, history, environmental science, economics, materials science, medicine, biology, physics, chemistry and political science.]
Fig. 2 | The referencing strength between AI and other sciences is declining. a, The share of references made by AI papers in each year to papers
published in other academic fields. b, The reference strength (see equation (2)) from AI papers to papers published in other academic fields. c, The share
of references made by each academic field to AI papers in each year. d, The reference strength from each other academic field to AI papers in each year.
All lines are smoothed using a five-year moving average. In b,d, dashed lines indicate academic fields exhibiting lower reference strength than would be
expected under random referencing behaviour in 2017.
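The reference share plotted in Fig. 2a,c (equation (1)) is the fraction of a field's references in a given year that point to another field. The following is a minimal sketch of that ratio, not the authors' code. It assumes a hypothetical flat record layout of `(year, citing_field, cited_field)` tuples, one per reference, and the function name `reference_share` is illustrative only.

```python
def reference_share(references, source_field, target_field, year):
    """Fraction of source_field's references made in `year` that cite target_field.

    `references` is an iterable of (year, citing_field, cited_field) tuples,
    one per reference. This layout is a hypothetical simplification.
    Returns None when source_field made no references in that year.
    """
    total = 0       # references made by source_field papers in `year`
    to_target = 0   # subset of those that cite target_field papers
    for ref_year, citing, cited in references:
        if ref_year == year and citing == source_field:
            total += 1
            if cited == target_field:
                to_target += 1
    if total == 0:
        return None
    return to_target / total


# Toy data, invented for illustration only.
refs = [
    (1965, "AI", "Psychology"),
    (1965, "AI", "Mathematics"),
    (1965, "AI", "Mathematics"),
    (1965, "AI", "CS"),
    (1965, "Psychology", "AI"),
    (1990, "AI", "CS"),
]
print(reference_share(refs, "AI", "Mathematics", 1965))  # 0.5
```

Because the denominator counts only the referencing field's own references, the share is insensitive to how many papers that field publishes, but it can still drift when the referenced field grows or shrinks.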
In Fig. 2a,c, we examine the share of references made from AI papers to other fields, and from papers published in other fields to AI. The reference share from academic field A to field B according to

$$\mathrm{share}_{\mathrm{year}}(A, B) = \frac{\#\,\text{refs from } A \text{ papers to } B \text{ papers in year}}{\#\,\text{refs made by } A \text{ papers in year}} \tag{1}$$

controls for the total paper production of the referencing field over time, and has been used in other bibliometric studies34. However, temporal changes in reference share may be explained by paper production in the referenced field; therefore, we consider another measure that also controls for the total paper production in the referenced field (Fig. 2b,d). We calculate the reference strength from field A to field B according to

$$\mathrm{strength}_{\mathrm{year}}(A, B) = \left( \frac{\#\,\text{refs from } A \text{ papers to } B \text{ papers in year}}{\#\,\text{refs made by } A \text{ papers in year}} \right) \ldots$$

Before 1980, AI research made relatively frequent reference to psychology in addition to CS and mathematics (Fig. 2a). Controlling for the paper production of the referenced fields, we find that early AI’s reference strengths towards philosophy, geography and art were comparable to the field’s strength of association with mathematics (Fig. 2b), suggesting that early AI research was shaped by a diverse set of fields. However, AI research transitioned to strongly relying on mathematics and CS soon after 1987, which suggests an increasing focus on computational research.

How important is AI research to other academic fields? Unsurprisingly, CS, which includes all of the AI-related subfields in our analysis, steadily increased its share of references made to AI papers throughout the entire period of analysis (Fig. 2c). Surprisingly, mathematics experienced a notable increase in reference share to AI only after 1980. Meanwhile, several fields that are not often cited in today’s AI research played an important role in the field’s development, but may not have reciprocated this interest. For example, psychology was relatively important to early AI
[Fig. 3 plots omitted. a: diversity score (1 – Gini coefficient) versus year, 1960–2010, for the institutional distribution of papers, authors and citations. b: slope estimate versus year, 1960–2000, with Pearson correlation in the inset. c: citation network PageRank by publication venue.]
Fig. 3 | AI research is increasingly dominated by only a few research institutions and AI-specific conferences. a, The diversity of the annual distribution
of all AI papers (black), AI authors (red) and all citations to AI papers (green) across research institutions according to the Gini coefficient. Example
distributions of AI paper production and AI citation share are provided in Supplementary Section 3. b, To see whether preferential attachment explains
citation dynamics, we include only AI papers with at least one citation and estimate the linear relationship between each research institution’s cumulative
citation count from 1950 to the institution’s citation count in each year (see equation (3)). The model’s slope estimation steadily rises throughout the
period of analysis to around 0.70 as the model increasingly captures variance in the citation accumulation of institutions according to Pearson correlation
(inset). The error bars are 95% confidence intervals for our estimate of the linear model’s slope in each year (m in equation (3)). c, The PageRank of each
publication venue for AI papers using the number of references from AI papers published in each venue to papers published in each other venue. The
lines of notable publication venues are highlighted with colour. Dashed lines indicate venues whose PageRank has declined during the period of analysis.
In all plots, lines are smoothed using a five-year moving average. More recent citation results may change as recent publications continue to accumulate
citations. LNCS, Lecture Notes in Computer Science; ICLP, International Conference on Logic Programming; ISNN, International Symposium on Neural
Networks; ICCV, International Conference on Computer Vision; ECCV, European Conference on Computer Vision; ICML, International Conference
on Machine Learning; CVPR, Conference on Computer Vision and Pattern Recognition; IJCAI, International Joint Conference on AI; KR, Principles of
Knowledge Representation and Reasoning.
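Two quantities in Fig. 3 are easy to illustrate: the diversity score of panel a (1 minus the Gini coefficient of a distribution across institutions) and the log-log slope of panel b (equation (3)). The sketch below is a plain-Python illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, the Gini coefficient uses the standard rank-weighted formula, the slope is an ordinary least-squares fit, and confidence intervals are not computed.

```python
import math


def gini(values):
    """Gini coefficient of non-negative counts (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the values in ascending order.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def diversity_score(values):
    """Diversity as plotted in Fig. 3a: 1 - Gini coefficient."""
    return 1.0 - gini(values)


def loglog_slope(cumulative, annual):
    """Least-squares fit of log10(annual) = m * log10(cumulative) + b.

    Both sequences hold one strictly positive citation count per institution,
    which matches restricting the data to papers with at least one citation.
    Returns the pair (m, b).
    """
    xs = [math.log10(c) for c in cumulative]
    ys = [math.log10(a) for a in annual]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Toy numbers, invented for illustration only.
print(diversity_score([1, 1, 1, 1]))  # evenly spread across four institutions
print(diversity_score([0, 0, 0, 4]))  # concentrated in one institution
print(loglog_slope([10, 100, 1000], [1, 10, 100]))
```

A slope near 1 would mean that an institution's annual citations scale in proportion to its accumulated citations, which is the signature of preferential attachment that the text tests for.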
The consolidation of AI research

How do leading research institutions shape AI research? On one hand, the prestige of an academic university can boost the scientific impact of CS publications35. On the other hand, although scientific research is often undertaken at universities, major AI advances have emerged from industry research centres as well. For example, the AI start-up DeepMind received recent attention for their AlphaGo project36 and Google has been acknowledged as a leader in the development of autonomous vehicles37–39. With increased industrial and regulatory involvement, recent work suggests that areas of AI, including deep learning21, are undergoing a consolidation of research and deployment worldwide. While CS on the whole has become increasingly diverse40, what can be said about AI research?

If the AI research community is experiencing a consolidation of influence, then what types of citation dynamics might indicate such a phenomenon? We investigate by examining the distribution of AI paper production and the distribution of citations made to AI papers by research institution (see Supplementary Section 3 for visualization of the distributions by decade). Since 1980, the diversity of AI paper production, authorship and citations to AI papers across institutions has decreased by 30% according to the Gini coefficient applied to annual distributions (Fig. 3a). Repeating this analysis for other academic fields, we find that this decreasing diversity is not simply a reflection of aggregate academic trends since most other fields of study actually exhibit increasing diversity over time according to these metrics (see Supplementary Section 5).

This decrease in scientific diversity suggests that notable research ‘hubs’ may be forming (similar to the industry use of deep learning21). This type of hierarchical structure can occur when referencing between institutions is well modelled by preferential attachment41. If preferential referencing explains the citation dynamics within AI research, then the proportion of citations gained by a research institution in each year will be proportional to the institution’s total accumulation of citations. Figure 3b reports estimates of the slope m for the model

$$\log_{10}(\#\text{ of citations}) = m \times \log_{10}(\text{cumulative } \#\text{ of citations}) + b \tag{3}$$

as well as 99% confidence intervals for those slope estimates using linear regression. Both the annual slope estimates and the performance of this model (see inset) rise steadily throughout the period of analysis. Combined, this evidence suggests that preferential referencing may be occurring among AI research institutions.

How have AI publication practices changed over time to enable preferential referencing? To investigate, we calculate the PageRank42 of each AI publication venue, including both academic journals and conferences, from the references of the AI papers published by each venue in each year (Fig. 3c). Publication venues with larger PageRank are more central to AI research. In the
[Fig. 4 plots omitted. Horizontal axis: year, 1960–2010. Lines are labelled by academic field: geology, business, biology, economics, medicine, political science, CS, sociology, environmental science, philosophy and engineering.]
Fig. 4 | Industry is increasingly central to AI research, but industry-authored AI papers are referenced less often by other academic fields. a, The
PageRank of each research institution using the number of references from AI papers published by each institution to papers published by each other
research institution. The lines of notable research institutions are coloured for visualization. Dashed lines indicate academic institutions while solid lines
indicate industry. b, The share of the top 10% most cited AI papers published in each year with academic-only, industry-only and mixed authorship.
c, Similarly to b, we examine the referencing behaviour of engineering towards AI papers according to the authorship of the AI papers. Analogous plots
for each other academic field are provided in Supplementary Section 4. d, Generalizing on c, the IPS calculated from each academic field’s referencing
behaviour towards AI papers (see equation (4)). The solid (dashed) lines indicate fields that reference AI papers with industry-only authorship more
(less) than would be expected according to random referencing behaviour. In all plots, lines are smoothed using a five-year moving average.
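The PageRank scores in Figs. 3c and 4a are computed on a weighted citation graph whose nodes are venues or institutions and whose edge weights are reference counts. The sketch below shows one common way to do this with power iteration. It is an illustration under assumptions, not the authors' procedure: the source states neither the damping factor nor the treatment of nodes that make no references, so the value 0.85 and the uniform redistribution of dangling mass are choices made here, and the node names are hypothetical.

```python
def pagerank(edge_weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    `edge_weights` maps (citing_node, cited_node) to a reference count.
    The damping factor and the dangling-node handling are assumptions.
    Returns a dict of node -> score, with scores summing to 1.
    """
    nodes = sorted({node for edge in edge_weights for node in edge})
    n = len(nodes)
    # Total outgoing reference weight per node.
    out_total = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edge_weights.items():
        out_total[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing references is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_total[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edge_weights.items():
            if weight > 0:
                new_rank[dst] += damping * rank[src] * weight / out_total[src]
        rank = new_rank
    return rank


# Toy graph with hypothetical institutions, invented for illustration only.
edges = {
    ("inst_a", "inst_c"): 3,
    ("inst_b", "inst_c"): 1,
    ("inst_c", "inst_a"): 1,
}
scores = pagerank(edges)
print(max(scores, key=scores.get))  # inst_c
```

In this toy graph `inst_c` receives all references made by the other two nodes, so it ends up with the highest score, while `inst_b`, which nobody cites, keeps only the baseline share.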
late 1980s, several specific conferences, including the Conference on Computer Vision and Pattern Recognition, the Conference on Neural Information Processing Systems and the International Conference on Machine Learning, rise in prominence, while more general AI conferences, including the National Conference on Artificial Intelligence and the International Joint Conference on Artificial Intelligence, decline in prominence for AI researchers. Meanwhile, very few academic journals maintain high citation PageRank with the exception of the IEEE Transactions on Pattern Analysis and Machine Intelligence, which remains one of the most central publication venues for AI research.

If preferential referencing is producing research hubs, then which research institutions enjoy a privileged role in the AI research community? To investigate, we calculate the citation PageRank of each institution from the references of the AI papers published by each institution in each year (Fig. 4a). Before 1990, the most prominent research institutions were academic, including the Massachusetts Institute of Technology, Stanford University and Carnegie Mellon University, and included only a few industry-based research institutions, such as Bell Labs and IBM. However, the late 1980s again marks a transition point that reshaped the field. While universities dominate scientific progress across all academic fields43, industry-based organizations, including Google and Microsoft, are increasingly central to modern AI research, and the PageRank scores of academic institutions are on the decline. Chinese research institutions at today’s forefront of AI research are notably absent from Fig. 4a because their rise in prominence is recent in the 65-year time span of our analysis. However, the increasing prominence of Chinese research institutions, as well as other non-US-based institutions, is apparent when focusing on recent years (see Supplementary Section 8).

While academia has remained the largest source of AI papers throughout the entire period of analysis, the increased presence of industry can be seen from the authorship of AI papers over time (Fig. 4b). Out of the 10% of AI papers with the most citations after 10 years, the relative number of papers with industry-only authorship is on the decline. Meanwhile, collaborations between academia and industry are becoming more abundant.

How are other fields of study responding to the increased presence of industry in AI research? As an example, references from engineering showed preference for AI papers with industry-only authorship until the late 1980s, which is contrary to the aggregate trend (Fig. 4c; and see Supplementary Section 3 for similar plots for all academic fields). Similar to reference strength, temporal changes in a field’s preference for AI papers with industry authorship (that is, at least one author has an industry affiliation) may result from the abundance of industry-based AI paper production over time. Therefore, we examine each field’s industry preference score, which is given for field A by

$$\mathrm{IPS}_{\mathrm{year}}(A) = \frac{\text{ref. share of } A \text{ to industry AI papers}}{\text{industry share of AI papers from 1950 to year}} \tag{4}$$

Here, an AI paper has industry authorship if at least one co-author has an affiliation with an industry-based institution. Fields with IPS(A) > 1 exhibit stronger preference for industry AI papers than would be expected under random referencing behaviour towards AI papers. Academic fields that may be interested in the application of AI technology, such as materials science, engineering, chemistry and physics, tend to have greater preference for industry AI papers. However, many of the social sciences and fields that study social and societal dynamics, such as sociology, economics, philosophy and political science, tend to have lower preference for industry AI papers.

social and societal benefits and consequences of today’s AI technology as well as identifying the mechanisms that limit communication between research domains.
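The industry preference score in equation (4) can be sketched as a ratio of two shares. The code below is a rough illustration, not the authors' implementation. It reads "ref. share of A to industry AI papers" as the fraction of field A's references to AI papers in a given year that cite industry-authored AI papers. That reading is an assumption, chosen because it makes IPS = 1 correspond to random referencing among AI papers. The record layouts and the function name are hypothetical.

```python
def industry_preference_score(field_refs_to_ai, ai_papers, year, start_year=1950):
    """IPS for one field in one year, under the assumptions stated above.

    field_refs_to_ai: iterable of (ref_year, cited_paper_is_industry) pairs,
        one per reference from the field to an AI paper.
    ai_papers: iterable of (pub_year, is_industry) pairs, one per AI paper.
    Both layouts are hypothetical simplifications.
    Returns None when either share is undefined.
    """
    refs = [is_ind for ref_year, is_ind in field_refs_to_ai if ref_year == year]
    papers = [is_ind for pub_year, is_ind in ai_papers
              if start_year <= pub_year <= year]
    if not refs or not papers:
        return None
    ref_share = sum(refs) / len(refs)            # numerator of equation (4)
    industry_share = sum(papers) / len(papers)   # denominator of equation (4)
    if industry_share == 0:
        return None
    return ref_share / industry_share


# Toy data, invented for illustration only.
refs_2000 = [(2000, True), (2000, True), (2000, False), (2000, False)]
papers = [(1980, True), (1990, False), (1995, False), (2000, False)]
print(industry_preference_score(refs_2000, papers, 2000))  # 2.0
```

In the toy data half of the field's references go to industry AI papers while only a quarter of AI papers have industry authorship, so the score is 2.0, which would indicate a preference for industry AI papers.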