Predicting The Future of AI With AI: High-Quality Link Prediction in An Exponentially Growing Knowledge Network
Mario Krenn,1,∗ Lorenzo Buffoni,2 Bruno Coutinho,2 Sagi Eppel,3 Jacob Gates Foster,4 Andrew Gritsevskiy,3,5,6 Harlin Lee,4 Yichao Lu,7 João P. Moutinho,2 Nima Sanjabi,8 Rishi Sonthalia,4 Ngoc Mai Tran,9 Francisco Valente,10 Yangxinyu Xie,11 Rose Yu,12 and Michael Kopp6

1 Max Planck Institute for the Science of Light (MPL), Erlangen, Germany.
2 Instituto de Telecomunicações, Lisbon, Portugal.
3 University of Toronto, Canada.
4 University of California Los Angeles, USA.
5 Cavendish Laboratories, Cavendish, Vermont, USA.
6 Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria.
7 Layer 6 AI, Toronto, Canada.
8 Independent Researcher, Barcelona, Spain.
9 University of Texas at Austin, USA.
10 Independent Researcher, Leiria, Portugal.
11 University of Pennsylvania, USA.
12 University of California, San Diego, USA.
A tool that could suggest new personalized research directions and ideas by taking insights from the scientific literature could significantly accelerate the progress of science. A field that might benefit from such an approach is artificial intelligence (AI) research, where the number of scientific publications has been growing exponentially in recent years, making it challenging for human researchers to keep track of the progress. Here, we use AI techniques to predict the future research directions of AI itself. We develop a new graph-based benchmark based on real-world data – the Science4Cast benchmark, which aims to predict the future state of an evolving semantic network of AI. For that, we use more than 100,000 research papers and build up a knowledge network with more than 64,000 concept nodes. We then present ten diverse methods to tackle this task, ranging from purely statistical to purely learning-based methods. Surprisingly, the most powerful methods use a carefully curated set of network features rather than an end-to-end AI approach; this indicates that there is still great untapped potential in purely ML approaches that do not rely on human domain knowledge. Ultimately, better predictions of new future research directions will be a crucial component of more advanced research suggestion tools.
I. INTRODUCTION AND MOTIVATION

The corpus of scientific literature grows at an ever-increasing speed. Specifically, in the field of Artificial Intelligence (AI) and Machine Learning (ML), the number of papers every month grows exponentially with a doubling rate of roughly 23 months (see Fig. 1). Simultaneously, the AI community is embracing diverse ideas from many disciplines such as mathematics, statistics, and physics, making it challenging to organize different ideas and uncover new scientific connections. We envision a computer program that can automatically read, comprehend and act on AI literature. It would predict and suggest meaningful research ideas that transcend individual knowledge and cross-domain boundaries. If successful, it could significantly improve the productivity of AI researchers, open up new avenues of research, and help drive progress in the field.

Here, we address this important and challenging vision. New research ideas often result from drawing novel connections between seemingly unrelated concepts [1–3]. Therefore, we formulate the evolution of AI literature as a temporal network modelling task. We created an evolving semantic network characterizing the content and evolution of the scientific literature in the field of AI since 1994. The network contains about 64,000 nodes (each representing a concept used in an AI paper) and 18 million edges that connect two concepts when they were investigated jointly in a scientific paper.

We use the semantic network as an input to 10 diverse statistical and machine-learning methods to predict the future evolution of the semantic network with high accuracy. That is, we can predict which combinations of concepts AI researchers will investigate in the future. Being able to predict what scientists will work on is a first crucial step for suggesting new topics that might have a high impact.

Several of the methods presented in this paper have been contributions to the Science4Cast competition hosted by IEEE BigData 2021, which ran from August to November 2021. Broadly, we can divide the methods into two classes: methods that use hand-crafted network-theoretical features, and those that automatically learn features. We found that models using carefully hand-crafted features outperform methods that attempt to learn features autonomously. This (somewhat surprising) finding indicates a great potential for improvements of models free of human priors.

∗ mario.krenn@mpl.mpg.de
Figure 1. The number of papers published per month in the arXiv categories of AI grows exponentially (main panel: linear scale, 1994–2021; inset: log scale). The doubling rate of papers per month is roughly 23 months, which might at some point lead to problems for publishing in these fields. The categories are cs.AI, cs.LG, cs.NE, and stat.ML.
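The doubling rate quoted above can be estimated by fitting a straight line to the logarithm of the monthly paper counts. A minimal sketch in Python, with a synthetic series standing in for the real arXiv counts:

    import numpy as np

    # Hypothetical stand-in for the monthly arXiv paper counts (1994-2020);
    # a real analysis would load these from arXiv metadata.
    months = np.arange(324)              # 27 years x 12 months
    counts = 10 * 2 ** (months / 23)     # toy data with a 23-month doubling time

    # Fit log2(counts) linearly in time; the slope is 1/doubling_time.
    slope, _ = np.polyfit(months, np.log2(counts), 1)
    print(f"estimated doubling time: {1 / slope:.1f} months")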
Our manuscript has several purposes. First, we introduce a new meaningful benchmark for AI on real-world graphs. Second, we provide nearly 10 diverse methods that solve this benchmark. Third, we explain how solving this task could become an essential ingredient for the big-picture goal of having a tool that could suggest meaningful research directions for scientists in AI or in other disciplines.1

The manuscript is structured in the following way. We first introduce more background on semantic networks and how they can help to suggest new ideas. Then we explain how we generate the dataset and some of its network-theoretical properties. Then we briefly explain the 10 methods that we have investigated to solve the task. We conclude with a number of important open questions that could bring us further toward the goal of AI-based suggestions for research directions.

A first step would be to use the features of a large language model (such as GPT3 [4], Gopher [5], MegaTron [6] or PaLM [7]) from the text of each article to extract concepts automatically. However, those methods still struggle with reasoning capabilities [8, 9], thus it is not yet directly clear how these models can be used for identifying and suggesting new ideas and concept combinations.

An alternative approach has been pioneered by Rzhetsky and colleagues [10]. They have shown how knowledge networks (or semantic networks) in biochemistry can be created from co-occurring concepts in scientific papers. The nodes in their network correspond to scientific concepts, concretely the names of individual biomolecules. The nodes are linked when a paper mentions both of the corresponding biomolecules in its title or abstract. Taking millions of papers into account leads to an evolving semantic network that captures the history of the field. Using supercomputer simulations, nontrivial statements about the collective behaviour of scientists can be extracted, which allows for the suggestion of alternative and more efficient research behaviour [11]. Of course, by creating a semantic network from concept co-occurrences, only a tiny amount of knowledge is extracted from each paper. However, if this process is repeated for a large dataset of papers, the resulting network captures nontrivial and actionable content.

The idea to build up a semantic network of a scientific discipline was then applied and extended in the field of quantum physics [12]. There, the authors (including one of us) built a network of more than 6,000 quantum physics concepts, and formulated, for the first time, the task of predicting new research trends and connections as an ML task: identify which concept pairs that have never been discussed jointly in the scientific literature have a high probability of being investigated in the near future. This prediction task was phrased as one component for personalized suggestions of new research ideas.

A. Link Prediction in Semantic Networks
Figure 2. From arXiv to Science4Cast. We use 143,000 papers in AI and ML categories on arXiv from 1992 to
2020. From there, we construct a list of concepts (using RAKE and other NLP tools). Those concepts form the
nodes of a semantic network. The edges are drawn when two concepts occur jointly in the title or abstract of a
paper. In that way, we generate an evolving semantic network that grows over time as more concepts are investigated
together. The task is to predict, from unconnected nodes (i.e. concepts that have not been investigated together
in the scientific literature), which will be connected within a few years. In this manuscript, we present 10 diverse
statistical and machine learning methods to solve this challenge.
Link prediction is a very common problem in computer science that can be solved with classical metrics and features as well as machine learning techniques. From the network-theory side, several works have studied local motif-based methods [13–17], often based on path counting, while other methods have studied more global features using linear optimization [18], global perturbations [19] and stochastic block models [20]. Other machine-learning works have tried to optimize over a combination of hundreds of predictors [21]. Further discussion of these methods is available in a recent review on link prediction [22].

In [12], this task was solved by computing 17 hand-crafted features of the evolving semantic network. In the Science4Cast competition, the goal was to find more precise methods for link-prediction tasks in semantic networks (a semantic network of AI that is 10 times larger than the one in [12]). Specifically, on the one hand, we would like to determine which features are useful; on the other hand, we would also like to know whether this task can be solved efficiently without hand-crafted features. Here, we present results for both questions.

B. Potential for Idea Generation in Science

The long-term goal of predictions and suggestions in semantic networks is to provide new ideas to individual researchers. In a way, we hope to build a creative artificial muse in science [23]. We can bias or constrain the model to give research topics that are related to the research interests of an individual scientist, or of a pair of scientists, to suggest topics for collaborations in an interdisciplinary setting. Important future questions concern the discovery of impactful and surprising suggestions, and suggestions that give more context than two scientific concepts.

III. GENERATION AND ANALYSIS OF THE DATASET

A. Dataset Construction

We use papers published on arXiv in the categories cs.AI, cs.LG, cs.NE, and stat.ML from 1992 to 2020 to create a dynamic semantic network. The nodes stand for computer science and, in particular, artificial intelligence concepts. We create the list of concepts from the titles and abstracts of all 143,000 papers. We use Rapid Automatic Keyword Extraction (RAKE) to create candidate concepts [24], and normalize the list using standard NLP techniques and other self-created methods. Ultimately, this leads to a list of 64,719 concepts.

These concepts form the nodes of the semantic network. The edges are drawn when two concepts co-appear in the title or abstract of a paper. Each edge has a time stamp, which is the publication date of the paper in which the concepts co-appear. Multiple edges with different time stamps between two concepts are very common, as concept pairs can co-appear in many papers with different publication dates. As edges have time stamps, the entire semantic network evolves in time. The workflow is depicted in Fig. 2.
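To make the construction concrete, here is a minimal sketch of candidate-concept extraction and time-stamped co-occurrence edges. It assumes the rake-nltk package (which requires the NLTK stopword data) and a hypothetical papers list; the actual pipeline additionally normalizes and manually curates the concept list:

    from itertools import combinations
    from rake_nltk import Rake   # pip install rake-nltk; needs nltk stopwords/punkt

    # Hypothetical input records with title, abstract and publication date.
    papers = [{"title": "Graph neural networks for weather prediction",
               "abstract": "We study graph neural networks for ...",
               "date": "2020-03-01"}]

    rake = Rake()
    concepts = set()
    for p in papers:
        rake.extract_keywords_from_text(p["title"] + ". " + p["abstract"])
        concepts.update(rake.get_ranked_phrases())   # candidate concepts

    # One time-stamped edge per concept pair co-appearing in a paper.
    edges = []
    for p in papers:
        text = (p["title"] + ". " + p["abstract"]).lower()
        present = sorted(c for c in concepts if c in text)
        for u, v in combinations(present, 2):
            edges.append((u, v, p["date"]))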
Figure 7. The Science4Cast benchmark: link predictions in an exponentially growing semantic network. Here we show the AUC values for different models that use machine learning techniques (ML), hand-crafted network features (NF), or a combination thereof. The left plot shows results for the prediction of a single new link (i.e., w = 1); the right one shows results for the prediction of new triple links (w = 3). The task is to predict δ = [1, 3, 5] years into the future, with cutoff values c = [0, 5, 25]. We sort the models by the results for the task (w = 1, δ = 3, c = 0), which was the task in the Science4Cast competition. Data points that are not shown have an AUC below 0.6 or were not computed due to computational costs. Note that the prediction of new triple edges can be performed nearly deterministically. It will be interesting to understand the origin of this quasi-deterministic pattern in AI research.
…edges), and the distribution changing over time, the AUC provides a meaningful and interpretable metric. For perfect predictions, AUC = 1, while random predictions give AUC = 0.5: the AUC equals the probability that a randomly chosen true element is ranked higher than a randomly chosen false one.

…with AUC > 99.5%. Understanding this apparently quasi-deterministic pattern in AI research will be an interesting target for follow-up research.3
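This ranking interpretation can be computed directly. A minimal sketch, assuming numpy and hypothetical model scores for pairs that did (true) and did not (false) form a link:

    import numpy as np

    def auc(true_scores, false_scores):
        # Probability that a random true element outranks a random false one
        # (ties counted as one half).
        t = np.asarray(true_scores)[:, None]
        f = np.asarray(false_scores)[None, :]
        return (t > f).mean() + 0.5 * (t == f).mean()

    print(auc([0.9, 0.7, 0.8], [0.2, 0.6, 0.1]))   # close to 1 for a good model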
A. M1: Features+ML

…[31] and applies heavy regularization to combat overfitting due to the scarcity of positive samples. The graph neural network approach employs a time-aware graph neural network to learn node representations on dynamic semantic networks.

B. M2: Features+ML
The method proposed by Team HashBrown assumes that the probability that nodes u and v form an edge in the future is a function of the node features f(u), f(v), and some edge feature h(u, v). We chose node features f that capture popularity at the current time t0 (such as degree, clustering coefficient [32, 33], and PageRank [30]). We also use these features' first and second time-derivatives to capture the evolution of the node's popularity over time. After variable selection during training, we chose h to consist of the HOP-rec score [34, 35] and a variation of the Dice similarity score [36] as a measure of similarity between nodes. In summary, we use 31 node features for each node and two edge features, which gives 31 × 2 + 2 = 64 features in total. These features are then fed into a small multilayer perceptron (MLP) (5 layers, each with 13 neurons) with ReLU activation.

Cold start is the problem that some nodes in the test set do not appear in the training set. Our strategy for the cold start is imputation. We say a node v is seen if it appeared in the training data, and unseen otherwise; similarly, we say that a node is born at time t if t is the first time stamp where an edge linking this node has appeared. The idea is that an unseen node is simply a node born in the future, so its features should look like those of a recently born node in the training set. If a node is unseen, then we impute its features as the average of the features of the nodes born recently. We found that with imputation during training, the test AUC scores across all models consistently increased by about 0.02. For a complete description of this method, we refer the reader to [37].
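A minimal sketch of this imputation strategy, with hypothetical feature and birth-time arrays:

    import numpy as np

    # features[i]: feature vector of seen node i; birth_time[i]: year of the
    # first edge touching node i (values here are hypothetical).
    rng = np.random.default_rng(0)
    features = rng.random((1000, 64))
    birth_time = rng.integers(1994, 2021, size=1000)

    # Treat an unseen node as one "born in the future": impute its features
    # as the average features of recently born seen nodes.
    recent = birth_time >= birth_time.max() - 2
    unseen_features = features[recent].mean(axis=0)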
C. M3: Features+ML

…negative instances have been randomly sampled and combined.

One of the goals was to identify features that are very informative at a very low computational cost. We found that the degree centrality of the nodes is the most important feature, and that the degree centrality of the neighbouring nodes and the degree of mutual neighbours gave the best tradeoff. As all of the extracted feature distributions are highly skewed to the right, meaning most of the features take near-zero values, using a power transform like Yeo-Johnson [39] helps to make the distributions more Gaussian, which boosts the learning. Finally, for the link prediction task, we saw that LSTMs perform better than fully connected neural networks.

D. M4: pure Features

The following two methods are based on a purely statistical analysis of the test data and are explained in detail in [40].

Preferential Attachment – In the network analysis we concluded that the growth of this dataset tends to maintain a heavy-tailed degree distribution, often associated with scale-free networks. As mentioned before, the γ-value of the degree distribution is very close to 2, suggesting that preferential attachment [41] is likely the main organizational principle of the network. As such, we implemented a simple prediction model following this procedure. Preferential-attachment scores in link prediction are often quantified as

    s^PA_ij = k_i · k_j,    (1)

with k_i and k_j the degrees of nodes i and j. However, this assumes the scoring of links between nodes that are already connected to the network, that is, k_i, k_j > 0, which is not the case for all the links we must score in the dataset. As a result, we define our preferential-attachment model as

    s^PA_ij = k_i + k_j.    (2)
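A minimal sketch of this degree-based scoring, assuming numpy and a hypothetical list of unconnected candidate pairs:

    import numpy as np

    degree = np.array([5, 0, 12, 3])             # current degree of each node
    pairs = np.array([(0, 1), (1, 2), (2, 3)])   # unconnected pairs to score

    # Degree-sum preferential attachment: well defined even for k_i = 0.
    scores = degree[pairs[:, 0]] + degree[pairs[:, 1]]
    ranking = pairs[np.argsort(-scores)]         # most likely new links first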
Common Neighbours – …used in link prediction methods [22]. As such, we decided to test a method known as Common Neighbours [13]. If we define Γ(i) ∩ Γ(j) as the set of common neighbours of nodes i and j, we can easily score the nodes with

    s^CN_ij = |Γ(i) ∩ Γ(j)|,    (3)

the intuition being that nodes which share a larger number of neighbours are more likely to be connected than distant nodes that do not share any. Evaluating this score for each pair (i, j) on the dataset of unconnected pairs, which can be computed as the second power of the adjacency matrix, A², we obtained an AUC that is sometimes higher and sometimes lower than preferential attachment, but is still consistently quite close to the best learning-based models.
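Because entry (i, j) of A² counts the length-two walks between i and j, the common-neighbour scores of all pairs follow from a single sparse matrix product. A minimal sketch on a toy graph, assuming scipy:

    import numpy as np
    from scipy.sparse import csr_matrix

    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # small undirected toy graph
    rows = [u for u, v in edges] + [v for u, v in edges]
    cols = [v for u, v in edges] + [u for u, v in edges]
    A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

    # (A @ A)[i, j] equals |Γ(i) ∩ Γ(j)| for i != j.
    A2 = (A @ A).toarray()
    print(A2[0, 3])   # common neighbours of the unconnected pair (0, 3) -> 1.0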
E. M5: Features + ML

This method is based on [42] with a modification disclosed in Appendix VI C. First, 10 groups of first-order graph features are extracted to obtain some neighbourhood and similarity properties of each pair of nodes: degree centrality of the nodes, the pair's total number of neighbours, common neighbours index, Jaccard coefficient, Simpson coefficient, geometric coefficient, cosine coefficient, Adamic-Adar index, resource allocation index, and preferential attachment index. They are obtained for three consecutive years to capture the temporal dynamics of the semantic network, leading to a total of 33 features. Second, principal component analysis (PCA) [43] is applied to reduce the correlation between features, speed up the learning process, and improve generalization, which results in a final set of 7 latent variables. Lastly, a random forest classifier is trained (using a balanced dataset) to estimate the likelihood of new links between the AI concepts.
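A compact sketch of this features-PCA-random-forest pipeline, assuming scikit-learn and a hypothetical precomputed feature matrix:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline

    # X: hypothetical pair features (11 per year over 3 years = 33 columns);
    # y: whether the pair formed a link. The paper trains on a balanced set.
    rng = np.random.default_rng(0)
    X, y = rng.random((500, 33)), rng.integers(0, 2, 500)

    model = make_pipeline(PCA(n_components=7), RandomForestClassifier())
    model.fit(X, y)
    link_likelihood = model.predict_proba(X)[:, 1]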
F. M6: Features+ML

The baseline solution for the Science4Cast competition was closely related to the model presented in [12]. It uses 15 hand-crafted features of a pair of nodes v1 and v2: the degrees of v1 and v2 in the current year and the previous two years (six properties); the total numbers of neighbours of v1 and of v2 in the current year and the previous two years (six properties); and the number of shared neighbours between v1 and v2 in the current year and the previous two years (three properties). These 15 features are the input of a neural network with four layers (15, 100, 10, and 1 neurons), intended to predict whether the nodes v1 and v2 will have w edges in the future. After the training, the model computes the probability for all 10 million evaluation examples. This list is sorted, and the AUC is computed.
G. M7: end-to-end ML (Transformers)

This model, which is detailed in [44], does not use any handcrafted features but learns them in a completely unsupervised manner. To do so, we extract various snapshots of the adjacency matrix through time, capturing graphs in the form of A_t for t = 1994, …, 2019. We then embed each of these graphs into 128-dimensional Euclidean space via Node2vec [45, 46]. For each node u in the semantic graph, we extract different 128-dimensional vector embeddings n_u(A_1994), …, n_u(A_2019).

Transformers have performed extremely well in natural language processing tasks [47], thus we apply them to learn the dynamics of the embedding vectors. We pre-train a transformer to help classify node pairs. For the transformer, the encoder and decoder had 6 layers each; we used 128 as the embedding dimension, 2048 as the feedforward dimension, and 8-headed attention. This transformer acts as our feature extractor. Once we pre-train our transformer, we add a 2-layer ReLU network with hidden dimension 128 as a classifier on top.
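A minimal sketch of such a temporal feature extractor, assuming PyTorch. The dimensions follow the text, but the encoder-only layout and mean pooling are illustrative simplifications, not the authors' exact architecture:

    import torch
    import torch.nn as nn

    class TemporalPairClassifier(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                               dim_feedforward=2048,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=6)
            # 2-layer ReLU classifier on top of the extracted features.
            self.head = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(),
                                      nn.Linear(128, 1))

        def forward(self, seq_u, seq_v):          # (batch, years, dim) each
            fu = self.encoder(seq_u).mean(dim=1)  # pool over the time axis
            fv = self.encoder(seq_v).mean(dim=1)
            return self.head(torch.cat([fu, fv], dim=-1))  # link logit

    model = TemporalPairClassifier()
    logit = model(torch.randn(4, 26, 128), torch.randn(4, 26, 128))  # 26 years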
H. M8: end-to-end ML (auto node embedding)

The most immediate way one can apply machine learning to this problem is by automating the detection of features. Quite simply, the baseline solution M6 is modified such that, instead of 15 hand-crafted features, the neural network is trained on features extracted from a graph embedding. In our approach, we use the ProNE embedding [48], which is based on sparse matrix factorizations modulated by the higher-order Cheeger inequality [49], as well as Node2Vec [45]. We use the implementations provided in the nodevectors Python package [50].

The embeddings learn a 32-dimensional representation for each node; hence, each edge representation is normalized to a single point in [0, 1]^64, and the concatenated features are the input of a neural network with two hidden layers of size 1000 and 30, respectively. Similarly to M6, the model is then tasked with computing the probability for the evaluation examples, which lets us determine the ROC.
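A minimal sketch of turning learned node embeddings into the normalized 64-dimensional edge inputs described above, with a hypothetical embedding matrix standing in for the ProNE or Node2Vec output:

    import numpy as np

    emb = np.random.randn(64719, 32)   # hypothetical (n_nodes, 32) embeddings

    def edge_features(pairs, emb):
        # Concatenate both 32-d node embeddings, then rescale to [0, 1]^64.
        x = np.hstack([emb[pairs[:, 0]], emb[pairs[:, 1]]])
        lo, hi = x.min(axis=0), x.max(axis=0)
        return (x - lo) / (hi - lo + 1e-12)

    pairs = np.array([(0, 1), (5, 42)])
    X = edge_features(pairs, emb)      # input to the 1000/30 hidden-layer MLP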
V. EXTENSIONS AND FUTURE WORK

Creating an AI that can suggest research topics to human scientists is highly ambitious and challenging. The present work of link prediction for a temporal network to draw connections between existing concepts is only the first step. We point out several extensions and future works that are directly relevant to the overarching goal of AI for AI.

High-quality predictions without feature engineering – Surprisingly, given a graph with already extracted concepts as nodes and edges recording the time evolution of the joint appearance of these concepts in publications, the most powerful methods all used carefully hand-crafted features. It will be interesting to see whether end-to-end deep learning methods can solve the task without feature engineering.

Fully automated concept extraction – The concept list at the moment is created by a purely statistical text analysis using RAKE. The suggestions by RAKE are then manually inspected, and phrases that do not correspond to a concept are removed. While this process can be partially automated (as RAKE often makes the same mistakes, which can be captured automatically), it is not a scalable process if one wants to create concept lists for the much larger corpus of science and engineering. A fully automated natural language processing algorithm that can extract meaningful concepts with minimal mistakes would be extremely useful.

Generation of new concepts – Here we predict the emergence of links between two known concepts. One important question is whether an AI algorithm can compose words and generate new concepts. In contrast to the current work, which is mostly supervised, the generation of new concepts is unsupervised, hence more difficult. One approach to address this question has been presented in [51, 52]. There, the authors detect clusters of concepts with specific dynamics that indicate the formation of a new concept. It will be interesting to see how such emerging concepts can be incorporated into the current framework and used for suggesting new research topics.

Semantic information beyond concept pairs – At the moment, every article's abstract and title are compressed into several links between concept pairs. This procedure does not represent all the information in the article's abstract (let alone the article itself). The more information one can extract from the article, the more meaningful the predictions and suggestions will be. Extending the representation of the semantic network to more complex data structures, such as hypergraphs [53], is likely to be computationally more demanding but could significantly improve the prediction quality. It might also be possible to find ways to decrease the complexity of the analysis using clever tricks. For example, the authors of [54] showed that the maximum node and hyperedge cover problems, two NP-hard computational problems, can be solved in polynomial time for most of the real-world hypergraphs tested. Whether such tricks exist for hyperlink prediction is still an open problem. The inclusion of sociological factors, such as the status of the involved researchers and their affiliations, might also help in prediction tasks.

Predictions of scientific success – The prediction of a new link between nodes in the semantic network means that we predict which concepts scientists will study jointly for the first time in the future. This prediction, however, does not say anything about the potential importance and impact of the new connection. For a tool that makes high-quality suggestions, we need to introduce the prediction of a metric of success, for example, the estimated citation numbers of the new link or the rate of citation growth over time. This extension seems reasonable given that the modelling and prediction of citation information in citation networks (where nodes are papers) is a prominent area of research within the science of science [55, 56]. Adapting these techniques to semantic networks will be an interesting future research direction.

Anomaly detection – In a way, predicting the most likely new connection between concepts does not necessarily coincide directly with the goal of suggesting new surprising research directions. After all, those links are predictable, thus potentially not surprising by themselves. While we believe that this type of prediction can still be a very useful contribution for suggestions, there is another way to find surprising combinations more directly, namely by finding anomalies in the semantic network: potential links that have extreme properties in some metrics. There are powerful deep learning methods for anomaly detection [57, 58], and their application to the semantic network presented here might be very interesting. In fact, while scientists tend to study topics in which they are already directly involved [2, 3], higher scientific impact often results from the unexpected combination of more distant domains [10], which motivates the search for those surprising and impactful associations.

End-to-end formulation – As outlined above, we necessarily decomposed our goal of extracting knowledge from the scientific literature into two sub-tasks: extracting concepts, and building and predicting the evolution of a semantic network resulting from those concepts. This stands in contrast to the dominant paradigm in deep learning that emerged over the last decade of so-called
‘end-to-end’ training [59–62]. In this paradigm, problems are not broken into sub-problems but solved directly using deep differentiable architecture components trained via back-propagation [63, 64]. If such an ‘end-to-end’ solution approach to our goal could be achieved, it would be interesting to see whether it could replicate the success this deep learning paradigm has had in other areas.

Human-level machine comprehension – One of the defining goals of the Dartmouth Summer Research Project on Artificial Intelligence in 1956 was the following: ‘An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.’ [65]. Such an algorithm would be expected to handle an evolution in concept denotations due to new insights (e.g., the emergence of the term ‘Gibbs entropy’ to distinguish Boltzmann’s original concept of thermodynamical entropy as opposed to seeing it in the light of the more general, emergent ‘Shannon entropy’ or ‘von Neumann entropy’) or due to disputed originality (e.g., Bolyai-Lobachevskian geometry and hyperbolic geometry are the same concept). An algorithm with such natural language understanding capabilities would thus be extremely useful to get closer to our goal. Although large language models and other multimodally trained language models like CLIP [66] or CLOOB [67] have achieved outstanding results recently, it is an open question how much statistically trained natural language models alone could eventually form concepts and abstractions on a human level [68, 69].

VI. CONCLUSION

Here we present a new AI benchmark for link prediction in exponentially growing semantic networks. Several of the solutions have been collected in the IEEE BigData Competition Science4Cast in fall 2021 and generalized to the more diverse tasks presented here. The goal was to boost the capabilities for predicting future research directions in the field of AI itself, which has grown enormously over the last decade. This ability might be an important part of a tool that gives personalized research suggestions to human scientists in the future. We find, rather surprisingly, that strong new links (those that are formed three or more times) can be predicted with extremely high quality (AUC beyond 99%). It will be interesting to investigate this quasi-deterministic pattern in AI research in more detail. The best methods used a clever combination of hand-crafted features and machine learning. It will be interesting to see whether pure learning methods, without hand-crafted features, will achieve high-quality results in the future. We also point out a number of open problems towards the goal of practical, personalized, interdisciplinary AI-based suggestions for new impactful research directions – which we believe could become a disruptive tool in the future.

APPENDIX

A. Model availability

All of the models described above can be found on GitHub: M1, M2, M3, M4, M5, M6, M7, M8.

B. Details on M9

The solution M9 was not part of the Science4Cast competition and is therefore not described in the corresponding proceedings, so we want to add more details here. We compare the ProNE embedding to Node2Vec, which is also commonly used for graph embedding problems. The algorithm maps each node of the network to a point in 32-dimensional space based on a biased random-walk procedure, which is fundamentally parameterized by two variables: p, the “return parameter”, and q, the “in-out parameter”. The return parameter determines the frequency of backtracking in the random walk, while the in-out parameter determines whether to bias the exploration towards nearby nodes or distant nodes. Notably, these parameters significantly affect how the network is encoded; for instance, in the BlogCatalog dataset, the optimal parameters were p = 0.25, q = 0.25, whereas for the Wikipedia graph they were p = 4, q = 0.5 [45]. In initial experiments, we used the default p = q = 1 for a 64-dimensional encoding, before feeding it into the same neural network as for the ProNE experiment. The higher variance in Node2Vec-based predictions likely has to do with the method’s significant sensitivity to its hyperparameters. While ProNE is clearly better suited for a general multi-dataset link prediction problem, Node2Vec’s parameter sensitivity may help us identify which features of the network are most important for predicting its temporal evolution.
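The role of p and q can be made concrete via the unnormalized transition weights of the biased second-order walk from the original Node2Vec formulation [45]; a minimal sketch (the helper function is illustrative and not part of the nodevectors package):

    # Unnormalized transition weight for stepping from v to x, having
    # arrived at v from t (Grover & Leskovec, 2016).
    def transition_weight(graph, t, v, x, p=1.0, q=1.0):
        if x == t:            # backtrack to the previous node
            return 1.0 / p
        if x in graph[t]:     # x also neighbours t: stay nearby
            return 1.0
        return 1.0 / q        # move outward to more distant nodes

    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(transition_weight(graph, t=0, v=2, x=3, p=4, q=0.5))  # 1/q = 2.0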
C. Consideration for Model M6

In this manuscript, a modification was made relative to the original formulation of the method [42]: two of the original features, the average neighbour degree and the clustering coefficient, were infeasible to extract for some of the tasks covered in this paper, as their computation can be heavy for such a very large network, and they were therefore discarded. Due to computational memory issues, it was not possible to run the model for some of the tasks covered in this study, and so those results are missing.

ACKNOWLEDGEMENTS

The authors thank IARAI Vienna and IEEE for supporting and hosting the IEEE BigData Competition Science4Cast. We are specifically grateful to David Kreil, Moritz Neun, Christian Eichenberger, Markus Spanring, Henry Martin, Dirk Geschke, Daniel Springer, Pedro Herruzo, Marvin McCutchan, Alina Mihai, Toma Furdui, Gabi Fratica, Miriam Vázquez, Aleksandra Gruca, Johannes Brandstetter and Sepp Hochreiter for helping to set up and successfully execute the competition and the corresponding workshop. The authors thank Xuemei Gu for creating Fig. 2, and Milad Aghajohari and Mohammad Sadegh Akhondzadeh for helpful comments on the manuscript. The work of HL, RS, and JGF was supported by grant TWCF0333 from the Templeton World Charity Foundation. HL is additionally supported by NSF grant DMS-1952339. JPM acknowledges the support of FCT (Portugal) through scholarship SFRH/BD/144151/2019. BC thanks the support from FCT/MCTES through national funds and, when applicable, co-funded EU funds under the project UIDB/50008/2020, and FCT through the project CEECINST/00117/2018/CP1495/CT0001. NMT and YX are supported by NSF Grant DMS-2113468, the NSF IFML 2019844 award to the University of Texas at Austin, and the Good Systems Research Initiative, part of the University of Texas at Austin Bridging Barriers.
[1] James A Evans and Jacob G Foster, "Metaknowledge," Science 331, 721–725 (2011).
[2] Santo Fortunato, Carl T Bergstrom, Katy Börner, James A Evans, Dirk Helbing, Staša Milojević, Alexander M Petersen, Filippo Radicchi, Roberta Sinatra, Brian Uzzi, et al., "Science of science," Science 359, eaao0185 (2018).
[3] Dashun Wang and Albert-László Barabási, The Science of Science (Cambridge University Press, 2021).
[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems 33, 1877–1901 (2020).
[5] Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al., "Scaling language models: Methods, analysis & insights from training Gopher," arXiv:2112.11446 (2021).
[6] Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, et al., "Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model," arXiv:2201.11990 (2022).
[7] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al., "PaLM: Scaling language modeling with pathways," arXiv:2204.02311 (2022).
[8] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa, "Large language models are zero-shot reasoners," arXiv:2205.11916 (2022).
[9] Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, and Guy Van den Broeck, "On the paradox of learning to reason from data," arXiv:2205.11502 (2022).
[10] Andrey Rzhetsky, Jacob G Foster, Ian T Foster, and James A Evans, "Choosing experiments to accelerate collective discovery," Proceedings of the National Academy of Sciences 112, 14569–14574 (2015).
[11] Jacob G Foster, Andrey Rzhetsky, and James A Evans, "Tradition and innovation in scientists' research strategies," American Sociological Review 80, 875–908 (2015).
[12] Mario Krenn and Anton Zeilinger, "Predicting research trends with semantic and neural networks with an application in quantum physics," Proceedings of the National Academy of Sciences 117, 1910–1916 (2020).
[13] David Liben-Nowell and Jon Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology 58, 1019–1031 (2007).
[14] István Albert and Réka Albert, "Conserved network motifs allow protein–protein interaction prediction," Bioinformatics 20, 3346–3352 (2004).
[15] Tao Zhou, Linyuan Lü, and Yi-Cheng Zhang, "Predicting missing links via local information," The European Physical Journal B 71, 623–630 (2009).
[16] István A Kovács, Katja Luck, Kerstin Spirohn, Yang Wang, Carl Pollis, Sadie Schlabach, Wenting Bian, Dae-Kyum Kim, Nishka Kishore, Tong Hao, et al., "Network-based prediction of protein interactions," Nature Communications 10, 1–8 (2019).
[17] Alessandro Muscoloni, Ilyes Abdelhamid, and Carlo Vittorio Cannistraci, “Local-community network automata modelling based on length-three-paths for prediction of complex network structures in protein interactomes, food webs and more,” bioRxiv 346916 (2018).
[18] Ratha Pech, Dong Hao, Yan-Li Lee, Ye Yuan, and Tao Zhou, “Link prediction via linear optimization,” Physica A: Statistical Mechanics and its Applications 528, 121319 (2019).
[19] Linyuan Lü, Liming Pan, Tao Zhou, Yi-Cheng Zhang, and H Eugene Stanley, “Toward link predictability of complex networks,” Proceedings of the National Academy of Sciences 112, 2325–2330 (2015).
[20] Roger Guimerà and Marta Sales-Pardo, “Missing and spurious interactions and the reconstruction of complex networks,” Proceedings of the National Academy of Sciences 106, 22073–22078 (2009).
[21] Amir Ghasemian, Homa Hosseinmardi, Aram Galstyan, Edoardo M Airoldi, and Aaron Clauset, “Stacking models for nearly optimal link prediction in complex networks,” Proceedings of the National Academy of Sciences 117, 23393–23400 (2020).
[22] Tao Zhou, “Progresses and challenges in link prediction,” iScience 24, 103217 (2021).
[23] Mario Krenn, Robert Pollice, Si Yue Guo, Matteo Aldeghi, Alba Cervera-Lierta, Pascal Friederich, Gabriel dos Passos Gomes, Florian Häse, Adrian Jinich, AkshatKumar Nigam, et al., “On scientific understanding with artificial intelligence,” arXiv:2204.01467 (2022).
[24] Stuart Rose, Dave Engel, Nick Cramer, and Wendy Cowley, “Automatic keyword extraction from individual documents,” Text Mining: Applications and Theory 1, 1–20 (2010).
[25] Jeff Alstott, Ed Bullmore, and Dietmar Plenz, “powerlaw: a Python package for analysis of heavy-tailed distributions,” PLoS ONE 9, e85777 (2014).
[26] Trevor Fenner, Mark Levene, and George Loizou, “A model for collaboration networks giving rise to a power-law distribution with an exponential cutoff,” Social Networks 29, 70–80 (2007).
[27] Anna D Broido and Aaron Clauset, “Scale-free networks are rare,” Nature Communications 10, 1–10 (2019).
[28] Tom Fawcett, “ROC graphs: Notes and practical considerations for researchers,” Machine Learning 31, 1–38 (2004).
[29] Yichao Lu, “Predicting research trends in artificial intelligence with gradient boosting decision trees and time-aware graph neural networks,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5809–5814.
[30] Sergey Brin and Lawrence Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks and ISDN Systems 30, 107–117 (1998).
[31] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in Neural Information Processing Systems 30 (2017).
[32] Paul W Holland and Samuel Leinhardt, “Transitivity in structural models of small groups,” Comparative Group Studies 2, 107–124 (1971).
[33] Duncan J Watts and Steven H Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature 393, 440–442 (1998).
[34] Jheng-Hong Yang, Chih-Ming Chen, Chuan-Ju Wang, and Ming-Feng Tsai, “Hop-rec: high-order proximity for implicit recommendation,” in Proceedings of the 12th ACM Conference on Recommender Systems (2018) pp. 140–144.
[35] Bo-Yu Lin, “OGB collab project,” https://github.com/brucenccu/OGB_collab_project (2021).
[36] Th A Sorensen, “A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons,” Biol. Skr. 5, 1–34 (1948).
[37] Ngoc Mai Tran and Yangxinyu Xie, “Improving random walk rankings with feature selection and imputation: Science4cast competition, team Hash Brown,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5824–5827.
[38] Nima Sanjabi, “Efficiently predicting scientific trends using node centrality measures of a science semantic network,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5820–5823.
[39] In-Kwon Yeo and Richard A Johnson, “A new family of power transformations to improve normality or symmetry,” Biometrika 87, 954–959 (2000).
[40] João P Moutinho, Bruno Coutinho, and Lorenzo Buffoni, “Network-based link prediction of scientific concepts – a science4cast competition entry,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5815–5819.
[41] Albert-László Barabási, “Network science,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 20120375 (2013).
[42] Francisco Valente, “Link prediction of artificial intelligence concepts using low computational power,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5828–5832.
[43] Ian T. Jolliffe and Jorge Cadima, “Principal component analysis: a review and recent developments,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, 20150202 (2016).
[44] Harlin Lee, Rishi Sonthalia, and Jacob G Foster, “Dynamic embedding-based methods for link prediction in machine learning semantic network,” in 2021 IEEE International Conference on Big Data (Big Data) (IEEE, 2021) pp. 5801–5808.
[45] Aditya Grover and Jure Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016) pp. 855–864.
                                                                                                                    13
[46] Renming Liu and Arjun Krishnan, “PecanPy: a fast, efficient and parallelized Python implementation of node2vec,” Bioinformatics 37, 3377–3379 (2021).
[47] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” arXiv:1706.03762 (2017).
[48] Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding, “Prone: Fast and scalable network representation learning,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19 (International Joint Conferences on Artificial Intelligence Organization, 2019) pp. 4278–4284.
[49] Afonso S. Bandeira, Amit Singer, and Daniel A. Spielman, “A Cheeger inequality for the graph connection Laplacian,” (2012).
[50] Matt Ranger, “nodevectors,” https://github.com/VHRanger/nodevectors (2021).
[51] Angelo A Salatino, Francesco Osborne, and Enrico Motta, “How are topics born? Understanding the research dynamics preceding the emergence of new areas,” PeerJ Computer Science 3, e119 (2017).
[52] Angelo A Salatino, Francesco Osborne, and Enrico Motta, “Augur: forecasting the emergence of new research topics,” in Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (2018) pp. 303–312.
[53] Federico Battiston, Enrico Amico, Alain Barrat, Ginestra Bianconi, Guilherme Ferraz de Arruda, Benedetta Franceschiello, Iacopo Iacopini, Sonia Kéfi, Vito Latora, Yamir Moreno, et al., “The physics of higher-order interactions in complex systems,” Nature Physics 17, 1093–1098 (2021).
[54] Bruno Coelho Coutinho, Ang-Kun Wu, Hai-Jun Zhou, and Yang-Yu Liu, “Covering problems and core percolations on hypergraphs,” Phys. Rev. Lett. 124, 248301 (2020).
[55] Hanwen Liu, Huaizhen Kou, Chao Yan, and Lianyong Qi, “Link prediction in paper citation network to construct paper correlation graph,” EURASIP Journal on Wireless Communications and Networking 2019, 1–12 (2019).
[56] Niklas Reisz, Vito D P Servedio, Vittorio Loreto, William Schueller, Márcia R Ferreira, and Stefan Thurner, “Loss of sustainability in scientific work,” New Journal of Physics 24, 053041 (2022).
[57] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun Kim, and Kuinam J Kim, “A survey of deep learning-based network anomaly detection,” Cluster Computing 22, 949–961 (2019).
[58] Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel, “Deep learning for anomaly detection: A review,” ACM Computing Surveys (CSUR) 54, 1–38 (2021).
[59] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa, “Natural language processing (almost) from scratch,” Journal of Machine Learning Research 12, 2493–2537 (2011).
[60] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems 25 (2012).
[61] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature 518, 529–533 (2015).
[62] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature 529, 484–489 (2016).
[63] Tobias Glasmachers, “Limits of end-to-end learning,” in Asian Conference on Machine Learning (PMLR, 2017) pp. 17–32.
[64] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature 521, 436–444 (2015).
[65] John McCarthy, Marvin L Minsky, Nathaniel Rochester, and Claude E Shannon, “A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955,” AI Magazine 27, 12–12 (2006).
[66] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al., “Learning transferable visual models from natural language supervision,” in International Conference on Machine Learning (PMLR, 2021) pp. 8748–8763.
[67] Andreas Fürst, Elisabeth Rumetshofer, Viet Tran, Hubert Ramsauer, Fei Tang, Johannes Lehner, David Kreil, Michael Kopp, Günter Klambauer, Angela Bitto-Nemling, et al., “Cloob: Modern Hopfield networks with infoloob outperform clip,” arXiv:2110.11316 (2021).
[68] Melanie Mitchell, “Abstraction and analogy-making in artificial intelligence,” Annals of the New York Academy of Sciences 1505, 79–101 (2021).
[69] Yann LeCun, “A path towards autonomous machine intelligence,” OpenReview preprint (2022).