0% found this document useful (0 votes)
19 views21 pages

1 Xyz

It is a research paper on Graph Machine Learning.

Uploaded by

avinandansh08
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views21 pages

1 Xyz

It is a research paper on Graph Machine Learning.

Uploaded by

avinandansh08
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 21

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 1

Graph Machine Learning in the Era of


Large Language Models (LLMs)
Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang,
Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li

Abstract—Graphs play an important role in representing complex relationships in various domains like social networks, knowledge
graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in
Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have
demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision
and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts
arXiv:2404.14928v2 [cs.LG] 4 Jun 2024

have been made to explore the potential of LLMs in advancing Graph ML’s generalization, transferability, and few-shot learning ability.
Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning
capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress
of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to
provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in
Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and
address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can
enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications
and discuss the potential future directions in this promising field.

Index Terms—Graph Machine Learning, Graph Foundation Models, Graph Learning, Large Language Models (LLMs), Pre-training and
Fine-tuning, Prompting, Representation Learning.

1 I NTRODUCTION

G R aph data are widespread in many real-world appli-


cations [1], [2], including social graphs, knowledge
graphs, and recommender systems [3]–[5]. Typically, graphs GNNs
Recommendations

Social Network Citation Network


consist of nodes and edges, e.g., in a social graph, nodes
Drug Discovery
represent users and edges represent relationships [6], [7]. In ...
addition to the topological structure, graphs tend to possess
various features of nodes, such as textual description, which Knowledge Graph Molecular Graph GNNs+LLMs Q&A
provide valuable context and semantic information about
Text Features
nodes. To effectively model the graph, Graph Machine Learning Graph Description: <node_1> is connected to
(Graph ML) has garnered significant interest. With the advent <node_3>, <node_4>, <node_7> within one hop... Molecule

of deep learning (DL), Graph Neural Networks (GNNs)


Graph Feature: Paper_1's Title is ... and the
Author is ....; Paper_2's Title is ...
LLMs ...
Prediction

have become a critical technique in Graph ML due to their


message-passing mechanism. This mechanism allows each
Figure 1: Illustration of the application of Large Language
node to obtain its representation by recursively receiving
Models (LLMs) in graph machine learning. The integration
and aggregating messages from neighboring nodes [8],
of LLMs with Graph Neural Networks (GNNs) is utilized
[9], thereby capturing the high-order relationships and
to model an extensive range of graph data across various
dependencies within the graph structure. To mitigate the
downstream tasks.

• W. Fan is with the Department of Computing (COMP) and Department


of Management and Marketing (MM), The Hong Kong Polytechnic reliance on supervised data, many research focused on
University. E-mail: wenqifan03@gmail.com. developing self-supervised Graph ML methods to advance
• S. Wang, and Q. Li are with the Department of Computing, The GNNs to capture transferable graph patterns, enhancing
Hong Kong Polytechnic University. E-mail: shijie.wang@connect.polyu.hk;
csqli@comp.polyu.edu.hk.
their generalization capabilities across various tasks [10]–[13].
• J. Huang is with Wuhan University. E-mail: huangjiani@whu.edu.cn. Given the exponential growth of applications of graph data,
• Z. Chen, Y. Song, W. Tang, H. Mao, and H. Liu are with Michigan researchers are actively working to develop more powerful
State University. E-mail: {chenzh85, songyu5, tangwen2, haitaoma, Graph ML methods.
liuhui7}@msu.edu.
• X. Liu is with North Carolina State University. E-mail: xliu96@ncsu.edu. Recently, Large Language Models (LLMs) have started
• D. Yin is Senior Director of Engineering at Baidu Inc. E-mail: a new trend of AI and have shown remarkable capabilities
yindawei@acm.org. in natural language processing (NLP) [14], [15]. With the
(Corresponding authors: Wenqi Fan and Qing Li.) evolution of these models, LLMs are not only being applied
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 2

Deep Learning on +
Graphs ...
Section 3 Backbone Architecture Graph Pretext Tasks Downstream Adaptation

G1 Out1
...
...
LLMs for Graph Gn
...
Outn
Models ... ...
Solving Vanilla GNN
Section 4 Enhancing Feature Quality Training Limitations Heterophily and Generalization

+ +
Reasoning Path
Graphs for LLMs Answer

Evident Graph
Section 5 KG-enhanced LLM Pre-training KG-enhanced LLM Inference

Applications ...
Section 6 Recommender Systems Knowledge Graphs AI for Science Robot Task Planning

Future Directions + + + + + ...


Section 7 Generalization and Multi-modal Graph
Trustworthiness Efficiency
Transferability Learning

Figure 2: The outline of our survey. Section 3 Deep Learning on Graphs explores the development of DNN-based
methods, focusing on the Backbone Architecture, Graph Pretext Tasks, and Downstream Adaption three aspects. Section 4
LLMs for Graph Models explore how current LLMs help the current Graph ML towards GFMs from Enhancing Feature
Quality, Solving Vanilla GNN Training Limitations, and Heterophily and Generalization three aspects. Section 5 Graph
for LLMs focuses on Knowledge Graph(KG)-enhanced LLM Pre-training and KG-enhanced LLM Inference. Section 6
Applications presents various applications, including Recommender System, Knowledge Graph, AI for Science, and Robot
Task Planning. Section 7 Future Directions discusses potential future directions for LLMs in graph machine learning from
the Generalization and Transferability, Multi-modal Graph Learning, Trustworthiness and Efficiency.

to language tasks but also showcasing great potentials in of vanilla Graph ML on labeled data, where they make
various applications such as CV [16], and Recommender inferences based on implicit and explicit graph structure
System [17]. The effectiveness of LLMs in complex tasks information [21]–[23]. For instance, InstructGLM [21] fine-
is attributed to their extensive scale in both architecture tunes models like LlaMA [24] and T5 [25] by serializing graph
and dataset size. For example, GPT-3 with 175 billion data as tokens and encoding structural information about
parameters demonstrates exciting capabilities by generating the graph to solve graph tasks. Secondly, to overcome the
human-like text, answering complex questions, and coding. challenge of feature quality, some methods further employ
Furthermore, LLMs are able to grasp extensive general LLMs to enhance the quality of graph features [26]–[28]. For
knowledge and sophisticated reasoning due to their vast example, SimTeG [26] fine-tunes LLMs on textual graphs
training datasets. Therefore, their abilities in linguistic datasets to obtain textual attribute embeddings, which are
semantics and knowledge reasoning enable them to learn then utilized to augment the GNN for various downstream
semantic information. Additionally, LLMs exhibit emergence tasks. Additionally, some studies explore using LLMs to
abilities, excelling in new tasks and domains with limited address challenges such as heterogeneity [29] and OOD [27]
or no specific training. This attribute is expected to provide of graphs.
high generalisability across different downstream datasets On the other hand, although LLM achieves great success
and tasks even in few-shot or zero-shot situations. Therefore, in various fields, it still faces several challenges, including
leveraging the capabilities of LLMs in Graph Machine hallucinations, actuality awareness, and lacking explainabil-
Learning (Graph ML) has gained increasing interest and is ity [30]–[33]. Graphs, especially knowledge graphs, capture
expected to enhance Graph ML towards Graph Foundation extensive high-quality and reliable factual knowledge in
Models (GFMs) [18], [19]. a structured format [5]. Therefore, incorporating graph
GFMs are generally trained on extensive data and can structure into LLMs could improve the reasoning ability
be adapted for a wide range of downstream tasks [20]. By of LLMs and mitigate these limitations [34]. To this end,
exploiting the ability of LLMs, it is expected to enhance the efforts have been made to explore the potential of graphs in
ability of Graph ML to generalize a variety of tasks, thus augmenting LLMs’ explainability [35], [36] and mitigating
facilitating GFMs. Currently, researchers have made several hallucination [37], [38]. Given the rapid evolution and
initial efforts to explore the potential of LLMs in advancing significant potential of this field, a thorough review of recent
Graph ML towards GFMs. Figure 1 demonstrates an example advancements in graph applications and Graph ML in the
of integrating LLMs and GNNs for various graph tasks. era of LLMs is imperative.
Firstly, some methods leverage LLMs to alleviate the reliance Therefore, in this survey, we aim to provide a com-
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 3

prehensive review of Graph Machine Learning in the era structures. This technique involves a stochastic process of
of LLMs. The outline of the survey is shown in Figure 2: moving from one node to another within a graph, which
Section 2 reviews work related to graph machine learning is instrumental in understanding node connectivity and
and foundation models. Section 3 introduces the deep influence within networks. Building upon Random Walks,
learning methods on graphs, which focus on various GNN Graph Embedding methods aim to represent nodes (or edges)
models and self-supervised methods. Subsequently, the as low-dimensional vectors while preserving graph topology
survey delves into how LLMs can be used to enhance and node relationships. Representative methods such as
Graph ML in Section 4 and how graphs can be adopted LINE [49], DeepWalk [50], and Node2Vec [51] leverage
for augmenting LLMs in Section 5. Finally, some applications Random Walks to learn node representations, capturing local
and potential future directions for Graph ML in the era of structures and community information effectively.
LLMs are discussed in Section 6 and Section 7, respectively. Due to the exceptional representation learning and
Our main contributions can be summarized as follows: modeling capabilities, GNNs bolstered by deep learning have
• We detail the evolution from early graph learning methods brought significant advances in graph learning. For example,
to the latest GFMs in the era of LLMs; GCNs [52] introduce convolutional operations to graph data,
• We provide a comprehensive analysis of current LLMs en- enabling effective aggregation of neighborhood information
hanced Graph ML methods, highlighting their advantages for each node, thus enhancing node representation learning.
and limitations, and offering a systematic categorization; GraphSAGE [53] learns a function to aggregate information
• We thoroughly investigate the potential of graph structures from a node’s local neighborhood in an inductive setting,
to address the limitations of LLMs; allowing efficient embedding generation for unseen nodes.
• We explore the applications and prospective future direc- GAT [54] further advances GNNs by integrating attention
tions of Graph ML in the era of LLMs, and discuss both mechanisms, assigning varying weights to nodes in a
research and practical applications in various fields. neighborhood, thereby sharpening the model’s ability to
Concurrent to our survey, Wei et al. [39] review the focus on significant nodes. Inspired by the success of
development of graph learning. Zhang et al. [40] provide transformers [55] in NLP and CV, several studies [56]–[60]
a prospective review of large graph models. Jin et al. [41] adopt self-attention mechanisms to graph data, providing
and Li et al. [42] review different techniques for pre- a more global perspective of graph structures and interac-
training language models (in particular LLMs) on graphs tions. Recent works [61]–[65] further leverage transformer
and applications to different types of graphs, respectively. architectures to enhance graph data modeling. For example,
Liu et al. [43] review the Graph Foundation Models according GraphFormer [61] integrates GNN within each layer in the
to the pipelines. Mao et al. [19] focus on the fundamental transformer, enabling simultaneous consideration of textual
principles and discuss the potential of GFMs. Different and graph information.
from these concurrent surveys, our survey provides a more The advancements in LLMs have given rise to graph
comprehensive review with the following differences: (1) learning. Recent works [21], [22], [27], [66], [67] apply
we present a more systematic review of the development techniques from these advanced language models like
of Graph Machine Learning and further exploration of LLaMA [24] or ChatGPT to graph data, resulting in models
LLMs for Graph ML towards GFMs; (2) we present a capable of understanding and handling graph structures
more comprehensive and fine-grained taxonomy of recent in a manner similar to natural language processing. A
advancements of Graph ML in the era of LLMs; (3) we typical approach, GraphGPT [23], tokenizes graph data for
delve into the limitations of recent Graph ML, and provide insertion into LLMs (i.e. Vicuna [68] and LLaMA [24]) thus
insights into how to overcome these limitations from LLM’s providing a powerful generalization capability. GLEM [69]
perspective; (4) we further explore how graphs can be used further integrates the graph models and LLMs, specifically
to augment LLMs; and (5) we thoroughly summarize a broad DeBERTa [70], within a variational Expectation-Maximization
range of applications and present a more forward-looking (EM) framework. It alternates between updating LLM and
discussion on the challenges and future directions. GNN in the E-step and M-step, thereby scaling efficiently
and improving effectiveness in downstream tasks.
2 RELATED WORK
In this section, we briefly review some related works in 2.2 Foundation Models (FMs)
the fields of graph machine learning and foundation model Foundation Models (FMs) represent a significant break-
techniques. through in the field of artificial intelligence, characterized
by their ability to be extensively pre-trained on large-scale
2.1 Graph Machine Learning datasets and adapted to a variety of downstream tasks. These
As one of the most active fields in artificial intelligence, models are distinguished by their extensive pre-training on
graph learning has attracted considerable attention to its large-scale datasets and their adaptability to a wide range of
capability to model complex relationships and structures downstream tasks. It is worth noting that FMs are not limited
in data represented as graphs [44]. Nowadays, it has been to a single field and can be found in natural language [14],
widely adopted in various applications, including social [15], vision [71], [72], and graph domains [19], [43], serving
network analysis [45], protein detection [46], recommender as a promising research direction.
systems [47], [48], etc. In the realm of vision, Visual Foundation Models (VFMs)
The initial phases of graph learning typically use Random have gained significant success, making substantial impacts
Walks, which is a foundational method for exploring graph on areas such as image recognition, object detection, and
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 4

scene understanding. Specifically, VFMs benefit from pre- adaptability, and flexibility. Furthermore, recent advance-
training on extensive and diverse image datasets, allowing ments demonstrate the successful integration of LLMs with
them to learn intricate patterns and features. For instance, other models, like recommender system [17], reinforcement
models such as DALL-E [73], and CLIP [71] leverage self- learning (RL) [100], GNNs [26], [101]–[103]. This integration
supervised learning to understand and generate images enables LLMs to tackle both traditional and novel challenges,
based on textual descriptions, demonstrating remarkable proposing prospective avenues for applications.
cross-modal understanding capabilities. Recent Visual Chat- LLMs have found applications in diverse sectors
GPT [72] integrates ChatGPT with a series of Visual like chemistry [104], [105], education [106], [107], and
Foundation Models (VFMs), making it perform a variety finance [108], [109]. In these fields, they contribute to
of complex visual tasks. These VFMs allow models to learn various tasks from data analysis to personalized learning.
from a broader range of visual data, thereby improving their Particularly, LLMs exhibit great potential in graph tasks such
generalizability and robustness. as graph classification and link prediction, demonstrating
In the sphere of Natural Language Processing (NLP), their versatility and broad applicability. Specifically, several
Large Language Models (LLMs) such as ChatGPT and studies like Simteg [26], GraD [102], Graph-Toolformer [101],
LLaMA have also revolutionized the field [74]. Characterized Graph CoT [110], and Graphologue [103] have notably
by their extensive scale, LLMs are trained on billions of advanced graph learning. These models utilize LLMs for
parameters using extensive textual datasets, which enable textual graph learning, graph-aware distillation, and graph
them to excel in comprehending and generating natural reasoning, illustrating the potential of LLMs in enhancing
language. The landscape of pre-trained language models the understanding of and interaction with complex graph
is diverse, such as GPT (Generative Pre-trained Trans- structures.
former) [14], BERT (Bidirectional Encoder Representations Although FMs have revolutionized Vision and NLP
from Transformers) [15] and T5 (Text-To-Text Transfer domains, the development of Graph Foundation Models
Transformer) [25]. These models can broadly fall into three (GFMs) is still in the nascent stages. With the rapid evolution
categories: encoder-only, decoder-only, and encoder-decoder and significant potential of this field, it is imperative to
models. Encoder-only models, such as BERT, specialize continue exploring and developing advanced techniques
in understanding and interpreting language. In contrast, that can further enhance Graph ML towards GFMs.
decoder-only models like GPT excel in generating coherent
and contextually relevant text. Encoder-decoder models, like 3 D EEP L EARNING ON G RAPHS
T5, combine both abilities, efficiently performing various
With the rapid development of deep neural networks
NLP tasks from translation to summarization.
(DNNs), GNN techniques modeling graph structure and
As an encoder-only model, BERT introduces a paradigm
node attributes for representation learning have been
in NLP with its innovative bi-directional attention mech-
widely explored and have become one key technology in
anism, which analyzes text from both directions simul-
Graph ML. While vanilla GNNs demonstrate proficiency in
taneously, unlike its predecessors like transformer which
various graph tasks, they still encounter several challenges
processed text in a single direction (either left-to-right or
such as scalability, generalization to unseen data, and
right-to-left). This feature allows BERT to attain a com-
limited capability in capturing complex graph structures. To
prehensive context understanding, significantly improving
overcome these limitations, many efforts have been made to
its language nuance comprehension. On the other hand,
improve GNN with the self-supervised paradigm. Therefore,
decoder-only models such as GPT, including variants like
to provide a comprehensive review of these methods, in
ChatGPT, utilize a unidirectional self-attention mechanism.
this section, we first introduce the backbone architecture,
This design makes them particularly effective in predicting
including GNN-based models and graph transformer-based
subsequent words in a sequence, thus excelling in tasks like
models. After that, we explore two important aspects of
text completion, creative writing, language translation, and
self-supervised graph ML models: graph pretext tasks
code generation [75]. Additionally, as an encoder-decoder
and downstream adaptation. Note that a comprehensive
model, T5 uniquely transforms a variety of NLP tasks as
summary of these methods is presented in Table 1.
text generation problems. For example, it reframes sentiment
analysis from a classification task to a text generation task,
where input like ”Sentiment: Today is sunny” would prompt 3.1 Backbone Architecture
T5 to generate an output such as ”Positive”. This text-to-text As one of the most active fields in the artificial intelligence
approach underscores T5’s versatility and adaptability across (AI) community, various GNN methods have been proposed
diverse language tasks. to solve various tasks. The powerful capability of these
The evolution of LLMs has seen the emergence of models is largely dependent on the development of their
advanced models like GPT-3 [97], LaMDA [98], PaLM [99], backbone architectures. Therefore, in this subsection, we
and Vicuna [68]. These models represent significant advances focus on two broadly used architectures: neighborhood
in NLP, distinguished by their enhanced capabilities in aggregation-based models and graph transformer-based
comprehending and generating complex, fine-grained lan- models.
guage. Their training methods are usually more sophisticated,
involving larger datasets and more powerful computational 3.1.1 Neighborhood Aggregation-based Model
resources. This scaling up has led to unprecedented lan- Neighborhood aggregation-based models are the most
guage understanding and generation capabilities, exhibiting popular graph learning architectures that have been ex-
emergent properties such as in-context learning (ICL), tensively studied and applied in various downstream
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 5

Table 1: A comparison of various DNN-based models. We present Models and their Architecture, Pretext Task, Adaptation
Method, and Downstream Tasks. URL in Adaptation Method indicates Unsupervised Representation Learning.

Model Architecture Pretext Task Adaptation Method Downstream Tasks


DGI [76] GNN Contrastive Learning URL Node
GRACE [77] GNN Contrastive Learning URL Node
GraphMAE [78] GNN Graph Generation URL Node, Graph
MVGRL [79] GNN Contrastive Learning URL Node, Graph
GraphCL [10] GNN Contrastive Learning Fine-tuning Node, Graph
CSSL [11] GNN Contrastive Learning Fine-tuning Graph
GCC [13] GNN Contrastive Learning URL&Fine-tuning Node, Graph
G-BERT [80] BERT Graph Generation Fine-tuning Recommendation
AdapterGNN [81] GNN Multi-task Fine-tuning Graph
GROVER [82] Graph Transformer Property Prediction Fine-tuning Graph
Graph-Bert [60] Graph Transformer Graph Generation URL&Fine-tuning Node
G-Adapter [83] Graph Transformer Multi-task Fine-tuning Graph
GraphGPT [84] Graph Transformer Graph Generation Fine-tuning Node, Edge, Graph
MoMu [85] BERT, GNN Contrastive Learning Fine-tuning Graph
TOUCHUP-G [86] BERT, ViT, GNN Contrastive Learning Fine-tuning Node, Edge
GraphPrompt [87] GNN Contrastive Learning Prompt Tuning Node, Graph
GPPT [88] GNN Contrastive Learning Prompt Tuning Node
PGCL [89] GNN Contrastive Learning Prompt Tuning Node, Edge, Graph
GPF [90] GNN Multi-task Prompt Tuning Graph
ProG [91] GNN, Graph Transformer Contrastive Learning Prompt Tuning Node, Edge, Graph
ULTRA-DP [92] GNN Multi-task Prompt Tuning Node
SAP [93] GNN Contrastive Learning Prompt Tuning Node, Graph
PRODIGY [94] GNN Multi-task Prompt Tuning Node, Edge
SGL-PT [95] GNN Multi-task Prompt Tuning Node, Graph
DeepGPT [96] Graph Transformer Graph Regression Prompt Tuning Graph

tasks. These models operate based on the message-passing Lehman isomorphism test, making it widely chosen as the
mechanism [111], which updates a node‘s representation by backbone model for a lot of structure-intensive tasks.
aggregating the features of its neighboring nodes along with Although these models are widely adopted to solve graph
its own features. Formally, this process can be represented tasks, they still suffer from some inherent limitations, such
as: as over-smoothing and lack of generalization. In addition,
the lower amount of parameters also limits the modeling
mu = Aggregate(fv , v ∈ Nu ), (1) capacity as the backbone model to serve multiple datasets
fu′ = U pdate(mu , fu ), (2) and tasks.

where, for each node u, a message mu is generated 3.1.2 Graph Transformer-based Model
through the aggregation function from its neighboring While neighborhood aggregation-based GNN models
nodes. Subsequently, the graph signal f is updated with have shown remarkable performance in processing graph-
the message. structured data, they suffer from some limitations. A
GCN is a typical method designed to leverage both the significant challenge for these models is their difficulty
graph structure and the node attributes. This architecture in handling large graphs due to their reliance on local
updates node representations by aggregating neighboring neighborhood information and their limited capacity in
features with the node’s own. As the number of net- capturing long-range dependencies within the graph [64],
work layers increases, each captures an increasingly larger [114], [115]. To overcome these problems, inspired by the
neighborhood. Owing to the efficiency and performance, success of the transformer model in various NLP tasks, graph
GCN [52] has been widely applied by several methods such transformer-based models have been proposed [57], [62],
as CSSL [11] and PRODIGY [94]. GraphSAGE [53] is another [64]. These models leverage the self-attention mechanism
notable neighborhood aggregation-based model. Due to its to adaptly capture both local and global graph structures,
inductive paradigms, GraphSAGE can easily generalize to which allows the model to stack multiple layers without
unseen nodes or graphs, making it widely employed by over-smoothing. Due to the lower inductive bias, graph
many studies such as PinSage [112] for inductive learning. transformer-based models can learn the structural patterns
Additionally, several studies [78], [91], [94] incorporate Graph from data rather than solely relying on the graph structure.
Attention Networks (GATs) [54] as the backbone architecture. Additionally, transformers have demonstrated great scaling
GATs integrate attention mechanisms into GNNs, assigning behavior in CV and NLP, suggesting that their performance
variable weights to neighboring nodes, thereby focusing on can keep improving with more data and parameters.
the most relevant parts of the input graph for improved Graph transformer-based models have been widely
node representations. As another important model in the applied as a backbone architecture in various tasks [60], [82],
family of GNNs, Graph Isomorphism Network (GIN) [113] [83], [96], [116]. For example, Heterformer [117] introduces
has also been widely used [10], [13], [87], [95], due to a graph-empowered Transformers architecture by adding
its powerful representation ability. Its unique architecture neighbor tokens into each language Transformer layer.
guarantees the expressiveness equivalent to the Weisfeiler- Edgeformers [118] propose to encode text and structure
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 6

Tuned
Pre-training Tasks Downstream Tasks Downstream Tasks Generally, graph augmentations can be broadly catego-
Frozen
rized into two types: 1) feature perturbation and 2) topology
perturbation. They assume that tiny changes in the feature
Pre-training Pre-trained Pre-trained
Input graph GNNs GNNs GNNs or structural space do not change the semantics of the
Node/Edge/(sub)graph. Feature perturbation involves per-
turbing the features of the nodes in the graph. For example,
GRACE [77] randomly masks the node features to learn
more robust representations. On the other hand, topology
perturbation mainly involves modifying the structure of
Prompt
the graph. A typical example is CSSL [11] which employs
... ... ...
... Pre-training Fine-tuning Prompt Tuning
strategies like edge perturbation or node dropping to adopt
graph-graph level contrast, thereby enhancing the robustness
of representations.
Figure 3: A comparison of pre-training, fine-tuning, and Regarding the scale of contrast, the approaches can
prompt tuning. (a) Pre-training involves training the GNN be divided into node-level and graph-level. For example,
model based on specific pre-training tasks. (b) Fine-tuning GRACE [77] computes the similarities between node-level
updates the parameters of the pre-trained GNN model embeddings to learn discriminative node representations.
according to the downstream tasks. (c) Prompt tuning GCC [13] also works at the node level but learns local
generates and updates the features of the prompt according structural patterns by sampling a node’s neighbors to
to the downstream tasks, while keeping the pre-trained GNN obtain subgraphs (positive pairs) and contrasting them with
model fixed and without any modification. randomly selected non-contextual subgraphs (negative pairs).
In contrast, DGI [76] contrasts node-level embeddings with
graph-level embedding to capture global graph structures.
inside each Transformer layer jointly. Graph-Bert [60]
GraphCL [10] takes a different approach by implementing
employs a transformer to pre-train on the graph dataset
graph-to-graph level contrast, thereby learning robust repre-
with feature and edge reconstruction tasks and then fine-
sentations. The scale used for pre-training has a huge impact
tunes for various downstream tasks. Similarly, GROVER [82]
on the downstream performance. When adopting contrastive
introduces a self-supervised graph transformer-based model
learning as the pre-training task, one key challenge is how to
designed specifically for large-scale molecular data. It pre-
design the objective such that the embeddings learned can
trains on extensive molecular datasets and then fine-tunes
account for downstream tasks of different scales.
for specific downstream tasks. GraphGPT [84] employs a
Graph Generation methods aim to learn the distribution of
(semi-)Eulerian path to transform the graph into a sequence
graph data to enable graph generation or reconstruction. In
of tokens, and then feeds the sequence into the transformer.
contrast to models in CV that predict masked image patches,
Specifically, it constructs a dataset-specific vocabulary such
or in NLP that predict the next token in a sequence, graph
that each node can correspond to a unique node ID.
data presents a unique challenge due to its interconnected
Despite graph transformer-based models that can some-
nature. Consequently, graph generation methods typically
how address the limitations of traditional GNNs, they
work on the feature or structural space. Feature generation
also face several challenges. One of the challenges is
methods focus on masking the features of one or a subset
the quadratic complexity caused by self-attention, which
of nodes and then training the model to recover the
becomes particularly problematic for large-scale graphs. In
masked features. For instance, GraphMAE [78] utilizes a
addition, there is a risk of losing some information about the
masked autoencoder framework to reconstruct masked graph
original graph structure when serializing the graph.
portions based on their context, effectively capturing the
underlying node semantics and their connection patterns.
3.2 Self-Supervised Learning on Graphs Alternatively, structure generation methods concentrate on
To adapt GNNs to various graph tasks, many self-supervised training the model to recover the graph structure. The
learning methods have been proposed and extensively method GraphGPT [84] encodes the graph into sequences of
studied. These approaches enable GNNs to learn graph tokens and then employs a transformer decoder to predict
representations from the pre-training task and transfer them the next token of the sequence to recover the connectivity
to various downstream tasks, such as node classification, of the graph. In addition, Graph-Bert [60] is trained on
graph classification, and link prediction. Therefore, in this both node attribute recovery and graph structure recovery
subsection, we will introduce graph self-supervised learning tasks to ensure that the model captures local node attribute
methods from pretext tasks and downstream adaptation, information while maintaining a global view of the graph
respectively. structure.
Graph Property Prediction methods gain guidance from
3.2.1 Graph Pretext Tasks the node-, edge-, and graph-level properties, which are
Graph Contrastive Learning aims to learn augmentation inherently present in the graph data. These methods follow
representations by contrasting similar and dissimilar graph a training approach similar to supervised learning, as both
data pairs, effectively identifying nuanced relationships and utilize ”sample-label” pairs for training. The key distinction
structural patterns. We can review graph contrastive learning lies in the origin of the labels: in supervised learning, labels
from two perspectives: graph augmentations and the scale are manually annotated by human experts which can be
of contrast. costly in real scenarios, whereas in property-based learning,
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 7

the labels are automatically generated from the graph using demand labeled data for supervision, and obtaining these
some heuristics or algorithms. For example, GROVER [82] annotations can be resource-intensive in terms of time and
utilizes professional software to extract the information on cost. Secondly, real-world graphs often contain abundant
graph motifs as labels for classification. Similarly, [119] textual information, which is crucial for downstream tasks.
leverage statistical properties of the graph for graph self- However, GNNs typically rely on shallow text embeddings
supervised learning. for semantic extraction, thereby limiting their capacity to
capture intricate semantics and text features. Moreover, the
3.2.2 Downstream Adaptation diversity of graphs presents challenges for GNN models in
Unsupervised Representation Learning (URL) is a common terms of generalization across diverse domains and tasks.
method due to the scarcity of labeled data in the real Recently, LLMs have achieved remarkable success in
world [76]–[79]. In URL, the pre-trained graph encoder handling natural language, with exciting features like (1)
is frozen and only a task-specific layer is learned during conducting zero/few-shot predictions and (2) providing a
downstream tuning. The learned representations are then unified feature space. These capabilities present a potential
directly fed into decoders. This pattern allows URLs to be ef- solution to address the above challenges faced by Graph ML
ficiently applied to downstream tasks. For example, DGI [76] and GFMs. Therefore, this section aims to investigate the
trains an encoder model to learn node representations within contributions that current LLMs can make to enhance Graph
graph-structured. Node representations can then be used for ML’s progress towards GFMs, while also examining their
downstream tasks. However, due to the gap between the current limitations, as Figure 4 shows.
pretext task and downstream tasks, URL can also lead to
suboptimal performance.
Fine-tuning is the default method to adapt a pre-trained 4.1 Enhancing Feature Quality
model to a certain downstream task. As shown in Figure 3, it Graphs encompass diverse attribute information, spanning
adds a randomly initialized task header (e.g., a classifier) on text, images, audio, and other multi-modal modes. The
top of the pre-trained model, and during fine-tuning, both semantics of these attributes play a crucial role in a range
the backbone model and the header are jointly trained [10], of downstream tasks. In comparison with earlier pre-trained
[11], [60]. Compared with URL, fine-tuning provides more models, LLMs stand out due to their substantial parameter
flexibility as it allows changes in the backbone parameters, volume and training on extensive datasets, endowing them
and one can choose the layers to be tuned while keeping with rich open-world knowledge. Consequently, researchers
others fixed. Additionally, recent studies [10], [81], [83] are exploring the potential of LLMs to improve feature
further explore advanced graph fine-tuning methods that go quality and align feature space. This section delves into
beyond naive fine-tuning. For instance, AdapterGNN [81] research endeavors aimed at leveraging LLMs to accomplish
introduces two trainable adapters in parallel before and these goals.
after the message passing. It freezes the GNN model
during fine-tuning while only tuning the adapters, enabling 4.1.1 Enhancing Feature Representation
parameter-efficient fine-tuning with minimal influence on Researchers utilize the powerful language understanding
the downstream performance. capabilities of LLMs to generate better representations
Prompt-tuning: ”Pre-training & fine-tuning” is prevalent for text attributes compared to traditional shallow text
in adapting pre-trained models to specific downstream embeddings [27], [120], [121]. For example, Patton [154]
tasks, but it overlooks the gap between pre-training and proposes to pre-train a language model on the target graph
downstream tasks, potentially limiting generalization capa- to obtain high-quality feature representation. METERN [155]
bilities. Moreover, fine-tuning for different tasks also leads introduces a soft prompt-based method to learn node
to significant time and computational costs. Inspired by multiplex embeddings for different edge types with one
recent advancements in NLP, several methods [87]–[93], [95], language model encoder. Chen et al. [27] utilize LLMs as text
[96] have presented the potential of introducing prompts encoders and the GNN model as a predictor, validating the
to adapt pre-trained models to specific tasks as illustrated effectiveness of LLMs as an enhancer in node classification
in Figure 3. Specifically, Prompt-tuning first unifies the tasks. In LKPNR [120], an LK-Aug news encoder enhances
downstream task with the pre-trained task into the same the news recommender system by concatenating LLM
paradigm, followed by the introduction of learnable prompts embeddings with entity embeddings within the news text to
for tuning. For example, GPPT [88] first reframe node obtain an enriched news representation. Several researchers
classification as link predictions. GraphPrompt [87] further explore fine-tuning LLMs to obtain text representations better
extends graph classification into link prediction. On the suited for downstream graph tasks. SimTeG [26] treats node
other hand, Prog [91] unifies all the downstream tasks classification and link prediction tasks as text classification
into subgraph classification. The inserting prompt including and text similarity tasks, fine-tuning PLMs using LoRA [156]
vectors [87], [88], [90], node [95] and sub-graph [91]. By on the TAG dataset. The fine-tuned PLMs are then used to
inserting these prompts, the pre-trained parameters can generate embeddings for text attributes, followed by GNN
be utilized in a way that aligns more closely with the training for downstream tasks.
requirements of the downstream tasks.
4.1.2 Generating Augmented Information
4 LLM S FOR G RAPH M ODELS Several studies investigate leveraging the generation capabil-
Despite great potential, Graph ML based on the GNNs has its ities and general knowledge of LLMs to generate augmented
inherent limitations. Firstly, vanilla GNN models commonly information from original textual attributes. TAPE [122]
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 8

Enhancing Feature Quality Solving Vanilla GNN Training Limitations Heterophily and Generalization

Ignoring Implicit Explicit


Graph Structure Structure Structure
0 1 0 1 0 1

Attributes Multi-modal 2 2 2 Knowledge Graph Citation Network


Information
Title text text More
Abstract graph Graphs...
embedding
Molecular Graph

Large Language Models for


ChatGPT LLaMA PaLM Vicuna
Graph Machine Learning

Enhancing Feature Representation Predictions


Title Embedding Social Network
Node Node1: A
Abstract Embedding
Classification Node2: B
Generating Augmented Aligning Feature Instruction
Information Space Who is the key influencer in the
Link Prediction Node 1-> Node2
network?
New
Answer
Attributes
More Tasks Predictions... The key influencer in the
network is Annie.

Figure 4: Illustration of LLMs for Graph ML. (1) Methods using LLMs for Enhancing Feature Quality by enhancing feature
representation, generating augmented information, and aligning feature space. (2) Explorations for solving Vanilla GNN
Training Limitations are categorized based on how structural information in the graph is processed: ignoring structural
information, implicit structural information, and explicit structural information. (3) Research about employing LLMs to
alleviate the limitations of Heterophily and Generalization.

first leverages LLM to generate potential node labels and employs a similar approach to enhance item and user
explanations, utilizing text attributes (such as title and attributes in recommender systems. For instance, based on
abstract) as input. These labels and explanations generated by historical behavior information, LLM outputs user profiles
the LLM are regarded as augmented attributes. Subsequently, like age, gender, country, language, and preferred or disliked
these augmented attributes are encoded by a fine-tuned genres. For item attributes, taking movie information such as
language model (LM) and processed by a GNN model, which title as input, LLM generates outputs such as movie director,
integrates the graph structure for making final predictions. country, and language.
In contrast to TAPE, KEA [27] does not directly predict node In addition to generating augmented text attributes,
labels with LLM. Instead, LLM extracts terms mentioned in researchers also employ LLMs to enhance graph topological
textual attributes and provides detailed descriptions of these structures by generating or refining nodes and edges. In
terms. ENG [127], LLM is employed to generate new nodes and
In the domain of molecular property prediction, both their corresponding text attributes for each node category.
LLM4Mol [66] and GPT-MolBERTa [126] adopt a similar To integrate the generated nodes into the original graph, the
approach, where LLMs generate interpretations for input authors train an edge predictor using relations in the original
Simplified Molecular-Input Line-Entry System (SMILES) dataset as supervised signals. Sun et al. [128] leverage
notations as augmented attributes. LLMs to refine graph structures. Specifically, they let LLMs
In the realm of recommender systems, several methods remove unreliable edges by predicting the semantic similarity
leverage LLMs to enhance the textual attributes of both between node attributes. Additionally, they utilize pseudo-
users and items. LLM-Rec [125] enables LLMs to produce labels generated by LLMs to aid the GNN in learning proper
more detailed item descriptions by explicitly stating the edge weights.
recommendation intent within the prompt. RLMRec [123]
explores using LLM to enhance user preference. Specifically, 4.1.3 Aligning Feature Space
the LLM receives user and item information as input, In real-world scenarios, the text attributes of graphs across
generates user preferences, potential types of users that the different domains exhibit considerable diversity. Addi-
item may attract, and the reasoning process. LLMRec [124] tionally, beyond text modal attributes, the graph may
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 9

Table 2: A summary of LLM for Graph ML research. We present the GNN model, LLM model, predictor, domain, task,
datasets, and project link. FT is Fine-tuning, refers to whether modifications are made to the parameters of the LLM model
while PR is Prompting, involves inputting textual prompts to the LLM to obtain responses. In the context of task, ”node”
denotes node-level tasks such as node classification, ”edge” signifies edge-level tasks like link prediction, ”graph” represents
graph-level tasks such as graph classification, and ”structure” pertains to structure understanding tasks, such as node degree
counting.

Role Sub Category Method GNN Model LLM Model FT PR Domain Task Datasets Link
Chen et al. [27] GCN, GAT, MLP ChatGPT, LLaMA × × Citation, E-commerce Node 5 link
Enhancing Feature SimTeG [26] SAGE, MLP, etc. all- MiniLM-L6-v2, etc. ✓ × Citation, E-commerce Node, Edge 3 link
Representation ChatGLM2, RWKV,
LKPNR [120] wiki KG ✓ × Recommendation Click Prediction 1 link
LLaMA2
Enhancing Feature Quality

GRID [121] GAT INSTRUCTOR × × Robotics Robot Task Planning 2 link


GCN, SAGE,MLP,
TAPE [122] ChatGPT, LLaMA2 × ✓ Citation, E-commerce Node 5 link
RevGAT
KEA [27] GCN, GAT, MLP ChatGPT × ✓ Citation, E-commerce Node 5 link
RLMRec [123] LightGCN ChatGPT × ✓ Recommendation Recommendatoon 3 link
Generating Augmented LLMRec [124] LightGCN ChatGPT × ✓ Recommendation Recommendation 2 link
Information LLM-Rec [125] - text-davinci-003 ✓ × Recommendation Recommendation 2 -
LLM4Mo [66] - ChatGPT × ✓ Molecular Molecular Property Prediction 3 link
GPT-MolBERTa [126] - ChatGPT × ✓ Molecular Molecular Property Prediction 9 -
ENG [127] GCN,GAT ChatGPT × ✓ Citation Node -
Sun et al. [128] GAT, GCN, etc. ChatGPT × ✓ Citation Node 4 -
Aligning Feature Space TouchUp-G [86] SAGE, GAT, etc. BERT, etc. ✓ × Citation, E-commerce Node 4 link
OFA [129] GCN,GAT,etc. LLaMA-2-7B, etc. ✓ × Citation, Molecular, etc. Node, Edge, Graph 9 link
Ignoring Structural
Hu et al. [130] - ChatGPT,GPT-4 × ✓ Citation,KG Node, Edge 5 -
Information
GPT4Graph [22] - text-davinci-003 × ✓ - Structure,Graph,Node,KGQA 4 link
GraphText [29] - ChatGPT, LLaMA2-7B ✓ ✓ Citation, Web Node 7 -
NLGraph [131] - ChatGPT, GPT-4, etc. × ✓ - Structure 3 link
Solving Vanilla GNN Training Limitations

InstructGLM [21] - Flan T5, LLaMA ✓ ✓ Citation Node 3 link


ChatGPT, GPT-4,
LLMtoGraph [132] - × ✓ - Structure - link
Vicuna-13B, etc.
Implicit Structural GPT-4,
Graph Agent [133] - × ✓ Citation, Bioinformatics Node, Edge 2 -
Information embedding-ada-002
LLM-Prop [134] - T5 ✓ × materials science Crystal Property Prediction - link
GLRec [135] - BELLE-LLaMA-7B ✓ ✓ Recommendation Recommendation 1 -
ReLM [136] TAG,GCN GPT-3.5, Vicuna × ✓ Chemistry Chemical Reaction Prediction 5 link
Chen et al. [27] - ChatGPT × ✓ Citation, E-commerce Node 5 link
Hu et al. [130] ChatGPT,GPT-4 × ✓ Citation,KG Node, Edge 5 -
Huang et al. [137] - ChatGPT × ✓ Citation, E-commerce Node 5 link
Fatemi et al. [138] PaLM2 XXS, PaLM 62B × ✓ - Structure - -
MolReGPT [139] - ChatGPT × ✓ Molecular molecule-caption translation 1 link
LLaGA [140] - Vicuna-7B × ✓ Citation, E-commerce Node, Edge 4 link
GraphEdit [141] GCN Vicuna-v1.5 ✓ ✓ Citation Node 3 link
GraphGPT [23] Graph Transformer Vicuna-7B ✓ ✓ Citation Node 3 link
GraphLLM [142] Graph Transformer LLaMA2 × ✓ - Structure 4 link
GNP [143] GAT Flan T5 ✓ ✓ KG KGQA 4 -
DrugChat [144] GAT, etc. Vicuna-13B × ✓ Drug QA - link
Explicit Structure KoPA [145] RotateE Alpaca-7B ✓ ✓ Knowledge Graph Knowledge Graph Completion 3 link
Information GIMLET [146] - T5 ✓ ✓ Molecular Molecular Property Prediction 14 link
GIT-Mol [147] GIN MolT5 ✓ ✓ Molecular Molecular Generation, etc. 6 -
BioMedGPT [148] GIN LLaMA2-7B-Chat ✓ ✓ Biomedical QA 3 link
ProteinChat [149] GVP-GNN Vicuna-13B × ✓ Protein QA 1 link
DGTL [150] Disentangled GNN LLaMA-2-13B-chat × ✓ Citation, E-commerce Node 3 -
G-Retriever [151] GAT LLaMA-2-7B ✓ ✓ - QA 3 link
GraphToken [152] GCN,GIN,etc. PaLM 2 S × ✓ - Structure 1 -
Heterophily Chen et al. [27] - ChatGPT × ✓ Citation, E-commerce Node 5 link
HG

GraphText [29] - ChatGPT, LLaMA2-7B ✓ ✓ Citation, Web Node 7 -


Generalization OpenGraph [153] Graph Transformer Not mentioned × ✓ Citation, Drug, etc. Node, Edge 7 link

encompass various other modal attributes. Employing task information into the original input graph, allowing the
Pretrained Models (PMs) directly for encoding cross-domain GNN model to adaptively perform different tasks based on
and multi-modal features may not produce satisfactory the prompt graph.
results. Therefore, LLMs are employed to align feature
space and provide better representations. TouchUp-G [86] 4.2 Solving Vanilla GNN Training Limitations
introduces a graph-centric fine-tuning strategy aimed at
The training of vanilla GNNs relies on labeled data. However,
enhancing multi-modal features for graph-related tasks.
obtaining high-quality labeled data has long been associated
Initially, they present a novel feature homophily measure
with substantial time and costs. In contrast to GNNs, LLMs
to quantify the alignment between node features and the
showcase robust zero/few-shot capabilities and possess
graph structure. Building upon this measure, the authors
expansive open-world knowledge. This unique characteristic
devise a structure-aware loss function to optimize the PM
empowers LLMs to directly leverage node information for
by minimizing discrepancies between features and graphs.
prediction, without relying on extensive annotated data.
The work of [129] introduces OFA, a unified framework for
Therefore, researchers have explored employing LLMs to
classification tasks in graphs across different domains. OFA
generate annotations or predictions, alleviating dependence
collects nine text-attributed graph datasets covering diverse
on human supervision signals in Graph ML. According to
domains and represents nodes and relations in natural
how structural information in graph data is processed, we
language. LLMs are then employed to embed those cross-
categorize the methods into the following three categories:
domain graph information into the same embedding space.
• Ignoring structural information: utilize node attributes
Moreover, OFA proposes a graph prompting paradigm,
which incorporates a prompt graph containing downstream exclusively for constructing textual prompts, disregard-
ing neighboring labels and relations.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 10

• Implicit Structural information: describe neighbor in- to integrate the typical inductive bias of GNNs through the
formation and graph topology structure in natural construction of various graph syntax trees.
language;
• Explicit Structural information: employ GNN models to 4.2.3 Explicit Structural Information
encode graph structure.
While implicitly describing structure in natural language
4.2.1 Ignoring Structural Information has achieved preliminary success, these methods still face
certain limitations. Firstly, due to the constraint of input
The fundamental distinction between graphs and text lies
length, LLMs can only get local structural information, and
in the structural information inherent in graphs. Given that
lengthy contexts might diminish their reasoning [159] and
the LLM processes text as its input, an intuitive approach
instruction-following abilities [27]. Secondly, for different
involves leveraging the textual attributes of the target node,
tasks and datasets, substantial effort is often required for
disregarding the structural information within the graph,
prompt engineering. A prompt that performs well on one
and making predictions directly. For instance, the work of
dataset may not generalize effectively to others, resulting in
[130] explores the effectiveness of LLMs in solving graph
a lack of robustness. Consequently, researchers investigate
tasks without using structure information. In the citation
representing graph structure explicitly, typically comprising
network, they employ the article’s title and abstract to
three essential modules: encoding module, fusion module, and
construct a prompt and instruct the LLM to predict the
LLM module. More specifically, the encoding module aims to
article’s category. Since this kind of paradigm does not
process the graph-structured and textual information, gener-
incorporate the structural information of the graph, the actual
ating graph embeddings and text embeddings, respectively.
task performed by the LLM is text classification rather than
Afterward, the fusion module takes these two embeddings
a graph-related task.
as input, producing a modality fusion embedding. At last,
4.2.2 Implicit Structural Information the modality fusion embedding, which contains both graph
information and instruction information, is fed into the LLM
Researchers implicitly leverage structural information to
to obtain the final answer. Given the research focus is on
solve graph tasks by describing graph structure in natural
how LLMs explicitly utilize graph structure information, we
language. For example, Hu et al. [130] propose two kinds of
will delve into the encoding and fusion modules of various
methods for utilizing structural information. The first method
studies in detail, without primarily focusing on the LLM
involves directly inputting the data of all neighboring nodes
model itself.
into LLM, while the second method employs a retrieval-
Encoding Module. The encoding module is responsible for
based prompt to guide the LLM to focus solely on relevant
both graph and text encoding, and we will provide separate
neighbor data. Similarly, Huang et al. [137] employ an LLM to
summaries for each.
assign scores to neighboring nodes and subsequently choose
high-scoring nodes as structural information. NLGraph [131] • Graph Encoding. Pre-trained GNN models are commonly
introduces a Build-a-Graph prompting strategy to improve used for graph encoding. For instance, GIT-Mol [147]
the LLM’s understanding of graph structure. This strategy employs the GIN model from the pre-trained MoMu
entails appending “Let’s construct a graph with the nodes model [85] to encode molecular graphs. KoPA [145] utilizes
and edges first.” after providing the graph data description. the pre-trained RotateE model to obtain embeddings
The work of [21] introduces InstructGLM, which utilizes for entities and relations in the knowledge graph. In
natural language for graph description and fine-tunes Flan- addition, GIMLET [146] presents a unified graph-text
T5 through instruction tuning. They generate a set of 31 model without the need for additional graph encoding
prompts by combining four configuration parameters: task modules. Particularly, GIMLET proposes a distance-based
type, inclusion of node features, maximum hop order, and joint position embedding method, where the shortest graph
utilization of node connections. Notably, maximum hop order distance is utilized to represent the relative positions
and node connections implicitly convey graph structure between graph nodes, enabling the Transformer encoder to
information to the LLM. GraphEdit [141] leverages LLMs to encode both graph and text. GraphToken [152] evaluates a
understand graph structure and refine it by removing noisy series of GNN models as graph encoders, including GCN,
edges and uncovering implicit node connections. Specifically, MPNN [111], GIN, Graph Transformer, HGT [59], etc.
it employs an edge predictor to identify the top k candidate • Text Encoding. Due to the tremendous capability of LLMs in
edges for each node, and these candidate edges, along with understanding textual information, most existing methods,
the original edges of the graph, are then fed into the LLM. such as ProteinChat [149] and DrugChat [144], directly
The LLM is prompted to determine which edges should be employ LLMs as text encoders. In GraphLLM [142],
integrated into the final graph structure. the tokenizer and frozen embedding table of LLM are
In addition to employing natural language expres- leveraged to obtain the representation of node text
sion, several researchers leverage structured languages for attributes, aligning with the downstream frozen LLM.
graph description. GPT4Graph [22], for instance, utilizes Fusion Module. The goal of the fusion module is to align the
Graph Modelling Language [157] and Graph Markup graph and text modalities, generating a fusion embedding
Language [158] to represent graph structure in XML format. as input for the LLM. To achieve the goal, a straightforward
GraphText [29] constructs a graph syntax tree for each graph, solution is to design a linear projection layer to directly
containing node attributes and relations information. By transform the graph representation generated by GNN into
traversing this tree, structural graph-text sequences can be an LLM-compatible soft prompt vector [144], [145], [148].
generated. The advantage of GraphText lies in the ability Additionally, inspired by BLIP2’s Q-Former [160], [147]
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 11

Graph
1 7 Graph description:
+
1 7
Node Count Answer:
4 2 6 Node1 connects to Node2,
6 LLM There are 7 nodes in
4 2 3 5 Node2 connects to Node1,
this graph.
Node3, Node 5...
3 5
Structure Understanding Implicit Structural Information

Node Count Edge Count

connectivity ... Node Answer:


LLM
Classification Fusion Node1 belongs to A
Semantic Understanding LLM
1 7 Layer class, Node2 belongs
Node Classification 6
4 2 GNN to B class...
Link Prediction ... 3 5

Task-related Instruction Explicit Structural Information

Figure 5: The illustration of employing LLMs with implicit and explicit structural information. (1) Methods leveraging
implicit structural information describe nodes and graph structure information in natural language and combine task-specific
instructions to form a textual prompt, which is then input into the LLM to generate prediction results. (2) Methods employing
explicit structural information use GNNs and LLMs to encode graph and instruction information separately. Then, fusion
layers are added to align the graph and text modalities, and the fused embedding is input into the LLM for prediction.

propose a GIT-Former, which aligns graph, image, and text demonstrate that LLMs exhibit promising performances in
with the target text modality using self-attention and cross- addressing OOD generalization issues. OpenGraph [153]
attention mechanisms. aims at solving zero-shot graph tasks across different
In addition to the above methods, G-Retriever is proposed domains. In this model, LLMs are leveraged to generate
to integrate both explicit and implicit structural information synthetic graphs of data scarcity scenarios, thereby enhancing
[151]. To be specific, GAT is employed to encode the graph the pre-training process of OpenGraph.
structure, while representing node and relationship details
through textual prompts. To accommodate real-world graphs
5 G RAPHS FOR LLM S
with larger scales, G-Retriever introduces a RAG module
specifically designed for retrieving subgraphs relevant to LLMs have demonstrated impressive language generation
user queries. and understanding capabilities across various domains. Nev-
ertheless, they still face several pressing challenges, including
4.3 Heterophily and Generalization factuality awareness, hallucinations, limited explainability
in the reasoning process, and beyond. To alleviate these
Despite achieving promising performance in graph tasks,
issues, one potential approach is to take advantage of the
GNNs exhibit several shortcomings. A notable drawback
Knowledge Graphs (KGs), which store high-quality, human-
involves the inadequacy of the neighbor information aggrega-
curated factual knowledge in a structured format [5]. Recent
tion mechanism, especially when dealing with heterogeneous
reviews [30], [162]–[164] have summarized the research
graphs. GNN performance notably diminishes when faced
on using KGs to enhance LMs. Hu et al. [162] present
with instances where adjacent nodes lack similarity. Addi-
a review on knowledge-enhanced pre-training language
tionally, GNN encounters challenges in out-of-distribution
models for natural language understanding and natural
(OOD) generalization, leading to a degradation in model
language generation. Agrawal et al. [163] systematically
performance on distributions beyond the training data. This
review research on mitigating hallucination in LLMs by
challenge is particularly prevalent in practical applications,
leveraging KGs across three dimensions: inference process,
primarily due to the inherent difficulty of encompassing
learning algorithm, and answer validation. Pan et al. [164]
all possible graph structures within limited training data.
provides a comprehensive summary of the integration of KGs
Consequently, when GNNs infer on unseen graph structures,
and LLMs from three distinct perspectives: KG-enhanced
their performance may experience a substantial decline. This
LLMs, LLM-augmented KGs, and the synergized LLMs and
reduced generalization capability renders GNNs relatively
KGs, where LLMs and KGs mutually reinforce each other. In
fragile when confronted with evolving graph data in
this section, we will delve into relevant research that explores
real-world scenarios. For example, GNNs may encounter
the usage of KGs to achieve knowledge-enhanced language
difficulties handling newly emergent social relationships in
model pre-training, mitigate hallucinations, and improve
social networks.
inference explainability.
LLMs have been utilized to mitigate the above limitations.
In particular, GraphText [29] effectively decouples depth and
scope by encapsulating node attributes and relationships 5.1 KG-enhanced LLM Pre-training
in the graph syntax tree. This approach yields superior While LLMs excel in text understanding and generation, they
results compared to the GNN baseline, particularly on may still produce grammatically accurate yet factually incor-
heterogeneous graphs. Chen et al. [27] investigate the LLM’s rect information. Explicitly incorporating knowledge from
ability to handle OOD generalization scenarios. They utilize KGs during LLM pre-training holds promise for augmenting
the GOOD [161] benchmark as the criterion and results LLM’s learning capacity and factual awareness [165]–[167].
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 12

In this subsection, we will outline the research advancements hidden representations of PLMs with retrieved knowledge
in KG-enhanced pre-trained language models (PLMs). While representations. To further control the activation levels
there is limited work on KG-enhanced pre-training for LLMs, of adapters, DAKI [177] incorporates an attention-based
research on KG-enhanced PLMs can offer insights for LLM knowledge controller module, which is an adapter module
pretraining. Existing KG-enhanced pre-training methods can with additional linear layers.
be classified into three main categories: modifying input data,
modifying model structures, and modifying pre-training 5.1.3 Modifying Pre-training Tasks
tasks. To explicitly model the interactions between text and KG
knowledge, various pre-training tasks are proposed. Three
5.1.1 Modifying Input Data
major lines of work in this direction include entity-centric
Several researchers investigate integrating KG knowledge by tasks [172], [178]–[181], relation-centric tasks [165], and
modifying input data while keeping the model architecture beyond.
unchanged. For instance, Moiseev et al. [168] directly train For entity-centric tasks, ERNIE [172] randomly masks
PLMs on mixed corpora consisting of factual triples from some token-entity alignments and then requires the model
KGs and natural language texts. E-BERT [169] aligns entity to predict all corresponding entities based on aligned tokens.
vectors with BERT’s wordpiece vector space, preserving the LUKE [178] uses Wikipedia articles as training corpora and
structure and refraining from additional pre-training tasks. treats hyperlinks within them as entity annotations, training
KALM [170] utilizes an entity-name dictionary to identify the model to predict randomly masked entities. KILM [179]
entities within sentences and employs an entity tokenizer to also utilizes hyperlinks in Wikipedia articles as entities.
tokenize them. The input of the Transformer consists of the However, it inserts entity descriptions after corresponding
original word embeddings and entity embeddings. Moreover, entities, tasking the model with reconstructing the masked
K-BERT [171] integrates the original sentence with relevant description tokens rather than directly masking entities. In
triples by constructing a sentence tree, where the trunk addition to predicting masked entities, GLM [180] further
represents the original sentence and the branches represent introduces a distractor-suppressed ranking task. This task
the triples. To convert the sentence tree into model input, leverages negative entity samples from KGs as distractors,
K-BERT introduces both a hard-position index and a soft- enhancing the model’s ability to distinguish various entities.
position index within the embedding layer to differentiate Relation-centric tasks are also commonly utilized in KG-
between original tokens and triple tokens. enhanced PLMs. For instance, JAKET [182] proposes relation
prediction and entity category prediction tasks for enhancing
5.1.2 Modifying Model Structures
knowledge modeling. Dragon [183] is pre-trained in a KG
Some research designs knowledge-specific encoders or fusion link prediction task. Given a text-KG pair, the model needs
modules to better inject knowledge into PLMs. ERNIE [172] to predict the masked relations in KG and the masked tokens
introduces a K-Encoder to inject knowledge into represen- in the sentence. ERICA [184] introduces a relation discrimina-
tations. This involves feeding token embeddings and the tion task aiming at semantically distinguishing the proximity
concatenation of token embeddings and entity embeddings between two relations. Specifically, it adopts a contrastive
into a fusion layer for generating new token embeddings learning manner, wherein the relation representations of
and entity embeddings. In contrast, CokeBERT [173] extends entity pairs belonging to the same relations are encouraged
this approach by incorporating relation information from to be closer.
KGs during pre-training. It introduces a semantic-driven Additionally, there are several innovative pre-training
GNN model to assign relevant scores to relations and tasks for KG-enhanced pre-training. KEPLER [185] pro-
entities based on the given text. Finally, it fuses the selected poses a knowledge embedding task to enhance knowledge-
relations and entities with text using a K-Encoder similar awareness of PLMs. Specifically, it uses PLMs to encode
to ERNIE. KLMO [174] propose Knowledge Aggregator to entity descriptions as entity embeddings and jointly train
fuse text modality and KG modality during pre-training. To the knowledge embedding and masked language modeling
incorporate the structural information in KG embeddings, tasks on the same PLM. ERNIE 2.0 [186] constructs a series
KLMO utilizes KG attention, which integrates a visibility of continuous pre-training tasks from word, structure, and
matrix with a conventional attention mechanism, facilitating semantic perspectives.
interaction among adjacent entities and relations within the
KG. Subsequently, the token embeddings and contextual
KG embeddings are aggregated with entity-level cross-KG 5.2 KG-enhanced LLM Inference
attention. Knowledge within KGs can be dynamically updated,
Several studies refrain from modifying the overall whereas updating the knowledge in LLMs often necessitates
structures of the language model but introduce additional adjustment of model parameters, which demands substantial
adapters to inject knowledge. To preserve the original computational resources and time. Therefore, many studies
knowledge within PLMs, Wang et al. [175] propose K- opt to utilize KGs during LLMs inference stage. The
Adapter as a pluggable module to leverage KG knowledge. “black-box” nature of LLMs poses a significant challenge in
During pre-training, the parameters of the K-Adapter understanding how the model made a specific prediction
are updated while the parameters of the PLMs remain or generated a specific text. Additionally, LLMs have often
frozen. KALA [176] introduces a Knowledge-conditioned been criticized for generating false, erroneous, or misleading
Feature Modulation layer, which functions similarly to an content, typically referred to as hallucination [31], [32],
adapter module, by scaling and shifting the intermediate [187]. Given the structured and fact-based nature of KGs,
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 13

integrating them during the inference stage can enhance the propose an explainer model to explain why LLMs choose a
explainability of LLM answers and consequently mitigate particular answer while rejecting others. Specifically, the
hallucinations. approach involves constructing an element graph based
While several methods extract relevant triples from KGs on the entities present in the question and the candidate
based on user queries and describe these triples in natural answers. Subsequently, a GCN model is employed to assign
language within prompts [188], [189], these approaches attention scores to each node within the element graph.
overlook the structured information inherent in KGs, and Nodes exhibiting high attention scores are identified as
still fail to elucidate how LLMs arrive at their answers. reason elements, and LLMs are then prompted to provide
Consequently, extensive studies utilize KGs to aid LLMs explanations based on these selected reason elements.
in reasoning and generate intermediary information like To assess the transparency and interpretability of LLMs,
relation paths, evidence subgraphs, and rationales, forming various benchmarks have been proposed. For example,
the basis for explaining the LLM’s decision-making process Li et al. [38] introduce a novel task named Knowledge-
and checking for hallucinations [35], [37], [38], [190]–[192]. aware Language Model Attribution (KaLMA) and develop a
Several researchers investigate enabling LLMs to directly corresponding benchmark dataset. This benchmark evaluates
reason on KGs and generate relation paths to interpret LLM’s the LLM’s capability to derive citation information from KG
answers. The reasoning path at each step helps to enhance to support its answers. KaLMA also provides an automatic
the explainability of the answer and the transparency of evaluation covering aspects such as text quality, citation
the reasoning process. Through observing the reasoning quality, and text-citation alignment of the answers. In
decisions made at each step, it becomes possible to identify addition, XplainLLM [194] introduces a dataset for better
and address hallucinations arising from LLMs’ reasoning. understanding LLMs’ decision-making from the perspectives
Both RoG [35], Knowledge Solver [191], and Keqing [36] of “why-choose” and “why-not-choose”.
employ relation paths as explanations for LLM’s responses.
Specifically, given the KG schema and user query, RoG [35]
guides LLMs to predict multiple relation paths using textual 6 A PPLICATIONS
prompts like “Please generate helpful relation paths for In this section, we will present practical applications that
answering the question”. Subsequently, LLMs generate demonstrate the potential and value of GFMs and LLMs.
the final answer based on the retrieving results of the As shown in Table 2, recommender systems, knowledge
valid relation path. Conversely, the Knowledge Solver graphs, AI for science, and robot task planning emerge as the
method [191] differs in that it enables LLMs to generate the most prevalent domains. We will provide a comprehensive
relation path step by step. Keqing [36] initially decomposes summary of each of these applications.
complex questions into several sub-questions, each of which
can be addressed by pre-defined logical chains on KGs,
and then LLMs will generate final answers with relation 6.1 Recommender Systems
paths based on the answers of sub-questions. Mindmap [190] Recommender systems leverage user historical behaviors
uses evident subgraphs to explain the answers generated by to predict items that users are likely to appreciate [195]–
LLMs, where path-based and neighbor-based methods are [197]. Graphs play a crucial role in recommender systems,
introduced to obtain several evident subgraphs. The LLM wherein items can be regarded as nodes and collaborative
in Mindmap is prompted to merge these evident subgraphs, behaviors such as clicks and purchases can be viewed as
utilizing the merged graph to generate the final answer. edges. Recently, an increasing amount of research is exploring
In contrast to previous methods which involve gradually the use of LLMs for direct recommendation [198]–[201] or
retrieving knowledge and obtaining answers, KGR [37] takes leveraging LLMs to enhance graph models or datasets for
a different approach. Initially, the LLM directly generates a recommendation tasks [120], [123], [124], [202], [203].
draft answer. Subsequently, it extracts the claims requiring For directly using LLMs as recommendation models,
verification from this answer and retrieves KG’s information liu et al. [204] construct task-specific prompts to evaluate
to correct claims with hallucinations. Based on the corrected ChatGPT’s performance on five common recommendation
claims, the LLM adjusts the draft answer to get the final tasks, encompassing rating prediction, sequential recom-
answer. mendation, direct recommendation, explanation generation,
The above research employs relation paths or evident and review summarization. Bao et al. [205] employ prompt
graphs as the basis for explaining the LLM’s decision-making templates to guide LLM to decide whether the user will
process and checking hallucinations. In contrast, several like the target item based on their historical interactions
research explore using inherently interpretable models rather and perform instruction tuning on the LLM to improve its
than LLMs to make final predictions. ChatGraph [193] recommendation capability.
presents an innovative approach to enhance both the text For using LLMs to enhance traditional recommendation
classification capabilities and explainability of ChatGPT. It methods or datasets, KAR [202] leverages LLMs to generate
utilizes ChatGPT to extract triples from unstructured text and factual knowledge of items and reasoning basis of user
subsequently constructs KGs based on these triples. To ensure preferences; these knowledge texts are then encoded into
the explainability of the classified results, ChatGraph avoids vectors and integrated into existing recommendation models.
employing LLMs directly for predictions. Instead, it leverages Methods like LLM-Rec [125], RLMRec [123], and LLM-
a graph model without non-linear activation functions and Rec [124] enrich recommendation datasets by incorporating
trains the model on text graphs to get predictions. Given LLM-generated descriptions. In contrast, Wu et al. [203]
a question and a list of possible answers, XplainLLM [194] utilize LLMs to condense recommendation datasets, in which
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 14

LLMs are employed to synthesize a condensed dataset for applications encompass scenarios involving graph-structured
the content-based recommendation, aiming at addressing the data.
challenge of resource-intensive training on large datasets. The molecular graph is a way of representing molecules,
While the previously discussed methods have explored where the nodes represent atoms, and the edges represent
utilizing LLMs for certain recommendation tasks or domains, the bonds between the atoms. With the emergence of LLMs,
an emerging research direction aims to develop foundation researchers have explored their performance in tasks related
models for recommendation. Tang et al. [199] propose to molecular graphs. Methods like MolReGPT [139] and
an LLM-based domain-agnostic framework for sequential GPT-MolBERTa [126] adopt a similar approach, converting
recommendation. Their approach integrates user behavior molecular graphs into textual descriptions using SMILES
across domains, and leverages LLMs to model user behaviors language. They create prompts based on SMILES data, asking
based on multi-domain historical interactions and item the LLM to provide detailed information about functional
titles. Hua et al. [206] attempt to address the potential groups, shapes, chemical properties, etc. This information
unfairness of recommender systems introduced by LLM bias. is then used to train a smaller LM for molecular property
They propose a Counterfactually Fair Prompting method to prediction. In contrast to methods directly using LLMs for
develop an unbiased foundation model for recommendation. prediction, ReLM [136] first uses GNNs to predict high-
To summarize the progress in the area of recommendation probability candidate products, and then leverages LLMs to
foundation model, Huang et al. [207] provide a systematic make the final selection from these candidates.
overview of the existing approaches, categorizing them into In addition to the above research, LLMs are further
three main types: language foundation models, personalized utilized in drug discovery and materials design. Bran et
agent foundation models, and multi-modal foundation al. [105] present ChemCrow, a chemistry agent integrating
models. LLMs and 18 specialized tools for diverse tasks across
drug discovery, materials design, and organic synthesis.
6.2 Knowledge Graphs InstructMol [218] presents a two-stage framework for
aligning language and molecule graph modalities in drug
LLMs with robust text generation and language understand-
discovery. Initially, the framework maintains the LLM and
ing capabilities have found extensive applications in KG-
graph encoder parameters fixed, focusing on training the pro-
related tasks, including KG completion [145], [208], [209],
jector to align molecule-graph representations. Subsequently,
KG question answering [189], [191], [210]–[212], KG reason-
instruction tuning is conducted on the LLM to address drug
ing [213] and beyond. Meyer et al. [214] introduce LLM-KG-
discovery tasks. Zhao et al. [219] propose ChemDFM, the
Bench, a framework that automatically evaluates the model’s
first dialogue foundation model for chemistry. Trained on
proficiency in KG engineering tasks such as fixing errors
extensive chemistry literature and general data, ChemDFM
in Turtle files, facts extraction, and dataset generation. KG-
exhibits proficiency in various chemistry tasks such as
LLM [209] is proposed to evaluate LLMs’ performance on KG
molecular recognition, molecular design, and beyond.
completion, including triple classification, relation prediction,
and link prediction tasks. Kim et al. [210] propose KG-GPT,
using LLMs for complex reasoning tasks on knowledge 6.4 Robot Task Planning
graphs. ChatKBQA [211] introduces a generate-then-retrieve
Robot task planning aims to decompose tasks into a series
framework for LLMs on knowledge base question answering.
of high-level operations for the step-by-step completion by
Wu et al. [189] present a KG-enhanced LLM framework for
a robot [220]. During task execution, the robot needs to
KG question answering, which involves fine-tuning an LLM
perceive information about the surrounding environment,
to convert structured triples into free-form text, enhancing
typically represented using scene graphs. In a scene graph,
LLMs’ understanding of KG data. The successful application
nodes represent scene objects like people and tables, while
of LLMs in tasks such as KG construction, completion, and
edges describe the spatial or functional relationships between
question answering offers robust support for advancing the
objects. Enabling LLMs for robot task planning crucially
understanding and exploration of KGs.
depends on how to represent the environmental information
Drawing inspiration from foundation models in language
in the scene graph.
and vision, researchers are delving into the development
Many studies have explored using textual descriptions
of foundation models tailored for KGs. These GFMs aim to
of scene information and constructing prompts for LLMs
generalize to any unseen relations and entities within KGs.
to generate task plans. Chalvatzaki et al. [221] introduce
Galkin et al. [215] propose Ultra, which learns universal
the Graph2NL mapping table, representing attributes with
graph representations by leveraging interactions between
different numerical ranges using corresponding textual
relations. This study is based on the insight that those
expressions. For instance, a distance value greater than 5 is
interactions remain similar and transferable across different
represented as “distant”, and smaller than 3 is represented as
datasets.
“reachable”. SayPlan [222] describes the scene graph in JSON
as a text sequence, iteratively invoking LLM to generate plans
6.3 AI for Science and allowing for self-correction. Zhen et al. [223] propose an
The rapid advancement of AI has led to an increasing number effective prompt template, Think Net Prompt, to enhance
of studies leveraging AI to assist scientific research [216], LLM performance in task planning. In contrast to methods
[217]. Recent research has applied LLMs and GFMs for that rely on language to describe scene graph information,
scientific purposes, such as drug discovery, molecular GRID [121] employs the graph transformer to encode scene
property prediction, and material design. Notably, these graphs. It utilizes cross-modal attention to align the graph
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 15

modality with user instruction, ultimately outputting action 7.3 Trustworthiness


tokens through a decoder layer. The powerful understanding The recent applications of LLMs for Graph ML have
and reasoning capabilities of LLMs showcase significant significantly enhanced graph modeling capabilities and
potential in robot task planning. However, as task complexity broadened their utility in various fields. Despite these
increases, the search space explosively expands, posing a advancements, with the growing reliance on these models, it
challenge in efficiently generating viable task plans with is important to ensure their trustworthiness, particularly in
LLMs. critical areas like healthcare, finance, and social network
analysis [224], [225]. Robustness is fundamental in safe-
guarding the models against adversarial attacks, ensuring
7 F UTURE D IRECTIONS
consistent reliability. Explainability is essential for users to
In this survey, we have thoroughly reviewed the latest understand and trust the decisions made by these models.
developments of Graph Machine Learning in the era of LLMs, Fairness is crucial for the model’s ethical and effective
revealing significant advancements and potential in this field. use in various applications. Privacy is important for legal
By harnessing the power of LLMs, it is potential to enhance compliance and key to maintaining user trust. Therefore,
Graph ML to enable GFMs. As this research direction is still the development of trustworthy LLMs on graphs must be
in the exploratory stage, future directions in this field can be equipped with Robustness&Safety, Explainability, Fairness, and
diverse and innovative. Therefore, in this section, we delve Privacy, ensuring their safe and effective use in various
into several potential future directions of this promising field. applications.

7.3.1 Robustness&Safety
7.1 Generalization and Transferability
Recently, integrating LLMs into Graph ML has shown
While Graph ML has deployed for various graph tasks, a
promising performance in various downstream tasks, but
notable problem is their limited capacity for generalization
they are also highly vulnerable to adversarial perturbations,
and transferability across different graph domains [40].
raising significant concerns about their robustness and
Different from fields such as NLP and CV, where data often
safety. To enhance the resilience of these models, some
adhere to a uniform format (e.g., a sequence of tokens or
studies add adversarial perturbations to GNNs [226], [227]
a grid of pixels), graphs can be highly heterogeneous in
or LLMs [228], [229] for adversarial training. However, these
nature. This heterogeneity manifests in varying graph sizes,
methods may not be effective for the new paradigm of
densities, and types of nodes and edges, which presents a
Graph ML integrating LLMs, as vulnerabilities can arise
significant challenge in developing a universal model capable
from both graphs, such as graph poisoning attacks [230],
of performing optimally across various graph structure
[231] and graph modification attacks [232], [233], and from
data. Currently, LLMs have demonstrated great potentials
the language model, like prompt attacks [234] and misleading
in improving the generalization ability of the graph model.
text data [235]. To address these issues, more sophisticated
For example, OFA [129] provides a solution for classification
detection and defense mechanisms need to be developed
tasks across several certain domains. Nevertheless, there
by considering both the intricacies of LLMs and graphs to
is still scarce exploration of the generalizability of GFMs
ensure the comprehensive safety and robustness of Graph
compared to LLMs. Therefore, future research should aim
ML.
to develop more adaptable and flexible models that can
effectively apply learned patterns from one graph type, such
7.3.2 Explainability
as social networks, to another, like molecular structures,
without extensive retraining. Nowadays, LLMs are increasingly employed in Graph
ML across various applications, such as recommender sys-
tems [15], [236] and molecular discovery [85], [139]. However,
7.2 Multi-modal Graph Learning due to privacy and security concerns, an application provider
Recent LLMs have shown significant potential in advancing may prefer to provide an API version without revealing
GFMs. Many efforts have been made to transform graph the architecture and parameters of the LLM, such as with
data into formats suitable for LLM input, such as tokens or ChatGPT. This lack of transparency can make it challenging
text [27], [84], [131]. However, many nodes in graphs are for users to understand the model’s results, leading to
enriched with diverse modalities of information, including confusion and dissatisfaction. Therefore, it’s important to
text, images, and videos. Understanding this multi-modal enhance the explainability of Graph ML, especially with
data can potentially benefit graph learning. For example, on LLMs. Owing to their reasoning and interpretive capabilities,
social media platforms, a user’s post could include textual LLMs are promising to provide better explainability in graph-
content, images, and videos, all of which are valuable for related tasks. For example, P5 [236] can provide reasons
comprehensive user modeling. Given the importance of for its recommendations in recommendation tasks. Future
multi-modal data, a promising direction for future research efforts should be directed toward making the inner workings
is to empower LLMs to process and integrate graph structure of these models more transparent and explainable to better
with multi-modal data. Currently, TOUCHUP-G [86] makes comprehend their decision-making processes.
an initial exploration of multi-modal ( i.e., texts, images) in
graph learning. In the future, we expect the development of 7.3.3 Fairness
a unified model capable of modeling universal modalities As LLMs become prevalent in enhancing Graph ML to-
for more advanced GFMs. wards GFMs, concerns about their fairness are growing.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 16

Fairness is crucial to ensure these models operate without their ability to enhance LLM pre-training and inference.
biases or discrimination, especially when dealing with Additionally, we demonstrate their potential in diverse
complex, interconnected graph data [225]. Recent studies applications such as molecule discovery, knowledge graphs,
demonstrate that both language models [237], [238] and and recommender systems. Despite their success, this field
GNN models [239] can potentially be discriminatory and is still evolving and presents numerous opportunities for
unfair [42]. Therefore, it is necessary to maintain fairness in further advancements. Therefore, we further discuss several
both textual and graph contexts. To enhance the fairness of challenges and potential future directions. Overall, our
LLMs, recent studies include retraining strategies that adjust survey aims to provide a systematic and comprehensive
model parameters for unbiased outputs [240], implementing review to researchers and practitioners, inspiring future
alignment constraints [241], and adopting contrastive learn- explorations in this promising field.
ing to diminish bias in model training [242]. Concurrently,
studies like FairNeg [239] also explore improving the fairness R EFERENCES
of recommendation data. Despite these efforts, achieving
[1] H. Mao, J. Li, H. Shomer, B. Li, W. Fan, Y. Ma, T. Zhao, N. Shah,
fairness in GFMs is still a significant challenge that needs and J. Tang, “Revisiting link prediction: A data perspective,” arXiv
further exploration. preprint arXiv:2310.00793, 2023.
[2] M. Hashemi, S. Gong, J. Ni, W. Fan, B. A. Prakash, and W. Jin,
“A comprehensive survey on graph reduction: Sparsification,
7.3.4 Privacy coarsening, and condensation,” arXiv preprint arXiv:2402.03358,
Privacy is a critical issue in Graph ML, particularly given 2024.
[3] W. Fan, X. Zhao, Q. Li, T. Derr, Y. Ma, H. Liu, J. Wang,
the risk of these models inadvertently leaking sensitive and J. Tang, “Adversarial attacks for black-box recommender
information contained in graph data [243]–[245]. For example, systems via copying transferable cross-domain user profiles,” IEEE
Graph ML integrated with LLMs could potentially expose Transactions on Knowledge and Data Engineering, 2023.
private user data, like browsing histories or social connec- [4] J. Wu, W. Fan, J. Chen, S. Liu, Q. Li, and K. Tang, “Disentangled
contrastive learning for social recommendation,” in Proceedings of
tions when generating outputs. This concern is especially the 31st ACM International Conference on Information & Knowledge
pressing in highly data-sensitive areas such as healthcare Management, 2022, pp. 4570–4574.
or finance. To mitigate these privacy risks, [246] introduces [5] J. Chen, W. Fan, G. Zhu, X. Zhao, C. Yuan, Q. Li, and Y. Huang,
“Knowledge-enhanced black-box attacks for recommendations,”
Privacy-Preserving Prompt Tuning (RAPT) to protect user
in Proceedings of the 28th ACM SIGKDD Conference on Knowledge
privacy through local differential privacy. Future exploration Discovery and Data Mining, 2022, pp. 108–117.
in LLM-enhanced Graph ML should also focus on integrating [6] T. Derr, Y. Ma, W. Fan, X. Liu, C. Aggarwal, and J. Tang,
privacy-preserving technologies like differential privacy and “Epidemic graph convolutional network,” in Proceedings of the 13th
International Conference on Web Search and Data Mining (WSDM),
federated learning to strengthen data security and user 2020, pp. 160–168.
privacy. [7] Y. Ma and J. Tang, Deep learning on graphs. Cambridge University
Press, 2021.
[8] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip,
7.4 Efficiency “A comprehensive survey on graph neural networks,” IEEE
transactions on neural networks and learning systems, vol. 32, no. 1,
While LLMs have proven effective in constructing GFMs, pp. 4–24, 2020.
their operational efficiency, particularly in processing large [9] J. Zhang, R. Xue, W. Fan, X. Xu, Q. Li, J. Pei, and X. Liu, “Linear-
and complex graphs, is still a significant challenge [247]. time graph neural networks for scalable recommendations,” arXiv
preprint arXiv:2402.13973, 2024.
For example, the use of APIs like GPT4 for large-scale [10] Y. You, T. Chen, Y. Sui, T. Chen, Z. Wang, and Y. Shen, “Graph
graph tasks can lead to high costs under current billing contrastive learning with augmentations,” Advances in neural
models. Additionally, deploying open-source large models information processing systems, vol. 33, pp. 5812–5823, 2020.
(e.g., LLaMa) for parameter updates or just inference in local [11] J. Zeng and P. Xie, “Contrastive self-supervised learning for graph
classification,” in Proceedings of the AAAI conference on Artificial
environments demands substantial computational resources Intelligence, vol. 35, no. 12, 2021, pp. 10 824–10 832.
and storage. Therefore, enhancing the efficiency of LLMs [12] X. Xu, F. Zhou, K. Zhang, and S. Liu, “Ccgl: Contrastive
for graph tasks remains a critical issue. Recent studies cascade graph learning,” IEEE Transactions on Knowledge and Data
Engineering, vol. 35, no. 5, pp. 4539–4554, 2022.
introduce techniques like LoRA [156] and QLoRA [248] [13] J. Qiu, Q. Chen, Y. Dong, J. Zhang, H. Yang, M. Ding, K. Wang,
to fine-tune LLM parameters more efficiently. Furthermore, and J. Tang, “Gcc: Graph contrastive coding for graph neural
model pruning [249], [250] is also a promising method to network pre-training,” in Proceedings of the 26th ACM SIGKDD
increase efficiency by removing redundant parameters or international conference on knowledge discovery & data mining, 2020,
pp. 1150–1160.
structures from LLMs, thereby simplifying their application [14] A. Radford, K. Narasimhan, T. Salimans, I. Sutskever et al.,
in graph machine learning. “Improving language understanding by generative pre-training,”
2018.
[15] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert:
8 C ONCLUSION Pre-training of deep bidirectional transformers for language
understanding,” arXiv preprint arXiv:1810.04805, 2018.
In this survey, we have thoroughly reviewed the recent [16] L. Zhou, H. Palangi, L. Zhang, H. Hu, J. Corso, and J. Gao,
progress of graph applications and Graph ML in the era “Unified vision-language pre-training for image captioning and
vqa,” in Proceedings of the AAAI conference on artificial intelligence,
of LLMs, an emerging field in graph learning. We first vol. 34, no. 07, 2020, pp. 13 041–13 049.
review the evolution of Graph ML, and then delve into [17] W. Fan, Z. Zhao, J. Li, Y. Liu, X. Mei, Y. Wang, Z. Wen, F. Wang,
various methods of LLMs enhancing Graph ML. Due to the X. Zhao, J. Tang, and Q. Li, “Recommender systems in the era of
remarkable capabilities in various fields, LLMs have great large language models (llms),” Aug. 2023.
[18] Y. Li, P. Wang, Z. Li, J. X. Yu, and J. Li, “Zerog: Investigating
potential to enhance Graph ML towards GFMs. We further cross-dataset zero-shot transferability in graphs,” arXiv preprint
explore the augmenting of LLMs with graphs, highlighting arXiv:2402.11235, 2024.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 17

[19] H. Mao, Z. Chen, W. Tang, J. Zhao, Y. Ma, T. Zhao, N. Shah, [43] J. Liu, C. Yang, Z. Lu, J. Chen, Y. Li, M. Zhang, T. Bai, Y. Fang,
M. Galkin, and J. Tang, “Graph foundation models,” arXiv preprint L. Sun, P. S. Yu et al., “Towards graph foundation models: A survey
arXiv:2402.02216, 2024. and beyond,” arXiv preprint arXiv:2310.11829, 2023.
[20] R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von [44] L. Wang, W. Fan, J. Li, Y. Ma, and Q. Li, “Fast graph condensation
Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill et al., “On with structure-based neural tangent kernel,” arXiv preprint
the opportunities and risks of foundation models,” arXiv preprint arXiv:2310.11046, 2023.
arXiv:2108.07258, 2021. [45] W. Fan, S. Wang, X.-y. Wei, X. Mei, and Q. Li, “Untargeted
[21] R. Ye, C. Zhang, R. Wang, S. Xu, and Y. Zhang, “Natural language black-box attacks for social recommendations,” arXiv preprint
is all a graph needs,” Aug. 2023. arXiv:2311.07127, 2023.
[22] J. Guo, L. Du, H. Liu, M. Zhou, X. He, and S. Han, “Gpt4graph: [46] M. Tsubaki, K. Tomii, and J. Sese, “Compound–protein interaction
Can large language models understand graph structured data ? prediction with end-to-end learning of neural networks for graphs
an empirical evaluation and benchmarking,” Jul. 2023. and sequences,” Bioinformatics, vol. 35, no. 2, pp. 309–318, 2019.
[23] J. Tang, Y. Yang, W. Wei, L. Shi, L. Su, S. Cheng, D. Yin, and [47] W. Fan, Y. Ma, Q. Li, Y. He, E. Zhao, J. Tang, and D. Yin, “Graph
C. Huang, “Graphgpt: Graph instruction tuning for large language neural networks for social recommendation,” in The world wide
models,” Oct. 2023. web conference, 2019, pp. 417–426.
[24] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, [48] W. Fan, Y. Ma, Q. Li, J. Wang, G. Cai, J. Tang, and D. Yin, “A graph
T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: neural network framework for social recommendations,” IEEE
Open and efficient foundation language models,” arXiv preprint Transactions on Knowledge and Data Engineering, 2020.
arXiv:2302.13971, 2023. [49] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line:
[25] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Large-scale information network embedding,” in Proceedings of the
Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer 24th international conference on world wide web, 2015, pp. 1067–1077.
learning with a unified text-to-text transformer,” The Journal of [50] B. Perozzi, R. Al-Rfou, and S. Skiena, “Deepwalk: Online learning
Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020. of social representations,” in Proceedings of the 20th ACM SIGKDD
[26] K. Duan, Q. Liu, T.-S. Chua, S. Yan, W. T. Ooi, Q. Xie, and J. He, international conference on Knowledge discovery and data mining, 2014,
“Simteg: A frustratingly simple approach improves textual graph pp. 701–710.
learning,” arXiv preprint arXiv:2308.02565, 2023. [51] A. Grover and J. Leskovec, “node2vec: Scalable feature learning
[27] Z. Chen, H. Mao, H. Li, W. Jin, H. Wen, X. Wei, S. Wang, D. Yin, for networks,” in Proceedings of the 22nd ACM SIGKDD international
W. Fan, H. Liu, and J. Tang, “Exploring the potential of large conference on Knowledge discovery and data mining, 2016, pp. 855–864.
language models (llms) in learning on graphs,” Aug. 2023. [52] T. N. Kipf and M. Welling, “Semi-supervised classification with
[28] E. Chien, W.-C. Chang, C.-J. Hsieh, H.-F. Yu, J. Zhang, graph convolutional networks,” arXiv preprint arXiv:1609.02907,
O. Milenkovic, and I. S. Dhillon, “Node feature extraction by self- 2016.
supervised multi-scale neighborhood prediction,” arXiv preprint [53] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation
arXiv:2111.00064, 2021. learning on large graphs,” Advances in neural information processing
[29] J. Zhao, L. Zhuo, Y. Shen, M. Qu, K. Liu, M. Bronstein, Z. Zhu, and systems, vol. 30, 2017.
J. Tang, “Graphtext: Graph reasoning in text space,” Oct. 2023. [54] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio,
[30] Y. Ding, W. Fan, L. Ning, S. Wang, H. Li, D. Yin, T.-S. Chua, and and Y. Bengio, “Graph attention networks,” arXiv preprint
Q. Li, “A survey on rag meets llms: Towards retrieval-augmented arXiv:1710.10903, 2017.
large language models,” arXiv preprint arXiv:2405.06211, 2024. [55] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.
[31] R. L. Logan IV, N. F. Liu, M. E. Peters, M. Gardner, and S. Singh, Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,”
“Barack’s wife hillary: Using knowledge-graphs for fact-aware Advances in neural information processing systems, vol. 30, 2017.
language modeling,” arXiv preprint arXiv:1906.07241, 2019. [56] Y. Li, X. Liang, Z. Hu, Y. Chen, and E. P. Xing, “Graph transformer,”
[32] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, 2018.
E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg et al., “Sparks of [57] S. Yun, M. Jeong, R. Kim, J. Kang, and H. J. Kim, “Graph
artificial general intelligence: Early experiments with gpt-4,” arXiv transformer networks,” Advances in neural information processing
preprint arXiv:2303.12712, 2023. systems, vol. 32, 2019.
[33] H. Zhao, H. Chen, F. Yang, N. Liu, H. Deng, H. Cai, S. Wang, [58] J. Baek, M. Kang, and S. J. Hwang, “Accurate learning of graph
D. Yin, and M. Du, “Explainability for large language models: A representations with graph multiset pooling,” in ICLR, 2021.
survey,” ACM Transactions on Intelligent Systems and Technology, [59] Z. Hu, Y. Dong, K. Wang, and Y. Sun, “Heterogeneous graph
vol. 15, no. 2, pp. 1–38, 2024. transformer,” in Proceedings of the web conference 2020, 2020, pp.
[34] S. Xiong, A. Payani, R. Kompella, and F. Fekri, “Large 2704–2710.
language models can learn temporal reasoning,” arXiv preprint [60] J. Zhang, H. Zhang, C. Xia, and L. Sun, “Graph-bert: Only
arXiv:2401.06853, 2024. attention is needed for learning graph representations,” arXiv
[35] L. Luo, Y.-F. Li, G. Haffari, and S. Pan, “Reasoning on graphs: preprint arXiv:2001.05140, 2020.
Faithful and interpretable large language model reasoning,” Oct. [61] J. Yang, Z. Liu, S. Xiao, C. Li, D. Lian, S. Agrawal, A. Singh,
2023. G. Sun, and X. Xie, “Graphformers: Gnn-nested transformers for
[36] C. Wang, Y. Xu, Z. Peng, C. Zhang, B. Chen, X. Wang, L. Feng, and representation learning on textual graph,” Advances in Neural
B. An, “keqing: knowledge-based question answering is a nature Information Processing Systems, vol. 34, pp. 28 798–28 810, 2021.
chain-of-thought mentor of llm,” arXiv preprint arXiv:2401.00426, [62] C. Ying, T. Cai, S. Luo, S. Zheng, G. Ke, D. He, Y. Shen, and
2023. T.-Y. Liu, “Do transformers really perform badly for graph
[37] X. Guan, Y. Liu, H. Lin, Y. Lu, B. He, X. Han, and representation?” Advances in Neural Information Processing Systems,
L. Sun, “Mitigating large language model hallucinations via vol. 34, pp. 28 877–28 888, 2021.
autonomous knowledge graph-based retrofitting,” arXiv preprint [63] S. Mitheran, A. Java, S. K. Sahu, and A. Shaikh, “Introducing
arXiv:2311.13314, 2023. self-attention to target attentive graph neural networks,” arXiv
[38] X. Li, Y. Cao2, L. Pan, Y. Ma, and A. Sun, “Towards verifiable preprint arXiv:2107.01516, 2021.
generation: A benchmark for knowledge-aware language model [64] D. Kreuzer, D. Beaini, W. Hamilton, V. Létourneau, and P. Tossou,
attribution,” Oct. 2023. “Rethinking graph transformers with spectral attention,” Advances
[39] S. Wei, Y. Zhao, X. Chen, Q. Li, F. Zhuang, J. Liu, F. Ren, and in Neural Information Processing Systems, vol. 34, pp. 21 618–21 629,
G. Kou, “Graph learning and its advancements on large language 2021.
models: A holistic survey,” arXiv preprint arXiv:2212.08966, 2022. [65] V. P. Dwivedi and X. Bresson, “A generalization of transformer
[40] Z. Zhang, H. Li, Z. Zhang, Y. Qin, X. Wang, and W. Zhu, “Large networks to graphs,” arXiv preprint arXiv:2012.09699, 2020.
graph models: A perspective,” Aug. 2023. [66] C. Qian, H. Tang, Z. Yang, H. Liang, and Y. Liu, “Can large
[41] B. Jin, G. Liu, C. Han, M. Jiang, H. Ji, and J. Han, “Large language models empower molecular property prediction?” Jul.
language models on graphs: A comprehensive survey,” arXiv 2023.
preprint arXiv:2312.02783, 2023. [67] N. Chen, Y. Li, J. Tang, and J. Li, “Graphwiz: An instruction-
[42] Y. Li, Z. Li, P. Wang, J. Li, X. Sun, H. Cheng, and J. X. Yu, “A following language model for graph problems,” arXiv preprint
survey of graph meets large language model: Progress and future arXiv:2402.16029, 2024.
directions,” arXiv preprint arXiv:2311.12399, 2023.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 18

[68] W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng, [92] M. Chen, Z. Liu, C. Liu, J. Li, Q. Mao, and J. Sun, “Ultra-dp:
S. Zhuang, Y. Zhuang, J. E. Gonzalez et al., “Vicuna: An open- Unifying graph pre-training with multi-task graph dual prompt,”
source chatbot impressing gpt-4 with 90%* chatgpt quality,” See arXiv preprint arXiv:2310.14845, 2023.
https://vicuna. lmsys. org (accessed 14 April 2023), 2023. [93] Q. Ge, Z. Zhao, Y. Liu, A. Cheng, X. Li, S. Wang, and D. Yin,
[69] J. Zhao, M. Qu, C. Li, H. Yan, Q. Liu, R. Li, X. Xie, and J. Tang, “Enhancing graph neural networks with structure-based prompt,”
“Learning on large-scale text-attributed graphs via variational arXiv preprint arXiv:2310.17394, 2023.
inference,” arXiv preprint arXiv:2210.14709, 2022. [94] Q. Huang, H. Ren, P. Chen, G. Kržmanc, D. Zeng, P. Liang, and
[70] P. He, X. Liu, J. Gao, and W. Chen, “Deberta: Decoding-enhanced J. Leskovec, “Prodigy: Enabling in-context learning over graphs,”
bert with disentangled attention,” arXiv preprint arXiv:2006.03654, arXiv preprint arXiv:2305.12600, 2023.
2020. [95] Y. Zhu, J. Guo, and S. Tang, “Sgl-pt: A strong graph learner with
[71] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, graph prompt tuning,” arXiv preprint arXiv:2302.12449, 2023.
G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning [96] R. Shirkavand and H. Huang, “Deep prompt tuning for graph
transferable visual models from natural language supervision,” in transformers,” arXiv preprint arXiv:2309.10131, 2023.
ICML, 2021. [97] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan,
[72] C. Wu, S. Yin, W. Qi, X. Wang, Z. Tang, and N. Duan, “Visual P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al.,
chatgpt: Talking, drawing and editing with visual foundation “Language models are few-shot learners,” NeurIPS, 2020.
models,” arXiv preprint arXiv:2303.04671, 2023. [98] R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha,
[73] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, H.-T. Cheng, A. Jin, T. Bos, L. Baker, Y. Du et al., “Lamda: Language
M. Chen, and I. Sutskever, “Zero-shot text-to-image generation,” models for dialog applications,” arXiv preprint arXiv:2201.08239,
in ICML, 2021. 2022.
[74] Y. Ding, Y. Ma, W. Fan, Y. Yao, T.-S. Chua, and Q. Li, [99] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra,
“Fashionregen: Llm-empowered fashion report generation,” arXiv A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann
preprint arXiv:2403.06660, 2024. et al., “Palm: Scaling language modeling with pathways,” arXiv
[75] Y. Yang, S. Xiong, A. Payani, E. Shareghi, and F. Fekri, “Harnessing preprint arXiv:2204.02311, 2022.
the power of large language models for natural language to first- [100] N. Shinn, F. Cassano, A. Gopinath, K. R. Narasimhan, and S. Yao,
order logic translation,” arXiv preprint arXiv:2305.15541, 2023. “Reflexion: Language agents with verbal reinforcement learning,”
[76] P. Veličković, W. Fedus, W. L. Hamilton, P. Liò, Y. Bengio, and R. D. in Thirty-seventh Conference on Neural Information Processing Systems,
Hjelm, “Deep graph infomax,” arXiv preprint arXiv:1809.10341, 2023.
2018. [101] J. Zhang, “Graph-toolformer: To empower llms with graph
[77] Y. Zhu, Y. Xu, F. Yu, Q. Liu, S. Wu, and L. Wang, “Deep reasoning ability via prompt augmented by chatgpt,” arXiv
graph contrastive representation learning,” arXiv preprint preprint arXiv:2304.11116, 2023.
arXiv:2006.04131, 2020. [102] C. Mavromatis, V. N. Ioannidis, S. Wang, D. Zheng, S. Adeshina,
[78] Z. Hou, X. Liu, Y. Cen, Y. Dong, H. Yang, C. Wang, and J. Tang, J. Ma, H. Zhao, C. Faloutsos, and G. Karypis, “Train your own
“Graphmae: Self-supervised masked graph autoencoders,” in gnn teacher: Graph-aware distillation on textual graphs,” arXiv
Proceedings of the 28th ACM SIGKDD Conference on Knowledge preprint arXiv:2304.10668, 2023.
Discovery and Data Mining, 2022, pp. 594–604. [103] P. Jiang, J. Rayan, S. P. Dow, and H. Xia, “Graphologue: Exploring
[79] K. Hassani and A. H. Khasahmadi, “Contrastive multi-view large language model responses with interactive diagrams,” arXiv
representation learning on graphs,” in International conference on preprint arXiv:2305.11473, 2023.
machine learning. PMLR, 2020, pp. 4116–4126. [104] C. M. Castro Nascimento and A. S. Pimentel, “Do large language
[80] J. Shang, T. Ma, C. Xiao, and J. Sun, “Pre-training of graph models understand chemistry? a conversation with chatgpt,”
augmented transformers for medication recommendation,” arXiv Journal of Chemical Information and Modeling, vol. 63, no. 6, pp.
preprint arXiv:1906.00346, 2019. 1649–1655, 2023.
[81] S. Li, X. Han, and J. Bai, “Adaptergnn: Efficient delta tuning [105] A. M. Bran, S. Cox, A. D. White, and P. Schwaller, “Chemcrow:
improves generalization ability in graph neural networks,” arXiv Augmenting large-language models with chemistry tools,” arXiv
preprint arXiv:2304.09595, 2023. preprint arXiv:2304.05376, 2023.
[82] Y. Rong, Y. Bian, T. Xu, W. Xie, Y. Wei, W. Huang, and J. Huang, [106] E. Kasneci, K. Seßler, S. Küchemann, M. Bannert, D. Dementieva,
“Self-supervised graph transformer on large-scale molecular data,” F. Fischer, U. Gasser, G. Groh, S. Günnemann, E. Hüllermeier
Advances in Neural Information Processing Systems, vol. 33, pp. et al., “Chatgpt for good? on opportunities and challenges of large
12 559–12 571, 2020. language models for education,” Learning and Individual Differences,
[83] A. Gui, J. Ye, and H. Xiao, “G-adapter: Towards structure- vol. 103, p. 102274, 2023.
aware parameter-efficient transfer learning for graph transformer [107] A. Gilson, C. W. Safranek, T. Huang, V. Socrates, L. Chi, R. A.
networks,” arXiv preprint arXiv:2305.10329, 2023. Taylor, D. Chartash et al., “How does chatgpt perform on the
[84] Q. Zhao, W. Ren, T. Li, X. Xu, and H. Liu, “Graphgpt: Graph united states medical licensing examination? the implications of
learning with generative pre-trained transformers,” arXiv preprint large language models for medical education and knowledge
arXiv:2401.00529, 2023. assessment,” JMIR Medical Education, vol. 9, no. 1, p. e45312, 2023.
[85] B. Su, D. Du, Z. Yang, Y. Zhou, J. Li, A. Rao, H. Sun, Z. Lu, [108] S. Wu, O. Irsoy, S. Lu, V. Dabravolski, M. Dredze, S. Gehrmann,
and J.-R. Wen, “A molecular multimodal foundation model P. Kambadur, D. Rosenberg, and G. Mann, “Bloomberggpt: A
associating molecule graphs with natural language,” arXiv preprint large language model for finance,” arXiv preprint arXiv:2303.17564,
arXiv:2209.05481, 2022. 2023.
[86] J. Zhu, X. Song, V. N. Ioannidis, D. Koutra, and C. Faloutsos, [109] Y. Yang, M. C. S. Uy, and A. Huang, “Finbert: A pretrained
“Touchup-g: Improving feature representation through graph- language model for financial communications,” arXiv preprint
centric finetuning,” arXiv preprint arXiv:2309.13885, 2023. arXiv:2006.08097, 2020.
[87] Z. Liu, X. Yu, Y. Fang, and X. Zhang, “Graphprompt: Unifying [110] B. Jin, C. Xie, J. Zhang, K. K. Roy, Y. Zhang, S. Wang, Y. Meng,
pre-training and downstream tasks for graph neural networks,” and J. Han, “Graph chain-of-thought: Augmenting large language
in Proceedings of the ACM Web Conference 2023, 2023, pp. 417–428. models by reasoning on graphs,” arXiv preprint arXiv:2404.07103,
[88] M. Sun, K. Zhou, X. He, Y. Wang, and X. Wang, “Gppt: Graph pre- 2024.
training and prompt tuning to generalize graph neural networks,” [111] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl,
in Proceedings of the 28th ACM SIGKDD Conference on Knowledge “Neural message passing for quantum chemistry,” in International
Discovery and Data Mining, 2022, pp. 1717–1727. conference on machine learning. PMLR, 2017, pp. 1263–1272.
[89] C. Gong, X. Li, J. Yu, C. Yao, J. Tan, C. Yu, and D. Yin, “Prompt [112] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and
tuning for multi-view graph contrastive learning,” arXiv preprint J. Leskovec, “Graph convolutional neural networks for web-scale
arXiv:2310.10362, 2023. recommender systems,” in Proceedings of the 24th ACM SIGKDD
[90] T. Fang, Y. M. Zhang, Y. Yang, C. Wang, and C. Lei, “Universal international conference on knowledge discovery & data mining, 2018,
prompt tuning for graph neural networks,” in Thirty-seventh pp. 974–983.
Conference on Neural Information Processing Systems, 2023. [113] K. Xu, W. Hu, J. Leskovec, and S. Jegelka, “How powerful are
[91] X. Sun, H. Cheng, J. Li, B. Liu, and J. Guan, “All in one: Multi-task graph neural networks?” arXiv preprint arXiv:1810.00826, 2018.
prompting for graph neural networks,” 2023. [114] H. Maron, H. Ben-Hamu, H. Serviansky, and Y. Lipman, “Provably
powerful graph networks,” Advances in neural information
processing systems, vol. 32, 2019.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 19

[115] C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, [141] Z. Guo, L. Xia, Y. Yu, Y. Wang, Z. Yang, W. Wei, L. Pang, T.-S.
G. Rattan, and M. Grohe, “Weisfeiler and leman go neural: Higher- Chua, and C. Huang, “Graphedit: Large language models for
order graph neural networks,” in Proceedings of the AAAI conference graph structure learning,” arXiv preprint arXiv:2402.15183, 2024.
on artificial intelligence, vol. 33, no. 01, 2019, pp. 4602–4609. [142] Z. Chai, T. Zhang, L. Wu, K. Han, X. Hu, X. Huang, and Y. Yang,
[116] P. Li, J. Wang, Y. Qiao, H. Chen, Y. Yu, X. Yao, P. Gao, G. Xie, “Graphllm: Boosting graph reasoning ability of large language
and S. Song, “An effective self-supervised framework for learning model,” Oct. 2023.
expressive molecular global representations to drug discovery,” [143] Y. Tian, H. Song, Z. Wang, H. Wang, Z. Hu, F. Wang, N. V. Chawla,
Briefings in Bioinformatics, vol. 22, no. 6, p. bbab109, 2021. and P. Xu, “Graph neural prompting with large language models,”
[117] B. Jin, Y. Zhang, Q. Zhu, and J. Han, “Heterformer: Transformer- Sep. 2023.
based deep node representation learning on heterogeneous text- [144] Y. Liang, R. Zhang, L. Zhang, and P. Xie, “Drugchat: Towards
rich networks,” in Proceedings of the 29th ACM SIGKDD Conference enabling chatgpt-like capabilities on drug molecule graphs,” May
on Knowledge Discovery and Data Mining, 2023, pp. 1020–1031. 2023.
[118] B. Jin, Y. Zhang, Y. Meng, and J. Han, “Edgeformers: Graph- [145] Y. Zhang, Z. Chen, W. Zhang, and H. Chen, “Making large
empowered transformers for representation learning on textual- language models perform better in knowledge graph completion,”
edge networks,” arXiv preprint arXiv:2302.11050, 2023. Oct. 2023.
[119] Z. Hu, C. Fan, T. Chen, K.-W. Chang, and Y. Sun, “Pre-training [146] H. Zhao, S. Liu, C. Ma, H. Xu, J. Fu, Z.-H. Deng, L. Kong, and
graph neural networks for generic structural feature extraction,” Q. Liu, “Gimlet: A unified graph-text model for instruction-based
arXiv preprint arXiv:1905.13728, 2019. molecule zero-shot learning,” Bioinformatics, Preprint, Jun. 2023.
[120] C. hao, X. Runfeng, C. Xiangyang, Y. Zhou, W. Xin, X. Zhanwei, [147] P. Liu, Y. Ren, and Z. Ren, “Git-mol: A multi-modal large language
and Z. Kai, “Lkpnr: Llm and kg for personalized news model for molecular science with graph, image, and text,” Aug.
recommendation framework,” Aug. 2023. 2023.
[121] Z. Ni, X.-X. Deng, C. Tai, X.-Y. Zhu, X. Wu, Y.-J. Liu, and L. Zeng, [148] Y. Luo, J. Zhang, S. Fan, K. Yang, Y. Wu, M. Qiao, and Z. Nie,
“Grid: Scene-graph-based instruction-driven robotic task planning,” “Biomedgpt: Open multimodal generative pre-trained transformer
Sep. 2023. for biomedicine,” arXiv preprint arXiv:2308.09442, 2023.
[122] X. He, X. Bresson, T. Laurent, A. Perold, Y. LeCun, and B. Hooi, [149] H. Guo, M. Huo, R. Zhang, and P. Xie, “Proteinchat: Towards
“Harnessing explanations: Llm-to-lm interpreter for enhanced text- achieving chatgpt-like functionalities on protein 3d structures,”
attributed graph representation learning,” Oct. 2023. 2023.
[123] X. Ren, W. Wei, L. Xia, L. Su, S. Cheng, J. Wang, D. Yin, and [150] Y. Qin, X. Wang, Z. Zhang, and W. Zhu, “Disentangled
C. Huang, “Representation learning with large language models representation learning with large language models for text-
for recommendation,” Oct. 2023. attributed graphs,” arXiv preprint arXiv:2310.18152, 2023.
[124] W. Wei, X. Ren, J. Tang, Q. Wang, L. Su, S. Cheng, J. Wang, [151] X. He, Y. Tian, Y. Sun, N. V. Chawla, T. Laurent, Y. LeCun,
D. Yin, and C. Huang, “Llmrec: Large language models with X. Bresson, and B. Hooi, “G-retriever: Retrieval-augmented
graph augmentation for recommendation.” generation for textual graph understanding and question
[125] H. Lyu, S. Jiang, H. Zeng, Y. Xia, and J. Luo, “Llm-rec: Personalized answering,” arXiv preprint arXiv:2402.07630, 2024.
recommendation via prompting large language models,” arXiv [152] B. Perozzi, B. Fatemi, D. Zelle, A. Tsitsulin, M. Kazemi, R. Al-
preprint arXiv:2307.15780, 2023. Rfou, and J. Halcrow, “Let your graph do the talking: Encoding
[126] S. Balaji, R. Magar, Y. Jadhav, and A. B. Farimani, “Gpt-molberta: structured data for llms,” arXiv preprint arXiv:2402.05862, 2024.
Gpt molecular features language model for molecular property [153] L. Xia, B. Kao, and C. Huang, “Opengraph: Towards open graph
prediction,” Oct. 2023. foundation models,” arXiv preprint arXiv:2403.01121, 2024.
[127] J. Yu, Y. Ren, C. Gong, J. Tan, X. Li, and X. Zhang, “Empower [154] B. Jin, W. Zhang, Y. Zhang, Y. Meng, X. Zhang, Q. Zhu, and J. Han,
text-attributed graphs learning with large language models (llms),” “Patton: Language model pretraining on text-rich networks,” arXiv
Oct. 2023. preprint arXiv:2305.12268, 2023.
[128] S. Sun, Y. Ren, C. Ma, and X. Zhang, “Large language models as [155] B. Jin, W. Zhang, Y. Zhang, Y. Meng, H. Zhao, and J. Han,
topological structure enhancers for text-attributed graphs,” arXiv “Learning multiplex embeddings on text-rich networks with one
preprint arXiv:2311.14324, 2023. text encoder,” arXiv preprint arXiv:2310.06684, 2023.
[129] H. Liu, J. Feng, L. Kong, N. Liang, D. Tao, Y. Chen, and M. Zhang, [156] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang,
“One for all: Towards training one graph model for all classification and W. Chen, “Lora: Low-rank adaptation of large language
tasks,” arXiv preprint arXiv:2310.00149, 2023. models,” arXiv preprint arXiv:2106.09685, 2021.
[130] Y. Hu, Z. Zhang, and L. Zhao, “Beyond text: A deep dive into [157] M. Himsolt, “Gml: Graph modelling language,” University of
large language models’ ability on understanding graph data,” Oct. Passau, 1997.
2023. [158] U. Brandes, M. Eiglsperger, J. Lerner, and C. Pich, “Graph markup
[131] H. Wang, S. Feng, T. He, Z. Tan, X. Han, and Y. Tsvetkov, “Can language (graphml),” 2013.
language models solve graph problems in natural language?” [159] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni,
arXiv preprint arXiv:2305.10037, 2023. and P. Liang, “Lost in the middle: How language models use long
[132] C. Liu and B. Wu, “Evaluating large language models on graphs: contexts,” arXiv preprint arXiv:2307.03172, 2023.
Performance insights and comparative analysis,” Sep. 2023. [160] J. Li, D. Li, S. Savarese, and S. Hoi, “Blip-2: Bootstrapping
[133] Q. Wang, Z. Gao, and R. Xu, “Graph agent: Explicit reasoning language-image pre-training with frozen image encoders and
agent for graphs,” arXiv preprint arXiv:2310.16421, 2023. large language models,” arXiv preprint arXiv:2301.12597, 2023.
[134] A. N. Rubungo, C. Arnold, B. P. Rand, and A. B. Dieng, “Llm-prop: [161] S. Gui, X. Li, L. Wang, and S. Ji, “Good: A graph out-of-distribution
Predicting physical and electronic properties of crystalline solids benchmark,” Advances in Neural Information Processing Systems,
from their text descriptions,” Oct. 2023. vol. 35, pp. 2059–2073, 2022.
[135] L. Wu, Z. Qiu, Z. Zheng, H. Zhu, and E. Chen, “Exploring [162] L. Hu, Z. Liu, Z. Zhao, L. Hou, L. Nie, and J. Li, “A survey
large language model for graph data understanding in online of knowledge enhanced pre-trained language models,” IEEE
job recommendations,” arXiv preprint arXiv:2307.05722, 2023. Transactions on Knowledge and Data Engineering, pp. 1–19, 2023.
[136] Y. Shi, A. Zhang, E. Zhang, Z. Liu, and X. Wang, “Relm: [163] G. Agrawal, T. Kumarage, Z. Alghami, and H. Liu, “Can
Leveraging language models for enhanced chemical reaction knowledge graphs reduce hallucinations in llms?: A survey,” arXiv
prediction,” arXiv preprint arXiv:2310.13590, 2023. preprint arXiv:2311.07914, 2023.
[137] J. Huang, X. Zhang, Q. Mei, and J. Ma, “Can llms effectively [164] S. Pan, L. Luo, Y. Wang, C. Chen, J. Wang, and X. Wu, “Unifying
leverage graph structural information: When and why,” Sep. 2023. large language models and knowledge graphs: A roadmap,” Jun.
[138] B. Fatemi, J. Halcrow, and B. Perozzi, “Talk like a graph: Encoding 2023.
graphs for large language models,” Oct. 2023. [165] Y. Sun, S. Wang, S. Feng, S. Ding, C. Pang, J. Shang, J. Liu, X. Chen,
[139] J. Li, Y. Liu, W. Fan, X.-Y. Wei, H. Liu, J. Tang, and Q. Li, Y. Zhao, Y. Lu et al., “Ernie 3.0: Large-scale knowledge enhanced
“Empowering molecule discovery for molecule-caption translation pre-training for language understanding and generation,” arXiv
with large language models: A chatgpt perspective,” arXiv preprint e-prints, pp. arXiv–2107, 2021.
arXiv:2306.06615, 2023. [166] L. Fang, Y. Luo, K. Feng, K. Zhao, and A. Hu, “A knowledge-
[140] R. Chen, T. Zhao, A. Jaiswal, N. Shah, and Z. Wang, “Llaga: Large enriched ensemble method for word embedding and multi-sense
language and graph assistant,” arXiv preprint arXiv:2402.08170, embedding,” IEEE Transactions on Knowledge and Data Engineering,
2024. 2022.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 20

[167] Q. Li, D. Wang, S. F. K. Song, Y. Zhang, and G. Yu, “Oerl: [189] Y. Wu, N. Hu, S. Bi, G. Qi, J. Ren, A. Xie, and W. Song,
Enhanced representation learning via open knowledge graphs,” “Retrieve-rewrite-answer: A kg-to-text enhanced llms framework
IEEE Transactions on Knowledge and Data Engineering, 2022. for knowledge graph question answering,” Sep. 2023.
[168] F. Moiseev, Z. Dong, E. Alfonseca, and M. Jaggi, “Skill: Structured [190] Y. Wen, Z. Wang, and J. Sun, “Mindmap: Knowledge graph
knowledge infusion for large language models,” May 2022. prompting sparks graph of thoughts in large language models,”
[169] N. Poerner, U. Waltinger, and H. Schütze, “E-bert: Efficient- Sep. 2023.
yet-effective entity embeddings for bert,” arXiv preprint [191] C. Feng, X. Zhang, and Z. Fei, “Knowledge solver: Teaching llms
arXiv:1911.03681, 2019. to search for domain knowledge from knowledge graphs,” Sep.
[170] C. Rosset, C. Xiong, M. Phan, X. Song, P. Bennett, and S. Tiwary, 2023.
“Knowledge-aware language model pretraining,” arXiv preprint [192] S. Saxena, S. Prasad, M. Prakash, A. Shankar, V. Vaddina,
arXiv:2007.00655, 2020. S. Gopalakrishnan et al., “Minimizing factual inconsistency
[171] W. Liu, P. Zhou, Z. Zhao, Z. Wang, Q. Ju, H. Deng, and P. Wang, and hallucination in large language models,” arXiv preprint
“K-bert: Enabling language representation with knowledge graph,” arXiv:2311.13878, 2023.
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, [193] Y. Shi, H. Ma, W. Zhong, G. Mai, X. Li, T. Liu, and J. Huang,
no. 03, pp. 2901–2908, Apr. 2020. “Chatgraph: Interpretable text classification by converting chatgpt
[172] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, “Ernie: knowledge to graphs,” arXiv preprint arXiv:2305.03513, 2023.
Enhanced language representation with informative entities,” [194] Z. Chen, J. Chen, M. Gaidhani, A. Singh, and M. Sra, “Xplainllm:
arXiv preprint arXiv:1905.07129, 2019. A qa explanation dataset for understanding llm decision-making,”
[173] Y. Su, X. Han, Z. Zhang, Y. Lin, P. Li, Z. Liu, J. Zhou, and M. Sun, arXiv preprint arXiv:2311.08614, 2023.
“Cokebert: Contextual knowledge selection and embedding [195] W. Fan, Q. Li, and M. Cheng, “Deep modeling of social relations
towards enhanced pre-trained language models,” AI Open, vol. 2, for recommendation,” in Proceedings of the AAAI Conference on
pp. 127–134, Jan. 2021. Artificial Intelligence, vol. 32, no. 1, 2018.
[174] L. He, S. Zheng, T. Yang, and F. Zhang, “Klmo: Knowledge [196] W. Fan, T. Derr, Y. Ma, J. Wang, J. Tang, and Q. Li, “Deep
graph enhanced pretrained language model with fine-grained adversarial social recommendation,” in 28th International Joint
relationships,” in Findings of the Association for Computational Conference on Artificial Intelligence (IJCAI-19). International Joint
Linguistics: EMNLP 2021, M.-F. Moens, X. Huang, L. Specia, and Conferences on Artificial Intelligence, 2019, pp. 1351–1357.
S. W.-t. Yih, Eds. Punta Cana, Dominican Republic: Association [197] W. Fan, Y. Ma, D. Yin, J. Wang, J. Tang, and Q. Li, “Deep social
for Computational Linguistics, Nov. 2021, pp. 4536–4542. collaborative filtering,” in Proceedings of the 13th ACM Conference
[175] R. Wang, D. Tang, N. Duan, Z. Wei, X. Huang, J. Ji, G. Cao, on Recommender Systems, 2019, pp. 305–313.
D. Jiang, and M. Zhou, “K-adapter: Infusing knowledge into pre- [198] B. Yin, J. Xie, Y. Qin, Z. Ding, Z. Feng, X. Li, and W. Lin,
trained models with adapters,” in Findings of the Association for “Heterogeneous knowledge fusion: A novel approach for
Computational Linguistics: ACL-IJCNLP 2021, C. Zong, F. Xia, W. Li, personalized recommendation via llm,” in Proceedings of the 17th
and R. Navigli, Eds. Online: Association for Computational ACM Conference on Recommender Systems, 2023, pp. 599–601.
Linguistics, Aug. 2021, pp. 1405–1418. [199] Z. Tang, Z. Huan, Z. Li, X. Zhang, J. Hu, C. Fu, J. Zhou, and C. Li,
[176] M. Kang, J. Baek, and S. J. Hwang, “Kala: knowledge-augmented “One model for all: Large language models are domain-agnostic
language model adaptation,” arXiv preprint arXiv:2204.10555, 2022. recommendation systems,” Oct. 2023.
[177] Q. Lu, D. Dou, and T. H. Nguyen, “Parameter-efficient domain [200] S. Dai, N. Shao, H. Zhao, W. Yu, Z. Si, C. Xu, Z. Sun, X. Zhang,
knowledge integration from multiple sources for biomedical and J. Xu, “Uncovering chatgpt’s capabilities in recommender
pre-trained language models,” in Findings of the Association for systems,” arXiv preprint arXiv:2305.02182, 2023.
Computational Linguistics: EMNLP 2021, 2021, pp. 3855–3865. [201] H. Wang, X. Liu, W. Fan, X. Zhao, V. Kini, D. Yadav, F. Wang,
[178] I. Yamada, A. Asai, H. Shindo, H. Takeda, and Y. Matsumoto, Z. Wen, J. Tang, and H. Liu, “Rethinking large language model
“Luke: Deep contextualized entity representations with entity- architectures for sequential recommendations,” arXiv preprint
aware self-attention,” arXiv preprint arXiv:2010.01057, 2020. arXiv:2402.09543, 2024.
[179] Y. Xu, M. Namazifar, D. Hazarika, A. Padmakumar, Y. Liu, and [202] Y. Xi, W. Liu, J. Lin, J. Zhu, B. Chen, R. Tang, W. Zhang,
D. Hakkani-Tür, “Kilm: Knowledge injection into encoder-decoder R. Zhang, and Y. Yu, “Towards open-world recommendation with
language models,” Feb. 2023. knowledge augmentation from large language models,” arXiv
[180] T. Shen, Y. Mao, P. He, G. Long, A. Trischler, and W. Chen, preprint arXiv:2306.10933, 2023.
“Exploiting structured knowledge in text via graph-guided [203] J. Wu, Q. Liu, H. Hu, W. Fan, S. Liu, Q. Li, X.-M. Wu, and K. Tang,
representation learning,” arXiv preprint arXiv:2004.14224, 2020. “Leveraging large language models (llms) to empower training-
[181] M. Joshi, D. Chen, Y. Liu, D. S. Weld, L. Zettlemoyer, and O. Levy, free dataset condensation for content-based recommendation,”
“Spanbert: Improving pre-training by representing and predicting arXiv preprint arXiv:2310.09874, 2023.
spans,” Transactions of the association for computational linguistics, [204] J. Liu, C. Liu, R. Lv, K. Zhou, and Y. Zhang, “Is chatgpt
vol. 8, pp. 64–77, 2020. a good recommender? a preliminary study,” arXiv preprint
[182] D. Yu, C. Zhu, Y. Yang, and M. Zeng, “Jaket: Joint pre-training arXiv:2304.10149, 2023.
of knowledge graph and language understanding,” Proceedings [205] K. Bao, J. Zhang, Y. Zhang, W. Wang, F. Feng, and X. He,
of the AAAI Conference on Artificial Intelligence, vol. 36, no. 10, pp. “Tallrec: An effective and efficient tuning framework to align
11 630–11 638, Jun. 2022. large language model with recommendation,” arXiv preprint
[183] M. Yasunaga, A. Bosselut, H. Ren, X. Zhang, C. D. Manning, P. S. arXiv:2305.00447, 2023.
Liang, and J. Leskovec, “Deep bidirectional language-knowledge [206] W. Hua, Y. Ge, S. Xu, J. Ji, and Y. Zhang, “Up5: Unbiased
graph pretraining,” Advances in Neural Information Processing foundation model for fairness-aware recommendation,” arXiv
Systems, vol. 35, pp. 37 309–37 323, 2022. preprint arXiv:2305.12090, 2023.
[184] Y. Qin, Y. Lin, R. Takanobu, Z. Liu, P. Li, H. Ji, M. Huang, M. Sun, [207] C. Huang, T. Yu, K. Xie, S. Zhang, L. Yao, and J. McAuley,
and J. Zhou, “Erica: Improving entity and relation understanding “Foundation models for recommender systems: A survey and
for pre-trained language models via contrastive learning,” arXiv new perspectives,” arXiv preprint arXiv:2402.11143, 2024.
preprint arXiv:2012.15022, 2020. [208] R. Yang, L. Fang, and Y. Zhou, “Cp-kgc: Constrained-prompt
[185] X. Wang, T. Gao, Z. Zhu, Z. Zhang, Z. Liu, J. Li, and J. Tang, knowledge graph completion with large language models,” Oct.
“Kepler: A unified model for knowledge embedding and pre- 2023.
trained language representation,” Transactions of the Association for [209] L. Yao, J. Peng, C. Mao, and Y. Luo, “Exploring large language
Computational Linguistics, vol. 9, pp. 176–194, 2021. models for knowledge graph completion,” Sep. 2023.
[186] Y. Sun, S. Wang, Y. Li, S. Feng, H. Tian, H. Wu, and H. Wang, [210] J. Kim, Y. Kwon, Y. Jo, and E. Choi, “Kg-gpt: A general framework
“Ernie 2.0: A continual pre-training framework for language for reasoning on knowledge graphs using large language models,”
understanding,” in Proceedings of the AAAI conference on artificial Oct. 2023.
intelligence, vol. 34, no. 05, 2020, pp. 8968–8975. [211] H. Luo, H. E, Z. Tang, S. Peng, Y. Guo, W. Zhang, C. Ma,
[187] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, G. Dong, M. Song, and W. Lin, “Chatkbqa: A generate-then-
B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language retrieve framework for knowledge base question answering with
models,” arXiv preprint arXiv:2303.18223, 2023. fine-tuned large language models,” Oct. 2023.
[188] J. Baek, A. F. Aji, and A. Saffari, “Knowledge-augmented language
model prompting for zero-shot knowledge graph question
answering,” arXiv preprint arXiv:2306.04136, 2023.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMISSION 2023 21

[212] J. Jiang, K. Zhou, Z. Dong, K. Ye, W. X. Zhao, and J.-R. Wen, by promoting unnoticeability,” arXiv preprint arXiv:2202.08057,
“Structgpt: A general framework for large language model to 2022.
reason over structured data,” arXiv preprint arXiv:2305.09645, 2023. [231] X. Zou, Q. Zheng, Y. Dong, X. Guan, E. Kharlamov, J. Lu,
[213] L. Luo, J. Ju, B. Xiong, Y.-F. Li, G. Haffari, and S. Pan, “Chatrule: and J. Tang, “Tdgia: Effective injection attacks on graph neural
Mining logical rules with large language models for knowledge networks,” in Proceedings of the 27th ACM SIGKDD Conference on
graph reasoning,” Sep. 2023. Knowledge Discovery & Data Mining, 2021, pp. 2461–2471.
[214] L.-P. Meyer, J. Frey, K. Junghanns, F. Brei, K. Bulert, S. Gründer- [232] H. Dai, H. Li, T. Tian, X. Huang, L. Wang, J. Zhu, and L. Song,
Fahrer, and M. Martin, “Developing a scalable benchmark for “Adversarial attack on graph structured data,” in International
assessing large language models in knowledge graph engineering,” conference on machine learning. PMLR, 2018, pp. 1115–1124.
Aug. 2023. [233] D. Zügner, A. Akbarnejad, and S. Günnemann, “Adversarial
[215] M. Galkin, X. Yuan, H. Mostafa, J. Tang, and Z. Zhu, “Towards attacks on neural networks for graph data,” in Proceedings of the
foundation models for knowledge graph reasoning,” arXiv preprint 24th ACM SIGKDD international conference on knowledge discovery
arXiv:2310.04562, 2023. & data mining, 2018, pp. 2847–2856.
[216] W. Tang, R. Liu, H. Wen, X. Dai, J. Ding, H. Li, W. Fan, Y. Xie, and [234] A. Wei, N. Haghtalab, and J. Steinhardt, “Jailbroken: How does
J. Tang, “A general single-cell analysis framework via conditional llm safety training fail?” arXiv preprint arXiv:2307.02483, 2023.
diffusion generative models,” bioRxiv, pp. 2023–10, 2023. [235] Z. Zhang, G. Zhang, B. Hou, W. Fan, Q. Li, S. Liu, Y. Zhang, and
[217] Y. Cao, Z.-Q. Yang, X.-L. Zhang, W. Fan, Y. Wang, J. Shen, S. Chang, “Certified robustness for large language models with
D.-Q. Wei, Q. Li, and X.-Y. Wei, “Identifying the kind behind self-denoising,” arXiv preprint arXiv:2307.07171, 2023.
smiles—anatomical therapeutic chemical classification using [236] S. Geng, S. Liu, Z. Fu, Y. Ge, and Y. Zhang, “Recommendation
structure-only representations,” Briefings in Bioinformatics, vol. 23, as language processing (rlp): A unified pretrain, personalized
no. 5, p. bbac346, 2022. prompt & predict paradigm (p5),” in Proceedings of the 16th ACM
[218] H. Cao, Z. Liu, X. Lu, Y. Yao, and Y. Li, “Instructmol: Multi-modal Conference on Recommender Systems, 2022, pp. 299–315.
integration for building a versatile and reliable molecular assistant [237] H. Liu, J. Dacon, W. Fan, H. Liu, Z. Liu, and J. Tang, “Does gender
in drug discovery,” arXiv preprint arXiv:2311.16208, 2023. matter? towards fairness in dialogue systems,” in Proceedings of
[219] Z. Zhao, D. Ma, L. Chen, L. Sun, Z. Li, H. Xu, Z. Zhu, S. Zhu, the 28th International Conference on Computational Linguistics, 2020,
S. Fan, G. Shen et al., “Chemdfm: Dialogue foundation model for pp. 4403–4416.
chemistry,” arXiv preprint arXiv:2401.14818, 2024. [238] G. Zhang, Y. Zhang, Y. Zhang, W. Fan, Q. Li, S. Liu, and S. Chang,
[220] C. Galindo, J.-A. Fernández-Madrigal, J. González, and A. Saffiotti, “Fairness reprogramming,” in Thirty-sixth Conference on Neural
“Robot task planning using semantic maps,” Robotics and Information Processing Systems, 2022.
autonomous systems, vol. 56, no. 11, pp. 955–966, 2008. [239] X. Chen, W. Fan, J. Chen, H. Liu, Z. Liu, Z. Zhang, and Q. Li,
[221] G. Chalvatzaki, A. Younes, D. Nandha, A. T. Le, L. F. R. Ribeiro, “Fairly adaptive negative sampling for recommendations,” in
and I. Gurevych, “Learning to reason over scene graphs: A Proceedings of the ACM Web Conference 2023, 2023, pp. 3723–3733.
case study of finetuning gpt-2 into a robot language model for [240] K. Webster, X. Wang, I. Tenney, A. Beutel, E. Pitler, E. Pavlick,
grounded task planning,” Frontiers in Robotics and AI, vol. 10, p. J. Chen, E. Chi, and S. Petrov, “Measuring and reducing
1221739, Aug. 2023. gendered correlations in pre-trained models,” arXiv preprint
[222] K. Rana, J. Haviland, S. Garg, J. Abou-Chakra, I. Reid, and arXiv:2010.06032, 2020.
N. Suenderhauf, “Sayplan: Grounding large language models [241] Y. Guo, Y. Yang, and A. Abbasi, “Auto-debias: Debiasing masked
using 3d scene graphs for scalable robot task planning,” Sep. 2023. language models with automated biased prompts,” in Proceedings
[223] Y. Zhen, S. Bi, L. Xing-tong, P. Wei-qin, S. Hai-peng, C. Zi-rui, of the 60th Annual Meeting of the Association for Computational
and F. Yi-shu, “Robot task planning based on large language Linguistics (Volume 1: Long Papers), 2022, pp. 1012–1023.
model representing knowledge with directed graph structures,” [242] J. He, M. Xia, C. Fellbaum, and D. Chen, “Mabel: Attenuating
Jun. 2023. gender bias using textual entailment data,” arXiv preprint
[224] H. Liu, Y. Wang, W. Fan, X. Liu, Y. Li, S. Jain, Y. Liu, A. K. Jain, arXiv:2210.14975, 2022.
and J. Tang, “Trustworthy ai: A computational perspective,” arXiv [243] N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss,
preprint arXiv:2107.06641, 2021. K. Lee, A. Roberts, T. B. Brown, D. Song, U. Erlingsson et al.,
[225] W. Fan, X. Zhao, X. Chen, J. Su, J. Gao, L. Wang, Q. Liu, Y. Wang, “Extracting training data from large language models.” in USENIX
H. Xu, L. Chen et al., “A comprehensive survey on trustworthy Security Symposium, vol. 6, 2021.
recommender systems,” arXiv preprint arXiv:2209.10117, 2022. [244] J. Huang, H. Shao, and K. C.-C. Chang, “Are large pre-trained
[226] W. Fan, W. Jin, X. Liu, H. Xu, X. Tang, S. Wang, Q. Li, J. Tang, language models leaking your personal information?” arXiv
J. Wang, and C. Aggarwal, “Jointly attacking graph neural network preprint arXiv:2205.12628, 2022.
and its explanations,” in 2023 IEEE 39th International Conference on [245] J. Zhang, L. Wang, S. Wang, and W. Fan, “Graph unlearning with
Data Engineering (ICDE). IEEE, 2023. efficient partial retraining,” arXiv preprint arXiv:2403.07353, 2024.
[227] W. Fan, T. Derr, X. Zhao, Y. Ma, H. Liu, J. Wang, J. Tang, and Q. Li, [246] Y. Li, Z. Tan, and Y. Liu, “Privacy-preserving prompt tuning for
“Attacking black-box recommendations via copying cross-domain large language model services,” arXiv preprint arXiv:2305.06212,
user profiles,” in 2021 IEEE 37th International Conference on Data 2023.
Engineering (ICDE). IEEE, 2021, pp. 1583–1594. [247] R. Xue, X. Shen, R. Yu, and X. Liu, “Efficient large language models
[228] N. Jain, A. Schwarzschild, Y. Wen, G. Somepalli, J. Kirchenbauer, fine-tuning on graphs,” arXiv preprint arXiv:2312.04737, 2023.
P.-y. Chiang, M. Goldblum, A. Saha, J. Geiping, and T. Goldstein, [248] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer,
“Baseline defenses for adversarial attacks against aligned language “Qlora: Efficient finetuning of quantized llms,” arXiv preprint
models,” arXiv preprint arXiv:2309.00614, 2023. arXiv:2305.14314, 2023.
[229] D. Bespalov, S. Bhabesh, Y. Xiang, L. Zhou, and Y. Qi, “Towards [249] X. Ma, G. Fang, and X. Wang, “Llm-pruner: On the structural
building a robust toxicity predictor,” in Proceedings of the 61st pruning of large language models,” arXiv preprint arXiv:2305.11627,
Annual Meeting of the Association for Computational Linguistics 2023.
(Volume 5: Industry Track), 2023, pp. 581–598. [250] M. Xia, T. Gao, Z. Zeng, and D. Chen, “Sheared llama: Accelerating
[230] Y. Chen, H. Yang, Y. Zhang, K. Ma, T. Liu, B. Han, and language model pre-training via structured pruning,” arXiv
J. Cheng, “Understanding and improving graph injection attack preprint arXiv:2310.06694, 2023.

You might also like