Exploring the Potential of Language Models for Graph Learning: Opportunities and Challenges

Yuqun Wang1,∗, Libin Chen1, Qian Li1, and Hongfu Liu1
1 College of Intelligence Science and Technology, National University of Defense Technology, Changsha, China

ICCIC 2024: International Conference on Computer and Intelligent Control, June 29–30, 2024, Kuala Lumpur, Malaysia
∗ Corresponding author. wangyuqun18@nudt.edu.cn (Y. Wang); chenlibin@nudt.edu.cn (L. Chen); liqian.nudt@nudt.edu.cn (Q. Li); liuhongfu@nudt.edu.cn (H. Liu)

Abstract
Graph learning methods are increasingly popular for solving problems involving social networks, biological networks, and other real-world applications. With the rapid development of large language models (LLMs), these models are also being applied to graph-related tasks and combined with traditional graph neural network (GNN)-based approaches to improve the processing of text-attributed and graph-structured data. In this paper, we review and analyse existing approaches. First, we propose a new taxonomy that classifies existing methods into three categories according to which of the LLM and the GNN serves as the final task-solving component. On this basis, we summarise representative models for each category. Finally, we analyse the limitations of existing methods and outline future research directions in this area.

Keywords
Large Language Models, Graph Neural Networks, Natural Language Processing, Graph Learning

1. Introduction
Graph data, found in diverse forms such as the Internet, traffic networks, social networks, and biological networks, can be effectively represented through graphs, and the analysis and mining of such data have become pivotal areas of research. Graph neural networks, deep-learning-based graph modelling methods, have developed in recent years out of traditional neural networks. By defining suitable neural network models on graph data and applying deep learning to graphs, they overcome many limitations of traditional deep neural network learning. In the real world, however, more and more nodes and edges of graphs are associated with attributes in the form of text, and existing graph neural network methods still have limitations in dealing with the textual attributes of nodes in graph data. Traditional graph neural network methods mainly model inputs or outputs consisting of the nodes and edges of a graph, but ignore the textual attributes carried by the nodes themselves and cannot model the original textual information. The advent of Large Language Models (LLMs) has brought new perspectives to this challenge. Trained on vast corpora, LLMs offer strong language understanding and generalisation capabilities and, as deep-learning-based natural language processing models, can handle a wide variety of linguistic tasks.
The launch of GPT-3 in 2020 garnered significant attention from scholars towards LLMs, sparking the question: can large language models leverage their potential in graph learning to overcome the limitations of traditional graph neural network approaches? Although this question has been studied and explored, systematic reviews examining the impact of large language models on graph learning remain sparse. Liu et al. [1], inspired by the foundational roles of LLMs in natural language processing and of GNNs in graph data processing, proposed and defined the concept of 'graph foundation models'. Li et al. [2] investigated the advancements and potential future directions of large language models in graph-related tasks. This article aims to explore and summarise this rapidly evolving field, offering an overview of the influence of language models on graph learning for those interested in pursuing research in this area.

Contributions. The main contributions of this paper are summarised as follows. (1) Structured taxonomy. We present the research in this field through a structured taxonomy that categorises existing studies into three distinct classes. (2) Systematic methodological review. For each category, we summarise representative models, describe each model in more detail, and discuss their strengths, weaknesses, and limitations. (3) Future directions. We provide an in-depth discussion of the limitations of current work and suggest possible directions for future development in the field.

2. Preliminaries
In this section, we introduce the relevant graph definitions and formalise concepts related to the two key areas of large language models and graph neural networks and their development.

2.1 Definitions
Definition 1 (Graph). A graph is a collection of nodes and edges, denoted by $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. In an undirected graph, edges can be viewed as unordered pairs connecting two nodes; in a directed graph, edges can be viewed as ordered pairs connecting a start node and an end node.

Definition 2 (Text-Attributed Graph (TAG)). In a text-attributed graph, each node is associated with a contiguous textual feature (e.g., a sentence). A TAG can be represented as $G = (V, E, D)$, where each node $v_i \in V$ is associated with textual information $d_{v_i} \in D$.

2.2 Graph Neural Networks
Graph neural networks are neural network architectures designed for tasks on graph-structured data. The basic idea is to iteratively update the representation of a node by combining the representations of its neighbours with the node's own representation; graph data is modelled and inferred by learning the interactions between nodes and the global structure of the graph. GNNs perform well in many graph-related tasks such as node classification, link prediction, and graph generation. A typical graph neural network consists of multiple layers, each of which comprises two main steps: information aggregation and feature update. A simplified formulation of a single layer is given below.

Information aggregation (Aggregate):
$h_{\mathrm{Ne}(i)}^{(l)} = \mathrm{AGGREGATE}^{(l)}\big(\{\, h_j^{(l-1)} \mid j \in \mathrm{Ne}(i) \,\}\big)$
where $h_j^{(l-1)}$ denotes the hidden state of node $j$ at layer $l-1$, $\mathrm{AGGREGATE}^{(l)}$ is an aggregation function that aggregates the hidden states of the neighbours of node $i$, and $\mathrm{Ne}(i)$ denotes the set of nodes connected to node $i$.

Feature update (Update):
$h_i^{(l)} = \mathrm{UPDATE}^{(l)}\big( h_i^{(l-1)},\; h_{\mathrm{Ne}(i)}^{(l)} \big)$
where $\mathrm{UPDATE}^{(l)}$ is an update function that combines the previous-layer hidden state $h_i^{(l-1)}$ of node $i$ with the aggregated neighbourhood representation at the current layer to generate a new node representation.

By stacking multiple graph neural network layers, information is propagated layer by layer from neighbouring nodes, capturing relationships between nodes that are further away. Eventually, the graph neural network produces a final representation of each node that can be used for different graph-related tasks. Note that the above formulation is only a simple example; in practice there are many variants and improvements of graph neural network models, such as GraphSAGE [21], GCN [22], and GAT [23], which may use different aggregation and update functions as well as other techniques for processing graph data.
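To make the two steps concrete, the following minimal sketch (ours, not taken from any of the surveyed systems) implements one message-passing layer in plain Python/NumPy; the toy graph, the feature dimensions, and the choice of mean aggregation with a single non-linear update are illustrative assumptions.

```python
import numpy as np

def gnn_layer(H, neighbors, W_self, W_agg):
    """One simplified message-passing layer.

    H:         (num_nodes, d_in) hidden states from the previous layer.
    neighbors: dict mapping node index -> list of neighbour indices, i.e. Ne(i).
    W_self, W_agg: (d_in, d_out) weight matrices used in the update step.
    """
    num_nodes = H.shape[0]
    H_new = np.zeros((num_nodes, W_self.shape[1]))
    for i in range(num_nodes):
        # AGGREGATE: mean of the neighbours' previous-layer states.
        if neighbors[i]:
            m_i = np.mean(H[neighbors[i]], axis=0)
        else:
            m_i = np.zeros(H.shape[1])
        # UPDATE: combine the node's own state with the aggregated message.
        H_new[i] = np.tanh(H[i] @ W_self + m_i @ W_agg)
    return H_new

# Toy graph: 4 nodes, undirected chain 0-1, 1-2, 2-3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 8))            # initial node features
W_self = rng.normal(size=(8, 16))
W_agg = rng.normal(size=(8, 16))

H1 = gnn_layer(H0, neighbors, W_self, W_agg)   # stack more layers to reach farther nodes
print(H1.shape)                                # (4, 16)
```

GraphSAGE-style mean aggregation is used here purely for brevity; attention-weighted or degree-normalised aggregation would slot into the same two-step structure.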
2.3 Large Language Models
In recent years, researchers have paid increasing attention to the evolution of language models. By scaling up pre-trained language models and the amount of training data, large language models not only improve task performance but also exhibit capabilities that smaller models do not have. The underlying architecture of large language models is essentially the Transformer. Currently, common large language models include BERT [19] and GPT [20]. BERT uses the bidirectional encoder layers of a 12-layer Transformer to represent the network; because it adopts only the Transformer encoder as its backbone, it is less suited to text-generation tasks such as article continuation and translation. GPT, by contrast, adopts the Transformer decoder as the main structure of the network, which makes it very effective in text-generation tasks; compared with BERT, GPT is less concerned with understanding the current language representation and focuses on how to continue generating the rest of the text. Large language models also involve other components and techniques such as positional encoding, multi-layer stacking, pre-training, and fine-tuning. Through pre-training on large-scale textual data, these models learn rich linguistic knowledge and show strong performance in various natural language processing tasks.
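As an illustration of the encoder/decoder distinction described above (our example, assuming the Hugging Face transformers library with PyTorch and the public bert-base-uncased and gpt2 checkpoints), a BERT-style encoder is typically used to embed text, while a GPT-style decoder is used to continue it:

```python
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

text = "Graph neural networks process graph-structured data"

# Encoder-only model (BERT): produces contextual embeddings, usable as node features.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
inputs = bert_tok(text, return_tensors="pt")
embedding = bert(**inputs).last_hidden_state.mean(dim=1)   # (1, 768) sentence vector
print(embedding.shape)

# Decoder-only model (GPT): continues the input text token by token.
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = gpt_tok(text, return_tensors="pt")
generated = gpt.generate(**inputs, max_new_tokens=20)
print(gpt_tok.decode(generated[0], skip_special_tokens=True))
```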
2.4 Proposed Taxonomy
Depending on which component finally solves the graph-related problem, we classify approaches that combine LLMs and GNNs into three categories. (1) GNN as the final component of task solving: the LLM functions as a text encoder, processing the input textual information to aid the GNN in task solving. (2) LLM as the final component of task solving: here there are two scenarios. One uses GNNs to encode graph structure, thereby assisting the LLM in capturing structural information; the other transforms the graph structure into a sequence the LLM can understand, or adapts the Transformer architecture to handle textual and graph-structural information jointly, thus eliminating the need for GNNs. (3) Collaborative solving by LLM and GNN: the LLM and GNN work together, either by co-training and sharing features or by aligning the two models in a latent space. In the next section, we investigate and summarise each of these three categories in turn. A classification diagram with representative examples is shown in Fig. 1.

Figure 1: Classification and representative examples of models for solving graph-related tasks with the help of large language models (LLMs)

3. Methods

3.1 GNN as the final component of task solving
GNNs excel at processing graph-structured data but often struggle with graphs containing textual elements, whereas LLMs demonstrate superior capabilities in understanding text. In this category, the LLM serves as an auxiliary tool for text feature extraction, providing initial node feature vectors; these vectors are then refined by the GNN, which generates node and edge representations and predictions (a schematic sketch of this pipeline is given at the end of this subsection). The following paragraphs discuss representative models.

LM-GNN [3] exemplifies the joint training of large language models and graph neural networks. A graph-aware Transformer functions as a semantic encoder and is later fine-tuned in conjunction with a GNN encoder for link prediction in heterogeneous graphs. Meng et al. [4] introduced GNN-LM, a language-modelling approach that enhances traditional neural language models by referencing similar contexts across the entire training corpus, using a high-dimensional token representation to retrieve the k-nearest neighbours of the input context as references. For each input context, a directed graph is constructed, with nodes representing tokens from the input context or the retrieved neighbouring contexts and edges signifying connections between these tokens; a GNN then aggregates information from these contexts to decode the next token. This facilitates the retrieval of pertinent contexts as references, improving the model's ability to predict forthcoming words in language-modelling tasks. The LLM-to-LM [5] framework first wraps the textual attributes associated with each node in a custom prompt, then queries the large language model to generate a list of predictions and explanations. The raw text, predictions, and explanations are used to fine-tune a language model and are turned into vector node features. Finally, these node features are fed to a downstream graph neural network to predict the classes of unknown nodes. TextGNN [6] integrates a text encoder with a graph neural network and shows robust performance in tasks such as advertisement relevance. The model capitalises on the text encoder's natural language understanding and enhances it by incorporating information from graph data, outperforming approaches that rely solely on semantic information.
TextGNN employs an end-to-end framework that combines text encoders with graph neural networks for training and optimisation. Within this framework, the text encoder processes the textual input and captures its semantic essence, while the graph neural network handles the graph data, extracting graph relationships and contextual information. Xie et al. [7] proposed a framework for graph corpora, LM+GNN, in which one or more LMs are responsible for encoding textual information while GNN aggregators perform information aggregation. Given a graph corpus as input, LM+GNN uses one or more LMs as text encoders for the nodes; the embeddings generated by these LMs are attached to the topology of the graph and fused with the other information in the graph, and the output is supervised by a task-specific decoder.

In comparison to GNNs, which often depend on high-quality labels, LLMs possess an extensive knowledge base and exhibit remarkable zero-shot and few-shot learning capabilities. This is particularly evident in node classification on graphs with textual attributes: integrating an LLM enables the model to handle node classification even with limited samples. Traditional GNNs typically require a substantial number of labelled samples to perform well, which is challenging when training data is scarce. LLMs, with their extensive pre-training and the rich linguistic knowledge acquired from large-scale textual data, can mitigate this issue: combined with a GNN, the LLM comprehends the semantic and contextual nuances of the text, facilitating the generation of high-quality node representations. LLM-GNN [8] is a label-free node classification method. It combines the advantages of graph neural networks and large language models by using the LLM to annotate a small number of nodes and then training the GNN on these annotations to predict the majority of unlabelled nodes. Yu et al. [9] proposed a method that uses large language models to enhance class-level information and improve the quality of node representations, addressing node classification with few samples. Semantic information is extracted from the labels using the LLM and labelled samples are generated; the structural information of the original dataset is then captured with an edge predictor and the newly generated samples are integrated into the original graph. Finally, the entire dataset is used to train a graph neural network, which produces the node classification results. OFA [24] describes different graph data through natural language, introduces the concept of nodes of interest to standardise different tasks into a single task formulation, and converts all LLM-embedded inputs into prompt graphs containing both graph and task information through a graph prompting paradigm, enabling adaptive downstream prediction. Experimentally, OFA was found to be capable of few-shot and zero-shot learning across different graph domains.
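To summarise the division of labour in this category, the sketch below (our illustration, not any specific cited system) shows the typical pipeline: node texts are embedded by a language model and the resulting vectors become the initial node features of a GNN. The encode_with_lm stub stands in for any pretrained LM encoder (e.g., a BERT-style model); it and the toy graph are assumptions made for the example.

```python
import numpy as np

def encode_with_lm(texts, dim=32):
    """Stand-in for a pretrained LM text encoder (e.g., BERT sentence embeddings).
    Here it only produces deterministic pseudo-embeddings from the text."""
    vecs = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % (2**32))
        vecs.append(rng.normal(size=dim))
    return np.stack(vecs)

def mean_aggregate(X, neighbors):
    """One round of GNN-style neighbourhood mean aggregation over the LM features."""
    out = np.zeros_like(X)
    for i, nbrs in neighbors.items():
        out[i] = X[nbrs].mean(axis=0) if nbrs else X[i]
    return out

# Text-attributed graph: node texts plus adjacency (toy example).
node_texts = [
    "Paper on graph neural networks",
    "Paper on language model pre-training",
    "Paper on node classification",
]
neighbors = {0: [1, 2], 1: [0], 2: [0]}

X = encode_with_lm(node_texts)      # step 1: the LM provides initial node features
H = mean_aggregate(X, neighbors)    # step 2: the GNN refines them with graph structure
# H would then feed a task head (e.g., node classification), trained jointly or separately.
print(H.shape)                      # (3, 32)
```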
3.2 LLM as the final component of task solving
The core concept of this category is to employ large language models as the primary framework for acquiring both graph structure and textual information. Given that graphs vary in structure and admit different forms of definition, transforming graph data directly into text is not straightforward, which poses a significant challenge for applying LLMs to graph-related tasks. This category can be further subdivided according to whether graph neural networks are involved in the task-solving process, so this section is divided into two subcategories: GNN-free methods and GNN-based methods.

3.2.1 GNN-free methods
This type of approach uses the LLM directly to obtain textual information and graph structure, without GNN involvement. Traditional LLMs use Transformers for natural language encoding but have limited ability to model graph structure. These approaches therefore obtain node and edge representations either by converting the graph structure into textual information or by designing the LLM with an architecture capable of processing textual information and encoding the graph. GPT4Graph [12] converts graph data into a graph description language. GraphText [13] encodes graph information into text sequences. InstructGLM [26] uses natural language to describe the geometric structure and node features of graphs and enables the LLM to solve graph-related problems through instruction tuning. These methods allow the LLM to process graph data directly by converting it into textual descriptions, but the LLM must recover the implicit graph structure from sequential text; compared with traditional graph learning methods, learning from sequential graph descriptions may be inefficient and remains insufficient for representing multi-dimensional, correlated graph data. GraphLLM [25] is an end-to-end approach that integrates a graph learning model (a graph Transformer) with an LLM in a single system; its framework consists of three main steps: node understanding, structural understanding, and LLM-oriented prefix tuning for graph enhancement. Compared with methods that convert graph data to text, GraphLLM improves graph reasoning by exploiting the synergy with the graph Transformer and leveraging the strengths of both.
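As a concrete illustration of graph-to-text conversion (our own simplified format, not the exact serialisation used by GPT4Graph, GraphText, or InstructGLM), the snippet below verbalises a small text-attributed graph into a prompt that a generic LLM could consume for a node classification query:

```python
def graph_to_prompt(node_texts, edges, target_node):
    """Serialise a small text-attributed graph into a natural-language prompt.

    node_texts:  dict node_id -> text attribute
    edges:       list of (src, dst) pairs
    target_node: node whose label the LLM is asked to predict
    """
    lines = ["You are given a graph with the following nodes:"]
    for nid, text in node_texts.items():
        lines.append(f"- Node {nid}: {text}")
    lines.append("The edges of the graph are:")
    for src, dst in edges:
        lines.append(f"- Node {src} is connected to node {dst}.")
    lines.append(
        f"Question: based on its text and its neighbours, "
        f"what is the category of node {target_node}?"
    )
    return "\n".join(lines)

node_texts = {
    0: "A study of convolutional networks on citation graphs.",
    1: "Pre-training Transformers on large text corpora.",
    2: "Semi-supervised classification of academic papers.",
}
edges = [(0, 2), (1, 2)]

prompt = graph_to_prompt(node_texts, edges, target_node=2)
print(prompt)   # this string would be sent to the LLM (optionally after instruction tuning)
```

The limitation noted above is visible even in this toy example: the LLM must reconstruct the neighbourhood structure from a flat list of sentences, which becomes increasingly lossy and inefficient as graphs grow.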
3.2.2 GNN-based methods
The ability of GNNs to capture hidden representations of the structural information between nodes gives them powerful representation learning capabilities on structured data. A number of approaches therefore use GNNs to enhance the performance of LLMs when processing graph data. DGTL [10] uses a large language model to provide predictions and interpretability for tasks on text-attributed graphs. The framework first generates text embeddings by averaging the last-layer features of the upstream LLM, capturing the contextual and semantic information of the text associated with each node; disentangled graph learning is then used to learn embeddings carrying different domain information, and finally the learnt domain-aware features are injected into the downstream LLM. In the Graph-ToolFormer [27] framework, the LLM and the GNN are pre-trained separately, and the LLM then calls the pre-trained GNN models to complete the task; that is, the LLM acts as a unified common interface for graph reasoning tasks. GraphGPT [11] enhances the understanding of and adaptation to graph structures by aligning them with the natural language space and through graph instruction tuning. In addition, to improve the stepwise reasoning ability of large language models, GraphGPT integrates chain-of-thought distillation into the framework, giving the whole model stronger capabilities in stepwise reasoning and in handling distribution shift. ReLM [29] uses an LM and a GNN for chemical reaction prediction: pre-trained GNNs generate candidate answers and in-context examples from a candidate pool, which are then analysed by the LM in a multiple-choice format. Zou et al. [28] proposed a topology-aware pre-training framework that jointly optimises an LM and a graph neural network to predict the nodes involved in a context graph. In addition, because some nodes are rich in textual information while others are not, an enhancement strategy is designed to enrich text-scarce nodes with text from their neighbouring nodes. After pre-training, only the LM is applied to downstream tasks and the auxiliary GNN is discarded. PATTON [31] uses textual information and network structure to strengthen the LM's ability to comprehend tokens and documents. The framework adopts GraphFormers, the GNN-nested Transformer architecture proposed by Yang et al. [32], and employs two pre-training strategies to help the LM capture the intrinsic dependencies between textual attributes and network structure.

3.3 Collaborative solving of LLM and GNN
These studies combine a GNN for encoding graph structure with an LLM for encoding textual information, co-training them for mutual enhancement. The GNN component provides structural information to the framework and to the LLM, while the LLM contributes text analysis capabilities and provides textual signals to the GNN. Depending on how the two are combined and learn from each other, we divide this category into two types: LLM and GNN predictive alignment, and LLM and GNN alignment in a latent space.

3.3.1 LLM and GNN predictive alignment
GLEM [14] builds on the variational EM framework: the large language model uses the textual information of each node to predict its label, modelling the label distribution based on local textual attributes, while the graph neural network uses the text and label information of surrounding nodes to make label predictions, representing the label distribution under global conditions. GLEM makes the language model and the graph neural network collaborate by alternately optimising an E-step and an M-step. Specifically, in the E-step, the graph neural network is fixed and the language model is trained to mimic the GNN's label inference, transferring the global knowledge learnt by the GNN to the LM; in the M-step, the LM is fixed and the node representations it has learnt are used as features to optimise the GNN for label prediction. Alternating these two steps enables the GNN to effectively capture the global correlations between nodes and thus achieve accurate label prediction.
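The alternating scheme can be summarised by the following training-loop skeleton (a schematic sketch under our own simplifications, not GLEM's actual implementation; the train_lm_on / train_gnn_on helpers and the pseudo-label exchange are illustrative assumptions):

```python
def pseudo_labels_from(model, nodes):
    """Stand-in for running a trained model to produce pseudo-labels."""
    return {v: model["predict"](v) for v in nodes}

def train_lm_on(node_texts, labels):
    """Stand-in for fine-tuning the LM on (text, label) pairs."""
    return {"predict": lambda v: labels.get(v, 0)}

def train_gnn_on(graph, lm_features, labels):
    """Stand-in for training the GNN on LM-derived features plus graph structure."""
    return {"predict": lambda v: labels.get(v, 0)}

def alternating_em(graph, node_texts, gold_labels, num_rounds=3):
    nodes = list(node_texts)
    labels = dict(gold_labels)              # start from the few observed labels
    lm = train_lm_on(node_texts, labels)
    for _ in range(num_rounds):
        # M-step analogue: fix the LM, train the GNN on LM features and current labels.
        gnn = train_gnn_on(graph, lm_features=lm, labels=labels)
        labels.update(pseudo_labels_from(gnn, nodes))   # GNN's globally informed predictions
        # E-step analogue: fix the GNN, train the LM to mimic the GNN's predictions.
        lm = train_lm_on(node_texts, labels)
        labels.update(pseudo_labels_from(lm, nodes))
    return lm, gnn

# Toy usage: three nodes, one observed label.
graph = {0: [1, 2], 1: [0], 2: [0]}
node_texts = {0: "gnn paper", 1: "llm paper", 2: "survey"}
lm, gnn = alternating_em(graph, node_texts, gold_labels={0: 1})
print(lm["predict"](2), gnn["predict"](2))
```

In GLEM itself the exchanged quantities are label distributions derived from the variational EM objective; the skeleton conveys only the alternating control flow.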
Zhang et al. [15] propose a co-training approach that enables classification and pseudo-labelling on text-attributed graphs by combining a text analysis module and a network learning module. The framework models both the original text and the network structure, and enhances the two modules through co-training and feature sharing.

3.3.2 LLM and GNN aligned in latent space
ConGraT [16] jointly learns graph-node and text representations using two independent encoders that are aligned in a common latent space during training. The approach is inspired by previous work on joint text and image encoding and extends the training objective to take node similarity and plausibly inferred information into account. G2P2 [17] uses a Transformer-based text encoder and a GNN-based graph encoder to improve text classification performance: the Transformer serves as the text encoder, while the GNN serves as the graph encoder, taking the graph and node features as input and generating an embedding vector for each node. By combining the encoding capabilities of the Transformer and the GNN, the framework provides more comprehensive node representations; these node embeddings contain both textual information and graph-structure information, better capturing the semantic and associative relationships between nodes. Grenade [18] optimises self-supervised learning on graphs to capture both textual semantics and structural context. It exploits the synergy between pre-trained language models and graph neural networks by jointly optimising two self-supervised algorithms: graph-centric contrastive learning and graph-centric knowledge alignment. GraD [30] encodes graph structure into an LM for fast, graph-free inference; by jointly training a teacher GNN and a graph-free student through a shared LM, the two models learn from each other and overall performance improves.
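To illustrate alignment in a shared latent space (a minimal sketch in the spirit of the CLIP-style objective that ConGraT adapts, not the exact loss or encoders of any cited system; the projection dimensions and temperature are assumptions), the text embedding and node embedding of the same node can be pulled together with a symmetric contrastive loss:

```python
import numpy as np

def l2_normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def contrastive_alignment_loss(text_emb, node_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: the i-th text embedding should match
    the i-th node embedding more closely than any other node's embedding."""
    T = l2_normalize(text_emb)          # (n, d) from the LM encoder (after projection)
    G = l2_normalize(node_emb)          # (n, d) from the GNN encoder (after projection)
    logits = T @ G.T / temperature      # (n, n) pairwise similarities
    n = logits.shape[0]
    # Cross-entropy with the diagonal as the positive pair, in both directions.
    log_probs_t2g = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_probs_g2t = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -(np.trace(log_probs_t2g) + np.trace(log_probs_g2t)) / (2 * n)

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 16))                      # stand-in for LM outputs
node_emb = text_emb + 0.1 * rng.normal(size=(4, 16))     # stand-in for GNN outputs
print(contrastive_alignment_loss(text_emb, node_emb))    # lower when the two spaces agree
```

Minimising such a loss trains the two encoders so that a node's structural embedding and its textual embedding land close together, which is the core idea behind latent-space alignment methods in this subsection.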
4. Challenges and future directions
While the preceding sections have outlined the current landscape of using language models in graph learning, there remains significant potential for further research in this domain. In this section, we briefly examine some limitations of language models when applied to graph learning and suggest potential directions for future research.

Lack of effective pre-training algorithms. Most current language models rely on self-supervised pre-training, but this approach is not directly effective in graph learning, and it remains a challenge to pre-train effectively on large-scale graph data. Exploring pre-training methods on graphs is therefore a valuable research direction.

Insufficient ability to represent structural information. Because training is based mainly on textual data, language models struggle to grasp the complexity of graph structure, and generating topology-based supervision signals with language models is a challenging problem. One way to address this is to design specific pre-training objectives that guide the language model to learn representations of topological structure.

5. Conclusion
The application of Large Language Models (LLMs) to graph-related tasks has emerged as a vital research area in recent years. To classify and provide a comprehensive overview of this field, we propose a novel classification scheme that categorises techniques involving graphs and textual information into three distinct categories: LLMs as the final component of task solving, GNNs as the final component of task solving, and collaborative solving of LLM and GNN. Based on this categorisation, we systematically review representative studies and discuss limitations and future research directions. We hope that this review reveals the potential of LLMs in the field of graph learning, as well as the advances made and the challenges remaining, and provides insights for further developments in the field.

References
[1] J. Liu, C. Yang, Z. Lu, J. Chen, Y. Li, M. Zhang, et al., "Towards graph foundation models: A survey and beyond," arXiv preprint arXiv:2310.11829, 2023.
[2] Y. Li, Z. Li, P. Wang, J. Li, X. Sun, H. Cheng, et al., "A survey of graph meets large language model: Progress and future directions," arXiv preprint arXiv:2311.12399, 2023.
[3] V. N. Ioannidis, X. Song, D. Zheng, H. Zhang, J. Ma, Y. Xu, et al., "Efficient and effective training of language and graph neural network models," arXiv preprint arXiv:2206.10781, 2022.
[4] Y. Meng, S. Zong, X. Li, X. Sun, T. Zhang, F. Wu, et al., "GNN-LM: Language modeling based on global contexts via GNN," arXiv preprint arXiv:2110.08743, 2021.
[5] X. He, X. Bresson, T. Laurent, et al., "Harnessing explanations: LLM-to-LM interpreter for enhanced text-attributed graph representation learning," arXiv preprint arXiv:2305.19523, 2023.
[6] J. Y. Zhu, Y. Cui, Y. Liu, H. Sun, X. Li, M. Pelger, et al., "TextGNN: Improving text encoder via graph neural network in sponsored search," Proceedings of the Web Conference 2021, pp. 2848-2857, 2021.
[7] H. Xie, D. Zheng, J. Ma, H. Zhang, V. N. Ioannidis, X. Song, et al., "Graph-aware language model pre-training on a large graph corpus can help multiple graph applications," Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
[8] Z. Chen, H. Mao, H. Wen, H. Han, W. Jin, H. Zhang, et al., "Label-free node classification on graphs with large language models (LLMs)," arXiv preprint arXiv:2310.04668, 2023.
[9] J. Yu, Y. Ren, C. Gong, J. Tan, X. Li, X. Zhang, "Empower text-attributed graphs learning with large language models," arXiv preprint arXiv:2310.09872, 2023.
[10] Y. Qin, X. Wang, Z. Zhang, W. Zhu, "Disentangled representation learning with large language models for text-attributed graphs," arXiv preprint arXiv:2310.18152, 2023.
[11] J. Tang, Y. Yang, W. Wei, L. Shi, L. Su, S. Cheng, et al., "GraphGPT: Graph instruction tuning for large language models," arXiv preprint arXiv:2310.13023, 2023.
[12] J. Guo, L. Du, H. Liu, "GPT4Graph: Can large language models understand graph structured data? An empirical evaluation and benchmarking," arXiv preprint arXiv:2305.15066, 2023.
[13] J. Zhao, L. Zhuo, Y. Shen, M. Qu, K. Liu, M. Bronstein, et al., "GraphText: Graph reasoning in text space," arXiv preprint arXiv:2310.01089, 2023.
[14] J. Zhao, M. Qu, C. Li, H. Yan, Q. Liu, R. Li, et al., "Learning on large-scale text-attributed graphs via variational inference," arXiv preprint arXiv:2210.14709, 2022.
[15] X. Zhang, C. Zhang, X. L. Dong, J. Shang, J. Han, "Minimally-supervised structure-rich text categorization via learning on text-rich networks," Proceedings of the Web Conference 2021, pp. 3258-3268, 2021.
[16] W. Brannon, S. Fulay, H. Jiang, W. Kang, B. Roy, J. Kabbara, et al., "ConGraT: Self-supervised contrastive pretraining for joint graph and text embeddings," arXiv preprint arXiv:2305.14321, 2023.
[17] Z. Wen, Y. Fang, "Augmenting low-resource text classification with graph-grounded pre-training and prompting," arXiv preprint arXiv:2305.03324, 2023.
[18] Y. Li, K. Ding, K. Lee, "GRENADE: Graph-centric language model for self-supervised representation learning on text-attributed graphs," arXiv preprint arXiv:2310.15109, 2023.
[19] J. Devlin, M. W. Chang, K. Lee, et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[20] S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, et al., "Sparks of artificial general intelligence: Early experiments with GPT-4," arXiv preprint arXiv:2303.12712, 2023.
[21] W. Hamilton, Z. Ying, J. Leskovec, "Inductive representation learning on large graphs," Advances in Neural Information Processing Systems, vol. 30, 2017.
[22] T. N. Kipf, M. Welling, "Semi-supervised classification with graph convolutional networks," arXiv preprint arXiv:1609.02907, 2016.
[23] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[24] H. Liu, J. Feng, L. Kong, N. Liang, D. Tao, Y. Chen, et al., "One for All: Towards training one graph model for all classification tasks," arXiv preprint arXiv:2310.00149, 2023.
[25] Z. Chai, T. Zhang, L. Wu, K. Han, X. Hu, X. Huang, et al., "GraphLLM: Boosting graph reasoning ability of large language model," arXiv preprint arXiv:2310.05845, 2023.
[26] R. Ye, C. Zhang, R. Wang, S. Xu, Y. Zhang, "Natural language is all a graph needs," arXiv preprint arXiv:2308.07134, 2023.
[27] J. Zhang, "Graph-ToolFormer: To empower LLMs with graph reasoning ability via prompt augmented by ChatGPT," arXiv preprint arXiv:2304.11116, 2023.
[28] T. Zou, L. Yu, Y. Huang, L. Sun, B. Du, "Pretraining language models with text-attributed heterogeneous graphs," arXiv preprint arXiv:2310.12580, 2023.
[29] Y. Shi, A. Zhang, E. Zhang, Z. Liu, X. Wang, "ReLM: Leveraging language models for enhanced chemical reaction prediction," arXiv preprint arXiv:2310.13590, 2023.
[30] C. Mavromatis, V. N. Ioannidis, S. Wang, D. Zheng, S. Adeshina, J. Ma, et al., "Train your own GNN teacher: Graph-aware distillation on textual graphs," arXiv preprint arXiv:2304.10668, 2023.
[31] B. Jin, W. Zhang, Y. Zhang, Y. Meng, X. Zhang, Q. Zhu, et al., "Patton: Language model pretraining on text-rich networks," arXiv preprint arXiv:2305.12268, 2023.
[32] J. Yang, Z. Liu, S. Xiao, C. Li, D. Lian, S. Agrawal, et al., "GraphFormers: GNN-nested transformers for representation learning on textual graph," 35th Conference on Neural Information Processing Systems, vol. 34, pp. 28798-28810, 2021.