<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">LLM-Driven Knowledge Enhancement for Securities Index Prediction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zaiyuan</forename><surname>Di</surname></persName>
							<email>dizaiyuan@tongji.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="institution">Tongji University</orgName>
								<address>
									<addrLine>No. 4800, Cao&apos;an highway, JiaDing District</addrLine>
									<postCode>201804</postCode>
									<settlement>Shanghai city, Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jianting</forename><surname>Chen</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tongji University</orgName>
								<address>
									<addrLine>No. 4800, Cao&apos;an highway, JiaDing District</addrLine>
									<postCode>201804</postCode>
									<settlement>Shanghai city, Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yunxiao</forename><surname>Yang</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tongji University</orgName>
								<address>
									<addrLine>No. 4800, Cao&apos;an highway, JiaDing District</addrLine>
									<postCode>201804</postCode>
									<settlement>Shanghai city, Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ling</forename><surname>Ding</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Tongji University</orgName>
								<address>
									<addrLine>No. 4800, Cao&apos;an highway, JiaDing District</addrLine>
									<postCode>201804</postCode>
									<settlement>Shanghai city, Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yang</forename><surname>Xiang</surname></persName>
							<email>tjdxxiangyang@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Tongji University</orgName>
								<address>
									<addrLine>No. 4800, Cao&apos;an highway, JiaDing District</addrLine>
									<postCode>201804</postCode>
									<settlement>Shanghai city, Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">LLM-Driven Knowledge Enhancement for Securities Index Prediction</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">AA65750E114847039A305CB587A30D40</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Stock Market Prediction</term>
					<term>Large Language Models</term>
					<term>Knowledge Enhancement</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The securities market carries complex financial interactions, which pose challenges to its prediction. To represent this complexity, researchers have utilized multi-source data, such as financial news and macro market indicators, for better performance. However, these efforts often ignore the internal knowledge among these data or suffer from the high cost of acquiring diverse knowledge. Thus, we propose an LLM-driven knowledge enhancement method for securities index prediction. Specifically, we collect the daily data of Shanghai Stock Exchange indexes and their related market indicators and model the internal knowledge among them as triplets. Then we leverage an LLM as a knowledge base to acquire diverse knowledge efficiently. Finally, we integrate the knowledge and numeric multi-source data into a heterogeneous graph and apply a GNN model to predict the trend of securities indexes. Experiments demonstrate the effectiveness of our method in prediction and a real-world backtest.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Securities prediction has always been a challenging but engaging task. Many studies <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref> have already proven the feasibility of predicting securities based on historical price data. These efforts have further made securities prediction a promising profit opportunity for investors and a classic time series prediction task with valuable research significance.</p><p>The securities market is a complex environment with various entities and events, and their interactions often take time to be fully reflected in price data. For example, management change information in an announcement will affect the company's stock price, and the crude oil price will affect the stock prices of airline companies. These facts inspire researchers to utilize external information for more accurate price prediction. Thus, multi-source data, such as sentiment information <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>, semantic information from news and posts <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>, and information on companies related to the predicted stock <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>, has been widely explored.</p><p>Substantial efforts have been made to leverage this multi-source data. Nevertheless, many treat these data in isolation, ignoring the role that inter-data relations play in improving prediction performance <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, for example by representing all information as concatenated vectors <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>. 
Compared with these methods, some work models the internal relations of the data and incorporates them into prediction using graph-based models, achieving good performance. They usually represent the relations as knowledge graphs <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref> or correlation matrices <ref type="bibr" target="#b17">[18]</ref>. However, these methods still face the following challenges: 1) Since knowledge acquisition from massive raw data is expensive, it is not always applicable <ref type="bibr" target="#b11">[12]</ref>. 2) Meanwhile, many rely on existing knowledge sources or correlation calculations, both of which suffer from a limited variety of relation and entity types <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b17">18]</ref>.</p><p>To overcome these challenges, we propose an economical and effective LLM-driven knowledge enhancement method for securities index prediction. First, we collect the daily data of Shanghai Stock Exchange indexes (SSE indexes) and their related market indicators, such as the Northbound Capital and the Shenzhen Composite Index, as multi-source data to overcome the hysteresis of price data. Second, we establish the relationships between them by leveraging a large language model (LLM) as a knowledge source, thus obtaining diverse knowledge inexpensively. Third, we integrate the collected numeric data and knowledge into a heterogeneous graph. Finally, we apply a heterogeneous graph-based neural network to learn the representation of SSE indexes and predict their price movement. The experimental results demonstrate the effectiveness of our method in prediction and a real-world backtest. 
In conclusion, our contributions are as follows:</p><p>1) We propose an LLM-driven knowledge enhancement method for securities index prediction, leveraging an LLM as an automated knowledge base to acquire diverse knowledge efficiently.</p><p>2) We implement a heterogeneous graph-based neural network to incorporate the numeric multi-source data and their knowledge, resulting in a clear improvement in the downstream prediction task.</p><p>3) We validate the effectiveness of our method in securities index trend prediction, covering 175 securities indexes and 9 years of data. Additionally, we verify the method's real-world profit capabilities through backtesting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Deep learning in stock market prediction</head><p>Deep learning has been a promising approach to stock market prediction <ref type="bibr" target="#b19">[19]</ref>. Recurrent models, such as RNN <ref type="bibr" target="#b20">[20]</ref> and LSTM <ref type="bibr" target="#b21">[21]</ref>, are particularly prominent due to their ability to capture temporal information <ref type="bibr" target="#b19">[19,</ref><ref type="bibr" target="#b22">22]</ref>. In previous work, most methods are enhanced by incorporating multiple numerical and textual data sources, e.g., technical indicators <ref type="bibr" target="#b23">[23]</ref>, social text <ref type="bibr" target="#b24">[24]</ref>, and financial news <ref type="bibr" target="#b25">[25]</ref>. Nonetheless, these data often reflect the post-event status of the predicted object in isolation. Since events propagate between different entities in sequence, a lead-lag effect arises in these data <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b26">26]</ref>.</p><p>Instead of concatenating the multi-source data as input features, some work also leverages the internal relationships of the data. For instance, Cheng et al. <ref type="bibr" target="#b11">[12]</ref> integrate multi-modal information of multiple companies through a company knowledge graph. Matsunaga et al. <ref type="bibr" target="#b12">[13]</ref> fuse price data of companies through their commercial relationships. In these works, complex relationships are captured through knowledge, demonstrating the role of knowledge in enhancing securities prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Knowledge enhancement</head><p>Knowledge enhancement has become increasingly crucial in stock market prediction <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b27">27,</ref><ref type="bibr" target="#b28">28,</ref><ref type="bibr" target="#b29">29]</ref>. Generally, the challenge of knowledge enhancement lies in the acquisition and incorporation of knowledge. For acquisition, many efforts have been made to obtain knowledge from unstructured raw data <ref type="bibr" target="#b30">[30,</ref><ref type="bibr" target="#b31">31]</ref> and existing knowledge sources <ref type="bibr" target="#b32">[32,</ref><ref type="bibr" target="#b33">33,</ref><ref type="bibr" target="#b34">34,</ref><ref type="bibr" target="#b35">35]</ref> such as open-source knowledge graphs. For incorporation, it is common practice to concatenate unstructured knowledge (e.g., semantic information) and historical price data into vectors <ref type="bibr" target="#b36">[36,</ref><ref type="bibr" target="#b37">37]</ref>, making recurrent models widely used. Besides, graph-based models are an intuitive way to incorporate structural knowledge, such as triplets, into predictions. Among these methods, both homogeneous <ref type="bibr" target="#b38">[38]</ref> and heterogeneous <ref type="bibr" target="#b39">[39,</ref><ref type="bibr" target="#b40">40]</ref> graph-based neural networks are widely used.</p><p>However, existing methods are still confronted with high acquisition costs and limited knowledge diversity. Recently, large language models have demonstrated their potential for knowledge acquisition, engaging many researchers in information extraction <ref type="bibr" target="#b41">[41,</ref><ref type="bibr" target="#b42">42,</ref><ref type="bibr" target="#b43">43]</ref>. 
In this paper, we go further by reducing the cost of knowledge acquisition with a large language model, and by integrating the semantic and structural information of knowledge and numerical data through a graph-based model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Problem description</head><p>Our objective is to predict the daily trend of SSE indexes, framing it as a binary classification task. Assuming today is day 𝑡, we use the average closing price over the past 𝛼 days as the baseline. If the closing price on day 𝛽 in the future exceeds the baseline, it is considered a rising trend and labeled as a positive sample. The formula for setting the labels is as follows:</p><formula xml:id="formula_0">𝑙 𝑡 = {︃ 1, 𝑐 𝑡+𝛽 &gt; 𝜆 𝛼 ∑︀ 𝛼−1 𝑖=0 𝑐 𝑡−𝑖 0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒<label>(1)</label></formula><p>where 𝑐 𝑡 represents the closing price on day 𝑡, and 𝜆 ≥ 1 is a constant that determines the threshold for classifying the trend. Using the average as a baseline avoids the impact of short-term market fluctuations. 𝛽 represents the prediction horizon.</p><p>The data used for trend prediction comes from multiple sources. In addition to the SSE index trading data 𝐷 𝑆𝑆𝐸 , it also includes other data sources 𝐷 1 , 𝐷 2 , • • • , 𝐷 𝑟 . These data sources have varying dimensions and granularity. In subsequent sections, we will introduce how to organize these data sources into input data for the prediction task. </p></div>
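As a concrete illustration, the labeling rule of Eq. (1) can be sketched in Python. This is a minimal sketch under our own interface assumptions: the function name `label_trend` and the plain-list price series are illustrative, not from the paper.

```python
def label_trend(closes, t, alpha=5, beta=2, lam=1.01):
    """Eq. (1): label day t as a rising trend (1) if the close on day
    t+beta exceeds lam times the average close over the past alpha days
    (days t-alpha+1 .. t), otherwise 0."""
    baseline = sum(closes[t - alpha + 1 : t + 1]) / alpha
    return 1 if closes[t + beta] > lam * baseline else 0
```

With 𝛼 = 5, 𝛽 = 2, and 𝜆 = 1.01 (the values used in Section 4.1), the close two days ahead must exceed the 5-day average by 1% for the sample to be labeled positive.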
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">LLM-driven knowledge enhancement</head><p>Generally, the data used for SSE index prediction comes from two sources: SSE index trading data and other market indicators, as shown in Figure <ref type="figure" target="#fig_0">1</ref>. Trading data has a direct relationship with the SSE indexes, but the available historical data often exhibits lagging characteristics. On the other hand, market indicators refer to other economic market indicators related to the SSE indexes. These indicators can reflect more environmental factors and provide guidance for predicting the trends of SSE indexes. Traditional methods typically fuse these two types of data by simply appending market indicators to the daily trading data, thereby expanding the input feature dimensions. However, this overlooks the semantic information and connections behind the market indicators, which are crucial for improving prediction performance.</p><p>We propose a knowledge enhancement method based on Large Language Models (LLMs), leveraging LLMs to automatically embed knowledge into market indicator data. We argue that the knowledge of market indicators is reflected in the relationships between them. For example, there is a connection between "Southbound Capital (SC)" and the "Hang Seng Index (HSI)". SC reflects the capital flowing from mainland China into Hong Kong, while HSI is an index of the Hong Kong stock market influenced by SC. To characterize the relationships between market indicators, we adopt an approach that establishes relational paths formed by multiple triplet links. 
In the example above, we take SC and HSI as nodes and form a triplet link based on their relationship: [(SC, index status, Capital from Mainland), (Capital from Mainland, participate in, Hong Kong market), (HSI, index status, Hong Kong market)].</p><p>To extract relationships among numerous market indicators, we leverage an LLM to automatically uncover the connections between market indicators. The specific process is illustrated in the following steps: Step 1: Instruction Construction. We construct instructions with known nodes for input into the LLM. These instructions involve the task description, ontology constraints, output format, examples, and other relevant information. The instructions must enable the LLM to fully understand our task of extracting relationships between market indicators.</p><p>Step 2: LLM Interaction. We input the constructed instructions into the LLM and obtain the feedback results. We then validate these results and extract triplets from the successful outcomes.</p><p>Step 3: Node Update. After saving the newly generated triplets, we update the known node pool with the newly generated nodes. We repeat Step 1 until the maximum number of iterations is reached.</p><p>Step 4: Path Identification. Once the maximum number of iterations is reached, we identify paths between any two market indicator nodes as start and end points using the triplet set. Irrelevant triplets that are not on the identified paths are removed.</p><p>The iterative process generates intermediate nodes for market indicators, fully utilizing the knowledge base and reasoning capabilities of the LLM to uncover complex multi-hop and cross paths. After this process, multiple market indicators, including the SSE indexes, form a connected heterogeneous graph 𝐺 = (𝑉, 𝐸). The node attribute values reflect the specific state of the nodes at the target time. 
Differences in these states directly influence the prediction results.</p><p>The intermediate nodes 𝑉 𝑝 do not have specific numerical values but possess clear semantics. Therefore, we number all intermediate nodes and use one-hot encoding as their attributes, i.e., 𝒜(𝑛 𝑖 , 𝑡) = 𝑜𝑛𝑒ℎ𝑜𝑡(𝑛 𝑖 , |𝑉 𝑝 |). These intermediate node encodings reflect the path semantics, guiding the information aggregation between market indicators. The attributes of all nodes form the attribute set 𝐴 𝑡 . This graph, enriched with knowledge for the prediction 𝑙 𝑡 , is then input into the model.</p></div>
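The four-step interaction loop above can be sketched as follows. Here `query_llm` is a hypothetical stand-in for the instruction construction and LLM call of Steps 1-2, and the path identification of Step 4 is simplified to a connectivity check; both simplifications are our assumptions, not the paper's implementation.

```python
from collections import defaultdict, deque

def expand_knowledge(seed_nodes, query_llm, max_iters=3):
    """Steps 1-3 (sketch): iteratively query the LLM with the known node
    pool, validate returned triplets, and add new nodes to the pool."""
    nodes, triplets = set(seed_nodes), set()
    for _ in range(max_iters):
        # Steps 1-2: build an instruction from known nodes, query the LLM,
        # and keep only well-formed (head, relation, tail) triplets.
        candidates = query_llm(sorted(nodes))
        valid = {c for c in candidates if isinstance(c, tuple) and len(c) == 3}
        triplets |= valid
        # Step 3: update the known node pool with newly generated nodes.
        for h, _, t in valid:
            nodes.add(h)
            nodes.add(t)
    return nodes, triplets

def prune_to_paths(triplets, indicators):
    """Step 4 (simplified stand-in): keep only triplets connected, via the
    undirected triplet graph, to at least one market indicator node."""
    adj = defaultdict(set)
    for h, _, t in triplets:
        adj[h].add(t)
        adj[t].add(h)
    seen = set(n for n in indicators if n in adj)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return {(h, r, t) for h, r, t in triplets if h in seen and t in seen}
```

For the SC/HSI example of Section 3.2, two iterations with a mocked LLM recover the three-triplet path, and pruning discards any triplet that is not connected to a market indicator.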
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">GNN-based securities index prediction</head><p>Based on the graph data 𝐺 = (𝑉, 𝐸, 𝐴), we then construct a GNN model 𝜑 𝑔𝑛𝑛 (𝐺) to predict the trend of the index. The feedforward computation process of this model, as illustrated in Figure <ref type="figure" target="#fig_3">3</ref>, consists of three main components: feature mapping, feature fusion, and classification output.</p><p>Feature mapping aims to map different types of nodes into a unified vector space. In the input heterogeneous graph, the node set 𝑉 contains 𝑘 types of nodes, and each type of node corresponds to a fixed set of feature attributes. We prepare a feature mapping function for each type of node. Given a node 𝑛 𝑖 of type 𝑗 and its associated feature values a 𝑖 , the computation of feature mapping is as follows: </p><formula xml:id="formula_1">x 𝑖 = 𝑓 𝑗 (a 𝑖 ) = 𝑊 𝑗 a 𝑖 + b 𝑗 ,<label>(2)</label></formula><p>where 𝑊 𝑗 and b 𝑗 represent the weights and bias, respectively. The vector x 𝑖 denotes the feature vector of node 𝑛 𝑖 .</p><p>Feature fusion is the process of using graph neural networks to aggregate information from nodes and edges. The feature mapping step outputs vector representations for all nodes in the graph, denoted as 𝑋 = {x 𝑠 , x 1 , • • • , x 𝑛−1 }. We employ a Heterogeneous Graph Transformer (HGT) model to learn the representations of nodes and the topological structure. The fusion vector representation of node 𝑛 𝑖 is denoted as h 𝑙 𝑖 , where 𝑙 indicates the layer number of HGT, and the initial fusion vector is h 0 𝑖 = x 𝑖 . 
The entire network consists of 𝑚 layers, and the feedforward process for each layer is calculated as:</p><formula xml:id="formula_2">h 𝑙+1 𝑖 = 𝜑 ℎ𝑔𝑡 (h 𝑙 𝑖 , {(h 𝑙 𝑗 , 𝑒 𝑖𝑗 )|𝑛 𝑗 ∈ 𝒩 (𝑛 𝑖 )}),<label>(3)</label></formula><p>where 𝒩 (𝑛 𝑖 ) represents the neighbor nodes of 𝑛 𝑖 , and 𝑒 𝑖𝑗 represents the edge between nodes 𝑛 𝑖 and 𝑛 𝑗 .</p><p>The HGT model, based on the Transformer architecture, can aggregate features from different types of nodes and edges. The vector representation of SSE index nodes thus includes not only the node's original features but also additional data features and the semantic knowledge underlying the data.</p><p>Classification output predicts the trend of the SSE index based on the high-level representation. Given the fusion feature representation h 𝑚 𝑠 of the SSE index node, we use a fully connected neural network and the softmax function to calculate the probability of the index rising or falling. This computation is defined as:</p><formula xml:id="formula_3">𝑝 = Softmax(𝑊 h 𝑚 𝑠 + b).<label>(4)</label></formula><p>The model output 𝑝 is a 2-dimensional vector, with each element corresponding to the probabilities of the index rising and falling, respectively. During model training, we utilize the cross-entropy function as the loss function and update all learning parameters through gradient descent.</p></div>
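A minimal NumPy sketch of the per-type feature mapping (Eq. 2) and the classification output (Eq. 4) may clarify the data flow. The HGT fusion layers of Eq. (3) are omitted here, and the input dimensions are illustrative stand-ins (only the 60-d embedding size comes from Section 4.1).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mapper(in_dim, out_dim):
    """Eq. (2): per-node-type affine map x_i = W_j a_i + b_j."""
    W = rng.standard_normal((out_dim, in_dim)) * 0.1
    b = np.zeros(out_dim)
    return lambda a: W @ a + b

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One mapper per node type j, so heterogeneous attributes of different
# dimensionality land in a shared 60-d space; the raw attribute sizes
# (6 and 30) are illustrative assumptions.
mappers = {"sse_index": make_mapper(6, 60), "intermediate": make_mapper(30, 60)}
x = mappers["sse_index"](rng.standard_normal(6))

# Eq. (4): classification head over the fused SSE-index representation.
# In the full model, x would be replaced by h_s^m after m HGT layers.
W_out, b_out = rng.standard_normal((2, 60)) * 0.1, np.zeros(2)
p = softmax(W_out @ x + b_out)   # probabilities of rise / fall
```

The per-type mapping is what allows nodes with differently sized attribute vectors to coexist in one graph before fusion.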
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiment</head><p>In this section, we conduct model prediction and market backtesting experiments to validate our method. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experiment setting</head><p>Datasets. We collect the price and trading data of 175 SSE indexes from 2013 to 2021, along with 6 technical indicators and 12 market indicators. With the collected data, 364,314 prediction samples are formed. We split the datasets and conduct the experiments by year. In the prediction experiment, we adopt time-series cross-validation, as referenced in <ref type="bibr" target="#b44">[44]</ref>. The samples of each year are split into five parts: Jan to Apr, May to Jun, Jul to Aug, Sep to Oct, and Nov to Dec. For the 𝑖-th fold validation, we take the first 𝑖 parts as the training set and the 𝑖 + 1-th part as the validation set. In the backtesting experiment, we use the samples from Jan to Oct as the training set and those from Nov to Dec as the test set.</p><p>Evaluation Metrics. In the prediction experiment, we apply accuracy (Acc), precision (P), recall (R), and Macro-F1 score (F1) as metrics to evaluate the prediction performance of different models. In the backtesting experiment, we define the daily return (𝑅) as follows:</p><formula xml:id="formula_4">𝑅 𝑡 = 1 |𝐼 𝑡 | ∑︁ 𝑖∈𝐼 𝑡 𝑃 𝑡 𝑖 − 𝑃 𝑡−1 𝑖 𝑃 𝑡−1 𝑖 ,<label>(5)</label></formula><p>where 𝑃 𝑡 𝑖 denotes the price of index 𝑖 at time 𝑡, and 𝐼 𝑡 denotes the set of indexes in the portfolio held by the model at time 𝑡. We then use the average daily return (DR) and the Sharpe ratio (SR) to measure the profitability of different models. In the Sharpe ratio, we use the 1-year China Government Bond Yield as the reference for the risk-free rate. To align with the daily return 𝑅, the annual Bond Yield is divided by 365.</p><p>Implementation Details. For the label setting, we take 𝛼 = 5, 𝛽 = 2, and 𝜆 = 1.01 as specified in Eq. 1. The heterogeneous graph generated by the LLM consists of 30 nodes and 56 edges, with the time interval for node features set to 𝛾 = 5. 
During training, we use the Adam optimizer with a batch size of 512 and a learning rate of 0.001, and we implement early stopping to prevent overfitting. In our GNN model, the embedded vector dimension for feature mapping is set to 60, the hidden vector dimension of the HGT is 90, the number of attention heads is 4, and the number of layers is 𝑚 = 6. For the baseline settings, recurrent models have 4 layers with a hidden vector dimension of 360; graph-based models use an embedded vector dimension of 100, a hidden vector dimension of 120, 3 attention heads, and 4 layers.</p></div>
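The backtesting metrics above can be sketched as follows. The equal-weighted daily return follows Eq. (5); the default `annual_rf` value is an illustrative placeholder, not the actual 1-year China Government Bond Yield, and the annual-to-daily conversion divides by 365 as described in Section 4.1.

```python
import numpy as np

def daily_return(prices_today, prices_yesterday):
    """Eq. (5): equal-weighted daily return over the held index set I_t."""
    pt = np.asarray(prices_today, dtype=float)
    pp = np.asarray(prices_yesterday, dtype=float)
    return float(np.mean((pt - pp) / pp))

def sharpe_ratio(daily_returns, annual_rf=0.02):
    """Sharpe ratio of a daily return series, with the annual risk-free
    rate divided by 365 to align with daily returns.
    annual_rf=0.02 is a placeholder value for illustration."""
    r = np.asarray(daily_returns, dtype=float)
    rf = annual_rf / 365.0
    return float((r.mean() - rf) / r.std())
```

DR is then simply the mean of the `daily_return` series over the backtest window, and SR its risk-adjusted counterpart.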
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Prediction experiment</head><p>We compare our model, Knowledge-enhanced HGT (KHGT), with several baselines, including recurrent models such as LSTM and Bi-LSTM, and graph-based models such as GAT and HGT. Their average performance over 9 years is presented in Table <ref type="table" target="#tab_0">1</ref>. The results indicate that KHGT outperforms all baselines. Furthermore, yearly comparison results are illustrated in Figure <ref type="figure" target="#fig_4">4</ref>, showing that KHGT demonstrates strong generalization, achieving the highest Macro-F1 score and accuracy in most years from 2013 to 2021.</p><p>By comparing graph-based models and recurrent models, we observe that graph-based models generally do not outperform recurrent models. Specifically, GAT lags behind other baselines in terms of recall and F1 score, and HGT is slightly inferior to Bi-LSTM in terms of accuracy, recall, and F1 score. This indicates that the structural modeling in GAT and HGT alone does not provide a significant predictive enhancement. The superior performance of KHGT is thus attributed to the contribution of knowledge enhancement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Backtesting experiment</head><p>To assess the real-world profitability of KHGT, we backtest it over a period of 9 years. We adopt a straightforward and effective trading strategy, as referenced in <ref type="bibr" target="#b11">[12]</ref>: buy when the prediction is "rise" and sell when the prediction is "fall". The trading volume is set proportional to the prediction probability and inversely proportional to the index price. For simplicity, we assume that the index is tradable and ignore transaction fees. Table <ref type="table" target="#tab_1">2</ref> presents the average performance of each model, where "Market" refers to always holding all indexes. The results demonstrate that KHGT outperforms all baselines across both metrics. For the baselines, graph-based models generally lag behind recurrent models, consistent with the findings from the prediction experiment. Notably, the profitability of KHGT remains optimal across both metrics.</p></div>
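The position sizing implied by this strategy can be sketched as follows. The normalization to a fixed budget is our own simplifying assumption; the paper only states that volume is proportional to the rise probability and inversely proportional to price.

```python
def position_sizes(predictions, budget=1.0):
    """Sketch of the backtest strategy: hold only indexes predicted to
    rise (prob > 0.5), with weight proportional to the rise probability
    and inversely proportional to the index price.
    `predictions` maps index name -> (prob_rise, price)."""
    scores = {k: p / price for k, (p, price) in predictions.items() if p > 0.5}
    total = sum(scores.values())
    # Normalize so the allocated capital sums to the budget (assumption).
    return {k: budget * s / total for k, s in scores.items()} if total else {}
```

An index with a high rise probability and a low price receives the largest allocation; indexes predicted to fall are excluded from the portfolio.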
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Effectiveness of knowledge</head><p>Intervention experiment. To further verify the effectiveness of knowledge enhancement, we conducted an intervention experiment. We randomly deleted a portion of the knowledge paths between market indicators. A deletion proportion of 1 indicates no knowledge enhancement. For each proportion setting, we randomly selected paths and repeated the experiment three times, reporting the average results.</p><p>Table <ref type="table" target="#tab_2">3</ref> presents the model performance under different proportions. The performance degrades as the proportion of deleted knowledge paths increases. When all knowledge is eliminated, performance degrades to the level of plain HGT. This result implies that the LLM's knowledge has a strong enhancement effect. Without domain experts, our method can fully exploit the relationships between market indicators, thereby enhancing the predictive ability of the model in an economical and effective manner.</p><p>We observe that "index status of" and "participate in" play relatively significant roles in falling and rising markets, respectively. "Index status of" connects indicators to financial objects, reflecting the impact of market indicators on the objects they measure. "Participate in" connects these financial objects to markets, reflecting the convergence and interaction of these objects. The prominence of these two relations suggests that the model tends to focus on changes in specific indicators during falling markets and on the interplay of multiple indicators during rising markets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>This work proposes an LLM-driven knowledge enhancement method for the task of securities index prediction. The innovation of this method lies in leveraging the rich knowledge within the LLM, significantly reducing the cost and improving the efficiency of acquiring market index-related knowledge. By interacting with the LLM through instructions, we establish triplet paths among market indices, thereby forming graph data that embodies implicit knowledge. Utilizing the graph-structured market data, we construct a GNN-based prediction model, which significantly outperforms traditional models.</p><p>Limitations and future work. The quality of the knowledge obtained from the LLM depends on the LLM itself and on the design of the instructions. Knowledge quality is a crucial constraint affecting the model's predictive capability. Therefore, ensuring the quality of the knowledge is a critical issue that needs to be addressed in future work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The data source of the SSE index prediction task</figDesc><graphic coords="3,128.41,243.21,338.47,168.31" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The process of automatically uncovering relational paths between market indicator nodes using a LLM</figDesc><graphic coords="4,72.00,65.61,451.28,95.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Furthermore, we align data with knowledge to form the input for the SSE index prediction model. Assuming the prediction model aims to forecast the trend on day 𝑡, we organize the input data into a graph structure 𝐺 𝑡 = (𝑉, 𝐸, 𝐴 𝑡 ). The node set 𝑉 is divided into two categories based on their sources: market indicator nodes 𝑉 𝑚 , which include the SSE indexes among others, and intermediate nodes 𝑉 𝑝 , which are generated by the LLM and reflect the relationships between market indicators. The market indicator nodes have corresponding data values as predictive support, serving as node feature attributes 𝒜(𝑛 𝑖 , 𝑡), where 𝑛 𝑖 ∈ 𝑉 𝑚 . For instance, the SSE index node uses trading data from the period [𝑡 − 𝛾, 𝑡] as its attribute values 𝒜(𝑛 𝑖 , 𝑡) = 𝐷 [𝑡−𝛾,𝑡] 𝑆𝑆𝐸</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The model structure of securities index prediction</figDesc><graphic coords="5,162.25,65.61,270.77,232.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Performance comparison of models from 2013 to 2021</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Visualization of attention weights</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Results of Prediction Experiment 2</figDesc><table><row><cell>Model</cell><cell>Acc</cell><cell>P</cell><cell>R</cell><cell>F1</cell></row><row><cell>LSTM</cell><cell>0.6801</cell><cell>0.6505</cell><cell>0.6512</cell><cell>0.6366</cell></row><row><cell>Bi-LSTM</cell><cell>0.6890</cell><cell>0.6557</cell><cell>0.6611</cell><cell>0.6465</cell></row><row><cell>GAT</cell><cell>0.6836</cell><cell>0.6587</cell><cell>0.6451</cell><cell>0.6296</cell></row><row><cell>HGT</cell><cell>0.6868</cell><cell>0.6676</cell><cell>0.6576</cell><cell>0.6432</cell></row><row><cell>KHGT</cell><cell>0.7183*</cell><cell>0.6870*</cell><cell>0.6805*</cell><cell>0.6793*</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Results of backtesting experiment</figDesc><table><row><cell>Model</cell><cell>DR</cell><cell>SR</cell><cell>F1</cell></row><row><cell>Market</cell><cell>0.0009</cell><cell>0.0683</cell><cell></cell></row><row><cell>LSTM</cell><cell>0.0019</cell><cell>0.1824</cell><cell>0.6366</cell></row><row><cell>Bi-LSTM</cell><cell>0.0022</cell><cell>0.1909</cell><cell>0.6465</cell></row><row><cell>GAT</cell><cell>0.0012</cell><cell>0.1089</cell><cell>0.6296</cell></row><row><cell>HGT</cell><cell>0.0018</cell><cell>0.1747</cell><cell>0.6432</cell></row><row><cell>KHGT</cell><cell>0.0029</cell><cell>0.2660</cell><cell>0.6793</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Results of intervention experiment</figDesc><table><row><cell>Deletion</cell><cell>Acc</cell><cell></cell><cell>F1</cell><cell></cell></row><row><cell>Ratio</cell><cell>Mean</cell><cell>Std</cell><cell>Mean</cell><cell>Std</cell></row><row><cell>1.00</cell><cell>0.6959</cell><cell>0.0056</cell><cell>0.6623</cell><cell>0.0072</cell></row><row><cell>0.75</cell><cell>0.7094</cell><cell>0.0049</cell><cell>0.6673</cell><cell>0.0084</cell></row><row><cell>0.50</cell><cell>0.7031</cell><cell>0.0123</cell><cell>0.6705</cell><cell>0.0085</cell></row><row><cell>0.25</cell><cell>0.7077</cell><cell>0.0048</cell><cell>0.6746</cell><cell>0.0035</cell></row><row><cell>0</cell><cell>0.7274*</cell><cell>0.0045</cell><cell>0.6905*</cell><cell>0.0035</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">* denotes that the improvement is statistically significant in a paired t-test (𝑝 &lt; 0.05); the same applies below.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work is supported by the National Natural Science Foundation of China (No. 72071145).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Technical trading rules as a prior knowledge to a neural networks prediction system for the S&amp;P 500 index</title>
		<author>
			<persName><surname>Chenoweth</surname></persName>
		</author>
		<author>
			<persName><surname>Obradovic</surname></persName>
		</author>
		<author>
			<persName><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Technical Applications Conference and Workshops. Northcon/95</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page">111</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Stock market prediction with backpropagation networks</title>
		<author>
			<persName><forename type="first">Bernd</forename><surname>Freisleben</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="451" to="460" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An intelligent stock portfolio management system based on short-term trend prediction using dual-module neural networks</title>
		<author>
			<persName><forename type="first">Gia-Shuh</forename><surname>Jang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 1991 International Conference on Artificial Neural Networks</title>
				<meeting>of the 1991 International Conference on Artificial Neural Networks</meeting>
		<imprint>
			<date type="published" when="1991">1991</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="447" to="452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Stock market index prediction using neural networks</title>
		<author>
			<persName><forename type="first">Darmadi</forename><surname>Komo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Chein-I</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanseok</forename><surname>Ko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Applications of Artificial Neural Networks V</title>
				<imprint>
			<publisher>SPIE</publisher>
			<date type="published" when="1994">1994</date>
			<biblScope unit="volume">2243</biblScope>
			<biblScope unit="page" from="516" to="526" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Stock market index prediction using deep neural network ensemble</title>
		<author>
			<persName><forename type="first">Bing</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zi-Jia</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wenqi</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 36th Chinese Control Conference (CCC)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3882" to="3887" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Investigating the informativeness of technical indicators and news sentiment in financial market price prediction</title>
		<author>
			<persName><forename type="first">Saeede</forename><surname>Anbaee Farimani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">247</biblScope>
			<biblScope unit="page">108742</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">A tensor-based information framework for predicting the stock market</title>
		<author>
			<persName><forename type="first">Qing</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems (TOIS)</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">EAN: Event attention network for stock price trend prediction based on sentimental embedding</title>
		<author>
			<persName><forename type="first">Yaowei</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th ACM conference on web science</title>
				<meeting>the 10th ACM conference on web science</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="311" to="320" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">F-HMTC: Detecting Financial Events for Investment Decisions Based on Neural Hierarchical Multi-Label Text Classification</title>
		<author>
			<persName><forename type="first">Xin</forename><surname>Liang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI 2020</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="4490" to="4496" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Essential tensor learning for multimodal information-driven stock movement prediction</title>
		<author>
			<persName><forename type="first">Jun</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">262</biblScope>
			<biblScope unit="page">110262</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A self-regulated generative adversarial network for stock price movement prediction based on the historical price and tweets</title>
		<author>
			<persName><forename type="first">Hongfeng</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Donglin</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shaozi</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">247</biblScope>
			<biblScope unit="page">108712</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Financial time series forecasting with multi-modality graph neural network</title>
		<author>
			<persName><forename type="first">Dawei</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">121</biblScope>
			<biblScope unit="page">108218</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Exploring graph neural networks for stock market predictions with rolling window analysis</title>
		<author>
			<persName><forename type="first">Daiki</forename><surname>Matsunaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Toyotaro</forename><surname>Suzumura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Toshihiro</forename><surname>Takahashi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.10660</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Deep learning for stock prediction using numerical and textual information</title>
		<author>
			<persName><forename type="first">Ryo</forename><surname>Akita</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Individualized Indicator for All: Stock-wise Technical Indicator Optimization with Stock Embedding</title>
		<author>
			<persName><forename type="first">Zhige</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="894" to="902" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Knowledge-driven stock trend prediction and explanation via temporal convolutional network</title>
		<author>
			<persName><forename type="first">Shumin</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Companion proceedings of the 2019 world wide web conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="678" to="685" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Stock market forecasting using machine learning algorithms</title>
		<author>
			<persName><forename type="first">Shunrong</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Haomiao</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tongda</forename><surname>Zhang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1" to="5" />
			<pubPlace>Stanford, CA</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Department of Electrical Engineering, Stanford University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Temporal and Heterogeneous Graph Neural Network for Financial Time Series Prediction</title>
		<author>
			<persName><forename type="first">Sheng</forename><surname>Xiang</surname></persName>
		</author>
		<idno type="DOI">10.1145/3511808.3557089</idno>
		<ptr target="http://dx.doi.org/10.1145/3511808.3557089" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 31st ACM International Conference on Information and Knowledge Management</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2022-10">Oct. 2022</date>
		</imprint>
	</monogr>
	<note>CIKM &apos;22</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Applications of deep learning in stock market prediction: recent progress</title>
		<author>
			<persName><forename type="first">Weiwei</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">184</biblScope>
			<biblScope unit="page">115537</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Learning representations by back-propagating errors</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">E</forename><surname>Rumelhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ronald</forename><forename type="middle">J</forename><surname>Williams</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:205001834" />
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">323</biblScope>
			<biblScope unit="page" from="533" to="536" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">Sepp</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Recurrent Neural Networks Approach to the Financial Forecast of Google Assets</title>
		<author>
			<persName><forename type="first">Luca</forename><surname>Di Persio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oleksandr</forename><surname>Honchar</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:67797781" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Ensemble Application of Transfer Learning and Sample Weighting for Stock Market Prediction</title>
		<author>
			<persName><forename type="first">Simone</forename><surname>Merello</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:203606095" />
	</analytic>
	<monogr>
		<title level="m">2019 International Joint Conference on Neural Networks (IJCNN)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Learning Dynamic Multimodal Implicit and Explicit Networks for Multiple Financial Tasks</title>
		<author>
			<persName><forename type="first">Gary</forename><surname>Ang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ee-Peng</forename><surname>Lim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2022 IEEE International Conference on Big Data (Big Data)</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction</title>
		<author>
			<persName><forename type="first">Ziniu</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</title>
				<meeting>the Eleventh ACM International Conference on Web Search and Data Mining</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="261" to="269" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The cross-sectional relationship between trading costs and lead/lag effects in stock &amp; option markets</title>
		<author>
			<persName><forename type="first">Matthew</forename><forename type="middle">L</forename><surname>O&apos;Connor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Financial Review</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="95" to="117" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Knowledge graph-based event embedding framework for financial quantitative investments</title>
		<author>
			<persName><forename type="first">Dawei</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2221" to="2230" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Combining Enterprise Knowledge Graph and News Sentiment Analysis for Stock Price Prediction</title>
		<author>
			<persName><forename type="first">Jue</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhuocheng</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Du</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:102352388" />
	</analytic>
	<monogr>
		<title level="m">Hawaii International Conference on System Sciences</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey</title>
		<author>
			<persName><forename type="first">Liping</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.04947</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>q-fin.ST</note>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Modeling the stock relation with graph network for overnight stock movement prediction</title>
		<author>
			<persName><forename type="first">Wei</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence</title>
				<meeting>the Twenty-Ninth International Joint Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="4541" to="4547" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<title level="m" type="main">Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning</title>
		<author>
			<persName><forename type="first">Bhaskarjit</forename><surname>Sarmah</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2207.07183</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>q-fin.CP</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Temporal Relational Ranking for Stock Prediction</title>
		<author>
			<persName><forename type="first">Fuli</forename><surname>Feng</surname></persName>
		</author>
		<idno type="DOI">10.1145/3309547</idno>
		<ptr target="http://dx.doi.org/10.1145/3309547" />
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems</title>
		<idno type="ISSN">1558-2868</idno>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1" to="30" />
			<date type="published" when="2019-03">Mar. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Hypergraph-Based Reinforcement Learning for Stock Portfolio Selection</title>
		<author>
			<persName><forename type="first">Xiaojie</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP43922.2022.9747138</idno>
	</analytic>
	<monogr>
		<title level="m">ICASSP 2022 -2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="4028" to="4032" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Deep attentive learning for stock movement prediction from social media text and company correlations</title>
		<author>
			<persName><forename type="first">Ramit</forename><surname>Sawhney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="8415" to="8426" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Time-aware Graph Relational Attention Network for Stock Recommendation</title>
		<author>
			<persName><forename type="first">Xiaoting</forename><surname>Ying</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIKM &apos;20</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2281" to="2284" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">News-driven stock prediction via noisy equity state representation</title>
		<author>
			<persName><forename type="first">Heyan</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">470</biblScope>
			<biblScope unit="page" from="66" to="75" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Stock market prediction using machine learning classifiers and social media, news</title>
		<author>
			<persName><forename type="first">Wasiat</forename><surname>Khan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Ambient Intelligence and Humanized Computing</title>
		<imprint>
			<biblScope unit="page" from="1" to="24" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Incorporating Corporation Relationship via Graph Convolutional Neural Networks for Stock Price Prediction</title>
		<author>
			<persName><forename type="first">Yingmei</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zhongyu</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xuanjing</forename><surname>Huang</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:53037746" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 27th ACM International Conference on Information and Knowledge Management</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<monogr>
		<title level="m" type="main">MDGNN: Multi-Relational Dynamic Graph Neural Network for Comprehensive and Dynamic Stock Investment Prediction</title>
		<author>
			<persName><forename type="first">Hao</forename><surname>Qian</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.06633</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>q-fin.ST</note>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">REST: Relational Event-driven Stock Trend Forecasting</title>
		<author>
			<persName><forename type="first">Wentao</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.1145/3442381.3450032</idno>
		<ptr target="http://dx.doi.org/10.1145/3442381.3450032" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Web Conference 2021</title>
				<meeting>the Web Conference 2021</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021-04">Apr. 2021</date>
		</imprint>
	</monogr>
	<note>WWW &apos;21</note>
</biblStruct>

<biblStruct xml:id="b41">
	<monogr>
		<title level="m" type="main">Event Extraction as Question Generation and Answering</title>
		<author>
			<persName><forename type="first">Di</forename><surname>Lu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.05567</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b42">
	<monogr>
		<title level="m" type="main">Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction</title>
		<author>
			<persName><forename type="first">Ji</forename><surname>Qi</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.13981</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b43">
	<monogr>
		<title level="m" type="main">Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors</title>
		<author>
			<persName><forename type="first">Kai</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bernal</forename><surname>Jiménez Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><surname>Su</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.11159</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b44">
	<analytic>
		<title level="a" type="main">Evaluating time series forecasting models: an empirical study on performance estimation methods</title>
		<author>
			<persName><forename type="first">Vitor</forename><surname>Cerqueira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luis</forename><surname>Torgo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Igor</forename><surname>Mozetič</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10994-020-05910-7</idno>
		<ptr target="http://dx.doi.org/10.1007/s10994-020-05910-7" />
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<idno type="ISSN">1573-0565</idno>
		<imprint>
			<biblScope unit="volume">109</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1997" to="2028" />
			<date type="published" when="2020-10">Oct. 2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
