=Paper=
{{Paper
|id=Vol-3759/workshop7
|storemode=property
|title=Modeling and Generating Extreme Volumes of Financial Synthetic Time-Series Data with Knowledge Graphs
|pdfUrl=https://ceur-ws.org/Vol-3759/workshop7.pdf
|volume=Vol-3759
|authors=Laurentiu Vasiliu,S. Haleh S. Dizaji,Aaron Eberhart,Dumitru Roman,Radu Prodan
|dblpUrl=https://dblp.org/rec/conf/i-semantics/VasiliuDERP24
}}
==Modeling and Generating Extreme Volumes of Financial Synthetic Time-Series Data with Knowledge Graphs==
Laurentiu Vasiliu¹, S. Haleh S. Dizaji², Aaron Eberhart³, Dumitru Roman⁴ and Radu Prodan²
¹ Peracton Ltd., DHKN Galway Financial Services Centre, Moneenageisha Rd, Galway, H91 V2R6, Ireland
² Institute of Information Technology, University of Klagenfurt, Universitätsstraße 65-67, A-9020 Klagenfurt am Wörthersee, Austria
³ metaphacts GmbH, 36 Daimlerstraße, Walldorf, 69190, Germany
⁴ SINTEF AS, Forskningsveien 1, 0373 Oslo, Norway
Abstract
This paper outlines the approach and technology employed to model and generate extreme volumes of synthetic
financial time-series data. We introduce the Graph-Massivizer project and its financial use case, focusing on green
sustainable finance. One project objective is to create synthetic financial data in extreme volumes to facilitate
advanced testing and simulations of investment and trading algorithms. Afterward, we provide an overview
of the methodology, detailing the utilization of ontologies and knowledge graphs. Furthermore, we elaborate
on modeling correlations between different markets’ time-series and how they can be combined with graph neural network models to generate financial data. We then present the current implementation status and
conclude with a discussion of future work.
Keywords
Knowledge graphs, ontologies, synthetic data, financial time-series, extreme data, machine learning, correlation
analysis, pattern recognition
1. Introduction
In the financial investment and trading domains, synthetic data—artificially generated datasets that
mimic real-world financial time-series characteristics—has become a robust solution for quantitative
analysis and back-testing. The demand for synthetic data has arisen due to increasingly complex
financial models and algorithms driven by data-demanding machine learning (ML) models. These
models find real historical data time-series to have multiple limitations, such as reduced volumes,
high costs, incomplete data, or irrelevance as we go further back in time. The core characteristic of
synthetic data is its ability to capture the statistical properties of real-world markets while maintaining
a completely artificial nature. This allows for intensive testing before financial models and algorithms
are further validated on real-time financial data. The Graph-Massivizer project [1] aims to develop a
software platform consisting of independent yet integrated tools. In one of its use cases, this platform
will generate synthetic data in extreme volumes, closely matching the quality and characteristics of
historical data samples of stocks and commodities futures, with plans to expand to other securities
such as ETFs, bonds, and options. At the core of the approach are knowledge graphs (KGs), chosen for
their ability to capture, store, and represent historical financial time-series. All technologies used are
designed around creating, processing, and storing these KGs. KGs, which represent entities and their relationships, can be significantly enhanced by ontologies in three ways. Firstly, ontologies
provide a shared vocabulary and semantic alignment, particularly useful when integrating data from
different sources across various KGs. Secondly, KGs can leverage ontologies to perform inference and
reasoning, allowing the discovery of new relationships within the graph. Thirdly, ontologies can enrich
*Corresponding author.
Emails: laurentiu.vasiliu@peracton.com (L. Vasiliu); Seyedehhaleh.Seyeddizaji@aau.at (S. H. S. Dizaji); ae@metaphacts.com (A. Eberhart); Dumitru.Roman@sintef.no (D. Roman); radu.prodan@aau.at (R. Prodan)
ORCID: 0009-0000-9791-2759 (L. Vasiliu); 0000-0002-5886-9636 (S. H. S. Dizaji); 0000-0003-3007-5460 (A. Eberhart); 0000-0001-6397-3705 (D. Roman); 0000-0002-8247-5426 (R. Prodan)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
KGs by adding missing information, such as properties or classes that are not explicitly present but are
implied based on existing relationships.
2. Graph-Massivizer Project - The Financial Use Case
Green and sustainable finance This use case [2] aims to enhance algorithmic investment and
trading capabilities in green-focused products and investment/trading styles by generating and utilizing
extreme volumes of synthetic data for testing and training. In this respect, the Graph-Massivizer project
seeks to overcome the limitations posed by financial market data providers (such as restricted data volume, reduced accessibility, and high costs) by enabling the rapid, semi-automated creation of realistic
and affordable synthetic financial datasets that are unlimited in size and accessibility. It also aims to
improve ML-based green investment and trading simulations, eliminating critical biases such as prior
knowledge, over-fitting, and indirect contamination due to current data scarcity. The approach first
maps samples of historical financial data (stocks and commodities futures) to a massive graph (F-MG)
through a time-series to graph transformation. Next, using a generative model, we create a synthetic
financial massive graph (SF-MG). Finally, we generate synthetic financial data from the SF-MG by
enforcing specific quality rules. To achieve this, the Graph-Massivizer platform is provided with 10 TB
of historical data samples, with the primary goal (KPI 1) of generating between 1 and 5 PB of synthetic
financial time-series data. Another goal (KPI 2) is to achieve 90% energy consumption accountability for
synthetic data creation. We use this data to test and improve financial algorithms, and aim to achieve
(KPI 3) a measurable return increase of 2-4% in the enhanced financial algorithms that use synthetic
data. Additionally, we aim to achieve (KPI 4) an increase in the financial algorithms’ alpha by 1-2% and
a Sharpe ratio greater than 1.5.
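As a reference point for KPI 4, the Sharpe ratio is the mean excess return divided by the standard deviation of returns, usually annualized. A minimal sketch follows; the daily returns, zero risk-free rate, and 252-day annualization are illustrative assumptions, not project parameters:

```python
import math

def sharpe_ratio(returns, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

# hypothetical daily returns of a trading algorithm
daily = [0.004, -0.002, 0.003, 0.0015, -0.001, 0.002]
print(sharpe_ratio(daily))
```

In a back-test, the same statistic would be computed over the algorithm's full return series on real and on synthetic data and compared against the 1.5 target.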
Graph-Massivizer toolkit The Graph-Massivizer toolkit is an integrated platform composed
of five tools (Graph-Inceptor, Graph-Scrutinizer, Graph-Optimizer, Graph-Greenifier, and Graph-
Choreographer) that perform specific and unique functions for massive graph processing:
• Graph-Inceptor: realizes a massive graph for the system to use.
• Graph-Scrutinizer: provides analytic capabilities and probabilistic reasoning for insights.
• Graph-Optimizer: ensures that large graph operations are completed efficiently.
• Graph-Greenifier: evaluates the energy consumption of massive graph operations.
• Graph-Choreographer: allows serverless deployment to use resources on-demand.
Figure 1 shows the overall Graph-Massivizer architecture and how the five tools are interconnected. The diagram also shows the external components, such as the metaphactory platform, the graph database, and the hardware and infrastructure used by the toolkit.
3. Challenges in Modeling Financial Data
Modeling financial data presents several challenges due to financial markets’ dynamic nature and
complexities. In addition to numerous variables, volatility clustering, fat tails, and noise, which are all specific to financial data, we focus particularly on the following aspects to generate synthetic data.
KG relations’ extraction and enrichment In large-scale financial data, extracting relationships
among various data types can be complex and non-intuitive, often requiring inference methods to
identify them. We aim to enhance the quality of KG relation extraction by utilizing ontologies and
reasoning methods to identify and extract non-obvious relationships.
Heterogeneous time-series data Financial data consists of different types of time-series data
with different semantics, domains, and dynamics. We can mitigate this diversity by using ontologies to make their underlying relationships explicit and by finding correlations among them.
Figure 1: Graph-Massivizer architecture.
Changing statistical properties Financial time-series often display changing statistical properties
over time, such as means, variances, and covariances. These properties are used for measuring an asset’s
performance and risk. These statistical patterns must be identified and replicated in the synthetic data
to model the original data accurately.
Quality assessment of generated data The evaluation of synthetic data is ongoing research: [3], [4], and [5] review several evaluation methods for financial and other synthetic time-series. The method of [3] applies various qualitative, quantitative, and predictive methods. These consist of statistical models and distances, such as various distribution divergence metrics, the Kolmogorov-Smirnov test, correlation analysis between real and synthetic data, and the non-parametric maximum mean discrepancy (MMD). Other methods train ML models on synthetic data and evaluate them on real data. Additionally, we can evaluate specific quantities such as Value at Risk (VaR). These methods differ depending on the use case, and we aim to select appropriate metrics.
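As an example of one such metric, the two-sample Kolmogorov-Smirnov statistic is the maximum gap between the empirical CDFs of a real and a synthetic sample. A minimal sketch, with toy samples as illustrative placeholders:

```python
def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    xs = sorted(set(real) | set(synthetic))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(real, x) - ecdf(synthetic, x)) for x in xs)

real = [1.0, 1.2, 0.9, 1.1, 1.05]
close = [1.0, 1.15, 0.95, 1.1, 1.0]   # synthetic sample resembling the real one
far = [5.0, 5.2, 4.9, 5.1, 5.05]      # synthetic sample with the wrong scale
print(ks_statistic(real, close), ks_statistic(real, far))
```

A statistic near 0 indicates distributions that are hard to tell apart; a statistic of 1 indicates disjoint supports.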
4. Using Ontologies and Knowledge Graphs
KGs and ontologies allow scientists and domain experts to model complex relations between data in a
logically structured and machine-readable format. This capability allows ontologies to connect diverse sources of information, such as those in the use case presented here and similar related data.
In the Graph-Massivizer project, ontologies represent and integrate data from diverse use cases. The
metaphactory platform was chosen to manage ontologies and integrate data with a front-end interface.
Metaphactory has many applications for developing and managing ontologies, KGs, and other related
semantic artifacts [6]. With metaphactory, users can interact with and create ontologies and integrate
and use data that aligns with the ontology.
By decoupling the data and the schema for the data, the ontology allows developers to model and
prepare for handling massive amounts of data in an abstract way. For instance, a user or developer
can write queries to inspect only the relevant data of interest inside a large data set. While such queries are not themselves a direct algorithmic optimization, they play a critical role in enabling scalability by identifying the semantic information and metadata that can reduce a huge volume of data into something more tractable.
In this section, we describe the data represented by the ontology and then show the ontology that schematizes it for integration with a KG.
4.1. Ontology data
This use case initially focuses on two types of financial products: stocks and commodities futures.
However, this paper will concentrate on one financial product: stocks. The financial ontology for stocks
consists of four main types of financial data:
Fundamental data Fundamental data [7] indicators represent accounting data related to a company
and its particular industry. These indicators have a low update frequency (quarterly on average or
yearly) [8] and provide long-term insights into a company’s valuation and price evolution.
Technical data In contrast with fundamental data, technical data [9] has a very high update frequency
(tick/second/minute, etc.), offering short-term insights into stock price movements. This data includes
fine-grained historical stock price information in the form of Open, High, Low, Close, and Volume
(OHLCV). Numerous additional statistics for financial modeling and prediction can be derived from this data, particularly for intra-day trading.
ESG data Environmental, social, governance (ESG) [10] data measures companies based on various
responsibility metrics, including environmental, social, and governance criteria. By considering these
criteria in their investments, investors encourage responsible corporate behavior and avoid investing in
companies with risky or unethical practices.
Figure 2: Financial ontology diagram.
Sentiment data Market sentiment data [11] reflects investors’ attitudes toward a company, sector,
or financial market. Various indicators derived from statistical technical analysis, social media, or
alternative data sources can be used to measure market sentiment.
4.2. Ontology diagram
The financial ontology overview in Figure 2 shows the objects and data constituting the
use case, namely historical and synthetic data and financial algorithms. They run in-
side the PeractonSecuritiesPlatform class that ingests the SyntheticStocksData and
SyntheticCommoditiesData generated by the Graph-MassivizerPlatform class.
4.3. Synthetic ontology diagram
The synthetic financial data ontology in Figure 3 shows the created categories and mirrors the original his-
torical financial data set structure. The SyntheticSymbol belongs to SyntheticCommoditiesData
and SyntheticStocks classes, with the TechnicalData and FundamentalData classes as features.
It also shows the SyntheticFinancialData class with SyntheticCommoditiesMultiverse
and SyntheticStocksMultiverse subclasses, with further SyntheticCommoditiesData and
SyntheticStocksData subclasses.
5. Related Work
We briefly review the literature on applying KG in financial data analysis, then elaborate on other
financial data modeling and generation methods.
5.1. Ontology in financial data analysis
Several methods leverage ontology and KG in financial analysis, such as KG extraction and enrichment,
querying and reasoning over KGs, extracting correlations, and modeling financial time-series.
[8] studies the effect of considering fundamental and technical data in stock price prediction ML
models, showing that models benefiting from both indicators outperform models considering them
alone. The method of [12] proposes KG extraction, enrichment, and querying methods. [13] derives a high-quality financial KG from a given ontology via a semi-automated method and utilizes this KG in
reasoning, stock prediction, and generation with two neural network models, multi-layer perceptron
and long short term memory (LSTM). [14] provides an ontology-based correlation extraction between
different companies. It derives the network of companies by assessing time-series and uses the node2vec
and k-nearest neighbor (kNN) methods to embed and cluster the extracted nodes. [15] proposes a joint
Figure 3: Synthetic financial ontology diagram.
graph learning and prediction model on time-series data. It uses KG and graph neural networks (GNNs)
to derive the correlation among different time-series. The method of [16] benefits from KGs in finding
first and second-order relationships among companies. It applies an LSTM for time-series embedding of
each node and a temporal graph convolutional network (GCN) to incorporate the varying neighborhood
effect. The method in [17] formulates stock prediction as a stochastic optimization and introduces
genetic programming with a generalized crowding method using financial KG to predict prices.
5.2. Financial time-series modeling
Statistical models Various statistical linear models, such as autoregressive models, exist for stock prices; however, they cannot capture the complex non-linear structure of these data [18].
Event-based models These methods formulate time-series data as event data, defining events as notable changes in the time-series in continuous time. The methods of [19] and [20] construct correlation
(influence) graphs of time-series utilizing the Hawkes process. [20] defines events as long-lasting
volatility values. It uses an attention layer to weigh and capture neighborhoods and an LSTM to predict
the next price. The method in [21] uses an event graph and tackles dimensionality. It formulates the
intensity function utilizing GNNs and the attention layer to dynamically embed node features and
recurrent neural networks (RNNs) to embed event sequences. Graphical event models are another form of marked point process that utilizes graphical information for event graph construction [21].
Pattern recognition These methods perform pattern-matching techniques to predict trends in
time-series, including perceptually important points, template matching, and dynamic time warping algorithms. The survey [22] provides a detailed review of these methods.
ML models The review in [22] categorizes these models into supervised and unsupervised models.
Supervised learning methods use various ML models such as support vector machines, random forest,
Adaboost, kNN, and eXtreme gradient boosting methods for stock prediction. The unsupervised learning
methods include clustering methods to help in finding correlations among markets [22].
Recently, several studies used deep learning models for modeling financial time-series data con-
sisting of convolutional neural networks, RNNs, attention mechanisms, and generative adversarial
networks (GANs) capable of capturing non-linear and complex data features. The survey [23] provides
a comprehensive review of these models.
GANs are powerful data generation models appropriate for time-series and adjusted for event data
generation. The method in [24] proposes a marked event data generation method using separate
generators and discriminators for each event type and preserves type correlations using a central
discriminator. This method provides synthetic data for downstream tasks. The GAN model of [18]
constructs the correlation graph among stocks using different correlation analysis methods and uses
GCNs to encode interdependent time-series data. The method in [3] captures correlation among stocks
and applies three generative models, including GAN. The survey [4] presents more models of this
category.
The other category of methods uses natural language processing (NLP) to benefit from financial text
data such as financial news and SEC filings. They consist of N-grams and word2vec embeddings as
inputs for prediction models. The survey [22], and paper [16] introduce methods of this category.
6. Correlation Analysis among Financial Data
Financial data can show various correlations across financial products, companies, and markets. These
correlations may be based on deeper links between companies and industries or simply random, lacking
any underlying economic or financial rationale. We aim to identify the relevant and meaningful
correlations within historical financial time-series and use these insights to generate synthetic data.
6.1. Point process approach
Point process models are stochastic processes that successfully model event sequences [25]. They vary
depending on the definition of the conditional intensity function, representing the expected number
of events in a small time interval given the event history. We consider a multi-dimensional Hawkes
(self-exciting) process [26] to model dependencies between various time-series of markets and obtain
the influence graph. We convert financial time-series data to event sequences by defining events as
relatively significant changes in time-series.
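A minimal sketch of such a conversion, which flags an event whenever the price moves by more than a threshold relative to the price at the last event; the threshold value and toy price series are illustrative assumptions:

```python
def to_events(prices, threshold=0.02):
    """Convert a price series to an event sequence: emit (time_index, type)
    whenever the relative change since the last event exceeds the threshold."""
    events, last = [], prices[0]
    for t, p in enumerate(prices[1:], start=1):
        change = (p - last) / last
        if abs(change) >= threshold:
            events.append((t, "up" if change > 0 else "down"))
            last = p  # reset the reference price at each event
    return events

prices = [100, 100.5, 103, 103.2, 100.9, 101.0]
print(to_events(prices))
```

The resulting typed, timestamped events are the inputs the point process models below operate on.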
Self-exciting process This process represents the triggering effect of event history on the intensity and occurrence of future events, usually via an exponential excitation kernel (Eq. 1) [26]:

\[
\lambda_k(t) = \mu_k + \sum_{i:\, t_i < t} a_{k_i,k}\, e^{-\beta (t - t_i)}, \tag{1}
\]

where λ_k(t) is the intensity of event type k at time t, μ_k is the base intensity for event type k, β is the excitation decay rate, and a_{k_i,k} is the influence parameter between event types k_i and k.
We infer the model parameters (μ and the influence matrix A = (a_{i,j})) by maximizing the log-likelihood of the events given in Eq. 2, using expectation-maximization or stochastic gradient descent methods:

\[
\mathcal{L} = \sum_{i:\, t_i < T} \log \lambda_{k_i}(t_i) - \int_0^T \lambda(t)\, dt, \tag{2}
\]

where [0, T] is the time interval of events we consider for correlation analysis, and λ(t) is the sum of the intensities of all event types.
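Under the exponential kernel of Eq. 1, the compensator integral in Eq. 2 has a closed form, so the log-likelihood can be evaluated directly. A minimal sketch; the two event types, base rates, and influence matrix are illustrative assumptions:

```python
import math

def intensity(k, t, events, mu, A, beta):
    """Hawkes intensity lambda_k(t) with exponential kernel (Eq. 1).
    events: list of (t_i, k_i); mu: base rates; A[k_i][k]: influence."""
    return mu[k] + sum(A[ki][k] * math.exp(-beta * (t - ti))
                       for ti, ki in events if ti < t)

def log_likelihood(events, mu, A, beta, T):
    """Log-likelihood of an event sequence on [0, T] (Eq. 2)."""
    ll = sum(math.log(intensity(ki, ti, events, mu, A, beta)) for ti, ki in events)
    # compensator: integral of the summed intensities over [0, T], in closed form
    comp = sum(mu) * T
    for ti, ki in events:
        for k in range(len(mu)):
            comp += A[ki][k] / beta * (1 - math.exp(-beta * (T - ti)))
    return ll - comp

mu = [0.5, 0.5]                    # base intensities for two event types
A = [[0.2, 0.1], [0.1, 0.2]]       # influence matrix a_{k_i,k}
events = [(1.0, 0), (2.0, 1)]      # (time, type) pairs
print(log_likelihood(events, mu, A, beta=1.0, T=3.0))
```

Maximizing this quantity over μ and A (e.g., by expectation-maximization) yields the influence matrix that defines the dependency graph.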
Stocks’ correlation analysis within a given exchange (market) We consider different point
processes for the time-series of stocks and obtain the correlation graph among them by analyzing event
data and inferring the influence matrix of the multi-dimensional process [27]. This matrix reveals the weighted dependency (influence) graph among the market’s stocks. Due to the ever-changing dependencies among companies, we can update this graph over time and derive a dynamic dependency graph.
As a remedy for high-dimensional market data, which decreases the accuracy of these models, we can derive an initial dependency graph by processing various textual data using NLP techniques.
Correlation analysis of internal stock data Despite the expressiveness of point process models,
they are inaccurate in high-dimensional spaces. In correlation analysis of internal stock data (funda-
mental and technical), we can mitigate this problem by leveraging KGs and considering only relations
appearing in the KG.
Formulating intensity function using neural networks In addition to high dimensionality, the
assumption of a rigid intensity function, such as the formulation in Eq. 1 with a predefined triggering effect (a positive and additive effect of history), might not apply to every real scenario with intricate
dependencies. Therefore, several methods benefit from neural networks such as LSTMs in [28] to formu-
late the intensity function of the point process models that capture variable and complex dependencies
and adjust the model for the specific use case. Additionally, the attention mechanism [29], which can
explicitly model the influence of event types, guides in finding the correlation graph.
6.2. Spurious correlations
Not every correlation represents a causal relationship; a correlation that does not is known as a spurious correlation [30]. Such a correlation can be a random effect or can be caused by other hidden variables [30].
In particular, deep learning models usually result in spurious correlations without enough diverse data
[31]. To mitigate this misinterpretation, we will apply methods to test the correlations.
• Considering other factors in the stock market in correlation analysis, in addition to stocks, as
much as possible.
• Considering long-term correlations among time-series or comparing these correlations in the
long term to test if they are not random.
• Applying null-hypothesis testing to find significant p-values, such as the method of [32], which constructs a spurious-relationship graph among stock time-series using the Granger causality test to evaluate causality between time-series and a t-test to estimate the p-value.
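One simple null-hypothesis check of this kind is a permutation test: shuffle one series many times and measure how often a random pairing reaches the observed correlation. This is a stand-in illustration, not the Granger-causality procedure of [32]; the toy series are assumptions:

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Fraction of random pairings whose |correlation| matches the observed one."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(pearson(x, y)) >= observed:
            hits += 1
    return hits / n_perm

x = list(range(20))
y = [2 * v + 1 for v in x]          # perfectly correlated with x
print(permutation_pvalue(x, y))
```

A small p-value suggests the observed correlation is unlikely under random pairing; a large one flags a candidate spurious edge for pruning.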
We aim to prune the initial dependency graph and reduce the dimensionality of the point process models by detecting spurious correlations, deriving a more meaningful model based on causal dependencies.
7. Modeling Synthetic Financial Time-Series
An important input in modeling synthetic time-series is using a correlation graph among stocks, as
presented in Figure 4. The temporal and dependency features of stocks are embedded using this
correlation graph. These features serve as inputs for ML models to generate time-series data. In the
following sections, we propose various types of these models:
Improving event-based models using GNNs and LSTM The correlation graph can be used
directly for generating time-series. However, to enhance the initial point process models for modeling
time-series (that consist of more details than event sequences) and to improve graph weights, we will
add a GNN with attention layers [33]. This model can simultaneously encode the time-series and correlation graph structure and obtain graph weights [20] (Figure 4). Then, we will model the synthetic time-series
data using an LSTM given the encoded data [20].
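To illustrate the core idea, one correlation-weighted aggregation step (the basic operation inside a GNN layer) can be sketched as follows. The toy three-stock graph and scalar features are assumptions; a real model would use learned attention weights [33] and vector embeddings:

```python
def message_pass(features, adj):
    """One graph message-passing step: each node's new feature is the
    correlation-weighted average of its own and its neighbors' features."""
    n = len(features)
    out = []
    for i in range(n):
        weights = [adj[i][j] for j in range(n)]
        total = sum(weights)
        out.append(sum(w * features[j] for j, w in enumerate(weights)) / total)
    return out

# toy 3-stock correlation graph (self-loops of 1.0), one scalar feature per stock
adj = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
feats = [1.0, 2.0, 10.0]
print(message_pass(feats, adj))
```

Strongly correlated stocks pull each other's encodings together, which is how the correlation graph shapes the features fed to the downstream time-series generator.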
[Figure 4 shows the pipeline: multi-dimensional event data → correlation graph → GNN encoding → ML-based time-series generator → multi-dimensional time-series.]
Figure 4: Multi-dimensional synthetic time-series model.
Generating synthetic time-series using correlation graphs The other method combines the
correlation graph and GANs to generate interrelated time-series data. We will apply this graph to
embed time-series data, capturing their relationships and providing inputs for the GAN model.
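As a toy illustration of correlation-aware generation (not the GAN itself), two return series with a target correlation ρ can be produced under a Gaussian assumption; all parameters here are illustrative:

```python
import math
import random

def correlated_returns(n, rho, seed=0):
    """Two synthetic Gaussian return series with target correlation rho,
    via x2 = rho * x1 + sqrt(1 - rho^2) * z, with z independent of x1."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1 - rho ** 2) * b for a, b in zip(x1, z)]
    return x1, x2

def sample_corr(x, y):
    """Sample Pearson correlation, to check the generated pair."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

a, b = correlated_returns(5000, rho=0.7)
print(sample_corr(a, b))
```

A GAN-based generator replaces the Gaussian assumption with a learned distribution, but the goal is the same: the pairwise correlations of the synthetic series should match the edges of the correlation graph.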
8. Implementation Considerations
Generating extreme volumes of synthetic financial time-series data has unique challenges and require-
ments from both software and hardware perspectives. Here, we outline the most relevant ones.
Scalable data generation The five tools of the Graph-Massivizer platform are being implemented to generate synthetic time-series data scalably across a distributed system. One strategy employed is task-level parallelism, which divides the data generation process into small chunks processed simultaneously
across different nodes in the HPC cluster. As a computing framework, Apache Spark provides libraries
and functionalities for parallel processing and data management on large clusters.
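The production setting uses Spark on an HPC cluster; as a stand-in, the same chunked, task-parallel pattern can be sketched with the Python standard library. The per-chunk generator is a placeholder, and a thread pool stands in for Spark executors:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def generate_chunk(seed, size):
    """Placeholder generator for one chunk of synthetic returns."""
    rng = random.Random(seed)
    return [rng.gauss(0, 0.01) for _ in range(size)]

def generate_parallel(n_chunks, chunk_size):
    """Generate independently seeded chunks in parallel and concatenate them."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(generate_chunk, s, chunk_size)
                   for s in range(n_chunks)]
        return [x for f in futures for x in f.result()]

data = generate_parallel(4, 1000)
print(len(data))
```

Per-chunk seeding keeps the output deterministic regardless of scheduling order, which matters when the same petabyte-scale dataset must be reproducible across runs.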
Data storage and streaming Managing and storing the produced synthetic data at the petabyte level
requires various solutions, such as compression, third-party storage, and transferring data only when
necessary. Ideally, the synthetic data should remain within the HPC cluster, with financial simulations
and testing conducted in the same environment to minimize data movement.
Parallel processing of data generation and large memory capacity We use CINECA’s Leonardo pre-exascale supercomputer [34], which allows the parallel processing of synthetic data generation. A low-latency, high-bandwidth network is in place for communication between compute nodes within the HPC cluster to facilitate data exchange.
Energy consumption monitoring The Graph-Massivizer platform has a dedicated tool called
’Graph-Greenifier’ to monitor and analyze the energy consumption used for generating the synthetic
data time-series.
Data security Even though the underlying historical financial time-series data does not contain
personally identifiable information, its synthetic data can still be commercially sensitive. Therefore,
security measures are necessary to ensure data confidentiality and integrity: secure coding practices, strict access control, encryption of the synthetic data at rest and in transit, digital signatures of the synthetic data to ensure its authenticity and prevent tampering, and data-logging monitoring.
Cost optimization Finally, cost optimization in generating synthetic data is critical to the entire
process. It involves balancing hardware costs, licensing fees, and ongoing maintenance expenses with
the desired performance and scalability. The goal is to offer a more cost-effective solution than real
historical financial data while maintaining competitiveness in pricing.
9. Conclusions
The work presented in this paper, as part of the Graph-Massivizer EU project, is currently at the
halfway point and demonstrates the initial proof-of-concept developments for generating synthetic
data in extreme volumes. The mechanisms and approaches identified here will be further implemented
as a robust workflow within the five tools of the Graph-Massivizer platform. The focus will be on
software implementation, integrating the five tools, and identifying relevant correlations in historical
time-series to enhance the quality of the generated synthetic data. After generating synthetic data
samples, they will be tested using Peracton’s back-testing engine with various investment and trading
financial algorithms. The behavior of these algorithms will be analyzed and compared with their
performance on real historical data to assess differences and similarities. Feedback will be provided to
the Graph-Massivizer platform to fine-tune the quality of the synthetic data further.
Acknowledgement The Graph-Massivizer project has received funding from the European Union’s Horizon Research and Innovation Actions under Grant Agreement Nº 101093202 (more information available at https://graph-massivizer.eu/).
References
[1] Graph-Massivizer EU Project, EU Horizon Research and Innovation Actions, Graph Massivizer, 2023. URL: https://graph-massivizer.eu/.
[2] Graph-Massivizer EU Project, EU Horizon Research and Innovation Actions, Use case 1: green finance, 2023. URL: https://graph-massivizer.eu/project/green-and-sustainable-finance/.
[3] M. Dogariu, L.-D. Ştefan, B. A. Boteanu, C. Lamba, B. Kim, B. Ionescu, Generation of realistic
synthetic financial time-series, ACM Transactions on Multimedia Computing, Communications,
and Applications (TOMM) 18 (2022) 1–27.
[4] S. A. Assefa, D. Dervovic, M. Mahfouz, R. E. Tillman, P. Reddy, M. Veloso, Generating synthetic data
in finance: opportunities, challenges and pitfalls, in: Proceedings of the First ACM International
Conference on AI in Finance, 2020, pp. 1–8.
[5] M. Stenger, R. Leppich, I. Foster, S. Kounev, A. Bauer, Evaluation is key: a survey on evaluation
measures for synthetic time series, Journal of Big Data 11 (2024) 66.
[6] A. Eberhart, P. Haase, W. Schell, metaphactory for massive graphs, in: M. Vieira, V. Cardellini,
A. D. Marco, P. Tuma (Eds.), Companion of the 2023 ACM/SPEC International Conference on
Performance Engineering, ICPE 2023, Coimbra, Portugal, April 15-19, 2023, ACM, 2023, pp. 215–220.
URL: https://doi.org/10.1145/3578245.3585330. doi:10.1145/3578245.3585330.
[7] Investopedia, Fundamental data, 2024. URL: https://www.investopedia.com/terms/f/
fundamentalanalysis.asp.
[8] E. Beyaz, F. Tekiner, X.-j. Zeng, J. Keane, Comparing technical and fundamental indicators in stock
price forecasting, in: 2018 IEEE 20th international conference on high performance computing
and communications; IEEE 16th international conference on smart city; IEEE 4th international
conference on data science and systems (HPCC/SmartCity/DSS), IEEE, 2018, pp. 1607–1613.
[9] Investopedia, Technical data, 2024. URL: https://www.investopedia.com/terms/t/technicalanalysis.
asp.
[10] Investopedia, Esg data, 2024. URL: https://www.investopedia.com/terms/e/
environmental-social-and-governance-esg-criteria.asp.
[11] E. L. Jeffriess, J, Sentiment data, 2024. URL: https://www.eightcap.com/labs/
exploring-the-most-common-sentiment-indicators-on-tradingview/.
[12] S. Zehra, S. F. M. Mohsin, S. Wasi, S. I. Jami, M. S. Siddiqui, M. K.-U.-R. R. Syed, Financial knowledge
graph based financial report query system, IEEE Access 9 (2021) 69766–69782.
[13] N. Kertkeidkachorn, R. Nararatwong, Z. Xu, R. Ichise, Finkg: A core financial knowledge graph for
financial analysis, in: 2023 IEEE 17th International Conference on Semantic Computing (ICSC),
IEEE, 2023, pp. 90–93.
[14] C. Erten, D. Kazakov, Ontology graph embeddings and ilp for financial forecasting, in: International
Conference on Inductive Logic Programming, Springer, 2021, pp. 111–124.
[15] S. Ibrahim, W. Chen, Y. Zhu, P.-Y. Chen, Y. Zhang, R. Mazumder, Knowledge graph guided simul-
taneous forecasting and network learning for multivariate financial time series, in: Proceedings of
the Third ACM International Conference on AI in Finance, 2022, pp. 480–488.
[16] D. Matsunaga, T. Suzumura, T. Takahashi, Exploring graph neural networks for stock market
predictions with rolling window analysis, arXiv preprint arXiv:1909.10660 (2019).
[17] X. Fu, X. Ren, O. J. Mengshoel, X. Wu, Stochastic optimization for market return prediction using
financial knowledge graph, in: 2018 IEEE International Conference on Big Knowledge (ICBK),
IEEE, 2018, pp. 25–32.
[18] D. Ma, D. Yuan, M. Huang, L. Dong, Vgc-gan: A multi-graph convolution adversarial network for
stock price prediction, Expert Systems with Applications 236 (2024) 121204.
[19] J. Etesami, N. Kiyavash, K. Zhang, K. Singhal, Learning network of multivariate hawkes processes:
A time series approach, arXiv preprint arXiv:1603.04319 (2016).
[20] T. Yin, C. Liu, F. Ding, Z. Feng, B. Yuan, N. Zhang, Graph-based stock correlation and prediction
for high-frequency trading systems, Pattern Recognition 122 (2022) 108209.
[21] K. Yoon, Y. Im, J. Choi, T. Jeong, J. Park, Learning multivariate hawkes process via graph recurrent
neural network, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, 2023, pp. 5451–5462.
[22] D. Shah, H. Isah, F. Zulkernine, Stock market analysis: A review and taxonomy of prediction
techniques, International Journal of Financial Studies 7 (2019) 26.
[23] W. Jiang, Applications of deep learning in stock market prediction: recent progress, Expert
Systems with Applications 184 (2021) 115537.
[24] A. Seyfi, J.-F. Rajotte, R. Ng, Generating multivariate time series with common source coordinated
gan (cosci-gan), Advances in neural information processing systems 35 (2022) 32777–32788.
[25] D. J. Daley, D. Vere-Jones, et al., An introduction to the theory of point processes: volume I:
elementary theory and methods, Springer, 2003.
[26] A. G. Hawkes, Spectra of some self-exciting and mutually exciting point processes, Biometrika 58
(1971) 83–90.
[27] E. Lewis, G. Mohler, A nonparametric em algorithm for multiscale hawkes processes, Journal of
nonparametric statistics 1 (2011) 1–20.
[28] H. Mei, J. M. Eisner, The neural hawkes process: A neurally self-modulating multivariate point
process, Advances in neural information processing systems 30 (2017).
[29] Y. Gu, Attentive neural point processes for event forecasting, in: Proceedings of the AAAI
Conference on Artificial Intelligence, volume 35, 2021, pp. 7592–7600.
[30] H. A. Simon, Spurious correlation: A causal interpretation, Journal of the American statistical
Association 49 (1954) 467–479.
[31] S. Wu, M. Yuksekgonul, L. Zhang, J. Zou, Discover and cure: Concept-aware mitigation of spurious
correlation, in: International Conference on Machine Learning, PMLR, 2023, pp. 37765–37786.
[32] G. Li, J. J. Jung, Dynamic relationship identification for abnormality detection on financial time
series, Pattern Recognition Letters 145 (2021) 194–199.
[33] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, et al., Graph attention
networks, stat 1050 (2017) 10–48550.
[34] CINECA, High performance computing, leonardo pre-exascale supercomputer, 2024. URL: https:
//leonardo-supercomputer.cineca.eu/.