<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting Future Topic Trends in the Blockchain Domain: Using Graph Convolutional Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yejin Park</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seonkyu Lim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Changdai Gu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Min Song</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Korea Financial Telecommunications &amp; Clearings Institute</institution>
          ,
          <addr-line>432, Nonhyeon-ro, Gangnam-gu, Seoul</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oncocross Co., Ltd.</institution>
          ,
          <addr-line>11, Saechang-ro, Mapo-gu, Seoul</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Yonsei University</institution>
          ,
          <addr-line>50, Yonsei-ro, Seodaemun-gu, Seoul</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Keeping up with evolving trends is crucial in modern society, but it can be challenging. Time series forecasting analysis has emerged as a promising approach to analyze data over time and identify trend patterns, particularly in fields like blockchain where rapid advancements occur. Graph convolutional networks (GCNs) have shown promise for analyzing structured data, but their effectiveness in domains other than traffic and stock forecasting remains unclear. Efforts to incorporate GCNs for forecasting topic trends have limitations, such as not integrating topic information. To address these limitations, we propose a new approach that combines topic modeling techniques and GCNs for forecasting future topic trends in the blockchain domain. We select an attention temporal graph convolutional network (A3T-GCN) model for its ability to capture global variation trends. Using paper data from the Scopus database, we preprocess the data, identify potential topics using Dirichlet Multinomial Regression and Latent Dirichlet Allocation, and apply agglomerative clustering. We construct two graphs, the random subgraph and the topic graph, incorporating node features (word count and centralities) and edge weights (co-occurrence). The A3T-GCN model is trained on the random subgraph for forecasting, and the topic graph is used to predict future topic trends in the blockchain domain with pre-trained models. Our objective is to track key topics and leading keywords shaping the field. The proposed approach has implications for researchers, businesses, and policymakers in understanding topic trends. The paper concludes by presenting the methodology, experimental findings, and future research directions.</p>
      </abstract>
      <kwd-group>
<kwd>Forecasting</kwd>
        <kwd>Topic modeling</kwd>
        <kwd>Agglomerative clustering</kwd>
        <kwd>A3T-GCN</kwd>
        <kwd>Blockchain</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>In modern society, where trends are constantly evolving, keeping up with the latest trends is crucial for individuals and organizations to succeed, yet it can be a challenging task. To overcome this challenge, time series forecasting analysis has emerged as a promising approach, enabling individuals to analyze data over time and identify patterns and trends. This approach holds particular relevance in fields like blockchain [1, 2], where rapid technological advancements are commonplace. By leveraging past and present trends, individuals and organizations can make informed predictions about the future, which can give them a competitive edge in the market.</p>
      <p>Recently, graph convolutional networks (GCNs) [3], which are variants of graph neural networks (GNNs) [4], have emerged as a promising approach for analyzing structured data, including time-series data. However, most of the existing studies have focused on applying GCNs to forecasting in the fields of traffic and stock [5, 6, 7, 8, 9, 10], leaving open the question of whether GCNs can be effectively applied to other domains. Efforts to incorporate deep learning methods such as Long Short-Term Memory (LSTM) and GCNs to capture the temporal and structural dependencies between topics [11] have shown promise for sophisticated forecasting of topic trends, but there are still limitations to consider. Notably, these models have yet to incorporate topic modeling [12], a method that can be used to identify the main themes and sub-topics in a corpus of documents and can help provide a more nuanced understanding of the research field. Therefore, there is a need to develop more advanced methods that integrate deep learning techniques with topic modeling analysis to better capture the complex dynamics of topic trends and the intellectual structure of research fields. Such methods could potentially improve the accuracy of forecasting topic trends and provide valuable insights for researchers and decision-makers in various domains.</p>
<p>To address these limitations, we propose a new approach for forecasting future topic trends in the blockchain domain, which combines topic modeling techniques and GCNs. Given the rapid evolution of the blockchain field and the importance of accurate topic forecasting, our proposed method has implications for researchers, businesses, and policymakers.</p>
      <p>Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 3rd AI + Informetrics (EEKE-AII2023), June 26, 2023, Santa Fe, New Mexico, USA and Online. * Corresponding author. † These authors contributed equally. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).</p>
      <sec id="sec-1-2">
        <title>2. Embed keywords in vocab using Bert embedding</title>
        <sec id="sec-1-2-2">
          <title>Topic 1</title>
<p>To identify the most suitable GCN model for our task of topic trend forecasting, we analyze several GCN models, including the diffusion convolutional recurrent neural network (DCRNN), the temporal graph convolutional network (T-GCN), the attention based spatial-temporal graph convolutional network (ASTGCN), and the attention temporal graph convolutional network (A3T-GCN). DCRNN [7] employs a bidirectional random walk to capture spatial dependencies. T-GCN [8] combines GCNs with a Gated Recurrent Unit (GRU) to capture both spatial and temporal dependencies. ASTGCN [9] employs an attention mechanism to capture dynamic correlations in the spatial and temporal dimensions. A3T-GCN [10] uses the attention mechanism to improve T-GCN. Among these models, we select A3T-GCN, which appropriately captures the global variation trend by re-weighting the influence of historical information.</p>
          <p>The remainder of this paper is structured as follows: Section 2 presents our proposed methodology, Section 3 details our experimental findings, and finally, we conclude in Section 4.</p>
<p>To conduct our research, we collect paper data from the Scopus database over a five-year period and extract titles, abstracts, and keywords. After preprocessing the collected data, we employ Dirichlet Multinomial Regression (DMR) [13] and Latent Dirichlet Allocation (LDA) [14] to identify potential topics. We then apply agglomerative clustering [15, 16] to the resulting topic keywords from both models. Next, we construct two distinct graphs: the first, known as the random subgraph, comprises keywords with a count of 10 or more, encompassing both topic keywords and other keywords; the second, referred to as the topic graph, consists solely of the topic keywords. The graph reconstruction process incorporates node features, including word count and centralities, as well as edge weights derived from co-occurrence analysis. These node features and edge weights are updated on a monthly basis, with changes in keyword word count taken as indications of shifts in topic trends. Using the random subgraph, we train the A3T-GCN model to forecast topic trends at different time intervals, specifically 1, 3, 6, 9, and 12 months into the future. We use the topic graph to predict future topic trends in the blockchain domain at a point t + n (where n represents the time interval, such as 1, 3, 6, 9, or 12 months). These predictions are based on the pre-trained A3T-GCN model.</p>
          <p>2. Methodology</p>
          <p>Figure 1 shows the overall workflow of the current study.</p>
          <p>2.1. Data Preparation</p>
          <p>We collect data from research papers published between January 1st, 2017, and December 31st, 2022, from the Scopus database using the search query "Blockchain or Block-chain" (Figure 1). This search yields a total of 192,519 research papers. From these papers, we extract relevant information such as titles, abstracts, and keywords. To prepare the extracted data for further analysis, we perform several preprocessing steps. First, we convert all the text to lowercase. Then, we divide the text into sentence units using the Natural Language Toolkit (NLTK) library. Subsequently, we tokenize the sentences into words and employ NLTK for part-of-speech tagging. We specifically retain words tagged as nouns, since they are typically more informative for our analysis. In the final preprocessing step, we filter out stopwords, which include commonly used words, meaningless words, and major topic words. Stopwords occur frequently and contribute little to the overall understanding of the text; by removing them, we aim to focus on more relevant and meaningful terms within the dataset.</p>
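<p>The preprocessing steps described above (lowercasing, sentence splitting, tokenization, POS tagging, noun retention, stopword removal) can be sketched as follows. This toy version substitutes a regex tokenizer, a hardcoded POS lookup, and a small stopword list for the NLTK sent_tokenize, word_tokenize, pos_tag calls and stopword corpus used in the study; the lookup tables are illustrative assumptions, not the authors' resources.

```python
import re

# Stand-ins for NLTK resources (illustrative only).
POS = {"blockchain": "NN", "networks": "NNS", "transform": "VB",
       "finance": "NN", "rapidly": "RB", "systems": "NNS"}
# Stopwords include common words and major topic words such as "blockchain".
STOPWORDS = {"the", "and", "can", "blockchain"}

def preprocess(text):
    text = text.lower()                          # 1. lowercase
    sentences = re.split(r"[.!?]\s*", text)      # 2. sentence split (cf. nltk.sent_tokenize)
    kept = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence)  # 3. tokenize (cf. nltk.word_tokenize)
        for tok in tokens:
            tag = POS.get(tok, "")                # 4. POS tagging (cf. nltk.pos_tag)
            if tag.startswith("NN") and tok not in STOPWORDS:
                kept.append(tok)                  # 5. keep non-stopword nouns
    return kept
```

With NLTK installed, the regex calls and lookup tables would be replaced by the corresponding library calls.</p>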
<p>We focus on the blockchain field, chosen for its potential to bring about transformation in sectors such as finance, supply chain, and healthcare. Our objective is to track leading keywords and identify the main topics that drive industry development. Specifically, we aim to address the research question: which primary blockchain topics will have a substantial influence on the future of the blockchain industry? This question is crucial for understanding the key factors that will shape the industry’s trajectory.</p>
          <p>2.2. Topic Modeling and Clustering</p>
          <p>In this study, we utilize two widely used topic modeling methods, Latent Dirichlet Allocation (LDA) and Dirichlet Multinomial Regression (DMR). While some previous works have employed only DMR or only LDA [17, 18], others [19] have shown that combining both methods can yield more effective results. Therefore, we utilize both LDA and DMR to generate topics and obtain the topic distribution throughout the literature. By employing topic modeling, we are able to identify the topics present in the entire document set, determine the relative proportion of each topic in every document, and analyze the distribution of words associated with each topic.</p>
<p>Selecting the optimal number of topics is a crucial step in topic modeling, as it significantly affects the model’s performance during training. Generally, the perplexity and coherence measures are used to determine the optimal number of topics. Lower perplexity indicates more accurate predictions, while higher coherence indicates better semantic consistency in the topic results. Therefore, one [20] or both [21, 22] measures can be used to determine the optimal number of topics. In this study, we utilize both coherence and perplexity as indicators: to identify the ideal number of topics, we search for the intersection point where the coherence value increases while the perplexity value decreases rapidly.</p>
          <p>Agglomerative clustering is employed for topic merging, utilizing the average linkage method with cosine as the affinity measure. The distance threshold is set to 0.05, as it yields the highest silhouette score.</p>
          <p>2.3. Graph Reconstruction</p>
          <p>2.3.1. Data for document graph</p>
          <p>For the document graph, we perform a word count analysis on the preprocessed words in the corpus. We consider only words with a count exceeding 10 throughout the whole time span, aiming to focus on more meaningful and informative words.</p>
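<p>The agglomerative merging step (average linkage over cosine distance, with the 0.05 distance threshold described above) can be sketched without dependencies as follows; in practice a library implementation such as scikit-learn's AgglomerativeClustering would be used, and the toy vectors in the test are assumptions.

```python
from itertools import combinations
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def agglomerate(vectors, threshold=0.05):
    """Average-linkage agglomerative clustering with a distance threshold."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for (ai, a), (bi, b) in combinations(list(enumerate(clusters)), 2):
            # Average linkage: mean pairwise distance between two clusters.
            d = sum(cosine_distance(vectors[i], vectors[j])
                    for i in a for j in b) / (len(a) * len(b))
            if best is None or best[0] > d:
                best = (d, ai, bi)
        d, ai, bi = best
        if d > threshold:   # no pair is closer than the threshold: stop merging
            break
        clusters[ai].extend(clusters[bi])
        del clusters[bi]
    return clusters
```

Topics whose embeddings lie within cosine distance 0.05 of each other (on average) end up merged into one cluster.</p>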
<p>To achieve a more diverse and precise set of topics, we employ a clustering approach to merge the topics generated by LDA and DMR. Our approach is inspired by previous research [19] that utilized cosine similarity to merge the results from LDA and DMR. In this study, we conduct experiments comparing two methods for obtaining the embedding value of each topic: element-wise multiplication and sentence embedding.</p>
          <p>To calculate monthly co-occurrence, we examine pairs of words at the document level for every month. For each document, we count all combinations of pairs of words with equal weight, disregarding repeated occurrences of identical words to avoid any potential skew in the co-occurrence calculation that may result from variations in document length. All word counts and co-occurrences are calculated on a monthly basis.</p>
<p>The element-wise multiplication method involves embedding keywords and multiplying them with the one-hot encoding of each topic, while the sentence embedding method treats the keywords within a topic as a sentence and obtains an embedding value for the entire topic. Our findings reveal that the element-wise multiplication method achieves a higher silhouette score of 0.8038 during clustering. Additionally, when examining the resulting keywords from topic clustering under both approaches, the element-wise multiplication method outperforms the alternative by generating more cohesive clusters with semantically similar keywords. Considering these results and the improved clustering performance, we select the element-wise multiplication method as the preferred approach for constructing topic embeddings.</p>
          <p>2.3.2. Document graph</p>
          <p>To reconstruct a time-serial document graph (Figure 3 (a)), we employ word count and co-occurrence data on a monthly basis. In the graph, every word node is annotated with its monthly and whole-time word count, while the co-occurrence edges are annotated with the monthly and whole-time co-occurrence values between the words they connect. To evaluate the centrality of nodes, we employ methods such as degree centrality, betweenness centrality, and closeness centrality [26]. To impose edge distances between nodes, we use inverted co-occurrence counts when calculating centralities.</p>
<p>Initially, topics are generated using both LDA and DMR, with each topic consisting of a set of keywords with similar meanings. Then, we merge similar topics using the following steps, which are presented in Figure 2:</p>
          <p>1. Create a vocab by combining the keywords of the topics generated by DMR and LDA.
2. Embed the keywords in the vocab using Bert [23, 24, 25].
3. Perform one-hot encoding with vocab size for each topic composed of a set of keywords.
4. To obtain the embedding for each topic, multiply the one-hot encoded representation of the topic with the corresponding Bert embeddings of the keywords in the topic.
5. Cluster topics with similar meanings using agglomerative clustering based on the cosine similarity of the topic embedding values, while avoiding the inclusion of duplicated words.</p>
          <p>2.3.3. Random and topic subgraphs</p>
          <p>In order to predict the topic trend and its corresponding keywords, it is necessary to utilize all the word nodes and their corresponding edges from the whole document graph during training. However, due to memory limitations, it becomes necessary to restrict data utilization.</p>
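<p>The one-hot element-wise multiplication used to build topic embeddings (steps 3 and 4 of Figure 2) can be sketched with numpy as follows; the vocab and the random matrix standing in for BERT keyword embeddings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["ledger", "consensus", "wallet", "privacy"]
# Stand-in for BERT keyword embeddings, one row per vocab word; real values
# would come from a BERT encoder.
E = rng.normal(size=(len(vocab), 8))

def topic_embedding(topic_keywords):
    one_hot = np.array([1.0 if w in topic_keywords else 0.0 for w in vocab])
    # Element-wise multiplication by the one-hot mask zeroes out rows for
    # words outside the topic; averaging the kept rows gives the topic vector.
    masked = E * one_hot[:, None]
    return masked.sum(axis=0) / one_hot.sum()
```
</p>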
<p>[Figure 3: time-serial document graph construction from t0 (2017 Jan.) to tn (2022 Dec.); (a) document graph with node label (word count), node features (degree, betweenness, and closeness centrality), and edge weight (co-occurrence).]</p>
          <p>For this, we employ randomly clustered or selected subgraphs that contain a sufficient number of nodes to cover the entire document graph (Figure 3 (b)).</p>
<p>To construct the random subgraphs, we initially extract word nodes using the random walk method, which is a common node sampling technique in graph analysis and machine learning tasks [27, 28] (Figure 3 (b)). The number of nodes in each subgraph is randomly chosen between 8 and 20, as the number of nodes in the topic clustering results ranges from 10 to 15. For random selection of nodes, a seed node is randomly chosen from the document graph and used to initialize a random node pool. Based on the seed node, further nodes are appended to the pool one at a time, ensuring that each newly selected node is connected to the pool. The randomness of the selection process for a new node is weighted by the connectivity between the random node pool and the candidate nodes. Then, the selected nodes' annotations and edge features are extracted from the document graph to reconstruct a time-serial random subgraph. Through this random node selection and time-serial subgraph reconstruction process, we construct 2,000 random subgraphs for each of the training, validation, and test datasets. We use early time epoch data (the first 36 months) of the time-serial random subgraphs for the training and validation datasets, and late time epoch data (the later 36 months) for the test dataset, to ensure time-independence between the training/validation and test timelines (Figure 3 (b), upper right).</p>
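<p>The connectivity-weighted pool growth described above can be sketched as follows; the adjacency structure and names are illustrative, and this is a simplification of the random-walk sampling, not the authors' implementation.

```python
import random

def sample_connected_subgraph(adj, size, rng):
    """Grow a connected node pool from a random seed node; each candidate
    outside the pool is drawn with probability proportional to its total
    edge weight into the current pool."""
    pool = [rng.choice(sorted(adj))]
    while size > len(pool):
        weights = {}
        for u in pool:
            for v, w in adj[u].items():
                if v not in pool:
                    weights[v] = weights.get(v, 0) + w
        if not weights:   # the connected component is exhausted
            break
        candidates = sorted(weights)
        pool.append(rng.choices(candidates, [weights[c] for c in candidates])[0])
    return pool
```

Every node after the seed is guaranteed to share an edge with an earlier pool member, so the sampled subgraph stays connected.</p>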
<p>For the time-serial topic subgraphs, we extract each corresponding keyword-related node and edge feature from the document graph. The extracted features are used to reconstruct a time-serial topic subgraph for each topic (Figure 3 (c)). The time-serial topic subgraphs over the test time span (the later 36 months) are used for forecasting based on the pre-trained A3T-GCN.</p>
<p>The features provided to the A3T-GCN include node features (word count and centralities) and edge weights (co-occurrence) for both the random subgraphs and the topic graphs on a monthly basis. Since the number of nodes varies across subgraphs, we add placeholder nodes to each node pool and impute them with zero values for nodes and null values for related edges.</p>
          <p>2.4. Topic Trend Forecasting</p>
          <p>In this study, we use the A3T-GCN model, which can effectively capture global variation trends by re-weighting the influence of historical information. Our approach involves constructing an A3T-GCN over nodes that represent keywords in the graph. Each node has features that reflect the word count of its corresponding keyword and the keyword’s centrality within the graph.</p>
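<p>The placeholder-node padding that gives every subgraph the same shape can be sketched with numpy; shapes and names are illustrative.

```python
import numpy as np

def pad_subgraph(features, max_nodes):
    """Pad a (num_nodes, num_feats) node-feature matrix with zero-valued
    placeholder rows so that all subgraphs share one fixed shape."""
    num_nodes, num_feats = features.shape
    padded = np.zeros((max_nodes, num_feats))
    padded[:num_nodes] = features
    return padded
```
</p>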
<p>The edge weight between nodes is determined by the co-occurrence of keyword pairs. To capture changes in topic trends, we update the node features and edge weights on a monthly basis, as changes in keyword word count are assumed to indicate changes in topic trends. By predicting changes in word count using information such as keyword centrality and co-occurrence, our proposed A3T-GCN model offers an effective approach for accurately forecasting topic trends over time. Therefore, our model is a valuable tool for a wide range of applications.</p>
<p>[Figure residue: topic 6 keyword labels (healthcare, health insurance, education, patients, privacy, patient, record, records, care); Figure 5: (a) the whole-timeline word count for each topic of papers, (b) the average word count by month for paper topics; panels A-D: word counts and mean word counts by month for topics in papers and patents.]</p>
          <p>2.4.1. Training A3T-GCN model</p>
          <p>To optimize the A3T-GCN model, we perform feature selection on the node features and conduct hyperparameter optimization. This allows us to identify the most relevant features and tune the model parameters for improved performance. Subsequently, we train individual models to predict word count for future time periods, specifically 1, 3, 6, 9, or 12 months ahead. The training process utilizes random subgraphs, with a fixed training lookback window of 12 months. To facilitate the training and evaluation of the models, the random subgraphs are divided into distinct time steps. The initial 36 months of data are designated for training and validation, while the subsequent 36 months are used for testing. For the training and validation phase, we utilize 2,000 random subgraphs extracted from the first 36 months of the overall dataset. Similarly, for testing, we employ 2,000 random subgraphs from the later 36 months.</p>
          <p>2.4.2. Forecasting of the topic</p>
          <p>Using the models pre-trained on the random subgraphs, we conduct forecasting on the topic graphs. The forecasting process involves predicting the outcome for future time periods, specifically 1, 3, 6, 9, and 12 months ahead. This forecasting is carried out using a fixed lookback window of 12 months, meaning the model uses the past 12 months of topic graph data to make predictions for the future (Figure 3 (c)). To keep the topic-forecasting time span separate from training and validation, we use the later 36 months of topic features, which is the identical timeline to the test dataset.</p>
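<p>The fixed 12-month lookback with a variable forecasting horizon can be sketched as a simple windowing helper; this is an illustrative sketch, not the authors' data loader.

```python
def make_windows(series, lookback=12, horizon=1):
    """Build (history, target) pairs from a monthly series: `lookback`
    consecutive months of data predicting the value `horizon` months
    after the window ends."""
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        history = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((history, target))
    return samples
```

For a 72-month timeline, restricting `series` to the first or last 36 months reproduces the train/validation vs. test split described above.</p>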
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Experiment Results</title>
<p>3.1. Environments</p>
      <p>3.2. Topic Modeling and Clustering</p>
      <p>We used the method described in Section 2.2 to determine the optimal number of topics for both the LDA and DMR models. The optimal number of topics was found to be 10. After generating topics using both models, we performed clustering as outlined in Figure 2, and the result of the clustering is presented in Table 1. To evaluate the clustering performance, we used the Silhouette Score [29, 30, 31], which is a commonly used method for clustering evaluation. The Silhouette Score obtained in this study was 0.8038, which exceeds the threshold of 0.5, indicating good clustering performance.</p>
      <p>3.3. Time-series Graphs and Features</p>
      <p>We constructed the document graph using the word count and co-occurrence of the data and extracted topic-specific subgraphs from the document graph (Figure 4), and three types of centralities were calculated at each time point. As the timeline of the collected data has 72 time points, 72 time-specific subgraphs with word count, co-occurrence, and pre-calculated centralities were extracted for each topic. As shown in Figure 4, the co-occurrence between “health”, “record”, and “care” was dominant from 2017 to 2019 but gradually decreased, while “health” and “privacy” were relatively more dominant in 2020 to 2022. As a result, the structure of co-occurrence for topic graphs is not static across all pairs of nodes, but shows partial structural movement over time.</p>
      <p>3.4. Topic Trend Forecasting</p>
      <p>We assessed the performance of the A3T-GCN model using two evaluation metrics, Mean Squared Error (MSE) and Mean Absolute Error (MAE) [32]:</p>
      <p>MSE = (1/n) ∑ᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²,  (1)</p>
      <p>MAE = (1/n) ∑ᵢ₌₁ⁿ |Yᵢ − Ŷᵢ|,  (2)</p>
      <p>where Yᵢ is the i-th element of Y, Ŷᵢ is the corresponding prediction, and n is the number of elements.</p>
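<p>These two metrics can be computed directly with numpy; this is a generic sketch, not the authors' evaluation code.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of squared prediction errors."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(diff ** 2))

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute prediction errors."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))
```
</p>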
<p>We further investigated the month-specific trend to analyze the seasonality of the documents (Figure 5). Figure 5 (a) and Figure 5 (b) demonstrate that the word count of all topics exhibited a noticeable increase in January of each year compared to other months. However, even when excluding the January papers, we observed elevated word count tendencies in other months such as July and December (Figure 5 (b)). To account for this seasonality, we fix the training lookback window at 12 months, covering a full yearly cycle.</p>
      <p>3.4.1. Training A3T-GCN model</p>
      <p>To optimize the A3T-GCN model, we performed feature selection by trying various node feature combinations. Along with feature selection, we conducted hyperparameter optimization by varying the learning rate over the values 1e-2, 1e-3, and 1e-4. Table 2 displays the results of the feature selection procedure, which were assessed through MSE and MAE. Based on these results, we selected the node feature combination that yielded the lowest MSE value, which included betweenness centrality and closeness centrality as node features.</p>
      <p>After conducting feature selection and hyperparameter optimization, we trained the A3T-GCN model to forecast future trends for horizons of 1, 3, 6, 9, and 12 months using a training dataset and a validation dataset consisting of 2,000 random subgraphs. As mentioned previously, we fixed the training lookback window at 12. The performance of the model was evaluated on a test dataset consisting of 2,000 random subgraphs, and the results of the evaluation are presented in Table 3 with the evaluation metrics MSE and MAE.</p>
      <p>3.4.2. Forecasting of the topic</p>
      <p>We utilized the pre-trained A3T-GCN model to perform topic trend forecasting on topic graphs for each forecasting horizon. Table 4 presents the forecasting results using the evaluation metrics MSE and MAE. As the forecasting horizon increased, the MSE and MAE values also increased, but we observed an exceptional case for the forecasting horizon of 3, which had the lowest MSE and MAE values.</p>
      <p>Table 4: Forecasting results on topic graphs with the pre-trained A3T-GCN model.
Forecasting horizon | MSE | MAE
1 | 0.01342 | 0.08510
3 | 0.00820 | 0.07850
6 | 0.01618 | 0.09025
9 | 0.02926 | 0.10925
12 | 0.03055 | 0.10695</p>
      <p>[Figure: ground truth vs. forecasting lines for forecasting horizons 3, 6, 9, and 12.]</p>
<p>We present the results of our topic trend forecasting using visualizations that depict the actual and predicted word count of keywords for each topic. The blue line in each graph shows the actual frequency of keywords in the topic, while the orange line represents the predicted word count. Figure 6 shows the results for Topic 6, while Appendix A provides the results for all topics. As illustrated in Figure 6, the predicted line closely matches the ground truth line, indicating the effectiveness of the topic trend forecasting. To provide a more comprehensive understanding of our topic trend forecasting models, we also generated visualizations of the mean word count for each topic across the forecasting horizons, which are presented in Figure 7. The predicted mean word count exhibits similar trends and values to the actual mean word count across all forecasting horizons.</p>
      <p>[Figure 7 / Appendix B: mean of word count for the topic trend forecasting models by forecasting horizon, shown for Topics 02, 05, 08, and 11.]</p>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusions</title>
<p>In this paper, we propose a novel approach for forecasting future topic trends in the blockchain domain using a combination of topic modeling techniques and graph convolutional networks (GCNs). In the application of our approach to the paper data, the GCN model shows strong performance in predicting topic trends, even though it was trained using random subgraphs of the overall document graph. The proposed approach addresses the limitations of previous studies by capturing the complex dynamics of topic trends and the intellectual structure of research fields.</p>
      <p>This study shows that paper data have seasonality that can be leveraged for experiments. Our methodology significantly enhances the prediction of topic trends, as demonstrated by the experimental results. This approach has implications for researchers, businesses, professionals, and policymakers, as it can provide valuable insights for making informed predictions about the future of the rapidly evolving blockchain field. Although our approach has not been extensively explored in previous studies, our experiments demonstrate its potential for forecasting future topic trends.</p>
      <p>Furthermore, we attempted to apply our approach to patent data using a pre-trained A3T-GCN model; however, the results did not meet our expectations. As a result, we are currently unable to apply our model to data sources other than academic papers. Our next step involves the design and training of GCN models tailored for forecasting topic trends in patent and news data. We aim to explore various architectural designs and hyperparameters to improve the accuracy and robustness of the models. Moreover, we plan to compare our proposed approach with other state-of-the-art time-series methodologies, including both deep-learning and traditional methods, to demonstrate its effectiveness in future research. Our ultimate goal is to contribute to the advancement of research in the blockchain domain and related fields by providing a powerful and reliable tool for trend forecasting and analysis. Our proposed approach has the potential to be applied to a wide range of real-world applications, such as financial forecasting, risk management, and market trend analysis.</p>
      <p>5. Acknowledgments</p>
      <p>This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2022R1A2B5B02002359).</p>
      <p>References</p>
      <p>[1] S. M. H. Bamakan, A. B. Bondarti, P. B. Bondarti, Q. Qu, Blockchain technology forecasting by patent analytics and text mining, Blockchain: Research and Applications 2 (2021) 100019.
[4] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, S. Y. Philip, A comprehensive survey on graph neural networks, IEEE Transactions on Neural Networks and Learning Systems 32 (2020) 4–24.
[5] W. Jiang, J. Luo, Graph neural network for traffic forecasting: A survey, Expert Systems with Applications (2022) 117921.
[6] X. Yin, D. Yan, A. Almudaifer, S. Yan, Y. Zhou, Forecasting stock prices using stock correlation graph: A graph convolutional network approach, in: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021, pp. 1–8.
[7] Y. Li, R. Yu, C. Shahabi, Y. Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, arXiv preprint arXiv:1707.01926 (2017).
[8] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, H. Li, T-GCN: A temporal graph convolutional network for traffic prediction, IEEE Transactions on Intelligent Transportation Systems 21 (2019) 3848–3858.
[9] S. Guo, Y. Lin, N. Feng, C. Song, H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 922–929.
[10] J. Bai, J. Zhu, Y. Song, L. Zhao, Z. Hou, R. Du, H. Li, A3T-GCN: Attention temporal graph convolutional network for traffic forecasting, ISPRS International Journal of Geo-Information 10 (2021) 485.
[11] M. Xu, J. Du, Z. Xue, Z. Guan, F. Kou, L. Shi, A scientific research topic trend prediction model based on multi-LSTM and graph convolutional network, International Journal of Intelligent Systems 37 (2022) 6331–6353.
[12] I. Vayansky, S. A. Kumar, A review of topic modeling methods, Information Systems 94 (2020) 101582.
[13] D. M. Mimno, A. McCallum, Topic models conditioned on arbitrary features with dirichlet-multinomial regression, in: UAI, volume 24, Citeseer, 2008, pp. 411–418.
[14] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent dirichlet allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[15] F. Murtagh, P. Legendre, Ward's hierarchical agglomerative clustering method: which algorithms implement Ward's criterion?, Journal of Classification 31 (2014) 274–295.
[2] Y. Zou, T. Meng, P. Zhang, W. Zhang, H. Li, Fo- [16] D. Müllner, Modern hierarchical,
agglomercus on blockchain: A comprehensive survey on ative clustering algorithms, arXiv preprint
academic and application, IEEE Access 8 (2020) arXiv:1109.2378 (2011).</p>
      <p>187182–187201. [17] H. Kim, H. Park, M. Song, Developing a topic-driven
[3] T. N. Kipf, M. Welling, Semi-supervised classifi- method for interdisciplinarity analysis, Journal of
cation with graph convolutional networks, arXiv Informetrics 16 (2022) 101255.
preprint arXiv:1609.02907 (2016). [18] K. Porter, Analyzing the darknetmarkets subreddit
for evolutions of tools and trends using lda topic
modeling, Digital Investigation 26 (2018) S87–S97.
[19] H. Lee, J. Kwak, M. Song, C. O. Kim, Coherence
analysis of research and education using topic
modeling, Scientometrics 102 (2015) 1119–1137.
[20] S. Boon-Itt, Y. Skunkan, et al., Public perception
of the covid-19 pandemic on twitter: sentiment
analysis and topic modeling study, JMIR Public</p>
      <p>Health and Surveillance 6 (2020) e21978.
[21] Y. Fang, Y. Guo, C. Huang, L. Liu, Analyzing and
identifying data breaches in underground forums,</p>
      <p>IEEE Access 7 (2019) 48770–48777.
[22] M. Hasan, A. Rahman, M. R. Karim, M. S. I. Khan,</p>
      <p>M. J. Islam, Normalized approach to find optimal
number of topics in latent dirichlet allocation (lda),
in: Proceedings of International Conference on
Trends in Computational and Cognitive
Engineering: Proceedings of TCCE 2020, Springer, 2021, pp.</p>
      <p>341–354.
[23] Q. Xie, X. Zhang, Y. Ding, M. Song,
Monolingual and multilingual topic analysis using lda and
bert embeddings, Journal of Informetrics 14 (2020)
101055.
[24] S. Sia, A. Dalmia, S. J. Mielke, Tired of topic
models? clusters of pretrained word embeddings
make for fast and good topics too!, arXiv preprint
arXiv:2004.14914 (2020).
[25] D. Miller, Leveraging bert for extractive text
summarization on lectures, arXiv preprint
arXiv:1906.04165 (2019).
[26] J. Zhang, Y. Luo, Degree centrality, betweenness
centrality, and closeness centrality in social
network, in: 2017 2nd international conference on
modelling, simulation and applied mathematics
(MSAM2017), Atlantis press, 2017, pp. 300–303.
[27] J. D. Noh, H. Rieger, Random walks on complex</p>
      <p>networks, Physical review letters 92 (2004) 118701.
[28] F. Fouss, A. Pirotte, J.-M. Renders, M. Saerens,</p>
      <p>Random-walk computation of similarities between
nodes of a graph with application to collaborative
recommendation, IEEE Transactions on knowledge
and data engineering 19 (2007) 355–369.
[29] P. J. Rousseeuw, Silhouettes: a graphical aid to
the interpretation and validation of cluster analysis,
Journal of computational and applied mathematics
20 (1987) 53–65.
[30] K. R. Shahapure, C. Nicholas, Cluster quality
analysis using silhouette score, in: 2020 IEEE 7th
international conference on data science and advanced
analytics (DSAA), IEEE, 2020, pp. 747–748.
[31] G. Ogbuabor, F. Ugwoke, Clustering algorithm for
a healthcare dataset using silhouette score value,</p>
      <p>Int. J. Comput. Sci. Inf. Technol 10 (2018) 27–37.
[32] A. Jadon, A. Patil, S. Jadon, A comprehensive survey
of regression based loss functions for time series
forecasting, arXiv preprint arXiv:2211.02989 (2022).</p>
      <p>A. Visualization of the topic trend forecasting Results by forecasting
horizon for Each Topic</p>
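      <p>The conclusion's claim that topic-frequency series in paper data exhibit seasonality can be illustrated with a minimal, self-contained check. The sketch below is hypothetical: the synthetic monthly series, its amplitude, and its 12-month period are illustrative assumptions, not the study's actual topic counts. A strong positive sample autocorrelation at lag 12 (one year) together with a negative one at lag 6 (half a year) is the signature of an annual cycle.</p>

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

# Synthetic monthly topic-frequency counts with a 12-month cycle
# (illustrative stand-in for a real topic's publication counts).
counts = [20 + 8 * math.sin(2 * math.pi * t / 12) for t in range(96)]

r12 = autocorrelation(counts, 12)  # full-period lag: strongly positive
r6 = autocorrelation(counts, 6)    # half-period lag: strongly negative
print(f"lag-12 autocorrelation: {r12:.2f}")
print(f"lag-6 autocorrelation:  {r6:.2f}")
```

      <p>On a real topic's time series, the same lag-12 test can be applied to the residuals after detrending, since a strong upward publication trend would otherwise inflate autocorrelation at every lag.</p>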
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>