Technological Forecasting Based on Spectral Clustering for
                                Word Frequency Time Series
                                Han Huang1, Xiaoguang Wang2,3 and Hongyu Wang1,*

                                1 School of Management, Wuhan University of Technology, Wuhan, China, 430070

                                2 School of Information Management, Wuhan University, Wuhan, China, 430072

                                3 Institute of Big Data, Wuhan University, Wuhan, China, 430072


                                                   Abstract
                                                   As an essential strategy for identifying technologies that should be given priority for future
                                                   development, the investigation into methods of technological forecasting holds considerable
                                                   importance. This study introduces a novel method for technological forecasting, the Time Trend
                                                   Clustering Model (TTCM) based on spectral clustering, and engages in an analysis and discussion
                                                   utilizing word frequency time series. To verify the efficacy of the model, this study initially applies
                                                   the TTCM model to analyze standard time series datasets. The experimental findings indicate the
                                                   model's effectiveness in distinguishing time series data with identical trends of variation. Further,
                                                   taking the Library and Information Science (LIS) discipline as an example, this study employs the
                                                   TTCM model to cluster the trends of word frequency time series, identifying emerging words
                                                   with burst trends, label words with high-frequency fluctuation trends, hotspot words with
                                                   increasing trends, and fading words with decreasing trends. By integrating the term function, the
                                                   effectiveness of the TTCM model in the discovery of domain knowledge and technological
                                                   forecasting is demonstrated.

                                                   Keywords
                                                   Technological forecasting, time series, temporal trend clustering, spectral clustering, term
                                                   frequency analysis


1. Introduction

In the current era, the development of the socio-economic landscape relies more heavily on the
capability and efficacy of scientific and technological innovation than at any time before[1].
Nations, regions, organizations, and corporations alike are dedicating efforts towards the
strategic planning and foresight of science and technology, evaluating the potential directions
of technological revolutions, selecting key frontier areas of science and technology, and
establishing innovation systems that align with their own realities in an attempt to secure a
proactive and advantageous position in future competition[2–4]. In this context, the
significance of technological forecasting has become increasingly prominent.

From the perspective of knowledge management, technological forecasting is a process that
involves the continuous refinement, filtering, discovery, and creation of knowledge based on the
mining of a vast amount of data information (explicit knowledge) and expert experience (tacit
knowledge), which then systematically selects research areas and general


                                Joint Workshop of the 5th Extraction and Evaluation of
                                Knowledge Entities from Scientific Documents and the 4th AI +
                                Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun,
                                China and Online
                                * Corresponding author.
                                   huanghan@whu.edu.cn (H. Huang); wxguang@whu.edu.cn (X.
                                Wang); hongyuwang@whut.edu.cn (H. Wang)
                                   0000-0002-1517-9731 (H. Huang); 0000-0003-1284-7164 (X.
                                Wang); 0000-0002-5063-9166 (H. Wang)
                                             © Copyright 2024 for this paper by its authors. Use permitted under
                                             Creative Commons License Attribution 4.0 International (CC BY 4.0).


technologies of strategic significance[5]. In an environment where the indices of scientific
literature and patents are growing exponentially, and the hardware and software levels of
technologies such as big data and artificial intelligence are continuously improving[6,7],
leveraging big data analytics to mine scientific texts and identify different patterns of
technological development, supplemented by expert judgement to evaluate the future trends of
technology, constitutes a crucial implementation path for technological forecasting[8,9]. Among
these, the automated determination of technological evolution stages is an initial problem that
needs to be addressed.

Word frequency serves as a fundamental indicator reflecting the popularity and activity level of
scientific and technological fields[10,11], with its temporal trends effectively revealing the
dynamics of scientific and technological development[12,13]. Some studies utilize word frequency
analysis to understand the hotspots, frontiers, and their changes within specific disciplines or
technological areas by analyzing high-frequency words, new word retention rates, and time series
trends[10,14], often relying on the intervention of expert knowledge for manual interpretation of
these temporal trends. While some researchers have employed statistical tests like the
Mann-Kendall test[15], as well as curve clustering methods such as the nearest-neighbor
propagation algorithm[16], to analyze the time trends of word frequency sequences in a
(semi-)automated manner, these studies typically use small datasets and identify relatively
simple evolutionary patterns. Indeed, the variation of word frequency within a specific time
window can be considered a typical time series[17,18], allowing for the analysis of changing
patterns using time series trend clustering models. By detecting word frequency trends such as
bursts, growth, sudden drops, and declines, it is possible to reflect the evolutionary stages of
technological points. Further integrating the different growth patterns of various technological
points within a tech field, combined with expert knowledge, facilitates the foresight of key,
common, and emerging technologies in the technological domain.

To this end, this study introduces TTCM and employs this model to analyze word frequency time
series for technological forecasting. TTCM integrates the Dynamic Time Warping (DTW) algorithm
with spectral clustering, enabling the automatic clustering of time series with similar evolution
trends. To verify the model's effectiveness, this study first applied the TTCM model to cluster
standard time series datasets from the UCI repository, demonstrating TTCM's capability to
effectively differentiate time series data with similar evolution trends. Furthermore, taking the
LIS discipline as an example, this study used the TTCM model to analyze the trends in word
frequency time series, identifying four types of word frequency temporal trends: burst,
increasing, decreasing, and high-frequency fluctuation. Based on these findings, the study
analyzed the future research trends in the LIS discipline, further validating the scientific
relevance and applicability of the TTCM model in technological forecasting.

2. Literature review

2.1. The methods of technological forecasting & foresight

Technology foresight has evolved from large-scale technological prediction activities,
specifically the Delphi survey[5]. With the rapid development of science and technology, the
continuous changes in the economic and social environment, and the ongoing accumulation of
diverse and heterogeneous scientific and technological data, the methods and tools for
technology foresight have gradually diversified[3]. The methods of technology foresight can
primarily be categorized into two types: one is driven by expert experience and wisdom,
primarily qualitative in nature; the other is driven by data and technology, primarily
quantitative in nature.

In qualitative-oriented technology foresight studies, the Delphi method is the most widely used
research approach[19]. Countries such as Japan, Germany, South Korea, and China have all
conducted national-level technology foresight activities based on the Delphi survey[20,21],
which has been extensively applied in various technological fields including agriculture,
environment, healthcare, and ICT[22]. Besides the Delphi method, commonly used approaches also
include technology roadmapping, scenario analysis, brainstorming, morphological analysis, and
the Analytic Hierarchy Process (AHP)[23–26]. The advantage of these methods lies in their
ability to fully leverage expert experience. However, due to their strong subjective nature and
the high requirements for the number of experts, their fields of expertise, and their
experience, as well as the significant amount of time and expense involved, these methods are
increasingly questioned and gradually becoming unsuitable in the information age, characterized
by an explosive growth in data volume.

The quantitative methods of data and technology-driven technology foresight primarily involve
extracting valuable information from vast datasets to construct systematic foresight
models[3]. These


methods identify effective information for technology foresight through the mining and
visualization of scientific literature, patents, technical reports, news, etc., covering aspects
such as theme identification, current state assessment, gap analysis, and trend prediction. Key
techniques include growth curves[27], bibliometrics[28], patent analysis[29], social network
analysis[30], data envelopment analysis[31], and data mining methods such as clustering,
classification, and regression[32–34]. By leveraging the mining of objective data such as
literature and patents, these methods reduce reliance on experts to some extent. However, they
may also lead to decreased applicability and effectiveness in decision support due to the lack
of expert experience and dependency on technological pathways.

2.2. Time series clustering analysis

Time series analysis aims at mining useful information and knowledge from a large number of
complex time series data, among which cluster analysis is one of the important methods of time
series data mining[35]. Time series clustering analysis has been applied to the analysis and
mining of stock data[36], social media data[37], Landsat time series data[38], smart grid
data[39], health detection data[40], etc.

The main process of time series clustering is similarity measurement and clustering[41]. Among
similarity measurement methods, shape-based approaches are the most commonly used[42]. One of
the simpler approaches to implement is the Euclidean distance, and although it has some
applications in distance measurement of time series[43], it is difficult for it to effectively
take into account the phase distortion between time series[44]. At the same time, the Euclidean
distance between subseries at similar locations and with similar waveforms can also be large due
to the difference in their amplitudes[45]. In contrast, the Dynamic Time Warping (DTW)
distance[46] improves the process of calculating the Euclidean distance. It realizes one-to-many
matching of data points in time series through dynamic warping, so that it has good robustness
to the phase deviation and amplitude deformation of time series, and performs well in time
series clustering tasks[41,47–49]. Clustering algorithms for time series can be roughly divided
into hierarchical clustering, model-based clustering, partition-based clustering and
density-based clustering. Partition-based clustering is the most commonly used method, such as
K-Means[50], K-Medoids[51], etc. However, K-means and related methods are not fully applicable
to unevenly distributed or non-convex sample data. Spectral clustering methods, which are
applicable to samples of various shapes, may be an effective alternative in such cases. At
present, some researchers have applied spectral clustering to time series data clustering[52].

2.3. Spectral clustering algorithm

Spectral clustering is an unsupervised learning algorithm based on graph partitioning, capable
of transforming the clustering problem into a graph segmentation issue on an undirected
weighted graph constructed from the data to be clustered[53,54]. Unlike algorithms such as
K-means that work well only for convex sample data, the spectral clustering algorithm is
applicable to sample spaces of any shape and converges to a global optimal solution, and it is
also applicable to high-dimensional data[55,56].

Currently, spectral clustering has been widely used in image segmentation[57], face
recognition[58], earth science[59] and other related research areas. Due to its good data
applicability and clustering effect, scholars have also applied spectral clustering to research
in the LIS discipline, such as scientometrics and information retrieval: Chifu et al. [60]
proposed a word sense discrimination method based on spectral clustering for ranking matching
documents in information retrieval, thus improving the efficiency of information retrieval;
similarly, Singh et al. [61] used a spectral clustering algorithm to improve the strategy of
user ranking in community Q&A sites; Colavizza and Franceschet [62] used a spectral clustering
algorithm to cluster literature citations in the Physical Review to find similar documents;
Chen et al. [63] used a spectral clustering method in multi-perspective analysis of co-citation
networks; Feng et al. [64] used spectral clustering to verify the impact of different feature
combinations such as JIF, 5-Year JIF, and CiteScore on journal classification.

In addition, some researchers have also proposed optimization schemes to address the problems
of high computational complexity of spectral clustering and difficulties in data
representation: for example, Wang et al. [65] proposed a linear spatial embedding clustering
method to optimize the similarity matrix and clustering results of spectral clustering by
adaptive neighbors; Sapkota et al. [66] optimized the initial clustering center to improve the
stability of the algorithm; some researchers have also implemented spectral clustering
algorithms based on the Spark big data computing framework, the Julia language, etc., which
improved the running efficiency of the algorithm through parallel computing [67,68].


3. Methodology

3.1. Model definition

In this study, a temporal trend clustering model called TTCM, based on spectral clustering, is
proposed to analyze word frequency time series; it is implemented using the Spark framework.
The algorithm flow is shown in Figure 1. Following the retrieval requirements, the collection
and preprocessing of keywords in a given academic field are completed, and the frequency of
keywords over different periods is tallied to obtain the time series data for subsequent
clustering analysis. The spectral clustering algorithm encompasses three core steps: graph
construction, graph partitioning, and classical clustering. As the TTCM model is based on
spectral clustering, steps 1-3 in Figure 1 are also the critical steps for clustering the
trends of word frequency time series in the TTCM model. These three steps are introduced in
detail below.


Figure 1: The algorithm flow of TTCM
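
The tallying step that precedes clustering can be illustrated with a minimal sketch. The record layout (a period paired with a keyword list), the lower-casing used as preprocessing, the choice to count a keyword once per record, and the function name `word_frequency_series` are all assumptions made for illustration; the paper reports a Spark implementation and does not specify these details.

```python
def word_frequency_series(records, periods):
    """Turn (period, keywords) records into {keyword: [count per period]}.

    Hypothetical helper: the record format and the normalisation are assumptions.
    """
    index = {p: i for i, p in enumerate(periods)}
    counts = {}
    for period, keywords in records:
        if period not in index:
            continue  # ignore records outside the chosen time window
        # Normalise and de-duplicate so a keyword counts once per record.
        for kw in {k.strip().lower() for k in keywords}:
            counts.setdefault(kw, [0] * len(periods))[index[period]] += 1
    return counts


# Toy records: (publication year, author keywords)
records = [
    (2020, ["Deep Learning", "Altmetrics"]),
    (2021, ["deep learning"]),
    (2021, ["Deep learning ", "knowledge graph"]),
    (2022, ["knowledge graph"]),
]
periods = [2020, 2021, 2022]
series = word_frequency_series(records, periods)
```

Each value in `series` is one word frequency time series, i.e. one vertex of the graph built in Section 3.1.1.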

3.1.1. Graph construction and its matrix representation

In this study, the time series are regarded as vertices and the DTW distance between time
series is used as the edge weight to construct an adjacency matrix A.

The basic idea of DTW is to find the optimal correspondence between two sequences and obtain
the best match between them in order to calculate their similarity. Its matching principle is
shown in Figure 2.

Figure 2: The matching principle of DTW algorithm

For the time series X = {x_1, x_2, x_3, …, x_m} and Y = {y_1, y_2, y_3, …, y_n}, the DTW
distance between them is calculated as in formula (1).

$$D(X, Y) = \min_{W = w_1, \dots, w_k, \dots, w_K} \sqrt{\sum_{k=1,\; w_k=(i,j)}^{K} (x_i - y_j)^2}, \quad i \in [1, m],\ j \in [1, n] \qquad (1)$$

Here, w_k = (i, j) indicates that the i-th data point of X and the j-th data point of Y are
corresponding points at the k-th step of the warping path, and W is the optimal path, i.e. the
path that minimizes the value of D(X, Y).

Further, in order to reduce the dimensional difference between distances, this study uses the
local scale Gaussian kernel function to normalize the DTW distances between time series and
obtain the similarity matrix W = {w_11, …, w_1n, …, w_mn}, whose calculation is shown in
Formula (2) [69].

$$w_{xy} = \exp\!\left(-\frac{d_{xy}^2}{\sigma_x \sigma_y}\right), \quad \sigma_x = d_{xK},\ \sigma_y = d_{yK} \qquad (2)$$
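
Formulas (1) and (2) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' Spark implementation: the function names are hypothetical, the dynamic-programming recurrence is the standard unconstrained DTW, and two details are assumptions that the text does not settle (the neighbour index is clamped for very small samples, and a zero local scale falls back to similarity 1 for identical series and 0 otherwise).

```python
import math


def dtw_distance(x, y):
    """Formula (1): sqrt of the minimal sum of squared differences over warping paths."""
    m, n = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated squared difference aligning x[:i] with y[:j]
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[m][n])


def local_scale_similarity(series, k=7):
    """Formula (2): w_xy = exp(-d_xy^2 / (sigma_x * sigma_y)), sigma = distance to k-th neighbour."""
    n = len(series)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            dist[a][b] = dist[b][a] = dtw_distance(series[a], series[b])
    kk = min(k, n - 1)  # assumption: clamp k when there are fewer than k other series
    sigma = [sorted(dist[a][:a] + dist[a][a + 1:])[kk - 1] for a in range(n)]
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            denom = sigma[a] * sigma[b]
            if denom > 0:
                sim[a][b] = math.exp(-dist[a][b] * dist[a][b] / denom)
            else:  # assumption: degenerate local scale
                sim[a][b] = 1.0 if dist[a][b] == 0 else 0.0
    return sim


# Two phase-shifted copies of the same bump, plus a flat series.
x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]
flat = [5, 5, 5, 5, 5, 5]
euclidean_xy = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
dtw_xy = dtw_distance(x, y)
sim = local_scale_similarity([x, y, flat], k=2)
```

In this toy case the Euclidean distance between the two shifted bumps is 2, while their DTW distance is 0, which is the robustness to phase deviation that motivates using DTW as the edge weight.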


Here d_xy is the distance between time series X and Y, and σ_x is the local scale parameter of
X, defined as the distance between X and its K-th nearest neighbor (σ_y is defined analogously
for Y); the value of K is usually set to 7[69].

Then, the similarity matrix is transformed into a Laplacian matrix. In order to prevent the
analysis error caused by non-uniform dimensions between data, the symmetric normalized
Laplacian matrix is used to represent the graph, and its definition is shown in Formula (3).

$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} \qquad (3)$$

Here, L = D - W is the unnormalized Laplacian matrix, I is the identity matrix and D is the
degree matrix, that is, the diagonal matrix whose i-th diagonal entry is the sum of the
elements in the i-th column of the similarity matrix W.

3.1.2. The determination and transformation of graph partition criterion.

The key of spectral clustering is to cut the undirected weighted graph reasonably so as to
maximize the sum of the weights between the samples within each subgraph, that is, to minimize
the sum of the weights of the cut edges. According to the graph representation of the symmetric
normalized Laplacian matrix determined above, this study adopts the N-Cut partition criterion,
whose objective function is shown in formula (4).

$$NCut(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{vol(A_i)} \qquad (4)$$

k represents the total number of subsets, A_i represents the i-th subset, Ā_i is the
complementary set of A_i, W(A_i, Ā_i) represents the sum of the weights of the edges between
points in subset A_i and points outside of subset A_i, and vol(A_i) is the sum of the weights
of all edges incident to the points in subset A_i. According to the mathematical derivation,
the solution of the objective function can be transformed into solving for the minimum
eigenvalues of the Laplacian matrix and their corresponding eigenvectors. In this study, the
eigenvectors (also known as indicator vectors) corresponding to the λ smallest eigenvalues of
L_sym are solved, and the eigenmatrix H (also known as the indicator matrix) composed of these
indicator vectors is the approximate optimal solution to the graph partition problem. H is a
matrix with dimension N × λ, where N is the number of time series.

3.1.3. Data clustering through classical clustering algorithm.

After the graph is divided, a classical clustering algorithm can be used to cluster H. Based on
the K-means algorithm, this study treats each row h_n of H as a vector in the embedded space
and performs cluster analysis on these row vectors; the category obtained for h_n is taken as
the category of the n-th time series.

3.2. Algorithm parameter determination

In the implementation of the TTCM model, λ, the number of eigenvectors, is a parameter that
needs to be set in advance. In practice, λ is often set to k, the final expected number of
clusters. However, the clustering number k is usually determined only at the last stage,
according to the change of the error sum of squares or the silhouette coefficient of the
K-Means model. In order to determine λ in advance, a novel method for determining the dimension
of the indicator matrix is designed based on the meaning and properties of the Fiedler vector
of the Laplacian matrix.

The Fiedler vector is the eigenvector corresponding to the minimum non-zero eigenvalue (also
known as the second smallest eigenvalue) of the Laplacian matrix of the graph [70]. In this
study, the Fiedler vector of L_sym is first taken as the indicator matrix H, k-means clustering
is then carried out on H, and the trend of the error sum of squares against the number of
clusters is observed to determine the optimal number of clusters k. Then, λ is set to k, k-1
and k-2 to conduct the subsequent analysis. Finally, provided that the differences between
clusters are preserved, a small λ value is chosen to reduce the time and space cost of the
subsequent calculation process and to avoid overfitting. In this study, the above method of λ
selection is called the principle of low.

4. Experiments and result analysis

4.1. Model validation through time series standard dataset

To verify the effectiveness of TTCM in time series clustering, the time series standard
dataset[71] in the Knowledge Discovery Archive [72] of the University of California, Irvine
(UCI) is used to test the model. The Power Iteration Clustering (PIC) model[73] and the
Affinity Propagation (AP) clustering model[74], which are also based on graph theory, are
selected as baselines to compare the recognition effects of the models.

There are 600 pieces of data in the time series dataset, and every 100 pieces represent a trend
type; the six types are marked as normal, cyclic, increasing trend, decreasing trend, upward
shift, and downward shift. Figure 3 shows the sample data of these six trends.

According to the trends of the test dataset, the clustering number of the TTCM model, PIC model
and AP model is set to 6, and the maximum number of


iterations is set as 30. Meanwhile, in the TTCM model,              After the clustering is completed, the identified
λ is set to 4,5, and 6. In addition, the three model all        cluster labels are matched to the actual labels
use the similarity matrix W calculated by formula (2).          according to the data distribution in various clusters,
                                                                that is, if the identified cluster 1 contains the most
                                                                increasing trend data, the cluster 1 will be marked as
                                                                increasing trend. Then, the number of increasing
                                                                trend and other types of data in cluster 1 is compared
                                                                with the actual increasing trend data number (i.e.
                                                                100). After calculating the values of precision (P),
                                                                recall rate (R) and F1 respectively, the average values
                                                                of P, R and F1 in six categories are used as the
                                                                evaluation value of the effect of models, and the
                                                                results are shown in Table 1.
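The matching and averaging procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy data are hypothetical, and the sketch assumes that several clusters may be matched to the same label, since the text does not state whether a one-to-one matching is enforced.

```python
from collections import Counter, defaultdict


def match_clusters_to_labels(cluster_ids, true_labels):
    """Map each cluster id to the true label that occurs most often inside it."""
    members = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in members.items()}


def macro_scores(cluster_ids, true_labels):
    """Return macro-averaged (precision, recall, F1) after majority matching."""
    mapping = match_clusters_to_labels(cluster_ids, true_labels)
    predicted = [mapping[cid] for cid in cluster_ids]
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(true_labels)):
        tp = sum(p == label and t == label for p, t in zip(predicted, true_labels))
        n_pred = sum(p == label for p in predicted)
        n_true = sum(t == label for t in true_labels)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy data (hypothetical): cluster 0 is mostly "increasing", cluster 1 is all "upward".
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["increasing", "increasing", "upward", "upward", "upward", "upward"]
toy_p, toy_r, toy_f1 = macro_scores(toy_clusters, toy_labels)
print(toy_p, toy_r, toy_f1)
```

Under this reading, a label that no cluster is matched to receives a precision, recall and F1 of zero, which lowers the macro averages.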

Figure 3: Sample data of the six trends in the time
series dataset

Table 1
Experimental results of the TTCM, PIC and AP models on the test dataset
Model                         TTCM (λ = 4)   TTCM (λ = 5)   TTCM (λ = 6)   PIC       AP
Number correctly identified   406            578            492            200       419
P                             64.02%         96.63%         81.14%         24.44%    70.48%
R                             67.67%         96.33%         82.00%         33.33%    69.83%
F1                            64.18%         96.33%         81.44%         27.50%    67.11%


   It can be seen that, when λ = 5, the TTCM model has a good recognition effect: it correctly identifies the trends of 578 time series, with an F1 value of 96.33%, much higher than the PIC model, the AP model, and the TTCM model with other values of λ. Specific to each category, the recognition results of TTCM when λ = 5 are shown in Table 2.
Table 2
Confusion matrix of the six-classification problem corresponding to TTCM model (λ = 5); rows are actual classes, columns are classes assigned by the model
Actual \ Model     Normal    Cyclic     Increasing trend   Decreasing trend   Upward shift   Downward shift   Recall
Normal             98        0          2                  0                  0              0                98.00%
Cyclic             0         100        0                  0                  0              0                100.00%
Increasing trend   0         0          100                0                  0              0                100.00%
Decreasing trend   0         0          0                  99                 0              1                99.00%
Upward shift       1         0          11                 0                  88             0                88.00%
Downward shift     0         0          0                  7                  0              93               93.00%
Precision          98.99%    100.00%    88.50%             93.40%             100.00%        98.94%           F1 = 0.9633
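The per-category values in Table 2 can be recomputed from the confusion matrix itself. The sketch below is a plain-Python check under the assumption that rows are actual classes and columns are assigned classes, and that the reported F1 is the unweighted mean of the six per-category F1 values; the variable and function names are illustrative only.

```python
LABELS = ["Normal", "Cyclic", "Increasing trend",
          "Decreasing trend", "Upward shift", "Downward shift"]

# Rows: actual class; columns: class assigned by the model (values from Table 2).
CONFUSION = [
    [98, 0, 2, 0, 0, 0],
    [0, 100, 0, 0, 0, 0],
    [0, 0, 100, 0, 0, 0],
    [0, 0, 0, 99, 0, 1],
    [1, 0, 11, 0, 88, 0],
    [0, 0, 0, 7, 0, 93],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    metrics = []
    for k in range(n):
        tp = matrix[k][k]
        recall = tp / sum(matrix[k])      # share of actual class k recovered
        precision = tp / column_sums[k]   # share of assigned class k that is correct
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append((precision, recall, f1))
    return metrics


metrics = per_class_metrics(CONFUSION)
correct = sum(CONFUSION[k][k] for k in range(len(CONFUSION)))
macro_f1 = sum(m[2] for m in metrics) / len(metrics)

for label, (p, r, f1) in zip(LABELS, metrics):
    print(f"{label:18s} P={p:.4f} R={r:.4f} F1={f1:.4f}")
print(correct, round(macro_f1, 4))  # 578 0.9633
```

The off-diagonal entries make the error pattern explicit: 11 upward shift series are assigned to the increasing trend and 7 downward shift series to the decreasing trend.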


    Table 2 further shows that TTCM can effectively distinguish the six types of trends in the test dataset. Errors appear only in a small number of cases, where upward shifts are confused with increasing trends and downward shifts with decreasing trends. Overall, the TTCM model proposed in this paper can effectively distinguish the evolution trends of time series and cluster time series with similar trends.


4.2. Temporal trend clustering through word frequency time series

4.2.1. Data collection and preprocessing

To further verify the effectiveness of TTCM in detecting trends within word frequency time series, and in view of the disciplinary background of the team members, this study selected the LIS discipline for case analysis. This study adopts the same data collection principles as our previous study[13] and collects the scientific papers published from 2011 to 2020 in the LIS journals included in the Social Sciences Citation Index (SSCI). The document types are limited to research articles and reviews, and the language is limited to English. The resulting case dataset contains 38932 scientific papers. The keywords of these papers were then preprocessed through denoising, morphology reduction, and abbreviation conversion. After preprocessing, the number and frequency of keywords in each year were counted, as shown in Table 3.
Table 3
Number and frequency of keywords in each year
   Year      Number of papers     Number of keywords            Frequency of keywords      Average word frequency
   2011          3298                   6324                           10706                        1.69
   2012          3414                   7125                           12230                        1.72
   2013          3618                   8087                           13876                        1.72
   2014          3732                   8267                           14089                        1.70
   2015          3822                   8806                           15546                        1.77
   2016          4155                  10082                           17922                        1.78
   2017          4139                  10156                           17273                        1.70
   2018          4079                  10632                           17894                        1.68
   2019          4205                  11034                           18917                        1.71
   2020          4470                  12270                           21215                        1.73
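As a quick consistency check on Table 3, the sketch below assumes that the average word frequency is the frequency of keywords divided by the number of distinct keywords in the same year. That reading is an assumption, although it reproduces the reported column; the names are illustrative.

```python
# (year, papers, distinct keywords, total keyword frequency), copied from Table 3.
ROWS = [
    (2011, 3298, 6324, 10706),
    (2012, 3414, 7125, 12230),
    (2013, 3618, 8087, 13876),
    (2014, 3732, 8267, 14089),
    (2015, 3822, 8806, 15546),
    (2016, 4155, 10082, 17922),
    (2017, 4139, 10156, 17273),
    (2018, 4079, 10632, 17894),
    (2019, 4205, 11034, 18917),
    (2020, 4470, 12270, 21215),
]

# Average number of occurrences per distinct keyword, rounded as in the table.
averages = [round(freq / keywords, 2) for _, _, keywords, freq in ROWS]
total_papers = sum(papers for _, papers, _, _ in ROWS)

for (year, _, _, _), avg in zip(ROWS, averages):
    print(year, f"{avg:.2f}")
print(total_papers)  # 38932, the size of the case dataset
```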


4.2.2. Results

In the analysis of word frequency time series, keywords with a very low total frequency may not exhibit significant trends over time (i.e., the frequency time series of such keywords could all be classified as having a uniform trend). Therefore, adhering to common practice, this study filtered the pool of 57,025 distinct keywords spanning the entire study period, selecting those with a total frequency count exceeding the length of the time span. The filtration yielded 1,952 author keywords that were mentioned in more than ten articles from 2011 to 2020.

Utilizing the TTCM model, this study conducted trend identification on the time series of these 1,952 keywords. Following the principle of low introduced in Section 3.2, the study set λ to 3 and the number of clusters k to 5 for the TTCM. Subsequently, plots of the word frequency time series within each trend category were generated to facilitate visual observation and a summary of the changing characteristics of the frequency time series within each cluster.

In the clustering results of temporal trend of term frequency, the first kind of trend can be summarized as the burst trend. The obvious characteristic of this kind of trend is that the term frequency of keywords is low in the early and middle periods of the whole time span, but rises rapidly in the middle and later periods. Figure 4 shows the term frequency change curves of some keywords with a burst trend in the term frequency series; a total of 30 keywords are clustered into this trend.

Figure 4: Part of words with burst trend in word frequency series (emerging words)

In the term frequency series of keywords, the second trend can be classified as the increasing trend. The term frequency series of this kind of trend shows


a general trend of fluctuating increase over the whole time span, but the term frequency remains at a low level throughout. Figure 5 shows some of the keywords with an increasing trend in the term frequency series. There are 177 keywords with increasing trends.

Figure 5: Part of words with increasing trend in word frequency series (hotspot words)

In the clustering results of temporal trend of term frequency, the third kind of trend can be summarized as the high-frequency fluctuation trend. The obvious characteristic of this kind of trend is that the term frequency of keywords remains at a high level over the whole time span and fluctuates slightly with the passage of time. Figure 6 shows the term frequency change curves of some keywords with a high-frequency fluctuation trend in the term frequency series; a total of 30 keywords are clustered into this trend.

Figure 6: Part of words with high-frequency fluctuation trend in word frequency series (label words)

In the term frequency series of keywords, the fourth trend can be classified as the decreasing trend. The term frequency series of this kind of trend shows a general trend of fluctuating decrease over the whole time span, and the term frequency remains at a low level throughout. Figure 7 shows some of the keywords with a decreasing trend in the term frequency series. There are 69 keywords with decreasing trends.

Figure 7: Part of words with decreasing trend in word frequency series (fading words)

The fifth type of trend identified by TTCM for term frequency series contains a total of 1646 keywords, and observation of its trend curves did not reveal obvious characteristics. Therefore, this paper speculates that this kind of term frequency series follows the normal trend without obvious regular fluctuation.

It can be seen from the clustering results that the TTCM model is highly effective in identifying emerging words across various disciplines that have suddenly burst onto the scene, successfully capturing the rising trend of keyword frequencies towards the end of the time span. Within the set of keywords exhibiting an upward trend, the model identified research hotspots that are gradually gaining widespread attention among LIS scholars. For the keywords identified by the model as having high-frequency fluctuations, the frequency levels consistently remained high, often signifying core research sub-fields or themes within the domain. Conversely, the model effectively reflected keywords in decline, indicating words that are gradually fading from the focal interest of scholars in the discipline.

4.3. Technical foresight based on temporal trend of word frequency

Further, this study, in accordance with the term function[75], divides the keywords with significant temporal trend into two categories: research


questions/objects and research methods/technologies. The counts of functional keywords showing each trend, along with examples, are shown in Table 4.

Table 4
Keyword statistics based on different trends of term function (keyword count, followed by example keywords)
Trend                        Research questions/objects     Research methods/technologies
Burst                        17                             13
                             Electronic Health Records      Machine Learning
                             Covid-19                       Artificial Intelligence
                             Digital Transformation         Deep Learning
                             Coronavirus                    Blockchain
                             Journalism                     Neural Network
Increasing                   117                            60
                             Privacy                        Social Network Analysis
                             Research Evaluation            Big Data
                             Gender                         Classification
                             Scholarly Communication        Altmetrics
                             Higher Education               Sentiment Analysis
Decreasing                   57                             12
                             Internet                       Focus Group
                             E-government                   Semistructured Interviews
                             Digital Library                Nanotechnology
                             Web2.0                         Citation Distribution
                             Blog                           Microsimulation
High-frequency fluctuation   18                             12
                             Social Media                   Bibliometrics
                             Academic Library               Qualitative
                             Information Literacy           Citation Analysis
                             China                          Case Study
                             Collaboration                  Information Retrieval
Total                        209                            97
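The counts in Table 4 lend themselves to a quick calculation of shares by term function and by trend. The sketch below only restates the table's counts; the dictionary layout and variable names are illustrative assumptions rather than part of the study.

```python
# Trend -> (research questions/objects, research methods/technologies), from Table 4.
COUNTS = {
    "Burst": (17, 13),
    "Increasing": (117, 60),
    "Decreasing": (57, 12),
    "High-frequency fluctuation": (18, 12),
}

total_objects = sum(objects for objects, _ in COUNTS.values())
total_methods = sum(methods for _, methods in COUNTS.values())
total_keywords = total_objects + total_methods

# Overall share of keywords about research questions/objects.
objects_share = total_objects / total_keywords

# Share of method/technology keywords within each trend.
method_share_by_trend = {
    trend: methods / (objects + methods)
    for trend, (objects, methods) in COUNTS.items()
}

print(total_objects, total_methods, round(objects_share, 3))  # 209 97 0.683
for trend, share in method_share_by_trend.items():
    print(f"{trend:28s} {share:.3f}")
```

On these counts, methods/technologies make up a larger share of the burst and high-frequency fluctuation groups than of the keyword set as a whole, and a smaller share of the decreasing group.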


    In general, the keywords with significant trends identified in this study predominantly comprise words related to research questions/objects, accounting for nearly 70% of the total.

    There exists a considerable number of keywords exhibiting an increasing trend, with both functional types of words showing a relatively balanced distribution. However, due to the phenomenon of technological literature inflation, although these words continuously attract the attention of scholars in the field, the share of related research may not actually have expanded across the entire disciplinary spectrum. As some researchers delve into new studies, other scholars gradually lose focus. Should no new method or technology innovations emerge, and no novel research objects continue to be integrated with these characteristic keywords, the frequencies of these words will gradually transition into a decreasing trend.

    Quantitatively, the number of faded keywords exhibiting a decreasing trend is less than half of the quantity of keywords showing an upward trend, a circumstance possibly attributable to the literature inflation caused by technological explosions. Within the rapid accumulation of technological literature, the conservative tendencies of some researchers and/or the attribute of knowledge application contained within certain words might prevent these fading-out keywords from becoming low-frequency words filtered out during the input phase of the TTCM model. However, these fading words, if not subject to knowledge innovation, are highly likely to decline gradually. Simultaneously, among the words demonstrating decreasing trends, words related to research questions/objects are notably higher in proportion than those concerning methods/technologies. This discrepancy may be attributed to the stronger applicability of methods/technologies: even amid shifts in research subjects or questions, researchers tend to employ classical and established technical methodologies. The proportion of emerging words displaying burst trends is relatively low. Although there is a higher absolute number of words related to


research questions/objects, the relative proportion of words related to methods is higher, indicating that, with changes in social and research environments, researchers have begun to pay attention to some new research objects, such as coronaviruses, open science, and mobile payments, while introducing more emerging technologies such as artificial intelligence, machine learning, and neural networks.

    Specifically, regarding methods/technologies, methods such as focus groups, semi-structured interviews, and microsimulation, which primarily target small-scale data samples, exhibit a decreasing trend, while big data analysis techniques such as artificial intelligence, machine learning, deep learning, social network analysis, and sentiment analysis demonstrate a burst or increasing trend. This reflects the progress and evolution of research methods and technologies in LIS, with an increasing number of researchers adopting emerging technologies to process and analyze information to gain deeper and broader insights. Simultaneously, the application of emerging technologies also reflects the transformation of research content in the LIS field towards quantitative analysis, large-scale data processing, and deep data mining. The explosive growth trend of technologies such as artificial intelligence also indicates that future research in the LIS field may increasingly focus on leveraging advanced computing technologies to address issues related to information management, information retrieval, and user behavior analysis.

5. Discussion

Technological advancements and transformations are not only complex interplays driven by societal, economic, and political well-being but also their outcomes. Predicting and understanding the process of technological change pose challenges for decision-makers in governments and businesses[76]. Appropriately implemented and effective technological forecasting is of significant guiding value to organizations such as governments and businesses[3]. The research paradigm of technological forecasting is still evolving, promoting the effective complementary integration of qualitative and quantitative research methods. Seeking novel research methodologies to enhance research quality is currently a hotspot and focus in the field of technological forecasting. Therefore, conducting research on technological forecasting methods under this backdrop holds certain theoretical significance and practical value.

    This study proposes the TTCM model based on spectral clustering. Model validation results from Tables 1 and 2 demonstrate that the TTCM model can effectively distinguish the evolution trends of time series and automatically cluster time series with similar trends. Applying the TTCM model to the analysis of word frequency time series reveals its successful identification of sudden emerging words, high-frequency fluctuating words, steadily increasing hotspot words, and gradually decreasing fading words, providing significant reference and guidance value for anticipatory analysis in disciplinary fields. Furthermore, combined with term functions, anticipatory analysis of subsequent research development and technological shifts in the field helps research institutions and relevant practitioners adjust research directions in a timely manner and grasp popular scientific research trends and frontier opportunities. It also aids governments and industrial institutions in identifying focal points and trends in the field, providing decision-making support for the formulation and planning of science and technology policies and strategies.

    Essentially, the TTCM model is a clustering model whose clustering objects are time series and whose clustering basis is the evolution trend of time series, i.e., it clusters time series with similar evolution trends into the same category. Therefore, the application of the TTCM model is not limited to word frequency time series. For scientific literature, analysis of research hotspots and frontier trends can be conducted using time series data such as publication volume, citation volume, and author quantity. For patent literature, technological forecasting can be conducted using time series data such as patent application volume, citation volume, and patent conversion quantity. Additionally, comprehensive analysis can be performed by combining time series data from other sources such as online news, social media, and stock securities. This aims to provide reference and guidance for organizational decision-making in governments, industries, and businesses.

    Furthermore, after conducting extensive identification experiments on time series data using the TTCM model, the identified types of evolution trends can be solidified into pattern features. These can be further combined with traditional machine learning models such as Support Vector Machines, K-nearest neighbors, and Conditional Random Fields, or with deep learning models such as Convolutional Neural Networks, Recurrent Neural Networks, and Long Short-Term Memory Networks, to achieve rapid identification of large-scale time series evolution


trends. This enables automated prediction of emerging research trends in the field or potential technological growth points.

6. Conclusion

The present study introduces a novel time series trend clustering model, named TTCM, and employs it to analyze word frequency time series for technological forecasting. TTCM integrates the dynamic time warping algorithm and the spectral clustering algorithm to automatically cluster time series exhibiting similar evolution trends. To validate the effectiveness of the model, this research initially applies TTCM to cluster standard time series datasets from the UCI repository, demonstrating its capability to group time series data with similar evolution trends and to separate those with different trends. Furthermore, using the LIS discipline as a case study, this research utilizes TTCM to cluster the evolution trends of word frequency time series, identifying emerging words with burst trends, label words with high-frequency fluctuation trends, hotspot words with increasing trends, and fading words with decreasing trends. The integration of term function confirms the efficacy of TTCM in domain knowledge discovery and technological forecasting.

    Nevertheless, this study has certain limitations. Firstly, due to computational constraints, only ten years of data were selected for analysis, potentially overlooking evolution trends that manifest over longer time series. Secondly, the case study is limited to the LIS domain, so the effectiveness of the TTCM model on word frequency time series from other disciplines and fields requires further verification. Additionally, the analysis in this study is limited to the keyword perspective, without considering interrelations among keywords in the thematic dimension.

    In future research, in addition to addressing the shortcomings mentioned above, this study will incorporate other data sources such as patent data to achieve technology foresight with multi-source data. Moreover, after extensive experimentation to determine evolution trends in different types of time series, this study will consider solidifying these trends into pattern features and further integrating them with classification models to achieve intelligent and automated prediction of emerging research trends or potential technological growth points in large-scale datasets.

Acknowledgements

This work was funded by the National Natural Science Fund of China (No. 71874129), the Open-end Fund of Information Engineering Lab of ISTIC and the Independent Innovation Foundation of Wuhan University of Technology (No. 233103002).

References

[1] F. Dotsika, A. Watkins, Identifying potentially disruptive trends by means of keyword network analysis, Technological forecasting and social change 119 (2017) 114–127. doi:10.1016/j.techfore.2017.03.020.
[2] R.N. Kostoff, R.R. Schaller, Science and technology roadmaps, IEEE transactions on engineering management 48 (2001) 132–143. doi:10.1109/17.922473.
[3] C. Lee, A review of data analytics in technological forecasting, Technological forecasting and social change 166 (2021). doi:10.1016/j.techfore.2021.120646.
[4] E. Amanatidou, Beyond the veil - The real value of Foresight, Technological forecasting and social change 87 (2014) 274–291. doi:10.1016/j.techfore.2013.12.030.
[5] B.R. Martin, Foresight in science and technology, Technology Analysis & strategic management 7 (1995) 139–168.
[6] L. Bornmann, R. Mutz, Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references, Journal of the Association for Information Science and Technology 66 (2015) 2215–2222. doi:10.1002/asi.23329.
[7] C. Balili, U. Lee, A. Segev, J. Kim, M. Ko, TermBall: Tracking and predicting evolution types of research topics by using knowledge structures in scholarly big data, IEEE Access 8 (2020) 108514–108529. doi:10.1109/ACCESS.2020.3000948.
[8] A.C. Adamuthe, G.T. Thampi, Technology forecasting: A case study of computational technologies, Technological forecasting and social change 143 (2019) 181–189. doi:10.1016/j.techfore.2019.03.002.
[9] H. Lee, S. Lee, B. Yoon, Technology clustering based on evolutionary patterns: The case of information and communications technologies, Technological forecasting and social change 78 (2011) 953–967. doi:10.1016/j.techfore.2011.02.002.


[10] W. Lu, S. Huang, J. Yang, Y. Bu, Q. Cheng, Y.        [19] J. Landeta, Current validity of the Delphi
     Huang, Detecting research topic trends by                 method in social sciences, Technological
     author-defined        keyword       frequency,            forecasting and social change 73 (2006)
     Information processing and management 58                  467–482.
     (2021)                                 102594.            doi:10.1016/j.techfore.2005.09.002.
     doi:10.1016/j.ipm.2021.102594.                       [20] T. Shin, Using Delphi for a long-range
[11] Y.H. Hu, C.T. Tai, K.E. Liu, C.F. Cai,                    technology forecasting, and assessing
     Identification of highly-cited papers using               directions of future R&D activities - The
     topic-model-based        and      bibliometric            Korean exercise, Technological forecasting
     features: The consideration of keyword                    and social change 58 (1998) 125–154.
     popularity, Journal of Informetrics 14 (2020)             doi:10.1016/S0040-1625(97)00053-X.
     101004. doi:10.1016/j.joi.2019.101004.               [21] M. Rongping, R. Zhongbao, Y. Sida, Q. Yan,
[12] T.Y. Huang, B. Zhao, Measuring popularity of              `Technology foresight towards 2020 in
     ecological topics in a temporal dynamical                 China’: the practice and its impacts,
     knowledge network, PLoS ONE 14 (2019)                     Technology         analysis     &       strategic
     e0208370.                                                 management          20    (2008)       287–307.
     doi:10.1371/journal.pone.0208370.                         doi:10.1080/09537320801999587.
[13] X. Wang, H. Wang, H. Huang, Evolutionary             [22] A. Suominen, A. Hajikhani, A. Ahola, Y. Kurogi,
     exploration and comparative analysis of the               K. Urashima, A quantitative and qualitative
     research topic networks in information                    approach on the evaluation of technological
     disciplines, Scientometrics 126 (2021)                    pathways: A comparative national-scale
     4991–5017.          doi:10.1007/s11192-021-               Delphi study, Futures 140 (2022).
     03963-6.                                                  doi:10.1016/j.futures.2022.102967.
[14] M. Petrova, P. Sutcliffe, K.W.M. Fulford, J.         [23] T. Heger, R. Rohrbeck, Strategic foresight for
     Dale, Search terms and a validated brief                  collaborative exploration of new business
     search filter to retrieve publications on                 fields, Technological forecasting and social
     health-related values in Medline: A word                  change         79       (2012)         819–831.
     frequency analysis study, Journal of the                  doi:10.1016/j.techfore.2011.11.003.
     American medical informatics association             [24] Y. Tang, H. Sun, Q. Yao, Y. Wang, The
     19 (2012) 479–488. doi:10.1136/amiajnl-                   selection of key technologies by the silicon
     2011-000243.                                              photovoltaic industry based on the Delphi
[15] M. Färber, C. Nishioka, A. Jatowt,                        method and AHP (analytic hierarchy
     ScholarSight: Visualizing temporal trends of              process): Case study of China, Energy 75
     scientific concepts, 2019 ACM/IEEE Joint                  (2014)                                 474–482.
     Conference on Digital Libraries, 2019, pp.                doi:10.1016/j.energy.2014.08.003.
     438–439. doi:10.1109/JCDL.2019.00108.                [25] C. Flick, E.D. Zamani, B.C. Stahl, A. Brem, The
[16] M. Trevisani, A. Tuzzi, Learning the                      future of ICT for health and ageing: Unveiling
     evolution of disciplines from scientific                  ethical and social issues through horizon
     literature: A functional clustering approach              scanning         foresight,      Technological
     to normalized keyword count trajectories,                 forecasting and social change 155 (2020).
     Knowledge-based systems 146 (2018) 129–                   doi:10.1016/j.techfore.2020.119995.
     141. doi:10.1016/j.knosys.2018.01.035.               [26] M. Hussain, E. Tapinos, L. Knight, Scenario-
[17] C. Boothby, S. Milojević, An exploratory full-            driven roadmapping for technology
     text analysis of Science Careers in a changing            foresight, Technological forecasting and
     academic job market, Scientometrics 126                   social change 124 (2017) 160–177.
     (2021) 4055–4071. doi:10.1007/s11192-                     doi:10.1016/j.techfore.2017.05.005.
     021-03905-2.                                         [27] Y. Jeong, I. Park, B. Yoon, Forecasting
[18] E.S. Atlam, M. Okada, M. Shishibori, J.-i. Aoe,           technology substitution based on hazard
     An evaluation method of words tendency                    function, Technological forecasting and
     depending on time-series variation and its                social change 104 (2016) 259–272.
     improvements, Information processing and                  doi:10.1016/j.techfore.2016.01.014.
     management         38    (2002)      157–171.        [28] W. Yeo, S. Kim, H. Park, J. Kang, A
     doi:10.1016/S0306-4573(01)00028-0.                        bibliometric method for measuring the
                                                               degree      of    technological     innovation,


     Technological forecasting and social change           [38] Y. Zhao, L. Lin, W. Lu, Y. Meng, Landsat time
     95               (2015)              152–162.              series clustering under modified dynamic
     doi:10.1016/j.techfore.2015.01.018.                        time warping, in: Q. Weng, P. Gamba, G. Xian,
[29] C. Lee, Y. Cho, H. Seol, Y. Park, A stochastic             J.M. Chen, S. Liang, 4th international
     patent citation analysis approach to                       workshop on earth observation and remote
     assessing future technological impacts,                    sensing applications, IEEE, New York, USA,
     Technological forecasting and social change                2016.
     79                (2012)               16–29.         [39] H. Son, Y. Kim, S. Kim, Time series clustering
     doi:10.1016/j.techfore.2011.06.009.                        of electricity demand for industrial areas on
[30] M. Coccia, L. Wang, Path-breaking directions               smart      grid,    Energies     13    (2020).
     of nanotechnology-based chemotherapy and                   doi:10.3390/en13092377.
     molecular cancer therapy, Technological               [40] C.H. Sudre, K.A. Lee, M.N. Lochlainn, T.
     forecasting and social change 94 (2015)                    Varsavsky, B. Murray, M.S. Graham, C. Menni,
     155–169.                                                   M. Modat, R.C.E. Bowyer, L.H. Nguyen, D.A.
     doi:10.1016/j.techfore.2014.09.007.                        Drew, A.D. Joshi, W. Ma, C.-G. Guo, C.-H. Lo, S.
[31] D.-J. Lim, T.R. Anderson, O.L. Inman,                      Ganesh, A. Buwe, J.C. Pujol, J.L. du Cadet, A.
     Choosing effective dates from multiple                     Visconti, M.B. Freidin, J.S.E.-S. Moustafa, M.
     optima in Technology Forecasting using                     Falchi, R. Davies, M.F. Gomez, T. Fall, M.J.
     Data Envelopment Analysis (TFDEA),                         Cardoso, J. Wolf, P.W. Franks, A.T. Chan, T.D.
     Technological forecasting and social change                Spector, C.J. Steves, S. Ourselin, Symptom
     88                (2014)               91–97.              clusters in COVID-19: A potential clinical
     doi:10.1016/j.techfore.2014.06.003.                        prediction tool from the COVID symptom
[32] S. Jun, S.S. Park, D.S. Jang, Technology                   study app, Science advances 7 (2021).
     forecasting using matrix map and patent                    doi:10.1126/sciadv.abd4177.
     clustering, Industrial management & data              [41] T. Li, X. Wu, J. Zhang, Time series clustering
     systems        112      (2012)       786–807.              model based on DTW for classifying car
     doi:10.1108/02635571211232352.                             parks,       Algorithms        13      (2020).
[33] S. Jun, A Forecasting Model for Technological              doi:10.3390/a13030057.
     Trend Using Unsupervised Learning, in: T.H.           [42] S. Zolhavarieh, S. Aghabozorgi, Y.W. Teh, A
     Kim, H. Adeli, A. Cuzzocrea, T. Arslan, Y.C.               Review of Subsequence Time Series
     Zhang, J.H. Ma, K.I. Chung, S. Mariyam, X.F.               Clustering, Scientific world journal (2014).
     Song, Database theory application, bio-                    doi:10.1155/2014/312521.
     science bio-technology, Springer-Verlag               [43] X. Guo, Y. Pang, G. Yan, T. Qiao, Time series
     Berlin, Berlin, Germany, 2011: pp. 51–60.                  forecasting based on deep extreme learning
[34] N. Gozuacik, C.O. Sakar, S. Ozcan,                         machine, in: 29th Chinese control and
     Technological      forecasting    based    on              decision conference, CCDC 2017, 2017: pp.
     estimation of word embedding matrix using                  6151–6156.
     LSTM networks, Technological forecasting                   doi:10.1109/CCDC.2017.7978277.
     and social change 191 (2023) 122520.                  [44] E.J. Keogh, M.J. Pazzani, Relevance feedback
     doi:10.1016/j.techfore.2023.122520.                        retrieval of time series data, in: 22nd annual
[35] P. Esling, C. Agon, Time-Series Data Mining,               international ACM SIGIR conference on
     ACM computing surveys 45 (2012).                           research and development in information
     doi:10.1145/2379776.2379788.                               retrieval, SIGIR 1999, 1999: pp. 183–190.
[36] C. Guo, H. Jia, N. Zhang, Time Series                      doi:10.1145/312624.312676.
     Clustering Based on ICA for Stock Data                [45] X.L. Dong, C.K. Gu, Z.O. Wang, Research on
     Analysis, in: 4th international conference on              shape-based time series similarity measure,
     wireless communications, networking and                    in: 2006 international conference on
     mobile computing, VOLS 1-31, IEEE, New                     machine learning and cybernetics, 2006:
     York, USA, 2008: pp. 10903+.                               pp. 1253–1258.
[37] H. Zhu, Y. Mei, J. Wei, C. Shen, Prediction of             doi:10.1109/ICMLC.2006.258648.
     online topics’ popularity patterns, Journal of        [46] E. Keogh, C.A. Ratanamahatana, Exact
     information science 48 (2022) 141–151.                     indexing of dynamic time warping,
     doi:10.1177/0165551520961026.                              Knowledge and information systems 7


     (2005) 358–386. doi:10.1007/s10115-004-                           processing and management 51 (2015)
     0154-9.                                                           616–624. doi:10.1016/j.ipm.2015.05.007.
[47] B. Cai, G. Huang, N. Samadiani, G. Li, C.H. Chi,             [56] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral
     Efficient Time Series Clustering by                               clustering: Analysis and an algorithm, in:
     Minimizing Dynamic Time Warping                                   15th Annual Conference on Neural
     Utilization, IEEE access 9 (2021) 46589–                          Information Processing Systems, Vancouver,
     46599. doi:10.1109/ACCESS.2021.3067833.                           Canada, 2002: pp. 849–856.
[48] W. Wang, G. Lyu, Y. Shi, X. Liang, Time Series               [57] K. Xia, X. Gu, Y. Zhang, Oriented grouping-
     Clustering Based on Dynamic Time Warping,                         constrained spectral clustering for medical
     in: IEEE 9th International Conference on                          imaging segmentation, Multimedia systems
     Software Engineering and Service Science,                         26 (2020) 27–36. doi:10.1007/s00530-019-
     Beijing,                China,                2018.               00626-8.
     doi:10.1109/ICSESS.2018.8663857.                             [58] D. Xu, C. Li, T. Chen, F. Lang, A novel low rank
[49] V.T. Huy, D.T. Anh, An efficient                                  spectral clustering method for face
     implementation of anytime K-medoids                               identification,      Recent       patents     on
     clustering for time series under dynamic                          engineering        13      (2019)      387–394.
     time warping, in: 7th symposium on                                doi:10.2174/18722121126661808281242
     information and communication technology,                         11.
     2016:                   pp.                  22–29.          [59] H. Talebi, L.J.M. Peeters, U. Mueller, R.
     doi:10.1145/3011077.3011128.                                      Tolosana-Delgado, K.G. van den Boogaart,
[50] X. Huang, Y. Ye, L. Xiong, R.Y.K. Lau, N. Jiang,                  Towards geostatistical learning for the
     S. Wang, Time series k-means: A new k-                            geosciences: A case study in improving the
     means type smooth subspace clustering for                         spatial awareness of spectral clustering,
     time series data, Information sciences 367                        Mathematical geosciences 52 (2020) 1035–
     (2016) 1–13. doi:10.1016/j.ins.2016.05.040.                       1048. doi:10.1007/s11004-020-09867-0.
[51] Y. Chen, X. Liu, X. Li, X. Liu, Y. Yao, G. Hu, X. Xu,        [60] A.G. Chifu, F. Hristea, J. Mothe, M. Popescu,
     F. Pei, Delineating urban functional areas                        Word sense discrimination in information
     with building-level social media data: A                          retrieval: A spectral clustering-based
     dynamic time warping (DTW) distance                               approach, Information processing and
     based k-medoids method, Landscape and                             management           51      (2015)       16–31.
     urban planning 160 (2017) 48–60.                                  doi:10.1016/j.ipm.2014.10.007.
     doi:10.1016/j.landurbplan.2016.12.001.                       [61] A.K. Singh, N.K. Nagwani, S. Pandey, A user
[52] H. Abbasimehr, A. Bahrini, An analytical                          ranking algorithm for efficient information
     framework based on the recency, frequency,                        management of community sites using
     and monetary model and time series                                spectral clustering and folksonomy, Journal
     clustering      techniques          for    dynamic                of information science 45 (2019) 592–606.
     segmentation,        Expert        systems      with              doi:10.1177/0165551518808198.
     applications                192             (2022).          [62] G. Colavizza, M. Franceschet, Clustering
     doi:10.1016/j.eswa.2021.116373.                                   citation histories in the physical review,
[53] M. Alshammari, M. Takatsuka, Approximate                          Journal of informetrics 10 (2016) 1037–
     spectral clustering with eigenvector                              1051. doi:10.1016/j.joi.2016.07.009.
     selection and self-tuned k, Pattern                          [63] C. Chen, F. Ibekwe-SanJuan, J. Hou, The
     recognition letters 122 (2019) 31–37.                             structure and dynamics of cocitation
     doi:10.1016/j.patrec.2019.02.006.                                 clusters: A multiple-perspective cocitation
[54] P.K. Srijith, M. Hepple, K. Bontcheva, D.                         analysis, Journal of the American society for
     Preotiuc-Pietro, Sub-story detection in                           information science and technology 61
     Twitter      with       hierarchical       Dirichlet              (2010) 1386–1409. doi:10.1002/asi.21309.
     processes, Information processing and                        [64] L. Feng, J. Zhou, S.L. Liu, N. Cai, J. Yang,
     management          53      (2017)      989–1003.                 Analysis of journal evaluation indicators: an
     doi:10.1016/j.ipm.2016.10.004.                                    experimental study based on unsupervised
[55] T. Semertzidis, D. Rafailidis, M.G. Strintzis, P.                 Laplacian score, Scientometrics 124 (2020)
     Daras, Large-scale spectral clustering based                      233–254. doi:10.1007/s11192-020-03422-
     on pairwise constraints, Information                              8.


[65] Q. Wang, Z. Qin, F. Nie, X. Li, Spectral              [75] J. Wang, Q. Cheng, W. Lu, Y. Dou, P. Li, A term
     embedded adaptive neighbors clustering,                    function-aware keyword citation network
     IEEE transactions on neural networks and                   method for science mapping analysis,
     learning systems 30 (2019) 1265–1271.                      Information processing & management 60
     doi:10.1109/TNNLS.2018.2861209.                            (2023). doi:10.1016/j.ipm.2023.103405.
[66] N. Sapkota, A. Alsadoon, P.W.C. Prasad, A.            [76] V. Coates, M. Farooque, R. Klavans, K. Lapid,
     Elchouemi, A.K. Singh, Data summarization                  H.A. Linstone, C. Pistorius, A.L. Porter, On the
     using clustering and classification: Spectral              future    of    technological       forecasting,
     clustering combined with k-means using                     Technological forecasting and social change
     NFPH, in: the international conference on                  67 (2001) 1–17. doi:10.1016/S0040-
     machine learning, big data, cloud and                      1625(00)00122-0.
     parallel computing: trends, perspectives and
     prospects, Faridabad, India, 2019: pp. 146–
     151. doi:10.1109/COMITCon.2019.8862218.
[67] Z. Huo, G. Mei, G. Casolla, F. Giampaolo,
     Designing an efficient parallel spectral
     clustering      algorithm    on     multi-core
     processors in Julia, Journal of parallel and
     distributed computing 138 (2020) 211–221.
     doi:10.1016/j.jpdc.2020.01.003.
[68] Z. Xing, G. Li, Intelligent classification
     method of remote sensing image based on
     big data in Spark environment, International
     journal of wireless information networks 26
     (2019) 183–192. doi:10.1007/s10776-019-
     00440-z.
[69] L. Zelnik-Manor, P. Perona, Self-tuning
     spectral clustering, in: 17th international
     conference       on     neural    information
     processing systems, 2004: pp. 1601–1608.
[70] H. Qiu, E.R. Hancock, Graph matching and
     clustering using spectral partitions, Pattern
     recognition        39      (2006)      22–34.
     doi:10.1016/j.patcog.2005.06.014.
[71] D.T. Pham, A.B. Chan, Control chart pattern
     recognition using a new type of self-
     organizing neural network, Proceedings of
     the Institution of Mechanical Engineers. Part
     I: Journal of systems and control engineering
     212               (1998)             115–127.
     doi:10.1243/0959651981539343.
[72] S. Hettich, S.D. Bay, The UCI KDD Archive,
     Irvine, CA: University of California,
     Department of Information and Computer
     Science (1999).
[73] F. Lin, W.W. Cohen, Power iteration
     clustering, in: 27th international conference
     on machine learning (ICML-10), Haifa, Israel,
     2010: pp. 655–662.
[74] B.J. Frey, D. Dueck, Clustering by passing
     messages between data points, Science 315
     (2007)                               972–976.
     doi:10.1126/science.1136800.

