Technological Forecasting Based on Spectral Clustering for Word Frequency Time Series

Han Huang1, Xiaoguang Wang2,3 and Hongyu Wang1,*

1 School of Management, Wuhan University of Technology, Wuhan, China, 430070
2 School of Information Management, Wuhan University, Wuhan, China, 430072
3 Institute of Big Data, Wuhan University, Wuhan, China, 430072

Abstract

Technological forecasting is an essential strategy for identifying the technologies that should be given priority for future development, so research into forecasting methods is of considerable importance. This study introduces a novel method for technological forecasting, the Time Trend Clustering Model (TTCM) based on spectral clustering, and analyzes and discusses it using word frequency time series. To verify the efficacy of the model, this study first applies the TTCM model to standard time series datasets. The experimental findings indicate that the model is effective at grouping time series data with identical trends of variation. Further, taking the Library and Information Science (LIS) discipline as an example, this study employs the TTCM model to cluster the trends of word frequency time series, identifying emerging words with burst trends, label words with high-frequency fluctuation trends, hotspot words with increasing trends, and fading words with decreasing trends. By integrating the term function, the effectiveness of the TTCM model in the discovery of domain knowledge and in technological forecasting is demonstrated.

Keywords

Technological forecasting, time series, temporal trend clustering, spectral clustering, term frequency analysis

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
* Corresponding author.
huanghan@whu.edu.cn (H. Huang); wxguang@whu.edu.cn (X. Wang); hongyuwang@whut.edu.cn (H. Wang)
ORCID: 0000-0002-1517-9731 (H. Huang); 0000-0003-1284-7164 (X. Wang); 0000-0002-5063-9166 (H. Wang)
© Copyright 2024 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

In the current era, the development of the socio-economic landscape relies more heavily on the capability and efficacy of scientific and technological innovation than at any time before[1]. Nations, regions, organizations, and corporations alike are dedicating efforts towards the strategic planning and foresight of science and technology, evaluating the potential directions of technological revolutions, selecting key frontier areas of science and technology, and establishing innovation systems that align with their own realities in an attempt to secure a proactive and advantageous position in future competition[2–4]. In this context, the significance of technological forecasting has become increasingly prominent.

From the perspective of knowledge management, technological forecasting is a process that involves the continuous refinement, filtering, discovery, and creation of knowledge based on the mining of a vast amount of data information (explicit knowledge) and expert experience (tacit knowledge), which then systematically selects research areas and general technologies of strategic significance[5]. In an environment where the indices of scientific literature and patents are growing exponentially, and the hardware and software levels of technologies such as big data and artificial intelligence are continuously improving[6,7], a crucial implementation path for technological forecasting is to use big data analytics to mine scientific texts and identify different patterns of technological development, and then to supplement the results with expert judgement to evaluate the future trends of technology[8,9]. Within this path, the automated determination of technological evolution stages is an initial problem that needs to be addressed.

Word frequency serves as a fundamental indicator reflecting the popularity and activity level of scientific and technological fields[10,11], with its temporal trends effectively revealing the dynamics of scientific and technological development[12,13]. Some studies utilize word frequency analysis to understand the hotspots, frontiers, and their changes within specific disciplines or technological areas by analyzing high-frequency words, new word retention rates, and time series trends[10,14], often relying on the intervention of expert knowledge for manual interpretation of these temporal trends. While some researchers have employed statistical tests like the Mann-Kendall test[15], as well as curve clustering methods such as the nearest-neighbor propagation algorithm[16], to analyze the time trends of word frequency sequences in a (semi-)automated manner, these studies typically use small datasets and identify relatively simple evolutionary patterns. Indeed, the variation of word frequency within a specific time window can be considered a typical time series[17,18], allowing for the analysis of changing patterns using time series trend clustering models. By detecting word frequency trends such as bursts, growth, sudden drops, and declines, it is possible to reflect the evolutionary stages of technological points. Further integrating the different growth patterns of various technological points within a tech field, combined with expert knowledge, facilitates the foresight of key, common, and emerging technologies in the technological domain.

To this end, this study introduces TTCM and employs this model to analyze word frequency time series for technological forecasting. TTCM integrates the Dynamic Time Warping (DTW) algorithm with spectral clustering, enabling the automatic clustering of time series with similar evolution trends. To verify the model's effectiveness, this study first applied the TTCM model to cluster standard time series datasets from the UCI repository, demonstrating TTCM's capability to effectively differentiate time series data with similar evolution trends. Furthermore, taking the LIS discipline as an example, this study used the TTCM model to analyze the trends in word frequency time series, identifying four types of word frequency temporal trends: burst, increasing, decreasing, and high-frequency fluctuation. Based on these findings, the study analyzed the future research trends in the LIS discipline, further validating the scientific relevance and applicability of the TTCM model in technological forecasting.

2. Literature review

2.1. The methods of technological forecasting & foresight

Technology foresight has evolved from large-scale technological prediction activities, specifically the Delphi survey[5]. With the rapid development of science and technology, the continuous changes in the economic and social environment, and the ongoing accumulation of diverse and heterogeneous scientific and technological data, the methods and tools for technology foresight have gradually diversified[3]. The methods of technology foresight can primarily be categorized into two types: one is driven by expert experience and wisdom, primarily qualitative in nature; the other is driven by data and technology, primarily quantitative in nature.

In qualitative-oriented technology foresight studies, the Delphi method is the most widely used research approach[19]. Countries such as Japan, Germany, South Korea, and China have all conducted national-level technology foresight activities based on the Delphi survey[20,21], which has been extensively applied in various technological fields including agriculture, environment, healthcare, and ICT[22]. Besides the Delphi method, commonly used approaches also include technology road mapping, scenario analysis, brainstorming, morphological analysis, and the Analytic Hierarchy Process (AHP)[23–26]. The advantage of these methods lies in their ability to fully leverage expert experience. However, due to their strong subjective nature and the high requirements for the number of experts, their fields of expertise, and their experience, as well as the significant amount of time and expense involved, these methods are increasingly questioned and are gradually becoming unsuitable in the information age, which is characterized by an explosive growth in data volume.

The quantitative methods of data and technology-driven technology foresight primarily involve extracting valuable information from vast datasets to construct systematic foresight models[3]. These methods identify effective information for technology foresight through the mining and visualization of scientific literature, patents, technical reports, news, etc., covering aspects such as theme identification, current state assessment, gap analysis, and trend prediction. Key techniques include growth curves[27], bibliometrics[28], patent analysis[29], social network analysis[30], data envelopment analysis[31], and data mining methods such as clustering, classification, and regression[32–34]. By leveraging the mining of objective data such as literature and patents, these methods reduce reliance on experts to some extent. However, they may also lead to decreased applicability and effectiveness in decision support due to the lack of expert experience and dependency on technological pathways.

2.2. Time series clustering analysis

Time series analysis aims at mining useful information and knowledge from a large number of complex time series data, and cluster analysis is one of the important methods of time series data mining[35]. Time series clustering has been applied to the analysis and mining of stock data[36], social media data[37], Landsat time series data[38], smart grid data[39], health detection data[40], etc.

The main process of time series clustering is similarity measurement and clustering[41]. Among similarity measurement methods, shape-based approaches are the most commonly used[42]. One of the simpler approaches to implement is the Euclidean distance, and although it has some applications in distance measurement of time series[43], it is difficult for it to effectively take into account the phase distortion between time series[44]. At the same time, the Euclidean distance between subseries at similar locations and with similar waveforms can also be large due to the difference in their amplitudes[45]. In contrast, the Dynamic Time Warping (DTW) distance[46] improves the process of calculating the Euclidean distance. It realizes one-to-many matching of data points in time series through dynamic warping, so that it has good robustness to the phase deviation and amplitude deformation of time series, and it performs well in time series clustering tasks[41,47–49].

Clustering algorithms for time series can be roughly divided into hierarchical clustering, model-based clustering, partition-based clustering and density-based clustering. Partition-based clustering, such as K-Means[50] and K-Medoids[51], is the most commonly used method. However, K-means and related methods are not fully applicable to uneven sample distribution or non-convex sample data. Spectral clustering methods, which are applicable to samples of various shapes, may be an effective alternative in such cases. At present, some researchers have applied spectral clustering to time series data clustering[52].

2.3. Spectral clustering algorithm

Spectral clustering is an unsupervised learning algorithm based on graph partitioning, capable of transforming the clustering problem into a graph segmentation issue on an undirected weighted graph constructed from the data to be clustered[53,54]. Unlike algorithms such as K-means that work well only for convex sample data, the spectral clustering algorithm is applicable to sample spaces of any shape and converges to a global optimal solution, and it is also applicable to high-dimensional data[55,56].

Currently, spectral clustering has been widely used in image segmentation[57], face recognition[58], earth science[59] and other related research. Due to its good data applicability and clustering effect, scholars have also applied spectral clustering to research in the LIS discipline, such as scientometrics and information retrieval. Chifu et al. [60] proposed a word sense discrimination method based on spectral clustering for ranking matching documents in information retrieval, thus improving the efficiency of information retrieval. Similarly, Singh et al. [61] used a spectral clustering algorithm to improve the strategy of user ranking in community Q&A sites. Colavizza and Franceschet [62] used a spectral clustering algorithm to cluster literature citations in physical reviews to find similar documents. Chen et al. [63] also used a spectral clustering method in multi-perspective analysis of co-citation networks. Feng et al. [64] used spectral clustering to verify the impact of different feature combinations such as JIF, 5-Year JIF, and CiteScore on journal classification.

In addition, some researchers have proposed optimization schemes to address the high computational complexity of spectral clustering and difficulties in data representation. For example, Wang et al. [65] proposed a linear spatial embedding clustering method to optimize the similarity matrix and clustering results of spectral clustering by adaptive neighbors, and Sapkota et al. [66] optimized the initial clustering center to improve the stability of the algorithm. Some researchers have also implemented spectral clustering algorithms based on the Spark big data computing framework, the Julia language, etc., which improved the running efficiency of the algorithm through parallel computing [67,68].

3. Methodology

3.1. Model definition

In this study, a temporal trend clustering model called TTCM, based on spectral clustering, is proposed to analyze word frequency time series. It is implemented using the Spark framework. The algorithm flow is shown in Figure 1. Following the retrieval requirements, the collection and preprocessing of keywords in a given academic field are completed, and the frequency of keywords over different periods is tallied to obtain the time series data for subsequent clustering analysis. The spectral clustering algorithm encompasses three core steps: graph construction, graph partitioning, and classical clustering. As the TTCM model is based on spectral clustering, steps 1-3 in Figure 1 are also the critical steps for clustering the trends of word frequency time series in the TTCM model. These three steps are introduced in detail below.

Figure 1: The algorithm flow of TTCM

3.1.1. Graph construction and its matrix representation

In this study, the time series are regarded as vertices and the DTW distance between time series is used as the edge weight to construct an adjacency matrix A. The basic idea of DTW is to find the optimal correspondence between two sequences and obtain the best match between them in order to calculate their similarity. Its matching principle is shown in Figure 2.

Figure 2: The matching principle of DTW algorithm

For the time series $X = \{x_1, x_2, x_3, \dots, x_m\}$ and $Y = \{y_1, y_2, y_3, \dots, y_n\}$, the DTW distance between them is calculated as formula (1).

$$D(X, Y) = \min_{W = w_1, \dots, w_k, \dots, w_K} \sqrt{\sum_{k=1,\; w_k = (i,j)}^{K} (x_i - y_j)^2}, \quad i \in \{1, \dots, m\},\; j \in \{1, \dots, n\} \tag{1}$$

Where $w_k = (i, j)$ indicates that the $i$-th data point of $X$ and the $j$-th data point of $Y$ are corresponding points at step $k$ of the path, and $W$ is the optimal path, i.e., the path that minimizes the value of $D(X, Y)$.

Further, in order to reduce the dimensional difference between distances, this study uses the local scale Gaussian kernel function to normalize the DTW distances between time series to obtain the similarity matrix $W = \{w_{11}, \dots, w_{1n}, \dots, w_{mn}\}$ (not to be confused with the warping path in formula (1)), whose calculation process is shown in formula (2) [69].

$$w_{xy} = e^{-\frac{d_{xy}^2}{\sigma_x \sigma_y}}, \quad \sigma_x = d_{xK}, \quad \sigma_y = d_{yK} \tag{2}$$

Where $d_{xy}$ is the distance between time series $X$ and $Y$, and $\sigma_x$ is the local parameter of $X$, defined as the distance between $X$ and its $K$-th neighbor. The value of $K$ is usually set to 7[69].

Then, the similarity matrix is transformed into a Laplacian matrix. In order to prevent the analysis error caused by the non-uniform dimension between data, the symmetric normalized Laplacian matrix is used to represent the graph, and its definition is shown in formula (3).

$$L_{sym} = D^{-\frac{1}{2}} L D^{-\frac{1}{2}} = I - D^{-\frac{1}{2}} W D^{-\frac{1}{2}} \tag{3}$$

Where $I$ is the identity matrix, $L = D - W$ is the unnormalized Laplacian matrix, and $D$ is the degree matrix, that is, the diagonal matrix whose entry for each column is the sum of the elements of that column of the similarity matrix $W$.

3.1.2. The determination and transformation of graph partition criterion.

The key to spectral clustering is to cut the undirected weighted graph reasonably so as to maximize the sum of the weights between the samples within each subgraph, that is, to minimize the sum of the weights of the cut edges. According to the graph representation of the symmetric normalized Laplacian matrix determined above, this study adopts the N-Cut partition criterion, whose objective function is shown in formula (4).

$$NCut(A_1, \dots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{vol(A_i)} \tag{4}$$

Where $k$ represents the total number of subsets, $A_i$ represents the $i$-th subset, $\bar{A}_i$ is the complementary set of $A_i$, $W(A_i, \bar{A}_i)$ represents the sum of the weights of the edges between points in subset $A_i$ and points outside of subset $A_i$, and $vol(A_i)$ is the sum of the weights of all edges of the points in subset $A_i$. According to the mathematical derivation, solving the objective function can be transformed into solving for the minimum eigenvalues of the Laplacian matrix and their corresponding eigenvectors. In this study, the eigenvectors (also known as indicator vectors) corresponding to the $\lambda$ smallest eigenvalues of $L_{sym}$ are solved, and the eigenmatrix $H$ (also known as the indicator matrix) composed of these indicator vectors is the approximate optimal solution to the graph partition problem. $H$ is a matrix with dimension $N \times \lambda$, where $N$ is the number of time series.

3.1.3. Data clustering through classical clustering algorithm.

After the graph is divided, a classical clustering algorithm can be used to cluster $H$. Based on the K-means algorithm, this study regards each row $h_n$ of $H$ as a vector in the current space and conducts cluster analysis on these vectors. The category obtained for $h_n$ is the category of the $n$-th time series.

3.2. Algorithm parameter determination

In the implementation of the TTCM model, $\lambda$, the number of eigenvectors, is a parameter that needs to be set in advance. In practice, $\lambda$ is often set to $k$, the final expected number of clusters of spectral clustering. However, the number of clusters $k$ is usually determined only at the last stage, according to the change of the error sum of squares or the silhouette coefficient of the K-Means model. In order to determine $\lambda$ in advance, a novel method for determining the dimension of the indicator matrix is designed based on the meaning and properties of the Fiedler vector of the Laplacian matrix.

The Fiedler vector is the eigenvector corresponding to the minimum non-zero eigenvalue (also known as the second smallest eigenvalue) of the Laplacian matrix of the graph [70]. In this study, the Fiedler vector of $L_{sym}$ is first taken as the indicator matrix $H$, k-means clustering is carried out on $H$, and the relationship between the number of clusters and the error sum of squares is observed to determine the optimal number of clusters $k$. Then, $\lambda$ is set to $k$, $k-1$ and $k-2$ to conduct the subsequent analysis. Finally, provided that the difference between clusters is preserved, a small $\lambda$ value is chosen to reduce the time and space cost of the subsequent calculation process and to avoid overfitting. In this study, the above method of $\lambda$ selection is called the principle of low.

4. Experiments and result analysis

4.1. Model validation through time series standard dataset

To verify the efficiency of TTCM in time series clustering, the time series standard dataset[71] in the Knowledge Discovery Archive [72] of the University of California, Irvine (UCI) is used to test the model. The Power Iteration Clustering (PIC) model[73] and the Affinity Propagation (AP) clustering model[74], which are also based on graph theory, are selected as baselines for comparing recognition performance.

There are 600 pieces of data in the time series dataset, and every 100 pieces represent a trend type. The six types are marked as normal, cyclic, increasing trend, decreasing trend, upward shift, and downward shift. Figure 3 shows the sample data of these six trends. According to the trends of the test dataset, the number of clusters of the TTCM model, PIC model and AP model is set to 6, and the maximum number of iterations is set to 30. Meanwhile, in the TTCM model, $\lambda$ is set to 4, 5, and 6. In addition, all three models use the similarity matrix $W$ calculated by formula (2).

Figure 3: Sample data of the six trends in the time series dataset

After the clustering is completed, the identified cluster labels are matched to the actual labels according to the data distribution in the various clusters. That is, if the identified cluster 1 contains mostly increasing trend data, cluster 1 is marked as increasing trend. Then, the number of increasing trend and other types of data in cluster 1 is compared with the actual number of increasing trend data (i.e., 100). After calculating the precision (P), recall (R) and F1 of each category, the average values of P, R and F1 over the six categories are used to evaluate the models. The results are shown in Table 1.

Table 1
Experimental results of TTCM/PIC/AP model on test dataset

| Model | TTCM ($\lambda = 4$) | TTCM ($\lambda = 5$) | TTCM ($\lambda = 6$) | PIC | AP |
| Number | 406 | 578 | 492 | 200 | 419 |
| P | 64.02% | 96.63% | 81.14% | 24.44% | 70.48% |
| R | 67.67% | 96.33% | 82.00% | 33.33% | 69.83% |
| F1 | 64.18% | 96.33% | 81.44% | 27.50% | 67.11% |

It can be seen that, when $\lambda = 5$, the TTCM model has a good recognition effect: it accurately identifies the trends of 578 time series, with an F1 value of 96.33%, much higher than the PIC model, the AP model and the TTCM model with other values of $\lambda$. The per-category recognition results of TTCM when $\lambda = 5$ are shown in Table 2.

Table 2
Confusion matrix of the six-classification problem corresponding to TTCM model ($\lambda = 5$)

| Actual \ Model | Normal | Cyclic | Increasing trend | Decreasing trend | Upward shift | Downward shift | Recall |
| Normal | 98 | 0 | 2 | 0 | 0 | 0 | 98.00% |
| Cyclic | 0 | 100 | 0 | 0 | 0 | 0 | 100.00% |
| Increasing trend | 0 | 0 | 100 | 0 | 0 | 0 | 100.00% |
| Decreasing trend | 0 | 0 | 0 | 99 | 0 | 1 | 99.00% |
| Upward shift | 1 | 0 | 11 | 0 | 88 | 0 | 88.00% |
| Downward shift | 0 | 0 | 0 | 7 | 0 | 93 | 93.00% |
| Precision | 98.99% | 100.00% | 88.50% | 93.40% | 100.00% | 98.94% | F1 = 0.9633 |

By observing Table 2, it can be further found that TTCM can effectively distinguish the six types of trends in the test dataset. Errors appear only in a small number of cases, where upward shifts are confused with increasing trends and downward shifts with decreasing trends. Overall, the TTCM model proposed in this paper can effectively distinguish the evolution trends of time series and cluster time series with similar trends.

4.2. Temporal trend clustering through word frequency time series

4.2.1. Data collection and preprocessing

In order to further verify the effectiveness of TTCM in detecting trends within word frequency time series, and in view of the disciplinary background of the team members, this study selected the LIS discipline for case analysis. This study adopts the same data collection principles as our previous study[13] and collects the scientific papers published from 2011 to 2020 in the journals included in the Social Sciences Citation Index (SSCI) in the field of LIS. The document types are limited to research article and review, and the language is limited to English. The resulting case dataset contains 38932 scientific papers. The keywords of these papers are then preprocessed, including denoising, morphological reduction, and abbreviation conversion. After preprocessing, the number and frequency of keywords in each year are tallied, as shown in Table 3.

Table 3
Number and frequency of keywords in each year

| Year | Number of papers | Number of keywords | Frequency of keywords | Average word frequency |
| 2011 | 3298 | 6324 | 10706 | 1.69 |
| 2012 | 3414 | 7125 | 12230 | 1.72 |
| 2013 | 3618 | 8087 | 13876 | 1.72 |
| 2014 | 3732 | 8267 | 14089 | 1.7 |
| 2015 | 3822 | 8806 | 15546 | 1.77 |
| 2016 | 4155 | 10082 | 17922 | 1.78 |
| 2017 | 4139 | 10156 | 17273 | 1.7 |
| 2018 | 4079 | 10632 | 17894 | 1.68 |
| 2019 | 4205 | 11034 | 18917 | 1.71 |
| 2020 | 4470 | 12270 | 21215 | 1.73 |

4.2.2. Results

In the analysis of word frequency time series, consideration was given to the possibility that keywords with too low a total frequency count might not exhibit significant trends in time series changes (i.e., the frequency time series of such keywords could be classified as having a uniform trend). Therefore, adhering to common practice, this study filtered the pool of 57,025 distinct keywords spanning the entire study period, selecting those with a total frequency count exceeding the length of the time span. The filtration yielded 1,952 author keywords that were mentioned in more than ten articles from 2011 to 2020. Utilizing the TTCM model, this study conducted trend identification on the time series of these 1,952 keywords. Following the principle of low introduced in Section 3.2, the study set $\lambda$ to 3 and the number of clusters $k$ to 5 for the TTCM. Subsequently, plots of the word frequency time series within each trend category were generated to facilitate visual observation and summary of the changing characteristics of the frequency time series within each cluster.

In the clustering results of the temporal trend of term frequency, the first kind of trend can be summarized as the burst trend. The obvious characteristic of this kind of trend is that the term frequency of keywords is low in the early and middle periods of the whole time span, but rises rapidly in the middle and later periods. Figure 4 shows the term frequency change curves of some keywords with a burst trend, and a total of 30 keywords are clustered into this trend.

Figure 4: Part of words with burst trend in word frequency series (emerging words)

The second trend can be classified as the increasing trend. The term frequency series of this kind of trend shows a general trend of fluctuating increase over the whole time span, but the term frequency remains at a low level throughout. Figure 5 shows some of the keywords with an increasing trend in the term frequency series. There are 177 keywords with the increasing trend.

Figure 5: Part of words with increasing trend in word frequency series (hotspot words)

The third kind of trend can be summarized as the high-frequency fluctuation trend. The obvious characteristic of this kind of trend is that the term frequency of keywords remains at a high level over the whole time span and fluctuates slightly with the passage of time. Figure 6 shows the term frequency change curves of some keywords with a high-frequency fluctuation trend, and a total of 30 keywords are clustered into this trend.

Figure 6: Part of words with high-frequency fluctuation trend in word frequency series (label words)

The fourth trend can be classified as the decreasing trend. The term frequency series of this kind of trend shows a general trend of fluctuating decrease over the whole time span, and the term frequency remains at a low level throughout. Figure 7 shows some of the keywords with a decreasing trend in the term frequency series. There are 69 keywords with the decreasing trend.

Figure 7: Part of words with decreasing trend in word frequency series (fading words)

The fifth type of trend identified by TTCM contains a total of 1646 keywords, and observation of its trend curves failed to find obvious characteristics. Therefore, this paper speculates that this kind of term frequency series follows the normal trend, without obvious regular fluctuation.

It can be seen from the clustering results that the TTCM model is highly effective in identifying emerging words across various disciplines that have suddenly burst onto the scene, successfully capturing the rising trend of keyword frequencies towards the end of the time span. Within the set of keywords exhibiting an upward trend, the model accurately identified research hotspots that are gradually gaining widespread attention among LIS scholars. For the keywords identified by the model as having high-frequency fluctuations, the frequency levels consistently remained high, often signifying core research sub-fields or themes within the domain. Conversely, the model effectively reflected keywords in decline, indicating words that are gradually fading from the focal interest of scholars in the discipline.

4.3. Technical foresight based on temporal trend of word frequency

Further, in accordance with the term function[75], this study divides the keywords with a significant temporal trend into two categories: research questions/objects and research methods/technologies. The counts of keywords of each function under each trend, along with examples, are shown in Table 4.

Table 4
Keyword statistics based on different trends of term function

| Trend | Research questions/objects | Research methods/technologies |
| Burst | 17: Electronic Health Records, Covid-19, Digital Transformation, Coronavirus, Journalism | 13: Machine Learning, Artificial Intelligence, Deep Learning, Blockchain, Neural Network |
| Increasing | 117: Privacy, Research Evaluation, Gender, Scholarly Communication, Higher Education | 60: Social Network Analysis, Big Data, Classification, Altmetrics, Sentiment Analysis |
| Decreasing | 57: Internet, E-government, Digital Library, Web2.0, Blog | 12: Focus Group, Semistructured Interviews, Nanotechnology, Citation Distribution, Microsimulation |
| High-frequency fluctuation | 18: Social Media, Academic Library, Information Literacy, China, Collaboration | 12: Bibliometrics, Qualitative, Citation Analysis, Case Study, Information Retrieval |
| Total | 209 | 97 |

In general, the keywords with significant trends identified in this study predominantly comprise words related to research questions/objects, which account for nearly 70% of the total.

There is a considerable number of keywords exhibiting an increasing trend, with both functional types of words showing a relatively balanced distribution. However, due to the phenomenon of technological literature inflation, although these words continuously attract the attention of scholars in the field, the share of related research may not actually have expanded across the entire disciplinary spectrum. As some researchers delve into new studies, there are concurrent instances of existing scholars gradually losing focus. Should no new method or technology innovations emerge, and should novel research objects corresponding to these characteristic keywords no longer be integrated, the frequencies of these words will gradually transition into a decreasing trend.

Quantitatively, the number of fading keywords exhibiting a decreasing trend is less than half of the number of keywords showing an upward trend, a circumstance possibly attributable to the literature inflation caused by technological explosions. Within the rapid accumulation of technological literature, the conservative tendencies of some researchers and/or the attribute of knowledge application contained within certain words might prevent these fading keywords from becoming low-frequency words filtered out during the input phase of the TTCM model. However, these fading words, if not subject to knowledge innovation, are highly likely to decline gradually. Simultaneously, among the words demonstrating decreasing trends, words related to research questions/objects are notably higher in proportion than those concerning methods/technologies. This discrepancy may be attributed to the stronger applicability of methods/technologies: even amidst shifts in research subjects or questions, researchers tend to employ classical and established technical methodologies.

The proportion of emerging words displaying burst trends is relatively low. Although there is a higher absolute number of words related to research questions/objects, the relative proportion of words related to methods is higher, indicating that, with changes in social and research environments, researchers have begun to pay attention to some new research objects, such as coronaviruses, open science, and mobile payments, while introducing more emerging technologies such as artificial intelligence, machine learning, and neural networks.

Specifically, regarding methods/technologies, technologies such as focus groups, semi-structured interviews, and microsimulation, which primarily target small-scale data samples, exhibit a decreasing trend, while big data analysis techniques such as artificial intelligence, machine learning, deep learning, social network analysis, and sentiment analysis demonstrate a burst or increasing trend. This reflects the progress and evolution of research methods and technologies in LIS, with an increasing number of researchers adopting emerging technologies to process and analyze information to gain deeper and broader insights. Simultaneously, the application of emerging technologies also reflects the transformation of research content in the LIS field towards quantitative analysis, large-scale data processing, and deep data mining. The explosive growth trend of technologies such as artificial intelligence also indicates that future research in the LIS field may increasingly focus on leveraging advanced computing technologies to address issues related to information management, information retrieval, and user behavior analysis.

5. Discussion

Technological advancements and transformations are not only complex interplays driven by societal, economic, and political well-being but also their

This study proposes the TTCM model based on spectral clustering. Model validation results from Tables 1 and 2 demonstrate that the TTCM model can effectively distinguish the evolution trends of time series and automatically cluster time series with similar trends. Applying the TTCM model to the analysis of word frequency time series reveals its successful identification of sudden emerging words, high-frequency fluctuating words, steadily increasing hotspot words, and gradually decreasing fading words, providing significant reference and guidance value for anticipatory analysis in disciplinary fields. Furthermore, combined with term functions, anticipatory analysis of subsequent research development and technological shifts in the field helps research institutions and relevant practitioners adjust research directions in a timely manner and grasp popular scientific research trends and frontier opportunities. It also aids governments and industrial institutions in identifying focal points and trends in the field, providing decision-making support for the formulation and planning of science and technology policies and strategies.

Essentially, the TTCM model is a clustering model whose clustering objects are time series, and the clustering basis is the evolution trend of time series, i.e., clustering time series with similar evolution trends into the same category. Therefore, the application of the TTCM model is not limited to word frequency time series. For scientific literature, analysis of research hotspots and frontier trends can be conducted using time series data such as publication volume, citation volume, and author quantity. For patent literature, technological forecasting can be conducted using time series data such as patent application volume, citation volume, and patent conversion quantity.
Additionally, outcomes. Predicting and understanding the process comprehensive analysis can be performed by of technological change pose challenges for decision- combining time series data from other sources such as makers in governments and businesses[76]. online news, social media, and stock securities. This Appropriately implemented and effective aims to provide reference and guidance for technological forecasting is of significant guiding organizational decision-making in governments, value to organizations such as governments and industries, and businesses. businesses[3]. The research paradigm of Furthermore, after conducting extensive technological forecasting is still evolving, promoting identification experiments on time series data using the effective complementary integration of qualitative the TTCM model, the identified types of evolution and quantitative research methods. Seeking novel trends can be solidified into pattern features. This can research methodologies to enhance research quality be further combined with traditional machine is currently a hotspot and focus in the field of learning models such as Support Vector Machines, K- technological forecasting. Therefore, conducting nearest neighbors, Conditional Random Fields, or research on technological forecasting methods under deep learning models such as Convolutional Neural this backdrop holds certain theoretical significance Networks, Recurrent Neural Networks, Long Short- and practical value. Term Memory Networks, to achieve rapid identification of large-scale time series evolution 18 trends. This automation enables automated Acknowledgements prediction of emerging research trends in the field or potential technological growth points. This work was funded by the National Natural Science Fund of China (No. 71874129), the Open-end Fund of 6. 
Conclusion Information Engineering Lab of ISTIC and the Independent Innovation Foundation of Wuhan The present study introduces a novel time series University of Technology (No. 233103002). trend clustering model, named TTCM, and employs it to analyze word frequency time series for References technological forecasting. TTCM integrates dynamic time warping algorithm and spectral clustering [1] F. Dotsika, A. Watkins, Identifying algorithm to automatically cluster time series potentially disruptive trends by means of exhibiting similar evolution trends. To validate the keyword network analysis, Technological effectiveness of the model, this research initially forecasting and social change 119 (2017) applies TTCM to cluster standard time series datasets 114–127. from the UCI repository, demonstrating its capability doi:10.1016/j.techfore.2017.03.020. to effectively differentiate time series data with [2] R.N. Kostoff, R.R. Scaller, Science and similar evolution trends. Furthermore, using the LIS technology roadmaps, IEEE transactions on discipline as a case study, this research utilizes TTCM engineering management 48 (2001) 132– to cluster the evolution trends of word frequency time 143. doi:10.1109/17.922473. series, identifying emerging words with burst trends, [3] C. Lee, A review of data analytics in label words with high-frequently fluctuation trends, technological forecasting, Technological hotspot words with increasing trends, and decreasing forecasting and social change 166 (2021). fading words. The integration of term function doi:10.1016/j.techfore.2021.120646. confirms the efficacy of TTCM in domain knowledge [4] E. Amanatidou, Beyond the veil - The real discovery and technological forecasting. value of Foresight, Technological forecasting Nevertheless, this study has certain limitations. and social change 87 (2014) 274–291. Firstly, due to computational constraints, only ten doi:10.1016/j.techfore.2013.12.030. 
years of data were selected for analysis, potentially [5] B.R. Martin, Foresight in science and overlooking evolution trends that manifest over technology, Technology Analysis & strategic longer time series. Secondly, the case study is limited management 7 (1995) 139-168. to the LIS domain, warranting further verification of [6] L. Bornmann, R. Mutz, Growth rates of the analysis effectiveness of the TTCM model in word modern science: A bibliometric analysis frequency time series from other disciplines and fields. based on the number of publications and Additionally, the analysis in this study is limited to cited references, Journal of the Association keyword perspectives, without considering for Information Science and Technology 66 interrelations among keywords in the thematic (2015) 2215–2222. doi:10.1002/asi.23329. dimension. [7] C. Balili, U. Lee, A. Segev, J. Kim, M. Ko, In the future research, in addition to addressing TermBall: Tracking and predicting evolution the shortcomings mentioned above, this study will types of research topics by using knowledge incorporate other data sources such as patent data to structures in scholarly big data, IEEE Access achieve technology foresight with multi-source data. 8 (2020) 108514–108529. Moreover, after extensive experimentation to doi:10.1109/ACCESS.2020.3000948. determine evolution trends in different types of time [8] A.C. Adamuthe, G.T. Thampi, Technology series, this study will consider solidifying these trends forecasting: A case study of computational into pattern features and further integrating them technologies, Technological forecasting and with classification models to achieve intelligent and social change 143 (2019) 181–189. automated prediction of emerging research trends or doi:10.1016/j.techfore.2019.03.002. potential technological growth points in large-scale [9] H. Lee, S. Lee, B. Yoon, Technology clustering datasets. 
based on evolutionary patterns: The case of information and communications technologies, Technological forecasting and social change 78 (2011) 953–967. doi:10.1016/j.techfore.2011.02.002.
[10] W. Lu, S. Huang, J. Yang, Y. Bu, Q. Cheng, Y. Huang, Detecting research topic trends by author-defined keyword frequency, Information processing and management 58 (2021) 102594. doi:10.1016/j.ipm.2021.102594.
[11] Y.H. Hu, C.T. Tai, K.E. Liu, C.F. Cai, Identification of highly-cited papers using topic-model-based and bibliometric features: The consideration of keyword popularity, Journal of Informetrics 14 (2020) 101004. doi:10.1016/j.joi.2019.101004.
[12] T.Y. Huang, B. Zhao, Measuring popularity of ecological topics in a temporal dynamical knowledge network, PLoS ONE 14 (2019) e0208370. doi:10.1371/journal.pone.0208370.
[13] X. Wang, H. Wang, H. Huang, Evolutionary exploration and comparative analysis of the research topic networks in information disciplines, Scientometrics 126 (2021) 4991–5017. doi:10.1007/s11192-021-03963-6.
[14] M. Petrova, P. Sutcliffe, K.W.M. Fulford, J. Dale, Search terms and a validated brief search filter to retrieve publications on health-related values in Medline: A word frequency analysis study, Journal of the American medical informatics association 19 (2012) 479–488. doi:10.1136/amiajnl-2011-000243.
[15] M. Färber, C. Nishioka, A. Jatowt, ScholarSight: Visualizing temporal trends of scientific concepts, 2019 ACM/IEEE Joint Conference on Digital Libraries, 2019, pp. 438–439. doi:10.1109/JCDL.2019.00108.
[16] M. Trevisani, A. Tuzzi, Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories, Knowledge-based systems 146 (2018) 129–141. doi:10.1016/j.knosys.2018.01.035.
[17] C. Boothby, S. Milojević, An exploratory full-text analysis of Science Careers in a changing academic job market, Scientometrics 126 (2021) 4055–4071. doi:10.1007/s11192-021-03905-2.
[18] E.S. Atlam, M. Okada, M. Shishibori, J.-i. Aoe, An evaluation method of words tendency depending on time-series variation and its improvements, Information processing and management 38 (2002) 157–171. doi:10.1016/S0306-4573(01)00028-0.
[19] J. Landeta, Current validity of the Delphi method in social sciences, Technological forecasting and social change 73 (2006) 467–482. doi:10.1016/j.techfore.2005.09.002.
[20] T. Shin, Using Delphi for a long-range technology forecasting, and assessing directions of future R&D activities - The Korean exercise, Technological forecasting and social change 58 (1998) 125–154. doi:10.1016/S0040-1625(97)00053-X.
[21] M. Rongping, R. Zhongbao, Y. Sida, Q. Yan, 'Technology foresight towards 2020 in China': the practice and its impacts, Technology analysis & strategic management 20 (2008) 287–307. doi:10.1080/09537320801999587.
[22] A. Suominen, A. Hajikhani, A. Ahola, Y. Kurogi, K. Urashima, A quantitative and qualitative approach on the evaluation of technological pathways: A comparative national-scale Delphi study, Futures 140 (2022). doi:10.1016/j.futures.2022.102967.
[23] T. Heger, R. Rohrbeck, Strategic foresight for collaborative exploration of new business fields, Technological forecasting and social change 79 (2012) 819–831. doi:10.1016/j.techfore.2011.11.003.
[24] Y. Tang, H. Sun, Q. Yao, Y. Wang, The selection of key technologies by the silicon photovoltaic industry based on the Delphi method and AHP (analytic hierarchy process): Case study of China, Energy 75 (2014) 474–482. doi:10.1016/j.energy.2014.08.003.
[25] C. Flick, E.D. Zamani, B.C. Stahl, A. Brem, The future of ICT for health and ageing: Unveiling ethical and social issues through horizon scanning foresight, Technological forecasting and social change 155 (2020). doi:10.1016/j.techfore.2020.119995.
[26] M. Hussain, E. Tapinos, L. Knight, Scenario-driven roadmapping for technology foresight, Technological forecasting and social change 124 (2017) 160–177. doi:10.1016/j.techfore.2017.05.005.
[27] Y. Jeong, I. Park, B. Yoon, Forecasting technology substitution based on hazard function, Technological forecasting and social change 104 (2016) 259–272. doi:10.1016/j.techfore.2016.01.014.
[28] W. Yeo, S. Kim, H. Park, J. Kang, A bibliometric method for measuring the degree of technological innovation, Technological forecasting and social change 95 (2015) 152–162. doi:10.1016/j.techfore.2015.01.018.
[29] C. Lee, Y. Cho, H. Seol, Y. Park, A stochastic patent citation analysis approach to assessing future technological impacts, Technological forecasting and social change 79 (2012) 16–29. doi:10.1016/j.techfore.2011.06.009.
[30] M. Coccia, L. Wang, Path-breaking directions of nanotechnology-based chemotherapy and molecular cancer therapy, Technological forecasting and social change 94 (2015) 155–169. doi:10.1016/j.techfore.2014.09.007.
[31] D.-J. Lim, T.R. Anderson, O.L. Inman, Choosing effective dates from multiple optima in Technology Forecasting using Data Envelopment Analysis (TFDEA), Technological forecasting and social change 88 (2014) 91–97. doi:10.1016/j.techfore.2014.06.003.
[32] S. Jun, S.S. Park, D.S. Jang, Technology forecasting using matrix map and patent clustering, Industrial management & data systems 112 (2012) 786–807. doi:10.1108/02635571211232352.
[33] S. Jun, A Forecasting Model for Technological Trend Using Unsupervised Learning, in: T.H. Kim, H. Adeli, A. Cuzzocrea, T. Arslan, Y.C. Zhang, J.H. Ma, K.I. Chung, S. Mariyam, X.F. Song, Database theory application, bio-science bio-technology, Springer-Verlag Berlin, Berlin, Germany, 2011: pp. 51–60.
[34] N. Gozuacik, C.O. Sakar, S. Ozcan, Technological forecasting based on estimation of word embedding matrix using LSTM networks, Technological forecasting and social change 191 (2023) 122520. doi:10.1016/J.TECHFORE.2023.122520.
[35] P. Esling, C. Agon, Time-Series Data Mining, ACM computing surveys 45 (2012). doi:10.1145/2379776.2379788.
[36] C. Guo, H. Jia, N. Zhang, Time Series Clustering Based on ICA for Stock Data Analysis, in: 4th international conference on wireless communications, networking and mobile computing, VOLS 1-31, IEEE, New York, USA, 2008: pp. 10903+.
[37] H. Zhu, Y. Mei, J. Wei, C. Shen, Prediction of online topics' popularity patterns, Journal of information science 48 (2022) 141–151. doi:10.1177/0165551520961026.
[38] Y. Zhao, L. Lin, W. Lu, Y. Meng, Landsat time series clustering under modified dynamic time warping, in: Q. Weng, P. Gamba, G. Xian, J.M. Chen, S. Liang, 4th international workshop on earth observation and remote sensing applications, IEEE, New York, USA, 2016.
[39] H. Son, Y. Kim, S. Kim, Time series clustering of electricity demand for industrial areas on smart grid, Energies 13 (2020). doi:10.3390/en13092377.
[40] C.H. Sudre, K.A. Lee, M.N. Lochlainn, T. Varsavsky, B. Murray, M.S. Graham, C. Menni, M. Modat, R.C.E. Bowyer, L.H. Nguyen, D.A. Drew, A.D. Joshi, W. Ma, C.-G. Guo, C.-H. Lo, S. Ganesh, A. Buwe, J.C. Pujol, J.L. du Cadet, A. Visconti, M.B. Freidin, J.S.E.-S. Moustafa, M. Falchi, R. Davies, M.F. Gomez, T. Fall, M.J. Cardoso, J. Wolf, P.W. Franks, A.T. Chan, T.D. Spector, C.J. Steves, S. Ourselin, Symptom clusters in COVID-19: A potential clinical prediction tool from the COVID symptom study app, Science advances 7 (2021). doi:10.1126/sciadv.abd4177.
[41] T. Li, X. Wu, J. Zhang, Time series clustering model based on DTW for classifying car parks, Algorithms 13 (2020). doi:10.3390/a13030057.
[42] S. Zolhavarieh, S. Aghabozorgi, Y.W. Teh, A Review of Subsequence Time Series Clustering, Scientific world journal (2014). doi:10.1155/2014/312521.
[43] X. Guo, Y. Pang, G. Yan, T. Qiao, Time series forecasting based on deep extreme learning machine, in: 29th Chinese control and decision conference, CCDC 2017, 2017: pp. 6151–6156. doi:10.1109/CCDC.2017.7978277.
[44] E.J. Keogh, M.J. Pazzani, Relevance feedback retrieval of time series data, in: 22nd annual international ACM SIGIR conference on research and development in information retrieval, SIGIR 1999, 1999: pp. 183–190. doi:10.1145/312624.312676.
[45] X.L. Dong, C.K. Gu, Z.O. Wang, Research on shape-based time series similarity measure, in: 2006 international conference on machine learning and cybernetics, 2006: pp. 1253–1258. doi:10.1109/ICMLC.2006.258648.
[46] E. Keogh, C.A. Ratanamahatana, Exact indexing of dynamic time warping, Knowledge and information systems 7 (2005) 358–386. doi:10.1007/s10115-004-0154-9.
[47] B. Cai, G. Huang, N. Samadiani, G. Li, C.H. Chi, Efficient Time Series Clustering by Minimizing Dynamic Time Warping Utilization, IEEE access 9 (2021) 46589–46599. doi:10.1109/ACCESS.2021.3067833.
[48] W. Wang, G. Lyu, Y. Shi, X. Liang, Time Series Clustering Based on Dynamic Time Warping, in: IEEE 9th International Conference on Software Engineering and Service Science, Beijing, China, 2018. doi:10.1109/ICSESS.2018.8663857.
[49] V.T. Huy, D.T. Anh, An efficient implementation of anytime K-medoids clustering for time series under dynamic time warping, in: 7th symposium on information and communication technology, 2016: pp. 22–29. doi:10.1145/3011077.3011128.
[50] X. Huang, Y. Ye, L. Xiong, R.Y.K. Lau, N. Jiang, S. Wang, Time series k-means: A new k-means type smooth subspace clustering for time series data, Information sciences 367 (2016) 1–13. doi:10.1016/j.ins.2016.05.040.
[51] Y. Chen, X. Liu, X. Li, X. Liu, Y. Yao, G. Hu, X. Xu, F. Pei, Delineating urban functional areas with building-level social media data: A dynamic time warping (DTW) distance based k-medoids method, Landscape and urban planning 160 (2017) 48–60. doi:10.1016/j.landurbplan.2016.12.001.
[52] H. Abbasimehr, A. Bahrini, An analytical framework based on the recency, frequency, and monetary model and time series clustering techniques for dynamic segmentation, Expert systems with applications 192 (2022). doi:10.1016/j.eswa.2021.116373.
[53] M. Alshammari, M. Takatsuka, Approximate spectral clustering with eigenvector selection and self-tuned k, Pattern recognition letters 122 (2019) 31–37. doi:10.1016/j.patrec.2019.02.006.
[54] P.K. Srijith, M. Hepple, K. Bontcheva, D. Preotiuc-Pietro, Sub-story detection in Twitter with hierarchical Dirichlet processes, Information processing and management 53 (2017) 989–1003. doi:10.1016/j.ipm.2016.10.004.
[55] T. Semertzidis, D. Rafailidis, M.G. Strintzis, P. Daras, Large-scale spectral clustering based on pairwise constraints, Information processing and management 51 (2015) 616–624. doi:10.1016/j.ipm.2015.05.007.
[56] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: Analysis and an algorithm, in: 15th Annual Conference on Neural Information Processing Systems, Vancouver, Canada, 2002: pp. 849–856.
[57] K. Xia, X. Gu, Y. Zhang, Oriented grouping-constrained spectral clustering for medical imaging segmentation, Multimedia systems 26 (2020) 27–36. doi:10.1007/s00530-019-00626-8.
[58] D. Xu, C. Li, T. Chen, F. Lang, A novel low rank spectral clustering method for face identification, Recent patents on engineering 13 (2019) 387–394. doi:10.2174/1872212112666180828124211.
[59] H. Talebi, L.J.M. Peeters, U. Mueller, R. Tolosana-Delgado, K.G. van den Boogaart, Towards geostatistical learning for the geosciences: A case study in improving the spatial awareness of spectral clustering, Mathematical geosciences 52 (2020) 1035–1048. doi:10.1007/s11004-020-09867-0.
[60] A.G. Chifu, F. Hristea, J. Mothe, M. Popescu, Word sense discrimination in information retrieval: A spectral clustering-based approach, Information processing and management 51 (2015) 16–31. doi:10.1016/j.ipm.2014.10.007.
[61] A.K. Singh, N.K. Nagwani, S. Pandey, A user ranking algorithm for efficient information management of community sites using spectral clustering and folksonomy, Journal of information science 45 (2019) 592–606. doi:10.1177/0165551518808198.
[62] G. Colavizza, M. Franceschet, Clustering citation histories in the physical review, Journal of informetrics 10 (2016) 1037–1051. doi:10.1016/j.joi.2016.07.009.
[63] C. Chen, F. Ibekwe-SanJuan, J. Hou, The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis, Journal of the American society for information science and technology 61 (2010) 1386–1409. doi:10.1002/asi.21309.
[64] L. Feng, J. Zhou, S.L. Liu, N. Cai, J. Yang, Analysis of journal evaluation indicators: an experimental study based on unsupervised Laplacian score, Scientometrics 124 (2020) 233–254. doi:10.1007/s11192-020-03422-8.
[65] Q. Wang, Z. Qin, F. Nie, X. Li, Spectral embedded adaptive neighbors clustering, IEEE transactions on neural networks and learning systems 30 (2019) 1265–1271. doi:10.1109/TNNLS.2018.2861209.
[66] N. Sapkota, A. Alsadoon, P.W.C. Prasad, A. Elchouemi, A.K. Singh, Data summarization using clustering and classification: Spectral clustering combined with k-means using NFPH, in: the international conference on machine learning, big data, cloud and parallel computing: trends, perspectives and prospects, Faridabad, India, 2019: pp. 146–151. doi:10.1109/COMITCon.2019.8862218.
[67] Z. Huo, G. Mei, G. Casolla, F. Giampaolo, Designing an efficient parallel spectral clustering algorithm on multi-core processors in Julia, Journal of parallel and distributed computing 138 (2020) 211–221. doi:10.1016/j.jpdc.2020.01.003.
[68] Z. Xing, G. Li, Intelligent classification method of remote sensing image based on big data in Spark environment, International journal of wireless information networks 26 (2019) 183–192. doi:10.1007/s10776-019-00440-z.
[69] L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, in: 17th international conference on neural information processing systems, 2004: pp. 1601–1608.
[70] H. Qiu, E.R. Hancock, Graph matching and clustering using spectral partitions, Pattern recognition 39 (2006) 22–34. doi:10.1016/j.patcog.2005.06.014.
[71] D.T. Pham, A.B. Chan, Control chart pattern recognition using a new type of self-organizing neural network, Proceedings of the Institution of Mechanical Engineers. Part I: Journal of systems and control engineering 212 (1998) 115–127. doi:10.1243/0959651981539343.
[72] S. Hettich, S.D. Bay, The UCI KDD Archive, Irvine, CA: University of California, department of information and computer Science (1999).
[73] F. Lin, W.W. Cohen, Power iteration clustering, in: 27th international conference on machine learning (ICML-10), Haifa, Israel, 2010: pp. 655–662.
[74] B.J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (2007) 972–976. doi:10.1126/science.1136800.
[75] J. Wang, Q. Cheng, W. Lu, Y. Dou, P. Li, A term function-aware keyword citation network method for science mapping analysis, Information processing & management 60 (2023). doi:10.1016/j.ipm.2023.103405.
[76] V. Coates, M. Farooque, R. Klavans, K. Lapid, H.A. Linstone, C. Pistorius, A.L. Porter, On the future of technological forecasting, Technological forecasting and social change 67 (2001) 1–17. doi:10.1016/S0040-1625(00)00122-0.
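
Supplementary illustration. The Conclusion describes TTCM as combining dynamic time warping with spectral clustering to group time series that share an evolution trend. The following is a minimal sketch of that general idea, not the authors' implementation. Several choices are assumptions made only for illustration: the z-normalization step, the Gaussian kernel with a mean-distance bandwidth, the restriction to two clusters (a sign split on the second eigenvector, found by power iteration), all function names, and the toy yearly counts, which are invented and are not data from the study.

```python
import math
import random


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def z_normalize(series):
    """Rescale to zero mean and unit variance so shape, not magnitude, drives the distance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mean) / std for x in series]


def spectral_bipartition(series_list, sigma=None, iterations=500, seed=0):
    """Split series into two trend groups via the second eigenvector of the
    normalized DTW affinity matrix (a two-cluster simplification)."""
    data = [z_normalize(s) for s in series_list]
    n = len(data)
    dist = [[dtw_distance(data[i], data[j]) for j in range(n)] for i in range(n)]
    if sigma is None:
        # Assumed heuristic: kernel width equals the mean off-diagonal DTW distance.
        off = [dist[i][j] for i in range(n) for j in range(n) if i != j]
        sigma = (sum(off) / len(off)) or 1.0
    w = [[math.exp(-dist[i][j] ** 2 / (2 * sigma ** 2)) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Shifted normalized affinity M = I + D^{-1/2} W D^{-1/2}; its eigenvalues lie in [0, 2].
    m = [[(1.0 if i == j else 0.0) + w[i][j] / math.sqrt(deg[i] * deg[j])
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(deg); deflate it away.
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        proj = sum(xi * vi for xi, vi in zip(x, v1))
        x = [xi - proj * vi for xi, vi in zip(x, v1)]
        x = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(xi * xi for xi in x)) or 1.0
        x = [xi / length for xi in x]
    # The sign pattern of the second eigenvector gives the two groups.
    return [0 if xi >= 0 else 1 for xi in x]


# Invented ten-year keyword counts: three rising series, then three falling ones.
toy = [
    [1, 2, 3, 5, 6, 8, 9, 11, 12, 14],
    [2, 2, 4, 5, 7, 8, 10, 11, 13, 15],
    [0, 1, 3, 4, 6, 7, 9, 10, 12, 13],
    [14, 12, 11, 9, 8, 6, 5, 3, 2, 1],
    [15, 13, 11, 10, 8, 7, 5, 4, 2, 2],
    [13, 12, 10, 9, 7, 6, 4, 3, 1, 0],
]
labels = spectral_bipartition(toy)
print(labels)
```

Only the grouping is meaningful here; which group receives label 0 depends on the random start vector. Separating the four trend types reported in the paper would require more than one eigenvector followed by a clustering step such as k-means on the spectral embedding.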