Technology Convergence Prediction From a Timeliness Perspective: An Improved Contribution Index in a Dynamic Network⋆

Jinzhu Zhang¹, Bing Yan¹

¹ Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing, China

Abstract
Technology convergence prediction can identify potential trends and directions in technological development and provide valuable guidance for innovation strategies, research investment, and industrial development. Current methods often construct a technological co-occurrence network to explore the potential associations between technologies for technology convergence prediction. However, their calculation of node importance is usually based on frequency counts, which fails to distinguish the technological features in each co-occurrence and assumes equal importance for each technology in every convergence. In addition, the current approach to assessing technological timeliness is too coarse to capture technological change accurately. The perspective needs to shift from the life cycle to more specific points in time. Therefore, this paper introduces a contribution index designed to measure changes in the importance of technology convergence from a timeliness perspective. Firstly, we extract and filter valid technical topics to represent technology categories. Secondly, we use dynamic time weights to calculate the semantic similarity between technical topics and patent texts, which indicates the contribution of the technology in each convergence. Thirdly, we label the contributions in the technology co-occurrence network to build a dynamic technology network that records changes in technology importance. Finally, we utilize a graph neural network to generate node embeddings for link prediction.
In experiments within the field of new energy vehicles, the dynamic network prediction model based on contribution features improved the AUC by 8.92, 3.52, and 1.11 percentage points for the GCN, GAT, and GNN models, respectively, over baseline networks built on frequency or centrality features. This indicates that the proposed technological contribution index can improve the accuracy of technology convergence prediction.

Keywords
technology convergence, timeliness, semantic similarity, graph neural network

Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
EMAIL: zhangjinzhu@njust.edu.cn (Jinzhu Zhang); Yanbing7051@163.com (Bing Yan)
© Copyright 2024 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
The convergence of technologies from different disciplines can address increasingly complex technical problems and social needs. At the same time, it is a key factor in ensuring technological timeliness and increasing competitiveness in research and investment. Therefore, efficiently and accurately predicting the potential direction of technology convergence from the mass of existing technologies has become a significant task.
Research often explores the co-occurrence of technologies to analyze the current state of technological convergence, and technology networks are constructed to explore the potential correlations between technologies. Current methods for analyzing technological convergence using patent data include approaches based on patent co-classification [1, 2], patent cross-referencing [3], and text mining methods [4]. In these methods, technology categories are typically identified by patent classification numbers [5, 6] and technical topics [7, 8].
In addition, scholars have expanded research on technology convergence to a broader perspective, incorporating market characteristics [9, 10], social impacts [11, 12], and time characteristics [13, 14] to further improve the prediction index system of technology convergence.
However, these methods do not consider differences in the importance of technologies in each convergence or changes in timeliness. They assume that the importance of each technology is the same in every case of technological convergence. In addition, technology timeliness is often distinguished by technology lifecycle segmentation [15] and linear weight assignment [16], which are too coarse to capture small differences between technologies. In this paper, a technological combination co-occurring within the same patent is considered a convergence event. As shown in Figure 1(a), if technologies T1, T2, and T3 co-occur with the same frequency, their importance is considered equal, and no distinction is made among the timeliness of their co-occurrences at different points in time. In practice, different technologies contribute differently to the overall technological combination and have different timeliness in each co-occurrence. As a result, their impact within the technological network differs in scope and extent. For example, as shown in Figure 1(b), although technology T1 is present in each convergence, its contribution declines over time, indicating declining importance and possibly gradual obsolescence. Technology T2, by contrast, maintains a stable contribution, suggesting that it may be a foundational technology or in a phase of steady development. Meanwhile, technology T3 shows a higher contribution, indicating a greater impact or greater timeliness within the technology combination, which makes it more likely to combine with other technologies.

Figure 1: Changes in the contribution of technology within patents (panels (a) and (b))

Therefore, this paper proposes an index designed from a timeliness perspective to measure changes in the importance of technology. We obtain the contribution of a technology in each co-occurrence by calculating the semantic similarity between the technical topic and the patent, and then combine these contributions with dynamic time weights to obtain the final value. To capture changes in the importance of technology in each convergence, we refine the timeliness of technology from the lifecycle and dates to more precise convergence time points, constructing a dynamic technological co-occurrence network. Finally, we use link prediction to predict technology convergence, aiming to better evaluate the timeliness of technology and its impact on convergence.

2. Data and Method
The method for predicting technology convergence from a timeliness perspective includes three parts, as shown in Figure 2. Firstly, this paper extracts and filters the valid technical topics that characterize the technical categories in patent texts. Then, cosine similarity between the patent texts and the technical topics is used to measure the contribution of different technical topics in each co-occurrence. To obtain the total contribution score for each technical topic, this study introduces dynamic time weights and follows the principle of time decay to sum the contributions from each co-occurrence. We then extract co-occurrence relationships to construct a technological network and label the contribution of each technical topic on the matching nodes to build a dynamic technical topic co-occurrence network. Finally, graph neural networks are used to learn the node representations of technical topics, and quantitative evaluation is performed by link prediction.

Figure 2: Framework of the method

2.1. Data collection
In this paper, the full-text data of patent applications were batch downloaded from the USPTO (United States Patent and Trademark Office) patent search platform in December 2023, parsed, and stored in a PostgreSQL database. We use SQL queries to search for relevant patents in the field of new energy vehicles, as shown in Figure 3. A total of 23,792 relevant patents were retrieved, and the titles, abstracts, and application times of the patents were extracted as the data source for the study. A total of 16,975 patents from 2012-2021 were used as training data, and 6,817 patents from 2022-2023 were used as test data. The training set contains 192,602 co-occurrence relationships. Relationships in the test data that did not appear in the training set were selected to form the actual test set. An equal number of negative samples were generated, resulting in a final test set of 51,562 relationships.

Figure 3: SQL statement for querying patents related to new energy vehicles

2.2. Construction of Contribution Index and Dynamic Network
Firstly, this paper extracts technical topics from patent text to represent specific technical categories. Secondly, we sum the semantic similarity between technical topics and patent texts using dynamic time weights to represent the contribution of the technology. Then, we construct a dynamic network of technical topics by integrating the technological contribution index. This helps the network reflect changes in the contribution of technology over time and provides more technical clues.

2.2.1. Extraction of technical topics
Technical topics offer a more flexible and comprehensive expression of technical content, making them more explainable. Therefore, we use technical topics to represent different technical categories.
This paper determines the optimal number of topics based on the topic coherence score. Each technical topic is represented by 20 keywords to reduce overlap between topics. As shown in Figure 4, u_mass and c_v gradually converged when the number of topics was around 500. After comparing extreme values, 507 was identified as the optimal number of topics for this paper. In addition, TF-IDF weighting is applied to improve the LDA model's process of generating feature words for technical topic extraction, with the aim of improving the representativeness of the topic words.

Figure 4: U_mass and C_v variation curves

2.2.2. Calculation of technical contribution index
Because both the technical topics and the patents in this paper are textual, a higher similarity between a technical topic and a patent text implies a higher weight of that technology in the patent. Therefore, we use the semantic similarity between technical topics and patent texts to represent the contribution of a technology in each co-occurrence. The study uses Doc2vec [17] to obtain semantic representations of technical topics and patent texts, respectively. It then applies cosine similarity to calculate the semantic similarity between them, obtaining the contribution values of different technologies in each co-occurrence, as shown in Formula (1). Next, it is important to consider the timeliness of technology: the further a convergence lies from the current moment, the lower its timeliness tends to be. To address this, dynamic time weights are introduced, based on the retention function of memory capacity [18]. The contributions of a technical topic in each convergence are summed with these weights, resulting in the final contribution index score for the given technical topic, as shown in Formulas (2) and (3).

\[ con_{T_i}^{n} = \cos(\theta) = \frac{\sum_{i=1}^{m} (P_i \times T_i)}{\sqrt{\sum_{i=1}^{m} (P_i)^2} \times \sqrt{\sum_{i=1}^{m} (T_i)^2}}, \tag{1} \]

Formula (1) defines con_{T_i}^{n} as the contribution of technical topic T_i in the nth co-occurrence. The m-dimensional semantic representation vectors of patent i and technical topic i are denoted as P_i and T_i, respectively, and the sums run over their m components.

\[ con_{T_i} = \sum_{1}^{n} con_{T_i}^{n} \times Timeweight, \tag{2} \]

\[ Timeweight = \frac{e^{0.42}}{(t_0 + t_i)^{0.0225}}, \tag{3} \]

In Formulas (2) and (3), con_{T_i} represents the weighted sum of the contributions of technical topic T_i in all co-occurrences, Timeweight is the dynamic time weight, t_0 is the current year, and t_i is the year in which the nth convergence occurs.

2.2.3. Construction of the dynamic technical topic co-occurrence network
Constructing the dynamic technical topic co-occurrence network primarily involves two steps. The first step is to identify the technical topics present in each patent; we set the topic probability threshold to 0.2 [15]. Technical topics exceeding this threshold are considered to be present in the patent, which yields a technology co-occurrence matrix. We then extract co-occurrence relationships using the networkx package, forming node pairs that represent technical topics. In the second step, we mark the technical topic contributions obtained in Section 2.2.2 on the matching nodes, establishing a dynamic co-occurrence network of technical topics.

2.3. Prediction of technology convergence based on graph neural networks
We first employ a graph neural network model to aggregate the structural and node attribute information of the technological co-occurrence network. This helps to address the issue of sparse feature dimensions in technology convergence prediction, resulting in a more accurate representation of node features. Secondly, we transform the task of predicting technology convergence into a link prediction problem. Probability scores are then calculated for the technology combinations formed between technical topic nodes, and the model's performance is evaluated using the AUC metric.

2.3.1. Node embedding based on graph neural network model
Graph neural network models can automatically capture high-level abstract representations of networks by aggregating low-level information, avoiding the need for complex feature engineering. These models combine both topological structure and attribute information for learning, effectively aggregating attribute features and topological structure information from neighboring nodes to obtain a more accurate feature representation for the target node.
This paper uses technical topics in patents as nodes, with co-occurrence relationships between topics serving as edges in the graph. The technical contribution features are combined and used as node attribute information. Specifically, we first use the co-occurrence relationships in the training set as the graph structure. The contribution index of the corresponding nodes is input as node attribute information into the graph neural network for training, thereby obtaining the embedding vectors of known technical topics. Secondly, using a link prediction model, we calculate the probability of fusion between technical nodes, obtaining fusion scores between nodes. The link prediction model is introduced in Section 2.3.2. Because different graph neural network models have their own characteristics, this paper compares several models and chooses the one most suitable for technology convergence prediction.

2.3.2. Link prediction model based on probability ranking
The prediction of technology convergence can be simplified as predicting the emergence of a new edge. In this context, technologies can be seen as nodes, and the relationships between them as convergence links. Thus, this paper transforms the task of predicting technology convergence opportunities into a link prediction problem.
The link prediction method proposed in this paper relies on a co-occurrence graph of technical topics, where the relationships between technical topics serve as edges. The technology contribution is used as a node feature and trained on the co-occurrence relationships of technical topics. Once the representations of the technical topic nodes are obtained, the probability score for the technology combination formed by two nodes is calculated. This probability score can be regarded as the link prediction score. The higher the score, the greater the possibility of a future link between the two nodes, indicating a higher probability of convergence between the two technical topics. Finally, we choose AUC as the evaluation metric to assess the performance of the prediction model based on graph neural networks.

3. Results
This paper generated three co-occurrence networks with different features to compare and validate the effectiveness of the proposed method. The first network, T-Co1, only considers the frequency of co-occurrence of technical topics. The second network, T-Co2, includes centrality indices as features for technical topic nodes. The third network, T-Co3, integrates technical contribution as features for technical topic nodes. The centrality measure chosen here is degree centrality, which reflects the number of connections a node has. A higher degree centrality indicates a more central node, signifying greater importance.
For the three types of co-occurrence networks, we use three graph neural network models, namely GCN, GNN, and GAT, to learn node representations, using link prediction for quantitative evaluation. The main difference between GCN and a traditional GNN lies in the use of convolutional operators for information aggregation, while GAT uses self-attention mechanisms for node weight allocation. The results for different feature networks and methods are shown in Table 1.

Table 1
AUC of Different Methods
Model   T-Co1    T-Co2    T-Co3
GCN     0.7120   0.7106   0.7998
GNN     0.7106   0.7125   0.7236
GAT     0.6379   0.6411   0.6731

The results show that T-Co3 outperforms T-Co1 and T-Co2 across the different models, with GCN performing best on T-Co3. In the GCN model, the AUC of T-Co3 is 8.78 percentage points higher than that of T-Co1 and 8.92 percentage points higher than that of T-Co2. In the GNN and GAT models, the AUC of T-Co3 is also 1.30 and 3.52 percentage points higher than that of T-Co1, respectively. Compared to the other indicators of importance, the contribution index reflecting technological timeliness provides better, more comprehensive, and more accurate clues for predicting technological convergence. In this experiment, the GCN model performed best and discriminated more clearly between the different features, so it is more suitable for the technology convergence prediction task in this paper.

4. Conclusion
This paper refines the assessment of technological importance from a timeliness perspective, shifting from traditional distinctions based on lifecycle and dates to a more precise measurement within each convergence event. We replace frequency indicators in the co-occurrence network with the technological contribution index for building dynamic technology networks. The results show that this approach outperforms frequency-based models. As a next step, we aim to improve the technological timeliness index by incorporating additional temporal cues. In addition, the exploration of more efficient embedding models is expected to improve predictive performance.

Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 72374103, 71974095) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX23_0161).

References
[1] C. S. Curran, J. Leker, Patent Indicators for Monitoring Convergence – Examples from NFF and ICT, Technological Forecasting & Social Change 78 (2011) 256-73. doi:10.1016/j.techfore.2010.06.021
[2] M. Karvonen, T. Kassi, Patent Citation Analysis as a Tool for Analysing Industry Convergence, 2011 Proceedings of PICMET '11: Technology Management in the Energy Smart World (PICMET), IEEE, Portland, USA, 2011.
[3] K. Sung, H. K. Kong, T. Kim, Convergence Indicator: The Case of Cloud Computing, Journal of Supercomputing 65 (2013) 27-37. doi:10.1007/s11227-011-0706-1
[4] I. Park, B. Yoon, Technological Opportunity Discovery for Technological Convergence Based on the Prediction of Technology Knowledge Flow in a Citation Network, Journal of Informetrics 12 (2018) 1199-222. doi:10.1016/j.joi.2018.09.007
[5] Li C, Zhou J, Yang Z, Research on the Identification of Technology Fusion Growth Points from the Perspective of Dynamic Evolution Process, Library And Information Service 66 (2022) 99-109. doi:10.13266/j.issn.0252-3116.2022.07.010
[6] Zhang J, Li Y, Technology Convergence Prediction by the Semantic Representation of Patent Classification Sequence and Text, Journal of the China Society for Scientific and Technical Information 41 (2022) 609-24. doi:10.3772/j.issn.1000-0135.2022.06.006
[7] Zhang Y, Zhang G, Chen H, Porter AL, Zhu D, Lu J, Topic analysis and forecasting for science, technology and innovation: methodology with a case study focusing on big data research, Technological Forecasting And Social Change 105 (2016) 179-91. doi:10.1016/j.techfore.2016.01.015
[8] Wang L, Liu X, Measuring Diffusion of Technology Topics with Patent Data, Data Analysis and Knowledge Discovery 6 (2022) 1-10. doi:10.11925/infotech.2096-3467.2021.0915
[9] Daim T, Preschitschek N, Niemann H, Leker J, Moehrle MG, Anticipating industry convergence: semantic analyses vs IPC co-classification analyses of patents, Foresight 15 (2013) 446-64.
[10] Liu W, Yang Z, Cao Y, Huo J, Discovering the influences of the patent innovations on the stock market, Information Processing and Management 59 (2022). doi:10.1016/j.ipm.2022.102908
[11] Zhang Y, Wu M, Miao W, Huang L, Lu J, Bi-layer network analytics: A methodology for characterizing emerging general-purpose technologies, SSRN Electronic Journal 15 (2021). doi:10.1016/j.joi.2021.101202
[12] Caferoglu H, Elsner D, Moehrle MG, The Interplay Between Technology and Pre-Industry Convergence: An Analysis in the Technology Field of Smart Mobility, IEEE Transactions on Engineering Management 70 (2021) 1504-1517. doi:10.1109/TEM.2021.3092211
[13] Fu Q, Sun Y, Evaluation of Technological Influence: Based on Patent Timeliness and PageRank Algorithm, Journal of Systems & Management 27 (2018) 352-8.
[14] Wenjing Zhu, Bohong Ma, Lele Kang, Technology convergence among various technical fields: improvement of entropy estimation in patent analysis, Scientometrics 127 (2022) 7731-50.
[15] J. Ma, C. Wang, L. Yan, S. Yao, Analysis of Patent Technology Topic Evolution Based on Product Life Cycle, Journal of the China Society for Scientific and Technical Information 41 (2022) 684-691. doi:10.3772/j.issn.1000-0135.2022.07.003
[16] Zhang X, Liu H, Shi J, Mao C, Meng G, LSTM and artificial neural network for urban bus travel time prediction based on spatiotemporal eigenvectors, Journal of Computer Applications 41 (2021) 875-880. doi:10.11772/j.issn.1001-9081.2020060467
[17] Le QV, Mikolov T, Distributed Representations of Sentences and Documents, JMLR.org, 2014. doi:10.48550/arXiv.1405.4053
[18] Jiang Z, On the forgetting function - a mathematical discussion on the psychology of memory, Advances in Psychological Science 56 (1988) 56-60.
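
As a rough illustration of the contribution index in Section 2.2.2, the sketch below chains Formulas (1) to (3) in plain Python. It is not the authors' implementation: the function names, the toy vectors, and the years are hypothetical, and the vectors stand in for Doc2vec embeddings. Formula (3) is implemented exactly as printed; the text does not say how t_0 and t_i should be scaled, so passing calendar years is an assumption.

```python
import math


def cosine_similarity(p, t):
    """Formula (1): cosine similarity between a patent vector and a topic vector."""
    if len(p) != len(t):
        raise ValueError("vectors must have the same dimension m")
    dot = sum(a * b for a, b in zip(p, t))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_t = math.sqrt(sum(b * b for b in t))
    if norm_p == 0.0 or norm_t == 0.0:
        return 0.0  # assumption: a zero vector contributes nothing
    return dot / (norm_p * norm_t)


def time_weight(t0, ti, a=0.42, b=0.0225):
    """Formula (3) as printed: e^a / (t0 + ti)^b."""
    return math.exp(a) / (t0 + ti) ** b


def contribution_index(topic_vec, events, t0):
    """Formula (2): sum of per-event contributions times their time weights.

    events is an iterable of (patent_vec, ti) pairs, one per co-occurrence
    of the topic; ti is the year of that co-occurrence.
    """
    return sum(
        cosine_similarity(patent_vec, topic_vec) * time_weight(t0, ti)
        for patent_vec, ti in events
    )


# Hypothetical 3-dimensional embeddings standing in for Doc2vec vectors.
topic_vec = [0.2, 0.7, 0.1]
events = [
    ([0.1, 0.8, 0.1], 2015),
    ([0.3, 0.6, 0.1], 2019),
    ([0.2, 0.7, 0.1], 2022),
]
score = contribution_index(topic_vec, events, t0=2023)
print(f"contribution index: {score:.4f}")
```

Keeping the constants 0.42 and 0.0225 as keyword arguments makes it easy to try other decay settings without touching the summation.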
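
The first step of Section 2.2.3 (thresholding topic probabilities at 0.2 and extracting co-occurring topic pairs) can be sketched as follows. The paper uses the networkx package for this step; the sketch below uses only the standard library, and the topic names and probabilities are invented for illustration.

```python
from collections import Counter
from itertools import combinations

THRESHOLD = 0.2  # topic probability threshold stated in Section 2.2.3


def topics_in_patent(topic_probs, threshold=THRESHOLD):
    """Return the sorted topics whose probability exceeds the threshold."""
    return sorted(t for t, p in topic_probs.items() if p > threshold)


def cooccurrence_edges(patents):
    """Count unordered topic pairs that co-occur within the same patent."""
    edges = Counter()
    for topic_probs in patents:
        present = topics_in_patent(topic_probs)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Hypothetical per-patent topic distributions (e.g. from an LDA model).
patents = [
    {"battery": 0.5, "motor": 0.3, "charging": 0.1},
    {"battery": 0.4, "charging": 0.45, "motor": 0.05},
    {"battery": 0.25, "motor": 0.6},
]
edges = cooccurrence_edges(patents)
for (u, v), count in sorted(edges.items()):
    print(u, v, count)
```

Sorting the topics before pairing keeps each undirected edge under a single key, so ("battery", "motor") and ("motor", "battery") are never counted separately.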
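
To make the evaluation in Section 2.3.2 concrete, the sketch below scores candidate links from node embeddings and computes AUC over positive and negative pairs. The paper does not state its scoring function, so the sigmoid of the dot product used here is an assumption (a common choice for link prediction), and the embeddings and node names are hypothetical.

```python
import math


def link_score(z_u, z_v):
    """Assumed scorer: sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical 2-dimensional embeddings for four technical topic nodes.
embeddings = {
    "A": [1.0, 0.5],
    "B": [0.9, 0.4],
    "C": [-1.0, 0.2],
    "D": [0.1, -0.8],
}
positives = [("A", "B")]              # links that appear in the test period
negatives = [("A", "C"), ("B", "D")]  # sampled non-links
pos = [link_score(embeddings[u], embeddings[v]) for u, v in positives]
neg = [link_score(embeddings[u], embeddings[v]) for u, v in negatives]
result = auc(pos, neg)
print(f"AUC: {result:.2f}")
```

The pairwise definition is quadratic in the number of samples, which is fine for a toy example; for a test set of the size reported in Section 2.1, a rank-based implementation would be the more practical choice.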
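
The AUC differences discussed in Section 3 can be checked directly from Table 1. The short sketch below recomputes the gain of T-Co3 over each baseline in percentage points; the dictionary layout and the helper name are illustrative only.

```python
# AUC values copied from Table 1.
auc_table = {
    "GCN": {"T-Co1": 0.7120, "T-Co2": 0.7106, "T-Co3": 0.7998},
    "GNN": {"T-Co1": 0.7106, "T-Co2": 0.7125, "T-Co3": 0.7236},
    "GAT": {"T-Co1": 0.6379, "T-Co2": 0.6411, "T-Co3": 0.6731},
}


def gain_points(model, baseline):
    """AUC gain of T-Co3 over a baseline network, in percentage points."""
    row = auc_table[model]
    return round((row["T-Co3"] - row[baseline]) * 100, 2)


for model in auc_table:
    for baseline in ("T-Co1", "T-Co2"):
        print(model, baseline, gain_points(model, baseline))
```

The gains are absolute differences in AUC, which is why they are best read as percentage points and not as relative improvements.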