Technology Convergence Prediction From a Timeliness Perspective: An Improved Contribution Index in a Dynamic Network⋆
Jinzhu Zhang1, Bing Yan1

1 Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology, Nanjing, China


Abstract
Technology convergence prediction can identify potential trends and directions in technological development and provide valuable guidance for innovation strategies, research investment, and industrial development. Current methods often construct a technological co-occurrence network to explore the potential associations between technologies for technology convergence prediction. However, their calculation of node importance is often based on frequency statistics, failing to break down and distinguish the technological features in each co-occurrence and assuming equal importance for each technology in every convergence. In addition, the current approach for assessing technological timeliness is too broad, making it difficult to accurately capture technological change. The perspective needs to shift from the life cycle to more specific points in time. Therefore, this paper introduces a contribution index designed to measure changes in the importance of technology convergence from a timeliness perspective. Firstly, we extract and filter valid technical topics to represent technology categories. Secondly, we use dynamic time weights to calculate the semantic similarity between technical topics and patent texts, to indicate the contribution of the technology in each convergence. Thirdly, this paper labels the contributions in the technology co-occurrence network to build a dynamic technology network that records changes in technology importance. Finally, we utilize a graph neural network to generate node embeddings for link prediction. In experiments within the field of new energy vehicles, the dynamic network prediction model based on contribution features improved the AUC by 8.92, 3.52, and 1.11 percentage points, compared to the frequency feature network. This indicates that the proposed technological contribution index can effectively enhance the accuracy and effectiveness of technology convergence prediction.

Keywords
technology convergence, timeliness, semantic similarity, graph neural network


Joint Workshop of the 5th Extraction and Evaluation of Knowledge Entities from Scientific Documents and the 4th AI + Informetrics (EEKE-AII2024), April 23~24, 2024, Changchun, China and Online
EMAIL: zhangjinzhu@njust.edu.cn (Jinzhu Zhang); Yanbing7051@163.com (Bing Yan)
© Copyright 2024 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

The convergence of technologies from different disciplines can solve increasingly complex technical problems and meet social needs. At the same time, it is a key factor in ensuring technological timeliness and increasing competitiveness in research and investment. Therefore, efficiently and accurately predicting the potential direction of technology convergence from the mass of existing technologies has become a significant task.

Research often explores the co-occurrence of technologies to analyze the current state of technological convergence, and technology networks are constructed to explore the potential correlation between technologies. Current methods for technological convergence using patent data include approaches based on patent co-classification [1, 2], patent cross-referencing [3], and text mining methods [4]. In these methods, technology categories are typically identified by patent classification numbers [5, 6] and technical topics [7, 8]. In addition, scholars have expanded the research on technology convergence to a broader perspective, such as the construction of market characteristics [9, 10], social impacts [11, 12] and time characteristics [13, 14], to further improve the prediction index system of technology convergence.

However, these methods do not consider differences in the importance of technologies in each convergence or changes in timeliness. They assume that the importance of each technology is the same in each case of technological convergence. In addition, technology timeliness is often distinguished by technology lifecycle segmentation [15] and linear weight assignment [16], which are too broad to capture small differences between technologies. In this paper, a technological combination co-occurring within the same patent is considered a convergence event. As shown in Figure 1(a), if technologies 𝑇1, 𝑇2 and 𝑇3 co-occur with the same frequency, their importance is considered equal, and no distinction is made among the timeliness of their co-occurrence at different points in time. In fact, different technologies contribute differently to the overall technological combination and have different timeliness with each co-occurrence. As a result, their impact within the technological network differs in scope and extent. For example, as shown in Figure 1(b), although technology 𝑇1 is present in each convergence, its contribution declines over time, indicating declining importance and possibly gradual obsolescence. On the other hand, technology 𝑇2 maintains a stable contribution, suggesting that it may be a foundational technology or in a phase of steady development. Meanwhile, technology 𝑇3 shows a higher contribution, indicating a greater impact or greater timeliness within the technology combination, making it more likely to combine with other technologies.

Figure 1: Changes in the contribution of technology within patents (panels (a) and (b))

Therefore, this paper proposes an index designed from a timeliness perspective to measure changes in the importance of technology. We obtain the contribution of the technology in each co-occurrence by calculating the semantic similarity between the technical topic and the patent, and then combining the dynamic time weights to obtain the final value. To capture changes in the importance of technology in each convergence, we improve the timeliness of technology by refining it from the lifecycle and dates to more precise convergence time points, constructing a dynamic technological co-occurrence network. Finally, we use link prediction to explore the prediction of technology convergence, aiming to better evaluate the timeliness of technology and its impact on convergence.

2. Data and Method

The method for predicting technology convergence from a timeliness perspective includes three parts, as shown in Figure 2. Firstly, this paper extracts and filters the valid technical topics characterizing the technical categories in patent texts. Then, cosine similarity is used for semantic similarity computation on the patent texts and technical topics to measure the contribution of different technical topics in each co-occurrence. To obtain the total contribution score for technical topics, this study introduces dynamic time weights and follows the principle of time decay to sum the contributions from each co-occurrence. Next, we extract co-occurrence relationships to construct a technological network. We then label the contribution of each technical topic
                                                                    on the matching nodes to build a dynamic technical topic
                                                                    co-occurrence network. Finally, graph neural networks
                                                                    are used to learn the node representations of technical
                                                                    topics, and quantitative evaluation is performed by link
                                                                    prediction.
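
The contribution step outlined above can be illustrated with a minimal sketch. It assumes that each technical topic and each patent is already represented as a plain numeric vector, and it takes the time weights as given inputs rather than deriving them. The function names, the toy vectors, and the weight values are hypothetical and are not taken from the authors' implementation.

```python
import math


def cosine_similarity(p, t):
    """Cosine of the angle between a patent vector p and a topic vector t."""
    dot = sum(a * b for a, b in zip(p, t))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm else 0.0


def contribution_index(topic_vec, patent_vecs, time_weights):
    """Sum the topic's similarity to each patent, scaled by that patent's time weight."""
    return sum(
        w * cosine_similarity(p, topic_vec)
        for p, w in zip(patent_vecs, time_weights)
    )


# Hypothetical toy data: one topic and the two patents in which it co-occurs.
topic = [1.0, 0.0]
patents = [[1.0, 0.0], [0.0, 1.0]]  # first patent matches the topic, second is orthogonal
weights = [0.5, 1.0]                # assumed decay: the older patent gets the smaller weight
score = contribution_index(topic, patents, weights)
```

In this toy case only the first patent contributes, so the score is simply that patent's time weight. Keeping the weights as an explicit argument makes it easy to swap in any decay function.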

2.1. Data collection
                                                                       In this paper, the full text data of patent applications
                                                                    were batch downloaded from the USPTO (United States
                                                                    Patent and Trademark Office) patent search platform in


December 2023, parsed and stored in a PostgreSQL database. We use SQL queries to search for relevant patents in the field of new energy vehicles, as shown in Figure 3. A total of 23,792 relevant patents were retrieved, and the titles, abstracts and application dates of the patents were extracted as the data source for the study. A total of 16,975 patents from 2012-2021 were used as training data and 6,817 patents from 2022-2023 were used as test data. The training set contains 192,602 co-occurring relationships. Relationships that were not present in the training set were filtered out to create the actual test set. An equal number of negative samples were generated, resulting in a final test set of 51,562 relationships.

Figure 2: Framework of the method

Figure 3: SQL statement for querying patents related to new energy vehicles

2.2. Construction of Contribution Index and Dynamic Network

Firstly, this paper extracts technical topics from patent text, representing specific technical categories. Secondly, we sum the semantic similarity between technical topics and patent texts using dynamic time weights, to represent the contribution of the technology. Then, we construct a dynamic network of technical topics by integrating the technological contribution index. This will help the network to reflect changes in the contribution of technology over time and provide more technical clues.

2.2.1. Extraction of technical topics

Technical topics offer a more flexible and comprehensive expression of technical content, making them more explainable. Therefore, we choose to use technical topics to represent different technical categories.

This paper determines the optimal number of topics based on the topic coherence score, and each technical topic has 20 representative keywords to reduce overlap between topics. As shown in Figure 4, u_mass and c_v gradually converged when the number of topics was around 500. After comparing extreme values, 507 was identified as the optimal number of topics for this paper. In addition, TF-IDF weighting is applied to improve the LDA model's process of generating feature words for technical topic extraction, with the aim of improving the representativeness of the topic words.
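
To make the coherence-based choice of the topic number more concrete, the following minimal sketch scores candidate topic models with a UMass-style coherence computed from document co-occurrence counts. It is an illustration under stated assumptions, not the authors' procedure: the smoothing constant `eps`, the rule of picking the candidate with the highest mean coherence, and the toy documents and topic word lists are all hypothetical. In practice the word lists would come from LDA models fitted with different topic numbers.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence: sum of log((D(w_i, w_j) + eps) / D(w_j)) over pairs j < i,
    where D counts the documents containing all the given words."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j:  # skip words that never occur in the corpus
                score += math.log((doc_freq(top_words[i], top_words[j]) + eps) / d_j)
    return score


def best_topic_number(candidates, documents):
    """candidates maps a topic number k to the top-word lists of a model fitted with k topics."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)

    return max(candidates, key=lambda k: mean_coherence(candidates[k]))


# Hypothetical tokenized documents and hypothetical top words for two candidate models.
docs = [
    ["battery", "charging", "cell"],
    ["battery", "charging"],
    ["motor", "torque"],
    ["motor", "torque", "rotor"],
]
candidates = {
    2: [["battery", "charging"], ["motor", "torque"]],
    3: [["battery", "torque"], ["motor", "charging"], ["cell", "rotor"]],
}
best_k = best_topic_number(candidates, docs)
```

Here the two-topic candidate groups words that actually co-occur in the documents, so it receives the higher mean coherence and is selected.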


Figure 4: U_mass and C_v variation curves

2.2.2. Calculation of technical contribution index

As the technical topics and patents in this paper are both textual content, the higher the similarity between a technical topic and a patent text, the higher the weight of that technology in the patent. Therefore, we use the semantic similarity between technical topics and patent texts to represent the contribution of technology in each co-occurrence. The study uses Doc2vec [17] to obtain semantic representations of technical topics and patent texts respectively. Then, it applies cosine similarity to calculate the semantic similarity between them, obtaining the contribution values of different technologies in each co-occurrence, as shown in Formula (1). In addition, it is important to consider the timeliness of technology. The further away from the current moment, the lower the timeliness tends to be. To address this, dynamic time weights are introduced, based on the retention function of memory capacity [18]. This assigns weighted sums to the contribution of technical topics in each convergence, resulting in the final contribution index score for the given technical topic, as shown in Formula (2).

$$con_{T_i}^{n} = \cos(\theta) = \frac{\sum_{i=1}^{m} (P_i \times T_i)}{\sqrt{\sum_{i=1}^{m} (P_i)^2} \times \sqrt{\sum_{i=1}^{m} (T_i)^2}}, \tag{1}$$

Formula (1) defines $con_{T_i}^{n}$ as the contribution of technical topic $T_i$ in the nth co-occurrence. The m-dimensional semantic representation vectors of patent i and technical topic i are denoted as $P_i$ and $T_i$, respectively.

$$con_{T_i} = \sum_{1}^{n} con_{T_i}^{n} \times Timeweight, \tag{2}$$

$$Timeweight = \frac{e^{0.42}}{(t_0 + t_i)^{0.0225}}, \tag{3}$$

In Formula (2), $con_{T_i}$ represents the weighted sum of contributions of technical topic $T_i$ in all co-occurrences, and $Timeweight$ is the dynamic time weight defined in Formula (3), where $t_0$ is the current year and $t_i$ is the year when the nth convergence occurs.

2.2.3. Construction of the dynamic technical topic co-occurrence network

Constructing a dynamic technical topic co-occurrence network primarily involves the following two steps. The first step is to identify the technical topics present in each patent; we set the probability distribution threshold to 0.2 [15]. Technical topics exceeding this threshold are considered to be present in the patent, resulting in the generation of a technology co-occurrence matrix. Then we extract co-occurrence relationships using the networkx package, forming node pairs that represent technical topics. Finally, we mark the obtained technical topic contributions from Section 2.2.2 on the matching nodes, establishing a dynamic co-occurrence network of technical topics.

2.3. Prediction of technology convergence based on graph neural networks

We first employ a graph neural network model to aggregate the structural and nodal attribute information of the technological co-occurrence network. This helps to address the issue of sparse feature dimensions in technology convergence prediction, resulting in a more accurate representation of node features. Secondly, we transform the research on predicting technology convergence into a link prediction problem. Probability scores are then calculated for the technology combinations formed between technical topic nodes, and the model's performance is evaluated using the AUC metric.

2.3.1. Node embedding based on graph neural network model

Graph neural network models can automatically capture high-level abstract representations of networks by aggregating low-level information, avoiding the need for complex feature engineering. These models combine both topological structure and attribute information for learning, effectively aggregating attribute features and topological structure information from neighboring nodes, to obtain a more accurate feature representation for the target node.

This paper uses technical topics in patents as nodes, with co-occurrence relationships between topics serving as edges in the graph. The technical contribution features are combined and used as node attribute information. Specifically, we first use the co-occurrence


relationships in the training set as the graph structure. The contribution index of corresponding nodes is input as node attribute information into the graph neural network for training, thereby obtaining the embedding vectors of known technical topics. Secondly, using a link prediction model, we calculate the probability of fusion between technical nodes, obtaining fusion scores between nodes. The link prediction model is introduced in Section 2.3.2. Additionally, different graph neural network models have their own characteristics. This paper will compare models and choose the one most suitable for prediction of technology convergence.

2.3.2. Link prediction model based on probability ranking

The prediction of technology convergence can be simplified as predicting the emergence of a new link (edge). In this context, technologies can be seen as nodes, and the relationships between them as convergence links. Thus, this paper transforms the task of predicting technology convergence opportunities into a link prediction problem.

The link prediction method proposed in this paper relies on a co-occurrence graph of technical topics, where the relationships between technical topics serve as edges. The technology contribution is trained as node features on the co-occurrence relationships of technical topics. Once the representations of the technical topic nodes are obtained, the probability score for the technology combination formed by two nodes is calculated. This probability score can be regarded as the link prediction score. The higher the score, the greater the possibility of a future link between the two nodes, indicating a higher probability of convergence between these two technical topics. Finally, we choose AUC as the evaluation metric to assess the performance of the prediction model based on graph neural networks.

3. Results

This paper generated three co-occurrence networks with different features, to compare and validate the effectiveness of the proposed method. The first network, T-Co1, only considers the frequency of co-occurrence of technical topics. The second network, T-Co2, includes centrality indices as features for technical topic nodes. The third network, T-Co3, integrates technical contribution as features for technical topic nodes. The centrality measure chosen here is degree centrality, which reflects the number of connections a node has. A higher degree centrality indicates a stronger node centrality, signifying greater importance.

For the three types of co-occurrence networks, we use three graph neural network models, namely GCN, GNN and GAT, to learn node representations, using link prediction for quantitative evaluation. The main difference between GCN and traditional GNN lies in the use of convolutional operators for information aggregation, while GAT uses self-attention mechanisms for node weight allocation. The results for different feature networks and methods are shown in Table 1.

Table 1
AUC of Different Methods
Model    T-Co1     T-Co2     T-Co3
GCN      0.7120    0.7106    0.7998
GNN      0.7106    0.7125    0.7236
GAT      0.6379    0.6411    0.6731

The results show that the performance of T-Co3 is generally superior to T-Co1 and T-Co2 across different model representations, with GCN performing best on T-Co3. In the GCN model, the AUC of T-Co3 is 8.78 percentage points higher than that of T-Co1 and 8.92 percentage points higher than that of T-Co2. In the GNN and GAT models, the AUC of T-Co3 is also 1.30 and 3.52 percentage points higher than that of T-Co1, respectively. Compared to other indicators of importance, the contribution index reflecting technological timeliness provides better, more comprehensive, and more accurate clues for predicting technological convergence. In this experiment, the GCN model performed better and showed better discriminative capabilities for different features. It is more suitable for the technology convergence prediction task in this paper.

4. Conclusion

This paper refines the assessment of technological importance from a timeliness perspective, shifting from traditional distinctions based on lifecycle and dates to a more precise measurement within each convergence event. We replace frequency indicators in the co-occurrence network with the technological contribution index for building dynamic technology networks. The results show that this approach outperforms frequency-based models. As a next step, we aim to improve the technological timeliness index by incorporating additional temporal cues. In addition, the exploration of more efficient embedding models is expected to improve predictive performance.
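
For readers who want to see how an AUC value of the kind reported in Table 1 arises from link scores, the following minimal sketch may help. It assumes that a link score is the sigmoid of the dot product of two node embeddings, which is a common choice but is not stated in the paper. The embeddings, topic labels, and positive/negative pairs are hypothetical toy values.

```python
import math


def link_score(z_u, z_v):
    """Assumed decoder: sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


def auc(pos_scores, neg_scores):
    """Probability that a random positive pair outranks a random negative pair (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical embeddings for three technical topics.
emb = {"T1": [1.0, 0.5], "T2": [0.8, 0.7], "T3": [-1.0, 0.2]}
positives = [("T1", "T2")]                # pair assumed to converge in the test period
negatives = [("T1", "T3"), ("T2", "T3")]  # sampled pairs assumed not to converge

pos = [link_score(emb[u], emb[v]) for u, v in positives]
neg = [link_score(emb[u], emb[v]) for u, v in negatives]
result = auc(pos, neg)
```

The pairwise comparison is quadratic in the number of samples, which is fine for illustration. For a test set of the size used here, a rank-based computation would be the more practical choice.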


Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 72374103, 71974095) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX23_0161).

References
[1] C. S. Curran, J. Leker, Patent Indicators for Monitoring Convergence – Examples from NFF and ICT, Technological Forecasting & Social Change 78 (2011) 256-73. doi:10.1016/j.techfore.2010.06.021
[2] M. Karvonen, T. Kassi, Patent Citation Analysis as a Tool for Analysing Industry Convergence, 2011 Proceedings of PICMET '11: Technology Management in the Energy Smart World (PICMET), IEEE, Portland, USA, 2011.
[3] K. Sung, H. K. Kong, T. Kim, Convergence Indicator: The Case of Cloud Computing, Journal of Supercomputing 65 (2013) 27-37. doi:10.1007/s11227-011-0706-1
[4] I. Park, B. Yoon, Technological Opportunity Discovery for Technological Convergence Based on the Prediction of Technology Knowledge Flow in a Citation Network, Journal of Informetrics 12 (2018) 1199-222. doi:10.1016/j.joi.2018.09.007
[5] Li C, Zhou J, Yang Z, Research on the Identification of Technology Fusion Growth Points from the Perspective of Dynamic Evolution Process, Library and Information Service 66 (2022) 99-109. doi:10.13266/j.issn.0252-3116.2022.07.010
[6] Zhang J, Li Y, Technology Convergence Prediction by the Semantic Representation of Patent Classification Sequence and Text, Journal of the China Society for Scientific and Technical Information 41 (2022) 609-24. doi:10.3772/j.issn.1000-0135.2022.06.006
[7] Zhang Y, Zhang G, Chen H, Porter AL, Zhu D, Lu J, Topic analysis and forecasting for science, technology and innovation: methodology with a case study focusing on big data research, Technological Forecasting and Social Change 105 (2016) 179-91. doi:10.1016/j.techfore.2016.01.015
[8] Wang L, Liu X, Measuring Diffusion of Technology Topics with Patent Data, Data Analysis and Knowledge Discovery 6 (2022) 1-10. doi:10.11925/infotech.2096-3467.2021.0915
[9] Daim T, Preschitschek N, Niemann H, Leker J, Moehrle MG, Anticipating industry convergence: semantic analyses vs IPC co-classification analyses of patents, Foresight 15 (2013) 446-64.
[10] Liu W, Yang Z, Cao Y, Huo J, Discovering the influences of the patent innovations on the stock market, Information Processing and Management 59 (2022). doi:10.1016/j.ipm.2022.102908
[11] Zhang Y, Wu M, Miao W, Huang L, Lu J, Bi-layer network analytics: A methodology for characterizing emerging general-purpose technologies, SSRN Electronic Journal 15 (2021). doi:10.1016/j.joi.2021.101202
[12] Caferoglu H, Elsner D, Moehrle MG, The Interplay Between Technology and Pre-Industry Convergence: An Analysis in the Technology Field of Smart Mobility, IEEE Transactions on Engineering Management 70 (2021) 1504-1517. doi:10.1109/TEM.2021.3092211
[13] Fu Q, Sun Y, Evaluation of Technological Influence: Based on Patent Timeliness and PageRank Algorithm, Journal of Systems & Management 27 (2018) 352-8.
[14] Wenjing Zhu, Bohong Ma, Lele Kang, Technology convergence among various technical fields: improvement of entropy estimation in patent analysis, Scientometrics 127 (2022) 7731-50.
[15] J. Ma, C. Wang, L. Yan, S. Yao, Analysis of Patent Technology Topic Evolution Based on Product Life Cycle, Journal of the China Society for Scientific and Technical Information 41 (2022) 684-691. doi:10.3772/j.issn.1000-0135.2022.07.003
[16] Zhang X, Liu H, Shi J, Mao C, Meng G, LSTM and artificial neural network for urban bus travel time prediction based on spatiotemporal eigenvectors, Journal of Computer Applications 41 (2021) 875-880. doi:10.11772/j.issn.1001-9081.2020060467
[17] Le, Q. V., Mikolov, T., Distributed Representations of Sentences and Documents, JMLR.org, 2014. doi:10.48550/arXiv.1405.4053
[18] Jiang Z, On the forgetting function - a mathematical discussion on the psychology of memory, Advances in Psychological Science 56 (1988) 56-60.

