=Paper= {{Paper |id=Vol-3004/paper13 |storemode=property |title=Topic Evolution Path and Semantic Relationship Discovery Based on Patent Entity Relationship |pdfUrl=https://ceur-ws.org/Vol-3004/paper13.pdf |volume=Vol-3004 |authors=Jinzhu Zhang,Linqi Jiang |dblpUrl=https://dblp.org/rec/conf/jcdl/ZhangJ21 }} ==Topic Evolution Path and Semantic Relationship Discovery Based on Patent Entity Relationship== https://ceur-ws.org/Vol-3004/paper13.pdf
                EEKE 2021 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents


    Topic Evolution Path and Semantic Relationship Discovery Based on
                        Patent Entity Relationship∗
                                                           Jinzhu Zhang†
                             Department of Information Management, School of Economics & Management
                                            Nanjing University of Science and Technology
                                                           Nanjing China
                                                      zhangjinzhu@njust.edu.cn

                                                             Linqi Jiang
                             Department of Information Management, School of Economics & Management
                                            Nanjing University of Science and Technology
                                                            Nanjing China
                                                        sufi_jiang@163.com

ABSTRACT
Topic evolution analysis describes the emergence, transition, and
extinction of a topic in a technical field, which can help researchers
understand the history and current situation of the research field.
Current studies are mainly patent text-based methods, which often use
relationships among keywords to construct a co-occurrence network and
analyse evolution using topic clustering algorithms. However, they do not
consider all the words in a patent or the semantic relationships between
them. In addition, the relationships among topics should be more concrete:
we should not only find the evolution relationship, but also reveal the
semantic relationships among topics. Therefore, this paper uses a
representation learning method to obtain the semantic representation of
each entity/word, and computes the semantic similarity among them to find
pairs of words that are different but have the same meaning in a specific
context. Moreover, we define multiple semantic relationships among topics,
and design a method that uses patent entity relationships to obtain the
semantic relationships among topics. Experiments in the technical field of
UAV transportation confirm that the method can effectively identify both
the evolutionary relationships and the semantic relationships between
topics, making the evolutionary relationships between topics richer and
more interpretable, and providing a reference for further enriching and
improving topic evolution analysis methods.

KEYWORDS
Topic Evolution Path, Semantic Relationship between Topics, Patent Entity
Relationship

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction
Topic evolution analysis describes the emergence, transition, and
extinction of a topic in a technical field, which can help researchers
understand the history and current situation of the research field. The
results can quickly identify research hotspots, trends and gaps, which is
essential to scientific and technological innovation (Liu et al., 2020).
In the study of topic evolution analysis, the discovery of topic evolution
paths and relationships plays an important role. Current studies can be
classified into two classes: patent citation analysis-based methods and
patent text-based methods (Yu et al., 2020). This paper focuses on the
latter, which often use relationships among keywords to construct a
co-occurrence network and analyse evolution using topic clustering
algorithms (No et al., 2015). The evolution path and relationship are then
discovered by comparing the common keywords among topics in different time
periods.
However, these common keywords cannot cover pairs of words that are
different but have the same meaning in a specific context. In addition,
the relationships among topics should be more concrete; for example, we
should not only find the evolution relationship (e.g., emergence,
transition, and extinction), but also reveal the semantic relationships
among topics (e.g., function-realization or function-area).
Therefore, this paper uses a representation learning method (Birunda &
Devi, 2021) to obtain the semantic representation of each entity/word, and
computes the semantic similarity among them to find pairs of words that
are different but have the same meaning in a specific context. Moreover,
we define multiple semantic relationships among topics, and design a
method that uses patent entity relationships to obtain the semantic
relationships among topics.

2   Data and method
Firstly, a manually labelled dataset is built and a neural network model
is trained to extract all patent entities. Then the topics are identified
by a clustering method and the evolution path is detected based on the
semantic similarity among entities. Finally, a neural network model is
trained to extract the relationships between patent entities, and the
semantic relationships among topics are discovered through the
relationships between patent entities.
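The similarity test behind the evolution-path step can be illustrated with a short sketch. It assumes cosine similarity over learned entity vectors (the paper does not name the similarity measure) and uses the 0.7 threshold reported in Section 3; the function names, entity names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def same_meaning(u, v, threshold=0.7):
    """Two entities count as the same meaning if similarity exceeds the threshold."""
    return cosine_similarity(u, v) > threshold


def matched_entities(earlier, later, threshold=0.7):
    """List (earlier_entity, later_entity) pairs whose vectors pass the test.

    `earlier` and `later` map entity strings to embedding vectors.
    """
    return [
        (name_a, name_b)
        for name_a, vec_a in earlier.items()
        for name_b, vec_b in later.items()
        if same_meaning(vec_a, vec_b, threshold)
    ]


# Toy 3-dimensional embeddings, invented for illustration only.
topic_2015 = {
    "unmanned aerial vehicle": [0.9, 0.1, 0.2],
    "camera device": [0.1, 0.8, 0.3],
}
topic_2016 = {
    "drone": [0.85, 0.15, 0.25],
    "battery pack": [0.0, 0.1, 0.9],
}
pairs = matched_entities(topic_2015, topic_2016)
```

In this toy data only the pair of differently worded but near-identical vectors passes the threshold, which is the kind of match that a plain comparison of common keywords would miss.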


2.1 Data collection
This paper uses the Derwent Innovations Index database as the patent data
retrieval platform, and the data was retrieved in July 2019. The patent
search expression is “IP = B64* AND TI = (((unmanned OR automatic OR
autonomous OR remotely piloted OR nonhuman) AND (aircraft OR "aerial
vehicle" OR airship* OR drone OR plane OR aircraft* OR airplane OR
aerobat* OR aerostat*)) OR "UAV")”, and the time interval is from 2008 to
2020. A total of 4507 patents with title, abstract, patent application
date and other features are retrieved and processed as the data source.
The data are divided into time periods for evolution analysis, taking into
account the number of patents in each period.

2.2 Discovery of Topic Evolution Path Based on Semantic Similarity Among Entities
Semantic similarity among entities is obtained in the following steps.
Firstly, a subset of the data is manually labelled with entities and split
into a training set and a testing set. Secondly, a BiLSTM-CRF (Lample et
al., 2016) model is trained on this dataset and evaluated through
quantitative indicators. After 100 training iterations, the accuracy of
the model exceeded 90% and became stable. Similarly, the loss dropped to
below 5.8 and stabilized. Thirdly, this model is used to detect the
entities in all patents. Fourthly, K-Means is applied to cluster topics
and obtain the entities of each topic. Patent documents contain many long
technical terms; compared with the commonly used LDA, clustering the
identified patent entities does not lose this technical information.
Finally, a word representation learning method is applied so that the
semantic similarity among the entities of each topic can be calculated.
Based on the semantic similarity among entities, a similarity threshold is
determined: two different entities are treated as having the same meaning
if their similarity is higher than the threshold. We define five topic
evolution patterns: development, division, integration, extinction, and
emerging. They are shown in Table 1, in which the size of the circle
represents the number of entities under each topic. Moreover, T1(t1) and
T2(t2) mean the topics T1 and T2 at times t1 and t2 respectively.

Table 1: Five patterns of topic evolution path
Evolution pattern   Expression and illustration
development         Topic T1(t2) only comes from T1(t1).
division            Topic T1(t1) is divided into two or more topics in t2.
integration         Two or more topics integrate into one topic.
extinction          T2(t2) is extinct, which has no evolutionary
                    relationship with the following topic in t3.
emerging            T2(t2) is treated as an emerging topic, which has no
                    evolutionary relationship with the previous topic.

2.3 Discovery of Semantic Relationship Among Topics
Firstly, we predefine five types of semantic relationships among patent
entities, which are shown in Table 2. Secondly, we manually label a small
dataset for training and testing with the predefined relationships.
Thirdly, we train an OpenNRE (Han et al., 2019) model on this dataset and
evaluate it through quantitative indicators. Fourthly, the model is used
to predict all relationships among entities. Finally, the semantic
relationship between two topics is determined based on the semantic
relationships among all pairs of entities.

Table 2: Semantic relationship among topics
Semantic relationship                   Expression and illustration
Mechanical relationship (M)             refers to the containment relationship, position
                                        relationship, etc. of some mechanical parts.
Efficacy relationship (E)               refers to the relationship of efficacy enhancement.
Function-Area relationship (FA)         refers to the application field or function field
                                        of certain machines or systems.
Function-Realization relationship (FR)  refers to some patented devices or systems that
                                        realize certain functions.
Control relationship (C)                refers to a certain control relationship.
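The final step of Section 2.3, turning entity-pair predictions into one relationship per topic pair, is not specified in detail. The following is a minimal sketch, assuming a simple majority vote over the predicted labels (one possible rule, not necessarily the authors'); the input triples stand in for the output of a relation-extraction model and are hypothetical, as are the function and variable names.

```python
from collections import Counter

# Abbreviations of the five predefined relationship types (Table 2).
RELATION_TYPES = {"M", "E", "FA", "FR", "C"}


def topic_relation(entity_pair_labels):
    """Return the most frequent relation label among entity pairs, or None.

    Majority voting is an assumed aggregation rule. Ties are broken
    alphabetically so that the result is deterministic.
    """
    counts = Counter(
        label for label in entity_pair_labels if label in RELATION_TYPES
    )
    if not counts:
        return None
    best = max(counts.values())
    return sorted(label for label, n in counts.items() if n == best)[0]


# Hypothetical predictions: (entity in earlier topic, entity in later topic, label).
predictions = [
    ("rotor", "fuselage", "M"),
    ("antenna", "gimbal", "M"),
    ("battery", "flight time", "E"),
]
relation = topic_relation([label for _, _, label in predictions])
```

Other rules, such as weighting each vote by the model's confidence, would fit the same interface.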




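The five evolution patterns of Section 2.2 can be read off an evolution probability matrix once a cut-off is fixed. The following is a minimal sketch, assuming a matrix indexed [earlier topic][later topic] in percent and the 20% cut-off used in Section 3; the labelling rule (counting links above the cut-off) is one interpretation of Table 1, and the function names and the toy matrix are hypothetical.

```python
def evolution_links(prob, threshold=20.0):
    """Return the set of (earlier, later) index pairs with probability above the threshold."""
    return {
        (i, j)
        for i, row in enumerate(prob)
        for j, p in enumerate(row)
        if p > threshold
    }


def classify_patterns(prob, threshold=20.0):
    """Group topics into the five patterns by counting links above the threshold."""
    links = sorted(evolution_links(prob, threshold))
    n_early = len(prob)
    n_late = len(prob[0]) if prob else 0
    successors = {i: [j for a, j in links if a == i] for i in range(n_early)}
    predecessors = {j: [i for i, b in links if b == j] for j in range(n_late)}

    patterns = {
        "development": [], "division": [], "integration": [],
        "extinction": [], "emerging": [],
    }
    for i, succ in successors.items():
        if not succ:
            patterns["extinction"].append(i)          # no later topic
        elif len(succ) >= 2:
            patterns["division"].append((i, succ))    # one topic, several successors
        elif predecessors[succ[0]] == [i]:
            patterns["development"].append((i, succ[0]))  # one-to-one link
    for j, pred in predecessors.items():
        if not pred:
            patterns["emerging"].append(j)            # no earlier topic
        elif len(pred) >= 2:
            patterns["integration"].append((pred, j)) # several topics merge
    return patterns


# Toy matrix (percent), invented for illustration: 4 earlier x 4 later topics.
prob = [
    [85.0, 5.0, 3.0, 0.0],
    [10.0, 45.0, 40.0, 0.0],
    [2.0, 30.0, 8.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
]
patterns = classify_patterns(prob)
```

A topic can take part in more than one pattern under this rule: in the toy matrix, earlier topic 1 is divided into later topics 1 and 2, and it also integrates with earlier topic 2 into later topic 1.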

3    Result
We set the semantic similarity threshold to 0.7: a pair of entities with a
similarity higher than this value is considered to have the same meaning.
For the discovery of the topic evolution path, the results for 2015-2016
are taken as an example. There are six topics in 2015 and eight topics in
2016, and the evolution probability among them is shown in Table 3.

Table 3: Evolution probability among topics (%)
2015/2016  T0(t2)   T1(t2)   T2(t2)   T3(t2)   T4(t2)   T5(t2)   T6(t2)   T7(t2)
T0(t1)        5.1     14.4     32.4*    12.4        0     20.5*    16.5        0
T1(t1)       40.0*    21.9*    24.5*     7.2      5.4     10.6      1.4      6.8
T2(t1)        4.8     24.7*    18.2     20.5*       0     14.0     19.3        0
T3(t1)        100*    11.2     12.9        0      100*     1.6        0      100*
T4(t1)        3.4     23.3*    18.4     13.5        0     17.9     24.3*       0
T5(t1)        5.0     14.0      8.6     34.1*       0     23.0*    16.2        0

As shown in Table 3, the pairs of topics with a probability of more than
20% are marked with an asterisk. This means that there is a certain
evolutionary relationship between them. For example, T1(t1), T2(t1) and
T4(t1) are integrated into T1(t2); T1(t1) is divided into T0(t2), T1(t2)
and T2(t2); and T3(t1) develops into T7(t2), because these two topics are
almost the same.
Then, we obtain the semantic relationships between the topics that have an
evolution path, as shown in Table 4. The abbreviations of the semantic
relationships are explained in Table 2.

Table 4: Semantic relations of topic evolution
2015/2016  T0(t2)   T1(t2)   T2(t2)   T3(t2)   T4(t2)   T5(t2)   T6(t2)   T7(t2)
T0(t1)          -        -        E        -        -        E        -        -
T1(t1)          M        E        E        -        -        -        -        -
T2(t1)          -        E        -       FA        -        -        -        -
T3(t1)          M        -        -        -        M        -        -        M
T4(t1)          -        E        -        -        -        -        E        -
T5(t1)          -        -        -       FA        -       FA        -        -

According to the results in Table 4, the semantic relationship between
T1(t1) (UAV communication medium and function) and T0(t2) (camera device
and image transmission system) is "Mechanical relationship". It shows that
patents on the communication media and functions of UAVs have developed
over time, and a large part of the research topics have split off into
image information transmission, which is consistent with the development
of the field.
In addition, the semantic relationship among T2(t1) (UAV functional
module), T5(t1) (UAV kinetic energy device) and T3(t2) (UAV application
field) is "Function-Area relationship". A large part of the patent
applications in the field of UAV transportation are related to UAV system
applications, including geological survey, forest fire prevention, water
resources inspection and protection, environmental science and ecology,
agriculture and other key technologies used in industry.


4    Conclusion
This paper proposes a method for discovering topic evolution paths and the
semantic relationships among topics based on patent entity relationships.
The results indicate the effectiveness of this method, which could enrich
and improve topic evolution analysis methods. In the next step, we would
like to apply other neural network models that may perform better in
patent entity relationship extraction and compare them with baseline
methods for deeper analysis.

ACKNOWLEDGMENTS
This work is supported by the National Natural Science Foundation of China
(No. 71974095).

REFERENCES
Liu, H., Chen, Z., Tang, J., et al. (2020). Mapping the technology
    evolution path: a novel model for dynamic topic detection and
    tracking. Scientometrics, 125, 2043-2090.
    https://doi.org/10.1007/s11192-020-03700-5
Yu, D., Xu, Z., & Wang, X. (2020). Bibliometric analysis of support vector
    machines research trend: a case study in China. International Journal
    of Machine Learning and Cybernetics, 11(3), 715-728.
No, H. J., An, Y., & Park, Y. (2015). A structured approach to explore
    knowledge flows through technology-based business methods by
    integrating patent citation analysis and text mining. Technological
    Forecasting and Social Change, 97, 181-192.
Birunda, S. S., & Devi, R. K. (2021). A Review on Word Embedding
    Techniques for Text Classification. In Innovative Data Communication
    Technologies and Application (pp. 267-281). Springer, Singapore.
Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., & Dyer, C.
    (2016). Neural architectures for named entity recognition. arXiv
    preprint arXiv:1603.01360.
Han, X., Gao, T., Yao, Y., Ye, D., Liu, Z., & Sun, M. (2019). OpenNRE: An
    open and extensible toolkit for neural relation extraction. arXiv
    preprint arXiv:1909.13078.