<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Parlina A, Ramli K, &amp; Murfi H. Theme Mapping and Bibliometrics Analysis of One Decade of Big Data Research in the Scopus Database. Information</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/info11020069</article-id>
      <title-group>
        <article-title>Material performance evolution discovery based on entity extraction and social circle theory</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jinzhu Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenwen Sun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Management, School of Economics and Management, Nanjing University of Science and Technology</institution>
          ,
          <addr-line>Nanjing</addr-line>
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>11</volume>
      <issue>72374103</issue>
      <fpage>175</fpage>
      <lpage>183</lpage>
      <abstract>
        <p>Topic evolution analysis describes the emergence, development, and extinction of topics in a field, which can help researchers understand the history and current state of that field. However, material patent text is domain-specific, and general entity extraction models cannot extract its specialized entities effectively. Moreover, the assumption that topics with high similarity have an evolution relationship contradicts the rule of “first the change, then the new topic”, and therefore cannot clearly present the dynamic changes and accumulation of topics. We therefore design a method to extract material performance entities accurately and to construct a dynamic evolution path for material performance topics. First, we propose a material entity extraction model, BERT-BiLSTM-CRF, which integrates syntactic dependency analysis and an attention mechanism to extract material performance entities accurately. Second, we design an algorithm based on ring boundaries for identifying the evolution relationship between performance nodes and existing topics, which realizes the dynamic accumulation and change of topics. Finally, we construct the dynamic evolution path of material performance and explore the complex associations of material performance. Experiments in the field of metal materials confirm that the proposed method can effectively construct the dynamic evolution path of material performance topics and makes the evolution relationships between topics richer and more interpretable.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity extraction</kwd>
        <kwd>material performance evolution</kwd>
        <kwd>patent entity relationship</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>With the emergence of a large number of patents on materials manufacturing and materials innovation, it has become critical to explore the complex associations and evolutionary trends of material performance. Such exploration can help researchers deepen their understanding of material performance and promote the invention of new materials [1].</p>
      <p>Through a review of existing studies, we learned that the performance of a material mainly depends on its microstructure, while the manufacture process directly affects that microstructure. Current research on the evolution of material performance therefore falls into three perspectives: "performance evolution - microstructure", "performance evolution - microstructure - manufacture process", and "performance evolution - manufacture process". The specific relationships are shown in Figure 1. Among them, Perspectives I and II [2, 3, 4] involve microstructure and require a high level of expertise and experimental equipment from researchers and readers; Perspective III [5] usually proceeds by controlling variables, such as varying the temperature of a certain process and then exploring its influence on the microstructure of the material and on the evolution of the material performance, which to some extent restricts a comprehensive understanding and description of material performance evolution.</p>
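      <p>[Figure 1. Relationships among manufacture process, microstructure, and material performance. Surviving labels: (I) the material's microstructure determines material performance; (II) the manufacture process changes the microstructure; (III) the manufacture process affects material performance.]</p>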
    </sec>
    <sec id="sec-2">
      <title>2. Data and method</title>
      <p>This paper takes material performance as the research object. First, we integrate syntactic dependency analysis and an attention mechanism to construct the material entity extraction model (BERT-BiLSTM-CRF) and obtain the performance nodes of each material. We then divide the performance nodes of all materials into time batches by year. After that, starting from the initial performance topics, we design an algorithm that realizes the dynamic accumulation of material performance topics and the construction of the evolution path.</p>
      <p>Collecting data at arbitrary intervals may compromise the completeness and accuracy of the final evolution analysis. We therefore take Germany’s “Industrie 4.0” concept as the background and select metal materials as an example, since they are among the key foundational materials closely related to this concept. We use the Derwent Innovations Index database as the patent data retrieval platform. The patent search expression is “TS=(‘Metal materials’ OR ‘Metallic materials’ OR ‘Metal alloys’ OR ‘Metal compositions’ OR ‘Metal-based materials’) AND WC=(‘Materials Science’)”, with a time interval from 2011 to 2023, where 2011 is the year when the concept of "Industry 4.0" was first introduced. The top 10,000 relevant patent texts were then selected as the dataset. In addition, considering the number of patents in each period, we divide the dataset into yearly batches for the analysis of material performance evolution.</p>
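      <p>The yearly batching described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (a dictionary with hypothetical "id" and "year" fields) and the function name batch_by_year are assumptions made for the example; only the 2011 to 2023 window comes from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def batch_by_year(patents, start=2011, end=2023):
    """Group patent records into yearly batches, ordered by year.

    Each record is assumed (hypothetically) to be a dict with a "year" key.
    Records outside the [start, end] window are dropped.
    """
    batches = defaultdict(list)
    for record in patents:
        year = record["year"]
        if start <= year <= end:
            batches[year].append(record)
    # Sort so that later steps can iterate over the batches chronologically.
    return dict(sorted(batches.items()))


# Toy records; the ids and years are invented for illustration only.
patents = [
    {"id": "P1", "year": 2011},
    {"id": "P2", "year": 2012},
    {"id": "P3", "year": 2011},
    {"id": "P4", "year": 2009},  # outside the retrieval window, so dropped
]
batches = batch_by_year(patents)
```
]]></preformat>
      <p>Keeping the batches in chronological order matters here because the later steps treat the first batch as the source of the initial topics and every following batch as an increment.</p>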
      <sec id="sec-2-1">
        <title>2.2. Method for extracting the material entities</title>
        <p>As manufacture processes continue to improve, the processing methods of materials progress constantly. At the same time, a change in the manufacture process of a material brings about changes in its performance [13].</p>
        <p>Therefore, we defined the performance entity and the manufacture process entity of metal materials. In addition, referring to the relationship shown in Fig. 1, we established the causal relationship between the two for subsequent analysis. (The manufacture process entity and the causal relationship will be used in our next step of exploring the reasons for performance evolution, so they are rarely involved in this paper.)</p>
        <p>Then, considering that material patents contain a large number of technical terms and material components, we constructed an entity extraction model (BERT-BiLSTM-CRF) that combines syntactic dependency analysis and an attention mechanism. The combined use of these methods allows the material performance entities and manufacture process entities to be extracted more accurately from patent contents, providing a basis for subsequent material analysis and research.</p>
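        <p>A sequence labeling model with a CRF output layer produces one tag per token, and these tags still have to be turned into entity spans. The sketch below shows that decoding step only. It assumes the common BIO tagging scheme and two hypothetical tag names, PERF (performance entity) and PROC (manufacture process entity); neither the scheme nor the tag names are stated in this paper, and the model itself is not reproduced here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from a BIO-tagged sequence."""
    entities = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the entity that is currently open.
            current.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


# Toy sentence and tags, invented for illustration only.
tokens = ["The", "alloy", "shows", "high", "tensile", "strength",
          "after", "hot", "rolling"]
tags = ["O", "O", "O", "B-PERF", "I-PERF", "I-PERF",
        "O", "B-PROC", "I-PROC"]
entities = decode_bio(tokens, tags)
```
]]></preformat>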
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Method for constructing dynamic evolution path for material performance topics</title>
        <p>In this part, we first define six evolution types and then design an algorithm for identifying the evolution relationship of performance nodes based on ring boundaries. Finally, we present the detailed process of the method for constructing the dynamic evolution path of material performance topics.</p>
        <sec id="sec-2-2-1">
          <title>2.3.1. Social Circle Theory</title>
          <p>Social circle theory suggests that the social circle formed around a person reflects the closeness of his or her social relationships. A person's intimate social circle usually consists of highly relevant relationships, followed by the normal friends circle and then the strangers circle. In addition, some people are temporarily outside one's normal friends circle but share certain similarities with the person, and they may become friends or even intimate friends in the future; this paper defines them as potential friends. Therefore, centered on the individual, the affinity order is: intimate friends, normal friends, potential friends, and strangers. Their positions are, respectively: within the intimate friends circle; outside the intimate friends circle but within the normal friends circle; outside the normal friends circle but within the strangers circle; and outside the strangers circle, as shown in Figure 2.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.3.2. Definition of evolution types</title>
          <p>This paper defines six evolution types based on existing studies. Four of them (develop, evolve, emerge, and fuse) are derived from the four social relationships in social circle theory, while the other two (extinct and split) are adopted from existing studies to ensure the diversity of evolution types. In addition, this paper refines the fuse and split types by distinguishing the contribution of each topic involved, which allows the dynamic interactions between topics to be considered in more detail. See Appendix B for details.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>2.3.3. Identifying the evolution relationship of performance nodes based on ring boundaries</title>
          <p>Drawing on social circle theory, we improve the model proposed by Zhang et al. [14] and propose an algorithm for identifying the evolution relationship of performance nodes based on ring boundaries. The specific algorithm and its correspondence are shown in Figure 3. First, for each existing performance topic, the centroid is calculated, and the maximum Euclidean distance between the topic’s patents and its centroid is taken as the topic boundary. The topic boundary is then extended outward by a ratio less than 1 to obtain the outer ring boundary and shrunk inward by the same ratio to obtain the inner ring boundary. After several comparison tests, we set the ratio in this study to 0.2.</p>
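          <p>The boundary computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ratio is read multiplicatively (outer = boundary x (1 + ratio), inner = boundary x (1 - ratio)), nodes are plain coordinate tuples, and the function names and zone labels are hypothetical. The sketch only reports which ring a node falls into; it does not assert how the paper maps rings to evolution types.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def ring_boundaries(points, ratio=0.2):
    """Return (center, inner, boundary, outer) for one topic.

    boundary is the maximum Euclidean distance from a member to the centroid.
    Assumption: the ratio scales the boundary multiplicatively.
    """
    center = centroid(points)
    boundary = max(math.dist(p, center) for p in points)
    return center, boundary * (1 - ratio), boundary, boundary * (1 + ratio)


def zone(node, center, inner, boundary, outer):
    """Name the ring that a new performance node falls into."""
    d = math.dist(node, center)
    if d <= inner:
        return "within inner ring"
    if d <= boundary:
        return "inner ring to topic boundary"
    if d <= outer:
        return "topic boundary to outer ring"
    return "beyond outer ring"


# Toy topic: four member vectors forming a square, centroid at (1, 1).
topic = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, inner, boundary, outer = ring_boundaries(topic, ratio=0.2)
zones = [zone(n, center, inner, boundary, outer)
         for n in [(1.0, 1.5), (1.0, 2.3), (1.0, 2.6), (5.0, 5.0)]]
```
]]></preformat>
          <p>With a ratio of 0.2, the toy topic has a boundary of about 1.41, an inner ring of about 1.13, and an outer ring of about 1.70, so the four test nodes fall into the four zones in order.</p>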
          <p>[Figure residue: social circle diagram. Surviving labels: Strangers, Potential Friends, Normal Friends, Mutual Friends, Intimate Friends; intimate friends circle, normal friends circle, strangers circle.]</p>
          <p>Then, hierarchical clustering is applied to obtain the different types of performance topics in the batch, and similar topics whose similarity exceeds a threshold (set to 0.8 in this paper) are merged. If a topic in the previous batch gives rise to more than two topics in the current batch, the evolution type is considered a split. Furthermore, during the construction of the evolution path, if a topic has no evolution relationship with any following topic, we consider it extinct.</p>
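          <p>The merge, split, and extinct rules above can be sketched as follows. Several details are assumptions made for the example and are not given in the text: similarity is taken to be cosine similarity between topic centroids, merging is transitive (implemented with a small union-find), and evolution relationships are passed in as hypothetical (previous topic, current topic) pairs. The thresholds (similarity above 0.8, more than two successor topics for a split) follow the text.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_similar(centroids, threshold=0.8):
    """Group topic ids whose centroid similarity exceeds the threshold."""
    parent = {t: t for t in centroids}

    def find(t):
        # Follow parent links to the group representative, compressing paths.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    ids = list(centroids)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(centroids[a], centroids[b]) > threshold:
                parent[find(b)] = find(a)
    groups = {}
    for t in ids:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


def label_split_and_extinct(previous_topics, links):
    """Label previous-batch topics from (previous, current) relationship pairs."""
    successors = {t: set() for t in previous_topics}
    for prev, cur in links:
        successors.setdefault(prev, set()).add(cur)
    labels = {}
    for topic, following in successors.items():
        if not following:
            labels[topic] = "extinct"  # no relationship with later topics
        elif len(following) > 2:
            labels[topic] = "split"    # more than two successor topics
    return labels


# Toy inputs, invented for illustration only.
merged = merge_similar({"T1": [1.0, 0.0], "T2": [0.9, 0.1], "T3": [0.0, 1.0]})
labels = label_split_and_extinct(
    ["A", "B", "C"], [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X")]
)
```
]]></preformat>
          <p>In the toy input, T1 and T2 are nearly parallel and are merged, topic A has three successors and is labeled split, and topic C has none and is labeled extinct.</p>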
        </sec>
        <sec id="sec-2-2-4">
          <title>2.3.4. Construction of the dynamic evolution path for metal material performance topics</title>
          <p>The construction of the dynamic evolution path of metal material performance mainly includes the following steps, which are shown in Figure 4. First, after extracting the performance entities of each material, we obtain the performance node of each material, and all material performance nodes are then divided into time batches by year. Second, the K-Means algorithm is used to cluster the first batch of data to obtain the initial performance topics.</p>
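          <p>The clustering of the first batch can be sketched with a plain implementation of Lloyd's K-Means iteration. This is an illustration only: the paper does not state how the performance nodes are vectorized, how K is chosen, or how the centroids are initialized, so the two-dimensional toy vectors, k=2, and seeding with the first k points are all assumptions made to keep the example deterministic.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math


def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, assignment


# Toy first-batch vectors forming two well-separated groups.
first_batch = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
               (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
initial_topics, assignment = k_means(first_batch, k=2)
```
]]></preformat>
          <p>Each resulting centroid stands for one initial performance topic; later batches are then compared against these topics rather than being clustered from scratch.</p>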
          <p>Subsequently, for the performance nodes in each subsequent batch, the algorithm for identifying the evolution relationship of performance nodes (see Section 2.3.3 for the specific algorithmic process) is used to identify their evolution relationships with each performance topic. Hierarchical clustering is then applied to obtain the performance topics of the different evolution types in this batch, and similar topics whose similarity exceeds a threshold are merged. Finally, incremental iterations are carried out in this manner to obtain the material performance topics for each yearly batch, thereby achieving the dynamic construction of the material performance evolution path (see Appendix D for the entire picture, where Cluster stands for topic).</p>
          <p>[Figure 4. Construction process of the dynamic evolution path. Surviving labels: One time slice, Cluster1, Cluster2, Metal Material Patent Abstract Text.]</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Result</title>
      <p>In constructing the evolution path of metal material performance, we use the years 2021-2023 as an example. Specifically, the numbers of performance clusters in 2021, 2022, and 2023 are 66, 55, and 60, respectively. The resulting evolution path is shown in Figure 5, where yellow, red, and green represent the years 2021, 2022, and 2023, respectively (see Appendix E for the entire picture, where Cluster stands for topic).</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper proposes an algorithm for identifying the evolution relationship of performance nodes based on ring boundaries, which not only realizes the dynamic accumulation and construction of the metal material performance evolution path but also enriches and improves topic evolution analysis methods. Currently, we are combining the manufacture process entities of each material to further analyze the causes of the evolution of</p>
    </sec>
    <sec id="sec-5">
      <title>A. Appendices</title>
      <p>[Figure residue: social circle diagram. Surviving labels: Strangers, Potential Friends, Intimate Friends.]</p>
    </sec>
    <sec id="sec-6">
      <title>B. Appendices</title>
      <sec id="sec-6-1">
        <title>Explanation</title>
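        <table-wrap>
          <table>
            <thead>
              <tr><th>Evolution type</th><th>Explanation</th></tr>
            </thead>
            <tbody>
              <tr><td>develop</td><td>Cluster2 only develops from Cluster1.</td></tr>
              <tr><td>evolve</td><td>Cluster2 only evolves from Cluster1.</td></tr>
              <tr><td>emerge</td><td>Cluster4 is treated as an emerge Cluster, which has no evolutionary relationship with previous Clusters.</td></tr>
              <tr><td>fuse</td><td>Cluster3 comes from Cluster1 and Cluster2 together.</td></tr>
              <tr><td>extinct</td><td>/</td></tr>
              <tr><td>split</td><td>/</td></tr>
            </tbody>
          </table>
        </table-wrap>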
      </sec>
    </sec>
    <sec id="sec-7">
      <title>C. Appendices</title>
      <p>[Figure residue. Surviving labels: Cluster3, topic boundary.]</p>
    </sec>
    <sec id="sec-8">
      <title>D. Appendices</title>
      <p>[Figure residue: full view of the construction process of the dynamic evolution path. Surviving labels: One time slice, Cluster1, Cluster2, Cluster3, Metal Material Patent Abstract Text, fuse, Data flow.]</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>