<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Novel Approach for Patent Similarity Measurement Based on Sequence Alignment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xin An</string-name>
          <email>anxin@bjfu.edu.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liang Chen</string-name>
          <email>25565853@qq.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jinghong Li</string-name>
          <email>724298617@qq.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sainan Pi</string-name>
          <email>silencepipi@bjfu.edu.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuo Xu*</string-name>
          <email>xushuo@bjut.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Scientific and Technical Information of China</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research Base of Beijing Modern Manufacturing Development, College of Economics and Management</institution>
          ,
          <institution>Beijing University of Technology</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Economics &amp; Management</institution>
          ,
          <institution>Beijing Forestry University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>45</fpage>
      <lpage>49</lpage>
      <abstract>
        <p>Patent similarity measurement, one of the fundamental building blocks of patent analysis, can not only derive technical intelligence efficiently but also detect the risk of infringement and evaluate whether an invention meets the criteria of novelty and innovation. However, traditional approaches implicitly make several assumptions, such as treating each component as a bag of words and ignoring semantic direction. To relax these assumptions, this study proposes a novel approach based on sequence alignment, which takes the semantic direction of each sequence structure and the word order of each component into consideration. An algorithm for calculating the global importance of each sequence structure is also put forward. Finally, to verify the effectiveness and performance of the improved semantic analysis, a case study is conducted on the thin film head subfield of the hard disk drive field. Experimental results show that our approach is significantly more accurate than the DWSAO baseline and is not sensitive to several core parameters.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Keywords</title>
      <p>Patent similarity measurement, Semantic analysis, Entities and
semantic relations, Sequence alignment</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        According to many authoritative surveys, patents cover more than
90% of the world's latest technical information, 80% of which
is not published in any other form [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Patent analysis is therefore
increasingly vital for mining technical intelligence. Patent
similarity measurement, as one of the fundamental building blocks of
patent analysis, can not only derive technical intelligence
efficiently but also detect the risk of infringement and
evaluate whether an invention meets the criteria of novelty and
innovation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>* Corresponding author</p>
      <sec id="sec-2-1">
        <p>
          Nowadays, Subject-Action-Object (SAO) semantic analysis
[
          <xref ref-type="bibr" rid="ref17 ref2 ref4 ref9">2, 4, 9, 17</xref>
          ], which stresses key concepts and functional
relations, is the most widely used method for measuring patent
similarity. Here, a function means “the action changing a feature of
any object” [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. That is, an SAO structure explicitly describes
a relation between components in a patent document. On closer
examination, however, traditional SAO semantic
analysis [
          <xref ref-type="bibr" rid="ref17 ref2 ref4 ref9">2, 4, 9, 17</xref>
          ] has several shortcomings. First, neither the semantic
direction of each SAO structure nor the word order within each
component of an SAO structure is taken into account. Second,
each SAO structure intuitively carries a different amount of
domain-specific information, so the
importance of each SAO structure should differ [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], yet
SAO semantic analysis usually assigns equal weight to every
SAO structure. Last but not least, SAO semantic analysis
focuses only on functional relations and ignores the valuable
technology intelligence carried by non-functional relations,
which are expressed through prepositions [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>To overcome these issues, this article proposes an
improved semantic analysis approach for assessing patent
similarity on the basis of sequence alignment. Unlike
previous studies, this paper works with sequence structures. A
sequence structure is an “Entity(1) – Relation –
Entity(2)” sequence, and it covers both functional
and non-functional relations. For example, the phrases “…the
seed film acting as a stop layer…” and “…planar layers on
opposing sides of a pole piece…”, which reflect a form relation and a spatial
relation respectively, generate the two sequence structures
“[seed film](E) – form(R) – [stop layer](E)” and “[planar layers](E)
– spatial(R) – [pole piece](E)”. The term
“sequence” emphasizes two aspects in this study: the semantic
direction of these functional and non-functional structures, and the
word order within each entity. In addition, an algorithm for
calculating the global importance of each sequence structure is put
forward.</p>
        <p>Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2 Related Work</title>
      <p>Before delving into specifics, we review the literature
pertinent to patent similarity measurement.</p>
    </sec>
    <sec id="sec-4">
      <title>2.1 Patent Similarity Measurement based on SAO structures</title>
      <p>
        Some researchers have used SAO structures and their semantic
similarity to evaluate the risk of patent infringement [
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ],
identify evolving technological trends for R&amp;D planning
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], build a technology tree for technology planning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and so
on. In these approaches, however, every SAO structure is assigned the
same weight. As an improvement, Wang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
constructed a DWSAO indicator that assigns different
weights to SAO structures when measuring patent similarity.
However, it neglects the number of SAO
structures in each patent, so that
patents with high similarity values may actually not be similar.
Moreover, it is not a symmetric indicator.
      </p>
      <p>In addition, previous methods implicitly discard the word order
information of each component in an SAO structure. The
meaning of a phrase may change when its words are
permuted. For example, the phrases “car gasoline” and “gasoline
car” consist of the same words in different orders. The
former is a kind of fuel while the latter is a kind of car, so
they should not be treated as the same thing.</p>
      <p>
        Finally, just as An et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] pointed out, SAO analysis
focuses only on functional relations between components and
ignores the valuable technology intelligence carried by
non-functional relations. They proposed an approach based on a
preposition semantic network, in which prepositions help reveal
the relations between technology-related keywords, and
applied it to mine intelligence from patents. Prepositional
semantic analysis can thus be viewed as
complementary to SAO semantic analysis. This study integrates
functional and non-functional relations, which are collectively
referred to as sequence structures.
      </p>
    </sec>
    <sec id="sec-5">
      <title>2.2 WordNet</title>
      <p>
        WordNet is usually chosen as the source of word relations
for calculating lexical semantic similarity. It is a
lexical database that groups English concepts into sets of
synonyms called “synsets” and connects the synsets in a hierarchical
structure by means of hypernym/hyponym
relations. Because of this property, WordNet is widely
used to calculate the semantic similarity of concepts. This paper
uses the information-content (IC) based approach, which
measures the semantic similarity between concepts based on their
IC, calculated from the probability of
encountering a concept [
        <xref ref-type="bibr" rid="ref12 ref6 ref7">6, 7, 12</xref>
        ]. The IC-based approach can be
formally defined as follows [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
      </p>
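      <p><disp-formula><label>(1)</label><tex-math><![CDATA[\mathrm{sim}(c_1, c_2) = \frac{2 \times \mathrm{IC}(\mathrm{LCS}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}]]></tex-math></disp-formula></p>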
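      <p>Here, <italic>sim</italic>(<italic>c</italic><sub>1</sub>, <italic>c</italic><sub>2</sub>) is the similarity between two concepts <italic>c</italic><sub>1</sub>
and <italic>c</italic><sub>2</sub>, LCS is the Least Common Subsumer (the most specific common hypernym) of the two
concepts, and IC denotes the information content of a
concept.</p>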
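      <p>As a minimal sketch of Eq. (1), the following Python fragment computes the similarity from three information content values. It assumes the common definition IC(c) = -log p(c); the function names and the probabilities are hypothetical and chosen only for illustration, not taken from WordNet or from the authors' implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def information_content(probability):
    """IC of a concept, assumed here to be -log of its occurrence probability."""
    return -math.log(probability)


def lin_similarity(ic_c1, ic_c2, ic_lcs):
    """Eq. (1): 2 * IC(LCS) / (IC(c1) + IC(c2))."""
    denominator = ic_c1 + ic_c2
    if denominator == 0:
        # Both concepts have probability 1 (e.g. the root); treat them as identical.
        return 1.0
    return 2.0 * ic_lcs / denominator


# Hypothetical probabilities, for illustration only.
ic_car = information_content(0.01)
ic_truck = information_content(0.02)
ic_vehicle = information_content(0.10)  # assumed LCS of "car" and "truck"
similarity = lin_similarity(ic_car, ic_truck, ic_vehicle)
```
]]></preformat>
      <p>With these assumed probabilities the similarity falls strictly between 0 and 1, and it reaches 1 when both concepts coincide with their LCS.</p>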
      <p>Note that a word may express different meanings (concepts) in
different contexts, a phenomenon known as polysemy. This paper uses the pair of concepts
that yields the highest similarity between two words. Specifically,
given that the synsets of word<sub>1</sub> and word<sub>2</sub> in WordNet are
Syn<sub>1</sub> and Syn<sub>2</sub> respectively, the similarity of the two words is
defined as follows.</p>
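      <p><disp-formula><label>(2)</label><tex-math><![CDATA[\mathrm{sim}(\mathrm{word}_1, \mathrm{word}_2) = \max_{c_i \in \mathrm{Syn}_1,\; c_j \in \mathrm{Syn}_2} \mathrm{sim}(c_i, c_j)]]></tex-math></disp-formula></p>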
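      <p>Eq. (2) amounts to taking a maximum over all pairs of senses. The sketch below illustrates this with a hypothetical sense inventory and made-up concept similarities; the names <monospace>word_similarity</monospace> and <monospace>TOY_SIMILARITY</monospace> are assumptions for illustration, and a real implementation would query WordNet and apply Eq. (1) instead.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import product


def word_similarity(senses_1, senses_2, concept_similarity):
    """Eq. (2): highest concept similarity over all pairs of senses."""
    if not senses_1 or not senses_2:
        # Assumption: a word with no known sense matches nothing.
        return 0.0
    return max(concept_similarity(c1, c2) for c1, c2 in product(senses_1, senses_2))


# Hypothetical concept similarities (not WordNet values).
TOY_SIMILARITY = {
    ("bank.river", "shore"): 0.8,
    ("bank.finance", "shore"): 0.1,
}


def toy_concept_similarity(c1, c2):
    """Look up a made-up similarity; unknown pairs score 0."""
    return TOY_SIMILARITY.get((c1, c2), 0.0)


best = word_similarity(["bank.river", "bank.finance"], ["shore"], toy_concept_similarity)
```
]]></preformat>
      <p>Here the river sense of the polysemous word determines the result, because it gives the highest similarity among the candidate sense pairs.</p>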
    </sec>
    <sec id="sec-6">
      <title>3 Methodology</title>
      <p>
        As shown in Figure 1, our research framework consists of four
phases. The first phase extracts sequence structures (functional and
non-functional semantic relations) from patent documents with
natural language processing (NLP) techniques and tools. In the
second phase, the similarity between sequence structures is
measured, taking the semantic direction of each sequence
structure and the word order of each component into
consideration. The third phase calculates the global
importance of each sequence structure based on the TV_LinkA
algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Finally, the similarity between patents is assessed
with a solver for the well-known optimal transportation problem [
        <xref ref-type="bibr" rid="ref10 ref14">10,
14</xref>
        ]. These phases are described in more detail in the following
subsections.
      </p>
    </sec>
    <sec id="sec-7">
      <title>3.1 Sequence structures extraction</title>
      <p>
        Recently, Chen et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have proposed a promising patent
information extraction framework, where two deep-learning
models are respectively used for entity identification and semantic
relation extraction. This framework can be used here to extract the
sequence structures mentioned in the patent documents. For more
elaborate and detailed descriptions, we refer the readers to Chen et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>3.2 Similarity between sequence structures</title>
      <p>
        After extraction, each patent can be
represented by a collection of sequence structures, whose number varies
from patent to patent. The patent similarity problem is thereby
transformed into computing the similarity between collections
of sequence structures. This subsection first illustrates how
to calculate the semantic similarity between two sequence
structures, as shown in Figure 2. Since each sequence structure
consists of three components, E(1) (Entity(1)), R (relation) and E(2)
(Entity(2)), the key is how to align the components of different
structures, and even the words within each component.
      </p>
      <p>
        [...] that Wang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] omit the word order information. Figure 3
(b)-(c) illustrates the alignment of the words in the entities “car
gasoline” and “gasoline car” under our approach, in which the
symbol “_” denotes a gap. When a word is aligned with “_”, the
resulting similarity is taken to be zero. The similarity
between two entities is the average similarity of the aligned
word pairs, here 0.3333. Compared with Wang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], this result
seems more realistic and credible.
      </p>
      <p>
        [...] “[insulating material](E)-partof(R)-[planar layers](E)”, which means that “insulating
material” is a whole and “planar layers” is part of it, and another
is “[seed film](E)-form(R)-[stop layer](E)”, which means that “stop layer” is
a whole or a product and “seed film” is a part of it or the material
it is made of. We can define the semantic direction of the former
as “insulating material ← planar layers” and that of the latter as “seed
film → stop layer”. Hence, “insulating material” and “stop layer”
are homogeneous components that can be
matched, and so are “planar layers” and “seed film”. After matching the
components, we use the Needleman-Wunsch algorithm to align
their words and then calculate the similarity between the aligned
components. The similarity between two sequence structures is
the average of the similarities of the aligned components.
      </p>
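      <p>The word-level alignment described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the word similarity is a simple exact-match stand-in for the WordNet-based similarity, the default gap penalty of -0.05 is the value reported in the experiment setup, and the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def exact_match(word_a, word_b):
    """Stand-in word similarity (assumption): 1 for identical words, else 0."""
    return 1.0 if word_a == word_b else 0.0


def align_words(words_a, words_b, word_sim, gap_penalty=-0.05):
    """Needleman-Wunsch global alignment of two word lists; None marks a gap."""
    n, m = len(words_a), len(words_b)
    # score[i][j] is the best score for aligning words_a[:i] with words_b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + word_sim(words_a[i - 1], words_b[j - 1]),
                score[i - 1][j] + gap_penalty,
                score[i][j - 1] + gap_penalty,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + word_sim(words_a[i - 1], words_b[j - 1])):
            pairs.append((words_a[i - 1], words_b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_penalty:
            pairs.append((words_a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, words_b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def entity_similarity(words_a, words_b, word_sim, gap_penalty=-0.05):
    """Average similarity over aligned positions; a gap contributes zero."""
    pairs = align_words(words_a, words_b, word_sim, gap_penalty)
    if not pairs:
        return 0.0
    total = sum(word_sim(a, b) for a, b in pairs if a is not None and b is not None)
    return total / len(pairs)


alignment = align_words(["car", "gasoline"], ["gasoline", "car"], exact_match)
similarity = entity_similarity(["car", "gasoline"], ["gasoline", "car"], exact_match)
```
]]></preformat>
      <p>For “car gasoline” and “gasoline car”, this sketch aligns the two occurrences of “car” and pairs each “gasoline” with a gap, so one of three aligned positions matches and the similarity is 1/3, in line with the value of 0.3333 given in the text.</p>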
    </sec>
    <sec id="sec-9">
      <title>3.3 Weight estimation of sequence structures of each patent</title>
      <p>
        Based on the idea that each sequence structure carries a different
amount of domain-specific information, this paper introduces a
new method to calculate the global importance of each component
of the sequence structures based on the TV_LinkA algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. First,
a network <italic>G</italic>(<italic>V</italic>, ℰ) is constructed, where <italic>V</italic> is the set of nodes,
consisting of abstracts, sentences and components (entities and
relations), and ℰ is the set of edges. Each abstract links to the
sentences that originate from it, and each sentence links to
the components that are extracted from it. Second, the values of
the sentence and component nodes are initialized to 1. Third, an
appropriate number of iterations is set. In each iteration, the value of
each component node is updated to the sum of the values of the
sentence nodes connected to it, and the updated values are
normalized by the L2 norm; the value of each sentence
node is updated in the same way. These steps are repeated
until the node values are stable. Finally, given that a term
occurring a few times in domain-relevant sentences is more likely
to be domain specific than one occurring many times in
general sentences, the resulting value of each node is multiplied by its
inverse document frequency (IDF).
      </p>
      <p>This yields the global importance of each
component across all patent documents. The importance of a
sequence structure is then the average of the importance of its
components. So that the weights lie between 0 and 1, the weights of
all the sequence structures in the same patent are
normalized to sum to 1.</p>
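      <p>The iterative update and the per-patent normalization can be sketched as follows. This is a simplified illustration, not the TV_LinkA algorithm itself: it assumes that sentence nodes are updated from their connected component nodes, it ignores abstract nodes, and it leaves out the IDF multiplication for brevity. All function and variable names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def l2_normalize(values):
    """Scale a dict of node values to unit L2 norm (unchanged if the norm is 0)."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    if norm == 0:
        return dict(values)
    return {node: v / norm for node, v in values.items()}


def component_importance(sentences, iterations=10):
    """sentences maps a sentence id to the set of components extracted from it."""
    components = {c for comps in sentences.values() for c in comps}
    sentence_value = {s: 1.0 for s in sentences}
    component_value = {c: 1.0 for c in components}
    for _ in range(iterations):
        # Component value: sum of the values of the sentences linked to it.
        component_value = l2_normalize({
            c: sum(sentence_value[s] for s, comps in sentences.items() if c in comps)
            for c in components
        })
        # Sentence value: sum of the values of its components (assumption).
        sentence_value = l2_normalize({
            s: sum(component_value[c] for c in comps)
            for s, comps in sentences.items()
        })
    return component_value


def structure_weights(structures, component_value):
    """Average component importance per structure, normalized to sum to 1."""
    raw = [sum(component_value[c] for c in s) / len(s) for s in structures]
    total = sum(raw)
    return [r / total for r in raw]


# Toy example built from the two sequence structures quoted in the Introduction.
sentences = {
    "s1": {"seed film", "form", "stop layer"},
    "s2": {"seed film", "spatial", "pole piece"},
}
values = component_importance(sentences)
weights = structure_weights(
    [("seed film", "form", "stop layer"), ("seed film", "spatial", "pole piece")],
    values,
)
```
]]></preformat>
      <p>In this toy network “seed film” occurs in both sentences and therefore receives a higher value than the other components, while the two structures, being symmetric, end up with equal weights that sum to 1.</p>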
    </sec>
    <sec id="sec-10">
      <title>3.4 Patent similarity assessment</title>
      <p>
        To make full use of all the information in the similarity matrix,
the patent similarity
measurement problem is transformed into the well-known
optimal transportation problem [
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
        ]. As shown in Figure 5, the
patent distance matrix, obtained as 1 minus the patent
similarity matrix, and the weight vectors are fed to an optimal
transportation problem solver to obtain the shortest distance
between two patents. The similarity of the two patents of interest is
equal to 1 minus this shortest distance.
      </p>
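      <p>For reference, one common way to write the transportation problem is sketched below. It assumes, beyond what the text states explicitly, that the two patents P and P' have m and n sequence structures with weight vectors w and w' that each sum to 1, that d_ij is 1 minus the similarity between structure i of the first patent and structure j of the second, and that f_ij denotes the amount of weight moved from i to j. These symbols are introduced here for illustration only.</p>
      <disp-formula><tex-math><![CDATA[
\begin{aligned}
D(P, P') &= \min_{f_{ij} \ge 0} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} f_{ij} = w_i \ (i = 1, \dots, m), \qquad \sum_{i=1}^{m} f_{ij} = w'_j \ (j = 1, \dots, n), \\
\mathrm{sim}(P, P') &= 1 - D(P, P')
\end{aligned}
]]></tex-math></disp-formula>
      <p>Provided that the similarities between sequence structures lie in [0, 1], every d_ij lies in [0, 1] and the total transported weight is 1, so the resulting distance, and hence the patent similarity, also lies in [0, 1] under these assumptions.</p>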
    </sec>
    <sec id="sec-11">
      <title>4 Case Study</title>
    </sec>
    <sec id="sec-12">
      <title>4.1 Dataset</title>
      <p>
        To evaluate the performance of our methodology, the annotated
corpus of Chen et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (https://github.com/awesome-patent-mining/TFH_Annotated_Dataset) is used in this work. This dataset comes from the thin
film head subfield of the hard disk drive field and contains 1,010
patent documents. Note that the dataset includes 84 pairs of
patents in which both patents come from the same patent family. The two patents in each such pair
have the same abstract and an identical collection of
sequence structures, so they should be more similar to each other than
to other patents. These pairs can be used to assess the effectiveness and
performance of our method: the better a method identifies these
84 pairs, the better its performance.
      </p>
      <p>Before comparing sequence structures, we must determine
the semantic direction of each one from the type of semantic
relation between its components E(1) and E(2), so that
they can be matched correctly to E(1) and E(2) of
the other sequence structure. As shown in Table 1, we have
defined 4 types of semantic directions. If both sequence structures
are single-direction, we match the components E(1) and
E(2) between the two sequence structures and apply Eq. (3) to
calculate the similarity; otherwise we apply Eq. (4).</p>
    </sec>
    <sec id="sec-13">
      <title>4.2 Experiment Setup</title>
      <p>
        In this paper, we use WordNet as the source of word relations to
calculate semantic similarity of words, but unfortunately, some
words in the dataset are not included in WordNet. To solve this
problem, we apply the “gestalt pattern matching” algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
as a supplement, which computes the similarity of two strings as
twice the number of matching characters divided by the total number of
characters in the two strings.
      </p>
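      <p>As an illustration of this fallback, Python's standard <monospace>difflib.SequenceMatcher</monospace> implements a closely related matching scheme. Whether the authors used this module is not stated, so the snippet below is only a sketch, and the wrapper name <monospace>string_similarity</monospace> is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Ratio 2 * M / T, with M matching characters and T the total length of both strings."""
    return SequenceMatcher(None, a, b).ratio()


# "abcd" and "bcde" share the block "bcd": 2 * 3 / (4 + 4) = 0.75.
score = string_similarity("abcd", "bcde")
```
]]></preformat>
      <p>Such a character-based score can serve as a substitute whenever at least one of the two words is missing from WordNet.</p>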
      <p>Our methodology has two parameters that must be
preset by the user. The first is the number of iterations used when
calculating the weight of each sequence structure, and the second
is the gap penalty in the Needleman-Wunsch algorithm.</p>
      <p>As for the number of iterations, one can determine whether the weights
are stable by observing their trend over several
iterations. In our experiment, the weights of
components gradually stabilize after 4 iterations. The
number of iterations is therefore fixed to 10 in this article, which leaves a margin beyond that point.</p>
      <p>As for the gap penalty, to assess its impact on patent
similarity, we compare multiple values, such as
-0.05, -0.1, -0.15, -0.2 and -0.3. Whichever
value is chosen, the word alignment, the patent similarity matrix and
the patent similarity are not affected. Hence, the gap penalty is set
to -0.05 in this paper.</p>
    </sec>
    <sec id="sec-14">
      <title>4.3 Experimental results and discussions</title>
      <p>
        To verify the effectiveness and performance of our approach, its
results are compared with those of Wang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
[...] chosen to form 5 collections, and we then count how many of the 84
pairs of patents are covered. If we select the Top 1 highest similarity
of each patent, our method recovers 54 pairs of patents that
come from the same patent family when the weights are determined by
the weighting algorithm (Section 3.3), while 58 pairs are
recovered by our approach with equal weights. The
DWSAO analysis recognizes none of them. If the Top 2
collection is considered, our weighted and non-weighted versions
contain 70 and 78 pairs respectively, while the DWSAO
analysis identifies only 2 pairs. When we enlarge to the Top 5 highest
similarities of each patent, the weighted version identifies 81 pairs
and the non-weighted version recognizes all 84 pairs,
while only 3 pairs are identified by the DWSAO analysis.</p>
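      <p>The Top-k evaluation can be sketched as follows. The exact counting rule is not fully specified in the text, so this illustration assumes that a family pair counts as covered when either patent appears among the k most similar patents of the other; the function name and the toy similarity values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def covered_pairs(similarity, family_pairs, k):
    """Count family pairs (a, b) where b is in a's Top-k list or a is in b's."""
    top_k = {}
    for patent, scores in similarity.items():
        # Rank the other patents by decreasing similarity and keep the first k.
        ranked = sorted(scores, key=scores.get, reverse=True)
        top_k[patent] = set(ranked[:k])
    return sum(
        1
        for a, b in family_pairs
        if b in top_k.get(a, set()) or a in top_k.get(b, set())
    )


# Toy similarity scores between three patents (made-up values).
similarity = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P2": {"P1": 0.9, "P3": 0.5},
    "P3": {"P1": 0.4, "P2": 0.5},
}
pairs = [("P1", "P2"), ("P1", "P3")]
top1 = covered_pairs(similarity, pairs, k=1)
top2 = covered_pairs(similarity, pairs, k=2)
```
]]></preformat>
      <p>In this toy example only the pair (P1, P2) is covered at Top 1, while both pairs are covered at Top 2, mirroring how coverage grows with k in the reported results.</p>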
      <p>These results show that our patent similarity measurement is
significantly more accurate than the DWSAO approach.
A further advantage of the improved semantic
analysis is that the results are not sensitive to several core
parameters. However, the variant with estimated weights does not
perform as well as the variant with equal weights. In our
opinion, the main reason is that the weighting algorithm
considers the importance of each sequence structure in the global
context rather than in the local context (i.e., within each patent).
A locally weighting method will be
investigated in the near future.
      </p>
    </sec>
    <sec id="sec-15">
      <title>5 Conclusion</title>
      <p>This study proposes an improved semantic analysis for
assessing patent similarity on the basis of entities and semantic
relations (functional and non-functional), which takes the
semantic direction of each sequence structure and the word order
of each component into consideration. We also
introduce a new method to calculate the global importance of
each sequence structure.</p>
      <p>To verify the effectiveness and
performance of the improved semantic analysis, a case study on
patent similarity measurement was conducted in the thin film head subfield of
the hard disk drive field. Experimental
results demonstrate that our patent similarity measurement is
significantly more accurate than the DWSAO approach. A further advantage
of the improved semantic analysis is that the results are not
sensitive to several core parameters. However, the variant with
estimated weights does not perform as well as the variant with
equal weights. In our opinion, the main reason is that the
weighting process considers the importance of each
sequence structure in the global context rather than in the
local context (i.e., within each patent). A locally
weighting method will be investigated in the near future.</p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research received financial support from the Social
Science Foundation of Beijing Municipality under grant number
17GLB074, and from the Natural Science Foundation of Guangdong
Province under grant number 2018A030313695.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>An</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mortara</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Deriving technology intelligence from patents: Preposition-based semantic analysis</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <fpage>217</fpage>
          -
          <lpage>236</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.joi.2018.01.001</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bergmann</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Butzke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walter</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fuerste</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moehrle</surname>
            ,
            <given-names>M. G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Erdmann</surname>
            ,
            <given-names>V. A.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Evaluating the risk of patent infringement by means of semantic patent analysis: the case of DNA chips</article-title>
          .
          <source>R&amp;D Management</source>
          ,
          <volume>38</volume>
          (
          <issue>5</issue>
          ),
          <fpage>550</fpage>
          -
          <lpage>562</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lei</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>A deep learning based method for extracting semantic information from patent documents</article-title>
          .
          <source>Scientometrics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J. Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>An SAO-based text mining approach to building a technology tree for technology planning</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>39</volume>
          (
          <issue>13</issue>
          ),
          <fpage>11443</fpage>
          -
          <lpage>11455</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.eswa.2012.04.014</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Zha</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Study on early warning of competitive technical intelligence based on the patent map</article-title>
          .
          <source>Journal of Computers</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>274</fpage>
          -
          <lpage>281</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.4304/jcp.5.2.274-281</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>An information-theoretic definition of similarity</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Machine Learning</source>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Conrath</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>Semantic similarity based on corpus statistics and lexical taxonomy</article-title>
          .
          <source>arXiv: Computation and Language</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Needleman</surname>
            ,
            <given-names>S. B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wunsch</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          (
          <year>1970</year>
          ).
          <article-title>A general method applicable to the search for similarities in the amino acid sequence of two proteins</article-title>
          .
          <source>Journal of Molecular Biology</source>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ),
          <fpage>443</fpage>
          -
          <lpage>453</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Identifying patent infringement using SAO based semantic technological similarities</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>90</volume>
          (
          <issue>2</issue>
          ),
          <fpage>515</fpage>
          -
          <lpage>529</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Rachev</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          (
          <year>1998</year>
          ). In L. Ruschendorf (Ed.),
          <article-title>Mass transportation problems: Volume I: Theory (probability and its applications)</article-title>
          . New York, NY: Springer.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Ratcliff</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Metzener</surname>
            ,
            <given-names>D. E.</given-names>
          </string-name>
          (
          <year>1988</year>
          ).
          <article-title>Pattern matching: The gestalt approach</article-title>
          .
          <source>Dr. Dobb's Journal</source>
          ,
          <volume>13</volume>
          (
          <issue>7</issue>
          ),
          <fpage>46</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Resnik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Using information content to evaluate semantic similarity in a taxonomy</article-title>
          .
          <source>International Joint Conference on Artificial Intelligence</source>
          ,
          <fpage>448</fpage>
          -
          <lpage>453</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Measuring patent similarity with SAO semantic analysis</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>121</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:10.1007/s11192-019-03191-z
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>A novel method for topic linkages between scientific publications and patents</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          ,
          <volume>70</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1026</fpage>
          -
          <lpage>1042</lpage>
          . doi:10.1002/asi.24175
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>A novel approach for measuring Chinese terms semantic similarity based on pairwise sequence alignment</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Semantics, Knowledge and Grid</source>
          (pp.
          <fpage>92</fpage>
          -
          <lpage>98</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>A delimiter-based general approach for Chinese term extraction</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>61</volume>
          (
          <issue>1</issue>
          ),
          <fpage>111</fpage>
          -
          <lpage>125</lpage>
          . doi:10.1002/asi.21221
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Identifying rapidly evolving technological trends for R&amp;D planning using SAO-based semantic patent networks</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>88</volume>
          (
          <issue>1</issue>
          ),
          <fpage>213</fpage>
          -
          <lpage>228</lpage>
          . doi:10.1007/s11192-011-0383-0
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Savransky</surname>
            ,
            <given-names>S.D.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Engineering of creativity: Introduction to TRIZ methodology of inventive problem solving</article-title>
          .
          Boca Raton, FL: CRC Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>