<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM Hypertext Conference, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>DeepKEA: Employing Deep Learning Models for Keyword Extraction from Patent Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rima Dessi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hidir Aras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lei Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>4</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Patents are an important source for technological innovation, used by companies to disclose their inventions and to protect their intellectual property legally. Due to the exponential growth of available patent data, keyword extraction has become an important task for the efficient analysis and organization of patent documents. Keywords extracted from the text of a patent comprise highly relevant terms or phrases that represent the content of the patent document. Further, such terms are exploited by various patent applications such as freedom-to-operate analysis and prior-art search. Most existing methods are not able to extract useful terms that help to understand the core of the patent, i.e., the invention. Moreover, these approaches focus on either supervised settings, which require large amounts of training data, or unsupervised settings, which cannot extract semantically meaningful keywords. To fill these gaps, this paper proposes a weakly-supervised deep neural network model (DeepKEA), which is designed to extract terms that are closely related to the topic of a given patent document and the invention it describes. It consists of two main modules: (1) a training data generation module and (2) a deep neural network module. The experiments show that our model yields better performance than existing baselines.</p>
      </abstract>
      <kwd-group>
        <kwd>information retrieval</kwd>
        <kwd>deep learning</kwd>
        <kwd>keyword extraction</kwd>
        <kwd>patent analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Intellectual property (IP) rights play an important role in the creation, dissemination, and use
of new knowledge for further technological innovation. Patent documents, which are complex,
heterogeneous, and lengthy in nature, contain scientific, technical, legal, and business-relevant
information. The so-called full text of a single patent document consists of a title, an abstract,
claims, and a detailed description. While the description part describes the embodiment of
the invention, its use, and the benefits it offers for target applications, the claims give a clear
definition of what the patent legally protects, i.e., they define the scope and boundaries of an
invention for the purpose of legal protection. To deal with steadily growing patent
data, researchers have started to employ AI-based approaches to support experts in patent retrieval
and analysis processes. One crucial task for patent searchers is to find the right information
to support business-critical decisions. To find such information and
explore patent data quickly and efficiently, various automatic keyword extraction methods have
been developed [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
      </p>
      <p>
        Recently, several supervised approaches have been proposed [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]; however, they require a
large amount of labeled data, which is expensive and time-consuming to obtain.
On the other hand, the most prominent unsupervised approaches employ bag-of-words (BoW),
graph-based, and topic-modeling techniques to perform the keyword extraction task. The BoW
methods rely on measures such as Term Frequency-Inverse Document Frequency (TF-IDF) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
and they do not require any training data. However, these methods cannot capture the semantic
meaning of the text and the extracted keywords. The graph-based approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] model the
text as a graph and select the highest-scoring nodes as keywords; however, their performance drops
as the number of extracted keywords increases. Finally, Latent Dirichlet Allocation (LDA) is
a popular topic modeling technique that does not require any training data, yet the
keywords extracted with this technique are too general to convey the meaning of the text [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Therefore,
the keywords extracted via statistical or hybrid methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] cannot fulfill the requirements of
patent experts.
      </p>
      <p>In this study, we present our preliminary work toward a novel patent keyword extraction
model (DeepKEA) based on deep neural networks. DeepKEA starts by extracting noun phrases
from the abstracts and claims of patent documents. These extracted noun phrases serve as
initial candidates for relevant terms. To refine these candidates, each extracted noun phrase
is reviewed and validated internally by a patent expert.
Note that this study focuses on the abstracts and claims; the exploration
of noun phrases within patent descriptions will be addressed in future work. In the second step,
DeepKEA uses the extracted keywords and their corresponding original patent text to train a
deep neural network. Finally, the trained model can extract a list of keywords for an
arbitrary patent document.</p>
      <p>Overall, the main contributions of the paper are:
• A pipeline to generate training data for the patent keyword extraction task,
• A neural network architecture for generating embeddings of patent documents and keywords,
• The adoption of an approximate nearest neighbor search for efficient keyword extraction based on dense vectors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Keyword Extraction from Patents (DeepKEA)</title>
      <p>Problem Formulation. Given an input patent document d, which contains a set of noun
phrases P_d = {p_1, p_2, ..., p_m}, the goal is to output the top-N most relevant noun phrases as
keywords P′_d = {p′_1, p′_2, ..., p′_N}, where P′_d ⊆ P_d.</p>
      <p>Overview. The general workflow of DeepKEA is shown in Figure 1. There are two main
modules of the proposed workflow: (1) the Training Data Generation Module, and (2) the Deep
Neural Network Module. Section 2.1 and Section 2.2 provide a detailed description of each module
and the feature sets that have been utilized by each module.</p>
      <sec id="sec-2-1">
        <title>2.1. Training Data Generation Module</title>
        <p>The training data generation module is responsible for assigning meaningful noun phrases
present in the abstracts and claims as keywords to the respective patent documents. The noun
phrases are identified with an NLP pipeline built on spaCy<sup>1</sup>. The extracted noun phrases
are then validated and refined internally by patent experts, who review each candidate
against the content of the corresponding patent document. This curation process
ensures that the extracted noun phrases align with the content of the patent document. The
validated phrases serve as the set of keywords assigned to the corresponding patent documents for
the training phase.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Deep Neural Network Module</title>
        <p>
          The second module of the workflow involves the training and serving of a deep neural network
model. We have adapted an approach which uses deep learning for recommendation systems [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
to our keyword extraction task, where we consider keyword extraction as extreme multi-class
classification. The prediction problem is to classify a specific keyword (as a class) among all
keyword classes based on the features of a patent document. For this study, we have used
the title, abstract, and claims of the patent text. Three different embedding vectors, i.e.,
Title Embedding, Abstract Embedding, and Claim Embedding, are used as the input to the
model (see Fig. 1). Each input embedding is obtained with Sentence Transformers
using BERT for Patents<sup>2</sup>, which has been trained by Google on over 100M patents. These
input embeddings are concatenated, and two fully connected hidden layers are added on top,
the output of which can be thought of as the embedding of the patent document. Based on that,
the softmax layer outputs a probability distribution over all keyword classes, each of which can
be thought of as a separate keyword embedding. In training, a cross-entropy loss is minimized
with gradient descent on the output of the softmax. At the end of the training stage, the model
produces two separate sets of output embeddings: one for patent documents and the other for
keywords. These can be thought of as semantic representations of the patent documents and
keywords, respectively.
        </p>
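        <p>The architecture described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the ReLU activations, the layer sizes, the random initialization, and all function and variable names are hypothetical, and toy vectors stand in for the BERT for Patents embeddings. The sketch shows one forward pass and the cross-entropy loss for a single (document, keyword) pair; the gradient-descent update is omitted.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    # Assumption: the source does not name the hidden-layer activation.
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax over keyword classes."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(title_emb, abstract_emb, claim_emb, params):
    """Concatenate the three input embeddings, apply two hidden layers,
    and return the document embedding plus keyword-class probabilities."""
    x = title_emb + abstract_emb + claim_emb  # list concatenation
    h1 = relu(dense(x, *params["hidden1"]))
    doc_emb = relu(dense(h1, *params["hidden2"]))
    probs = softmax(dense(doc_emb, *params["softmax"]))
    return doc_emb, probs


def cross_entropy(probs, target_class):
    """Loss for one (document, keyword) training pair."""
    return -math.log(probs[target_class])


# Toy sizes; real sentence embeddings are far larger.
rng = random.Random(0)
EMB_DIM, HIDDEN, DOC_DIM, NUM_KEYWORDS = 4, 8, 3, 5
params = {
    "hidden1": init_layer(rng, HIDDEN, 3 * EMB_DIM),
    "hidden2": init_layer(rng, DOC_DIM, HIDDEN),
    "softmax": init_layer(rng, NUM_KEYWORDS, DOC_DIM),
}
title = [rng.random() for _ in range(EMB_DIM)]
abstract = [rng.random() for _ in range(EMB_DIM)]
claims = [rng.random() for _ in range(EMB_DIM)]

doc_emb, probs = forward(title, abstract, claims, params)
loss = cross_entropy(probs, target_class=2)
```
]]></preformat>
        <p>In this sketch, the output of the second hidden layer plays the role of the document embedding, and each row of the softmax weight matrix plays the role of one keyword embedding, which mirrors the two sets of output embeddings described above.</p>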
        <sec id="sec-2-2-1">
          <p><sup>1</sup> https://spacy.io/ <sup>2</sup> https://huggingface.co/anferico/bert-for-patents</p>
          <p>At serving time, we need to compute the N most likely classes (keywords) for each patent
document based on the generated patent and keyword embeddings. However, scoring a large
number of keywords is computationally expensive. Therefore,
an approximate nearest neighbor lookup based on the inner product is performed to efficiently
generate the top-N keywords, for which Faiss<sup>3</sup>, a library for efficient similarity search of dense
vectors, is used.</p>
          <p>Finally, when confronted with a new patent document in the serving stage, the workflow
creates embeddings for the document’s title, abstract, and claims using the
sentence transformer with BERT for Patents, as previously described. Subsequently, the trained
deep neural network, DeepKEA, takes these embeddings as input and generates the embedding
of the given patent document. Using inner-product similarity, the top-N keywords are identified
by comparing this embedding against the precomputed keyword embeddings.</p>
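          <p>The serving step can be illustrated with an exhaustive inner-product scan. This is only a sketch under stated assumptions: it computes the exact top-N by brute force, whereas the workflow described above relies on an approximate nearest neighbor index, and the function names, keyword strings, and three-dimensional toy embeddings are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import heapq


def inner_product(u, v):
    """Inner product of two equally sized dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_n_keywords(doc_emb, keyword_embs, n):
    """Return the n keywords whose embeddings have the largest inner
    product with the document embedding, best first."""
    scored = ((inner_product(doc_emb, emb), kw) for kw, emb in keyword_embs.items())
    return [(kw, score) for score, kw in heapq.nlargest(n, scored)]


# Hypothetical precomputed keyword embeddings (toy values).
keyword_embs = {
    "heat exchanger": [0.9, 0.1, 0.0],
    "coolant pump": [0.7, 0.3, 0.1],
    "display panel": [0.0, 0.2, 0.9],
    "control unit": [0.2, 0.8, 0.1],
}
doc_emb = [1.0, 0.2, 0.0]  # hypothetical document embedding

result = top_n_keywords(doc_emb, keyword_embs, n=2)
top_names = [kw for kw, _ in result]
```
]]></preformat>
          <p>The exhaustive scan costs one inner product per keyword class for every document, which is the expense that an approximate index is meant to avoid when the number of keyword classes is large.</p>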
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <p>This section provides a description of the dataset and the baselines, followed by the experimental
results and a comparison to the baseline approaches.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Baselines</title>
        <p>For the evaluation of the proposed model, we internally gathered a dataset consisting of 9,664
unique patents. The patent documents are used as input to the proposed workflow. Each
patent document is paired with its keywords, which are extracted from the abstracts and claims by
applying the first step of the workflow, i.e., training data generation. The extracted noun phrases
are then validated by experts. Further, keywords that appear fewer than 100 times in the
entire dataset are filtered out. After applying the first step, the average number of extracted
keywords per patent is 13.71, resulting in a total of 14,957 unique keywords. The dataset<sup>4</sup> is
organized into pairs consisting of keywords and their corresponding patent documents. Finally,
the data is split into 13,586 and 1,371 pairs as train and test data, respectively. The test data
consists of 100 unique documents and their corresponding keywords.</p>
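        <p>The frequency filter and the pair organization described above can be sketched as follows. This is an illustration under stated assumptions: the function name, the document identifiers, and the toy threshold are hypothetical, and the sketch counts each listing of a keyword for a document as one occurrence, which is one possible reading of the filtering rule.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def build_pairs(doc_keywords, min_count):
    """Drop keywords that occur fewer than min_count times in the whole
    dataset and emit (keyword, document id) pairs for the rest."""
    counts = Counter(kw for kws in doc_keywords.values() for kw in kws)
    return [
        (kw, doc_id)
        for doc_id, kws in doc_keywords.items()
        for kw in kws
        if counts[kw] >= min_count
    ]


# Hypothetical validated keywords per patent (toy data).
doc_keywords = {
    "P1": ["heat exchanger", "coolant pump"],
    "P2": ["heat exchanger", "display panel"],
    "P3": ["heat exchanger", "coolant pump"],
}

# The study uses a threshold of 100; 2 keeps the toy example small.
pairs = build_pairs(doc_keywords, min_count=2)
```
]]></preformat>
        <p>Each resulting pair is one training example for the classifier, with the keyword acting as the target class and the document supplying the input features.</p>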
        <p>To evaluate the performance of DeepKEA, two different baselines are selected:
• TF-IDF is a standard baseline due to its simplicity and effectiveness. It is applied to the noun phrases of the patent documents, and the top-N keywords are assigned based on the TF-IDF score.
• BERT for Patents exploits the vector similarity between documents and keywords. Each document and its keywords (noun phrases present in the document) are first converted into their vector representations with the help of Sentence Transformers with BERT for Patents. Based on the vector similarity between the document and the keywords, the top-N keywords are assigned.</p>
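        <p>The TF-IDF baseline over noun phrases can be sketched as follows. The exact TF-IDF variant is not specified in the text, so this sketch assumes a length-normalized term frequency and a smoothed inverse document frequency; the function name and the toy corpus are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def tfidf_top_n(doc_phrases, corpus_phrases, n):
    """Rank the noun phrases of one document by TF-IDF and return the top n.

    Assumed variant: tf = count / document length,
    idf = ln((1 + num_docs) / (1 + doc_freq)) + 1.
    """
    num_docs = len(corpus_phrases)
    # Document frequency: number of documents that contain each phrase.
    doc_freq = Counter(p for phrases in corpus_phrases for p in set(phrases))
    term_freq = Counter(doc_phrases)
    scores = {
        p: (tf / len(doc_phrases))
        * (math.log((1 + num_docs) / (1 + doc_freq[p])) + 1)
        for p, tf in term_freq.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]


# Hypothetical noun phrases per patent (toy data).
corpus = [
    ["heat exchanger", "coolant pump", "heat exchanger", "housing"],
    ["display panel", "housing"],
    ["control unit", "housing", "display panel"],
]

top = tfidf_top_n(corpus[0], corpus, n=2)
```
]]></preformat>
        <p>In this toy corpus, "housing" occurs in every document and therefore receives the lowest inverse document frequency, so it is ranked below the phrases that are specific to the first document.</p>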
        <sec id="sec-3-1-1">
          <p><sup>3</sup> https://faiss.ai/ <sup>4</sup> https://github.com/rima-turker/Patent-Keyword-Extraction.git</p>
          <p>[Results table not recovered; surviving values: 0.678, 0.572, 0.313.]</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>In this preliminary study, we present DeepKEA, a deep neural network model for extracting
highly relevant keywords (noun phrases) from patent documents. First, the training data is
generated by extracting noun phrases from the abstracts and claims of patent documents. This initial set of noun
phrases is then validated and filtered by domain experts. Second, the training
data is used to train a deep neural network for obtaining the embeddings of documents and
keywords. Finally, the model assigns keywords to individual (unseen) documents by applying
an approximate nearest-neighbor search based on dense vectors. The experimental results
show that DeepKEA outperforms the baselines. As for future work, we aim to (1) improve
the training data generation module by exploiting additional semantic information as well as
the description part, and (2) improve the deep neural network module by including more structured
features of patent documents, such as CPC (Cooperative Patent Classification) codes, citations,
inventors, and applicants.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takatsuka</surname>
          </string-name>
          ,
          <article-title>Extraction of keywords of novelties from patent claims</article-title>
          ,
          <source>in: COLING</source>
          , ACL,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alzaidy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>Keyphrase extraction using deep recurrent neural networks on Twitter</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alzaidy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>Bi-LSTM-CRF sequence labeling for keyphrase extraction from scholarly documents</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Keywords extraction with deep neural network model</article-title>
          ,
          <source>Neurocomputing</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>24</volume>
          (
          <year>1988</year>
          )
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>A patent keywords extraction method using TextRank model with prior public knowledge</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          <volume>8</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Patent keyword extraction algorithm based on distributed representation for patent classification</article-title>
          ,
          <source>Entropy</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Covington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sargin</surname>
          </string-name>
          ,
          <article-title>Deep neural networks for YouTube recommendations</article-title>
          ,
          <source>in: Proceedings of the 10th ACM Conference on Recommender Systems</source>
          , ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>