=Paper= {{Paper |id=Vol-2909/paper5 |storemode=property |title=PatentMatch: A Dataset for Matching Patent Claims & Prior Art |pdfUrl=https://ceur-ws.org/Vol-2909/paper5.pdf |volume=Vol-2909 |authors=Julian Risch,Nicolas Alder,Christoph Hewel,Ralf Krestel }} ==PatentMatch: A Dataset for Matching Patent Claims & Prior Art== https://ceur-ws.org/Vol-2909/paper5.pdf
PatentMatch: A Dataset for Matching Patent Claims & Prior Art
Julian Risch, Hasso Plattner Institute, University of Potsdam, Germany, julian.risch@hpi.de
Nicolas Alder, Hasso Plattner Institute, University of Potsdam, Germany, nicolas.alder@student.hpi.de
Christoph Hewel, BETTEN & RESCH Patent- und Rechtsanwälte PartGmbB, Munich, Germany, c.hewel@bettenpat.com
Ralf Krestel, Hasso Plattner Institute, University of Potsdam, Germany, ralf.krestel@hpi.de

ABSTRACT

Patent examiners need to solve a complex information retrieval task when they assess the novelty and inventive step of claims made in a patent application. Given a claim, they search for prior art, which comprises all relevant publicly available information. This time-consuming task requires a deep understanding of the respective technical domain and of patent-domain-specific language. For these reasons, we address the computer-assisted search for prior art by creating a training dataset for supervised machine learning called PatentMatch. It contains pairs of claims from patent applications and semantically corresponding text passages of different degrees from cited patent documents. Each pair has been labeled by technically skilled patent examiners from the European Patent Office. Accordingly, the label indicates the degree of semantic correspondence (matching), i.e., whether the text passage is prejudicial to the novelty of the claimed invention or not. Preliminary experiments using a baseline system show that PatentMatch can indeed be used for training a binary text pair classifier and a dense passage retriever on this challenging information retrieval task. The dataset is available online: https://hpi.de/naumann/s/patentmatch.

CCS CONCEPTS

• Computing methodologies → Language resources; Supervised learning; • Social and professional topics → Patents; • Information systems → Retrieval tasks and goals.

KEYWORDS

patent documents, document classification, dataset, prior art search, dense passage retrieval, deep learning

PatentSemTech, July 15th, 2021, online
© 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1 PASSAGE RETRIEVAL FROM PRIOR ART

Language understanding is a very difficult task, even more so when considering technical, patent-domain-specific documents. Modern deep learning approaches come close to grasping the semantic meaning of simple texts, but they require a huge amount of training data. We provide a large annotated dataset of patent claims and corresponding prior art, which not only can be used to train machine learning algorithms to recommend suitable passages to human experts, but also illustrates how experts solve this very complex IR problem.

In general, a patent entitles the patent owner to exclude others from making, using, or selling an invention. For this purpose, the patent comprises so-called patent claims (usually at the end of a technical description of the invention). These claims legally specify the scope of protection of the invention. To be even more precise, the legally relevant definition can be found in the independent claims, i.e., usually in claim No. 1. Said claim 1 may be only a few lines long and may comprise only rather generalized terms, in order to keep the scope of protection as broad as possible. There may be more than one independent claim, e.g., an independent system claim 1 and an independent method claim 15. The further claims are so-called dependent claims, i.e., they depend on an independent claim. This dependency is explicitly defined in the preamble of the dependent claim, e.g., by starting with: “2. The system according to claim 1, wherein ...”. The function of dependent claims is to define optional features of the invention, which are preferable but not mandatory for the invention (e.g., “... wherein the light source is an OLED”).

In order to obtain a patent, the invention as defined in the claims must be new and inventive over prior art [19]. A patent application therefore has to be filed at a patent office, where a technically skilled examiner examines it for novelty and inventive step. In case a patent is granted, said patent is published again as a separate patent document. For this reason, there exists a huge corpus of publicly available patent documents, i.e., published patent applications and patents.

As a further consequence of this huge patent literature corpus, the examiners usually focus their prior art search on relevant patent documents. Accordingly, they try to retrieve at least one older patent document that discloses the complete invention as defined in the claims, in particular in independent claim 1. In other words, such a novelty-destroying document must comprise passages that semantically match the definition of claim 1 of the examined patent application. Said novelty-destroying document is manually marked by an expert as an “X” document in the search report issued by the patent office [17]. Any retrieved document that does not disclose the complete invention defined in claim 1 but at least renders it obvious is marked as a “Y” document in the search report. Further retrieved documents that form technological background but are not relevant to the novelty or inventive step of claim 1 are marked as “A” documents. As a consequence, only one retrieved “X” document


or “Y” document is enough to refuse claim 1 and hence the patent application. Due to this circumstance, the search task is focused on precision rather than on recall. Usually, a search report issued for an examined patent application comprises only a few (e.g., 5) cited patent documents, wherein (as far as possible) at least one document is novelty-destroying (marked as an “X” document).

Advantageously, a search report issued by the European Patent Office (EPO) not only cites patent documents deemed relevant by an expert but also indicates, for each cited document, which paragraphs within the document are found to be relevant for the examined claims. Figure 1 exemplifies such a search report. The EPO search report annotates each claim of the examined patent application with specific text passages (i.e., paragraphs) of a cited document. The EPO calls this rich-format citation. Given the application with the filing number EP18214053, a patent officer cited prior art with the publication number EP1351172A1. For example, paragraphs 27-28, 60, and 70-74 are relevant passages for assessing the novelty of claims 1 and 3 to 9 (marked by an “X”). Furthermore, said paragraphs are also relevant for the inventive step of claim 2 (marked by a “Y”). The search report also lists which search terms were used; in this case, it is the IPC subclass G06K.

2 RELATED WORK

Finding relevant prior art is, even for well-trained experts, a hard and cumbersome task [10]. Due to the large volume of literature to be considered as well as the required domain knowledge, patent officers rely on modern information systems to support them with their task [18]. Nevertheless, the outcome of a prior art search, either to check for patentability or validity of a patent, remains imperfect and biased by the patent examiner and her search strategy [15]. In addition, different patent offices can reach different conclusions for the same search [19]. With this paper we hope to open the door to qualitatively and systematically analyzing the search practice, particularly at the European Patent Office.

Traditionally, related work at the intersection of information retrieval and patent analysis aims to support the experts by automatically identifying technical terms in patent documents [11] or keywords that relate to the novelty of claims in applications [24]. A challenge that all natural language processing applications in the patent domain share is coping with the legal jargon and specialized terminology, which led to the use of patent-domain-specific word embeddings in deep learning approaches [1, 22]. Further, patent classification is the most prominent task for the application of natural language processing in this domain, with supervised deep learning approaches outperforming all other methods [16, 22]. Large amounts of labeled training data are available for this task because every published patent document and application is classified according to standardized, hierarchical classification schemes.

Prior art search is a document retrieval task where the goal is to find related work for a given patent document or application. Formulating the corresponding search query is a research challenge typically addressed with keyword extraction [8, 25, 27]. Further, there is research on tools to support expert users in defining search queries [23] or non-expert users in exploring the search space step by step [14]. The task that we focus on in this paper is patent passage retrieval. Given a query passage, e.g., a claim, the task is to find relevant passages in a corpus of text documents to, e.g., decide on the novelty of the claim. In the CLEF-IP series of shared tasks, there was a claims-to-passage task in 2012 [7, 21]. The shared task dataset contains 2.3 million documents and 2,700 relevance judgements of passages for training, which were manually extracted from search reports. The passages are contained in “X” documents and “Y” documents referenced by patent examiners in the search reports. Similar passage retrieval tasks can be found in other domains as well, e.g., passage retrieval for question answering within Wikipedia [3]. To the best of our knowledge, the dense passage retrieval (DPR) model for open-domain question answering by Karpukhin et al. [12] has not been used in the patent domain so far, and we are the first to train a DPR model on patent data, which we describe in one of our preliminary experiments. Research in the patent domain is limited for three reasons: patent-domain-specific knowledge is necessary to understand (1) different types of documents (patent applications, granted patents, search reports), (2) different classification schemes (IPC, CPC, USPC), and (3) the steps of the patenting process (filing, examination, publication, granting, opposition).

In this paper, we present PatentMatch, a dataset of claims from patent applications matched with paragraphs from prior art, e.g., published patent documents. Professional patent examiners labeled the claims with references to paragraphs that are prejudicial to the novelty of the claim (“X” documents, positive samples) or that are not prejudicial but represent merely technical background (“A” documents, negative samples). We collected these labels from search reports created by patent examiners, resolved the claims and paragraphs referenced therein, and extracted the corresponding text passages from the patent documents. This procedure resulted in a dataset of six million examined claims and semantically corresponding (matching) text passages that are prejudicial or not prejudicial to the novelty of the claims. The remainder of this paper is structured as follows: Section 3 describes the data collection and processing steps in detail and provides dataset examples and statistics. Section 4 outlines research tasks that could benefit from the dataset and presents two preliminary experiments for two of these tasks. Finally, Section 5 concludes with a discussion of the potential impact of the presented dataset.

3 PATENTMATCH DATASET

The basis of our dataset is the EP full-text data for text analytics by the EPO.¹ It contains the XML-formatted full texts and publication meta-data of all filed patent applications and published patent documents processed by the EPO since 1978. From 2012 onwards, the search reports for all patent applications are also included. In these reports, patent examiners cite paragraphs from prior art documents if these paragraphs are relevant for judging the novelty and inventive step of an application claim. Although there are no search reports available for applications filed before 2012, we do not discard these older applications because their corresponding published patent documents are frequently referenced as prior art. We use all available search reports to create a dataset of claims of patent applications matched with prior art, more precisely, paragraphs of cited “X” documents and “A” documents. Accordingly, “X” citations represent positive samples and “A” citations represent

¹ https://www.epo.org/searching-for-patents/data/bulk-data-sets/text-analytics





Figure 1: In this excerpt from a search report, a patent examiner cites paragraph numbers of the published patent document EP1351172A1 for assessing the novelty of claims 1 and 3-9 of application EP18214053.
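A rich-format citation like the one in Figure 1 can be represented as a simple structured record. The following sketch is purely illustrative; the field names are our own and do not correspond to any official EPO schema:

```python
from dataclasses import dataclass, field

@dataclass
class RichFormatCitation:
    """One cited document from an EPO search report (illustrative schema)."""
    application_id: str          # examined application, e.g. "EP18214053"
    cited_document: str          # cited prior-art publication, e.g. "EP1351172A1"
    category: str                # "X", "Y", or "A"
    claims: list[int] = field(default_factory=list)      # claims the citation applies to
    paragraphs: list[int] = field(default_factory=list)  # relevant paragraphs of the cited document

# The example from Figure 1: paragraphs 27-28, 60, and 70-74 of EP1351172A1
# are prejudicial to the novelty of claims 1 and 3-9 (category "X").
citation = RichFormatCitation(
    application_id="EP18214053",
    cited_document="EP1351172A1",
    category="X",
    claims=[1, 3, 4, 5, 6, 7, 8, 9],
    paragraphs=[27, 28, 60, 70, 71, 72, 73, 74],
)
```

One such record per (cited document, category) pair is enough to reconstruct claim-paragraph training pairs later on.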


negative samples. These two categories “X” and “A” differ significantly regarding the level of semantic relevance of a given citation for a given claim. “Y” citations are not used in this work, as they seem too close to “X” citations with regard to their level of semantic relevance to generate a good training signal.

Our data processing pipeline uses Elasticsearch for storing and searching through this large corpus of about 210 GB of text data. As a first data preparation step, an XML parser extracts the full text and meta-data from the raw, multi-nested XML files. Further, for each citation within a search report, it extracts the claim number, patent application ID, date, paragraph number, and the type of the reference, i.e., “X” document or “A” document.

Since the search reports were written in a rather systematic, but still unstructured and inconsistent way, a second parsing step standardizes the data format of paragraph references. References like “[paragraph 23]-[paragraph 28]” or “0023 - 28” are converted to complete enumerations of paragraph numbers “[23,24,25,26,27,28]”. Furthermore, references by patent examiners comprise not only text paragraphs but also figures, figure captions, or whole documents. In our standardization process, all references that do not resolve to text paragraphs are discarded.

In the final step, we use the index of our Elasticsearch document database to resolve the referenced paragraph numbers (together with the corresponding document identifiers) to the paragraph texts. Similarly, we resolve the claim texts corresponding to the claim numbers. Thereby, we obtain a dataset that consists of a total of 6,259,703 samples, where each sample contains a claim text, a referenced paragraph text, and a label indicating one of the two types of reference: “X” document (positive sample) or “A” document (negative sample). Table 1 lists statistics of the full dataset and Figure 2 exemplifies a claim text and cited paragraph texts of positive and negative samples.

Table 1: Dataset statistics: Each sample is a pair of an application’s claim and a paragraph cited from either an “X” document (positive sample) or an “A” document (negative sample).

    Samples                            6,259,703
    “X” document citations             3,492,987
    “A” document citations             2,766,716
    Distinct patent applications          31,238
    Distinct cited documents              33,195
    Distinct claim texts                 297,147
    Distinct cited paragraphs            520,376
    Median claim length (chars)              274
    Median paragraph length (chars)          476

We also provide two variations of the data for simplified usage in machine learning scenarios. The first variation balances the label distribution by downsampling the majority class. For each sample with a claim text and a referenced paragraph labeled “X”, there is also a sample with the same claim text and a different referenced paragraph labeled “A”, and vice versa. This balanced training set consists of 347,880 samples. In this version of the dataset, different claim texts can have different numbers of references; the number of “X” and “A” labels is only balanced for each claim text itself.

The second variation balances not only the label distribution but also the distribution of claim texts. Further downsampling ensures that there is exactly one sample with label “X” and one sample with label “A” for each claim text. As a result, every claim in the dataset occurs in exactly two samples. This restriction reduces the dataset to 25,340 samples.

The PatentMatch dataset is published online with example code that shows how to use it for supervised machine learning, and a description of the data collection and preparation process.² As the underlying raw data has been released by the EPO under the Creative Commons Attribution 4.0 International Public License, we also release our dataset under the same license.³ To foster comparable evaluation settings in future work, we separated it into a training

² https://hpi.de/naumann/s/patentmatch
³ https://creativecommons.org/licenses/by/4.0/
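The reference-standardization step described above can be sketched as a small normalizer. The following is a hedged illustration only; the exact rules used to build PatentMatch are not published in full, so the patterns handled here are assumptions:

```python
import re

def normalize_paragraph_reference(ref: str) -> list[int]:
    """Expand a raw paragraph reference from a search report into an explicit
    enumeration of paragraph numbers, e.g. "[paragraph 23]-[paragraph 28]" or
    "0023 - 28" -> [23, 24, 25, 26, 27, 28]. Illustrative only: the real
    pipeline handles more reference formats and separately discards
    references that do not resolve to text paragraphs (figures, captions,
    whole documents)."""
    numbers = [int(n) for n in re.findall(r"\d+", ref)]
    if len(numbers) == 2 and numbers[0] < numbers[1]:
        # Two numbers are read as an inclusive range.
        return list(range(numbers[0], numbers[1] + 1))
    return sorted(set(numbers))

assert normalize_paragraph_reference("[paragraph 23]-[paragraph 28]") == [23, 24, 25, 26, 27, 28]
assert normalize_paragraph_reference("0023 - 28") == [23, 24, 25, 26, 27, 28]
assert normalize_paragraph_reference("0060") == [60]
```

The enumerated paragraph numbers can then be resolved against the Elasticsearch index to obtain the paragraph texts.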




Claim 1 of application EP17862550:
“An engine for a ship, comprising: ... an air supply apparatus supplying the air to the cylinder, wherein the air supply apparatus includes an auxiliary air supply member ...”

Paragraphs 35-37 of “X” document US5271358A:
“... the engine system 10 includes a second gaseous injector 57 in fluid communication with the cylinder bore 16 through fuel injection port 27 in addition to the gaseous fuel injector 56 ...”

Paragraphs 31-32 of “A” document US2016298554A1:
“... gaseous fuel may be injected from gaseous fuel injector 38 while the air intake ports 32 are open ...”

Figure 2: An excerpt from a search report showing a claim and cited paragraphs. The “X” document (positive sample) is novelty-destroying for the claim, while the “A” document (negative sample) is not novelty-destroying and merely constitutes technical background.

set (80%) and a test set (20%) with a time-wise split based on the application filing date: all applications contained in the training set have an earlier filing date than all applications contained in the test set (March 29th, 2017).

4 PRELIMINARY EXPERIMENTS

Modern information retrieval systems do not rely solely on matching keywords from queries with documents. Especially for complex information needs, semantic knowledge needs to be incorporated [5]. With the rise of deep learning models, as well as word and document embeddings, improvements in grasping the semantic meaning of queries and documents have been made [2]. A number of related tasks aim at finding semantically related information, making use of advanced semantic representations [6] and intelligent retrieval models [20]. Passage retrieval [13], document clustering [9], and question answering [28] all rely on identifying semantically related information.

Addressing a first exemplary task, we conducted preliminary experiments on text pair classification with Bidirectional Encoder Representations from Transformers (BERT) [4] as a baseline system. The text pair classification uses the same neural network architecture as the next sentence prediction task: given a pair of sentences, the next sentence prediction task is to predict whether the second sentence is a likely continuation of the first sentence. In our text pair classification scenario, given a claim text and a cited paragraph text, the task is to decide whether the paragraph corresponds to an “X” document (positive sample) or an “A” document (negative sample). To make this decision, the model needs to assess the novelty of the claim in comparison to the paragraph. To this end, it transforms the input text into sub-word tokens and maps them to their embedding representations. These representations pass through 12 layers of bidirectional Transformers [26], and the final hidden state of the special token [CLS] encodes the output class label. Our implementation uses the FARM framework and the pre-trained bert-base-uncased model.⁴

The test set accuracy on the balanced variation of the data is 54%. On the second variation of the data, which contains exactly one “X” document citation and one “A” document citation per claim, the accuracy on the test set is 52%. For both variations, the accuracy improvements per training epoch are small and the validation loss stops decreasing after training for 6 epochs. It comes as no surprise that the task poses a difficult challenge and that a fine-tuned BERT model is only slightly better than random guessing. The complex linguistic patterns, the legal jargon, and the patent-domain-specific language make it all but impossible for laymen to solve this task manually, which makes it an interesting research challenge for future work.

A second exemplary task is dense passage retrieval (DPR). Inspired by the work by Karpukhin et al. [12], we transform the PatentMatch dataset into the DPR format used for open-domain question answering. Dense passage retrieval is the first step of open-domain question answering, and the DPR format contains lists of questions, where each question is accompanied by the correct answer, a passage that contains the answer (positive context), and a passage that does not contain the answer but is still semantically similar to the question (hard negative context). We apply this format to our scenario of matching patent claims with passages from prior art, such that the claim represents the question, the paragraph text from the referenced “X” document is the positive context, and the paragraph text from the referenced “A” document is the hard negative context. This version of the PatentMatch dataset contains exactly one sample with label “X” and one sample with label “A” for each claim text, which results in about 12,500 triples (claim, positive, hard negative) in DPR format.

Using the dataset in DPR format, we train a DPR model, which comprises two BERT models (bert-base-uncased) [4]. One model encodes patent claims while the other encodes paragraph texts from “X” and “A” documents. As in the original DPR paper [12], we leverage in-batch negatives for training, which means that, given a batch with claims and paragraph texts from corresponding “X” and “A” documents as positive and hard negative contexts, we use the positive context of each claim as an additional negative context for all other claims in the same batch. Using a batch size of 8, there are 8 claims in each batch, 8 positive contexts, 8 hard negative contexts, and implicitly also 7 in-batch (non-hard) negative contexts for each claim. The learning rate is set to 10^-5 using Adam, linear scheduling with warm-up, and a dropout rate of 0.1. Due to memory constraints on the GPU, we limit the claim texts to 200 tokens and the paragraph texts to 256 tokens. In our preliminary experiment, the model achieves an average in-batch rank of 1.42 after training for 5 epochs, which means that the positive context is ranked between the second and third position out of eight on average (rank 0 corresponds to the first position). Although the method does not return perfect results, it is very useful as a tool for experts who now only need to look at a handful of candidates instead of thousands to find the right paragraph.

⁴ https://github.com/deepset-ai/FARM, https://huggingface.co/bert-base-uncased
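The pair construction for this text pair classification baseline can be sketched in a few lines. This is a minimal illustration under our own assumptions (field names and preprocessing are not taken from the paper's released code):

```python
def make_text_pair_samples(citations):
    """Turn resolved citations into (claim_text, paragraph_text, label) samples
    for binary text pair classification: label 1 for paragraphs cited in "X"
    documents (novelty-prejudicial), label 0 for "A" documents (background).
    "Y" citations are skipped, as in PatentMatch. Field names are our own."""
    samples = []
    for c in citations:
        if c["category"] == "Y":
            continue  # excluded: too close to "X" to give a good training signal
        label = 1 if c["category"] == "X" else 0
        samples.append((c["claim_text"], c["paragraph_text"], label))
    return samples

# The example pairs from Figure 2 (texts abbreviated):
samples = make_text_pair_samples([
    {"category": "X", "claim_text": "An engine for a ship, comprising: ...",
     "paragraph_text": "... a second gaseous injector 57 ..."},
    {"category": "A", "claim_text": "An engine for a ship, comprising: ...",
     "paragraph_text": "... gaseous fuel injector 38 ..."},
])
# samples now holds one positive pair (label 1) and one negative pair (label 0).
```

A BERT-style classifier then consumes each pair in the next-sentence-prediction input format, i.e., "[CLS] claim [SEP] paragraph [SEP]".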

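The in-batch negatives objective can be made concrete with a small sketch. This is an illustrative re-implementation of the DPR-style loss in plain Python, not the actual training code used in the experiment:

```python
import math

def in_batch_negative_loss(claim_vecs, pos_vecs, hard_neg_vecs):
    """Softmax cross-entropy with in-batch negatives, as in DPR [12]:
    for each claim, its own positive context is the target; the positives of
    the other claims in the batch and all hard negative contexts act as
    negatives. Vectors are plain lists of floats; the similarity score is
    the dot product of claim and context encodings."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    contexts = pos_vecs + hard_neg_vecs      # 2n candidate contexts per claim
    total = 0.0
    for i, q in enumerate(claim_vecs):
        scores = [dot(q, c) for c in contexts]
        log_z = math.log(sum(math.exp(s) for s in scores))
        total += log_z - scores[i]           # -log softmax prob of own positive
    return total / len(claim_vecs)
```

With a batch size of 8, each claim is thus scored against its own positive context, the 7 positives of the other claims (the in-batch negatives), and the hard negative contexts from “A” documents.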


5    IMPACT & CONCLUSIONS
With this paper, we not only introduce an extensive dataset that can be used to train and test systems for the aforementioned tasks, but also provide training data for patent passage retrieval [21]: a very challenging search task mostly conducted by highly trained patent-domain experts. The need to at least partially automate this task arises from the growing number of patent applications worldwide.
   And with deep learning methods requiring large training sets, we hope to foster research in the patent analysis domain by providing such a dataset. We presented a novel dataset that comprises pairs of semantically similar texts in the patent domain. More precisely, the dataset contains claims from patent applications and paragraphs from prior art. It was created based on search reports by patent officers at the EPO. The simple structure of the dataset reduces the amount of patent-domain knowledge required for analyzing the data or using it for supervised machine learning. With the release of the dataset, we thus hope to foster research on the (semi-)automation of passage retrieval tasks and on user interfaces that support experts in searching through prior art and creating search reports.
   Further, we hope to spark research into how patent experts search for relevant patents and, perhaps more interestingly, which relevant patents they miss and for what reason. By providing the matched claims and paragraphs, the search process of patent officers can be analyzed and search results compared. In future work, our learned model could be used to adapt the experts' keyword queries for higher recall and to help understand the relationship between results from manually curated queries and (relevant) results from deep learning models.

ACKNOWLEDGMENTS
We would like to thank Sonia Kaufmann and Martin Kracker from the European Patent Office (EPO) for their support and advice.

REFERENCES
 [1] L. Abdelgawad, P. Kluegl, E. Genc, S. Falkner, and F. Hutter. Optimizing neural networks for patent classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pages 688–703, 2019.
 [2] D. Cohen and W. B. Croft. A hybrid embedding approach to noisy answer passage retrieval. In Advances in Information Retrieval, pages 127–140, 2018.
 [3] D. Cohen, L. Yang, and W. B. Croft. WikiPassageQA: A benchmark collection for research on non-factoid answer passage retrieval. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR), pages 1165–1168, 2018.
 [4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, pages 1–16, 2018.
 [5] M. Fernández, I. Cantador, V. López, D. Vallet, P. Castells, and E. Motta. Semantically enhanced information retrieval: An ontology-based approach. Journal of Web Semantics, 9(4):434–452, 2011.
 [6] D. Ganguly, D. Roy, M. Mitra, and G. J. Jones. Word embedding based generalized language model for information retrieval. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR), pages 795–798, 2015.
 [7] J. Gobeill and P. Ruch. BiTeM site report for the claims to passage task in CLEF-IP 2012. In Proceedings of the CLEF-IP Workshop, 2012.
 [8] M. Golestan Far, S. Sanner, M. R. Bouadjenek, G. Ferraro, and D. Hawking. On term selection techniques for patent prior art search. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR), pages 803–806, 2015.
 [9] A. Huang. Similarity measures for text document clustering. In Proceedings of the New Zealand Computer Science Research Student Conference (NZCSRSC), volume 4, pages 9–56, 2008.
[10] J. A. Jeffery. Preserving the presumption of patent validity: An alternative to outsourcing the US patent examiner's prior art search. Cath. UL Rev., 52:761, 2002.
[11] A. Judea, H. Schütze, and S. Brügmann. Unsupervised training set generation for automatic acquisition of technical terminology in patents. In Proceedings of the International Conference on Computational Linguistics (COLING), pages 290–300, 2014.
[12] V. Karpukhin, B. Oguz, S. Min, P. Lewis, L. Wu, S. Edunov, D. Chen, and W.-t. Yih. Dense passage retrieval for open-domain question answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, 2020.
[13] M. Kaszkiel and J. Zobel. Passage retrieval revisited. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR), pages 178–185, 1997.
[14] T. Kulahcioglu, D. Fradkin, and S. Palanivelu. Incorporating task analysis in the design of a tool for a complex and exploratory search task. In Proceedings of the Conference on Human Information Interaction and Retrieval (CHIIR), pages 373–376, 2017.
[15] Z. Lei and B. D. Wright. Why weak patents? Testing the examiner ignorance hypothesis. Journal of Public Economics, 148:43–56, 2017. doi: 10.1016/j.jpubeco.2017.02.004.
[16] S. Li, J. Hu, Y. Cui, and J. Hu. DeepPatent: Patent classification with convolutional neural networks and word embedding. Scientometrics, 117(2):721–744, 2018.
[17] K. Loveniers. How to interpret EPO search reports. World Patent Information, 54:23–28, 2018.
[18] E. Marttin and A.-C. Derrien. How to apply examiner search strategies in Espacenet. A case study. World Patent Information, 54:S33–S43, 2018. doi: 10.1016/j.wpi.2017.06.001.
[19] J. Michel and B. Bettels. Patent citation analysis. A closer look at the basic input data from patent search reports. Scientometrics, 51(1):185–201, 2001.
[20] H. Palangi, L. Deng, Y. Shen, J. Gao, X. He, J. Chen, X. Song, and R. Ward. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(4):694–707, 2016.
[21] F. Piroi, M. Lupu, A. Hanbury, A. P. Sexton, W. Magdy, and I. V. Filippov. CLEF-IP 2012: Retrieval experiments in the intellectual property domain. In Proceedings of the CLEF-IP Workshop, pages 1–16, 2012.
[22] J. Risch and R. Krestel. Domain-specific word embeddings for patent classification. Data Technologies and Applications, 53(1):108–122, 2019.
[23] T. Russell-Rose, J. Chamberlain, and F. Shokraneh. A visual approach to query formulation for systematic search. In Proceedings of the Conference on Human Information Interaction and Retrieval (CHIIR), pages 379–383, 2019.
[24] S. Suzuki and H. Takatsuka. Extraction of keywords of novelties from patent claims. In Proceedings of the International Conference on Computational Linguistics (COLING), pages 1192–1200, 2016.
[25] Y.-H. Tseng and Y.-J. Wu. A study of search tactics for patentability search: A case study on patent engineers. In Proceedings of the Workshop on Patent Information Retrieval (PaIR@CIKM), pages 33–36, 2008.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems (NeurIPS), pages 5998–6008, 2017.
[27] X. Xue and W. B. Croft. Transforming patents into prior-art queries. In Proceedings of the International Conference on Research and Development in Information Retrieval (SIGIR), pages 808–809, 2009.
[28] S. Yang, L. Zou, Z. Wang, J. Yan, and J.-R. Wen. Efficiently answering technical questions—a knowledge graph approach. In Proceedings of the Conference on Artificial Intelligence (AAAI), pages 3111–3118, 2017.



