                                   PatentExplorer: Refining Patent Search
                                    with Domain-specific Topic Models

                  Mark Buckley                                       Sophia Althammer∗                                      Arber Qoku∗†
                 Siemens AG                                               TU Vienna                             German Cancer Consortium (DKTK)
              Munich, Germany                                          Vienna, Austria                                 Heidelberg, Germany
          mark.buckley@siemens.com                             sophia.althammer@tuwien.ac.at                      arber.qoku@dkfz-heidelberg.de

ABSTRACT
Practitioners in the patent domain require high recall search solutions with precise results to be found in a large search space. Traditional search solutions focus on retrieving semantically similar documents; however, we reason that the different topics in a patent document should be taken into account for search. In this paper we present PatentExplorer, an in-use system for patent search, which empowers users to explore different topics of semantically similar patents and to refine the search by filtering by these topics. PatentExplorer uses similarity search to first retrieve patents for a list of patent IDs or a given patent text, and then offers the ability to refine the search results by their different topics using topic models trained on the domains in which our users are active.

KEYWORDS
Patent search, Topic models, User interface

∗ Work done while at Siemens AG.
† Also with German Cancer Research Center (DKFZ), Heidelberg, Germany.

PatentSemTech, July 15th, 2021, online
© 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1     INTRODUCTION
The ever-increasing volume [1] and linguistic complexity of published patent documents mean that searching for both high precision and high recall results for a given information need is a challenging problem. Practitioners in the patent domain require search results of high quality [21], as they provide the input to processes such as infringement litigation or freedom-to-operate clearing [15, 23]. The use of machine learning and deep learning methods for patent analysis is a vibrant research area [5, 12] with applications in technology forecasting, patent retrieval [4, 19], patent text generation [13] or litigation analysis. There has been much research on patent domain language which shows that the sections in patents constitute different genres depending on their legal or technical purpose [20, 23]. We reason that patents consist of different topics contained in the different sections of the document. The example in Figure 1 shows how a patent in the field of database systems can include topics such as physical storage of data or search interfaces; for a given patent search goal, one of these could be relevant while the other is not. In industrial settings it is additionally important that search tools are particularly sensitive to individual companies' domains of interest, thereby improving the quality of search results.

    A real time database system configured to store database content. . .
    such that the replicas of each partition are contained on different physical storage units. . .
    wherein the system provides an interface for user searches for document types including video, audio. . .

Figure 1: Example (abridged) of a multi-topic patent text

    To provide an effective patent search tool under these conditions we present PatentExplorer, an in-use system for patent search, which empowers users to explore different topics in search results and to refine the results by their topics. PatentExplorer uses similarity search for first stage retrieval and domain-specific topic modelling for refinement of the search results. We propose topic modelling for search refinement because it is typical that a patent document will deal with multiple related but orthogonal subjects. For a particular information need, some but not all of these will be relevant. Therefore we combine a document level analysis (similarity) with a sub-document level analysis (topic models) for patent search. The intention is that the user can retrieve a large set of semantically related patents and inspect the topic distributions of the most similar ones. In order to refine the results the user can apply filters on specific topics, thereby increasing the task-specific relevance of the most highly ranked results.
    This paper presents the design and user interface of the in-use web application which implements this idea, as well as a technical description of the system. The system has been designed with a particular user persona in mind. The intended user is a patent search professional, who is therefore familiar with patent search tools, has deep knowledge of existing patent search methodologies such as boolean retrieval and category filtering, and has broad technical knowledge of the relevant industrial domains.

2     BACKGROUND
In this section we give some background on related work on patent search tools; furthermore, we introduce the methods for similarity search and topic models which we employ in PatentExplorer.

2.1     Related work
Patent search holds several domain-specific challenges for information retrieval [15]. Furthermore, serving the specific use-case setting of practitioners in a company requires company-specific adaptation of the search solution. Different techniques and approaches have been explored to improve and refine the search results in the


patent domain, ranging from query expansion [2, 16, 25] to term
selection [10]. For prior art retrieval in the CLEF-IP workshop [19],
Verma and Varma [26] demonstrate high retrieval performance by
representing a patent document by its IPC classes and computing
similarity of patents based on the IPC classes. For patent search
tools, mainly the challenge of high coverage of all published patents is addressed, either with a federated approach [22] or with a single access point via a text editor [7].

Figure 2: Similarity search process

2.2     Methods
2.2.1 Similarity search. Similarity search is a method for retrieval
where for a given query document, a ranked list of semantically
relevant documents is computed, as shown in Figure 2. The gen-
eral approach is to first embed the query document into a vector
representation which encodes its semantics. This representation
is then compared to the equivalent representations for each of the
known documents in the search index. The results are then sorted
by similarity score and the highest ranking results are presented to
the user. The similarity function is usually cosine similarity.
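The retrieval scheme just described can be sketched as follows. This is our own minimal pure-Python illustration (all function names are ours; PatentExplorer itself uses an optimised vectoriser and index): vectors are L2-normalised up front so that cosine similarity reduces to a dot product.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build L2-normalised tf-idf vectors (term -> weight dicts) for tokenised docs."""
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        # normalising here lets cosine similarity be computed as a plain dot product
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two normalised sparse vectors (a dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def most_similar(query_vec, index_vecs, k=3):
    """Rank indexed documents by similarity to the query and return the top k."""
    ranked = sorted(enumerate(index_vecs), key=lambda iv: -cosine(query_vec, iv[1]))
    return [(i, cosine(query_vec, v)) for i, v in ranked[:k]]
```

At production scale the exhaustive scoring loop in `most_similar` is replaced by an approximate nearest neighbour index, as the section goes on to describe.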
    The crucial step is to find an embedding which computes a suitable document representation. Different representations have been used in previous research, for instance tf-idf weighted sparse representations, latent semantic indexing, or contextualised document embeddings such as those computed by a BERT model [9].

Figure 3: Similarity search in PatentExplorer: entering a list of patent IDs or a patent text

    Despite the semantic richness of contextualised document embeddings, sparse representations have been found to be competitive in large scale retrieval scenarios [14]. We employ tf-idf weighted sparse representations in PatentExplorer for retrieving similar patents in the first stage. Large scale retrieval needs efficient indexing, such as algorithms for approximate nearest neighbour search [11], to avoid computing the cosine similarity scores for every document in the search space. Therefore we employ approximate nearest neighbour search on the sparse representations in PatentExplorer.

2.2.2 Topic models. Topic models help to understand the internal structure of large text data sets by summarising the themes which occur in the documents [8]. Topic modelling is an unsupervised approach (i.e. no labelled data is required) and can be applied to any domain. The only assumptions are the distributional hypothesis, that the frequency of occurrence of words and phrases is a good reflection of the strength and prevalence of themes, and the assumption that documents are in general a mixture of several topics. The topic modelling process begins by converting a set of documents into a sparse term-document matrix 𝑇 containing weighted feature frequencies for each document. The topic modelling algorithm transforms this matrix into a pair of matrices 𝑍 and 𝐷 such that

    𝑇 ≈ 𝑍 × 𝐷

𝑍, the term-topic matrix, encodes the weight of each feature with respect to the topics, and 𝐷, the document-topic matrix, contains a latent representation for each document showing which topics it belongs to.
    We consider two algorithms for topic modelling in this work, latent Dirichlet allocation (LDA) [6] and non-negative matrix factorisation (NMF) [24]. LDA is a generative model which treats documents as a distribution over topics and topics as a distribution over words. NMF is a method for decomposing large matrices of non-negative values into the product of smaller matrices, in this case into the matrices 𝑍 and 𝐷. In each case the topic distributions (the rows of the matrix 𝐷) can be interpreted as a document representation and thus can be compared and analysed. The index of the largest value of each row of 𝐷 is interpreted as the most likely topic for that document. Topic modelling has previously been used in the patent domain, for instance for technology forecasting [23].

3     PATENTEXPLORER
In this section we first show the user interface of PatentExplorer and then give some implementation details about the architecture, the data, and the similarity and topic models employed in PatentExplorer.

3.1     User interface
The user interaction begins with the submission of a list of patent IDs (accession numbers) or the text of a patent, as shown in Figure 3. The system retrieves the text of the patents given in the list of patent IDs and creates a local copy of the text content of each of the patents. The number of patents from the list which are found in the index is indicated with “Dataset contains - documents”. The user can then submit the “Dataset” to the system to retrieve similar documents based on the similarity search.
    For each of the similar documents, the system also computes their topic distribution. The distribution is displayed along with the accession number and the similarity score between the query patent and each similar patent, as shown in Figure 4. The most highly weighted words for each topic, drawn from the matrix 𝑍, are displayed by hovering over the bars. The figure also shows the filter function which the system provides to re-rank the search results according to their topics. Both positive and negative filters can be applied.

Figure 4: PatentExplorer interface for exploring and refining the topic distribution of the search results.
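The filter semantics can be sketched as follows. This is our own illustrative version (function and parameter names such as `min_prob` and `max_rank` are ours, not PatentExplorer's API): a filter matches a document if a chosen topic has at least a minimum weight in the document's topic distribution and sits among its top-ranked topics.

```python
def matches(topic_dist, chosen_topics, min_prob=0.1, max_rank=5):
    """True if at least one chosen topic has weight >= min_prob and is
    among the max_rank most highly weighted topics of the document."""
    ranked = sorted(range(len(topic_dist)), key=lambda t: -topic_dist[t])
    top = set(ranked[:max_rank])
    return any(topic_dist[t] >= min_prob and t in top for t in chosen_topics)

def apply_filters(results, positive, negative, **kw):
    """results: (doc_id, topic_dist) pairs in similarity order.
    Negative filters discard matching documents; positive filters
    lift matching documents to the top while preserving relative order."""
    kept = [r for r in results if not (negative and matches(r[1], negative, **kw))]
    lifted = [r for r in kept if positive and matches(r[1], positive, **kw)]
    rest = [r for r in kept if r not in lifted]
    return lifted + rest
```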


Positive filters lead to matching documents being lifted to the top of the ranked list; negative filters lead to the matching documents being discarded from the results set. For both filter types, a list of topics can be specified in the text field on the left hand side, as well as two slider values. The two slider values restrict when the filter will match: a document matches if at least one of the chosen topics has a weight of at least “min probability” in the topic distribution of that document. The default value is 0.1. With a max rank of 𝑟, the filter will also only match if the chosen topic is among the 𝑟 most highly weighted topics in the distribution for that document. So if the query document is the example in Figure 1, the user could inspect the topic distribution to find, for instance, the topic concerning physical storage, and apply a negative filter to remove it, leaving those results which have more to do with user search. Finally, when the user is finished applying filters to the search results, the results set can be downloaded as tabular data, preserving the filtered order and including similarity scores.

3.2     Technical implementation
3.2.1 Data. To prepare the components of our system we collected two overlapping data sets. The source is a commercially provided database of patent abstracts in which patents from patent offices worldwide have been translated into a consistent, English-language form. We chose this data source in order to achieve maximum uniformity of the input data; however, PatentExplorer makes no strong assumptions about the content of the documents, and would also work on publicly available patent data. The Our-Portfolio data set contains the patents whose assignee is our company or its subsidiaries. It contains 73k documents. We filtered this data set to only contain patents filed since 2010, resulting in a set of 36k documents. The All-Patents data set is the collection of all patents published between 2014 and 2020, which contains approximately 15 million documents. For both data sets we extract the title and abstract of the patents.

3.2.2 Architecture. The architecture of the system is shown in Figure 5. The two main components are the similarity search and the topic model. Each component offers an API with one function: “get-similar-ids” and “get-topic-distribution”, respectively. The “get-similar-ids” function receives one or more patent IDs and retrieves the most similar documents from the search index, where similarity is defined as the cosine similarity between the document representations. This is equivalent to finding the nearest neighbours of the query document in the representation space. The “get-topic-distribution” function receives a single patent ID and computes the topic distribution for that document from the previously trained topic model. The search index and the topic model are static resources which are not changed during run time. Both components retrieve the patent document content from the database “Patent documents” directly as required, so that the user must only supply document IDs.

3.2.3 Training the topic model. The topic model is trained on the Our-Portfolio data set. The documents were preprocessed to remove approximately 50 patent-specific stop words, such as “invention” or “apparatus”, as well as the usual English stop words. We performed stemming and then extracted all n-grams for 𝑛 = 1, 2, 3, 4 to construct the term-document matrix. We discarded words which occurred in fewer than 10 documents or in more than 40% of the documents.
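A minimal sketch of this preprocessing step is given below. It is illustrative only: the stop word list, the crude stemmer, and the thresholds here are stand-ins for the production configuration described above.

```python
from collections import Counter
from itertools import chain

STOP_WORDS = {"invention", "apparatus", "the", "a", "of"}  # stand-in list

def stem(token):
    # crude stand-in for a real stemmer
    return token[:-1] if token.endswith("s") else token

def ngrams(tokens, max_n=4):
    """All n-grams for n = 1..max_n, joined with underscores."""
    return ["_".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(docs, min_df=2, max_df_ratio=0.4):
    """Tokenise, remove stop words, stem, extract n-grams, then discard
    terms with document frequency below min_df or above max_df_ratio."""
    processed = []
    for text in docs:
        tokens = [stem(t) for t in text.lower().split() if t not in STOP_WORDS]
        processed.append(ngrams(tokens))
    df = Counter(chain.from_iterable(set(d) for d in processed))
    n = len(docs)
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return processed, vocab
```

The surviving vocabulary then indexes the columns of the term-document matrix from which the topic model is trained.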


    In preliminary experiments we used a coherence metric to investigate the optimal parameters for the topic model. In recent years, several approaches to measure coherence have been developed based on distributional properties of word pairs over a set of words [17, 18], which mostly differ in the pairwise scoring metric being used. A typical choice is pointwise mutual information (PMI), which measures the strength of association between words in a data set within windows of a given size.

Figure 5: System architecture of PatentExplorer containing the Similarity Search and Topic Modelling components

                 50𝑘      100𝑘     250𝑘
        NMF      0.65     0.67     0.69
        LDA      0.61     0.63     0.65

Table 1: Coherence scores (C_NPMI) for NMF and LDA across three data set sizes. Each score is the average over the coherence scores for 𝑘 ∈ {5, 10, ..., 95, 100}

    We use the coherence score C_NPMI as proposed by Aletras and Stevenson [3]. An 𝑁-dimensional context vector is created for each word 𝑤, whose elements are the normalised PMI values of 𝑤 with each of the other top words of the topic. Each word 𝑤 is then assigned the cosine similarity of its context vector and the sum of the other context vectors. The coherence score of the topic is the average of all of these cosine similarities.
    To investigate which parametrisation of topic modelling works best for patent text we took a sample of 513k English-language patents from those published in 2010. We removed duplicates and documents which were either very long or very short, leaving a set of approximately 255k documents. As we show in Table 1, both LDA and NMF exhibit similar performance on this data set, as measured by C_NPMI, with NMF discovering marginally better topics. We find upon manual inspection that NMF is more robust across a wide range of numbers of topics. We therefore choose NMF to implement the system. We finally use NMF with 75 topics to train the topic model for the system on the Our-Portfolio data set.

3.2.4 Compiling the search index. To compile the search index we must first compute an embedding for each document in the search space. We use latent semantic indexing (LSI) to compute the document vectors, which is the result of tf-idf vectorisation followed by SVD compression [8]. Rather than computing the tf-idf weights from the entire All-Patents data set, we instead compute the tf-idf weights from the Our-Portfolio data set, so that each document embedding in the search space will encode information which is relevant to our industrial domains. We then apply an SVD compression into 200 dimensions in order to reduce the size of each document vector and therefore the size of the overall search index. We use the resulting LSI projection function to compute a document embedding for each of the 15m documents in the All-Patents data set.
    To implement the lookup of documents given a query document we use Annoy (https://github.com/spotify/annoy), a library which provides approximate nearest neighbour search. Each document embedding is normalised before insertion so that the cosine similarity can be computed with the dot product function. The similarity component of the system provides an endpoint which returns the IDs of the 𝑛 most similar documents for some query document and some 𝑛.

4     CONCLUSION AND FUTURE WORK
In this paper we present PatentExplorer, an in-use system for patent search. PatentExplorer gives users the ability to retrieve similar patents given a list of patent IDs or a patent text and to refine their search results depending on the different topics of the patents. The topic models are tailored to the domain-specific topics of a company operating in the technical domain.
    Tailoring the search representation and topic models to our domains turned out in initial user testing to offer mixed results. Feedback from patent search experts indicates that while the system can deliver relevant results within our domains, outside of these domains it can return results with few or no relevant documents among the ten highest ranked results. While building and testing our system we have found that the requirements of patent search use cases place high demands on the accuracy of dedicated search tools. In order to reduce the latency of the similarity search to an acceptable level we were forced to simplify the similarity computation, using a compressed tf-idf representation where a contextualised document embedding may well have produced better results. It is also crucial to provide full coverage: the data set of patents which the system contains goes back to 2014, however for prior art searches all previously published patents should be discoverable. Finally, the need to update the search index continuously leads to considerable recurring computational load and data management tasks; this is not yet provided for.
    Our future work to improve the system will include expanding the system architecture to efficiently handle a larger number of documents in the search space. In the longer term we intend to investigate introducing more appropriate document representations to be used in the search index, for instance by using a large language model such as BERT, or by learning the representations via a supervised auxiliary task.

REFERENCES
 [1] [n.d.]. U.S. Patent Statistics Chart. https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm. Accessed: 2021-06-04.
 [2] Bashar Al-Shboul and Sung-Hyon Myaeng. 2011. Query Phrase Expansion Using Wikipedia in Patent Class Search. In Information Retrieval Technology, Mohamed Vall Mohamed Salem, Khaled Shaalan, Farhad Oroumchian, Azadeh Shakery, and Halim Khelalfa (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 115–126.
 [3] Nikolaos Aletras and Mark Stevenson. 2013. Evaluating Topic Coherence Using Distributional Semantics. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers. Association for Computational Linguistics, Potsdam, Germany, 13–22. https://www.aclweb.org/anthology/W13-0102
 [4] Sophia Althammer, Sebastian Hofstätter, and Allan Hanbury. 2021. Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study. In Advances in Information Retrieval, 43rd European Conference on IR Research, ECIR 2021.



 [5] Leonidas Aristodemou and Frank Tietze. 2018. The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data. World Patent Information 55 (12 2018), 37–51. https://doi.org/10.1016/j.wpi.2018.07.002
 [6] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
 [7] Manajit Chakraborty, David Zimmermann, and Fabio Crestani. 2021. PatentQuest: A User-Oriented Tool for Integrated Patent Search. In Proceedings of the 11th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 43rd European Conference on Information Retrieval (ECIR 2021), Lucca, Italy (online only), April 1st, 2021 (CEUR Workshop Proceedings, Vol. 2847), Ingo Frommholz, Philipp Mayr, Guillaume Cabanac, and Suzan Verberne (Eds.). CEUR-WS.org, 89–101. http://ceur-ws.org/Vol-2847/paper-09.pdf
 [8] Scott Deerwester, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391–407.
 [9] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171–4186. https://doi.org/10.18653/v1/N19-1423
[10] Mona Golestan Far, Scott Sanner, Mohamed Reda Bouadjenek, Gabriela Ferraro, and David Hawking. 2015. On Term Selection Techniques for Patent Prior Art Search. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (Santiago, Chile) (SIGIR '15). Association for Computing Machinery, New York, NY, USA, 803–806. https://doi.org/10.1145/2766462.2767801
[11] Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data (2019), 1–1. https://doi.org/10.1109/TBDATA.2019.2921572
[12] Ralf Krestel, Renukswamy Chikkamath, Christoph Hewel, and Julian Risch. 2021. A survey on deep learning for patent analysis. World Patent Information 65 (6 2021). https://doi.org/10.1016/j.wpi.2021.102035
[13] Jieh-Sheng Lee and Jieh Hsiang. 2020. PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata. arXiv:2001.03708 [cs.CL]
[14] Yi Luan, Jacob Eisenstein, Kristina Toutanova, and Michael Collins. 2020. Sparse, Dense, and Attentional Representations for Text Retrieval. CoRR abs/2005.00181 (2020). arXiv:2005.00181 https://arxiv.org/abs/2005.00181
[15] Mihai Lupu and Allan Hanbury. 2013. Patent Retrieval. Foundations and Trends® in Information Retrieval 7, 1 (2013), 1–97. https://doi.org/10.1561/1500000027
[16] Walid Magdy and Gareth J.F. Jones. 2011. A Study on Query Expansion Methods for Patent Retrieval. In Proceedings of the 4th Workshop on Patent Information Retrieval (Glasgow, Scotland, UK) (PaIR '11). Association for Computing Machinery, New York, NY, USA, 19–24. https://doi.org/10.1145/2064975.2064982
[17] David Mimno, Hanna M Wallach, Edmund Talley, Miriam Leenders, and Andrew McCallum. 2011. Optimizing semantic coherence in topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 262–272.
[18] David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 100–108.
[19] Florina Piroi, Mihai Lupu, and Allan Hanbury. 2013. Overview of CLEF-IP 2013 Lab. In Information Access Evaluation. Multilinguality, Multimodality, and Visualization, Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 232–249.
[20] Julian Risch and Ralf Krestel. 2019. Domain-specific word embeddings for patent classification. Data Technologies and Applications 53 (2019), 108–122.
[21] Tony Russell-Rose, Jon Chamberlain, and Leif Azzopardi. 2018. Information retrieval in the workplace: A comparison of professional search practices. Information Processing and Management 54 (11 2018), 1042–1057. Issue 6. https://doi.org/10.1016/j.ipm.2018.07.003
[22] Mike Salampasis and Allan Hanbury. 2014. PerFedPat: An integrated federated system for patent search. World Patent Information 38 (09 2014). https://doi.org/10.1016/j.wpi.2014.08.001
[23] Walid Shalaby and Wlodek Zadrozny. 2019. Patent retrieval: a literature review. Knowledge and Information Systems (2019). https://doi.org/10.1007/s10115-018-1322-7
[24] Rashish Tandon and Suvrit Sra. 2010. Sparse nonnegative matrix approximation: new formulations and algorithms. (2010).
[25] Wolfgang Tannebaum, Parvaz Mahdabi, and Andreas Rauber. 2015. Effect of Log-Based Query Term Expansion on Retrieval Effectiveness in Patent Searching. Vol. 9283. 300–305. https://doi.org/10.1007/978-3-319-24027-5_32
[26] Manisha Verma and Vasudeva Varma. 2011. Exploring Keyphrase Extraction and IPC Classification Vectors for Prior Art Search. Vol. 1177.



