<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">PatentExplorer: Refining Patent Search with Domain-specific Topic Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Mark</forename><surname>Buckley</surname></persName>
							<email>mark.buckley@siemens.com</email>
						</author>
						<author>
							<persName><forename type="first">Sophia</forename><surname>Althammer</surname></persName>
							<email>sophia.althammer@tuwien.ac.at</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Siemens AG</orgName>
								<address>
									<settlement>Munich</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">TU Vienna</orgName>
								<address>
									<settlement>Vienna</settlement>
									<country key="AT">Austria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Arber</forename><surname>Qoku</surname></persName>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">German Cancer Consortium (DKTK)</orgName>
								<address>
									<settlement>Heidelberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff4">
								<orgName type="department">German Cancer Research Center (DKFZ)</orgName>
								<address>
									<settlement>Heidelberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">PatentExplorer: Refining Patent Search with Domain-specific Topic Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">434EF3A9A1462B1149221FA513DCEC6C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T08:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Patent search</term>
					<term>Topic models</term>
					<term>User interface</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Practitioners in the patent domain require high-recall search solutions that nevertheless deliver precise results from a large search space. Traditional search solutions focus on retrieving semantically similar documents; however, we reason that the different topics within a patent document should also be taken into account for search. In this paper we present PatentExplorer, an in-use system for patent search which empowers users to explore the different topics of semantically similar patents and to refine the search by filtering on these topics. PatentExplorer first uses similarity search to retrieve patents for a list of patent IDs or a given patent text, and then offers the ability to refine the search results by their different topics, using topic models trained on the domains in which our users are active.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The ever-increasing volume <ref type="bibr">[1]</ref> and linguistic complexity of published patent documents mean that searching for both high-precision and high-recall results for a given information need is a challenging problem. Practitioners in the patent domain require search results of high quality <ref type="bibr" target="#b20">[21]</ref>, as they provide the input to processes such as infringement litigation or freedom-to-operate clearing <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b22">23]</ref>. The use of machine learning and deep learning methods for patent analysis is a vibrant research area <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b11">12]</ref> with applications in technology forecasting, patent retrieval <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b18">19]</ref>, patent text generation <ref type="bibr" target="#b12">[13]</ref> and litigation analysis. There has been much research on the language of the patent domain which shows that the sections in patents constitute different genres depending on their legal or technical purpose <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b22">23]</ref>. We reason that patents consist of different topics contained in the different sections of the document. The example in Figure <ref type="figure">1</ref> shows how a patent in the field of database systems can include topics such as the physical storage of data or search interfaces: for a given patent search goal, one of these could be relevant while the other is not. In industrial settings it is additionally important that search tools are particularly sensitive to individual companies' domains of interest, thereby improving the quality of search results.</p><p>A real time database system configured to store database content... such that the replicas of each partition are contained on different physical storage units... wherein the system provides an interface for user searches for document types including video, audio...</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 1: Example (abridged) of a multi-topic patent text</head><p>To provide an effective patent search tool under these conditions we present PatentExplorer, an in-use system for patent search which empowers users to explore the different topics in search results and to refine the results by those topics. PatentExplorer uses similarity search for first-stage retrieval and domain-specific topic modelling for refinement of the search results. We propose topic modelling for search refinement because a patent document will typically deal with multiple related but orthogonal subjects. For a particular information need, some but not all of these will be relevant. Therefore we combine a document-level analysis (similarity) with a sub-document-level analysis (topic models) for patent search. The intention is that the user can retrieve a large set of semantically related patents and inspect the topic distributions of the most similar ones. In order to refine the results the user can apply filters on specific topics, thereby increasing the task-specific relevance of the most highly ranked results.</p><p>This paper presents the design and user interface of the in-use web application which implements this idea, as well as its technical description. The system has been designed with a particular user persona in mind. The intended user is a patent search professional, who is therefore familiar with patent search tools, has deep knowledge of existing patent search methodologies, such as boolean retrieval and category filtering, and has broad technical knowledge of the relevant industrial domains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND</head><p>In this section we give some background on related work on patent search tools; furthermore, we introduce the methods for similarity search and topic models which we employ in PatentExplorer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Related work</head><p>Patent search poses several domain-specific challenges for information retrieval <ref type="bibr" target="#b14">[15]</ref>. Furthermore, serving the specific use-case setting of practitioners in a company requires company-specific adaptation of the search solution. Different techniques and approaches have been explored to improve and refine search results in the patent domain, ranging from query expansion <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b24">25]</ref> to term selection <ref type="bibr" target="#b9">[10]</ref>. For prior art retrieval in the CLEF-IP workshop <ref type="bibr" target="#b18">[19]</ref>, Verma and Varma <ref type="bibr" target="#b25">[26]</ref> demonstrate high retrieval performance by representing a patent document by its IPC classes and computing the similarity of patents based on these classes. Existing patent search tools mainly address the challenge of high coverage of all published patents, either with a federated approach <ref type="bibr" target="#b21">[22]</ref> or with a single access point via a text editor <ref type="bibr" target="#b6">[7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Methods</head><p>2.2.1 Similarity search. Similarity search is a retrieval method in which, for a given query document, a ranked list of semantically relevant documents is computed, as shown in Figure <ref type="figure" target="#fig_0">2</ref>. The general approach is to first embed the query document into a vector representation which encodes its semantics. This representation is then compared to the equivalent representations of each of the known documents in the search index. The results are then sorted by similarity score and the highest-ranking results are presented to the user. The similarity function is usually cosine similarity.</p><p>The crucial step is to find an embedding which computes a suitable document representation. Different representations have been used in previous research: tf-idf weighted sparse representations, latent semantic indexing, or contextualised document embeddings, computed for instance by a BERT model <ref type="bibr" target="#b8">[9]</ref>.</p><p>Despite the semantic richness of contextualised document embeddings, sparse representations have been found to be competitive in large-scale retrieval scenarios <ref type="bibr" target="#b13">[14]</ref>. We employ tf-idf weighted sparse representations in PatentExplorer for retrieving similar patents in the first stage. Large-scale retrieval needs efficient indexing, such as algorithms for approximate nearest neighbour search <ref type="bibr" target="#b10">[11]</ref>, to avoid computing the cosine similarity scores for every document in the search space. We therefore employ approximate nearest neighbour search on the sparse representations in PatentExplorer.</p></div>
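The first-stage ranking described above can be sketched as follows. This is a minimal illustration with toy tf-idf weights; the document IDs, weights, and function names are invented for the example and are not the system's actual API or data.

```python
import math

def cosine(u, v):
    # Sparse vectors as {term: weight} dicts; cosine = dot / (|u| * |v|).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_by_similarity(query_vec, index):
    # index: {doc_id: sparse tf-idf vector}; returns (doc_id, score) pairs
    # sorted by descending cosine similarity, as in the first retrieval stage.
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)

# Toy search index of three "patents".
index = {
    "EP001": {"database": 0.8, "storage": 0.5},
    "EP002": {"interface": 0.9, "search": 0.4},
    "EP003": {"database": 0.6, "search": 0.6},
}
query = {"database": 0.7, "search": 0.3}
ranking = rank_by_similarity(query, index)
```

In the real system the exhaustive loop over the index is replaced by approximate nearest-neighbour search, which avoids scoring every document.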
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">Topic models.</head><p>Topic models help to understand the internal structure of large text data sets by summarising the themes which occur in the documents <ref type="bibr" target="#b7">[8]</ref>. Topic modelling is an unsupervised approach (i.e. no labelled data is required) and can be applied to any domain. The only assumptions are the distributional hypothesis, that the frequency of occurrence of words and phrases is a good reflection of the strength and prevalence of themes, and the assumption that in general documents are a mixture of several topics. The topic modelling process begins by converting a set of documents into a sparse term-document matrix 𝑇 containing weighted feature frequencies for each document. The topic modelling algorithm transforms this matrix into a pair of matrices 𝑍 and 𝐷 such that 𝑇 ≈ 𝑍 × 𝐷. 𝑍, the term-topic matrix, encodes the weight of each feature with respect to the topics, and 𝐷, the document-topic matrix, contains a latent representation for each document showing which topics it belongs to.</p><p>We consider two algorithms for topic modelling in this work, latent Dirichlet allocation (LDA) <ref type="bibr" target="#b5">[6]</ref> and non-negative matrix factorisation (NMF) <ref type="bibr" target="#b23">[24]</ref>. LDA is a generative model which treats documents as a distribution over topics and topics as a distribution over words. NMF is a method for decomposing large matrices of non-negative values.</p></div>
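As a concrete illustration of the factorisation 𝑇 ≈ 𝑍 × 𝐷, the following sketch implements NMF with the classic multiplicative update rules on a toy term-document matrix. This is a minimal sketch for illustration only, not the parametrisation or implementation used in the system.

```python
import numpy as np

def nmf(T, k, iters=1000, eps=1e-9, seed=0):
    # Factorise a non-negative term-document matrix T (terms x docs) into
    # Z (terms x topics) and D (topics x docs) with T ~ Z @ D, using
    # multiplicative updates that minimise the squared reconstruction error.
    rng = np.random.default_rng(seed)
    n_terms, n_docs = T.shape
    Z = rng.random((n_terms, k))
    D = rng.random((k, n_docs))
    for _ in range(iters):
        D *= (Z.T @ T) / (Z.T @ Z @ D + eps)
        Z *= (T @ D.T) / (Z @ D @ D.T + eps)
    return Z, D

# Toy matrix with two clearly separated "topics" (rank 2 by construction).
T = np.array([[2, 4, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 6, 2]], dtype=float)
Z, D = nmf(T, k=2)
error = np.linalg.norm(T - Z @ D)
```

Because the updates only ever multiply by non-negative ratios, both factors stay non-negative, which is what makes the discovered topics interpretable as weighted word lists.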
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PATENTEXPLORER</head><p>In this section we first show the user interface of PatentExplorer and then give some implementation details about the architecture, the data, and the similarity and topic models employed in PatentExplorer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">User interface</head><p>The user interaction begins with the submission of a list of patent IDs (accession numbers) or the text of a patent, as shown in Figure <ref type="figure">3</ref>. The system retrieves the text of the patents given in the list of patent IDs and creates a local copy of the text content of each of the patents. The number of patents from the list which are found in the index is indicated with "Dataset contains -documents". The user can then submit the "Dataset" to the system to retrieve similar documents based on the similarity search.</p><p>For each of the similar documents, the system also computes their topic distribution. The distribution is displayed along with the accession number and the similarity score between the query patent and each similar patent, as shown in Figure <ref type="figure" target="#fig_1">4</ref>. The most highly weighted words for each topic, drawn from the matrix 𝑍, are displayed by hovering over the bars. The figure also shows the filter function which the system provides to re-rank the search results according to their topics. Both positive and negative filters can be applied. Positive filters lead to matching documents being lifted to the top of the ranked list; negative filters lead to matching documents being discarded from the result set. For both filter types, a list of topics can be specified in the text field on the left-hand side, as well as two slider values. The two slider values restrict when the filter will match: a document matches if at least one of the chosen topics has a weight of at least "min probability" in the topic distribution of that document. The default value is 0.1. With a max rank of 𝑟, the filter will also only match if the chosen topic is among the 𝑟 most highly weighted topics in the distribution for that document.
So if the query document is the example in Figure <ref type="figure">1</ref>, the user could inspect the topic distribution to find, for instance, the topic concerning physical storage, and apply a negative filter to remove it, leaving those results which have more to do with user search. Finally, when the user has finished applying filters to the search results, the result set can be downloaded as tabular data, preserving the filtered order and including the similarity scores.</p></div>
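The filter semantics described above (match when a chosen topic has weight of at least "min probability" and, optionally, lies within the top 𝑟 topics; positive filters lift matches, negative filters discard them) can be sketched as follows. The function names, document IDs, and topic labels are illustrative, not the system's internals.

```python
def matches(topic_dist, topics, min_prob=0.1, max_rank=None):
    # topic_dist: {topic: weight}. A document matches if at least one chosen
    # topic has weight >= min_prob and, when max_rank is given, is among the
    # max_rank most highly weighted topics of that document.
    ranked = sorted(topic_dist, key=topic_dist.get, reverse=True)
    for t in topics:
        if topic_dist.get(t, 0.0) >= min_prob:
            if max_rank is None or t in ranked[:max_rank]:
                return True
    return False

def apply_filters(results, topic_dists, positive=(), negative=(), **kw):
    # results: ranked list of doc IDs. Negative filters discard matching
    # documents; positive filters lift matching documents to the top while
    # otherwise preserving the original order.
    kept = [d for d in results
            if not (negative and matches(topic_dists[d], negative, **kw))]
    if positive:
        hits = [d for d in kept if matches(topic_dists[d], positive, **kw)]
        rest = [d for d in kept if d not in hits]
        return hits + rest
    return kept

dists = {
    "EP001": {"storage": 0.6, "search": 0.2},
    "EP002": {"search": 0.7, "ui": 0.3},
    "EP003": {"storage": 0.05, "search": 0.5},
}
results = ["EP001", "EP002", "EP003"]
filtered = apply_filters(results, dists, negative=["storage"])
```

Note that EP003 survives the negative "storage" filter because its storage weight (0.05) falls below the default min probability of 0.1.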
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Technical implementation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Data.</head><p>To prepare the components of our system we collected two overlapping data sets. The source is a commercially provided database of patent abstracts in which patents from patent offices worldwide have been translated into a consistent, English-language form. We chose this data source in order to achieve maximum uniformity of the input data; however, PatentExplorer makes no strong assumptions about the content of the documents and would also work on publicly available patent data. The Our-Portfolio data set contains the 73k patents whose assignee is our company or one of its subsidiaries. We filtered this data set to contain only patents filed since 2010, resulting in a set of 36k documents. The All-Patents data set is the collection of all patents published between 2014 and 2020, which contains approximately 15 million documents. For both data sets we extract the title and abstract of the patents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.2.2</head><p>Architecture. The architecture of the system is shown in Figure <ref type="figure" target="#fig_2">5</ref>. The two main components are the similarity search and the topic model. Each component offers an API with one function: "get-similar-ids" and "get-topic-distribution", respectively. The "get-similar-ids" function receives one or more patent IDs and retrieves the most similar documents from the search index, where similarity is defined as the cosine similarity between the document representations. This is equivalent to finding the nearest neighbours of the query document in the representation space. The "get-topic-distribution" function receives a single patent ID and computes the topic distribution for that document from the previously trained topic model. The search index and the topic model are static resources which are not changed at run time. Both components retrieve the patent document content directly from the "Patent documents" database as required, so that the user must only supply document IDs.</p></div>
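The two component APIs can be sketched as a single in-memory stand-in; the class, the toy document store, and the one-line topic model below are all hypothetical stand-ins for the deployed services and their data.

```python
class PatentExplorerBackend:
    # In-memory stand-in for the two components: a search index mapping
    # doc IDs to embeddings, and a trained topic model.
    def __init__(self, doc_store, index, topic_model):
        self.doc_store = doc_store      # doc_id -> document text
        self.index = index              # doc_id -> embedding (normalised)
        self.topic_model = topic_model  # callable: text -> {topic: weight}

    def get_similar_ids(self, query_id, n=10):
        # Nearest neighbours of the query in representation space; with
        # normalised embeddings the dot product equals cosine similarity.
        q = self.index[query_id]
        def dot(u, v):
            return sum(a * b for a, b in zip(u, v))
        scored = [(d, dot(q, v)) for d, v in self.index.items() if d != query_id]
        return [d for d, _ in sorted(scored, key=lambda p: p[1], reverse=True)[:n]]

    def get_topic_distribution(self, doc_id):
        # The component fetches the text itself, so callers only pass IDs.
        return self.topic_model(self.doc_store[doc_id])

doc_store = {"A": "database search", "B": "database storage", "C": "audio codec"}
index = {"A": (1.0, 0.0), "B": (0.8, 0.6), "C": (0.0, 1.0)}
topic_model = lambda text: {"search": 1.0} if "search" in text else {"storage": 1.0}
backend = PatentExplorerBackend(doc_store, index, topic_model)
```

Keeping document retrieval inside the components, as here, is what lets the web front end operate purely on document IDs.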
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.2.3</head><p>Training the topic model. The topic model is trained on the Our-Portfolio data set. The documents were preprocessed to remove approximately 50 patent-specific stop words, such as "invention" or "apparatus", as well as the usual English stop words. We performed stemming and then extracted all n-grams for 𝑛 = 1, 2, 3, 4 to construct the term-document matrix. We discarded words which occurred in fewer than 10 documents or in more than 40% of the documents.</p><p>In preliminary experiments we used a coherence metric to investigate the optimal parameters for the topic model (Table <ref type="table">1</ref>: coherence scores (𝐶 𝑁 𝑃𝑀𝐼 ) for NMF and LDA across three data set sizes; each score is the average over the coherence scores for 𝑘 ∈ {5, 10, ..., 95, 100}). In recent years, several approaches to measure coherence have been developed based on distributional properties of word pairs over a set of words <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18]</ref>, which mostly differ in the pairwise scoring metric being used. A typical choice is pointwise mutual information (PMI), which measures the strength of association between words in a data set within windows of a given size.</p><p>We use the coherence score 𝐶 𝑁 𝑃𝑀𝐼 as proposed by Aletras and Stevenson <ref type="bibr" target="#b2">[3]</ref>. An 𝑁 -dimensional context vector is created for each word 𝑤, whose elements are the normalised PMI values of 𝑤 with each of the other top words of the topic. Each word 𝑤 is then assigned the cosine similarity of its context vector and the sum of the other context vectors. The coherence score of the topic is the average of all of these cosine similarities.</p><p>To investigate which parametrisation of topic modelling works best for patent text we took a sample of 513k English-language patents from those published in 2010.
We removed duplicates and documents which were either very long or very short, leaving a set of approximately 255k documents. As we show in Table <ref type="table">1</ref>, both LDA and NMF exhibit similar performance on this data set, as measured by 𝐶 𝑁 𝑃𝑀𝐼 , with NMF discovering marginally better topics. Upon manual inspection we find that NMF is more robust across a wide range of numbers of topics. We therefore choose NMF to implement the system, and finally use NMF with 75 topics to train the topic model for the system on the Our-Portfolio data set.</p></div>
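The preprocessing pipeline for the term-document matrix (stop-word removal, n-grams for 𝑛 = 1..4, and document-frequency cut-offs) can be sketched as follows. The stop-word list is an illustrative subset, stemming is omitted, and the thresholds are shrunk to fit the toy corpus (the paper uses a minimum of 10 documents and a maximum of 40%).

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "invention", "apparatus"}  # illustrative subset

def ngrams(tokens, n_max=4):
    # All n-grams for n = 1..n_max, as used for the term-document matrix.
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def build_vocabulary(docs, min_df=2, max_df_ratio=0.4):
    # Keep terms occurring in at least min_df documents but in no more
    # than max_df_ratio of all documents.
    df = Counter()
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t not in STOP_WORDS]
        df.update(set(ngrams(tokens)))  # set(): count each doc once
    n_docs = len(docs)
    return {t for t, c in df.items() if min_df <= c <= max_df_ratio * n_docs}

docs = [
    "neural network training",
    "neural network inference",
    "signal processing unit",
    "signal filter design",
    "signal amplifier circuit",
    "optical signal sensor",
]
vocab = build_vocabulary(docs)
```

With these thresholds, "neural network" (in 2 of 6 documents) survives, while "signal" (in 4 of 6, above the 40% ceiling) and "training" (in only 1) are discarded.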
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3.2.4</head><p>Compiling the search index. To compile the search index we must first compute an embedding for each document in the search space. We use latent semantic indexing (LSI) to compute the document vectors, which is the result of tf-idf vectorisation followed by SVD compression <ref type="bibr" target="#b7">[8]</ref>. Rather than computing the tf-idf weights from the entire All-Patents data set, we instead compute the tf-idf weights from the Our-Portfolio data set, so that each document embedding in the search space will encode information which is relevant to our industrial domains. We then apply an SVD compression into 200 dimensions in order to reduce the size of each document vector and therefore the size of the overall search index. We use the resulting LSI projection function to compute a document embedding for each of the 15m documents in the All-Patents data set.</p><p>To implement the lookup of documents given a query document we use Annoy<ref type="foot" target="#foot_1">1</ref> , a library which provides approximate nearest neighbour search. Each document embedding is normalised before insertion so that the cosine similarity can be computed with the dot product function. The similarity component of the system provides an endpoint which returns the IDs of the 𝑛 most similar documents for some query document and some 𝑛.</p></div>
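Compiling the index as described above amounts to a truncated SVD of the tf-idf matrix followed by L2 normalisation, so that cosine similarity reduces to a dot product. The sketch below uses numpy as a stand-in for the actual vectoriser and the Annoy index, with toy dimensions in place of the system's 200.

```python
import numpy as np

def lsi_project(tfidf, dims):
    # Truncated SVD of the tf-idf matrix (docs x terms): keep the top
    # `dims` right singular vectors as the LSI projection.
    _, _, vt = np.linalg.svd(tfidf, full_matrices=False)
    return vt[:dims].T  # terms x dims projection matrix

def embed(tfidf_rows, projection):
    # Project documents into the LSI space and L2-normalise, so that the
    # dot product of two embeddings equals their cosine similarity
    # (the property the Annoy index relies on).
    emb = tfidf_rows @ projection
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)

rng = np.random.default_rng(0)
tfidf = rng.random((20, 50))  # 20 documents, 50 terms (toy sizes)
P = lsi_project(tfidf, dims=5)
E = embed(tfidf, P)
```

In the described setup, the projection would be fitted on the Our-Portfolio tf-idf weights and then applied to all 15m All-Patents documents before insertion into the approximate nearest-neighbour index.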
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">CONCLUSION AND FUTURE WORK</head><p>In this paper we present PatentExplorer, an in-use system for patent search. PatentExplorer gives users the ability to retrieve similar patents given a list of patent IDs or a patent text, and to refine their search results depending on the different topics of the patents. The topic models are tailored to the domain-specific topics of a company operating in the technical domain.</p><p>In initial user testing, tailoring the search representation and topic models to our domains turned out to offer mixed results. Feedback from patent search experts indicates that while the system can deliver relevant results within our domains, outside of these domains it can return few or no relevant documents among the ten highest-ranked results. While building and testing our system we have found that the requirements of patent search use cases place high demands on the accuracy of dedicated search tools. In order to reduce the latency of the similarity search to an acceptable level we were forced to simplify the similarity computation, using a compressed tf-idf representation where a contextualised document embedding may well have produced better results. It is also crucial to provide full coverage: the data set of patents which the system contains goes back only to 2014, whereas for prior art searches all previously published patents should be discoverable. Finally, the need to update the search index continuously leads to considerable recurring computational load and data management tasks, which is not yet provided for.</p><p>Our future work to improve the system will include expanding the system architecture to efficiently handle a larger number of documents in the search space.
In the longer term we intend to investigate introducing more appropriate document representations to be used in the search index, for instance by using a large language model such as BERT, or by learning the representations via a supervised auxiliary task.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Similarity search process</figDesc><graphic coords="2,319.54,83.69,237.07,71.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: PatentExplorer interface for exploring and refining the topic distribution of the search results.</figDesc><graphic coords="3,154.68,83.69,302.68,264.09" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: System architecture of PatentExplorer containing the Similarity Search and Topic Modelling components. Table 1 values: NMF 0.65 (50k), 0.67 (100k), 0.69 (250k); LDA 0.61 (50k), 0.63 (100k), 0.65 (250k).</figDesc><graphic coords="4,65.48,83.69,216.89,106.56" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0">PatentExplorer: Refining Patent Search with Domain-specific Topic Models PatentSemTech, July 15th, 2021, online</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1">https://github.com/spotify/annoy</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">S</forename></persName>
		</author>
		<ptr target="https://www.uspto.gov/web/offices/ac/ido/oeip/taf/us_stat.htm" />
		<title level="m">Patent Statistics Chart</title>
				<imprint>
			<date type="published" when="2021-06-04">2021-06-04</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Query Phrase Expansion Using Wikipedia in Patent Class Search</title>
		<author>
			<persName><forename type="first">Bashar</forename><surname>Al-Shboul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sung-Hyon</forename><surname>Myaeng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Retrieval Technology</title>
		<editor>
			<persName><forename type="first">Mohamed Vall</forename><forename type="middle">Mohamed</forename><surname>Salem</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Khaled</forename><surname>Shaalan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Farhad</forename><surname>Oroumchian</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Azadeh</forename><surname>Shakery</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Halim</forename><surname>Khelalfa</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="115" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Evaluating Topic Coherence Using Distributional Semantics</title>
		<author>
			<persName><forename type="first">Nikolaos</forename><surname>Aletras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Stevenson</surname></persName>
		</author>
		<ptr target="https://www.aclweb.org/anthology/W13-0102" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) -Long Papers. Association for Computational Linguistics</title>
				<meeting>the 10th International Conference on Computational Semantics (IWCS 2013) -Long Papers. Association for Computational Linguistics<address><addrLine>Potsdam, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="13" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Cross-domain Retrieval in the Legal and Patent Domains: a Reproducibility Study</title>
		<author>
			<persName><forename type="first">Sophia</forename><surname>Althammer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sebastian</forename><surname>Hofstätter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval, 43rd European Conference on IR Research</title>
				<meeting><address><addrLine>ECIR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The state-of-the-art on Intellectual Property Analytics (IPA): A literature review on artificial intelligence, machine learning and deep learning methods for analysing intellectual property (IP) data</title>
		<author>
			<persName><forename type="first">Leonidas</forename><surname>Aristodemou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Tietze</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.wpi.2018.07.002</idno>
		<ptr target="https://doi.org/10.1016/j.wpi.2018.07.002" />
	</analytic>
	<monogr>
		<title level="j">World Patent Information</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="37" to="51" />
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Latent dirichlet allocation</title>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003-01">2003. Jan (2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">PatentQuest: A User-Oriented Tool for Integrated Patent Search</title>
		<author>
			<persName><forename type="first">Manajit</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Zimmermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Crestani</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org/Vol-2847/paper-09.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 43rd European Conference on Information Retrieval (ECIR 2021)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">Ingo</forename><surname>Frommholz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Philipp</forename><surname>Mayr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Guillaume</forename><surname>Cabanac</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Suzan</forename><surname>Verberne</surname></persName>
		</editor>
		<meeting>the 11th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 43rd European Conference on Information Retrieval (ECIR 2021)<address><addrLine>Lucca, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021-04-01">2021. April 1st, 2021</date>
			<biblScope unit="volume">2847</biblScope>
			<biblScope unit="page" from="89" to="101" />
		</imprint>
	</monogr>
	<note>CEUR-WS.</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Indexing by latent semantic analysis</title>
		<author>
			<persName><forename type="first">Scott</forename><surname>Deerwester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Susan</forename><forename type="middle">T</forename><surname>Dumais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">W</forename><surname>Furnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">K</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Harshman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="391" to="407" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
		<ptr target="https://doi.org/10.18653/v1/N19-1423" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">On Term Selection Techniques for Patent Prior Art Search</title>
		<author>
			<persName><forename type="first">Mona</forename><surname>Golestan Far</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Scott</forename><surname>Sanner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohamed</forename><forename type="middle">Reda</forename><surname>Bouadjenek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gabriela</forename><surname>Ferraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Hawking</surname></persName>
		</author>
		<idno type="DOI">10.1145/2766462.2767801</idno>
		<ptr target="https://doi.org/10.1145/2766462.2767801" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval</title>
				<meeting>the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval<address><addrLine>Santiago, Chile; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="803" to="806" />
		</imprint>
	</monogr>
	<note>SIGIR &apos;15</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Billion-scale similarity search with GPUs</title>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthijs</forename><surname>Douze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hervé</forename><surname>Jégou</surname></persName>
		</author>
		<idno type="DOI">10.1109/TBDATA.2019.2921572</idno>
		<ptr target="https://doi.org/10.1109/TBDATA.2019.2921572" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Big Data</title>
		<imprint>
			<biblScope unit="page" from="1" to="1" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A survey on deep learning for patent analysis</title>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Krestel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Renukswamy</forename><surname>Chikkamath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christoph</forename><surname>Hewel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julian</forename><surname>Risch</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.wpi.2021.102035</idno>
		<ptr target="https://doi.org/10.1016/j.wpi.2021.102035" />
	</analytic>
	<monogr>
		<title level="j">World Patent Information</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="issue">6</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">Jieh-Sheng</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jieh</forename><surname>Hsiang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2001.03708</idno>
		<title level="m">PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note>cs.CL</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Sparse, Dense, and Attentional Representations for Text Retrieval</title>
		<author>
			<persName><forename type="first">Yi</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Eisenstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Collins</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2005.00181</idno>
		<ptr target="https://arxiv.org/abs/2005.00181" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Patent Retrieval</title>
		<author>
			<persName><forename type="first">Mihai</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</author>
		<idno type="DOI">10.1561/1500000027</idno>
		<ptr target="https://doi.org/10.1561/1500000027" />
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends® in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="97" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A Study on Query Expansion Methods for Patent Retrieval</title>
		<author>
			<persName><forename type="first">Walid</forename><surname>Magdy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gareth</forename><forename type="middle">J. F.</forename><surname>Jones</surname></persName>
		</author>
		<idno type="DOI">10.1145/2064975.2064982</idno>
		<ptr target="https://doi.org/10.1145/2064975.2064982" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on Patent Information Retrieval</title>
				<meeting>the 4th Workshop on Patent Information Retrieval<address><addrLine>Glasgow, Scotland, UK; New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="19" to="24" />
		</imprint>
	</monogr>
	<note>PaIR &apos;11</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Optimizing semantic coherence in topic models</title>
		<author>
			<persName><forename type="first">David</forename><surname>Mimno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanna</forename><forename type="middle">M</forename><surname>Wallach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edmund</forename><surname>Talley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miriam</forename><surname>Leenders</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>McCallum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the conference on empirical methods in natural language processing</title>
				<meeting>the conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="262" to="272" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automatic evaluation of topic coherence</title>
		<author>
			<persName><forename type="first">David</forename><surname>Newman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jey</forename><forename type="middle">Han</forename><surname>Lau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Karl</forename><surname>Grieser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Timothy</forename><surname>Baldwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="100" to="108" />
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Overview of CLEF-IP 2013 Lab</title>
		<author>
			<persName><forename type="first">Florina</forename><surname>Piroi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mihai</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Access Evaluation. Multilinguality, Multimodality, and Visualization</title>
				<editor>
			<persName><forename type="first">Pamela</forename><surname>Forner</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Henning</forename><surname>Müller</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Roberto</forename><surname>Paredes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Benno</forename><surname>Stein</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="232" to="249" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Domain-specific word embeddings for patent classification</title>
		<author>
			<persName><forename type="first">Julian</forename><surname>Risch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ralf</forename><surname>Krestel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Technol. Appl</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="108" to="122" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Information retrieval in the workplace: A comparison of professional search practices</title>
		<author>
			<persName><forename type="first">Tony</forename><surname>Russell-Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jon</forename><surname>Chamberlain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leif</forename><surname>Azzopardi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ipm.2018.07.003</idno>
		<ptr target="https://doi.org/10.1016/j.ipm.2018.07.003" />
	</analytic>
	<monogr>
		<title level="j">Information Processing and Management</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1042" to="1057" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">PerFedPat: An integrated federated system for patent search</title>
		<author>
			<persName><forename type="first">Mike</forename><surname>Salampasis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.wpi.2014.08.001</idno>
		<ptr target="https://doi.org/10.1016/j.wpi.2014.08.001" />
	</analytic>
	<monogr>
		<title level="j">World Patent Information</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<date type="published" when="2014-09">September 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Patent retrieval : a literature review</title>
		<author>
			<persName><forename type="first">Walid</forename><surname>Shalaby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wlodek</forename><surname>Zadrozny</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10115-018-1322-7</idno>
		<ptr target="https://doi.org/10.1007/s10115-018-1322-7" />
	</analytic>
	<monogr>
		<title level="j">Knowledge and Information Systems</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Sparse nonnegative matrix approximation: new formulations and algorithms</title>
		<author>
			<persName><forename type="first">Rashish</forename><surname>Tandon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Suvrit</forename><surname>Sra</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Effect of Log-Based Query Term Expansion on Retrieval Effectiveness in Patent Searching</title>
		<author>
			<persName><forename type="first">Wolfgang</forename><surname>Tannebaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Parvaz</forename><surname>Mahdabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andreas</forename><surname>Rauber</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-24027-5_32</idno>
		<ptr target="https://doi.org/10.1007/978-3-319-24027-5_32" />
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">9283</biblScope>
			<biblScope unit="page" from="300" to="305" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Exploring Keyphrase Extraction and IPC Classification Vectors for Prior Art Search</title>
		<author>
			<persName><forename type="first">Manisha</forename><surname>Verma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vasudeva</forename><surname>Varma</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">1177</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
