<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Flexible tool for Cross-Collection Patent Search</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Stefania</forename><surname>Marrara</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Informatics, Systems and Communication (DISCo)</orgName>
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<addrLine>Building U14</addrLine>
									<postCode>I-20126</postCode>
									<settlement>Milano</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Gabriella</forename><surname>Pasi</surname></persName>
							<email>pasi@disco.unimib.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Informatics, Systems and Communication (DISCo)</orgName>
								<orgName type="institution">University of Milano-Bicocca</orgName>
								<address>
									<addrLine>Building U14</addrLine>
									<postCode>I-20126</postCode>
									<settlement>Milano</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Flexible tool for Cross-Collection Patent Search</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">89634C145339058FFF34EFECB5DE85E9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Patent Search</term>
					<term>Fuzzy Logic</term>
					<term>Information Retrieval</term>
					<term>Flexible Query Language</term>
					<term>XML</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Prior-art retrieval is a crucial application of Patent Retrieval, aimed at determining the novelty of a new invention. In this scenario patent authors require exhaustive knowledge of all related patents, and the search often involves multiple patent collections across the world, which do not share the same document structure or vocabulary. For this reason, despite the numerous patent search applications already available, in this paper we propose PatentLight [1], a search tool that offers novel and flexible functionalities based on both fuzzy logic and IR to help users look for relevant patents, here represented as XML documents. We show some examples of how the proposed search tool can query the WIPO and USPTO collections in a flexible way.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Patent Information Retrieval (PIR) is a specialized branch of Information Retrieval that aims to support users, often professionals such as patent attorneys or inventors, in retrieving patents that satisfy their information needs <ref type="bibr" target="#b1">[2]</ref>.</p><p>In this scenario, a crucial application is prior-art retrieval <ref type="bibr" target="#b2">[3]</ref>, which patent searchers perform to determine the novelty of a new invention. Patent authors need exhaustive knowledge of all related patents, since overlooking a single important patent can lead to harmful and very expensive consequences, such as patent infringement and litigation.</p><p>Today patents are widely available through collections such as USPTO (United States Patent and Trademark Office), EPO (European Patent Office) and WIPO (World Intellectual Property Organization).</p><p>Each collection contains a large number of patents and keeps growing year by year. This poses a serious problem for patent professionals: because relevant patents are hard to find, the cost of filing patents, defining claims and defending a claim of infringement increases over time, often making the process too expensive. In 2010 the estimated cost of finding relevant patents was $1,500 per patent filing <ref type="bibr" target="#b3">[6]</ref>.</p><p>For these reasons Patent Retrieval attracts increasing interest from the scientific community. It is also considered a complex and challenging task, since the vocabulary used in patents is often obscure, with many specialized or technical terms. This obfuscation is often intentional, as writers may want their patents to be difficult to retrieve. Moreover, patents have an intrinsic structure, which typically includes sections such as the description, the claims or the prior art, and which can differ from one collection to another. 
Finally, typical queries in patent retrieval contain a very large number of words, often entire claims.</p><p>Most patent search tools available today are collection dependent. The best known, Google Patents [4] and PatentsSearcher <ref type="bibr">[5,</ref><ref type="bibr" target="#b11">14]</ref>, are centered on the USPTO collection, even though the need for worldwide patent search is recognized. In fact, PatentsSearcher claims to "rely on external services to query international patents and applications" (see www.patentsearcher.com/aboutSearch.jsp), while Google Patents includes the WIPO and EPO collections but restricts search to US patents only.</p><p>Most approaches presented in the literature, based on keyword extraction or query expansion techniques, have been shown to produce poor results (see Section 1.1).</p><p>Despite this, we believe that a traditional keyword-based analysis of XML patents, combined with our flexible search approach, can be promising in terms of both recall and precision.</p><p>The first experimental results obtained in <ref type="bibr" target="#b0">[1]</ref> on the USPTO collection motivated us to investigate this direction further.</p><p>In this paper we present the development of our flexible tool PatentLight, together with some examples obtained on more than one English-language patent collection. Each document collection stores roughly the same information (e.g., abstract, author names, topic, description) but with different tree structures and tag vocabularies.</p><p>To support this scenario, our PatentLight tool <ref type="bibr" target="#b0">[1]</ref> has been extended with the similar constraint on tag names (see Section 2.2).</p><p>The approach we propose in this paper builds on recent research results in XML Retrieval to overcome the weaknesses of traditional keyword-based approaches in the patent domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Related Work</head><p>Patent IR has evolved as a separate branch of IR, because patents show characteristics that drastically reduce the effectiveness of traditional retrieval techniques.</p><p>In recent years several approaches have been proposed, which can be broadly classified into three categories:</p><p>(i) approaches based on query expansion techniques, to reduce vocabulary mismatch; (ii) approaches based on query extraction techniques, to reduce verbose queries; (iii) approaches based on query translation techniques, which include approaches for querying multilingual patent collections and approaches that query patents section by section instead of as a whole document.</p><p>Because of the peculiarities of patent retrieval with respect to traditional retrieval (as described in the Introduction), standard IR techniques such as query expansion have proved ineffective on patent queries, due to the noisy terms that typical queries contain.</p><p>In practice, however, most patent examiners formulate their queries for invalidating claims by selecting high-frequency terms from the claim text of the query patent. Hence the first approaches proposed in the literature <ref type="bibr" target="#b4">[7,</ref><ref type="bibr" target="#b5">8]</ref> started from this practice and relied on keyword extraction to reduce query size, unfortunately achieving low-quality results. 
More recently, <ref type="bibr" target="#b6">[9]</ref> and <ref type="bibr" target="#b7">[10]</ref> showed that using the whole patent text with raw term frequency (i.e., the plain number of occurrences of each term in a document) reduces the complexity of the task, and that the best results are obtained when terms are taken from all the fields of the query patent.</p><p>Other approaches, <ref type="bibr" target="#b8">[11]</ref> and <ref type="bibr" target="#b9">[12]</ref>, used citation extraction to improve the retrieval effectiveness of keyword-based IR methods; this idea is also adopted in <ref type="bibr" target="#b10">[13]</ref>, which additionally applies a query expansion technique to segmented queries.</p><p>Another important work is <ref type="bibr" target="#b11">[14]</ref>, which also adopts a query expansion technique based on structural parts of patents such as the abstract, the description and image descriptions.</p><p>The last class of patent retrieval approaches tries to take advantage of the multilinguality of most patent collections, i.e., the fact that the same patent can be stored in more than one language. Most of these works are based on natural language processing (NLP) approaches <ref type="bibr" target="#b12">[15]</ref><ref type="bibr" target="#b13">[16]</ref><ref type="bibr" target="#b14">[17]</ref>. In particular, the most recent one, <ref type="bibr" target="#b14">[17]</ref>, uses statistical word alignment to translate patent queries from one language to another. More generally, query translation has been a popular research topic, and it is usually realized by means of dictionaries, machine translation systems, ontologies or combinations of these (see <ref type="bibr" target="#b15">[18]</ref> for an overview).</p><p>Finally, <ref type="bibr" target="#b16">[19]</ref> is a useful survey of user issues and expectations in Patent Retrieval, in which the authors analyze in depth patent users and their search requirements with respect to current IR systems and applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">FleXy: a Flexible XML Query language</head><p>In <ref type="bibr" target="#b21">[24]</ref>, FleXy was defined as a flexible extension of the XQuery Full Text language that introduces flexible constraints on both the structure and the content of XML documents.</p><p>A patent search application based on FleXy, PatentLight, was proposed in <ref type="bibr" target="#b0">[1]</ref>.</p><p>PatentLight employed the FleXy structure-based constraints below and near and the content-based flexible constraint around. In this section we introduce the constraint similar, which applies to tag names, and we show how combining content-based and structure-based evaluation of results can improve the effectiveness of PatentLight. A short explanation of these flexible constraints is given below.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">A short description of below, near and around</head><p>The constraint below retrieves the fragments of an XML document (in this case a patent) that are closest to the path required by the user's query. The syntax of the below constraint follows the standard XQuery axis syntax and is specified as c/below::t, where c is the context node and t is the target node. The best retrieved path is the one in which t is a direct child of c. Other paths, those in which t is a deeper descendant of c, are also retrieved but ranked lower than the best one. To create the list of results we compute a path relevance degree $w_{c,t}$ for each retrieved fragment as $w_{c,t} = \frac{1}{|\mathit{desc\_arc}(c,t)|}$, where $\mathit{desc\_arc}(c,t)$ is a function that returns the set of descending arcs from c to t if and only if t is a descendant node of c.</p><p>The flexible constraint near retrieves elements that are connected to the context node by any path (not only the descendant relationship), i.e., ancestor and sibling elements are also evaluated. For the near constraint, the scoring function is defined as $w_{c,t} = \frac{1}{|\mathit{arcs}(c,t)|}$, where $\mathit{arcs}(c,t)$ is the function that returns the set of arcs connecting the context node c to the target node t along the shortest path. The near constraint syntax is c/near::t, where, as for the constraint below, c is the context node and t is the target node.</p><p>Around is a flexible constraint that applies to numerical data; its evaluation function is formally defined as the membership function of a fuzzy subset on the considered numerical domain, and it expresses the similarity between the retrieved values and the numerical value requested by the user. 
In the patent domain, the constraint around is used to evaluate dates.</p><p>The FleXy syntax of around is 'tag-date/@date[x around b]', where tag-date is the element whose date attribute holds the value to be evaluated, x is the date value of the examined patent, and b is the date written by the user in his/her query.</p><p>The evaluation function of the around constraint produces a score in the interval [0,1] based on the date value b specified by the user and the date value x of the patent. Patents with a date close to the one specified by the user receive a higher score (close to 1) than other patents. The evaluation function of the flexible constraint around on the Date domain can be defined as a fuzzy subset with a triangular membership function centered on b.</p></div>
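<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three scoring functions above can be sketched in Python as follows. This is a minimal illustration on a hypothetical toy tree (encoded as a child-to-parent map, with node names assumed unique), not PatentLight's implementation; the half-width width_days of the triangular membership function for around is an assumed parameter, since the text does not specify it.</p>
<eg>
```python
from collections import deque
from datetime import date

# Hypothetical toy patent tree, encoded as child -> parent (None marks the root).
# Node names are assumed to be unique to keep the sketch short.
PARENT = {
    "patent": None,
    "description": "patent",
    "section": "description",
    "p": "section",
    "claims": "patent",
}


def ancestors(node):
    """Return the ancestors of node, nearest first."""
    chain = []
    while PARENT[node] is not None:
        node = PARENT[node]
        chain.append(node)
    return chain


def below_score(c, t):
    """below: w_{c,t} = 1 / |desc_arc(c,t)|, or 0.0 if t is not a descendant of c."""
    chain = ancestors(t)
    if c not in chain:
        return 0.0
    return 1.0 / (chain.index(c) + 1)  # index 0 means t is a direct child of c


def neighbours(node):
    """Undirected neighbours of node: its children plus its parent."""
    result = [child for child, parent in PARENT.items() if parent == node]
    if PARENT[node] is not None:
        result.append(PARENT[node])
    return result


def near_score(c, t):
    """near: w_{c,t} = 1 / |arcs(c,t)| along the shortest undirected path."""
    if c == t:
        return 1.0  # assumption: the formula is undefined for zero arcs
    dist = {c: 0}
    queue = deque([c])
    while queue:  # breadth-first search finds the shortest path in arcs
        node = queue.popleft()
        for nxt in neighbours(node):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                if nxt == t:
                    return 1.0 / dist[nxt]
                queue.append(nxt)
    return 0.0  # t is not reachable from c


def around_score(x, b, width_days=365):
    """around: triangular membership centered on b (width_days is assumed)."""
    gap = abs((x - b).days)
    return max(0.0, 1.0 - gap / width_days)


if __name__ == "__main__":
    print(below_score("description", "p"))  # prints 0.5 (two descending arcs)
    print(near_score("claims", "p"))        # prints 0.25 (four arcs via the root)
    print(around_score(date(2015, 1, 6), date(2015, 1, 1), width_days=10))  # prints 0.5
```
</eg>
<p>Deeper descendants receive proportionally lower below scores, while near also reaches ancestors and siblings of the context node; a smaller width_days makes around stricter about how far a patent date may be from the requested one.</p></div>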
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The similar constraint to assess tag similarity</head><p>Similar is a flexible constraint defined on tag names that makes it possible to retrieve fragments whose target node name is similar to the name used in the user query.</p><p>Similar is defined as a function whose FleXy syntax is 'similar(x)', where x is the node name we are looking for. Its evaluation returns a list of XML fragments whose target node name is similar to x, with a similarity degree $w_s \in (0,1]$ computed as $w_s = \frac{1}{1+ed}$, where $ed$ is the edit distance between the retrieved tag name and x.</p><p>Fig. <ref type="figure">1</ref> shows how the similar constraint works on two document fragments, the left one from the USPTO collection and the right one from the WIPO collection.</p><p>Although the query Q1 looks for a fragment containing the tag name last-name, the system is also able to retrieve the patent fragment containing the tag name orgname, with a similarity degree of 0.16.</p><p>Note that traditional XML query languages would not retrieve the second fragment in the same situation.</p><p>Moreover, to avoid retrieving useless fragments, a threshold value can be set for $w_s$. At present we do not evaluate synonyms, since this option would require a dictionary or an ontology. When a query involves more than one flexible constraint, for instance a flexible axis and the similar constraint on the target node name, the overall relevance degree $w^{o}_{c,t}$ is computed as a combination of the two scores $w_{c,t}$ and $w_s$. We prefer a conservative evaluation and therefore use $w^{o}_{c,t} = \min(w_{c,t}, w_s)$, but different solutions could be tested.</p></div>
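<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the similar constraint follows. It assumes the classic Levenshtein edit distance with unit costs (the text does not state which edit-distance variant is used) and a hypothetical threshold parameter for discarding weak matches; the conservative min combination mirrors the formula for the overall relevance degree given above.</p>
<eg>
```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def similar(query_tag, doc_tag):
    """Tag similarity w_s = 1 / (1 + ed), always in (0, 1]."""
    return 1.0 / (1 + edit_distance(query_tag, doc_tag))


def overall(w_ct, query_tag, doc_tag, threshold=0.1):
    """Conservative combination w^o = min(w_ct, w_s).

    Returns None when w_s falls below the (hypothetical) threshold,
    i.e. the fragment is discarded as not useful.
    """
    ws = similar(query_tag, doc_tag)
    if threshold > ws:
        return None
    return min(w_ct, ws)


if __name__ == "__main__":
    print(edit_distance("last-name", "orgname"))      # prints 5
    print(round(similar("last-name", "orgname"), 3))  # prints 0.167
    print(overall(0.5, "last-name", "orgname"))
```
</eg>
<p>For the tag names of Fig. 1, last-name and orgname are 5 edits apart (three substitutions and two deletions), so the similarity degree is 1/6, about 0.167, which matches the 0.16 reported above up to truncation.</p></div>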
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">How the flexible constraints work in patent search</head><p>This section explains how the flexible constraints introduced in the previous section are used to support the patent search task. The proposed approach was defined and tested in <ref type="bibr" target="#b0">[1]</ref> on the USPTO patent collection, which can be freely downloaded from the web. USPTO is the corpus adopted by most patent search applications, such as Google Patents and PatentsSearcher. Google Patents has recently been extended to include the WIPO and EPO collections, but search is restricted to US patents only.</p><p>It was also observed that EPO and WIPO patent documents have roughly the same structure as USPTO documents, albeit with different tags. In this paper we use the similar constraint to extend our tests to a cross-collection composed of USPTO and WIPO patents.</p><p>The proposed approach allows users to search patents by formulating a keyword-based query; in addition, the user can choose the similar tag option to extend the search to tag names similar to the standard ones. Subsection 3.1 describes how search results are categorized for keyword-based queries, while Subsection 3.2 shows how the evaluation of the similar constraint changes the original approach of <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Keyword-based Query: the approach</head><p>An important functionality of PatentLight <ref type="bibr" target="#b0">[1]</ref> is the categorization of patents by exploiting their XML structure. The engine organizes the XML patents into meaningful semantic XML elements covering the main patent information. In this way the categorization process described below can easily capture the user's topical search intent by identifying the possible interpretations associated with a patent.</p><p>By analyzing the patents in the USPTO collection, four categories were identified in <ref type="bibr" target="#b0">[1]</ref>: People, Title, Description, and Claims.</p><p>The same categories are adopted here for the cross-collection, owing to the structural similarity between the WIPO and USPTO collections.</p><p>Formally, let $E$ be the set of XML elements defined in a patent collection and $Cat$ the set of categories; then one or more elements $e_i \in E$ are mapped into each category $c \in Cat$, i.e., $\{e_1, \ldots, e_m\} \to c$. In the application, the four categories and their corresponding XML elements are: People (Applicants, Agents, Assignee, Examiners), Title (title), Description (Description) and Claims (claims).</p><p>A user-specified keyword-based query (denoted "query terms" below) is automatically rewritten into four distinct FleXy queries, one for each category. The structure of each query is predefined, so that the query terms are searched in pre-established elements according to the four templates (People, Title, Descriptions, Claims) listed after Fig. 1. The query for the category People uses the near constraint with the tag applicants as context node; this reflects the assumption that the applicant role (i.e., the inventor) is more important for the search than the other roles defined in the patent, such as Agent and Examiner. 
This choice is consistent with standard patent search applications (e.g., Google Patents, PatentsSearcher).</p><p>When a user writes the name of a person in the standard textual search area, we assume that he/she is interested in finding the inventors of patents. Note, however, that the approach also retrieves patents containing the name in a different role.</p></div>
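<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rewriting of one keyword query into four category queries amounts to simple string templating, sketched below in Python. The four templates are copied from the People, Title, Descriptions and Claims templates listed after Fig. 1; the {terms} placeholder, the whitespace normalisation and the function name rewrite are illustrative assumptions, and a real system would also need to escape double quotes in the user input.</p>
<eg>
```python
# The four FleXy query templates, one per category, as listed after Fig. 1.
# "{terms}" is a hypothetical placeholder for the user's keywords.
TEMPLATES = {
    "People": 'applicants/near::Last-Name[text() contains text "{terms}"]',
    "Title": 'invention-title[text() contains text "{terms}"]',
    "Descriptions": 'Description/below::p[text() contains text "{terms}"]',
    "Claims": 'claims/below::claim-text[text() contains text "{terms}"]',
}


def rewrite(keywords):
    """Rewrite a keyword query into one FleXy query string per category."""
    terms = " ".join(keywords.split())  # collapse repeated whitespace
    return {category: template.format(terms=terms)
            for category, template in TEMPLATES.items()}


if __name__ == "__main__":
    for category, query in rewrite("kettle  bell").items():
        print(f"{category}: {query}")
```
</eg>
<p>Each category query is then executed on its own, which is what allows the results to be presented per category rather than as a single list.</p></div>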
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">The evaluation of similar in PatentLight</head><p>In this work we extend the PatentLight engine so that it can also retrieve fragments whose tag names differ from those expressed in the query.</p><p>This feature is useful when we query collections whose internal structure (tag names and node positions) we know only roughly, or when we want to apply the same query to a composition of patent collections that contain roughly the same information stored in different tag nodes (for instance the node orgname instead of the node Last-Name).</p><p>In the engine, if the user chooses to add the similar tag evaluation (in the prototype the user simply ticks an option in the interface), the set of FleXy queries changes accordingly, as shown in the similar-variant templates listed after the plain ones. Note that the similar constraint is applied to each relevant node of a query, and therefore the categories People and Claims contain two queries instead of one.</p><p>The retrieved fragments are ranked according to two values: the degree of structural relevance based on the evaluation of the FleXy constraints ($w^{o}_{c,t}$), and the degree of relevance obtained by the full-text scoring of the XQuery Full Text language (the prototype in <ref type="bibr" target="#b0">[1]</ref> uses the BaseX system <ref type="bibr" target="#b22">[25]</ref>). The approach gives priority to structural ranking over content-based relevance, since it was observed that the paragraphs most related to the invention are usually structurally closer to the tag Description.</p></div>
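<div xmlns="http://www.tei-c.org/ns/1.0"><p>The similar-variant queries can be derived mechanically from the plain paths by wrapping one relevant node at a time in similar(...), as sketched below in Python. The sketch assumes, following the similar-variant templates listed after the plain ones, that the relevant nodes are applicants and Last-Name for People, invention-title for Title, Description for Descriptions, and claims and claim-text for Claims; the helper names wrap_step and similar_queries are hypothetical.</p>
<eg>
```python
# Plain location paths of the four category queries.
PATHS = {
    "People": "applicants/near::Last-Name",
    "Title": "invention-title",
    "Descriptions": "Description/below::p",
    "Claims": "claims/below::claim-text",
}

# Nodes that receive the similar constraint, one query per node.
RELEVANT_NODES = {
    "People": ("applicants", "Last-Name"),
    "Title": ("invention-title",),
    "Descriptions": ("Description",),
    "Claims": ("claims", "claim-text"),
}


def wrap_step(path, node):
    """Replace the location step named node with similar(node)."""
    out = []
    for step in path.split("/"):
        axis, sep, name = step.rpartition("::")  # "below::p" -> ("below", "::", "p")
        if name == node:
            name = f"similar({node})"
        out.append(axis + sep + name)
    return "/".join(out)


def similar_queries(terms):
    """Return, per category, one FleXy query for each relevant node."""
    predicate = f'[text() contains text "{terms}"]'
    return {
        category: [wrap_step(PATHS[category], node) + predicate for node in nodes]
        for category, nodes in RELEVANT_NODES.items()
    }


if __name__ == "__main__":
    for category, queries in similar_queries("kettle bell").items():
        print(category, len(queries))  # People and Claims yield 2 queries each
```
</eg>
<p>Generating one query per wrapped node, rather than wrapping all nodes at once, keeps each query's similarity degree attributable to a single tag-name substitution.</p></div>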
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">PatentLight development and preliminary evaluation</head><p>The aim of this work was to evaluate the retrieval capabilities of the flexible language FleXy when applied to patent collections, with particular emphasis on the similar constraint, which makes it possible to execute the same queries over different tag vocabularies.</p><p>For this reason PatentLight has been developed on top of the BaseX system <ref type="bibr" target="#b22">[25]</ref>, inheriting its indexing system and query execution engine. In this set of tests we did not use a standard collection such as the MAREC collection<ref type="foot" target="#foot_0">1</ref>; instead, we created our test collection from XML patents of the USPTO and WIPO collections published within a short time window (i.e., from 2015-01-01 to 2015-01-15), in order to have heterogeneous document structures and tag names. The final collection is composed of 146,413 XML patents, 82,800 from the WIPO collection and 63,613 from the USPTO collection. The architecture of the system is depicted in Figure <ref type="figure" target="#fig_3">2</ref>. The main module is the BaseX query engine, which is in charge of indexing and querying the collection. During querying, each class query is executed independently, with the support of a dictionary for the evaluation of the similar constraint. One of the main characteristics of the approach is that each query produces one set of results per class (People, Title, Descriptions, Claims), and these sets are not merged. For each result two scores are computed: the overall relevance degree $w^{o}$ (see Section 2.2) and the degree of relevance obtained by the full-text scoring of the XQuery Full Text language as implemented by BaseX. 
The ranking module reorders each class of results by first considering $w^{o}$ and then the degree of content-based relevance, as explained in Section 3.2.</p><p>To explore the usefulness of FleXy on the patent collection we performed several different searches; the most interesting ones are shown in Figure <ref type="figure" target="#fig_4">3</ref>. For each search we started with a very simple query, then refined it with one more keyword or, in one case, two. In most cases three keywords were enough to achieve satisfactory results, i.e., a manageable number of retrieved results in each class without losing any of the relevant documents found with the less specific query.</p><p>In this set of trials we wanted to compare PatentLight and Google Patents on the prior-art retrieval task, in which users need to find as many relevant patents as possible. In our preliminary evaluation we carefully checked the first 50 results of each search and compared them with the results provided by Google Patents for the same query. Figure <ref type="figure" target="#fig_4">3</ref> shows the number of retrieved patents for each query and, in parentheses, the number of relevant patents found within the first 50 results. PatentLight presents the results of each query divided into four classes (i.e., People, Title, Claims, and Descriptions) depending on the document section where the query keywords were found, while Google Patents has no classification system and its results appear together in a single list.</p><p>As shown in Figure <ref type="figure" target="#fig_4">3</ref>, in most cases PatentLight retrieved the same number or a higher number of relevant patents than Google Patents within the first 50 results. Moreover, PatentLight's classification of results, with lists that are short on average, made it easy to find the relevant documents and to discard irrelevant classes as a whole.</p><p>See for instance the first query, in which we were looking for patents about kettle bells. Google Patents found only 4 patents, none of them relevant. 
PatentLight found 5 patents with the word "bell" in the title, one of them relevant; 64 with "bell" as a person name, a class that did not need to be checked; 98 with "bell" in the claims; and 964 in the descriptions. Merely adding the word "kettle" drastically reduced the number of retrieved results, also in the "claims" and "description" sections, but the relevant documents were still found.</p></div>
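<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ranking step described above, in which each class keeps its own result list and results are ordered by the overall relevance degree first and by the full-text score only to break ties, can be sketched as a lexicographic sort. The Hit record, its field names and the toy scores in the example are hypothetical; in PatentLight both scores come from the FleXy evaluation and from BaseX, which this sketch does not reproduce.</p>
<eg>
```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved fragment (hypothetical record layout)."""
    doc_id: str
    wo: float        # structural relevance w^o from the FleXy constraints
    fulltext: float  # content score returned by the full-text engine


def rank(hits):
    """Order by w^o first; the full-text score only breaks ties."""
    return sorted(hits, key=lambda h: (h.wo, h.fulltext), reverse=True)


def rank_per_class(results):
    """Rank each class (People, Title, ...) separately, without merging."""
    return {cls: rank(hits) for cls, hits in results.items()}


if __name__ == "__main__":
    toy = {  # made-up scores, only to show the ordering
        "Claims": [Hit("a", 0.5, 9.0), Hit("b", 1.0, 1.0), Hit("c", 0.5, 12.0)],
        "Title": [Hit("d", 1.0, 3.0)],
    }
    for cls, hits in rank_per_class(toy).items():
        print(cls, [h.doc_id for h in hits])
```
</eg>
<p>In the toy Claims list, b comes first despite its low full-text score because its structural score is higher, while c precedes a only because their structural scores tie.</p></div>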
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions and Future Work</head><p>In this paper we have described the development and preliminary evaluation of PatentLight on a collection of English patents with heterogeneous structures. The distinctive feature of PatentLight is that it allows users to specify flexible constraints in their queries. Future work will study the evaluation of synonyms for the tag names used in the queries of the four categories.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The flexible constraint similar</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>FleXy query templates for the four categories:
People: applicants/near::Last-Name[text() contains text "query terms"]
Title: invention-title[text() contains text "query terms"]
Descriptions: Description/below::p[text() contains text "query terms"]
Claims: claims/below::claim-text[text() contains text "query terms"]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>FleXy query templates with the similar tag option:
People: similar(applicants)/near::Last-Name[text() contains text "query terms"]
        applicants/near::similar(Last-Name)[text() contains text "query terms"]
Title: similar(invention-title)[text() contains text "query terms"]
Descriptions: similar(Description)/below::p[text() contains text "query terms"]
Claims: similar(claims)/below::claim-text[text() contains text "query terms"]
        claims/below::similar(claim-text)[text() contains text "query terms"]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. PatentLight architecture</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Preliminary comparative evaluation of the produced results.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Number of retrieved patents for each query and result class; in parentheses, the number of relevant patents within the first 50 results.</figDesc><table>
<row><cell></cell><cell cols="4">PatentLight</cell><cell>Google Patents</cell></row>
<row><cell>Query</cell><cell>People</cell><cell>Title</cell><cell>Claims</cell><cell>Descriptions</cell><cell>No class</cell></row>
<row><cell>Q1: «Bell»</cell><cell>64(0)</cell><cell>5(1)</cell><cell>98(2)</cell><cell>964(1)</cell><cell>4(0)</cell></row>
<row><cell>Q1.1: «Kettle bell»</cell><cell>0</cell><cell>1(1)</cell><cell>2(2)</cell><cell>3(1)</cell><cell>0</cell></row>
<row><cell>Q2: «gas turbine»</cell><cell>0</cell><cell>215(1)</cell><cell>453(4)</cell><cell>1110(2)</cell><cell>113(2)</cell></row>
<row><cell>Q2.1: «gas turbine compressor»</cell><cell>0</cell><cell>2(2)</cell><cell>70(2)</cell><cell>341(2)</cell><cell>98(2)</cell></row>
<row><cell>Q3: «Gonzales»</cell><cell>8(1)</cell><cell>0</cell><cell>0</cell><cell>46(1)</cell><cell>40(0)</cell></row>
<row><cell>Q3.1: «Martino Gonzales»</cell><cell>1(1)</cell><cell>0</cell><cell>0</cell><cell>0</cell><cell>0</cell></row>
<row><cell>Q4: «search»</cell><cell>6(0)</cell><cell>279(1)</cell><cell>1770(2)</cell><cell>9487(2)</cell><cell>147(2)</cell></row>
<row><cell>Q4.1: «search engine»</cell><cell>0</cell><cell>25(2)</cell><cell>207(2)</cell><cell>2159(2)</cell><cell>114(0)</cell></row>
<row><cell>Q4.2: «semantic search engine»</cell><cell>0</cell><cell>2(2)</cell><cell>5(2)</cell><cell>139(1)</cell><cell>50(1)</cell></row>
<row><cell>Q5: «transistor»</cell><cell>0</cell><cell>346(1)</cell><cell>2730(1)</cell><cell>8312(1)</cell><cell>199(1)</cell></row>
<row><cell>Q5.1: «low frequency transistor»</cell><cell>0</cell><cell>0</cell><cell>12(4)</cell><cell>910(2)</cell><cell>110(4)</cell></row>
</table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.ir-facility.org/prototypes/marec</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Calegari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Panzeri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pasi</surname></persName>
		</author>
		<title level="m">PatentLight: a Patent Search Application, Proceedings of IIiX 2012</title>
				<meeting><address><addrLine>Nijmegen, The Netherlands</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Patent Retrieval Foundations and Trends in Information Retrieval</title>
		<author>
			<persName><forename type="first">Mihai</forename><surname>Lupu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Allan</forename><surname>Hanbury</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="1" to="97" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Simple vs. Sophisticated Approaches for Patent Prior-Art Search</title>
		<author>
			<persName><forename type="first">W</forename><surname>Magdy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Information Retrieval</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">6611</biblScope>
			<biblScope unit="page" from="725" to="728" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Wiens</surname></persName>
		</author>
		<ptr target="http://www.benwiens.com/patents.html" />
		<title level="m">Understanding Patents</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Term distillation in patent retrieval</title>
		<author>
			<persName><forename type="first">H</forename><surname>Itoh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ogawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-2003 Workshop on Patent Corpus Processing</title>
		<meeting>the ACL-2003 Workshop on Patent Corpus Processing<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="41" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Query terms extraction from patent document for invalidity search</title>
		<author>
			<persName><forename type="first">T</forename><surname>Takaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NTCIR-5</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Transforming patents into prior-art queries</title>
		<author>
			<persName><forename type="first">X</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="808" to="809" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Prior art retrieval using various patent document fields contents</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Z</forename><surname>Wanagiri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Adriani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF-2010 (Notebook Papers/LABs/Workshops)</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Enhancing patent retrieval by citation analysis</title>
		<author>
			<persName><forename type="first">A</forename><surname>Fujii</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="793" to="794" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Simple vs. sophisticated approaches for patent prior-art search</title>
		<author>
			<persName><forename type="first">W</forename><surname>Magdy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lopez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECIR</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="725" to="728" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">United We Fall, Divided We Stand: A Study of Query Segmentation and PRF for Patent Prior Art Search</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ganguly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leveling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">J F</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of PaIR&apos;11</title>
				<meeting>PaIR&apos;11<address><addrLine>Glasgow, UK</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="13" to="17" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">PatentsSearcher: a novel portal to search and explore patents</title>
		<author>
			<persName><forename type="first">V</forename><surname>Hristidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ruiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hernandez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Farfán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Varadarajan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PaIR&apos;10</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="33" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A patent search and classification system</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">S</forename><surname>Larkey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM DL</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="179" to="187" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Evaluating document retrieval in patent database: A preliminary report</title>
		<author>
			<persName><forename type="first">M</forename><surname>Osborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Strzalkowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Marinescu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CIKM</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Golshani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Makki</surname></persName>
		</editor>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="216" to="221" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Preliminary Study into Query Translation for Patent Retrieval</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jochim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lioma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Koch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ertl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">PaIR&apos;10</title>
				<meeting><address><addrLine>Toronto, Ontario, Canada</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="57" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">User-assisted query translation for interactive cross-language information retrieval</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Oard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Inf. Process. Manage.</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="181" to="211" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A survey of patent users: an analysis of tasks, behavior, search functionality and system requirements</title>
		<author>
			<persName><forename type="first">H</forename><surname>Joho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Vanderbauwhede</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the third symposium on Information interaction in context (IIiX &apos;10)</title>
				<meeting>the third symposium on Information interaction in context (IIiX &apos;10)<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="13" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">NEXI, Now and Next</title>
		<author>
			<persName><forename type="first">A</forename><surname>Trotman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sigurbjornsson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in XML Information Retrieval</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Fuhr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Lalmas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Malik</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Z</forename><surname>Szlavik</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">3493</biblScope>
			<biblScope unit="page" from="16" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><surname>W3C</surname></persName>
		</author>
		<ptr target="http://www.w3.org/TR/xpath/" />
		<title level="m">XML Path Language 1.0</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">XQuery 1.0: An XML Query Language</title>
		<ptr target="http://www.w3.org/TR/xquery/" />
	</analytic>
	<monogr>
		<title level="m">W3C</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">XQuery and XPath Full Text 1.0</title>
		<author>
			<persName><surname>W3C</surname></persName>
		</author>
		<ptr target="http://www.w3.org/TR/xpath-full-text-10/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A flexible extension of XPath to improve XML querying</title>
		<author>
			<persName><forename type="first">E</forename><surname>Damiani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marrara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SIGIR</title>
				<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="849" to="850" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">XQuery Full Text Implementation in BaseX</title>
		<author>
			<persName><forename type="first">C</forename><surname>Grün</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Holupirek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Scholl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">XSym &apos;09</title>
				<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="114" to="128" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
