<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploratory Patent Search with Faceted Search and Configurable Entity Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavlos Fafalios</string-name>
          <email>fafalios@ics.forth.gr</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michail Salampasis</string-name>
          <email>salampasis@ifs.tuwien.ac.at</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Tzitzikas</string-name>
          <email>tzitzik@ics.forth.gr</email>
        </contrib>
      </contrib-group>
      <fpage>39</fpage>
      <lpage>47</lpage>
      <abstract>
        <p>Patent search is usually a recall-oriented problem and, depending on the search type, is often characterized by uncertainty and by an information need that evolves during the search. We propose an exploratory strategy for patent search that exploits the metadata already available in patents, together with the results of clustering and entity mining performed at query time. The results (metadata, clusters and entities grouped in categories) complement the ranked list of patents produced by the core search engine with useful information for the user (e.g. a concise overview of the search results). This information is further exploited in a faceted and session-based interaction scheme that allows users to focus their searches gradually and to switch between search methods as their information need becomes better defined and their understanding of the topic evolves in response to the information found. In addition, we propose the exploitation of Linked Data for specifying the entities of interest and for providing further information about the identified entities. The proposed system offers a dynamic, entity-based integration of patent documents, patent metadata and other external (semantic) resources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In this paper, we propose an exploratory search system which uses static and dynamically generated metadata
and combines the following methods: a) exploiting at query time the static metadata of the top results, b)
applying clustering and entity mining at query time (no pre-processing or indexing is necessary), tailored to
the domain of patent search and with the ability to configure and personalize the entities of interest, and c) exploring
the entities and their characteristics by exploiting the Linked Open Data (LOD) cloud. The results (metadata, clusters and entities
grouped in categories) complement the query answers with useful information for the user (a concise
overview of the search results), which is further exploited in a faceted and long-session interaction scheme that
allows users to restrict their focus gradually as their information need becomes better defined.</p>
      <p>We believe the work presented in this paper addresses the issue of integrated patent/professional search
systems from two important perspectives. From an information integration point of view, entity names are used
as the “glue” for automatically connecting documents (patents in our case) with data (and knowledge). This
approach does not require deciding or designing an integrated schema/view, nor mappings between concepts as
in knowledge bases, or mappings in the form of queries as in the case of databases. Note also that in professional
patent search, in many situations, one must look beyond keywords to find and analyze patents based on a
more sophisticated understanding of the patent’s content and meaning [JAV10]. Technologies such as entity
identification and analysis could become a significant aid to such searches and can be seen, together with other
text analysis technologies, as becoming the cutting edge of information retrieval science [BCC10].</p>
      <p>From an information seeking process perspective we present the tight integration of different search tools for
a) faceted search using existing metadata, b) entity extraction and c) textual clustering, with the main retrieval
engine which produces ranked lists of patent documents in response to a query. This integration allows different
search interfaces to coexist in an information seeker’s patent search system and may be seen as a desirable
feature, but it could also easily lead to a feeling of “information overload” on the searcher’s side. To address this
risk, the tools are synchronized so that an event or action in one tool (for example, selecting a facet) updates the
views produced by the other tools. Taken together, the search tools could enable professional search systems
to better support exploratory search over recall-oriented information problems. Ultimately, users
working within this complex information workplace should have multiple tools and interfaces at their disposal and
engage in rich and complex interactions to achieve their goals as their understanding of the topic grows
and their information need becomes better defined.</p>
      <p>The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 describes and
analyzes the proposed functionality. Section 4 reports experimental results, and finally, Section 5 concludes and
identifies issues that are worth further research.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>Many countries provide web interfaces for searching their patent databases, e.g. the United States Patent and
Trademark Office (USPTO, http://patft.uspto.gov/) and the European Patent Office (EPO, http://www.epo.org/searching.html). There are also many well-known web
systems that retrieve information from several patent databases, such as Google Patent Search (https://www.google.com/patents), Free Patents Online
(FPO, http://www.freepatentsonline.com/search.html) and Patents.com (http://www.patents.com/), and some commercial systems such as Delphion Derwent (http://www.delphion.com/derwent/) and Thomson Innovation (http://info.thomsoninnovation.com/).
Furthermore, several workshops have been conducted to evaluate and improve the state of the art in patent
retrieval [LHR11, HZB10] and several approaches have been proposed. [OSM97] introduced a system that
integrates a series of shallow natural language processing techniques into a vector-based document information
retrieval system for searching relevant patents, while [Lar99] proposes a probabilistic information retrieval system
for searching and classifying US patents. Other works apply text analysis and retrieval methods to improve recall
and precision [MMO+05], investigate cluster-based retrieval in the context of the invalidity search task of patent
retrieval [KNKL07], or construct clusters of patents containing the same classification codes and employ cluster-based
retrieval [KKL06]. [TLL07] introduced a series of text mining techniques for patent analysis, including
text segmentation and summary extraction, [HFAS09] uses the classification code hierarchy to find similar patents,
while [XC09] considers a search scenario in which users can pose full patents as a query. The most recent works focus
on increasing the retrievability of patents by expanding prior-art queries generated from query patents using
query expansion with pseudo relevance feedback [BR10], or propose a topic-driven patent analysis and mining
system [TWY+12]. However, to the best of our knowledge there is no work on patent search that performs
textual clustering and (configurable) entity mining in real time on the top results returned by a patent search
system, exploits semantic repositories in real time and offers a faceted search-like interface.</p>
      <p>The idea of enriching the classical query-and-response process of current web search engines, with static and
dynamic metadata for supporting exploratory search was proposed in [PKA09] and it is described in more detail
(enriched with the results of a user-based evaluation) in [PAKT12]. In that work the notion of dynamic metadata
refers to the outcome of results clustering algorithms which take as input the snippets of hits, where snippets are
query word dependent (and thus cannot be extracted, stored and indexed a priori). Note that the results of
entity mining, if applied over the textual snippets, also fall into the case of dynamic metadata.</p>
      <p>[FKM+12] presents a method to enrich the classical (keyword based) web searching with entity mining that
is performed at query time. In addition, it shows how Linked Data can be exploited for specifying the entities of
interest and for providing further information about the identified entities. That work showed that the application
of entity mining over the snippets of the top hits of the answers can be performed in real time. Mining over the
full content of the top results returns many more entities but is very time- and memory-consuming. Our intention
is to apply the aforementioned techniques plus the results of textual clustering and metadata-based grouping in
patent search, with focus on the needs of patent searchers.</p>
      <p>Regarding the value of the supplementary information to users in web searching (in our case the metadata
groupings, the clusters and the entities), experimental results have shown that categorizing the search results
improves the search speed and increases the accuracy of the selected results [KA05]. Moreover, the user study in
[Käk05] showed that categories are successfully used as part of users’ search habits. Specifically, users are able to
access results located far down in the ranked list and formulate simpler queries in order to find the needed
results. In addition, the categories are beneficial when more than one result is needed, as in an exploratory or
undirected search task. A user study in [PF00] indicated that categorizing the results dynamically in a medical
search system offers an organization of the results that is clearer, easier to use, and more accurate, precise, and helpful
than simple relevance ranking. Finally, a study in [AJK05] showed that experienced web users prefer to use
clustering when they are trying to get an overview or explore a topic.
</p>
    </sec>
    <sec id="sec-3">
      <title>3 Real-time Exploratory Patent Search</title>
      <p>We focus on a dynamic approach where no pre-processing of the resources has been done. Specifically, the user
submits a keyword query and the system fetches the top-L results (i.e. the potentially most relevant patents) from
the underlying search system, including the static metadata of each patent. Then we apply textual clustering and
entity mining to the title and abstract of each of the top-L results. Afterwards, we group the hits into categories according
to their static metadata, clusters and entities, and rank the (often numerous) elements of each category.</p>
      <p>Figure 1 shows an indicative screenshot of a prototype system that we have designed and developed (described
further in Section 3.8). The user has many options for restricting the search space by selecting
one or more entities (A), metadata values (B), or clusters (C). For instance, in the current example the user has
focused on the Drug ibuprofen, the International Patent Classification A61531/185, and the Cluster behandlung (D),
restricting the search space to only 2 results. Furthermore, the user can retrieve more information
about an entity in real time (E). The proposed interaction model is analyzed further in Section 3.6.
</p>
      <sec id="sec-3-1">
        <title>3.1 Metadata-based grouping</title>
        <p>To group the results according to their static metadata values, we first had to determine which metadata elements
are important in patent search, i.e. which fields are most useful and most likely to be used
in a faceted search-like interface for narrowing the search space. Note that a patent document may have numerous
metadata elements. For instance, a patent document of the Matrixware Research Collection (MAREC) data corpus (http://www.ir-facility.org/prototypes/marec)
may contain more than 20 metadata elements including document identification fields, concerned parties, filing
and priority information, national and international classification codes, titles, abstracts and descriptions (in
many languages), citations, related applications, claims, etc. To select these fields, we gathered opinions during interviews with patent examiners
from the Industrial Property Organization of Greece (http://www.obi.gr/), in a visit aiming to observe patent examiners searching
in their working environment. Later, we conducted a one-on-one interview with a very experienced patent examiner
specifically to learn about their attitudes and beliefs regarding the usefulness of different types of metadata
in patent search. The expert identified the following 9 metadata fields as important in a faceted
patent search: International Patent Classification (IPC), European Classification (ECLA), Applicant, Inventor,
publication number, publication country, publication year, application country, application year. We therefore decided
to offer these metadata fields for faceted exploration.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Textual Clustering</title>
        <p>Results clustering is very useful for providing users with an overview of the results. It aims at grouping the
results into topics (called clusters) with predictive names (labels), helping the user to quickly locate one or more
documents (patents in our case) that would otherwise be very difficult to find, especially if they are ranked low.
In our setting, we use a variation of the Suffix Tree Clustering (STC) algorithm [ZE98], called NM-STC (No-Merge
STC) [KPT09], that derives hierarchically organized labels and is able to favor occurrences in a specific part of
the result (e.g. in the title). The latter is very useful for clustering the results of a patent search because we
want to favor occurrences in the title of the patents, since the invention title is usually the most important and
descriptive part of a patent.
</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Discovering Entities</title>
        <p>We currently use GateAnnie for entity mining. In our setting it takes as input a set of textual contents (the
title and abstract of each patent), specifically those of the top hits of the query answer, and it returns as output
a set of entity lists (one list for each category of entities). We have automated the procedure of adding a new
category of entities in GateAnnie. Thus, we can easily configure the entity names that are interesting for the
application at hand (e.g. LOD entities of a particular type). As we will see later (Section 3.7), for defining the entities
of interest we can exploit any semantic repository that is accessible via a SPARQL endpoint, or we can load our
own lists of entities.
</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Ranking the Entities and the Static Metadata Values</title>
        <p>Note that the discovered entities and the static metadata values may be numerous. We therefore need an effective
method for ranking them and promoting the most important ones. Consider that the user submits a keyword query q,
and let P be the set of the top-L patents (e.g. L = 200) returned by the underlying search system. For a p ∈ P,
let rank(p) be its position in the answer (the first result has rank equal to 1, the second 2, and so on). Regarding
the ranking of entities, we apply entity mining in P, get a set of entities E, and rank each e ∈ E according to
the formula
<disp-formula><tex-math>Score(e) = \sum_{p \in pats(e)} \frac{(|P| + 1) - rank(p)}{|P| (|P| + 1) / 2}</tex-math></disp-formula>
where pats(e) denotes the patents (i.e. the elements of P)
in which an entity e has been identified. The entities occurring in the top results are thus promoted,
i.e. we exploit the ranking of the patents. The rationale behind this ranking formula is that the top patents in
the ranked list probably contain more useful entities than the last patents, since they are considered “better”
results. For this reason (and considering that we analyze only the top-L results), the ranking algorithm of the
underlying search system is very important.</p>
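        <p>The entity ranking described above can be illustrated with a minimal Python sketch. It assumes the score of an entity is the sum, over the patents in which it occurs, of (|P| + 1) − rank(p), normalized by |P|(|P| + 1)/2, so that an entity occurring in every patent would receive a score of 1. The function name, the input layout and the sample entities are hypothetical and are not taken from the prototype.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def score_entities(entities_per_patent):
    """Score entities mined from a ranked list of patents.

    entities_per_patent: list whose i-th item holds the entities found in
    the patent at rank i + 1 (an assumed, illustrative input layout).
    """
    n = len(entities_per_patent)              # |P|
    if n == 0:
        return {}
    norm = n * (n + 1) / 2                    # |P|(|P| + 1) / 2
    scores = defaultdict(float)
    for rank, entities in enumerate(entities_per_patent, start=1):
        for entity in set(entities):          # count each patent once per entity
            scores[entity] += ((n + 1) - rank) / norm
    return dict(scores)


# Hypothetical top-3 answer: entities per patent, in ranked order.
top = [{"ibuprofen", "aspirin"}, {"ibuprofen"}, {"paracetamol"}]
scores = score_entities(top)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```
]]></preformat>
        <p>In this example the entity found in the two highest-ranked patents comes first, which mirrors the intention of promoting entities that occur in the top results. The same function could be applied to the values of a metadata field instead of mined entities.</p>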
        <p>We use the same formula for ranking also the static metadata values in a metadata category, e.g. for ranking
the applicants, the publication years, etc.
</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5 Ranking of Metadata and Entity Categories</title>
        <p>For a category c, if inst(c) denotes the entities or the metadata values that fall in c, we rank the categories
according to the formula
<disp-formula><tex-math>Score(c) = \sum_{e \in inst(c)} Score(e)</tex-math></disp-formula>
Categories that contain more highly scored entities or metadata values are thus promoted.</p>
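        <p>A small Python sketch of the category ranking follows: the score of a category is taken to be the sum of the scores of its entities or metadata values. The function name and the sample scores are hypothetical and serve only as an illustration.</p>
        <preformat><![CDATA[
```python
def score_categories(instances_by_category, element_scores):
    """Sum the scores of each category's members and order the categories.

    instances_by_category: dict mapping a category name to its members.
    element_scores: dict mapping an entity or metadata value to its score.
    """
    totals = {
        category: sum(element_scores.get(member, 0.0) for member in members)
        for category, members in instances_by_category.items()
    }
    order = sorted(totals, key=totals.get, reverse=True)   # best category first
    return totals, order


# Hypothetical scores, e.g. produced by the entity ranking step.
entity_scores = {"ibuprofen": 0.8, "aspirin": 0.5, "asthma": 0.2}
categories = {"Drug": ["ibuprofen", "aspirin"], "Disease": ["asthma"]}
totals, order = score_categories(categories, entity_scores)
print(order)
```
]]></preformat>
        <p>Because the score is a plain sum, a category with many low-scored members can outrank a category with a single highly scored member; this is a property of the formula as stated, not an extra design choice of the sketch.</p>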
      </sec>
      <sec id="sec-3-6">
        <title>3.6 The Proposed Interaction Model</title>
      </sec>
      <sec id="sec-3-7">
        <title>Faceted search-like exploration of the (top) results</title>
        <p>The results of entity mining, clustering and metadata-based grouping are visualized and exploited according to
the faceted exploration interaction paradigm [ST09]: when the user clicks on an entity, a cluster or a metadata
value, the hits are restricted to those that contain that entity, cluster or metadata value (Figure 1D). Specifically,
the user is able to gradually select elements from one or more categories and refine the answer set accordingly
(the mechanism is session-based). Selections that belong to the same category have disjunctive (OR)
semantics, while selections that belong to different categories have conjunctive (AND) semantics. Furthermore, the
user initially sees only the top-10 values in each category and can inspect all of them by clicking a hyperlink
(button).</p>
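        <p>The selection semantics described above (OR within a category, AND across categories) can be sketched in Python as follows. The data layout, the function name and the sample hits are assumptions made for illustration; they do not describe the prototype's internal structures.</p>
        <preformat><![CDATA[
```python
def restrict(hits, selections):
    """Return the ids of the hits that satisfy the current selections.

    hits: list of (patent_id, facets) pairs, where facets maps a category
          name to the set of values that the patent has in that category.
    selections: dict mapping a category name to the set of selected values.
    """
    def matches(facets):
        return all(
            # OR within a category: at least one selected value is present.
            not chosen.isdisjoint(facets.get(category, ()))
            for category, chosen in selections.items()
            if chosen                      # categories with no selection are ignored
        )                                  # AND across categories

    return [patent_id for patent_id, facets in hits if matches(facets)]


# Hypothetical hits with two facet categories.
hits = [
    ("p1", {"Drug": {"ibuprofen"}, "Publication year": {"2005"}}),
    ("p2", {"Drug": {"aspirin"}, "Publication year": {"2005"}}),
    ("p3", {"Drug": {"ibuprofen"}, "Publication year": {"1999"}}),
]
selected = {"Drug": {"ibuprofen", "aspirin"}, "Publication year": {"2005"}}
print(restrict(hits, selected))
```
]]></preformat>
        <p>Keeping the selections in one dictionary per session makes each click a simple update of that dictionary followed by a fresh call to the filter.</p>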
      </sec>
      <sec id="sec-3-8">
        <title>On-click semantic exploration of the Linked Data</title>
        <p>There are already vast amounts of structured information published according to the principles of LOD. The
availability of such datasets makes it possible not only to easily configure the entity names that are interesting for the
application at hand (see Section 3.7), but also to enrich the entities with more information about them.
In this way the user can not only get useful information about an entity without having to submit a new query,
but can also start browsing the entities that are linked to that entity. Note that many of the static metadata values
can also be considered entities, e.g. the Applicant, the Inventor, the Publication Country, etc.</p>
        <p>Another important point is that exploiting LOD is more dynamic, affordable and feasible than an approach
that requires each search system to store and maintain its own knowledge base of entities and facts.
Returning to our setting, a question is which LOD dataset(s) to use. One approach is to identify and specify one
or more appropriate dataset(s) for each category of entities. For example, for entities in the category Publication
Country, the GeoNames dataset (http://www.geonames.org/) seems ideal since it offers rich information about countries. Furthermore,
DBpedia (http://dbpedia.org/) is appropriate for multiple categories such as Applicants, Countries, Inventors, etc. Other sources that
could be used include Freebase (http://www.freebase.com/), for persons, places and things, and YAGO [SKW07], which includes Wikipedia,
WordNet and GeoNames. In addition, FactForge [BKO+11] includes 8 LOD datasets (including DBpedia, Freebase,
GeoNames and WordNet). Many of the aforementioned datasets offer access through SPARQL endpoints (DBpedia: http://dbpedia.org/sparql, FactForge: http://www.factforge.net/sparql, YAGO: http://lod2.openlinksw.com/sparql).</p>
        <p>Running one (SPARQL) query for each entity would be a very expensive task, especially if the system has
discovered a lot of entities. For this reason, we offer this service on demand. Specifically, when the user clicks
on the small icon at the right of an entity’s name, the system collects more information about that
entity at that time, which is visualized in a popup window as shown in Figure 1E. The user is then able to continue browsing
by exploring the properties of the related entities. As we will see later, the user is able to define a SPARQL
endpoint and a SPARQL template query for each category of entities (see Section 3.7).</p>
      </sec>
      <sec id="sec-3-9">
        <title>3.7 Configurability</title>
        <p>We place particular emphasis on the configurability of the system. The administrator can specify various
parameters of the system through a configuration page (Figure 1F). The most important of these is the exploitation of the
LOD cloud for i) dynamically adding a new category of entities, and ii) defining how to semantically explore the
identified entities.</p>
      </sec>
      <sec id="sec-3-10">
        <title>Adding a new category of entities</title>
        <p>We are able to add a new category of entities by giving a category title and a list of words/phrases. The list can
be loaded by running a SPARQL query over a knowledge base that offers a SPARQL endpoint. For example,
we can run a SPARQL query over DBpedia’s SPARQL endpoint that returns a list of all objects of rdf:type
dbpedia-owl:ChemicalCompound and thereby offer the ability to explore Chemical Compounds in the search
results.</p>
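        <p>As a rough illustration of loading such a list, the Python sketch below builds a SPARQL query for the instances of a given class and extracts the labels from a response in the standard SPARQL JSON results format. The query shape, the English-label filter, the LIMIT value and the function names are assumptions; the actual query used by the system is not given here. The sketch does not contact any endpoint and parses a hand-written sample response instead.</p>
        <preformat><![CDATA[
```python
import json


def category_query(rdf_class, limit=1000):
    """Build a SPARQL query listing English labels of instances of rdf_class."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        f"  ?entity a {rdf_class} ;\n"
        "          rdfs:label ?label .\n"
        '  FILTER (lang(?label) = "en")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def labels_from_results(results_json):
    """Extract the ?label values from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row["label"]["value"] for row in bindings]


query = category_query("dbpedia-owl:ChemicalCompound", limit=5)

# Hand-written sample response (not real endpoint output).
sample = json.dumps({
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "xml:lang": "en", "value": "Ibuprofen"}},
        {"label": {"type": "literal", "xml:lang": "en", "value": "Caffeine"}},
    ]},
})
names = labels_from_results(sample)
print(names)
```
]]></preformat>
        <p>The resulting list of words and phrases, together with a category title, is what the configuration step described above needs in order to define a new category of entities.</p>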
      </sec>
      <sec id="sec-3-11">
        <title>Specifying the underlying knowledge bases</title>
        <p>We are able to define how to semantically explore an identified entity by giving a SPARQL template query and a
SPARQL endpoint for each category of entities for which we want to offer entity exploration. The SPARQL template
query must contain the character sequence &lt;ENTITY&gt; (including the &lt; and &gt;). When a user asks for more
information about an entity, we read the template query of the category to which the selected entity belongs,
and we replace each occurrence of &lt;ENTITY&gt; with the entity’s label.</p>
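        <p>The placeholder substitution can be sketched in a few lines of Python. The template query shown is a hypothetical example, not one of the prototype's configured templates, and the escaping of quotes and backslashes is an added precaution that the description above does not mention.</p>
        <preformat><![CDATA[
```python
PLACEHOLDER = "<ENTITY>"


def instantiate(template, entity_label):
    """Replace every <ENTITY> placeholder in a template query with a label."""
    if PLACEHOLDER not in template:
        raise ValueError("template query must contain " + PLACEHOLDER)
    # Assumption: the placeholder sits inside a double-quoted SPARQL literal,
    # so backslashes and double quotes in the label are escaped first.
    safe = entity_label.replace("\\", "\\\\").replace('"', '\\"')
    return template.replace(PLACEHOLDER, safe)


# Hypothetical template for one category of entities.
template = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT ?property ?value WHERE {\n"
    '  ?s rdfs:label "<ENTITY>"@en ;\n'
    "     ?property ?value .\n"
    "}"
)
query = instantiate(template, "Ibuprofen")
print(query)
```
]]></preformat>
        <p>Keeping one template and one endpoint per category means that supporting a new knowledge base only requires editing configuration data, not code.</p>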
      </sec>
      <sec id="sec-3-12">
        <title>3.8 The prototype</title>
        <p>We have implemented a prototype that offers the aforementioned functionality<sup>17</sup>. The underlying search system
searches the CLEF-IP 2011<sup>18</sup> data collection, which contains more than 2.6 million patent documents extracted
from the MAREC data corpus. With the current configuration, the system offers faceted exploration of the
“important” metadata fields (Section 3.1) and of the entity types Drug, Disease, Chemical Substance and Protein. For
exploring the entities, we exploit DBpedia’s SPARQL endpoint and we have specified the appropriate SPARQL
template queries (all this information can be managed through the configuration page).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Experimental Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1 Execution Time</title>
        <p>We ran 100 random queries in the system described in Section 3.8 and measured the average time required
for a) grouping the top-L results according to their metadata values, b) applying clustering on the title and
abstract of the top-L results, and c) applying entity mining on the title and abstract of the top-L results, for several
values of L. The experiments were carried out on a laptop with an Intel Core i5 CPU @ 2.4 GHz and 4 GB
RAM, running Windows 7 (64 bit). The system is implemented in Java 1.6 (J2EE platform), using
Apache Tomcat 7.</p>
        <p>Table 4.1 reports the results. We notice that the metadata-based grouping requires about 0.8 ms per result,
the clustering about 3 ms per result, while entity mining is the most time consuming task requiring about 10 ms
per result. However, the 3 tasks can be performed in parallel (the results of a task are not required for running
another task). Note also that the time does not depend on the underlying data sources but on the number of
the top results that we want to analyze; the more results we analyze, the more time is required for grouping,
clustering and mining them.</p>
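        <p>Since the three tasks are independent, they can be submitted concurrently. With the per-result costs reported above (about 0.8 ms, 3 ms and 10 ms), running them one after another would cost roughly 13.8 ms per result, whereas running them concurrently is bounded by the slowest task, about 10 ms per result, ignoring scheduling overhead. The Python sketch below only illustrates the pattern; the three task functions are trivial placeholders, not the grouping, NM-STC or entity mining components of the system, which is implemented in Java.</p>
        <preformat><![CDATA[
```python
from concurrent.futures import ThreadPoolExecutor


def group_by_metadata(results):
    """Placeholder for metadata-based grouping: group patent ids by year."""
    groups = {}
    for result in results:
        groups.setdefault(result["year"], []).append(result["id"])
    return groups


def cluster(results):
    """Placeholder standing in for textual clustering."""
    return {"all": [result["id"] for result in results]}


def mine_entities(results):
    """Placeholder standing in for entity mining: split titles into words."""
    return {result["id"]: result["title"].split() for result in results}


def analyze(results):
    """Run the three independent tasks concurrently and collect their outputs."""
    tasks = {
        "metadata": group_by_metadata,
        "clusters": cluster,
        "entities": mine_entities,
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(task, results) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical top results.
sample = [
    {"id": "p1", "year": "2005", "title": "ibuprofen tablet"},
    {"id": "p2", "year": "2005", "title": "aspirin coating"},
]
output = analyze(sample)
print(sorted(output))
```
]]></preformat>
        <p>In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so a process pool (concurrent.futures.ProcessPoolExecutor) would be the closer analogue of running the tasks in parallel on several cores.</p>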
      </sec>
      <sec id="sec-4-2">
        <title>Number of top results</title>
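        <p>Table 4.1: Average execution time of metadata-based grouping, clustering and entity mining for 25, 50, 100 and 200 top results (table values not recoverable from the source).</p>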
        <p>Note that the user is able to select the number of top hits to analyze, e.g. a bigger L, and thus achieve the desired
recall level. However, the higher this number is, the more time the system requires. In such cases, to
improve scalability, one could either build a dedicated index and offer instant response for the most frequent
queries according to the approach proposed in [FT11], or adopt a MapReduce approach [DG08] for distributing
the problem to many computers.</p>
        <p><sup>17</sup> The prototype is accessible through http://139.91.183.72/x-search-metadata-groupings/
<sup>18</sup> http://www.ir-facility.org/clef-ip</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.2 Time for exploring an entity</title>
        <p>The time for exploring an entity (by querying the LOD cloud) depends heavily on the SPARQL endpoint (i.e.
the underlying knowledge base) and the SPARQL template query. We have noticed that the more data (i.e.
entities) a category of entities contains in the underlying knowledge base, the more time is required for retrieving
information. For example (at the time of writing), DBpedia’s endpoint contains about 5,000 entities of type
Drug (rdf:type dbpedia-owl:Drug), and retrieving information about a Drug requires about 5 seconds
(including network delay). However, retrieving information about a Company (DBpedia contains about
45,000 entities of rdf:type dbpedia-owl:Company) requires about 20 seconds. Nevertheless, if we
know the URIs of the entities in a category (and keep them in an index), the retrieval can be performed much
faster because the query that we can form is simple and the endpoint does not need to perform many comparisons, since
it knows the exact URI at which the information lies.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>We have introduced an exploratory method for patent search that exploits in real time the available metadata
plus the results of textual clustering and entity mining in a faceted and session-based interaction scheme that
allows users to restrict their focus gradually. We also propose the exploitation of Linked Data for specifying the
entities of interest and for providing further information about the identified entities. The proposed functionality
offers an entity-based integration of patent documents, patent metadata and other external (semantic) resources.</p>
      <p>In comparison to existing systems for patent search, the proposed approach offers the ability to a) restrict
the focus using static metadata values which are not offered by the advanced search but are important for
patent searchers, b) restrict the focus using entity values and important topics (clusters) that were discovered in
the search results, and c) exploit any knowledge base that is accessible through a SPARQL endpoint, both for retrieving
more information about an identified entity and for specifying the entities of interest. Furthermore, showing values
together with their counts gives an overview (e.g. the percentage of patents published in Greece). Note also that the proposed
functionality can be exploited by any patent search system (i.e. it acts as a service over a ranked list of results);
it does not require any pre-processing and it does not use any caching scheme. The experimental results showed
that we can offer the proposed functionality efficiently; however, the time required is proportional to
the number of top results that we want to “explore”. Furthermore, the time for exploring the LOD cloud to
retrieve more information about an entity depends heavily on the SPARQL endpoint and the SPARQL query
that we use.</p>
      <p>In the future we plan to investigate approaches for entity deduplication and cleaning that are appropriate for our
setting. Entity disambiguation is a problem that affects the quality of the presented entities and an important
issue that merits further research. Ambiguity in an entity name can arise from variations in how an entity is
referenced, e.g. IBM and International Business Machines, or from the existence of several entities with the
same name, e.g. Argentina (the country) and Argentina (the fish). Finally, we plan to conduct a user-centered,
task-based evaluation in order to measure the overall impact of the techniques for structuring patent search
result lists.</p>
      <sec id="sec-5-1">
        <title>Acknowledgements</title>
        <p>Work done in the context of MUMIA (COST action IC1002, 2010-2014) and iMarine (FP7 Research
Infrastructures 283644, 2011-2014).</p>
        <p>[AJK05] A. Aula, N. Jhaveri, and M. Käki. Information search and re-access strategies of experienced web
users. In Proceedings of the 14th international conference on World Wide Web. ACM, 2005.</p>
        <p>[BCC10] D. Bonino, A. Ciaramella, and F. Corno. Review of the state-of-the-art in patent information and
forthcoming evolutions in intelligent patent informatics. World Patent Information, 32(1), 2010.</p>
        <p>[BR10] S. Bashir and A. Rauber. Improving retrievability of patents in prior-art search. Advances in
Information Retrieval, pages 457–470, 2010.</p>
        <p>[DG08] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters.
Communications of the ACM, 51(1):107–113, 2008.</p>
        <p>[FKM+12] P. Fafalios, I. Kitsos, Y. Marketakis, C. Baldassarre, M. Salampasis, and Y. Tzitzikas. Web searching
with entity mining at query time. In Proceedings of the 5th Information Retrieval Facility Conference,
July 2012.</p>
        <p>[FT11] P. Fafalios and Y. Tzitzikas. Exploiting available memory and disk for scalable instant overview
search. Web Information System Engineering, pages 101–115, 2011.</p>
        <p>[HFAS09] C.G. Harris, S. Foster, R. Arens, and P. Srinivasan. On the role of classification in patent invalidity
searches. In Proceedings of the 2nd international workshop on Patent information retrieval. ACM,
2009.</p>
        <p>[HZB10] Allan Hanbury, Veronika Zenz, and Helmut Berger. 1st international workshop on advances in patent
information retrieval. SIGIR Forum, 44(1), 2010.</p>
        <p>[JAV10] H. Joho, L.A. Azzopardi, and W. Vanderbauwhede. A survey of patent users: an analysis of tasks,
behavior, search functionality and system requirements. In Procs of the 3rd symposium on
Information interaction in context. ACM, 2010.</p>
        <p>[KA05] M. Käki and A. Aula. Findex: improving search result use through automatic filtering categories.
Interacting with Computers, 17(2):187–206, 2005.</p>
        <p>[Käk05] M. Käki. Findex: search result categories help users when document ranking fails. In Proceedings
of the SIGCHI conference on Human factors in computing systems. ACM, 2005.</p>
        <p>[KKL06] J. Kim, I.S. Kang, and J.H. Lee. Cluster-based patent retrieval using international patent
classification system. Computer Processing of Oriental Languages. Beyond the Orient: The Research
Challenges Ahead, pages 205–212, 2006.</p>
        <p>[OSM97] M. Osborn, T. Strzalkowski, and M. Marinescu. Evaluating document retrieval in patent database: a
preliminary report. In Proceedings of the sixth international conference on Information and knowledge
management. ACM, 1997.</p>
        <p>[PAKT12] Panagiotis Papadakos, Nikos Armenatzoglou, Stella Kopidaki, and Yannis Tzitzikas. On exploiting
static and dynamically mined metadata for exploratory web searching. Knowledge and Information
Systems, 30:493–525, 2012.</p>
        <p>[PF00] W. Pratt and L. Fagan. The usefulness of dynamically categorizing search results. Journal of the
American Medical Informatics Association, 7(6):605–617, 2000.</p>
        <p>[PKA09] P. Papadakos, S. Kopidaki, N. Armenatzoglou, and Y. Tzitzikas. Exploratory web searching
with dynamic taxonomies and results clustering. In Proceedings of the 13th European Conference on
Digital Libraries, September 2009.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [BKO+11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kiryakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ognyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Peikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tashev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Velkov</surname>
          </string-name>
          .
          <article-title>FactForge: A fast track to the web of data</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>157</fpage>
          -
          <lpage>166</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [KNKL07]
          <string-name>
            <given-names>I.S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.H.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Cluster-based patent retrieval</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>43</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1173</fpage>
          -
          <lpage>1182</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [KPT09]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kopidaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papadakos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          .
          <article-title>STC+ and NM-STC: Two novel online results clustering methods for web searching</article-title>
          .
          <source>Web Information Systems Engineering</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Lar99]
          <string-name>
            <given-names>L.S.</given-names>
            <surname>Larkey</surname>
          </string-name>
          .
          <article-title>A patent search and classification system</article-title>
          .
          <source>In Proceedings of the fourth ACM conference on Digital libraries</source>
          , volume
          <volume>11</volume>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [LHR11]
          <string-name>
            <given-names>Mihai</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Allan</given-names>
            <surname>Hanbury</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Rauber</surname>
          </string-name>
          .
          <article-title>4th international workshop on patent information retrieval</article-title>
          .
          <source>In Proceedings of the 20th ACM international conference on Information and knowledge management</source>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [MMO+05]
          <string-name>
            <given-names>H.</given-names>
            <surname>Mase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matsubayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ogawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iwayama</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Oshio</surname>
          </string-name>
          .
          <article-title>Proposal of two-stage patent retrieval method considering the claim structure</article-title>
          .
          <source>ACM Transactions on Asian Language Information Processing</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>190</fpage>
          -
          <lpage>206</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [SKW07]
          <string-name>
            <given-names>F.M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>YAGO: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th World Wide Web conference</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [ST09]
          <string-name>
            <given-names>G.M.</given-names>
            <surname>Sacco</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          .
          <article-title>Dynamic taxonomies and faceted search: theory, practice, and experience</article-title>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [TLL07]
          <string-name>
            <given-names>Y.H.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.I.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>Text mining techniques for patent analysis</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>43</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1216</fpage>
          -
          <lpage>1247</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [TWY+12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.
          <article-title>PatentMiner: topic-driven patent analysis and mining</article-title>
          .
          <source>In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [XC09]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xue</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Transforming patents into prior-art queries</article-title>
          .
          <source>In Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval. ACM</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [ZE98]
          <string-name>
            <given-names>O.</given-names>
            <surname>Zamir</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Web document clustering: A feasibility demonstration</article-title>
          .
          <source>In Proceedings of the 21st annual international ACM SIGIR conference on research and development in information retrieval. ACM</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>