<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF-IP 2009: retrieval experiments in the Intellectual Property domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giovanna Roda</string-name>
          <email>g.roda@matrixware.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Tait</string-name>
          <email>j.tait@ir-facility.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florina Piroi</string-name>
          <email>f.piroi@ir-facility.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronika Zenz</string-name>
          <email>v.zenz@matrixware.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Matrixware Information Services GmbH</institution>
          <addr-line>Vienna, Austria</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The CLEF-IP track ran for the first time within CLEF 2009. The purpose of the track was twofold: to encourage and facilitate research in the area of patent retrieval by providing a large, clean data set for experimentation, and to create a large test collection of patents in the three main European languages for the evaluation of cross-lingual information access. The track focused on the task of prior art search. The 15 European teams who participated in the track deployed a rich range of Information Retrieval techniques, adapting them to this new specific domain and task. A large-scale test collection for evaluation purposes was created by exploiting patent citations.</p>
      </abstract>
      <kwd-group>
        <kwd>Patent retrieval</kwd>
        <kwd>Prior art search</kwd>
        <kwd>Intellectual Property</kwd>
        <kwd>Test collection</kwd>
        <kwd>Evaluation track</kwd>
        <kwd>Benchmarking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Cross-Language Evaluation Forum (CLEF, http://www.clef-campaign.org) originally arose from work on cross-lingual
information retrieval within the Text REtrieval Conference (TREC, http://trec.nist.gov) of the US National Institute of
Standards and Technology, but has been run separately since 2000. Each year since then, a
number of tasks on both cross-lingual information retrieval (CLIR) and monolingual information
retrieval in non-English languages have been run. In 2008, the Information Retrieval Facility (IRF)
and Matrixware Information Services GmbH obtained the agreement to run a track which allowed
groups to assess their systems on a large collection of patent documents containing a mixture of
English, French and German documents derived from European Patent Office data. This became
known as the CLEF-IP track, which investigates IR techniques in the Intellectual Property domain
of patents.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>One main requirement for a patent to be granted is that the invention it describes should be
novel: that is, there should be no earlier patent or other publication describing the invention. The
novelty-breaking document can be published anywhere, in any language. Hence, when a person
undertakes a search, for example to determine whether an idea is potentially patentable, or to
try to prove a patent should not have been granted (a so-called opposition search), the search is
inherently cross-lingual, especially if it is exhaustive.</p>
      <p>The patent system allows inventors a monopoly on the use of their invention for a fixed period
of time in return for public disclosure of the invention. Furthermore, the patent system is a major
underpinning of company value in a number of industries, which makes patent retrieval an
important economic activity.</p>
      <p>
        Although there is important previous academic research work on patent retrieval (see, for
example, the ACM SIGIR 2000 Workshop [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or, more recently, the NTCIR workshop series [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]),
there was little work involving non-English European languages, and participation by European
groups was low. CLEF-IP grew out of a desire to promote such European research work and also to
encourage academic use of a large, clean collection of patents made available to researchers
by Matrixware (through the Information Retrieval Facility).
      </p>
      <p>CLEF-IP has been a major success. For the first time, a large number of European groups
(15) have been working on a patent corpus of significant size within a single, integrated IR
evaluation collection. Although it would be unreasonable to pretend the work is beyond criticism,
it does represent a significant step forward for both the IR community and patent searchers.</p>
      <sec id="sec-2-1">
        <title>The CLEF-IP Patent Test Collection</title>
        <sec id="sec-2-1-1">
          <title>Document Collection</title>
          <p>
            The CLEF-IP track had at its disposal a collection of patent documents published between 1978
and 2006 at the European Patent Office (EPO). The whole collection consists of approximately
1.6 million individual patents. As suggested in [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], we split the available data into two parts:
1. the test collection corpus (or target dataset): all documents with publication date
between 1985 and 2000 (1,958,955 patent documents pertaining to 1,022,388 patents, 75 GB);
2. the pool for topic selection: all documents with publication date from 2001 to 2006
(712,889 patent documents pertaining to 518,035 patents, 25 GB).
          </p>
          <p>Patents published prior to 1985 were excluded from the outset, as before this year many
documents were not filed in electronic form, and the optical character recognition software that
was used to digitize the documents produced noisy data. The upper limit, 2006, was imposed by
our data provider, a commercial institution which, at the time the track was agreed on, had not
made more recent documents available.</p>
          <p>The documents are provided in XML format and correspond to the Alexandria XML DTD
(http://www.ir-facility.org/pdf/clef/patent-document.dtd).
Patent documents are structured documents consisting of four major sections: bibliographic data,
abstract, description, and claims. Non-linguistic parts of patents, like technical drawings and tables
of formulas, were left out, which put the focus of this year's track on the (multi)lingual aspect of
patent retrieval: EPO patents are written in one of the three official languages English, German
and French. 69% of the documents in the CLEF-IP collection have English as their main language,
23% German, and 7% French. The claims of a granted patent are available in all three languages, and
other sections, especially the title, are also given in several languages. That means the document
collection itself is multilingual, with the different text sections being labeled with a language code.</p>
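          <p>As a rough illustration of this labeling, the sketch below groups the text fields of a patent document by their language code. It assumes a simplified layout with plain "lang" attributes; the element names are invented for the example and do not reproduce the actual Alexandria DTD.</p>

```python
# Group a patent document's text fields by their language code.
# The XML layout below is a simplified stand-in for the real DTD.
import xml.etree.ElementTree as ET
from collections import defaultdict

def fields_by_language(xml_text):
    """Return {language code: [field texts]} for elements carrying a lang attribute."""
    root = ET.fromstring(xml_text)
    by_lang = defaultdict(list)
    for elem in root.iter():
        lang = elem.get("lang")
        if lang and elem.text and elem.text.strip():
            by_lang[lang].append(elem.text.strip())
    return dict(by_lang)

doc = """<patent-document>
  <invention-title lang="EN">Coffee brewing device</invention-title>
  <invention-title lang="DE">Kaffeebruehvorrichtung</invention-title>
</patent-document>"""
langs = fields_by_language(doc)  # {'EN': ['Coffee brewing device'], 'DE': ['Kaffeebruehvorrichtung']}
```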
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Patent documents and kind codes</title>
      <p>In general, several patent documents, published at different stages of the patent's life cycle,
are associated with one patent. Each document is marked with a kind code that specifies the stage it was
published in. The kind code is denoted by a letter, possibly followed by a one-digit numerical code
that gives additional information on the nature of the document. In the case of the EPO, A
stands for a patent's application stage and B for a patent's granted stage; B1 denotes a patent
specification and B2 a later, amended version of the patent specification.</p>
      <p>Characteristic of our patent document collection is that files corresponding to patent documents
published at various stages need not contain all the data pertinent to a patent. For example, a
B1 document of a patent granted by the EPO contains, among others, the title, the description,
and the claims in three languages (English, German, French), but it usually does not contain an
abstract, while an A2 document contains the original patent application (in one language) but
no citation information except that provided by the applicant.</p>
      <p>The CLEF-IP collection was delivered to the participants as is, without joining the documents
related to the same patent into one document. Since the objects of a search are patents (identified
by patent numbers, without kind code), it is up to the participants to collate multiple retrieved
documents for a single patent into one result.</p>
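      <p>This collation step can be sketched as follows: strip the kind code from each retrieved document identifier and keep, per patent, the best score. The identifier pattern and the scores below are invented for illustration.</p>

```python
# Collapse retrieved documents (patent number + kind code, e.g. "EP0383071B1")
# into one result per patent, keeping the highest score.
import re

def collate(ranked_docs):
    """ranked_docs: list of (doc_id, score) pairs -> patent-level ranking."""
    best = {}
    for doc_id, score in ranked_docs:
        m = re.match(r"(EP\d+)", doc_id)        # assumed EP-number pattern
        patent = m.group(1) if m else doc_id
        if patent not in best or score > best[patent]:
            best[patent] = score
    return sorted(best.items(), key=lambda kv: -kv[1])

run = [("EP0383071A2", 4.2), ("EP0383071B1", 5.0), ("EP0826302A1", 3.1)]
collated = collate(run)  # [('EP0383071', 5.0), ('EP0826302', 3.1)]
```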
      <sec id="sec-3-1">
        <title>Tasks and Topics</title>
        <p>The goal of the CLEF-IP tasks consisted in finding prior art for a patent. The tasks mimic an
important real-life scenario of an IP search professional. Performed at various stages of the patent
life-cycle, prior art search is one of the most common search types and a critical activity in the
patent domain. Before applying for a patent, inventors perform such a search to determine
whether the invention fulfills the requirement of novelty and to formulate the claims so as not to
conflict with existing prior art. During the application procedure, a prior art search is executed
by patent examiners at the respective patent office, in order to determine the patentability of an
application by uncovering relevant material published prior to the filing date of the application.
Finally, parties that try to oppose a granted patent use this kind of search to unveil prior art that
invalidates the patent's claims of originality.</p>
        <p>
          For detailed information on information sources in patents and patent searching see [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Tasks</title>
      <p>Participants were provided with sets of patents from the topic pool and asked to return all patents
in the collection which constituted prior art for the given topic patents. Participants could choose
among different topic sets of sizes ranging from 500 to 10,000.</p>
      <p>The general goal in CLEF-IP was to find prior art for a given topic patent. We proposed
one main task and three optional language subtasks. For the language subtasks, a different topic
representation was adopted that allowed focusing on the impact of the language used for query
formulation.</p>
      <p>
        The main task of the track did not restrict the language used for retrieving documents.
Participants were allowed to exploit the multilinguality of the patent topics. The three optional subtasks
were dedicated to cross-lingual search. According to Rule 71(3) of the European Patent
Convention [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], European granted patents must contain claims in the three official languages of the
European Patent Office (English, French, and German). This data is well suited for investigating
the effect of languages in the retrieval of prior art. In the three parallel multilingual subtasks,
topics are represented by title and claims, in the respective language, extracted from the same
B1 patent document. Participants were presented with the same patents as in the main task, but
with textual parts (title, claims) only in one language. The usage of bibliographic data, e.g. IPC
classes, was allowed. (For a complete list of kind codes used by various patent offices, see
http://tinyurl.com/EPO-kindcodes. It is not in the scope of this paper to discuss the origins of the
content in the EPO patent documents; we only note that applications to the EPO may originate from
patents granted by other patent offices, in which case the EPO may publish patent documents with
incomplete content, referring to the original patent.)
      </p>
    </sec>
    <sec id="sec-5">
      <title>Topic representation</title>
      <p>In CLEF-IP, a topic is itself a patent. Since patents come in several versions corresponding to the
different stages of the patent's life-cycle, we were faced with the problem of how best to represent a
patent topic.</p>
      <p>A patent examiner initiates a prior art search with a full patent application, hence one could
think that taking the highest version of the patent application's file would be best for simulating a
real search task. However, such a choice would have led to a large number of topics with missing
fields. For instance, for Euro-PCT patents (currently about 70% of EP applications are Euro-PCTs)
whose PCT predecessor was published in English, French or German, the application files contain
only bibliographic data (no abstract and no description or claims).</p>
      <p>In order to overcome these shortcomings of the data, we decided to assemble a virtual patent
application file, to be used as a topic, by starting from the B1 document. If the abstract was
missing in the B1 document, we added it from the most recent document where the abstract was
included. Finally, we removed citation information from the bibliographical content of the patent
document.</p>
    </sec>
    <sec id="sec-6">
      <title>Topic selection</title>
      <p>
        Since relevance assessments were generated by exploiting existing manually created information
(see section 3.1), CLEF-IP had a topic pool of hundreds of thousands of patents at hand. Evaluation
platforms usually strive to evaluate against large numbers of topics, as robustness and reliability
of the evaluation results increase with the number of topics [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This is especially true when
relevance judgments are not complete and the number of relevant documents per topic is very
small, as is the case in CLEF-IP, where each topic has on average only 6 relevant documents. In
order to maximize the number of topics while still allowing groups with fewer computational
resources to participate, four different topic bundles were assembled that differed in the number
of topics. For each task, participants could choose between the topic sets S (500 topics), M (1,000
topics), L (5,000 topics), and XL (10,000 topics), with the smaller sets being subsets of the larger
ones. Participants were asked to submit results for the largest of the 4 sets they were able to
process.
      </p>
      <p>From the initial pool of 500,000 potential topics, candidate topics were selected according to
the following criteria:
1. availability of a granted patent;
2. full text description available;
3. at least three citations;
4. at least one highly relevant citation.</p>
    </sec>
    <sec id="sec-10">
      <title>Selection procedure</title>
      <p>The first criterion restricts the pool of candidate topics to those patents for which a granted
patent is available. This restriction was imposed in order to guarantee that each topic would
include claims in the three official languages of the EPO: German, English and French. In this
fashion, we are also able to provide topics that can be used for parallel multilingual tasks. Still,
not all patent documents corresponding to granted patents contained a full text description; hence
we imposed this additional requirement on a topic. Starting from a topic pool of approximately
500,000 patents, we were left with almost 16,000 patents fulfilling the above requirements. From
these patents, we randomly selected 10,000 topics, which, bundled in four subsets, constitute the
final topic sets. In the same manner, 500 topics were chosen which, together with relevance
assessments, were provided to the participants as a training set.</p>
      <p>
        For an in-depth discussion of topic selection for CLEF-IP, see [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>[Figure 1: Gathering direct and extended citations via patent families. Type 1: direct citation of the source patent; Type 2: direct citation from a family member of the source patent; Type 3: family member of a Type 1 citation; Type 4: family member of a Type 2 citation.]</p>
      <sec id="sec-10-1">
        <title>Relevance Assessment Methodology</title>
        <p>This section describes the two types of relevance assessments used in CLEF-IP 2009: (1) assessments
automatically extracted from patent citations, and (2) manual assessments by patent experts.</p>
        <sec id="sec-10-1-1">
          <title>Automatic Relevance Assessment</title>
          <p>A common challenge in IR evaluation is the creation of ground truth data against which to
evaluate retrieval systems. The common procedure of pooling and manual assessment is very
labor-intensive. Voluntary assessors are difficult to find, especially when expert knowledge is
required, as is the case in the patent field. Researchers in the field of patents and prior art search,
however, are in the lucky position of already having partial ground truth at hand: patent citations.</p>
          <p>Citations are extracted from several sources:
1. applicant's disclosure: some patent offices (e.g. the USPTO) require applicants to disclose all
known relevant publications when applying for a patent;
2. patent office search report: each patent office will do a search for prior art to judge the
novelty of a patent;
3. opposition procedures: often enough, a company will monitor granted patents of its
competitors and, if possible, file an opposition procedure (i.e. a claim that a granted patent is
not actually novel).</p>
          <p>There are two major advantages of extracting ground truth from citations. First, citations
are established by members of the patent offices, applicants and patent attorneys; in short, by
highly qualified people. Second, search reports are publicly available and are made for every patent
application, which leads to a huge set of assessment material that allows the track organizers to
scale the set of topics easily and automatically.</p>
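          <p>Turning such citation lists into evaluation judgments is then mechanical; a minimal sketch (with invented patent numbers) that emits TREC-style qrels lines, treating every citation as relevant:</p>

```python
# Build TREC-style qrels lines from per-topic citation lists.
def citations_to_qrels(citations):
    """citations: {topic patent: iterable of cited patents} -> list of qrels lines."""
    lines = []
    for topic in sorted(citations):
        for cited in sorted(set(citations[topic])):
            lines.append(f"{topic} 0 {cited} 1")  # every citation judged relevant
    return lines

qrels = citations_to_qrels({"EP1133908": ["EP1107664", "EP0826302"]})
# ['EP1133908 0 EP0826302 1', 'EP1133908 0 EP1107664 1']
```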
        </sec>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Methodology</title>
      <p>
        The general method for generating relevance assessments from patent citations is described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
This idea had already been exploited at the NTCIR workshop series (http://research.nii.ac.jp/ntcir/). Further discussions within
the 1st IRF Symposium in 2007 (http://www.ir-facility.org/symposium/irf-symposium-2007/the-working-groups) led to a clearer formalization of the method.
      </p>
      <p>For CLEF-IP 2009 we used an extended list of citations that includes not only patents cited
directly by the patent topic, but also patents cited by patent family members and family members
of cited patents. By means of patent families, we were able to increase the number of citations by
a factor of seven. Figure 1 illustrates the process of gathering direct and extended citations.</p>
      <p>
        A patent family consists of patents granted by different patent authorities but related to the
same invention (one also says that all patents in a family share the same priority data). For CLEF-IP,
this close (also called simple) patent family definition was applied, as opposed to the extended
patent family definition, which also includes patents related via a split of one patent application
into two or more patents. Figure 1 (from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) illustrates an example of extended families.
      </p>
      <p>In the process of gathering citations, patents from approximately 70 different patent offices (including
the USPTO, SIPO, JPO, etc.) were considered. Out of the resulting lists of citations, all non-EPO
patents were discarded, as they were not present in the target data set and thus not relevant to
our track.</p>
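      <p>The extension step can be sketched as set operations over the citation and family relations: direct citations, citations of family members, and family members of all cited patents. The toy data below is invented for illustration.</p>

```python
# Extend a patent's direct citations via patent families (cf. Figure 1).
def extended_citations(source, cites, family_of):
    """cites: {patent: set of cited patents}; family_of: {patent: set of family members}."""
    result = set(cites.get(source, set()))                  # Type 1: direct citations
    for member in family_of.get(source, set()):
        result |= cites.get(member, set())                  # Type 2: cited by family members
    for cited in set(result):
        result |= family_of.get(cited, set())               # Types 3 and 4: family of cited
    return result - {source}

cites = {"EP1": {"EP2"}, "US1": {"EP3"}}
family = {"EP1": {"US1"}, "EP2": {"US2"}}
extended = extended_citations("EP1", cites, family)  # {'EP2', 'EP3', 'US2'}
```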
      <p>
        Characteristics of patent citations as relevance judgments.
What is to be noted when using citation lists as relevance judgments is that:
- citations have different degrees of relevancy (e.g. applicants sometimes cite patents that are not
really relevant). This can be spotted easily by labeling citations as coming from the applicant or from
the examiner, and patent experts advise choosing patents with fewer than 25-30 citations coming
from the applicant;
- the lists are incomplete: even though, by considering patent families and opposition
procedures, we have quite good lists of judgments, the nature of the search is such that it often
stops when it finds one or only a few documents that are very relevant for the patent. The
Guidelines for Examination in the EPO [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] prescribe that if the search results in several
documents of equal relevance, the search report should normally contain no more than one of
them.
      </p>
    </sec>
    <sec id="sec-12">
      <title>Incompleteness of the recall base</title>
      <p>This means that we have incomplete recall bases, which must be taken into account
when interpreting the evaluation results presented here.</p>
    </sec>
    <sec id="sec-13">
      <title>Further automatic methods</title>
      <p>To conclude this section, we describe further possibilities for extending the set of relevance
judgments. These sources have not been used in the current evaluation procedure, as they seem to be
less reliable indicators of relevancy. Nevertheless, they are interesting avenues to consider in the
future, which is why they are mentioned here:</p>
      <p>A list of citations can be expanded by looking at patents cited in cited patents, if we assume
some level of transitivity of this relation. It is, however, arguable how relevant a patent C is to
patent A if we have something like A cites B and B cites C. Moreover, such a judgment cannot
be made automatically.</p>
      <p>In addition, a number of other features of patents can be used to identify potentially relevant
documents: co-authorship (in this case, "co-inventorship"), if we assume that an inventor generally
has one area of research; co-ownership, if we assume that a company specializes in one field; or
co-classification, if two patents are classified in the same class according to one of the different
classification models at different patent offices. Again, these features would require intellectual
effort to consider.</p>
      <p>
        Recently, a new approach for extracting prior art items from citations has been presented in
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Manual relevance assessment by patent experts. A number of patent experts were contacted for the manual assessment of a small part of the track's
experimental results. Communicating the project's goals and procedures was not an easy task,
nor was it easy to motivate them to invest time in this assessment activity. Nevertheless, a total
of 7 experts agreed to assess the relevance of retrieved patents for one or more topics. Topics
were chosen by the experts out of our collection according to their areas of expertise. A limit of
around 200 retrieved patents to assess seemed to be an acceptable amount of work. This
limit allowed us to pool experimental data up to depth 20.</p>
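      <p>The pooling step itself is simple to sketch: take the top-ranked results of every run for a topic, up to the chosen depth, and merge them so that each patent is assessed only once. The identifiers below are invented for illustration.</p>

```python
# Merge the top-k results of several runs into one assessment pool.
def pool(runs_for_topic, depth=20):
    """runs_for_topic: list of ranked doc-id lists -> set of documents to assess."""
    pooled = set()
    for run in runs_for_topic:
        pooled.update(run[:depth])          # top `depth` entries of each run
    return pooled

pooled = pool([["EP1", "EP2", "EP3"], ["EP2", "EP4"]], depth=2)  # {'EP1', 'EP2', 'EP4'}
```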
      <p>The engagement of patent experts resulted in 12 topics assessed up to rank 20 for all runs. A
total of 3140 retrieval results were assessed with an average of 264 results per topic.</p>
      <p>The results were submitted too late to be included in the track's evaluation report. In the
section on evaluation activities, we report on the results obtained by using this
additional small set of data for evaluation, even though this collection is too small a sample to draw
any general conclusions.</p>
      <sec id="sec-13-1">
        <title>Submissions</title>
        <sec id="sec-13-1-1">
          <title>Submission format</title>
          <p>For all tasks, a submission consisted of a single ASCII text file containing at most 1,000 lines per
topic, in the standard format used for most TREC submissions: white space is used to separate
columns; the width of the columns is not important, but it is important to have exactly five
columns per line, with at least one space between the columns.</p>
          <p>EP1133908 Q0 EP1107664 ...
EP1133908 Q0 EP0826302 ...
EP1133908 Q0 EP0383071 ...</p>
          <p>- the first column is the topic number (a patent number);
- the second column is the literal Q0;
- the third column is the official document number of the retrieved document;
- the fourth column is the rank of the document retrieved;
- the fifth column shows the score (integer or floating point) that generated the ranking. This
score must be in decreasing order.</p>
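          <p>A minimal parser for one line of this five-column format could look as follows; the concrete rank and score values in the example are invented for illustration.</p>

```python
# Parse one line of the five-column TREC-style submission format.
def parse_run_line(line):
    parts = line.split()                    # any amount of whitespace separates columns
    if len(parts) != 5:
        raise ValueError(f"expected 5 columns, got {len(parts)}")
    topic, q0, doc, rank, score = parts
    return topic, q0, doc, int(rank), float(score)

row = parse_run_line("EP1133908 Q0 EP1107664 1 0.87")
# ('EP1133908', 'Q0', 'EP1107664', 1, 0.87)
```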
        </sec>
        <sec id="sec-13-1-2">
          <title>Submitted runs</title>
          <p>A total of 70 experiments from 14 different teams and 15 participating institutions (the University
of Tampere and SICS joined forces) were submitted to CLEF-IP 2009. Table 1 contains a list of all
submitted runs.</p>
          <p>Experiments ranged over all proposed tasks (one main task and three language tasks) and over
three (S, M, XL) of the proposed task sizes.</p>
        </sec>
        <sec id="sec-13-1-3">
          <title>Submission System</title>
          <p>Clear and detailed guidelines, together with automated format checks, are critical in managing
large-scale experimentation.</p>
          <p>For the upload and verification of runs, a track management system was developed based on
the open source document management system Alfresco (http://www.alfresco.com/) and the web interface
DoCASU (http://docasu.sourceforge.net/). The
system provides an easy-to-use web front-end that allows participants to upload and download
runs and any other type of file (e.g. descriptions of the runs). The system offers version control
as well as a number of syntactical correctness tests. The validation process that is triggered on
submission of a run returns a detailed description of the problematic content. This is added as
an annotation to the run and is displayed in the user interface. Most format errors were therefore
detected automatically and corrected by the participants themselves. Still, one error type passed
the validation and made the post-processing of some runs necessary: patents listed as relevant at
several different ranks for the same topic patent. Such duplicate entries were filtered out by us
before evaluation.</p>
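          <p>That filtering step can be sketched as keeping, per topic, only the first (best-ranked) occurrence of each patent; the tuples below are invented for illustration.</p>

```python
# Drop entries listing the same patent at several ranks for one topic,
# keeping the first (best-ranked) occurrence.
def drop_duplicates(rows):
    """rows: (topic, doc, rank, score) tuples, sorted by topic and rank."""
    seen, kept = set(), []
    for topic, doc, rank, score in rows:
        if (topic, doc) not in seen:
            seen.add((topic, doc))
            kept.append((topic, doc, rank, score))
    return kept

rows = [("EP1133908", "EP1107664", 1, 0.9),
        ("EP1133908", "EP1107664", 2, 0.8),
        ("EP1133908", "EP0826302", 3, 0.7)]
cleaned = drop_duplicates(rows)
```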
        </sec>
        <sec id="sec-13-1-4">
          <title>Description of Submitted Runs</title>
          <p>A comparison of the retrieval systems used in the CLEF-IP task is given in Table 2. The usage
of Machine Translation (MT) is displayed in the second column, showing that MT was applied
by only two groups, both using Google Translate. Methods used for selecting query terms are
listed in the third column. As CLEF-IP topics are whole patent documents, many participants
found it necessary to apply some kind of term selection in order to limit the number of terms
in the query. Methods for term selection based on term weighting are shown here, while
preselection based on patent fields is shown separately in Table 3. Given that each patent document
could contain fields in up to three languages, many participants chose to build separate indexes
per language, while others generated one mixed-language index or used text fields in only
one language, discarding information given in the other languages.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Approaches and Challenges</title>
      <p>The granularity of the index
varied too, as some participants chose to concatenate all text fields into one index field, while
others indexed different fields separately. In addition, several special indexes, like phrase or passage
indexes, concept indexes and IPC indexes, were used. A summary of which indexes were built and
which ranking models were applied is given in Table 2.</p>
      <p>As this was the first year of CLEF-IP, many participants were absorbed with understanding
the data and task and getting their systems running. The CLEF-IP track presented several
major challenges.</p>
    </sec>
    <sec id="sec-15">
      <title>Challenges</title>
      <p>- A new retrieval domain (patents) and task (prior art).</p>
      <p>- The large size of the collection.</p>
      <p>- The special language used in patents. Participants had not only to deal with
German, English and French text, but also with the peculiarities of patent-specific language
("Patentese").</p>
      <p>- The large size of topics. In most CLEF tracks, a topic consists of a few selected query
words, while for CLEF-IP a topic consists of a whole patent. The prior art task might
thus also be tackled from the viewpoint of document similarity or, as proposed by
NLEL, as a plagiarism detection task.</p>
      <p>- Cross-linguality: participants approached the multilingual nature of the CLEF-IP document
collection in different ways. Some groups, like clefip-ug or UAIC, did not focus on the
multilingual nature of the data. Other participants, like Hildesheim and clefip-dcu, chose to use only
data in one specific language, while many others used several monolingual retrieval systems
to retrieve relevant documents and merged their results. Two groups made use of machine
translation: UTASICS used Google Translate in the main task to make patent fields available
in all three languages; they report that using the Google translation engine actually
deteriorated their results. hcuge used Google Translate to generate the fields in the missing
languages in the monolingual tasks. humb applied cross-lingual concept tagging.</p>
      <p>- Several teams integrated patent-specific know-how in their retrieval systems.
Using classification information (IPC, ECLA) was mostly found helpful: several
participants used the IPC class in their query formulation as a post-ranking filter criterion.
While using IPC classes to filter generally improves the retrieval results, it also
makes it impossible to retrieve relevant patents that don't share an IPC class with the
topic. hcuge and humb exploited citation information given in the corpus.</p>
      <p>Apart from patent classification information and citations, further bibliographic data
(e.g. inventor, applicant, priority information) was used only by humb.</p>
      <p>Only few groups had patent expertise at the beginning of the track. Aware of this
problem some groups started cooperation with patent experts, like for example Utasics
who are currently analysing patent experts’ query formulation strategies.
² Even though query and indexing time were not evaluation criteria, participants had to start
thinking about performance due to the large amount of data.
• Different strategies were applied for indexing/ranking at the patent level. Several teams applied
the concept of virtual patent documents, introduced by the organizers in the presentation of
topics, for indexing a set of patent documents as a single entity.
• Some teams combined several different strategies in their systems: this was done on a large
scale by the humb team. cwi proposed a graphical user interface for combining search
strategies.
• The training set, consisting of 500 patents with relevance assessments, was used by almost all
of the participants, mostly for tuning and checking their strategies. humb also used the training
set for machine learning; for this aim it proved to be too small, and they generated
a larger one from the citations available in the corpus.
• Making the evaluation data available allowed many participants (among them Tud,
Utasics, Hildesheim, clefip-ug) to run additional experiments after the official evaluation.
They report on new insights obtained (e.g. further tuning and comparisons of approaches)
in their working notes papers.</p>
      <sec id="sec-15-1">
        <title>Results</title>
          <p>We evaluated the experiments with some of the most commonly used metrics for IR effectiveness
evaluation. A correlation analysis shows that the rankings of the systems obtained with different
topic sizes can be considered equivalent. The manual assessments obtained from patent experts
allowed us to perform some preliminary analysis on the completeness of the automatically generated
set of relevance assessments.</p>
        <p>
          The complete collection of measured values for all evaluation bundles is provided in the Clef
Ip 2009 Evaluation Summary ([
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]). Detailed tables for the manually assessed patents will be
provided in a separate report ([
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]).
        </p>
        <sec id="sec-15-1-1">
          <title>Measurements</title>
          <p>After some corrections of data formats, we created experiment bundles based on size and task.
For each experiment we computed 10 standard IR measures, among them:
• MAP
• nDCG (with reduction factor given by a logarithm in base 10)</p>
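<p>As a rough sketch of the two measures named above, assuming binary relevance and, per the text, a base-10 logarithmic discount for nDCG (with the discount floored at 1 so early ranks are not divided by log10 values below one; the exact variant used by the track is not specified here):</p>

```python
import math

def average_precision(ranked_ids, relevant_ids):
    """AP for one topic: mean of precision@k taken at the ranks of the
    relevant documents; MAP is the mean of AP over all topics."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def ndcg_log10(ranked_ids, relevant_ids):
    """nDCG with binary gains and a log10 rank discount (assumed variant:
    discount = max(1, log10(rank)), normalized by the ideal ranking)."""
    def discount(k):
        return max(1.0, math.log10(k))
    dcg = sum(1.0 / discount(k)
              for k, d in enumerate(ranked_ids, start=1) if d in relevant_ids)
    ideal = sum(1.0 / discount(k) for k in range(1, len(relevant_ids) + 1))
    return dcg / ideal if ideal else 0.0
```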
          <p>All computations were done with Soire (http://soire.matrixware.com), a software for IR evaluation based on a service-oriented
architecture. Results were double-checked against trec_eval (http://trec.nist.gov/trec_eval), the standard program
for evaluation used in the Trec evaluation campaign, except for nDCG, for which, at the time of
the evaluation, we were not aware of a publicly available implementation.</p>
          <p>MAP, recall@100 and precision@100 of the best run for each participant are listed in Table 4
and illustrated in Figure 3. The values were calculated on the small topic set. The MAP values
range from 0.0031 to 0.27 and are quite low in comparison with other CLEF tracks. The precision
values are generally low, but it must be noted that the average topic had 6 relevant documents,
meaning that the upper bound for precision@100 was 0.06. Recall@100, a highly important
measure in prior art search, ranges from 0.02 to 0.57. It must be noted that these low values might be
due to the incompleteness of the automatically generated set of relevance assessments.</p>
          <p>In order to see whether the evaluations obtained with the three different bundle sizes (S, M,
XL) could be considered equivalent, we did a correlation analysis comparing the vectors of MAPs
computed for each of the bundles.</p>
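<p>The precision@100 ceiling noted above follows directly from the cutoff arithmetic; a minimal illustration with made-up document identifiers:</p>

```python
def precision_at_k(retrieved, relevant, k=100):
    """Fraction of the top-k retrieved documents that are relevant.
    The denominator is always k, so with r relevant documents in total
    the value can never exceed r / k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

# With only 6 relevant documents per topic, even a perfect run that
# ranks all of them first cannot exceed 6/100 = 0.06 at cutoff 100.
relevant = {f"d{i}" for i in range(6)}
perfect_run = [f"d{i}" for i in range(100)]
ceiling = precision_at_k(perfect_run, relevant)
```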
          <p>In addition, we also evaluated the results obtained by the track's participants for the
12 patents that were manually assessed by patent experts. We evaluated the runs from three
bundles, extracting only the 12 patents (when present) from each run file. We called these three
extra-small evaluation bundles ManS, ManM, and ManXL. Table 5 lists Kendall's τ
and Spearman's ρ for all compared rankings.</p>
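<p>Both rank correlations have simple closed forms; a stdlib-only sketch for comparing two vectors of per-system MAP values (no tie handling, which suffices when all scores are distinct; the MAP numbers below are invented for illustration):</p>

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (untied data)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# MAP of the same five systems under two hypothetical topic bundles;
# identical orderings yield perfect correlation under both measures.
maps_s  = [0.27, 0.15, 0.10, 0.05, 0.003]
maps_xl = [0.25, 0.16, 0.09, 0.06, 0.004]
```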
          <p>Figures 4 and 5 illustrate the correlation between pairs of bundles together with the best
least-squares linear fit.</p>
          <p>The rankings obtained with topic sets S, M, and XL are highly correlated, suggesting that the
three bundles can be considered equivalent for evaluation purposes. As expected, the correlation
between the S, M, XL rankings by MAP and the respective ManS, ManM, ManXL rankings drops drastically.</p>
          <p>It must however be noted that the limited number of patents in the manual assessment bundle
(12) is not sufficient for drawing any conclusion. We hope to be able to collect more data in the
future in order to assess the quality of our automatically generated test collection.
Patent experts marked on average 8 of the proposed patents as relevant to the seed patent. For
comparison:
• 5.4 is the average number of citations for the 12 seed patents that were assessed manually
• for the whole collection, there are on average 6 citations per patent</p>
          <p>Furthermore, some of the automatically extracted citations (13 out of 34) were marked as not
relevant by patent experts. Again, in order to obtain meaningful results, a larger set of data is
needed.</p>
        </sec>
      </sec>
      <sec id="sec-15-2">
        <title>Lessons Learned and Plans for 2010</title>
        <p>In the 2009 collection, only patent documents with data in French, English, and German were
included. One area in which to extend the track for 2010 is to provide additional patent data in more
European languages.</p>
        <p>Patents are organized in what are known as patent families. A patent might originally be
filed in France in French, and then, to ease enforcement of that patent in the United
States, a related patent might subsequently be filed in English with the US Patent and Trademark Office.
Although the full text of the patent will not be a direct translation of the French (for example
because of different formulaic legal wordings), the two documents may be comparable, in the sense
of a comparable corpus in machine translation. It might be that such comparable data will be
useful to participants to mine for technical and other terms. The 2009 collection does not lend
itself to this use, and we will seek to make the collection more suitable for that purpose.</p>
        <p>
          For the first year we measured the overall effectiveness of systems. A more realistic evaluation
should be layered, in order to measure the contribution of each single component to the overall
effectiveness results, as proposed in the GRID@CLEF track ([
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) and also by [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The analysis of the
data should also be statistically grounded.
        </p>
        <p>The 2009 task was also somewhat unrealistic in terms of a model of the work of patent
professionals. Real patent searching often involves many cycles of query reformulation and results
review, rather than one-off queries and result sets. In 2010 we would like to move to a more
realistic model.</p>
      </sec>
      <sec id="sec-15-3">
        <title>Epilogue</title>
        <p>CLEF-IP has to be regarded as a major success: looking at previous CLEF tracks, we regarded
four to six groups as a satisfactory first-year participation rate. Fifteen is a very satisfactory
number of participants - a tribute to those who did the work and to the timeliness of the task and
data. In terms of retrieval effectiveness the results have proved hard to evaluate: if there is an
overall conclusion, it is that an effective combination of a wide range of indexing methods works best,
rather than a single silver bullet or wooden cross. However, some of the results from groups other than
Humboldt University indicate that specific techniques may work well; we look forward to more
results next year. It is also unclear how well the 2009 task and methodology map to what makes
a good (or better) system from the point of view of patent searchers - this is an area where we
clearly need to improve. Finally, we need to be clear that a degree of caution is needed for what
is inevitably an initial analysis of a very complex set of results.</p>
        <sec id="sec-15-3-1">
          <title>Acknowledgements</title>
          <p>We would like to thank Judy Hickey, Henk Tomas and all the other patent experts who helped us
with manual assessments and who shared their know-how on prior art searches with us. Thanks to
Evangelos Kanoulas and Emine Yilmaz for interesting discussions on creating large test collections.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>European Patent Convention (EPC)</article-title>
          . http://www.epo.org/patents/law/legal-texts.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Guidelines for Examination in the European Patent Office</article-title>
          . http://www.epo.org/patents/ law/legal-texts/guidelines.html,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Stephen R.</given-names>
            <surname>Adams</surname>
          </string-name>
          .
          <article-title>Information sources in patents</article-title>
          . K.G. Saur,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Dealing with multilingual information access: Grid experiments at trebleclef</article-title>
          . In
          <string-name>
            <given-names>M.</given-names>
            <surname>Agosti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Thanos</surname>
          </string-name>
          , editors,
          <source>Post-proceedings of the Fourth Italian Research Conference on Digital Library Systems (IRCDL</source>
          <year>2008</year>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Atsushi</given-names>
            <surname>Fujii</surname>
          </string-name>
          , Makoto Iwayama, and
          <string-name>
            <given-names>Noriko</given-names>
            <surname>Kando</surname>
          </string-name>
          .
          <article-title>Overview of the Patent Retrieval Task at the NTCIR-6 Workshop</article-title>
          . In Noriko Kando and David Kirk Evans, editors,
          <source>Proceedings of the Sixth NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval</source>
          , Question Answering, and Cross-Lingual Information Access, pages
          <fpage>359</fpage>
          <lpage>365</lpage>
          ,
          2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan, May
          <year>2007</year>
          . National Institute of Informatics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Graf</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          .
          <article-title>A Methodology for Building a Patent Test Collection for Prior art Search</article-title>
          .
          <source>In Proceedings of the Second International Workshop on Evaluating Information Access (EVIA)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Toward automated component-level evaluation</article-title>
          .
          <source>In SIGIR Workshop on the Future of IR Evaluation</source>
          , Boston, USA, pages
          <fpage>29</fpage>
          <lpage>30</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>David</given-names>
            <surname>Hunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Long</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Rodgers</surname>
          </string-name>
          .
          <article-title>Patent searching : tools and techniques</article-title>
          . Wiley,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Noriko</given-names>
            <surname>Kando</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mun-Kew</given-names>
            <surname>Leong</surname>
          </string-name>
          .
          <source>Workshop on Patent Retrieval (SIGIR 2000 Workshop Report)</source>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ):
          <fpage>28</fpage>
          <lpage>30</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Organisation for Economic Co-operation and Development (OECD)</article-title>
          .
          <source>OECD Patent Statistics Manual, Feb</source>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Florina</given-names>
            <surname>Piroi</surname>
          </string-name>
          , Giovanna Roda, and
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Zenz</surname>
          </string-name>
          .
          <article-title>CLEF-IP 2009 Evaluation Summary</article-title>
          .
          <source>July</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Florina</given-names>
            <surname>Piroi</surname>
          </string-name>
          , Giovanna Roda, and
          <string-name>
            <given-names>Veronika</given-names>
            <surname>Zenz</surname>
          </string-name>
          .
          <article-title>CLEF-IP 2009 Evaluation Summary part II (in preparation)</article-title>
          .
          <source>September</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Giovanna</given-names>
            <surname>Roda</surname>
          </string-name>
          , Veronika Zenz, Mihai Lupu, Kalervo Järvelin, Mark Sanderson, and
          <string-name>
            <given-names>Christa</given-names>
            <surname>Womser-Hacker</surname>
          </string-name>
          .
          <article-title>So Many Topics, So Little Time</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>16</fpage>
          <lpage>21</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Shahzad</given-names>
            <surname>Tiwana</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ellis</given-names>
            <surname>Horowitz</surname>
          </string-name>
          .
          <article-title>FindCite: automatically finding prior art patents</article-title>
          .
          <source>In PaIR '09: Proceeding of the 1st ACM workshop on Patent information retrieval. ACM</source>
          , to appear.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>Topic set size redux</article-title>
          .
          <source>In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>806</fpage>
          <lpage>807</lpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Ellen M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <article-title>The effect of topic set size on retrieval experiment error</article-title>
          .
          <source>In SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>316</fpage>
          <lpage>323</lpage>
          , New York, NY, USA,
          <year>2002</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>