<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Patent classification experiments with the Linguistic Classification System LCS in CLEF-IP 2011</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suzan Verberne</string-name>
          <email>s.verberne@cs.ru.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva D'hondt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Foraging Lab, Radboud University Nijmegen</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We report the results of a series of classification experiments with the Linguistic Classification System LCS in the context of CLEF-IP 2011. We participated in the main classification task: classifying documents at the subclass level. We investigated (1) the use of different sections (abstract, description, metadata) from the patent documents; (2) adding dependency triples to the bag-of-words representation; (3) adding the WIPO corpus to the EPO training data; (4) the use of patent citations in the test data for reranking the classes; and (5) the threshold on the class scores for class selection. We found that adding full descriptions to abstracts gives a clear improvement; the first 400 words of the description also improve classification, but to a lesser degree. Adding metadata (applicants, inventors and address) did not improve classification. Adding dependency triples to words gives a much higher recall at the cost of a lower precision, but this effect is largely due to the class selection threshold. We found no effect from adding the WIPO corpus, nor from reranking with patent citations. In future work, we plan to investigate whether other methods for reranking with patent citations do give an improvement, because we believe that the citations may still provide valuable information. Our most important finding, however, concerns the threshold on class selection. In the current work, we compared only two threshold values, and the results are much better for 1.0 than for 0.5. The 0.5 threshold gives higher recall in all runs, which was the original motivation for submitting runs with a lower threshold. However, because of the much lower precision, the F-scores are lower. We think that some improvement can still be gained from properly tuning the class selection threshold and from using a flexible threshold (one that also takes the different text representations into account). This is part of our future work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper, we describe the classification experiments that we conducted in the context of the
Intellectual Property (IP) track at CLEF 2011 (CLEF-IP, http://www.ir-facility.org/clef-ip). The track was
first organized in 2009, with a prior art retrieval task. In 2010, a classification task was added to the track.
In 2011, this task was continued and extended with a new optional sub-task: classifying a given patent
document down to the subgroup level when its subclass is given. We participated only in the main
classification task: classifying documents at the subclass level.</p>
      <p>The goal of the classification task at CLEF-IP is to classify a given patent document according to the
International Patent Classification (IPC) system. For the purpose of the track, the organizers released a
collection of 2.6 million patent documents from the European Patent Office (EPO), extended with 400,000
documents from the World Intellectual Property Organization (WIPO). These 3 million documents, with
content in English, German and French, pertain to over 1 million patents (a patent is a group of patent
documents that relate to the same invention and share the same patent ID number). From the collection,
1,000 documents (the 'topics') per language were held out as the test set. The remainder of the corpus
constitutes the target data, on which participants could develop their methods.</p>
      <p>In this notebook paper, we describe our classification experiments with the Linguistic
Classification System LCS. We performed only monolingual classification, training and evaluating
our models on English texts. We evaluate five classification variables: (1) the use
of different patent sections, (2) adding dependency triples to the bag-of-words representation, (3)
expanding the EPO training corpus with WIPO documents, (4) using patent citations to rerank
the selected classes, and (5) tuning the threshold on class selection.</p>
      <p>In Section 2, we describe the data selection, the data preparation and the classification settings
used. The results of the classification experiments are presented in Section 3, followed by our
conclusions in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Classification experiments with LCS</title>
      <p>
        For our classification experiments, we used the Linguistic Classification System (LCS)<sup>3</sup> [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. The
LCS can perform both mono-classification (each document is assigned exactly one class label) and
multi-classification. In the training phase, the LCS takes as input a file that lists the paths to
the training documents, each followed by its classes. After this training phase, the LCS can be used
either to test the trained classifier on a collection of documents with known classes (usually held-out
training data) or to classify new documents whose classes are unknown.
      </p>
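      <p>The training input described above (one document path followed by its classes) can be sketched as follows. This is a minimal illustration under an assumption: the exact file layout that LCS expects is not given here, so the tab-separated layout, the file names and the function name write_training_list are all hypothetical.</p>

```python
import io


def write_training_list(examples, out):
    """Write one line per training document: the file path, a tab, then its classes.

    The tab/space-separated layout is a hypothetical stand-in for the LCS input format.
    examples: iterable of (path, [class labels]) pairs; out: a writable text stream.
    """
    for path, classes in examples:
        if not classes:
            continue  # a document without class labels gives no training signal
        out.write(path + "\t" + " ".join(classes) + "\n")


buf = io.StringIO()
write_training_list([("train/doc001.txt", ["F01N", "F02D"]), ("train/doc002.txt", ["B60K"])], buf)
print(buf.getvalue(), end="")
```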
      <p>
        Three classifiers have been implemented in the LCS: Naive Bayes, Winnow and SVMlight. Last
year [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we experimented with both Winnow and SVMlight and found that their classification
accuracy scores are comparable, but that SVMlight is much slower. We therefore used
Winnow for this year's CLEF-IP experiments. Winnow has a number of parameters that can be
tuned: the promotion parameter α, the demotion parameter β, and maxiters (the number of training
iterations). Based on the tuning we did last year, we set α = 1.02, β = 0.98 and maxiters = 10.
      </p>
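      <p>To make the roles of the promotion factor (alpha = 1.02), the demotion factor (beta = 0.98) and maxiters = 10 concrete, the following is a minimal sketch of mistake-driven training for one class (one-vs-rest) with Balanced Winnow, a common Winnow variant. It is not the LCS implementation: whether LCS uses exactly this variant is an assumption, as are the initial weights (2.0 and 1.0), the single training threshold theta = 1.0 and the normalisation of feature values per document. The toy documents are hypothetical.</p>

```python
from collections import defaultdict

# Assumed initial weights for the positive and negative weight vectors.
INIT_POS, INIT_NEG = 2.0, 1.0


def score(w_pos, w_neg, doc):
    """Balanced Winnow score of a document given as a set of features.

    Each feature gets the value 1/len(doc), so a document's feature values sum to 1.
    Unseen features fall back to the initial weights.
    """
    x = 1.0 / len(doc)
    return sum((w_pos.get(f, INIT_POS) - w_neg.get(f, INIT_NEG)) * x for f in doc)


def train_balanced_winnow(docs, labels, alpha=1.02, beta=0.98, theta=1.0, maxiters=10):
    """Mistake-driven training of one binary (one-vs-rest) Balanced Winnow classifier.

    docs: list of feature sets; labels: list of bools (True if the doc has this class).
    """
    w_pos = defaultdict(lambda: INIT_POS)
    w_neg = defaultdict(lambda: INIT_NEG)
    for _ in range(maxiters):
        mistakes = 0
        for doc, is_member in zip(docs, labels):
            s = score(w_pos, w_neg, doc)
            if is_member and not s > theta:
                up, down = alpha, beta  # missed a member: promote its active features
            elif not is_member and s > theta:
                up, down = beta, alpha  # false alarm: demote its active features
            else:
                continue
            mistakes += 1
            for f in doc:
                w_pos[f] *= up
                w_neg[f] *= down
        if mistakes == 0:  # a full error-free pass: stop before maxiters
            break
    return w_pos, w_neg


docs = [{"engine", "piston"}, {"engine", "valve"}, {"bread", "flour"}, {"bread", "yeast"}]
labels = [True, True, False, False]
w_pos, w_neg = train_balanced_winnow(docs, labels)
print(score(w_pos, w_neg, {"engine", "valve"}) > 1.0)  # True
print(score(w_pos, w_neg, {"bread", "yeast"}) > 1.0)   # False
```

      <p>Because alpha and beta are close to 1, each mistake changes the weights only slightly, which is why several training iterations (maxiters) are needed.</p>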
      <p>In our classification experiments, we compared the following experimental settings:
1. The use of different sections (abstract, description, metadata) from the patent documents;
2. The use of different document representations for classification: adding dependency triples to
the bag-of-words representation;
3. The training corpus selection: EPO only, or EPO and WIPO together;
4. The use of patent citations in the test patents for reranking the assigned classes;
5. The threshold on the class scores for class selection.</p>
      <p>We will explain how we prepared the experiments for each of these comparisons in the following
subsections.
</p>
      <p>Corpus preparation: extracting IPC classes and sections
From all patents in the target data, we extracted the information needed for classification: the
IPC-R classes; the textual content of the English abstract and description; and the applicants, inventors
and address as additional metadata. For each patent, we selected the most recent version that
contains all the information needed (e.g., in the corpus directory EP/000000/00/59/01/, EP-0005901-A3.xml
is newer than EP-0005901-A2.xml, and both are newer than EP-0005901-B1.xml). Table 1 shows the size of
the training corpus when particular patent sections are included. We allowed the abstract to be empty if
either the description or the metadata section contains content. As a result, the subcorpus 'abstract and
metadata' is the largest: 1.3M documents, some of which contain only metadata.</p>
      <p>We separately extracted the first 400 words of the description, because the experiences of other
participants in last year's workshop [<xref ref-type="bibr" rid="ref5">5</xref>] taught us that the head of the
description is a good alternative to the complete description, which may be too heavy to classify because
of its length. We conducted experiments to validate this assumption.</p>
      <p><sup>3</sup> A demo of the application can be found at http://ir-facility.net/news/linguistic-classification-systemprototype/ for registered IRF members.</p>
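      <p>Assembling the training text for one patent and one subcorpus can be sketched as below. The sketch assumes that a patent is kept whenever at least one requested section has content, which generalises the rule for empty abstracts stated above, and that "the first 400 words" means the first 400 whitespace-separated tokens; the tokenisation actually used is not specified. The function names and the sample patent are hypothetical.</p>

```python
def first_n_words(text, n=400):
    """Return the first n whitespace-separated words of text."""
    return " ".join(text.split()[:n])


def build_training_text(patent, sections):
    """Concatenate the requested sections of one patent.

    patent: dict mapping section names ("abs", "desc", "meta") to text.
    sections: list of section names; "desc400" means the first 400 words of "desc".
    Returns None when every requested section is empty, so the patent is skipped.
    """
    parts = []
    for name in sections:
        if name == "desc400":
            text = first_n_words(patent.get("desc", ""))
        else:
            text = patent.get(name, "")
        if text.strip():  # an empty section (e.g. a missing abstract) is left out
            parts.append(text.strip())
    return "\n".join(parts) if parts else None


patent = {"abs": "", "desc": "word " * 1000, "meta": "applicant inventor address"}
print(build_training_text(patent, ["abs", "meta"]))          # applicant inventor address
print(len(build_training_text(patent, ["desc400"]).split()))  # 400
```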
      <p>
        Different document representations: adding triples to words
In CLEF-IP 2010, we experimented with adding dependency triples to the bag-of-words
representation that is generally used in text classification. The results on the 2010 test set were
mixed [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but in follow-up experiments [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] we consistently found a significant improvement in
F-score when we added dependency triples to the word-based representation of patent abstracts.
      </p>
      <p>
        This year, we again investigated the improvement that can be gained from adding dependency
triples to the bag of words, but we did not limit ourselves to the classification of abstracts. We parsed
the abstracts and the first 400 words of the descriptions with version 1.8.2 of the AEGIR dependency parser [
        <xref ref-type="bibr" rid="ref4 ref7">4, 7</xref>
        ]. AEGIR's output representation is comparable to the Stanford typed dependencies
representation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], in the sense that it generates a set of binary relations between words for an
input sentence, thereby converting some function words (such as prepositions) into relations. In
addition, AEGIR performs a number of normalizing transformations, such as
passive-to-active transformation. For example, the clause "an inflammatory reaction, caused by the bowel
tissue" leads to the same analysis as "the bowel tissue causes an inflammatory reaction". An
example of the triple representation can be found in Figure 1 below [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
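      <p>The combined words+triples representation can be sketched as a bag of features in which every word and every dependency triple counts as a term. This assumes the parser output is already available as (head, relation, dependent) tuples; the lower-casing, the removal of numeric tokens (suggested by the words column of Figure 1, where "20" is absent) and the bracketed string encoding of triples are illustrative choices, not the documented AEGIR or LCS format. The function name is hypothetical.</p>

```python
from collections import Counter


def words_and_triples(tokens, triples):
    """Bag of features: lower-cased alphabetic words plus one feature per dependency triple.

    tokens: the words of a sentence; triples: (head, relation, dependent) tuples,
    assumed to come from a dependency parser such as AEGIR.
    """
    words = (t.strip(".,;:()").lower() for t in tokens)
    bag = Counter(w for w in words if w.isalpha())  # drops numbers such as "20"
    bag.update("[" + ",".join(t) + "]" for t in triples)
    return bag


sentence = ("Heat is stored at a steady temperature using calcium chloride hexahydrate "
            "and up to 20 percent strontium chloride hexahydrate to assist crystallisation.")
triples = [("IT", "SUBJ", "store"), ("store", "OBJ", "heat"),
           ("hexahydrate", "ATTR", "chloride"), ("hexahydrate", "ATTR", "chloride")]
features = words_and_triples(sentence.split(), triples)
print(features["chloride"], features["[hexahydrate,ATTR,chloride]"])  # 2 2
```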
      <p>Figure 1.
Original text: Heat is stored at a steady temperature using calcium chloride hexahydrate and up to 20 percent strontium chloride hexahydrate to assist crystallisation.
Words: heat is stored at a steady temperature using calcium chloride hexahydrate and up to percent strontium chloride hexahydrate to assist crystallisation
Triples: [IT,SUBJ,store] [store,OBJ,heat] [store,PREPat,temperature] [temperature,ATTR,steady] [temperature,DET,a] [chloride,ATTR,calcium] [hexahydrate,ATTR,chloride] [hexahydrate,ATTR,using] [up,PREPto,20 percent] [assist,OBJ,crystallization] [chloride,ATTR,strontium] [hexahydrate,ATTR,chloride] [hexahydrate,SUBJ,assist]</p>
      <p>In text classification, system performance usually improves as the size of the training set
increases. Although the CLEF-IP test set consists only of documents from the EPO corpus, we
investigated whether adding documents from another corpus, namely the WIPO corpus, to the EPO training
set leads to improvements in classification accuracy. We added the WIPO corpus to two of our section
subcorpora: 'abstracts and description' and 'abstracts, description and metadata'. Table 2 shows the
resulting document counts for the training corpora. The table makes clear that the WIPO
corpus contains fewer documents with the metadata fields applicants, inventors and address than
the EPO corpus.</p>
      <p>The use of patent citations for reranking the classes
Some of the patent files (topics) in the test set contain citations to other EPO patents. We used
these citations to rerank the LCS output using the following procedure:
1. For each topic, we extracted the patents that are cited by the topic (labelled as patcit in the
XML file);
2. We looked up each of the citations in the training corpus and extracted their IPC-R classes.
We found that 562 of the 1,000 topics contain at least one cited patent with one or more
IPC-R classes;
3. These 'citation classes' get a vote each time they occur in a cited patent. Each vote adds 1.0
to the LCS score of the class, after which the classes are re-ranked by their new scores.</p>
      <p>For example, in one of the experiments, LCS selected five classes for the topic EP-1223323-A2,
ranked as follows: F01N, F02D, B60W, B60K, F02N. Of these, F01N (1x), B60K (2x) and B60W (2x)
occur in the citations of EP-1223323-A2. Their classification scores are increased by the number of
times they occur in the citations, and the list of classes is re-ranked: F01N, B60W, B60K, F02D, F02N.</p>
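      <p>The reranking procedure can be sketched as follows. The LCS scores and the cited patents' class lists below are hypothetical values, chosen only so that the result reproduces the EP-1223323-A2 example order. We also assume that a class receives at most one vote per cited patent and that citation classes not selected by LCS are not added to the list; the function name is hypothetical.</p>

```python
from collections import Counter


def rerank_with_citations(lcs_scores, cited_patent_classes, vote=1.0):
    """Add `vote` to a selected class for every cited patent that carries that class,
    then return the selected classes sorted by the updated score (highest first).

    lcs_scores: dict mapping each class selected by LCS to its score.
    cited_patent_classes: one list of IPC-R classes per cited patent found in the corpus.
    """
    votes = Counter(c for classes in cited_patent_classes for c in set(classes))
    updated = {c: s + vote * votes[c] for c, s in lcs_scores.items()}
    return sorted(updated, key=updated.get, reverse=True)


# Hypothetical scores, in the original LCS order F01N, F02D, B60W, B60K, F02N.
lcs_scores = {"F01N": 3.2, "F02D": 2.5, "B60W": 2.0, "B60K": 1.8, "F02N": 1.2}
# Hypothetical citations: F01N occurs once, B60W and B60K twice.
cited = [["F01N", "B60W", "B60K"], ["B60W", "B60K", "F16H"]]
print(rerank_with_citations(lcs_scores, cited))
# ['F01N', 'B60W', 'B60K', 'F02D', 'F02N']
```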
      <p>
        The threshold on the class scores for class selection
In the case of multi-classification, LCS is flexible with respect to the number of classes that are
returned per document. Internally, it produces a full ranking of classes for each document in the
test set. The user can regulate the selection of classes with three parameters: (1) a threshold that
puts a lower bound on the classification score for a class to be selected, (2) the maximum number
of classes selected per document ('maxranks') and (3) the minimum number of classes selected
per document ('minranks'). In the experiments on the target data, we kept the selection threshold
at 1.0 (the default). Based on the average number of classes per document in the target
data (2.7 according to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), we set maxranks = 4. Setting minranks = 1 ensures that
each document is assigned at least one class, even if all classes have a score below the threshold.
      </p>
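      <p>The interplay of the three selection parameters can be sketched as follows. This is not the LCS code: the function name and the class scores are hypothetical, and treating the threshold as inclusive (a score equal to the threshold is selected) is an assumption.</p>

```python
def select_classes(ranking, threshold=1.0, maxranks=4, minranks=1):
    """Select classes from a full ranking of (class, score) pairs.

    Keeps at most `maxranks` classes whose score reaches `threshold`; if fewer than
    `minranks` qualify, falls back to the `minranks` top-ranked classes.
    """
    ranking = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    selected = [c for c, s in ranking[:maxranks] if s >= threshold]
    if minranks > len(selected):
        selected = [c for c, _ in ranking[:minranks]]
    return selected


ranking = [("F01N", 1.6), ("F02D", 1.1), ("B60W", 0.7), ("B60K", 0.55), ("F02N", 0.3)]
print(select_classes(ranking))                              # ['F01N', 'F02D']
print(select_classes(ranking, threshold=0.5, maxranks=5))   # ['F01N', 'F02D', 'B60W', 'B60K']
print(select_classes([("A01B", 0.2), ("A01C", 0.1)]))       # ['A01B']
```

      <p>Lowering the threshold, as in the submitted runs, lets more of the ranking through, until maxranks becomes the binding limit.</p>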
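      <p>In the submitted runs on the test data, we lowered the class selection threshold to 0.5,
because the value of 1.0 gives an average of 1.8 classes per test document, whereas 0.5 gives
an average of 3.2 classes. The latter seemed wiser for a recall-oriented task. We also increased
maxranks to 5. In additional experiments, we evaluated the results for a threshold of 1.0 against
those for a threshold of 0.5.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Run</th><th>Sections</th><th>Representation</th></tr>
          </thead>
          <tbody>
            <tr><td>ad400WT</td><td>abs, desc400</td><td>words+triples</td></tr>
            <tr><td>ad400WTcit</td><td>abs, desc400</td><td>words+triples</td></tr>
            <tr><td>ad400WT1</td><td>abs, desc400</td><td>words+triples</td></tr>
            <tr><td>aWT</td><td>abs</td><td>words+triples</td></tr>
            <tr><td>amWTcit</td><td>abs, meta</td><td>words+triples</td></tr>
            <tr><td>amWT</td><td>abs, meta</td><td>words+triples</td></tr>
            <tr><td>aW</td><td>abs</td><td>words</td></tr>
            <tr><td>aWcit</td><td>abs</td><td>words</td></tr>
            <tr><td>amW</td><td>abs, meta</td><td>words</td></tr>
            <tr><td>ad400W</td><td>abs, desc400</td><td>words</td></tr>
            <tr><td>admWOW</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>admWOWcit</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>admWcit</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>admW</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>adW</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>aW1</td><td>abs</td><td>words</td></tr>
            <tr><td>adWcit</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>adWOW</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>adWOWcit</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>amW1</td><td>abs, meta</td><td>words</td></tr>
            <tr><td>aWT1</td><td>abs</td><td>words+triples</td></tr>
            <tr><td>amWT1</td><td>abs, meta</td><td>words+triples</td></tr>
            <tr><td>admWOW1</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>admWOWcit1</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>ad400W1</td><td>abs, desc400</td><td>words</td></tr>
            <tr><td>admW1</td><td>abs, desc, meta</td><td>words</td></tr>
            <tr><td>adW1</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>adWcit1</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>adWOW1</td><td>abs, desc</td><td>words</td></tr>
            <tr><td>adWOWcit1</td><td>abs, desc</td><td>words</td></tr>
          </tbody>
        </table>
      </table-wrap>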
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>
        For training the classification models, we used the target data with the exception of the 2,000
most recent documents in the training corpus, which we used as the test set in the development stage.
A complete overview of the results on the real test data (the 1,000 topics provided by the track
organization) is shown in Table 3. Unlike last year, when we measured standard deviations
over multiple runs of the same experiment, we performed each experiment only once this year.
Our results on the 2010 data showed that standard deviations are small and that even small differences
in the results tend to be significant because of the large data set [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
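      <p>The development split can be sketched as follows, assuming that each document record carries a publication date. The field name pub_date, the records and the function name are hypothetical. The most recent documents form the development test set and the rest is used for training; in the experiments described here that test set contained 2,000 documents, while the example uses a much smaller one.</p>

```python
from datetime import date


def recency_split(docs, n_test=2000, key="pub_date"):
    """Split documents into (train, test), with the n_test most recent ones as test set."""
    ordered = sorted(docs, key=lambda d: d[key])
    cut = max(len(ordered) - n_test, 0)
    return ordered[:cut], ordered[cut:]


docs = [{"id": "d1", "pub_date": date(2001, 5, 1)},
        {"id": "d2", "pub_date": date(2005, 3, 9)},
        {"id": "d3", "pub_date": date(1998, 1, 20)}]
train, test = recency_split(docs, n_test=1)
print([d["id"] for d in train], [d["id"] for d in test])  # ['d3', 'd1'] ['d2']
```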
      <p>Figures 2 to 6 at the end of the paper show the effects of the different sections, text representations,
corpus selection, patent citations and class selection threshold, respectively (the five experimental
variables that we compare).</p>
      <p>
        Figure 2 shows that adding the description to the abstract gives a clear improvement in
classification accuracy: from 0.54 to 0.62 in F-score. Adding the first 400 words of the
description instead of the complete description has a smaller effect, giving an F-score of 0.60. Surprisingly,
adding metadata (applicants, inventors and address) to the abstracts and descriptions does not give
any improvement. This contrasts with last year's results, when some participants reported a
significant improvement from adding applicants, inventors and address as metadata [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Figure 3 shows that adding dependency triples to the bag-of-words representation has an
effect, but whether this effect is positive depends strongly on the evaluation measure used. Recall
is higher for the words+triples representation, but this comes at the cost of a much lower precision.
The experimental setting with the lowest F-score of all, ad400WT, has the highest recall of all
runs (0.87). We inspected the full ranking of the classes and found that, for the runs with
triples, the class scores are generally higher. This means that more classes get a score above the
fixed threshold of 0.5 (in fact, the average number of classes selected per patent for ad400WT is
5.0, which is the maximum number of selected classes). As a result, recall is higher and precision
is lower.</p>
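      <p>The trade-off described above can be illustrated with micro-averaged precision, recall and F1 on toy data. The documents and classes are hypothetical, and micro-averaging is an assumption, since the exact averaging of the official CLEF-IP evaluation is not described here. Selecting more classes per document raises recall but lowers precision, and in this example the F1 drops.</p>

```python
def micro_prf(predicted, gold):
    """Micro-averaged precision, recall and F1 over documents.

    predicted, gold: dicts mapping a document id to its list of classes.
    """
    tp = sum(len(set(predicted.get(d, [])).intersection(g)) for d, g in gold.items())
    n_pred = sum(len(set(p)) for p in predicted.values())
    n_gold = sum(len(set(g)) for g in gold.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {"d1": ["F01N", "B60K"], "d2": ["A61K"]}
few = {"d1": ["F01N"], "d2": ["A61K"]}
many = {"d1": ["F01N", "F02D", "B60K", "B60W"], "d2": ["A61K", "A61P", "C07D"]}
for name, pred in [("few classes", few), ("many classes", many)]:
    p, r, f = micro_prf(pred, gold)
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# few classes: P=1.00 R=0.67 F1=0.80
# many classes: P=0.43 R=1.00 F1=0.60
```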
      <p>Figure 4 shows that adding the WIPO documents to the EPO training corpus has no effect.
More data generally gives better classification results, but in this task and with this data,
increasing the number of training documents from 650K to 905K did not have any effect.</p>
      <p>Figure 5 shows that using patent citations in the test data to rerank the classes has
no visible effect either. We plan to investigate whether other methods for reranking with
patent citations do give an improvement, because we believe that the citations may still provide
valuable information.</p>
      <p>Figure 6 shows that the threshold on the class scores for class selection is highly important
for the evaluation scores. In the current work, we compared only two threshold values,
0.5 and 1.0, and the results are clearly much better for 1.0 than for 0.5. The 0.5
threshold gives higher recall in all runs, which was the original motivation for submitting runs
with a lower threshold. However, because of the much lower precision, the F-scores are lower. The
default LCS threshold of 1.0 is clearly the better choice here. We think that some
improvement can still be gained from properly tuning the class selection threshold and from using a
flexible threshold (one that also takes the different text representations into account). This is part of our
future work.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We reported the results of a series of classification experiments in the context of CLEF-IP 2011. We
investigated (1) the use of different sections (abstract, description, metadata) from the patent
documents; (2) adding dependency triples to the bag-of-words representation; (3) adding the WIPO
corpus to the EPO training data; (4) the use of patent citations in the test data for reranking the
classes; and (5) the threshold on the class scores for class selection.</p>
      <p>We found that adding full descriptions to abstracts gives a clear improvement; the first 400
words of the description also improve classification, but to a lesser degree. Adding metadata
(applicants, inventors and address) did not improve classification. Adding dependency triples to
words gives a much higher recall at the cost of a lower precision, but this effect is largely due to
the class selection threshold. We found no effect from adding the WIPO corpus, nor from
reranking with patent citations. Our most important finding concerns the threshold
on class selection. Our future work will be directed at tuning this threshold.</p>
      <p>Figure 3: Text representation (words vs. words+triples, abs); y-axis: P, R and F1 (0 to 1).</p>
      <p>Figure 5: Sections (abs; abs, desc; abs, desc, meta) and reranking with citations (yes/no); y-axis: P, R and F1 (0 to 1).</p>
      <p>Figure 6: Sections (abs; abs, meta; abs, desc; abs, desc, meta) and threshold for class selection (1.0 or 0.5); y-axis: P, R and F1 (0 to 1).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.C.</given-names>
            <surname>De Marneffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>The Stanford typed dependencies representation</article-title>
          . In
          <source>Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          . Association for Computational Linguistics,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.H.A.</given-names>
            <surname>Koster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seutter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Beney</surname>
          </string-name>
          .
          <article-title>Multi-classification of patent applications with Winnow</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          , pages
          <fpage>545</fpage>
          –
          <lpage>554</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Cornelis H.A.</given-names>
            <surname>Koster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean G.</given-names>
            <surname>Beney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Suzan</given-names>
            <surname>Verberne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Merijn</given-names>
            <surname>Vogel</surname>
          </string-name>
          .
          <article-title>Phrase-Based Document Categorization</article-title>
          . In W. Bruce Croft, Mihai Lupu, Katja Mayer, John Tait, and Anthony J. Trippe, editors,
          <source>Current Challenges in Patent Information Retrieval</source>
          , volume
          <volume>29</volume>
          of The Kluwer International Series on Information Retrieval, pages
          <fpage>263</fpage>
          –
          <lpage>286</lpage>
          . Springer Berlin Heidelberg,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Nelleke</given-names>
            <surname>Oostdijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Suzan</given-names>
            <surname>Verberne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cornelis H.A.</given-names>
            <surname>Koster</surname>
          </string-name>
          .
          <article-title>Constructing a broad coverage lexicon for text mining in the patent domain</article-title>
          . In
          <source>Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010)</source>
          . European Language Resources Association (ELRA),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Florina</given-names>
            <surname>Piroi</surname>
          </string-name>
          and
          <string-name>
            <given-names>John</given-names>
            <surname>Tait</surname>
          </string-name>
          .
          <article-title>CLEF-IP 2010: Retrieval Experiments in the Intellectual Property Domain</article-title>
          . In
          <source>CLEF 2010 LABs and Workshops Notebook Papers</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vogel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>D'hondt</surname>
          </string-name>
          .
          <article-title>Patent classification experiments with the Linguistic Classification System LCS</article-title>
          . In
          <source>Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2010)</source>
          , CLEF-IP workshop,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Suzan</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eva</given-names>
            <surname>D'hondt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nelleke</given-names>
            <surname>Oostdijk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cornelis H.A.</given-names>
            <surname>Koster</surname>
          </string-name>
          .
          <article-title>Quantifying the Challenges in Parsing Patent Claims</article-title>
          . In
          <source>Proceedings of the 1st International Workshop on Advances in Patent Information Retrieval (AsPIRe 2010)</source>
          , pages
          <fpage>14</fpage>
          –
          <lpage>21</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>