<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Report on the CLEF-IP 2012 Experiments: Exploring Passage Retrieval with the PIPExtractor</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Linda Andersson</string-name>
          <email>andersson@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Allan Hanbury</string-name>
          <email>hanbury@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Andreas Rauber</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Parvaz Mahdabi</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Lugano</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vienna University of Technology</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <abstract>
        <p>This technical report presents the work carried out for the Patent Passage Retrieval track of CLEF-IP 2012. Our aim was to create an IR-platform-independent module for the passage retrieval process. For document retrieval, a language model based on IPC classes was used, and for passage retrieval a Passage Intellectual Property Extractor (PIPExtractor) was implemented. Topics whose main language was not English were semi-manually translated using the EPO Google Translation service. We submitted five official runs: one retrieving documents only and four retrieving passages.</p>
      </abstract>
      <kwd-group>
        <kwd>Passage Retrieval</kwd>
        <kwd>Patent Search</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The CLEF-IP track started in 2009 with a Prior Art Candidate Search task; since then
several other tasks have been explored, e.g. text classification and image retrieval
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This year Passage Retrieval was introduced as the text mining task. Passage
retrieval had previously been explored at NTCIR-4 and NTCIR-5 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        For this track we set two preconditions for our passage retrieval module: (i) it
should be independent of the IR platform, i.e. it should not be built into the indices;
and (ii) it should take advantage of noun phrases, which have been shown to be effective
in patent document retrieval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Using noun phrases as a complement to the bag-of-words method in IR is motivated by
the fact that the majority of terms in technical dictionaries consist of more than one
word [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Such technical multi-word terms are noun phrases made up of adjectives and
nouns and, occasionally, prepositions (e.g. “of”).
      </p>
      <p>
        However, using NLP applications to improve IR results has not been straightforward:
statements such as “time is ripe to use Natural language processing for information
retrieval” [4, p1] have generally been followed by “the impact of NLP on information
retrieval task has largely been one of promise rather than substance” [5, p99]. Research
involving information retrieval (IR) and natural language processing (NLP) shows that
shallow linguistic methods such as stop-word removal and stemming yield significant
improvements, while deeper linguistic analyses such as part-of-speech tagging, parsing
and word sense disambiguation can even decrease accuracy [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Furthermore, using an NLP application without any adaptation to the patent
domain considerably degrades its performance [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        In our experiments we use a two-stage model consisting of a query model and a
passage model. Only open word classes (adjectives, verbs and nouns) and noun phrases
are used in the similarity computation between paragraphs. The noun phrase extraction
method re-uses the lexico-syntactic patterns used in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with an additional pattern covering noun phrases that contain a preposition.
      </p>
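      <p>As an illustration of the open-word-class restriction, the following minimal sketch keeps only adjectives, verbs and nouns from a part-of-speech-tagged sentence. It assumes Penn Treebank tags (the tag set produced by the Stanford tagger) and identifies the open classes by the tag prefixes JJ, VB and NN. The PIPExtractor itself was written in Perl; the Python helper open_class_terms is hypothetical and only illustrates the idea.</p>
      <preformat preformat-type="code">
```python
# Hypothetical helper (not the PIPExtractor's Perl code): keep only
# open-class words, identified by Penn Treebank tag prefixes.
OPEN_CLASS_PREFIXES = ("JJ", "VB", "NN")  # adjectives, verbs, nouns


def open_class_terms(tagged_sentence):
    """Return the lower-cased tokens whose POS tag marks an open word class.

    tagged_sentence is a list of (token, tag) pairs.
    """
    return [
        token.lower()
        for token, tag in tagged_sentence
        if tag.startswith(OPEN_CLASS_PREFIXES)
    ]


tagged = [
    ("A", "DT"), ("rotating", "VBG"), ("shaft", "NN"), ("is", "VBZ"),
    ("mounted", "VBN"), ("in", "IN"), ("the", "DT"),
    ("metal", "NN"), ("housing", "NN"),
]
print(open_class_terms(tagged))
# ['rotating', 'shaft', 'is', 'mounted', 'metal', 'housing']
```
      </preformat>
      <p>Auxiliaries such as “is” (tagged VBZ) pass this filter; a stop-word list could remove them if desired.</p>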
    </sec>
    <sec id="sec-2">
      <title>Patent Retrieval</title>
      <p>
        The most widely used model in patent search is the Boolean retrieval model, since it
is transparent and generates high recall if the query constructed by the expert is well
formed [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Here the search outcome lies in the hands of the searcher. When given a patent
application, the first task of a patent examiner is to identify its essential aspects and
extract terms that can be used in the search session. In practice this search is not
limited to patents, since no prior publication of any kind may exist if the uniqueness
and novelty requirements are to be met.
      </p>
      <p>
        In addition to the general linguistic IR problems presented in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], previous
studies of the patent genre have observed that patent writers intentionally use entirely
different word combinations, not only synonyms but also paraphrases, to recreate a
“concept” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In the patent genre, the selection of alternative concepts and search keys has
turned out to be a more severe problem, since a patent writer becomes his or her own
lexicographer [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Furthermore, given the diachronic nature of the patent genre, terms such as
“LP” and “water closet” can be regarded as instances of obsolescence [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The morphological variation of search keys in patents is reflected in the large
number of chemical formulae and in variant spellings such as “sulfur”/“sulphur” and
“aluminum”/“aluminium”.
      </p>
      <p>
        The problem of referred and omitted search keys encompasses anaphoric and
elliptical keys (e.g. pronouns and acronyms) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In the patent genre both standard and non-standard acronyms are used [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Search key ambiguity concerns polysemy or homography: in the patent genre,
phrases like “mouse trap” (a trap to catch mice, or a logic device) denote a wide variety
of concepts in different technological fields; these are so-called “shape shifters” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Consequently, the patent collection accentuates several linguistic problems that
general IR systems still struggle with in other text genres, such as term distribution
diversity, vocabulary mismatch and paraphrases (including hyponymy relations and
synonyms).
      </p>
      <p>
        Many patent retrieval studies have tried to address the above IR issues by applying
a variety of linguistic knowledge, such as lexico-syntactic pattern extraction,
domain-specific semantic annotation and ontologies. The studies have targeted a range of
applications, from phrase indexing [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], query reduction or expansion and semantic annotation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to sentence decomposition [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and readability [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        In Mahdabi et al., a part-of-speech tagger was used for noun phrase extraction, based
upon manually observed lexico-syntactic patterns [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The noun phrases were then combined with different weighting schemes. Even
though the part-of-speech tagger introduced errors, combining the extracted noun phrases
with statistical methods minimized their effect.
      </p>
      <sec id="sec-2-1">
        <title>Vocabulary Characteristics of the Patent Genre</title>
        <p>
          Patent documents are associated with several distinctive characteristics, such as large
differences in length, a strictly formalized document structure (both semantic and
syntactic), acronyms and new terminology [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. When a general language resource (the CLEX lexicon, with 160,568 English terms)
was compared with a patent corpus of 10,000 documents, its coverage of distinct word
types (excluding chemical formulae and numerals) was 60% [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>
          A patent document consists of four main textual components (title, abstract,
description and claims), each intended to fulfil a different communicative goal. The
abstract gives a short, general summary in which broad terms are generally used. The
description gives elaborate background information on the invention, including the prior
art in the field. The claims have their own very special conceptual, syntactic and
stylistic/rhetorical structure and must set out the essential components of the invention
in a way that makes patent infringement difficult [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Our Approach</title>
      <p>
        Data. The corpus used in the Passage Retrieval track is consistent with the CLEF-IP
2011 collection, containing both WO and EP patent documents. The claim segments used
as topics were extracted from 58 different patent application documents, yielding 105
topics. The claim segments used as topics were manually selected based on existing
search reports. For the passages, XPaths were used as qrels.
      </p>
      <p>
        Method. For document retrieval, a language model based on IPC classes was used (for
a detailed description see [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]). The Passage Intellectual Property Extractor (PIPExtractor) was implemented in
Perl and consists of a two-stage method: a query model and a passage model. The query
model consists of a two-dimensional matrix of pairwise cosine similarity values between
the sentences of the topic document, used to increase the number of query terms. The
first dimension is a cosine value between sentences based upon a bag-of-words
representation (considering only adjectives, verbs and nouns); the second dimension is a
cosine value based upon common noun phrases, a modification of the technique used in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
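      <p>The query model can be sketched as follows, under stated assumptions. Each topic sentence is taken to be already reduced to its open-class words and noun phrases, similarities use raw term frequencies, and a sentence contributes expansion keys when the sum of its two cosine values with the claim sentence reaches a threshold. The report does not specify the selection rule, so the threshold min_sim=0.2 and all function names are hypothetical, and Python is used only for illustration (the PIPExtractor was written in Perl).</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a).intersection(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(sentences):
    """Cell [i][j] holds (bag-of-words cosine, noun-phrase cosine) for sentences i and j.

    sentences: list of (open_class_terms, noun_phrases) pairs, one per topic sentence.
    """
    bows = [Counter(words) for words, _ in sentences]
    nps = [Counter(phrases) for _, phrases in sentences]
    n = len(sentences)
    return [[(cosine(bows[i], bows[j]), cosine(nps[i], nps[j])) for j in range(n)]
            for i in range(n)]


def expansion_keys(sentences, matrix, claim_index, min_sim=0.2):
    """Hypothetical rule: pool the words and phrases of every other sentence whose
    summed similarity to the claim sentence is at least min_sim."""
    words, phrases = Counter(), Counter()
    for j, (w, p) in enumerate(sentences):
        if j == claim_index:
            continue
        bow_sim, np_sim = matrix[claim_index][j]
        if bow_sim + np_sim >= min_sim:
            words.update(w)
            phrases.update(p)
    return words, phrases


sentences = [
    (["rotating", "shaft", "metal", "housing"], ["rotating shaft", "metal housing"]),  # claim
    (["shaft", "bearing", "housing"], ["ball bearing"]),
    (["display", "screen"], ["display screen"]),
]
matrix = similarity_matrix(sentences)
words, phrases = expansion_keys(sentences, matrix, claim_index=0)
print(dict(words), dict(phrases))
# {'shaft': 1, 'bearing': 1, 'housing': 1} {'ball bearing': 1}
```
      </preformat>
      <p>Here the second sentence shares two open-class words with the claim sentence (bag-of-words cosine of about 0.577), so its terms, including the new phrase “ball bearing”, become expansion keys, while the unrelated third sentence contributes nothing.</p>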
      <p>In the second stage, a four-dimensional matrix was used, generating cosine values for
the words and noun phrases of the original topic claim sentence and for the words and noun
phrases used as query expansion keys. Within the retrieved documents the computation was
conducted per sentence; a paragraph containing several sentences received the sum of its
sentence values. Term frequency was used as the weighting scheme.</p>
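      <p>The passage model can be sketched in the same spirit. Assuming each candidate sentence is represented by its open-class words and noun phrases, it receives four TF-based cosine values (its words against the claim words and against the expansion words, its noun phrases against the claim noun phrases and against the expansion noun phrases), which are summed; a paragraph's score is the sum of its sentence scores. The dictionary keys, function names and toy data are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a).intersection(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_score(words, phrases, query):
    """Sum of the four TF-based cosine values of one candidate sentence."""
    w, p = Counter(words), Counter(phrases)
    return (
        cosine(w, query["claim_words"])
        + cosine(p, query["claim_phrases"])
        + cosine(w, query["expansion_words"])
        + cosine(p, query["expansion_phrases"])
    )


def passage_score(passage, query):
    """A paragraph receives the sum of the scores of its sentences."""
    return sum(sentence_score(words, phrases, query) for words, phrases in passage)


query = {  # hypothetical keys: claim terms plus expansion keys from the query model
    "claim_words": Counter(["shaft", "housing"]),
    "claim_phrases": Counter(["metal housing"]),
    "expansion_words": Counter(["bearing"]),
    "expansion_phrases": Counter(["ball bearing"]),
}
paragraphs = {
    "p1": [(["shaft", "housing"], ["metal housing"]), (["bearing"], ["ball bearing"])],
    "p2": [(["display", "screen"], ["display screen"])],
}
ranking = sorted(paragraphs, key=lambda pid: passage_score(paragraphs[pid], query), reverse=True)
print(ranking)  # ['p1', 'p2']
```
      </preformat>
      <p>Summing per sentence favours longer paragraphs, since every additional matching sentence adds to the score; the report does not state whether any length normalisation was applied.</p>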
      <p>
        Topics whose main language was not English were semi-manually translated using
the EPO Google Translation service. All documents used as topics were part-of-speech
tagged with the Stanford Part-of-Speech tagger (using the
english-left3words-distsim.tagger model) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Noun phrases were extracted based upon 201
lexico-syntactic patterns, including noun phrases with the preposition “of” and
participles used as adjectives.
      </p>
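      <p>The 201 patterns are not listed in this report. The sketch below compresses the idea into a single hypothetical regular expression over coarse tag classes: nouns (NN*) become N, adjectives and participles (JJ*, VBN, VBG) become A, the preposition “of” becomes P, and everything else O. A phrase is then a run of A/N ending in N, optionally followed by “of” and another such run. The pattern, the function names and the restriction to phrases of at least two tokens are assumptions for illustration only.</p>
      <preformat preformat-type="code">
```python
import re

# Hypothetical single pattern standing in for the 201 lexico-syntactic patterns:
# A = adjective or participle, N = noun, P = the preposition "of", O = anything else.
NP_PATTERN = re.compile(r"[AN]*N(?:P[AN]*N)?")


def coarse_class(token, tag):
    """Map a (token, Penn Treebank tag) pair to one coarse class letter."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ") or tag in ("VBN", "VBG"):  # participles used as adjectives
        return "A"
    if tag == "IN" and token.lower() == "of":
        return "P"
    return "O"


def extract_noun_phrases(tagged, min_len=2):
    """Return multi-word noun phrases (lower-cased) matched by NP_PATTERN."""
    classes = "".join(coarse_class(tok, tag) for tok, tag in tagged)
    phrases = []
    for match in NP_PATTERN.finditer(classes):
        start, end = match.span()  # one class letter per token, so spans index tokens
        if end - start >= min_len:
            phrases.append(" ".join(tok.lower() for tok, _ in tagged[start:end]))
    return phrases


tagged = [
    ("A", "DT"), ("rotating", "VBG"), ("shaft", "NN"), ("of", "IN"),
    ("hardened", "VBN"), ("steel", "NN"), ("is", "VBZ"), ("mounted", "VBN"),
    ("in", "IN"), ("the", "DT"), ("metal", "NN"), ("housing", "NN"),
]
print(extract_noun_phrases(tagged))
# ['rotating shaft of hardened steel', 'metal housing']
```
      </preformat>
      <p>Because the chunker works purely on tags, any tagging error (for example a participle tagged as a past-tense verb, VBD) directly changes the extracted phrases.</p>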
      <p>In the official runs, only the TF of noun phrases and open word classes was used,
both in the query model and in the passage model. For each retrieved passage four
different cosine values were generated and then summed to obtain one score per
retrieved passage.</p>
      <p>Unfortunately, a late error in our script produced XPaths in an incorrect format,
which explains the exceptionally low performance of our official runs compared to the
other participants' runs. The script error was only discovered after the submission
deadline. Therefore, we present corrected versions of the official runs in this report.</p>
      <p>In this report we also present three additional experiments using TF-IDF (term
frequency-inverse document frequency) weighting and a Porter stemmer on both noun phrases
and open word classes. The IDF was calculated within the set of documents retrieved by
the document retrieval method.</p>
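      <p>The report states only that IDF was computed over the documents returned by the document retrieval method. The sketch below assumes the common formulation idf(t) = log(N / df(t)), with N the number of retrieved documents and df(t) the number of them containing t; the function names are hypothetical. Porter stemming (for example with NLTK's PorterStemmer) would be applied to the tokens beforehand, but it is omitted here to keep the example dependency-free.</p>
      <preformat preformat-type="code">
```python
import math
from collections import Counter


def idf_from_retrieved(retrieved_docs):
    """idf(t) = log(N / df(t)), computed only over the retrieved documents.

    retrieved_docs: list of token lists, one per document returned by the
    document retrieval stage (tokens assumed to be stemmed already).
    """
    n = len(retrieved_docs)
    df = Counter()
    for tokens in retrieved_docs:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """TF-IDF weights for one sentence or passage; unseen terms get 0.0 (an assumption)."""
    tf = Counter(tokens)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


retrieved = [
    ["shaft", "housing", "shaft"],
    ["shaft", "bearing"],
    ["shaft", "screen"],
]
idf = idf_from_retrieved(retrieved)
print(tfidf_vector(["shaft", "shaft", "housing"], idf))
```
      </preformat>
      <p>A term that occurs in every retrieved document, such as “shaft” here, receives zero weight. This is how TF-IDF raises the discriminative power of the remaining words and phrases when the collection is restricted to the retrieved set.</p>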
      <p>The baseline was generated by the document retrieval model alone and lists only
retrieved documents. Four different combinations were deployed at the passage level:</p>
      <list list-type="order">
        <list-item>
          <p>TF-Sum.</p>
        </list-item>
        <list-item>
          <p>The TF-Sum value divided by the rank position given by the document retrieval
model.</p>
        </list-item>
        <list-item>
          <p>An additional weight (0.2) given to noun phrases in the calculation.</p>
        </list-item>
        <list-item>
          <p>TF-IDF and a Porter stemmer applied to words and noun phrases.</p>
        </list-item>
      </list>
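      <p>Combinations 1 to 3 differ only in how a passage's cosine values are aggregated. The following minimal sketch assumes that the “additional weight (0.2)” of combination 3 means multiplying the noun phrase cosines by 1.2; the report does not give the exact formula. Combination 4 changes the term weights (TF-IDF with stemming) rather than the aggregation and is not shown. All function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
def tf_sum(word_cosines, phrase_cosines):
    """Combination 1: plain sum of the TF-based cosine values of a passage."""
    return sum(word_cosines) + sum(phrase_cosines)


def rank_normalised(word_cosines, phrase_cosines, doc_rank):
    """Combination 2: TF-Sum divided by the 1-based rank of the passage's document
    in the document retrieval run, so passages from lower-ranked documents are damped."""
    if doc_rank >= 1:
        return tf_sum(word_cosines, phrase_cosines) / doc_rank
    raise ValueError("doc_rank must be a 1-based rank")


def noun_phrase_boosted(word_cosines, phrase_cosines, boost=0.2):
    """Combination 3 (assumed reading): noun phrase cosines weighted by 1 + boost."""
    return sum(word_cosines) + (1.0 + boost) * sum(phrase_cosines)


words, phrases = [0.5, 0.25], [0.5, 0.0]
print(tf_sum(words, phrases))                          # 1.25
print(rank_normalised(words, phrases, 2))              # 0.625
print(round(noun_phrase_boosted(words, phrases), 2))   # 1.35
```
      </preformat>
      <p>Dividing by the document rank ties the passage score to the document retrieval stage, which is consistent with the smaller drop relative to the baseline that is reported for this combination.</p>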
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The official runs were carried out on all 105 topics; however, since we only
translated the topics, the overall performance at document level was moderate, with a MAP
of 0.0383, a PRES@100 of 0.1810 and a Recall@100 of 0.1810.</p>
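      <p>For reference, the sketch below computes average precision (whose mean over topics is MAP) and Recall@k from a ranked list and a set of relevant documents, using the standard definitions; PRES is not reproduced here. The function names and the toy data are illustrative only.</p>
      <preformat preformat-type="code">
```python
def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k at which relevant documents occur;
    relevant documents that are never retrieved contribute 0."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def recall_at_k(ranking, relevant, k=100):
    """Fraction of the relevant documents found in the top k of the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]).intersection(relevant)) / len(relevant)


def mean_average_precision(topics):
    """topics: list of (ranking, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)


ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 3
print(recall_at_k(ranking, relevant, k=2))   # 1 of 3 relevant documents in the top 2
```
      </preformat>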
      <p>Among the official runs for the Passage Retrieval track, one run did not find any
relevant passage and the other runs had a MAP(D) value below 0.0002 and a Precision(D)
value below 0.0006; this was caused by an error in the script generating the XPaths.
Therefore, in Table 1 we present the official runs with the XPaths corrected.</p>
      <p>Using only TF as the weighting scheme considerably degrades the ranking at the
document level compared to the baseline. When the rank position value from the document
retrieval method is used, the drop relative to the baseline is substantially reduced. When
TF-IDF and the Porter stemmer are used, performance improves at the document level but
degrades at the passage retrieval level. Combining all methods (1, 2, 3 and 4) decreased
performance at both the document level and the passage retrieval level.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>Our aim for the Passage Retrieval task was to construct a module independent of the
IR platform and to use the power of noun phrases to improve performance. By using TF-IDF
as the weighting scheme and allowing stemming, we increased the discriminative power of
words and phrases and reduced the effect of morphological variation of search keys; this
improved performance at the document level but affected passage retrieval performance
negatively.</p>
      <p>The paradox of the patent genre is that there is a large amount of data (e.g. higher
term frequencies and longer documents), but there are also issues related to data
sparseness, such as the selection of alternative concepts and search keys, referred and
omitted search keys, and search key ambiguity.</p>
      <p>The paradox is partly language dependent, since English combines common terms to
create new terminology. At the same time, data sparseness occurs because each part of the
new terminology can be substituted with synonyms or simply take a different morphological
suffix. When exploring patent retrieval on English patent documents, there is a need to
identify noun phrase boundaries as well as to handle data sparseness through stemming and
synonym expansion. However, there are several pitfalls that can make this process
perform less well than hoped, since we depend on NLP applications that are
time-consuming to adapt to the patent genre, as this requires both linguistic and domain
knowledge.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Piroi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tait</surname>
          </string-name>
          ,
          <article-title>CLEF-IP 2010: Retrieval Experiments in the Intellectual Property Domain</article-title>
          .
          <source>Workshop of the Cross-Language Evaluation Forum, LABS and Workshops, Notebook Papers</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fujii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iwayama</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Kando</surname>
          </string-name>
          ,
          <article-title>Introduction to the special issue on patent processing</article-title>
          .
          <source>Inf. Process. Manage.</source>
          <volume>43</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1149</fpage>
          -
          <lpage>1153</lpage>
          , September
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Mahdabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Andersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Keikha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Automatic refinement of patent queries using concept importance predictors</article-title>
          .
          <source>In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR '12)</source>
          . ACM, New York, NY, USA,
          <fpage>505</fpage>
          -
          <lpage>514</lpage>
          .
          <year>2012</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lease</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing for Information Retrieval: the time is ripe (again)</article-title>
          .
          <source>In Proceedings of the 1st Ph.D. Workshop at the ACM Conference on Information and Knowledge Management (PIKM)</source>
          .
          <year>2007</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          ,
          <article-title>Using NLP or NLP resources for information retrieval task</article-title>
          . In T. Strzalkowski, editor,
          <source>Natural language information retrieval</source>
          . Kluwer Academic Publishers, Dordrecht, NL,
          <fpage>99</fpage>
          -
          <lpage>111</lpage>
          .
          <year>1999</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Brants</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing in Information Retrieval</article-title>
          .
          <source>In Proceedings of the 14th Meeting of Computational Linguistics in the Netherlands</source>
          .
          <year>2003</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Sheremetyeva</surname>
          </string-name>
          ,
          <article-title>Natural language analysis of patent claims</article-title>
          .
          <source>In Proceedings of the ACL2003 workshop on Patent corpus processing - Volume 20 (PATENT '03)</source>
          , Vol.
          <volume>20</volume>
          . Association for Computational Linguistics, Stroudsburg, PA, USA,
          <fpage>66</fpage>
          -
          <lpage>73</lpage>
          .
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S.</given-names>
            <surname>van Dulken</surname>
          </string-name>
          ,
          <article-title>Free patent databases on the Internet: a critical view</article-title>
          .
          <source>World Patent Information</source>
          <volume>21</volume>
          (
          <issue>4</issue>
          ),
          <fpage>253</fpage>
          -
          <lpage>257</lpage>
          .
          <year>1999</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Hedlund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pirkola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Järvelin</surname>
          </string-name>
          ,
          <article-title>Aspects of Swedish morphology and semantics from the perspective of mono- and cross-language information retrieval</article-title>
          .
          <source>Information Processing and Management</source>
          <volume>37</volume>
          (
          <issue>1</issue>
          ),
          <fpage>147</fpage>
          -
          <lpage>161</lpage>
          .
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Atkinson</surname>
          </string-name>
          ,
          <article-title>Toward a more rational patent search paradigm</article-title>
          .
          <source>In Proceedings of the 1st ACM workshop on Patent information retrieval (PaIR '08)</source>
          . ACM, New York, NY, USA,
          <fpage>37</fpage>
          -
          <lpage>40</lpage>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>C. G. Harris</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Arens</surname>
            and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Srinivasan</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <article-title>Using Classification Code Hierarchies for Patent Prior Art Searches</article-title>
          .
          <source>Current Challenges in Patent Information Retrieval. The Information Retrieval Series</source>
          , Vol.
          <volume>29</volume>
          . In
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tait</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Trippe</surname>
          </string-name>
          (eds.), 1st edition, Springer, XIV, 402 p.,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>D'hondt</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verberne</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alink</surname>
            <given-names>W.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cornacchia</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Combining document representations for prior-art retrieval</article-title>
          .
          <source>Workshop of the CLEF-IP 2011, LABS and Workshops, Notebook Papers</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><given-names>L.</given-names> <surname>Wanner</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Baeza-Yates</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Brügmann</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Codina</surname></string-name>
          ,
          <string-name><given-names>B.</given-names> <surname>Diallo</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Escorsa</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Giereth</surname></string-name>
          ,
          <string-name><given-names>Y.</given-names> <surname>Kompatsiaris</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Papadopoulos</surname></string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Pianta</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Gemma</surname></string-name>
          ,
          <string-name><given-names>I.</given-names> <surname>Puhlmann</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Gautam</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Rotard</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Schoester</surname></string-name>
          ,
          <string-name><given-names>L.</given-names> <surname>Serafini</surname></string-name>
          and
          <string-name><given-names>V.</given-names> <surname>Zervaki</surname></string-name>
          ,
          <article-title>Towards content-oriented patent document processing</article-title>
          .
          <source>World Patent Information</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          .
          <year>2008</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>P.</given-names>
            <surname>Parapatics</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Dittenbach</surname>
          </string-name>
          ,
          <article-title>Patent Claim Decomposition for Improved Information Extraction</article-title>
          . In W. B. Croft,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tait</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Trippe</surname>
          </string-name>
          , (eds.)
          <source>Current Challenges in patent Information Retrieval</source>
          , Springer Berlin Heidelberg. 2011
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Larkey</surname>
          </string-name>
          ,
          <article-title>A patent search and classification system</article-title>
          .
          <source>In Proceedings of the 4th ACM conference on Digital libraries</source>
          , Berkeley, California, USA,
          <fpage>179</fpage>
          -
          <lpage>187</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>N.</given-names>
            <surname>Oostdijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van Halteren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>D'hondt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Verberne</surname>
          </string-name>
          ,
          <article-title>Genre and Domain in Patent Texts</article-title>
          .
          <source>In Proceedings of the 3rd International Workshop on Patent Information Retrieval (PaIR) at CIKM 2010</source>
          ,
          <fpage>39</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>P.</given-names>
            <surname>Mahdabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Andersson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <article-title>Report on the CLEF-IP 2011 Experiments: Exploring Patent Summarization</article-title>
          .
          <source>Workshop of the Cross-Language Evaluation Forum, LABS and Workshops, Notebook Papers</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>K.</given-names>
            <surname>Konishi</surname>
          </string-name>
          ,
          <article-title>Invalidity patent search system of NTT DATA</article-title>
          .
          <source>In Working Notes of the Fourth NTCIR Workshop Meeting</source>
          .
          <fpage>250</fpage>
          -
          <lpage>255</lpage>
          .
          <year>2004</year>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network</article-title>
          .
          <source>In Proceedings of HLT-NAACL 2003</source>
          ,
          <fpage>252</fpage>
          -
          <lpage>259</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Justeson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <article-title>Technical terminology: some linguistic properties and an algorithm for identification in text</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>9</fpage>
          -
          <lpage>27</lpage>
          .
          <year>1995</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>