<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF-IP 2011 Working Notes: Utilizing Prior Art Candidate Search Results for Refined IPC Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hyung-Kook Seo</string-name>
          <email>hkseo@wisenut.co.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyouyeol Han</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaean Lee</string-name>
          <email>jalee@wisenut.co.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>2F Ottogi Center</institution>
          ,
          <addr-line>1009-1 Daechi-Dong, Kangnam-Gu, Seoul</addr-line>
          ,
          <country country="KR">Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>For the refined IPC classification task in CLEF-IP 2011, we built a classification system based on KNN classification that uses PAC (Prior Art Candidate) search results as neighbors, with a slightly modified neighborhood evaluation. We also built a simple PAC search system. We produced runs for both PAC search and classification and evaluated our system. Our tests showed improved results in refined IPC classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Prior Art Candidate Search</kwd>
        <kwd>IPC Categorization</kwd>
        <kwd>KNN Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Test Collection and Topics</p>
      <p>Like other participants in CLEF-IP 2011, we used only the test data collection, which
comprises extracts of the MAREC dataset provided by IRF.</p>
      <p>We used only this collection throughout the process of producing the runs for
the tasks in which we participated.</p>
      <p>CLEF-IP Tasks Participated</p>
      <p>
        We participated in the following three tasks in CLEF-IP 2011 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
      </p>
      <sec id="sec-1-1">
        <p>1. Prior Art Candidate (PAC) Search</p>
        <p>2. IPC Classification: up to subclass level</p>
        <p>3. Refined IPC Classification: up to subgroup level, with a given subclass value</p>
        <p>Our ultimate interest lies in refined IPC classification, and
we made only modest efforts to improve PAC search results. Our approach is described in the next
chapter.</p>
        <p>Approaches to Refined Classification</p>
        <p>Classifying a patent down to the subgroup level is quite difficult for model-based
classification because training samples are sparse at that level. We therefore
implemented an indirect (and simple) method that performs KNN-like classification
using PAC search results.</p>
        <p>Existing Classification Approach</p>
        <p>
          In patent classification, Koster et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] proposed a method using the Winnow
algorithm. Winnow is a mistake-driven learning algorithm that computes, for each
category, a vector of weights for separating relevant from irrelevant patents [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
In this study, they obtained an F-measure of around 68% (multi-categorization). This
result was measured with a customized success criterion and relatively few documents.
        </p>
        <p>
          Fall et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] applied various machine learning algorithms to patent classification:
Naive Bayes (NB), SNoW, support vector
machines (SVM), and k-nearest neighbor (KNN). Here SNoW is a variation
of the Winnow algorithm. They investigated which patent document fields are useful to index
and defined three measures of categorization success. They reported
best top-prediction precisions of 41% (SVM), 39% (KNN), 33% (NB), and 36% (SNoW) when the first
300 words are indexed at the subclass level. With three guesses, KNN achieved the best
precision of 62%, and over all categories, SVM achieved the best precision of 48%. They [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
also presented a customized language-independent text classification system for
categorization.
        </p>
        <p>As the amount of training data increases, a model-based system grows in
feature scale and time complexity. To reduce the feature scale, some
researchers have limited the number of documents, the selected terms, or the length of the
documents. To reduce time complexity, others have tried instance-based learning
such as KNN. KNN first selects the K samples with the highest similarity to the test sample
and then determines the categories of the test sample with a class-mapping
method. It makes a trade-off between effectiveness and time complexity.</p>
        <p>
          W. Wang et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] reported their experience in the NTCIR-7 Patent Mining
Task (MT) of classifying patent documents according to the IPC taxonomy. Their
approach is based on the KNN algorithm using cosine similarity and Euclidean distance.
T. Xiao et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] described a methodology that uses KNN and re-ranking
models. They achieved a mean average precision (MAP) of 48.86% when classifying
at the subgroup level. Mase and Iwayama [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] reported the results of their experiments on the
automatic assignment of patent classifications to research paper abstracts. Their results
showed a best MAP of 50.62% when using formal run data and a
particular query group.
        </p>
        <p>
          More recently, Y. Cai et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] presented, at the NTCIR-8 workshop, a KNN text categorization method based on
shared nearest neighbors (SNN) that combines the BM25 similarity calculation
with the neighborhood information of samples.
BM25 is a bag-of-words retrieval function that combines term frequency and
document frequency, normalizes for document length, and is a highly efficient
similarity calculation method. They ran a comparison experiment on the Japanese
and English corpora provided by the National Institute of Informatics, covering
1993 to 2002, using basic KNN and KNN based on shared nearest neighbors.
On the English corpus, the KNN+SNN method achieved 72.12% precision at the subclass level
(about 0.03 higher than basic KNN) and 36.93% precision at the subgroup level.
        </p>
        <p>Our IPC Classification Approach</p>
        <p>
          In KNN classification, the K nearest neighbors are the K documents most
similar to the query document to be classified [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          KNN classification therefore connects naturally with search results: the top K results retrieved for a
query document can be used directly as its neighbors. One system used
this simple method, though it was not as competitive as model-based classification
algorithms [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>At the subgroup level, about 70,000 categories need to be trained, and most
classification models suffer from sparse training documents as well as problems with
system memory (for loading models) and processing time (for training or classification
itself). In our experience in the Korean patent domain, however, KNN classification
with PAC search gave quite good quality at the subgroup level.</p>
        <p>We therefore built a refined IPC classification system that uses PAC search results,
and applied the same approach to IPC classification up to the subclass level. Because of limited
time, we paid little attention to optimizing or improving the subclass-level
classification.</p>
        <p>PAC Search Approach</p>
        <p>We implemented a PAC search system that uses only selected weighted keywords
extracted from the major content fields (title of invention, abstract, description,
claims).</p>
        <p>We made two additional efforts to improve our PAC search results, described
below.</p>
        <sec id="sec-1-1-1">
          <title>Removing Non-Content Words</title>
          <p>A document is represented by a set of words consisting of content words and
function words. After POS tagging, we try to remove function words and stop words.
While stop words are maintained by hand and cannot be derived automatically, we can
algorithmically find words that do not describe a particular document (non-content
words).</p>
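          <p>The standard probabilistic model for the distribution of a certain type of event over
units of a fixed size is the Poisson distribution:</p>
          <disp-formula><tex-math>p(k; \lambda_i) = e^{-\lambda_i} \frac{\lambda_i^{k}}{k!}</tex-math></disp-formula>
          <p>In the most common use of the Poisson distribution in IR, the parameter
<inline-formula><tex-math>\lambda_i &gt; 0</tex-math></inline-formula> is
the average number of occurrences of word
<inline-formula><tex-math>w_i</tex-math></inline-formula> per document: that is,
<inline-formula><tex-math>\lambda_i = cf_i / N</tex-math></inline-formula>, where
<inline-formula><tex-math>cf_i</tex-math></inline-formula> is the collection frequency and N is the total number of documents in the collection.
We can then approximate the document frequency by <inline-formula><tex-math>N (1 - p(0; \lambda_i))</tex-math></inline-formula>. As this model assumes
independence between term occurrences, its estimates are good for non-content
words [12].</p>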
        </sec>
        <sec id="sec-1-1-2">
          <title>Algorithm:</title>
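          <p>1. Calculate <inline-formula><tex-math>\lambda_i</tex-math></inline-formula>: collection frequency of i / N (N is the total number of documents in the
collection)</p>
          <p>2. Calculate the expected document frequency by the Poisson distribution:
<inline-formula><tex-math>N (1 - e^{-\lambda_i})</tex-math></inline-formula></p>
          <p>3. Get the overestimation value: expected document frequency(i) / df(i)</p>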
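          <p>The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not our production code: the function names are hypothetical, and treating a word as non-content when its overestimation value lies between the lower and upper criteria is our reading of how the two criteria are applied.</p>
          <preformat>
```python
import math


def overestimation(cf, df, n_docs):
    """Ratio of Poisson-expected document frequency to observed document frequency."""
    lam = cf / n_docs                              # step 1: average occurrences per document
    expected_df = n_docs * (1.0 - math.exp(-lam))  # step 2: N * (1 - p(0; lambda))
    return expected_df / df                        # step 3: overestimation value


def is_non_content(cf, df, n_docs, lower=0.9, upper=1.0):
    """Assumed decision rule: flag words whose ratio lies in [lower, upper]."""
    ratio = overestimation(cf, df, n_docs)
    return upper >= ratio >= lower


# Hypothetical counts: a word spread evenly over documents vs. a bursty word.
even_word = overestimation(cf=1000, df=630, n_docs=1000)
bursty_word = overestimation(cf=1000, df=200, n_docs=1000)
```
          </preformat>
          <p>A word whose occurrences are spread almost independently over documents gets a ratio close to 1, while a bursty content word, concentrated in few documents, gets a much larger ratio.</p>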
        </sec>
        <sec id="sec-1-1-3">
          <title>Parameter:</title>
          <p>1. Document frequency rate: the document frequency of a word as a percentage of the number of documents in the
collection</p>
          <p>2. Lower and upper overestimation criteria</p>
        </sec>
        <sec id="sec-1-1-4">
          <title>Result:</title>
          <p>In our parameter tests, our system (on Korean) gave the best results with an
observed document frequency rate of 0.05%, a lower overestimation criterion of 0.9, and an
upper overestimation criterion of 1.0.</p>
          <p>For CLEF-IP 2011, we fixed the lower overestimation criterion at 0.9 and varied the
upper overestimation criterion from 1.0 to 1.3.</p>
        </sec>
        <sec id="sec-1-1-5">
          <title>Extracting Co-Occurrence Terms</title>
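          <p>A null hypothesis is often stated by saying that the parameter <inline-formula><tex-math>\theta</tex-math></inline-formula> is in a specified
subset <inline-formula><tex-math>\Theta_0</tex-math></inline-formula> of the parameter space <inline-formula><tex-math>\Theta</tex-math></inline-formula>.</p>
          <p>Likelihood ratios are an approach to hypothesis testing. The likelihood function
<inline-formula><tex-math>L(\theta \mid x) = f(x \mid \theta)</tex-math></inline-formula> is a function of the parameter <inline-formula><tex-math>\theta</tex-math></inline-formula> with x held fixed at the value that
was actually observed, i.e., the data. The likelihood ratio test statistic is [14]</p>
          <disp-formula><tex-math>\lambda(x) = \frac{\sup\{L(\theta \mid x) : \theta \in \Theta_0\}}{\sup\{L(\theta \mid x) : \theta \in \Theta\}}</tex-math></disp-formula>
          <p>In applying the likelihood ratio test to collocation discovery, we examine the
following two alternative explanations for the occurrence frequency of a bigram
<inline-formula><tex-math>w_1 w_2</tex-math></inline-formula> [13]:</p>
          <disp-formula><tex-math>H_1: P(w_2 \mid w_1) = p = P(w_2 \mid \neg w_1)</tex-math></disp-formula>
          <disp-formula><tex-math>H_2: P(w_2 \mid w_1) = p_1 \neq p_2 = P(w_2 \mid \neg w_1)</tex-math></disp-formula>
          <p>We used the usual maximum likelihood estimates for <inline-formula><tex-math>p</tex-math></inline-formula>, <inline-formula><tex-math>p_1</tex-math></inline-formula>, and <inline-formula><tex-math>p_2</tex-math></inline-formula>, and write <inline-formula><tex-math>c_1</tex-math></inline-formula>, <inline-formula><tex-math>c_2</tex-math></inline-formula>,
and <inline-formula><tex-math>c_{12}</tex-math></inline-formula> for the number of occurrences of <inline-formula><tex-math>w_1</tex-math></inline-formula>, <inline-formula><tex-math>w_2</tex-math></inline-formula>, and <inline-formula><tex-math>w_1 w_2</tex-math></inline-formula> in the corpus, with N here denoting the total number of tokens in the corpus:</p>
          <disp-formula><tex-math>p = \frac{c_2}{N}, \quad p_1 = \frac{c_{12}}{c_1}, \quad p_2 = \frac{c_2 - c_{12}}{N - c_1}</tex-math></disp-formula>
          <p>Assuming a binomial distribution <inline-formula><tex-math>b(k; n, x) = \binom{n}{k} x^{k} (1 - x)^{n - k}</tex-math></inline-formula>, the log likelihood ratio is</p>
          <disp-formula><tex-math>\log \lambda = \log \frac{b(c_{12}; c_1, p) \, b(c_2 - c_{12}; N - c_1, p)}{b(c_{12}; c_1, p_1) \, b(c_2 - c_{12}; N - c_1, p_2)}</tex-math></disp-formula>
          <p>and the quantity <inline-formula><tex-math>-2 \log \lambda</tex-math></inline-formula> is asymptotically <inline-formula><tex-math>\chi^2</tex-math></inline-formula> distributed.</p>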
          <p>For CLEF-IP 2011, we used a confidence level of 99.9% (α = 0.001). We ran both with
co-occurrence terms and without them.</p>
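          <p>The likelihood ratio test for a bigram can be sketched in Python as below. The function names and the example counts are illustrative assumptions; the critical value 10.83 is the chi-square threshold for one degree of freedom at α = 0.001, which we take to correspond to the 99.9% confidence level. The binomial coefficients cancel between numerator and denominator, so the sketch omits them.</p>
          <preformat>
```python
import math


def log_binom(k, n, p):
    """Log of p^k * (1 - p)^(n - k); the binomial coefficient cancels in the ratio."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def neg2_log_lambda(c1, c2, c12, n):
    """-2 log(lambda) for bigram w1 w2 given counts c1, c2, c12 and corpus size n."""
    p = c2 / n                    # H1: w2 occurs independently of w1
    p1 = c12 / c1                 # H2: P(w2 | w1)
    p2 = (c2 - c12) / (n - c1)    # H2: P(w2 | not w1)
    log_lambda = (
        log_binom(c12, c1, p) + log_binom(c2 - c12, n - c1, p)
        - log_binom(c12, c1, p1) - log_binom(c2 - c12, n - c1, p2)
    )
    return -2.0 * log_lambda


CHI2_CRITICAL_999 = 10.83  # chi-square, 1 degree of freedom, alpha = 0.001


def is_collocation(c1, c2, c12, n):
    return neg2_log_lambda(c1, c2, c12, n) > CHI2_CRITICAL_999


# Illustrative counts only: w1 seen 12593 times, w2 932 times, w1 w2 150 times.
strong = neg2_log_lambda(c1=12593, c2=932, c12=150, n=14307668)
independent = neg2_log_lambda(c1=100, c2=100, c12=1, n=10000)
```
          </preformat>
          <p>A pair that co-occurs exactly as often as independence predicts yields a statistic of zero and is rejected, while a pair that co-occurs far more often than chance exceeds the threshold and is kept as a co-occurrence term.</p>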
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>System Setup</title>
      <p>We implemented our system following the procedures explained in this chapter.</p>
      <p>We first extracted weighted keywords from each patent XML file provided in the
CLEF-IP 2011 corpora, combined them into several bulk files, and indexed them. For indexing
and searching, we used Lucene. We wrote a simple searcher program
in Java and a final classifier program that applies the search results.</p>
      <p>Overall Architecture</p>
      <p>Our system is illustrated in the following architecture diagram. It is very similar to a
traditional document search system.</p>
      <p>[Figure: CLEF-IP system architecture. Components: patent XML files, XML parser, online translation and translator, POS tagger (English only), feature extractor, co-occurrence terms extractor, bulk files (features), indexer application, indexer (Lucene), index, translated queries, searcher application, searcher (Lucene), classifier application.]</p>
      <p>We used our in-house English POS tagger for basic English analysis.</p>
      <p>To translate other languages such as French and German into English, we used an open
online translation service, MyMemory [15].</p>
      <p>We used Lucene 3.1.0 as the base search engine and
wrote simple Java applications for indexing and searching on top of it. We also wrote an
application for classification, which calls the searcher application.</p>
      <p>Preprocessing and Indexing</p>
      <p>We used a preprocessing system nearly identical to the one in our Korean patent work, except that this time we used
an English POS tagger instead of a Korean one.</p>
      <p>We selected only one XML document among the various versions of the same patent to
guarantee uniqueness, so that no two patent documents in the index share the same
application number. After this process, we had 1,331,182
unique patents in EP (European patents) and 437,987 in WO (WIPO patents).</p>
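      <p>A minimal Python sketch of this selection step follows. The dictionary keys and the rule of keeping the version with the highest version value are assumptions for illustration only; the rule our system actually used to pick a version is not specified here.</p>
      <preformat>
```python
def select_unique(documents):
    """Keep one document per application number.

    documents: iterable of dicts with the hypothetical keys
    'application_number' and 'version' (larger means later).
    """
    chosen = {}
    for doc in documents:
        key = doc["application_number"]
        current = chosen.get(key)
        # Assumed rule: prefer the later version of the same application.
        if current is None or doc["version"] > current["version"]:
            chosen[key] = doc
    return list(chosen.values())


# Made-up records: two versions of one application and a single-version one.
docs = [
    {"application_number": "EP-APP-1", "version": 1, "file": "a.xml"},
    {"application_number": "EP-APP-1", "version": 2, "file": "b.xml"},
    {"application_number": "EP-APP-2", "version": 1, "file": "c.xml"},
]
unique_docs = select_unique(docs)
```
      </preformat>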
      <p>During this procedure we also translated content fields with the online translation
service. After some sample runs, we found that translating the full text would
take too long (and might cause us to miss the CLEF-IP 2011 deadline), so we
translated only the abstract and selected sentences (about 2,048 characters) from the other
content fields.</p>
      <p>For feature extraction, we used these content fields: Title of Invention, Abstract,
Description, and Claims. We also extracted co-occurrence terms and selected up
to 5 of them to accompany the extracted features.</p>
      <p>Finally, we produced bulk files of features both with and without co-occurrence terms
(which we call co-terms for brevity) for indexing, and built two
separate indices to analyze the effect of co-terms on search.</p>
      <p>We also preprocessed the patents used as topics (queries). For these, we translated
the full content.</p>
      <p>Prior Art Candidate Search</p>
      <p>We produced a total of 8 runs of search results. The first 4 runs target the index without
co-terms, and the other 4 runs target the index with co-terms. Within each group, we varied the upper
overestimation threshold for non-content word removal from 1.0 to 1.3 (1.0, 1.1, 1.2,
1.3), giving 4 runs per group.</p>
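      <p>The run grid can be written down compactly in Python, as sketched below. The mapping from the name suffixes 10, 20, and 30 to the thresholds 1.1, 1.2, and 1.3 is our inference from the run names in the result tables, and the helper run_name is hypothetical.</p>
      <preformat>
```python
from itertools import product

INDEXES = ["BASE", "CO"]                 # index without / with co-terms
UPPER_THRESHOLDS = [1.0, 1.1, 1.2, 1.3]  # upper overestimation threshold


def run_name(number, index, upper, task="PAC"):
    """Build a run name such as WISENUT_R2_BASE_10_PAC (assumed naming rule)."""
    suffix = round((upper - 1.0) * 100)  # 1.1 becomes 10, 1.2 becomes 20, ...
    parts = ["WISENUT", f"R{number}", index]
    if suffix:
        parts.append(str(suffix))
    parts.append(task)
    return "_".join(parts)


# Two indices times four thresholds gives the eight runs.
RUNS = [
    (run_name(i, index, upper), index, upper)
    for i, (index, upper) in enumerate(product(INDEXES, UPPER_THRESHOLDS), start=1)
]
```
      </preformat>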
      <p>We retrieved up to 1,000 results for every patent query. We applied
no lower threshold on the search score, so in most cases
a query returned nearly the full 1,000 documents.</p>
      <p>Classification</p>
      <p>Because we used the results of PAC search, we also produced 8 runs for each
classification task.</p>
      <p>For KNN classification, we set K to 1,000, because we produced 1,000 search
results per query.</p>
      <p>We have observed that summing the reciprocal ranks of the search results per category, rather than just
counting the number of patents per category, gives much better results. This is similar to
the weighting used in average precision [16]. We adopted this
improved weighting scheme for the KNN classification results in our runs.</p>
      <p>To verify this intuition, we also ran the base condition and compared its result
with that of the improved weighting scheme applied in the CLEF-IP 2011 runs.</p>
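      <p>The difference between the base scheme (counting) and the reciprocal-rank scheme can be illustrated with the following Python sketch. The function name, the input layout, and the sample data are assumptions for illustration; only the idea of weighting each neighbor by the reciprocal of its rank comes from our system.</p>
      <preformat>
```python
from collections import defaultdict


def score_categories(search_results, k=1000, weighted=True):
    """Score categories over the top-k results.

    weighted=True adds 1/rank per neighbor; weighted=False counts neighbors.
    """
    scores = defaultdict(float)
    for rank, (_doc_id, categories) in enumerate(search_results[:k], start=1):
        weight = 1.0 / rank if weighted else 1.0
        for category in set(categories):
            scores[category] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up ranked results: the best match has a category the others lack.
results = [
    ("doc-a", ["A61K 31/00"]),
    ("doc-b", ["C07D 401/04"]),
    ("doc-c", ["C07D 401/04"]),
]
by_rank = score_categories(results, weighted=True)
by_count = score_categories(results, weighted=False)
```
      </preformat>
      <p>In this toy case the counting scheme prefers the category shared by the two lower-ranked results, whereas the reciprocal-rank scheme prefers the category of the top-ranked result.</p>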
    </sec>
    <sec id="sec-3">
      <title>Results and Analysis</title>
      <sec id="sec-3-1">
        <p>The following is a brief report on our results.</p>
        <p>PAC Search Results</p>
        <p>We show the results of our runs along with the best runs of CLEF-IP 2011.</p>
        <p>Runs: CHEMNITZ.CUT_UHI_CLEFIP_BOW, HYDERABAD.TEXTRANK_IDFCITATIONALL,
WISENUT_R1_BASE_PAC, WISENUT_R2_BASE_10_PAC, WISENUT_R3_BASE_20_PAC, WISENUT_R4_BASE_30_PAC,
WISENUT_R5_CO_PAC, WISENUT_R6_CO_10_PAC, WISENUT_R7_CO_20_PAC, WISENUT_R8_CO_30_PAC</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Languages</th><th>MAP</th><th>SET_P</th><th>SET_recall</th><th>recall_5</th><th>recall_10</th><th>recall_20</th></tr>
            </thead>
            <tbody>
              <tr><td>All languages</td><td>0.0914</td><td>0.0037</td><td>0.4318</td><td>0.0896</td><td>0.1251</td><td>0.1635</td></tr>
              <tr><td>English only</td><td>0.1009</td><td>0.0049</td><td>0.5233</td><td>0.0956</td><td>0.1401</td><td>0.1921</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Results improved slightly when co-terms were applied. Varying the
upper threshold for non-content word removal made no notable difference.</p>
        <p>Owing to multilingual issues, the quality of our results was rather low. This is partly
visible in the English-only results, where the gap to the top runs is narrower. We will
look for alternatives to overcome this issue.</p>
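        <p>IPC Classification Results</p>
        <p>We also obtained the results of our classification runs and compared them with those of
the other participant.</p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>RUN_NAME</th><th>set_P</th><th>set_recall</th><th>set_F_1.0</th></tr>
            </thead>
            <tbody>
              <tr><td>NIJMEGEN.RUN_ADMWCIT_CLS1</td><td>0.5379</td><td>0.8563</td><td>0.6168</td></tr>
              <tr><td>NIJMEGEN.RUN_ADMW_CLS1</td><td>0.5436</td><td>0.8506</td><td>0.6186</td></tr>
              <tr><td>WISENUT.WISENUT_R1_BASE_CLS1</td><td>0.2867</td><td>0.838</td><td>0.4021</td></tr>
              <tr><td>WISENUT.WISENUT_R2_BASE_10_CLS1</td><td>0.2871</td><td>0.8389</td><td>0.4027</td></tr>
              <tr><td>WISENUT.WISENUT_R3_BASE_20_CLS1</td><td>0.2869</td><td>0.8384</td><td>0.4024</td></tr>
              <tr><td>WISENUT.WISENUT_R4_BASE_30_CLS1</td><td>0.2871</td><td>0.8387</td><td>0.4027</td></tr>
              <tr><td>WISENUT.WISENUT_R5_CO_CLS1</td><td>0.2882</td><td>0.8366</td><td>0.4027</td></tr>
              <tr><td>WISENUT.WISENUT_R6_CO_10_CLS1</td><td>0.2883</td><td>0.8371</td><td>0.4029</td></tr>
              <tr><td>WISENUT.WISENUT_R7_CO_20_CLS1</td><td>0.2885</td><td>0.8376</td><td>0.4032</td></tr>
              <tr><td>WISENUT.WISENUT_R8_CO_30_CLS1</td><td>0.2884</td><td>0.8376</td><td>0.4031</td></tr>
            </tbody>
          </table>
        </table-wrap>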
        <p>Because we do not use model-based methods, our precision was lower.
We also did not apply a cutoff on the classification score; if we tune the score
thresholds, we expect to produce much better results.</p>
        <p>Refined IPC Classification Results</p>
        <p>Finally, our refined IPC classification results are displayed in the table below. If more
labs had participated in this track, we would have a better perspective on our quality.</p>
        <p>RUN_NAME
NIJMEGEN.RUN_WINNOW_WORDS_CLS2
WISENUT.WISENUT_R1_BASE_CLS2
WISENUT.WISENUT_R2_BASE_10_CLS2
WISENUT.WISENUT_R3_BASE_20_CLS2
WISENUT.WISENUT_R4_BASE_30_CLS2
WISENUT.WISENUT_R5_CO_CLS2
WISENUT.WISENUT_R6_CO_10_CLS2
WISENUT.WISENUT_R7_CO_20_CLS2
WISENUT.WISENUT_R8_CO_30_CLS2
set_P</p>
        <p>While a simple comparison is risky, our system showed considerably better
results in this track.</p>
        <p>As stated earlier, we compared the new, refined weighting scheme (which
was applied in CLEF-IP 2011) with the base one. The following table shows the result.</p>
        <p>The refined scheme gave far better results than the base scheme, especially in
precision. MAP also improved dramatically. (Note that precision and recall
were micro-averaged here, so they differ from our reported values.)</p>
        <p>
          Considering the result of [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], our result is very promising: when a single IPC code is
suggested, our precision is about the same or better.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>We implemented a simple refined IPC classification system that uses the search results
of a PAC search system. Although our PAC search system performed somewhat worse
than those of other labs, our refined classification results based on its
search results were quite good, especially when the
criterion for category selection was changed.</p>
      <p>We left some challenges as future work.</p>
      <p>First, we can improve the PAC search results. For example, we did not set a threshold on
the score, only a maximum number of results. We also had a major problem in the
search results due to the multilingual weaknesses of our system. We hope to address these
problems at the next workshop.</p>
      <p>Second, we can adopt model-based classification up to the subclass level.
Model-based classification is known to work well up to the subclass level, so our
IPC classification system should use a classification model such as SVM there.</p>
      <p>Finally, we may optimize the weighting factor of ranked documents in refined IPC
classification. Since simply using the reciprocal of the ranks in the search results improved
refined classification, we expect that a more sophisticated weighting factor
in KNN can further improve the classification results.</p>
      <p>12. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT
Press, Cambridge (1999)</p>
      <p>13. Dunning, T.: Accurate Methods for the Statistics of Surprise and Coincidence. In:
Computational Linguistics 19, pp. 61--74 (1993)</p>
      <p>14. Casella, G., Berger, R.L.: Statistical Inference, 2nd edition, p. 375. Duxbury Press (2001)</p>
      <p>15. MyMemory, http://mymemory.translated.net</p>
      <p>16. Average Precision, http://en.wikipedia.org/wiki/Information_retrieval#Average_precision</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          : CLEF-IP
          <year>2011</year>
          :
          <article-title>Track Guidelines</article-title>
          . IRF, Vienna (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Koster</surname>
            ,
            <given-names>C.H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seutter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beney</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Classifying Patent Applications with Winnow</article-title>
          .
          <source>In: Proceedings Benelearn Conference</source>
          , Antwerpen (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Littlestone</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm</article-title>
          .
          <source>In: Machine Learning</source>
          , Vol.
          <volume>2</volume>
          , pp.
          <fpage>285</fpage>
          --
          <lpage>318</lpage>
          . Springer, Netherlands (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fall</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torcsvari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benzineb</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Karetka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automated Categorization in the International Patent Classification</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          , Vol.
          <volume>37</volume>
          , Issue
          <issue>1</issue>
          . ACM, New York (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fall</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benzineb</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyot</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torcsvari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fievet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Computer-Assisted Categorization of Patent Documents in the International Patent Classification</article-title>
          .
          <source>In: Proceedings of the International Chemical Information Conference (ICIC'03)</source>
          , Nimes, France (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>ICL at NTCIR-7: An Improved KNN Algorithm for Text Categorization</article-title>
          .
          <source>In: Proceedings of NTCIR-7 Workshop Meeting</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>KNN and Re-ranking Models for English Patent Mining at NTCIR-7</article-title>
          .
          <source>In: Proceedings of NTCIR-7 Workshop Meeting</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mase</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iwayama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Hitachi Ltd.:
          <article-title>NTCIR-7 Patent Mining Experiments at Hitachi</article-title>
          .
          <source>In: Proceedings of NTCIR-7 Workshop Meeting</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A KNN Research Paper Classification Method Based on Shared Nearest Neighbor</article-title>
          .
          <source>In: Proceedings of the 8th NTCIR Workshop Meeting</source>
          . pp.
          <fpage>336</fpage>
          --
          <lpage>340</lpage>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <article-title>k-nearest neighbor algorithm</article-title>
          , http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Experiments with Citation Mining and Key-Term Extraction for Prior Art Search</article-title>
          . In: CLEF-IP
          <year>2010</year>
          ,
          Padua
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>