<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Use of WordNets for Multilingual Text Categorization: A Comparative Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohamed Amine Bentaallah</string-name>
          <email>mabentaallah@univ-sba.dz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mimoun Malki</string-name>
          <email>malki-m@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EEDIS Laboratory, Department of Computer Sciences, Djillali Liabes University, Sidi Bel Abbes</institution>
          ,
          <addr-line>22000.</addr-line>
          <country country="DZ">ALGERIA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>121</fpage>
      <lpage>128</lpage>
      <abstract>
        <p>The successful use of the Princeton WordNet for Text Categorization has prompted the creation of similar WordNets in other languages. This paper presents a comparative study of two WordNet-based approaches to Multilingual Text Categorization. The first relies on machine translation to access the Princeton WordNet directly, while the second avoids machine translation by using the WordNet associated with each language.</p>
      </abstract>
      <kwd-group>
        <kwd>Multilingual</kwd>
        <kwd>Text Categorization</kwd>
        <kwd>WordNet</kwd>
        <kwd>Ontology</kwd>
        <kwd>Mapping</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Multilingual Text Categorization</title>
      <p>Multilingual Text Categorization (MTC) is a new area of text categorization in
which we have to cope with two or more languages (e.g. English, Spanish and
Italian).</p>
      <p>
        MTC is a relatively new research topic, and little previous work on it
appears to be available in the literature. Most approaches have mainly addressed
translation issues to solve the problem. R. Jalam et al. presented in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
three approaches for MTC that are based on translating documents into
a reference language. Rigutini et al. used in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] a machine translation system to
bridge the gap between different languages. The major disadvantages of machine
translation based approaches are the absence of machine translation systems for
many language pairs and the wide gap between translated documents and
original documents.
      </p>
      <p>
        In order to overcome the disadvantages of machine translation systems,
many researchers have worked on using linguistic resources such as bilingual
dictionaries and comparable corpora to induce correspondences between two
languages. A. Gliozzo and C. Strapparava proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] an approach to the
Multilingual Text Categorization problem based on acquiring Multilingual
Domain Models from comparable corpora to define a generalized similarity function
(i.e. a kernel function) among documents in different languages, which is used
inside a Support Vector Machines classification framework. Their results show
that the approach largely outperforms a baseline. K. Wu et al. proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] a
refinement framework for cross-language text categorization that uses
a bilingual lexicon to build a model called the domain alignment
translation model. Their approach achieves performance comparable to that of the
machine translation approach based on the Google translation tool, although their
experiments consider only the word level and ignore base phrases.
      </p>
      <p>
        In recent years, research has shown that using ontologies in monolingual
text categorization is a promising track. J. Guyot proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] an approach
that uses a multilingual ontology for Information Retrieval, without
any translation. That work aimed only to prove the feasibility of the approach,
which remains limited because the ontology used is incomplete and
noisy. Intelligent methods for enabling concept-based hierarchical Multilingual
Text Categorization using neural networks are proposed in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. These methods
are based on encapsulating the semantic knowledge of the relationships between
all multilingual terms and concepts in a universal concept space, and on using a
hierarchical clustering algorithm to generate a set of concept-based multilingual
document categories, which acts as the hierarchical backbone of a browseable
multilingual document directory. We proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] an approach to
MTC that extends the use of WordNet in text categorization to
MTC in order to reduce the noise introduced by machine translation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Description of the two proposed approaches</title>
      <p>As shown in Figure 1, the two approaches are composed of three phases:
– knowledge representation;
– training;
– prediction.</p>
      <p>In our experiments, the two approaches share the same training and prediction
phases. The only difference lies in the knowledge representation phase.</p>
      <sec id="sec-3-1">
        <title>Knowledge representation</title>
        <p>First approach. The first approach consists in representing knowledge with
the Princeton WordNet. The labelled documents are mapped directly
into the synsets of the Princeton WordNet, since they are written in English.
The unlabelled documents need to be translated into English
before they can be mapped to the Princeton WordNet. The mapping
consists in replacing each term in a document by its
most common meaning in the Princeton WordNet: we use a simple
disambiguation strategy that takes only the most common meaning
of the term (the first-ranked synset) as the most appropriate one. Thus the synset
frequency is calculated as indicated in the following equation:
<disp-formula><label>(1)</label><tex-math>\mathrm{sf}(c_i, s) = \mathrm{tf}(c_i, \{t \in T \mid \mathrm{first}(\mathrm{Ref}(t)) = s\})</tex-math></disp-formula>
where:
– tf(c<sub>i</sub>, T): the sum of the frequencies of all terms t ∈ T in the training
documents of category c<sub>i</sub>;
– Ref(t): the set of all synsets assigned to term t in WordNet.</p>
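        <p>As an illustration, the first-sense mapping and the synset frequency of equation (1) can be sketched in Python as follows. This is a minimal sketch, not the implementation used in our experiments: the sense inventory <monospace>TOY_REF</monospace>, the synset identifiers and the helper names are hypothetical stand-ins for a real WordNet lookup, and terms without any synset are simply skipped (an assumption).</p>
        <preformat>
```python
from collections import Counter

# Hypothetical sense inventory standing in for Ref(t): each term maps to its
# synsets ordered from most to least common. Identifiers are invented.
TOY_REF = {
    "bank": ["bank#1", "bank#2"],
    "money": ["money#1"],
    "cash": ["cash#1", "money#1"],
}


def first_sense(term, ref):
    """Return first(Ref(t)), or None when the term has no synset."""
    senses = ref.get(term, [])
    return senses[0] if senses else None


def synset_frequencies(term_counts, ref):
    """Compute sf(c, s): the summed frequency of all terms whose first sense is s."""
    sf = Counter()
    for term, tf in term_counts.items():
        synset = first_sense(term, ref)
        if synset is not None:  # assumption: unknown terms are ignored
            sf[synset] += tf
    return sf


# Term frequencies aggregated over the training documents of one category.
category_terms = Counter({"bank": 3, "money": 2, "cash": 1, "xyzzy": 4})
sf = synset_frequencies(category_terms, TOY_REF)
print(dict(sf))
```
</preformat>
        <p>In this toy example the term "cash" contributes only to its first-ranked synset, even though its second sense coincides with the synset of "money"; this is the direct consequence of the first-sense strategy.</p>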
        <p>Second approach. The second approach avoids the direct use of machine
translation by incorporating the WordNet associated with each document
language. Each document term is first mapped to the synsets of the WordNet
of the language in which the document is written. As a result, the
labelled documents and the unlabelled documents are mapped onto different
taxonomies: the labelled documents onto the Princeton WordNet,
and the unlabelled documents onto the WordNets of their own
languages. The taxonomies of all the WordNets used must therefore be
matched to a common taxonomy in order to unify document
representations. Since the Princeton WordNet is the richest taxonomy, we have chosen
it as the common taxonomy. This matching offers the following advantages:
– it avoids the direct use of machine translation, which eliminates
the problem of translation disambiguation;
– it connects the different WordNets to the richest WordNet (the Princeton
WordNet), which compensates for the limited richness of some WordNets.</p>
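        <p>The mapping-then-matching step described above might be sketched as follows. This is only an illustrative sketch under stated assumptions: the per-language inventory <monospace>REF_BY_LANGUAGE</monospace>, the alignment table <monospace>MATCH</monospace> and all synset identifiers are hypothetical, and terms or synsets without a counterpart are dropped, which the text does not specify.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical sense inventory of a Spanish WordNet (first sense listed first).
REF_BY_LANGUAGE = {
    "es": {
        "banco": ["es:banco#1", "es:banco#2"],
        "dinero": ["es:dinero#1"],
    }
}

# Hypothetical alignment from language-specific synsets to Princeton synsets.
MATCH = {
    "es:banco#1": "bank#1",
    "es:dinero#1": "money#1",
}


def matched_synset_frequencies(term_counts, language, ref_by_language, match):
    """Map each term to its first sense in its own WordNet, then to the common taxonomy."""
    ref = ref_by_language.get(language, {})
    sf = Counter()
    for term, tf in term_counts.items():
        senses = ref.get(term, [])
        if not senses:
            continue  # assumption: terms unknown to this WordNet are ignored
        target = match.get(senses[0])
        if target is None:
            continue  # assumption: unaligned synsets are ignored
        sf[target] += tf
    return sf


doc_terms = Counter({"banco": 2, "dinero": 1, "hola": 5})
sf_doc = matched_synset_frequencies(doc_terms, "es", REF_BY_LANGUAGE, MATCH)
print(dict(sf_doc))
```
</preformat>
        <p>Because the result is expressed in Princeton synsets, the document vector becomes directly comparable with category profiles built from the English training documents.</p>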
        <p>Formally, the synset frequency is calculated as indicated in the following equation:
<disp-formula><label>(2)</label><tex-math>\mathrm{sf}(d, s) = \mathrm{tf}(d, \{t \in T \mid \mathrm{match}(\mathrm{first}(\mathrm{Ref}(t, L))) = s\})</tex-math></disp-formula>
where:
– tf(d, T): the sum of the frequencies of all terms t ∈ T in the unlabelled
document d;
– L: the language of the unlabelled document d;
– Ref(t, L): the set of all synsets assigned to term t in the WordNet associated with
language L;
– match(s): the synset of the Princeton WordNet that corresponds to synset s.</p>
        <p>Capturing relationships. After mapping terms into Princeton WordNet synsets,
this step uses the WordNet hierarchy to capture useful
relationships between synsets (hypernymy in our case). The synset frequencies are
updated as indicated in the following equation:
<disp-formula><label>(3)</label><tex-math>\mathrm{sf}(c_i, s) = \sum_{b \in H(s)} \mathrm{sf}(c_i, b)</tex-math></disp-formula></p>
        <sec id="sec-3-1-1">
          <title>Where:</title>
          <p>– c<sub>i</sub>: the i-th category;
– b and s: synsets;
– H(s): the set of hyponyms of synset s.</p>
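          <p>The update of synset frequencies through the hierarchy can be illustrated with the sketch below. Two points are assumptions rather than statements of the method: H(s) is taken to contain all direct and indirect hyponyms of s, and the flag <monospace>include_self</monospace> (hypothetical) decides whether s itself is counted, since the definition above does not say so. The toy hierarchy and identifiers are invented.</p>
          <preformat>
```python
from collections import Counter

# Hypothetical hyponym links: synset -> its direct hyponyms.
HYPONYMS = {
    "animal#1": ["dog#1", "cat#1"],
    "dog#1": ["puppy#1"],
}


def hyponym_set(synset, hyponyms, include_self=True):
    """Return H(s): every synset reachable from s through hyponym links."""
    seen = set()
    stack = list(hyponyms.get(synset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(hyponyms.get(node, []))
    if include_self:  # assumption: s keeps its own frequency
        seen.add(synset)
    return seen


def propagate(sf, hyponyms, include_self=True):
    """Replace sf(c, s) by the sum of the original sf(c, b) over b in H(s)."""
    updated = Counter()
    for s in set(sf) | set(hyponyms):
        total = sum(sf.get(b, 0) for b in hyponym_set(s, hyponyms, include_self))
        if total:
            updated[s] = total
    return updated


sf_category = Counter({"dog#1": 2, "cat#1": 1, "puppy#1": 4})
updated = propagate(sf_category, HYPONYMS)
print(dict(updated))
```
</preformat>
          <p>The sums are always taken over the original frequencies, so a count is never propagated twice; in the example the synset "animal#1", which never occurs directly, receives the total of its three hyponyms.</p>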
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Training</title>
        <p>
          The training phase uses the labelled documents to build conceptual
category profiles. Formally, each category is represented by a conceptual
profile that contains the k synsets (our features) that best characterize
the category compared to the others. For this purpose we used the multivariate χ<sup>2</sup>
statistic for feature selection. The multivariate χ<sup>2</sup> [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], denoted χ<sup>2</sup><sub>multivariate</sub>,
is a supervised method that selects features by taking into
account not only their frequencies in each category but also the interactions among
features and the interactions between features and categories. Consider the
synsets-by-categories matrix that holds the total number of occurrences of the p
synsets in the m categories. The contributions of these synsets to discriminating
the categories are calculated as indicated in the following equation, then sorted in
descending order for each category.
        </p>
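        <p><disp-formula><label>(4)</label><tex-math>C^{\chi^2}_{jk} = N \, \frac{(f_{jk} - f_{j.} f_{.k})^2}{f_{j.} f_{.k}} \times \mathrm{sign}(f_{jk} - f_{j.} f_{.k})</tex-math></disp-formula></p>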
        <sec id="sec-3-2-1">
          <title>Where:</title>
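          <p>– f<sub>jk</sub> = N<sub>jk</sub> / N: the relative occurrence frequency;
– f<sub>j.</sub> and f<sub>.k</sub>: the row and column sums of f<sub>jk</sub> (the usual dot notation for marginal frequencies);
– N: the total number of occurrences;
– N<sub>jk</sub>: the frequency of synset s<sub>j</sub> in category c<sub>k</sub>.</p>
          <p>Once the contributions of the synsets have been calculated and sorted for each category,
the conceptual profile of each category contains the first k synsets of its ranking.</p>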
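          <p>The feature selection step can be illustrated by the following sketch, which computes the signed contributions of equation (4) from a small synsets-by-categories count matrix and keeps the k best synsets per category. It is a sketch under assumptions, not our experimental code: the function names, the toy counts and the choice to assign a zero contribution when a marginal frequency is zero are all hypothetical.</p>
          <preformat>
```python
import math


def chi2_contributions(counts):
    """counts[j][k] = N_jk, the occurrences of synset j in category k."""
    n_total = sum(sum(row) for row in counts)
    n_rows, n_cols = len(counts), len(counts[0])
    f = [[counts[j][k] / n_total for k in range(n_cols)] for j in range(n_rows)]
    f_row = [sum(f[j]) for j in range(n_rows)]  # f_j.
    f_col = [sum(f[j][k] for j in range(n_rows)) for k in range(n_cols)]  # f_.k
    contrib = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_rows):
        for k in range(n_cols):
            expected = f_row[j] * f_col[k]
            if expected == 0:
                continue  # assumption: empty rows or columns contribute nothing
            diff = f[j][k] - expected
            contrib[j][k] = n_total * diff ** 2 / expected * math.copysign(1.0, diff)
    return contrib


def category_profiles(synsets, counts, k):
    """Return, for each category, its k synsets with the highest contribution."""
    contrib = chi2_contributions(counts)
    profiles = []
    for col in range(len(counts[0])):
        ranked = sorted(range(len(synsets)), key=lambda j: contrib[j][col], reverse=True)
        profiles.append([synsets[j] for j in ranked[:k]])
    return profiles


synsets = ["goal#1", "vote#1", "year#1"]
counts = [[8, 0], [0, 6], [3, 3]]  # columns: two hypothetical categories
profiles = category_profiles(synsets, counts, k=2)
print(profiles)
```
</preformat>
          <p>The sign factor matters for the ranking: a synset that is under-represented in a category receives a negative contribution and falls to the bottom of that category's list, even when the magnitude of its deviation is large.</p>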
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Prediction</title>
        <p>
          The prediction phase uses the conceptual category profiles to
classify unlabelled documents. It consists of two steps:
– Weighting the conceptual category profiles and the conceptual vector of the
unlabelled document. In our experiments, we used the standard tf-idf (term
frequency - inverse document frequency) function [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], defined as:
<disp-formula><label>(5)</label><tex-math>w(s_k, c_i) = \mathrm{tfidf}(s_k, c_i) = \mathrm{tf}(s_k, c_i) \times \log\left(\frac{|C|}{\mathrm{df}(s_k)}\right)</tex-math></disp-formula>
where:
• tf(s<sub>k</sub>, c<sub>i</sub>) denotes the number of times synset s<sub>k</sub> occurs in category c<sub>i</sub>;
• df(s<sub>k</sub>) denotes the number of categories in which synset s<sub>k</sub> occurs;
• |C| denotes the number of categories.
– Calculating the distances between the conceptual vector of the document and
all conceptual category profiles, and assigning the document to the category
whose profile is closest to the document vector. In our experiments,
we used the cosine similarity, the dominant similarity measure in information
retrieval and text classification, which can be calculated as the
normalized dot product:
        </p>
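        <p><disp-formula><label>(6)</label><tex-math>S_{i,j} = \frac{\sum_{s \in i \cap j} \mathrm{tfidf}(s,i) \times \mathrm{tfidf}(s,j)}{\sqrt{\sum_{s \in i} \mathrm{tfidf}^2(s,i) \times \sum_{s \in j} \mathrm{tfidf}^2(s,j)}}</tex-math></disp-formula>
with:
– s: a synset;
– i and j: the two vectors (profiles) to be compared;
– tfidf(s, i): the weight of synset s in i;
– tfidf(s, j): the weight of synset s in j.</p>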
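        <p>The two prediction steps can be illustrated together by the sketch below, which weights the profiles and the document vector with tf-idf and assigns the document to the category with the highest cosine similarity. It is a hedged sketch: the function names and toy data are hypothetical, and weighting the document vector with the category-level df values, as well as ignoring synsets never seen in training, are assumptions that the text does not spell out.</p>
        <preformat>
```python
import math


def document_frequencies(category_tf):
    """df(s): the number of categories in which synset s occurs."""
    df = {}
    for tf_by_synset in category_tf.values():
        for synset, tf in tf_by_synset.items():
            if tf > 0:
                df[synset] = df.get(synset, 0) + 1
    return df


def tfidf_vector(tf_by_synset, df, n_categories):
    """Weight a profile or a document vector: tf * log(|C| / df)."""
    return {
        s: tf * math.log(n_categories / df[s])
        for s, tf in tf_by_synset.items()
        if tf > 0 and s in df  # assumption: unseen synsets get no weight
    }


def cosine(u, v):
    """Normalized dot product of two sparse vectors stored as dicts."""
    dot = sum(u[s] * v[s] for s in set(u).intersection(v))
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm > 0 else 0.0


def classify(doc_tf, category_tf):
    """Return the closest category and the similarity to every category."""
    df = document_frequencies(category_tf)
    n = len(category_tf)
    doc_vec = tfidf_vector(doc_tf, df, n)
    scores = {c: cosine(doc_vec, tfidf_vector(tf, df, n)) for c, tf in category_tf.items()}
    return max(scores, key=scores.get), scores


category_tf = {
    "sport": {"goal#1": 8, "year#1": 3},
    "politics": {"vote#1": 6, "year#1": 3},
    "economy": {"bank#1": 5, "year#1": 1},
}
label, scores = classify({"goal#1": 2, "year#1": 1, "bank#1": 1}, category_tf)
print(label, scores)
```
</preformat>
        <p>A synset that occurs in every category, such as "year#1" here, receives a weight of zero because log(|C| / df) vanishes, so it has no influence on the similarity.</p>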
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental results</title>
      <sec id="sec-4-1">
        <title>Dataset for evaluation</title>
        <p>For our experiments, we extracted a bilingual dataset from the Reuters Corpus
Volumes 1 and 2 (RCV1, RCV2), using English training documents (RCV1) and Spanish test
documents (RCV2). Our dataset is based on topic (category) codes, with a
varying number of documents per category, as shown in Table 1.
For comparison, we tested the two approaches on this dataset.
The experimental results reported in this section are based on the F1
measure, which is the harmonic mean of precision and recall:
<disp-formula><label>(7)</label><tex-math>F_1(i) = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}</tex-math></disp-formula></p>
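        <p>For concreteness, the F1 measure of a category can be computed from its true positive, false positive and false negative counts as in the sketch below. The function name and the example counts are hypothetical, and returning zero when precision and recall are both zero is a convention assumed here, not one stated in the text.</p>
        <preformat>
```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one category."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Invented counts: 8 correct assignments, 2 wrong ones, 4 missed documents.
print(round(f1_score(8, 2, 4), 3))
```
</preformat>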
        <p>The results of the experiments are presented in Table 2. Concerning
profile size, the best performance is obtained with a profile size of k = 900 for
both approaches; performance improves steadily as the profile size
increases.</p>
        <p>Comparing the results of the two approaches, the first approach largely
outperforms the second.</p>
        <p>In this paper, we have compared two approaches to using WordNets for MTC.
The first approach relies on machine translation in order to use the Princeton
WordNet, while the second replaces machine
translation by incorporating a WordNet for each language. The
experimental results show that using a WordNet for each language does not guarantee
better results than those obtained with the Princeton WordNet alone. Future work will
test the second approach with different WordNets in order
to confirm these results.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Jalam</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clesh</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rakotomalala</surname>
          </string-name>
          , R.:
          <article-title>Cadre pour la catégorisation de textes multilingues</article-title>
          .
          <source>7èmes Journées internationales d'Analyse statistique des Données Textuelles</source>
          , Louvain-la-Neuve, Belgique
          (
          <year>2004</year>
          )
          <fpage>650</fpage>
          -
          <lpage>660</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Rigutini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maggini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>An EM based training algorithm for CrossLanguage Text Categorization</article-title>
          .
          <source>Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence. Compiegne</source>
          , France.
          <year>September 2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Refinement Framework for Cross Language Text Categorization</article-title>
          .
          <source>4th Asia Information Retrieval Symposium, AIRS 2008</source>
          , Harbin, China (
          <year>2008</year>
          )
          <fpage>401</fpage>
          -
          <lpage>411</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gliozzo</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strapparava</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Cross Language Text Categorization by acquiring Multilingual Domain Models from Comparable Corpora</article-title>
          .
          <source>in Proceedings of the ACL Workshop on Building and Using Parallel Texts. Ann Arbor</source>
          , Michigan, USA (
          <year>2005</year>
          )
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Adeva</surname>
            ,
            <given-names>J. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvo</surname>
            ,
            <given-names>R. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ipiña</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Multilingual Approaches to Text Categorisation.
          <source>The European Journal for the Informatics Professional</source>
          ,Vol
          <volume>6</volume>
          , (
          <year>2005</year>
          )
          <fpage>43</fpage>
          -
          <lpage>51</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>: Machine learning in automated text categorization</article-title>
          .
          <source>ACM Computing Surveys</source>
          , (
          <year>2002</year>
          )
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheridan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Accès multilingue aux systèmes d'information</article-title>
          .
          <source>In: 67th IFLA Council and General Conference</source>
          . (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nunberg</surname>
          </string-name>
          , G.:
          <article-title>Will the Internet speak english?</article-title>
          .
          <source>The American Prospect</source>
          . (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Guyot</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radhouani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Falquet</surname>
          </string-name>
          , G.:
          <article-title>Ontology-based multilingual information retrieval</article-title>
          .
          <source>In CLEF Workshop</source>
          , Working Notes Multilingual Track, Vienna, Austria (
          <year>2005</year>
          )
          <fpage>21</fpage>
          -
          <lpage>23</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bentaallah</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malki</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>WordNet based Multilingual Text Categorization</article-title>
          .
          <source>Journal of Computer Science</source>
          , Vol
          <volume>6</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Gmo: A graph matching for ontologies</article-title>
          .
          <source>In Proceedings of the K-CAP 2005 Workshop on Integrating Ontologies</source>
          , (
          <year>2005</year>
          )
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lacher</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groh</surname>
          </string-name>
          , G.:
          <article-title>Facilitating the exchange of explicit knowledge through ontology mappings</article-title>
          .
          <source>In Proceedings of the 14th International Florida Artificial Intelligence Research Society Conference (FLAIRS01)</source>
          , AAAI Press, (
          <year>2001</year>
          )
          <fpage>305</fpage>
          -
          <lpage>309</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Chau</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeh</surname>
            ,
            <given-names>C.H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A Neural Network Model for Hierarchical Multilingual Text Categorization</article-title>
          .
          <source>In Proceedings of ISNN-05, Second International Symposium on Neural Networks, Chongqing</source>
          , China (
          <year>2005</year>
          )
          <fpage>238</fpage>
          -
          <lpage>245</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Chau</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeh</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Multilingual Text Categorization for Global Knowledge Discovery Using Fuzzy Techniques</article-title>
          .
          <source>Proceedings of the 2002 IEEE International Conference on Artificial Intelligence Systems(ICAIS)</source>
          , (
          <year>2002</year>
          )
          <fpage>82</fpage>
          -
          <lpage>86</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ichise</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamasaki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takeda</surname>
          </string-name>
          , H.:
          <article-title>Discovering relationships among catalogs</article-title>
          . In E. Suzuki and S. Arikawa, editors,
          <source>Proceedings of the 7th International Conference on Discovery Science (DS04)</source>
          , volume
          <volume>3245</volume>
          <source>of LNCS</source>
          , Springer, (
          <year>2004</year>
          )
          <fpage>371</fpage>
          -
          <lpage>379</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Nottelmann</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straccia</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>A probabilistic, logic-based framework for automated web directory alignment</article-title>
          . In Zongmin Ma, editor,
          <source>Soft Computing in Ontologies and the Semantic Web, Studies in Fuzziness and Soft Computing</source>
          , Springer Verlag, (
          <year>2006</year>
          )
          <fpage>47</fpage>
          -
          <lpage>77</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G.A.</given-names>
          </string-name>
          :
          <article-title>WordNet: An On-Line Lexical Database</article-title>
          . In Special Issue of
          <source>International Journal of Lexicography</source>
          , Vol
          <volume>3</volume>
          , No.
          <volume>4</volume>
          (
          <year>1990</year>
          )
          <fpage>238</fpage>
          -
          <lpage>245</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Furst</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trichet</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Axiom-based ontology matching</article-title>
          .
          <source>In Proceedings of the 3rd international conference on Knowledge capture (K-CAP 05)</source>
          , ACM Press, (
          <year>2005</year>
          )
          <fpage>195</fpage>
          -
          <lpage>196</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Do</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>Coma - a system for flexible combination of schema matching approaches</article-title>
          .
          <source>In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 02)</source>
          , (
          <year>2002</year>
          )
          <fpage>610</fpage>
          -
          <lpage>621</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdejo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chugur</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cigarran</surname>
          </string-name>
          , J.:
          <article-title>Indexing with WordNet synsets can improve text retrieval</article-title>
          .
          <source>In: Proceedings of the COLING/ACL Workshop on Usage of WordNet in Natural Language Processing Systems</source>
          . (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ide</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veronis</surname>
          </string-name>
          , J.:
          <article-title>Introduction to the special issue on word sense disambiguation: The state of the art</article-title>
          .
          <source>Computational Linguistics</source>
          . (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>A survey of approaches to automatic schema matching</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ), (
          <year>2001</year>
          )
          <fpage>334</fpage>
          -
          <lpage>350</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A survey of schema-based matching approaches</article-title>
          .
          <source>Journal on Data Semantics, 4(LNCS 3730)</source>
          , (
          <year>2005</year>
          )
          <fpage>146</fpage>
          -
          <lpage>171</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>J.O.:</given-names>
          </string-name>
          <article-title>A comparative study on feature selection in text categorization</article-title>
          .
          <source>In Proceedings of ICML-97, 14th International Conference on Machine Learning</source>
          . (
          <year>1997</year>
          )
          <fpage>412</fpage>
          -
          <lpage>420</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckley</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information Processing and Management</source>
          . (
          <year>1988</year>
          )
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer</article-title>
          . Addison-Wesley (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>