<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pruning Texts with NLP and Expanding Queries with an Ontology: TagSearch</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gil FRANCOPOULO www.tagmatica.com</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Our basic approach is first to use natural language processing to prune the texts and the query, and second to use an ontology to expand the queries. The system described here is based on the one used last year for CLEF-2002, but most components have been improved and some new steps have been added. The system, named TagSearch, is based on three main components: a chunker named TagChunker; Lucene, a good open-source search engine written by Doug Cutting and his friends; and an ontology named TagDico. The first two components were used last year. The use of an ontology is new.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>connected by an AND. But instead of building one big query, every combination of the expansions is built,
producing many queries. For each query, the number of terms is computed, and the evaluation starts with
the queries that have the fewest terms.</p>
      <p>Step-3: a query is built as in step-1, but OR is used between words instead of AND.</p>
      <p>Step-4: queries are built as in step-2, but OR is used between groups instead of AND.</p>
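      <p>The query-building steps can be illustrated with a minimal sketch. It is not the TagSearch implementation: the function names, the data layout (one list of alternatives per query word), the example expansions, and the choice to write multi-word expansions as quoted phrases are all assumptions made for illustration. The sketch builds one query per combination of expansions, joins the terms with AND or OR, and orders the queries so that those with the fewest terms come first.</p>
      <preformat>
```python
from itertools import product


def quote(term):
    # Multi-word expansions are written as quoted phrases (an assumption).
    return f'"{term}"' if " " in term else term


def build_queries(groups, operator):
    """Return one query string per combination of expansions.

    groups   -- list of lists; each inner list holds the alternatives for
                one query word (hypothetical data layout)
    operator -- "AND" (steps 1 and 2) or "OR" (steps 3 and 4)
    """
    scored = []
    for combination in product(*groups):
        # Number of terms = total number of words in the chosen alternatives.
        term_count = sum(len(term.split()) for term in combination)
        text = f" {operator} ".join(quote(term) for term in combination)
        scored.append((term_count, text))
    # Queries with the fewest terms are evaluated first; the sort is stable.
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored]


# Hypothetical expansions, chosen only to show the ordering.
groups = [["syndicats"], ["Europe", "Grande Bretagne", "Italie"]]
and_queries = build_queries(groups, "AND")
or_queries = build_queries(groups, "OR")
```
      </preformat>
      <p>With these example groups, the two-term queries are evaluated before the three-term query that contains the quoted phrase. Passing groups that hold a single alternative each yields the single, unexpanded query of step-1 or step-3.</p>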
    </sec>
    <sec id="sec-2">
      <title>A few words on expansion</title>
      <p>The goal of expansion is to find documents that are on the subject we are searching for but that do not contain
the exact word used in the query. For instance, take a query like “Les syndicats en Europe” (“Trade unions in Europe”, query C156). Imagine a text in the
pool that is about “syndicats en Italie” but does not contain the word “Europe”. If the word “Europe” is not
expanded, this document cannot be found, so meronymy expansion is the only way to retrieve it.</p>
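      <p>The effect of meronymy expansion on this example can be sketched as follows. The part-of table is a hypothetical fragment written for illustration, not an extract of TagDico, and the matching function is a simplified stand-in for the search engine: it only checks that every query word, or one of its expansions, occurs in the document.</p>
      <preformat>
```python
# Hypothetical fragment of a part-of (meronymy) relation; the real TagDico
# ontology is not reproduced here.
PARTS = {
    "Europe": ["France", "Italie", "Espagne"],
}


def expand(word, parts=PARTS):
    """Return the word followed by its known parts (meronyms)."""
    return [word] + parts.get(word, [])


def matches(document_words, query_groups):
    """True when every group has at least one alternative in the document."""
    return all(any(term in document_words for term in group)
               for group in query_groups)


document = {"syndicats", "en", "Italie"}
plain = [["syndicats"], ["Europe"]]
expanded = [expand("syndicats"), expand("Europe")]

found_without_expansion = matches(document, plain)
found_with_expansion = matches(document, expanded)
```
      </preformat>
      <p>Under these assumptions the unexpanded query misses the document, while the expanded query finds it through “Italie”.</p>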
    </sec>
    <sec id="sec-3">
      <title>A few words on document ranking</title>
      <p>The documents must be ranked, and a document that is found during step-1 must have a higher rank than
one found during step-2 or 3.</p>
      <p>I use the ranking produced by Lucene as a basis and multiply it by a number less than one in order
to reflect the ranking of the query.</p>
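      <p>A minimal sketch of this weighting is given below. The factors are hypothetical (the text only says that the multiplier is less than one), step-1 is left unscaled here as a further assumption, and the function and variable names are invented for the example. When a document is found by several steps, the sketch keeps its highest weighted score.</p>
      <preformat>
```python
# Hypothetical multipliers, one per step; only their decreasing order
# matters for this illustration.
STEP_FACTORS = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.125}


def merge_results(results_by_step, factors=STEP_FACTORS):
    """Combine per-step hits into one ranked list of (doc_id, score).

    results_by_step -- dict mapping a step number to a list of
                       (doc_id, engine_score) pairs
    factors         -- dict mapping a step number to its multiplier
    """
    best = {}
    for step in sorted(results_by_step):
        for doc_id, score in results_by_step[step]:
            weighted = score * factors[step]
            # Keep the highest weighted score seen for each document.
            if doc_id not in best or weighted > best[doc_id]:
                best[doc_id] = weighted
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


results = {
    1: [("d1", 0.8)],
    2: [("d1", 0.9), ("d2", 0.9)],
    3: [("d3", 1.0)],
}
ranked = merge_results(results)
ranked_ids = [doc_id for doc_id, _ in ranked]
```
      </preformat>
      <p>Note that plain multiplication does not by itself guarantee that every step-1 document outranks every later one: that depends on the chosen factors and on the range of the engine scores.</p>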
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>As far as I can tell from reading the comparative results, our results seem reasonable: we are not among the
worst.</p>
    </sec>
    <sec id="sec-5">
      <title>Future</title>
      <p>Because the lexicon is rather rich, a problem occurs: in cases of polysemy, the system does not prune the meanings
that are described in the lexicon but that do not fit the context of the sentence. As a result, the
expansion is sometimes noisy.</p>
      <p>The system could be improved by adding a semantic disambiguation component.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>