<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UNIBA-SENSE at CLEF 2008: SEmantic N-levels Search Engine</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annalina Caputo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>semerarog@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science - University of Bari</institution>
          ,
          <country country="IT">ITALY</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents evaluation experiments conducted at the University of Bari for the Ad-Hoc Robust WSD task of the Cross-Language Evaluation Forum (CLEF) 2008. The evaluation was performed using SENSE (SEmantic N-levels Search Engine) [2]. SENSE tries to overcome the limitations of the ranked keyword approach by introducing semantic levels, which integrate (and not simply replace) the lexical level represented by keywords.</p>
      </abstract>
      <kwd-group kwd-group-type="categories">
        <title>Categories and Subject Descriptors</title>
        <kwd>H.3 [Information Storage and Retrieval]</kwd>
        <kwd>H.3.1 Indexing methods</kwd>
        <kwd>H.3.3 Retrieval models</kwd>
        <kwd>H.3.4 Performance evaluation (efficiency and effectiveness)</kwd>
      </kwd-group>
      <kwd-group kwd-group-type="general-terms">
        <title>General Terms</title>
        <kwd>Information Retrieval</kwd>
        <kwd>Performance</kwd>
        <kwd>Experimentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Information Retrieval (IR) systems are generally concerned with the selection of documents, from
a fixed collection, which satisfy a user's one-off information need (query). The traditional search
strategy performed by IR systems is ranked keyword search: for a given query, a list of documents,
ordered by relevance, is returned. Relevance computation is primarily driven by a string-matching
operation: if any query word is found in a document belonging to the collection, a match is made
and the document is considered relevant.</p>
      <p>Ranked keyword search has been quite successful in the past, in spite of its obvious limits,
which are mainly due to polysemy (the presence of multiple meanings for one word) and synonymy
(different words having the same meaning). The result is that, due to synonymy, relevant documents can be
missed if they do not contain the exact query keywords, while, due to polysemy, wrong documents
could be deemed relevant. These problems call for alternative methods that work not only at
the lexical level of the documents, but also at the meaning level.</p>
      <p>Any attempt to work at the meaning level must solve the problem that, while words occur in
a document, meanings do not, since they are often hidden behind words. For example, for the
query "apple", some users may be interested in documents dealing with "apple" as a "fruit", while
other users may want documents related to "Apple computers". Some linguistic processing
is needed in order to provide a more powerful "interpretation" both of the user needs behind the
query and of the words in the document collection. This linguistic processing may result in the
production of semantic information that provides machine-readable insights into the meaning of
the content.</p>
      <p>
        As shown by the previous example, named entities (people, organizations, locations, etc.)
mentioned in the documents constitute an important part of their semantics. Therefore, in our
interpretation, semantic information can be captured from a text by looking at word meanings,
as they are described in a reference dictionary (e.g. WordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), as well as named entities.
      </p>
      <p>We propose an IR system which manages documents indexed at multiple separate levels:
keywords, senses (word meanings), and entities. The system is able to combine keyword search with
semantic information provided by the two other indexing levels. In particular, for each level:</p>
      <list list-type="order">
        <list-item><p>a local scoring function is used in order to weigh elements belonging to that level according
to their informative power;</p></list-item>
        <list-item><p>a local similarity function is used in order to compute document relevance by exploiting the
above-mentioned scores.</p></list-item>
      </list>
      <p>Finally, a global ranking function is defined in order to combine document relevance computed at
each level.</p>
      <p>The rest of the paper is structured as follows. The N-levels model used in SENSE is described
in Section 2, while Section 3 presents an overview of the meaning level. A brief description of the
global ranking function is given in Section 4, followed by the details of the system setup for the
CLEF competition in Section 5. Finally, the experiments are described in Section 6. Conclusions
and future work close the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>N-levels model</title>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an IR model is a quadruple:
      </p>
      <disp-formula><tex-math>\langle D, Q, F, R(q, d) \rangle</tex-math></disp-formula>
      <p>where:</p>
      <list list-type="bullet">
        <list-item><p>D is a set composed of logical views (or representations) for the documents in the collection;</p></list-item>
        <list-item><p>Q is a set composed of logical views (or representations) for user information needs. Such
representations are called queries;</p></list-item>
        <list-item><p>F is a framework for modeling document representations, queries, and their relationships;</p></list-item>
        <list-item><p>R(q, d) is a ranking function which associates a real number with a query q ∈ Q and a
document representation d ∈ D. Such a ranking defines an ordering among the documents
with respect to the query q.</p></list-item>
      </list>
      <p>For the classic vector model, the framework F is composed of a t-dimensional vector space
and standard linear algebra operations on vectors. In this model, tf-idf schemes are used to weigh
index terms in documents D and queries Q. Term weights are used to compute the degree of
similarity between each document d in the collection and the user query q, according to a ranking
function R(q, d).</p>
      <p>
        We propose an extension of the classical Vector Space Model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], called the N-levels model, in which
documents are represented at different levels. Each level corresponds to a logical view that aims
at describing one of the possible spaces in which documents are represented.
      </p>
      <p>The lexical space of the vector model is retained and semantic spaces are added, each one
providing different information about the meaning of the content.</p>
      <p>Each level is described by means of a specific type of features, where each feature is defined as
a prominent attribute or aspect of the document. The model is currently implemented in SENSE
(SEmantic N-levels Search Engine), an IR system in which three different levels are considered,
corresponding to as many different types of features: keywords, word meanings and named entities.
Each document at each level is represented by a Bag-of-Features (BOF), a vector of weights
assigned to homogeneous features.</p>
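      <p>More formally, given the document collection D = {d<sub>1</sub>, …, d<sub>|D|</sub>} and the number of levels N,
the document d<sub>k</sub> is represented by N vectors:</p>
      <disp-formula><label>(1)</label><tex-math>\overrightarrow{BoF}_{k}^{\,i} = \left(w_{1,k}^{i}, \ldots, w_{|V_i|,k}^{i}\right), \qquad i = 1, \ldots, N</tex-math></disp-formula>
      <p>where V<sub>i</sub> is the vocabulary of the features at the i-th level, and w<sup>i</sup><sub>m,k</sub> is the weight of the m-th
feature at the i-th level for document d<sub>k</sub>.</p>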
      <p>Analogously, N query vectors (one for each level) are used for representing queries. The N
query vectors are not necessarily extracted simultaneously from the original keyword query issued
by the user: a query vector can be obtained when needed. For example, the ranked list of
documents for the query "Apple growth" might contain documents related to both the growth of
computer sales by Apple Inc. and the growth stages of apple trees. Then, when the system collects
the user feedback (for instance, a click on a document in which "Apple" has been recognized as a
named entity), the query vector for the named entities level will be produced.</p>
      <p>Given these representations, we need to define a strategy to compute both w<sup>i</sup><sub>m,k</sub> and R(q, d)
(the degree of similarity between query and document). The weighting scheme for computing
w<sup>i</sup><sub>m,k</sub> must be different for each type of feature. The adoption of a simple adjustment of tf-idf
for semantic levels would result in a loss of the semantics that we want to capture with our model.
More advanced strategies should be adopted in order to take into account the inherent informative
power of each specific kind of feature. We call these strategies local scoring functions. Section
3.1 describes the local scoring function defined for weighting features (represented as WordNet
synsets) at the word meaning level (hereafter meaning level, for brevity). As regards queries,
binary weights are adopted.</p>
      <p>We also have to extend the notion of relevance R(q, d) in order to enhance keyword search
with semantic information. Therefore, the degree of similarity between q and d must be evaluated
at each level by defining a proper local similarity function that computes document relevance
according to the weights defined by the corresponding local scoring function. Section 3.2 describes
the local similarity function defined for the word meaning level. Since the final goal is to obtain a
single list of documents ranked in decreasing order of relevance, a global ranking function is needed
to merge all the result lists that come from each level. This function is independent of both the
number of levels and the specific local scoring and similarity functions because it takes as input
N ranked lists of documents and produces a unique merged list of the most relevant documents.
Section 4 describes the adopted global ranking function.</p>
      <p>To meet all the requirements of the proposed model, we implemented an extension of the
Lucene API. Lucene is a full-featured text search engine library that implements the vector
space model.</p>
    </sec>
    <sec id="sec-3">
      <title>Meaning Level</title>
      <p>In SENSE, features at the meaning level are synsets obtained from WordNet, a semantic lexicon
for the English language. It groups English words into sets of synonyms called synsets, provides
short general definitions (glosses), and records various semantic relations between these synonym
sets. WordNet distinguishes between nouns, verbs, adjectives and adverbs because they follow
different grammatical rules. Each synset is assigned a unique identifier and contains a set of
synonymous words or collocations; different senses of a word are in different synsets. The meaning
of a synset is further clarified by short defining glosses. A typical example of a synset with gloss
is:</p>
      <p>01722233 good, right, ripe -- (most suitable or right for a particular purpose; "a good time
to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")</p>
      <p>In order to assign synsets to words, the original system adopted a Word Sense Disambiguation
(WSD) strategy. In the case of CLEF, the system used the synsets provided by the organizers
of the Ad-Hoc Robust WSD task. The provided documents contain, for each word, a list of the
possible synsets with a confidence factor. We use this factor to weigh the synset in the meaning
index structure.</p>
      <p>The idea behind the adoption of WSD is that each document is represented, at the meaning
level, by the senses conveyed by the words in its content, together with their respective occurrences.
Documents are represented by using a synset-based vector space. Consequently, the BOF at the
meaning level is a bag-of-synsets. The vocabulary at this level is the set of distinct synsets
recognized by the WSD procedure in the collection, while w<sup>i</sup><sub>m,k</sub> in (1) is the weight of the m-th
synset for document d<sub>k</sub>, computed according to the local scoring function defined in the following
section.</p>
      <sec id="sec-3-1">
        <title>Synset Scoring Function</title>
        <p>Given a document d<sub>i</sub> and its synset representation computed by the WSD procedure, X =
[s<sub>1</sub>, s<sub>2</sub>, …, s<sub>k</sub>], the basic idea is to compute a partial weight for each s<sub>j</sub> ∈ X, and then to
improve this weight by finding out some relations among synsets belonging to X.</p>
        <p>The partial weight, called sfidf (synset frequency, inverse document frequency), is computed
according to a strategy resembling the tf-idf score for words:</p>
        <disp-formula><label>(2)</label><tex-math>\mathit{sfidf}(s_j, d_i) = \underbrace{tf(s_j, d_i)}_{\text{synset frequency}} \cdot \underbrace{\log \frac{|C|}{n_j}}_{\text{IDF}}</tex-math></disp-formula>
        <p>where |C| is the total number of documents in the collection and n<sub>j</sub> is the number of documents
containing the synset s<sub>j</sub>. tf(s<sub>j</sub>, d<sub>i</sub>) computes the frequency of s<sub>j</sub> in document d<sub>i</sub>.</p>
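        <p>Finally, the synset confidence factor (α) is used to weigh the sfidf value. Thus, the final local
score for synset s<sub>j</sub> in d<sub>i</sub> is:</p>
        <disp-formula><label>(3)</label><tex-math>\mathit{sfidf}(s_j, d_i) \cdot (1 + \alpha)</tex-math></disp-formula>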
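        <p>The scoring scheme of this section can be illustrated with a minimal Python sketch. It is not the SENSE implementation (which extends Lucene in Java): the function name, the toy counts, the second synset identifier and the choice of a single confidence value per synset and document are assumptions made only for illustration.</p>
        <preformat>
```python
import math
from collections import Counter


def sfidf_scores(doc_synsets, doc_freq, n_docs, confidence):
    """Return sfidf(s, d) * (1 + confidence) for every synset s of one document.

    doc_synsets: list of synset ids occurring in the document (with repetitions)
    doc_freq:    synset id -> number of documents containing it (n_j)
    n_docs:      total number of documents in the collection (|C|)
    confidence:  synset id -> confidence factor (assumed: one value per synset)
    """
    tf = Counter(doc_synsets)
    scores = {}
    for synset, freq in tf.items():
        idf = math.log(n_docs / doc_freq[synset])
        scores[synset] = freq * idf * (1.0 + confidence.get(synset, 0.0))
    return scores


# Toy data (hypothetical counts and confidence values).
doc = ["01722233", "01722233", "07739125"]
df = {"01722233": 10, "07739125": 100}
conf = {"01722233": 0.8, "07739125": 0.5}
scores = sfidf_scores(doc, df, n_docs=1000, confidence=conf)
```
        </preformat>
        <p>In this toy collection the rarer synset, which also occurs twice and has the higher confidence, receives the larger score.</p>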
      </sec>
      <sec id="sec-3-2">
        <title>Synset Similarity Function</title>
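        <p>The local similarity functions for both the meaning and the keyword levels are computed using a
modified version of the Lucene default document score. For the meaning level, both query and
document vectors contain synsets instead of keywords. Given a query q and a document d<sub>i</sub>, the
synset similarity is computed as:</p>
        <disp-formula><label>(4)</label><tex-math>\mathit{synsim}(q, d_i) = C(q, d_i) \cdot \sum_{s_j \in q} \left( \mathit{sfidf}(s_j, d_i) \cdot (1 + \alpha) \cdot N(d_i) \right)</tex-math></disp-formula>
        <p>where:</p>
        <list list-type="bullet">
          <list-item><p>sfidf(s<sub>j</sub>, d<sub>i</sub>) and α are computed as described in the previous section;</p></list-item>
          <list-item><p>C(q, d<sub>i</sub>) is the number of query terms in d<sub>i</sub>;</p></list-item>
          <list-item><p>N(d<sub>i</sub>) is a factor that takes into account document length normalization.</p></list-item>
        </list>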
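        <p>A minimal Python sketch of the synset similarity follows. It assumes that the local scores already include the confidence weighting of Section 3.1 and that the length normalization factor is 1 / sqrt(number of synsets in the document), which mirrors the default length norm of Lucene's classic similarity; the text does not specify the factor, so this choice and all names below are assumptions.</p>
        <preformat>
```python
import math


def length_norm(doc_length):
    """Assumed normalization factor N(d): 1 / sqrt(number of synsets in d)."""
    return 1.0 / math.sqrt(doc_length)


def synsim(query_synsets, doc_scores, doc_length):
    """Coordination factor times the sum of normalized local scores.

    query_synsets: synset ids in the query (binary weights, so duplicates are ignored)
    doc_scores:    synset id -> local score of the synset in the document
    doc_length:    number of synset occurrences in the document
    """
    matched = [s for s in set(query_synsets) if s in doc_scores]
    coord = len(matched)  # C(q, d): number of query terms found in d
    norm = length_norm(doc_length)
    return coord * sum(doc_scores[s] * norm for s in matched)


# Hypothetical local scores for a document containing four synset occurrences.
doc_scores = {"s1": 2.0, "s2": 3.0}
similarity = synsim(["s1", "s2", "s3"], doc_scores, doc_length=4)
```
        </preformat>
        <p>Two of the three query synsets match, so the coordination factor is 2 and the similarity is 2 × (2.0 + 3.0) × 0.5 = 5.0.</p>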
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Global Ranking</title>
      <p>Given a query q, each local similarity function produces a local ranked list of relevant documents.
All the local lists must be merged in order to give a single ranked list to the user. The global
ranking function is devoted to this task.</p>
      <p>
        Algorithms for merging ranked lists are widely used by meta-search engines, which send user
requests to several search engines and aggregate results into a single list [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our strategy for
defining the global ranking function is thus inspired by prior work on meta-search engines.
      </p>
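      <p>Formally, we define:</p>
      <list list-type="bullet">
        <list-item><p>U: the universe, that is, the set containing all the distinct documents in the local lists;</p></list-item>
        <list-item><p>τ<sub>j</sub> = {x<sub>1</sub> ≥ x<sub>2</sub> ≥ … ≥ x<sub>n</sub>}: the j-th local list, j = 1, …, N, defined as an ordered set S of
documents, S ⊆ U, where ≥ is the ranking criterion defined by the j-th local similarity function;</p></list-item>
        <list-item><p>τ<sub>j</sub>(x<sub>i</sub>): a function that returns the position of x<sub>i</sub> in the list τ<sub>j</sub>;</p></list-item>
        <list-item><p>s<sup>τ<sub>j</sub></sup>(x<sub>i</sub>): a function that returns the score of x<sub>i</sub> in τ<sub>j</sub>;</p></list-item>
        <list-item><p>w<sup>τ<sub>j</sub></sup>(x<sub>i</sub>): a function that returns the weight of x<sub>i</sub> in τ<sub>j</sub>.</p></list-item>
      </list>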
      <p>Two different strategies can be adopted to obtain w<sup>τ<sub>j</sub></sup>(x<sub>i</sub>), based on the score or the position
of x<sub>i</sub> in the list τ<sub>j</sub>. Since local similarity functions may produce scores varying in different ranges,
and the cardinality of lists can be different, a normalization process (of scores and positions) is
necessary in order to produce weights that are comparable.</p>
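      <p>
        The aggregation of lists into a single one requires two steps: the first one produces the N
normalized lists and the second one merges the N lists into a single one denoted by <inline-formula><tex-math>\hat{\tau}</tex-math></inline-formula>. The two
steps are thoroughly described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. After tuning experiments, we chose to adopt Z-Score
normalization and CombSUM as score normalization and rank aggregation functions, respectively.
In particular, the Z-Score normalization is computed using the following formula:
      </p>
      <disp-formula><tex-math>w^{\tau_j}(x_i) = \frac{s^{\tau_j}(x_i) - \mu_{s^{\tau_j}}}{\sigma_{s^{\tau_j}}}</tex-math></disp-formula>
      <p>where <inline-formula><tex-math>\mu_{s^{\tau_j}}</tex-math></inline-formula> and <inline-formula><tex-math>\sigma_{s^{\tau_j}}</tex-math></inline-formula> are the mean and the standard deviation of the scores in the list τ<sub>j</sub>.</p>
      <p>Regarding the CombSUM list aggregation method, the score of document x<sub>i</sub> in the global list is
computed by summing all the normalized scores for x<sub>i</sub>:</p>
      <disp-formula><tex-math>s^{\hat{\tau}}(x_i) = \sum_{\tau_j \in R} w^{\tau_j}(x_i)</tex-math></disp-formula>
      <p>where R is the set of all local lists.</p>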
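      <p>The two aggregation steps can be sketched in Python as follows. This is an illustrative sketch, not the SENSE code: it assumes the population standard deviation for the Z-Score, gives weight 0 to every document of a list whose scores are all equal, and lets a document that is missing from a local list contribute nothing to the sum. The function names and the toy scores are hypothetical.</p>
      <preformat>
```python
from statistics import mean, pstdev


def z_score_normalize(scores):
    """Map each raw score to (score - mean) / standard deviation within one list."""
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        # Degenerate list: all scores equal, so no document stands out.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - mu) / sigma for doc, s in scores.items()}


def comb_sum(local_lists):
    """Sum the normalized weights of each document over all local lists."""
    merged = {}
    for scores in local_lists:
        for doc, weight in z_score_normalize(scores).items():
            merged[doc] = merged.get(doc, 0.0) + weight
    # Global list: documents in decreasing order of aggregated score.
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


# Hypothetical local scores on different scales.
keyword_level = {"d1": 3.0, "d2": 1.0}
meaning_level = {"d1": 5.0, "d2": 10.0, "d3": 0.0}
ranking = comb_sum([keyword_level, meaning_level])
```
      </preformat>
      <p>With these toy scores the global order is d1, d2, d3: the normalization keeps the larger raw scores of the meaning level from dominating the keyword level.</p>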
    </sec>
    <sec id="sec-5">
      <title>System Setup</title>
      <p>We adopted the SENSE model to build our IR system for the CLEF evaluation. We used two different
levels: the keyword level, using word stems, and the word meaning level, using WordNet synsets. All the
SENSE components involved in the experiments are implemented in Java using the latest available
version of the Lucene API (2.3.2). Experiments were run on an Intel Core 2 Quad processor at 2.4
GHz, operating in 32-bit mode, running Linux (Ubuntu 7.10), with 2 GB of main memory.</p>
      <p>According to the CLEF guidelines, we performed two different tracks of experiments: Ad-Hoc
Mono-language and Cross-language. Each track required two different evaluations: with and
without synsets. We exploited several combinations of levels and query expansion methods,
especially for the meaning level. All query expansion methods are automatic and do not require
manual operations. Moreover, we used different boosting factors for each field contained in the
topic. In this way we give more importance to the terms in the fields TITLE and DESCRIPTION.</p>
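      <p>Field boosting of this kind can be sketched in Python by building a query string in Lucene's classic query syntax, where a boost is written as term^factor and clauses are joined with OR. The helper name, the stemmed terms and the dictionary layout are hypothetical; the boost values 4, 2 and 1 are those reported in this section for the runs that use all three topic fields.</p>
      <preformat>
```python
# Boost per topic field (values reported for the runs using all three fields).
FIELD_BOOSTS = {"TITLE": 4, "DESCRIPTION": 2, "NARRATIVE": 1}


def build_boosted_query(topic_terms, boosts=FIELD_BOOSTS):
    """Join the stemmed terms of every topic field with OR, boosting each by its field."""
    clauses = []
    for field, terms in topic_terms.items():
        boost = boosts[field]
        for term in terms:
            # Lucene's classic query syntax writes a boost as term^factor.
            clauses.append(f"{term}^{boost}")
    return " OR ".join(clauses)


# Hypothetical stemmed topic, used only for illustration.
topic = {"TITLE": ["wrap", "artist"], "DESCRIPTION": ["christo", "work"]}
query = build_boosted_query(topic)
print(query)  # wrap^4 OR artist^4 OR christo^2 OR work^2
```
      </preformat>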
      <p>In particular, for the Ad-Hoc Mono-language track we performed the following runs:</p>
      <list list-type="order">
        <list-item><p>MONO1TDnus2f: the query is built using word stems in the fields TITLE and
DESCRIPTION of the topics. All query terms are joined adopting the OR boolean operator. The
terms in the TITLE field are boosted using a factor of 2;</p></list-item>
        <list-item><p>MONO11nus2f: similar to the previous run, but in this case we add the NARRATIVE
field and adopt different term boosting values: 4 for TITLE, 2 for DESCRIPTION and 1
for NARRATIVE. These boost factors are used for all the following runs;</p></list-item>
        <list-item><p>MONO12nus2f: in this run we adopt the Lucene Phrase Query in addition to the
query expansion described in MONO11nus2f. This kind of query is able to exploit term
proximity in the computation of the relevance score. We build proximity queries using the terms
contained in the TITLE and DESCRIPTION fields. In detail: for TITLE we build a
proximity query using all the terms in the field, while for DESCRIPTION we build a
proximity query for each sentence;</p></list-item>
        <list-item><p>MONO13nus2f: as the previous run, but we adopt a different strategy to build the Phrase
Query. We exploit PoS tags in order to build proximity queries. We produce a proximity
query for each sequence of PoS tags that matches one of the following patterns:
adjective-noun-verb, verb-adjective-noun, verb-noun, noun-verb and adjective-noun. For example, from the
sentence "The wrapping artist Christo took two weeks..." we build a proximity query using
the following terms: "artist Christo took";</p></list-item>
        <list-item><p>MONO14nus2f: this experiment adopts a combination of all the previous methods;</p></list-item>
        <list-item><p>MONOwsd1nus2f: the query is built expanding the synsets in the TITLE and
DESCRIPTION fields of the topics. This run exploits the hypernyms and hyponyms. In particular,
we include only the direct hyponyms and the hypernyms that have a path length less than or equal to
two. For synsets we adopt a different boost factor, taking into account both the field and
the synset distance;</p></list-item>
        <list-item><p>MONOwsd11nus2f: in this run each word is expanded using the whole set of its synsets
in WordNet, and we compute a boosting factor using the Zipf distribution, which properly approximates
the natural distribution of meanings. The Zipf formula is:</p>
          <disp-formula><tex-math>f(k, N, s) = \frac{1/k^{s}}{\sum_{n=1}^{N} 1/n^{s}}</tex-math></disp-formula>
          <list list-type="bullet">
            <list-item><p>k is the synset rank. The synsets in WordNet are ranked according to their frequency
in a reference corpus;</p></list-item>
            <list-item><p>s is the value of the exponent characterizing the distribution: after tuning experiments we
set s equal to 2.</p></list-item>
          </list>
        </list-item>
        <list-item><p>MONOwsd12nus2f: in this experiment we exploit the N-levels architecture of SENSE.
For the keyword level we adopt the query expansion described in MONO14nus2f and for the word
meaning level the method described in MONOwsd1nus2f;</p></list-item>
        <list-item><p>MONOwsd13nus2f: as the previous run, but for the word meaning level we adopt the
method described in MONOwsd11nus2f.</p></list-item>
      </list>
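      <p>The Zipf-based boosting factor of MONOwsd11nus2f can be illustrated with a short Python sketch. The text does not define N, so the sketch assumes that N is the number of synsets of the expanded word; the function name is hypothetical, and s defaults to 2 as reported above.</p>
      <preformat>
```python
def zipf_boost(rank, n_senses, s=2.0):
    """Zipf probability of the sense at the given rank (1 = most frequent sense)."""
    normalizer = sum(1.0 / n ** s for n in range(1, n_senses + 1))
    return (1.0 / rank ** s) / normalizer


# Boosts for a hypothetical word with three synsets, ranked by frequency.
boosts = [zipf_boost(k, n_senses=3) for k in (1, 2, 3)]
```
      </preformat>
      <p>For three synsets the boosts are 36/49, 9/49 and 4/49 (about 0.73, 0.18 and 0.08): they sum to 1 and decrease quickly with the rank, so the most frequent sense dominates the expanded query.</p>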
      <p>For the Ad-Hoc Cross-language track we performed the following runs:</p>
      <list list-type="order">
        <list-item><p>CROSS1TDnus2f: the query is built using word stems in the TITLE and DESCRIPTION
fields of the topics. In the Cross-language track the topics are in Spanish, thus a translation
of terms into English is required. The SENSE system was not developed for Cross-language
retrieval, and in this run we adopted a very simple method to translate the
query into English, exploiting WordNet as a dictionary. In detail, we query the
Spanish WordNet using the Spanish word w<sub>s</sub> and retrieve the whole set of synsets S related
to w<sub>s</sub>; then we use the set S to query the English WordNet and retrieve, for each synset
in S, the set of English synonyms W<sub>e</sub>. Finally, we build the query using the words in
W<sub>e</sub>. The boost factors have the same values used in the Mono-language track;</p></list-item>
        <list-item><p>CROSS1nus2f: as the previous run, adding the NARRATIVE field;</p></list-item>
        <list-item><p>CROSSwsd1nus2f: in this case we adopt the same method presented in MONOwsd1nus2f,
but we directly use the synsets in the Spanish topic. It is important to notice that terms in a
Spanish query are disambiguated using the first sense in the Spanish WordNet;</p></list-item>
        <list-item><p>CROSSwsd11nus2f: in this run we exploit the N-levels architecture. For the keyword
level we adopt the method described in CROSS1nus2f and for the word meaning level the method
proposed in MONOwsd1nus2f;</p></list-item>
        <list-item><p>CROSSwsd12nus2f: this run differs from CROSSwsd11nus2f in the use of a different
Spanish-English translation method. We directly use the Spanish WordNet synset instead of
the Spanish word. We query the English WordNet using the synsets in the topic and retrieve,
for each synset, the set of synonymous English words.</p></list-item>
      </list>
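      <p>The WordNet-based translation used in CROSS1TDnus2f can be sketched in Python with two plain dictionaries standing in for the Spanish and English WordNets. The dictionaries, the synset identifiers and the function name are hypothetical; a real system would query the two lexical databases through a shared synset inventory.</p>
      <preformat>
```python
def translate_word(spanish_word, es_word_to_synsets, en_synset_to_words):
    """Collect the English synonyms of every synset related to a Spanish word."""
    english = []
    for synset in es_word_to_synsets.get(spanish_word, []):
        for word in en_synset_to_words.get(synset, []):
            if word not in english:  # keep the first occurrence only
                english.append(word)
    return english


# Toy stand-ins for the Spanish and English WordNets (hypothetical synset ids).
es_wordnet = {"banco": ["syn-bank-institution", "syn-bench-seat"]}
en_wordnet = {
    "syn-bank-institution": ["bank", "depository"],
    "syn-bench-seat": ["bench"],
}
query_terms = translate_word("banco", es_wordnet, en_wordnet)
print(query_terms)  # ['bank', 'depository', 'bench']
```
      </preformat>
      <p>Because no disambiguation is applied, every sense of the Spanish word contributes its synonyms, so the translated query grows quickly and may contain terms unrelated to the topic.</p>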
      <p>For all the runs we remove the stop words from both the index and the topics. In particular,
we built a different stop word list for topics in order to remove non-informative words such as
find, reports and describe, which occur with high frequency in topics and are poorly discriminating.</p>
    </sec>
    <sec id="sec-6">
      <title>Experimental Session</title>
      <p>The experiments were carried out on the CLEF Ad-Hoc WSD Robust dataset derived from the
English CLEF data, which comprises corpora from the "Los Angeles Times" and the "Glasgow Herald",
amounting to 166,726 documents and 160 topics in English and Spanish. The relevance judgments
were taken from CLEF.</p>
      <p>The goal of our evaluation is to prove that the combination of two levels outperforms a single
level. In particular, the combination of keyword and meaning levels is more effective than the
keyword level alone.</p>
      <p>To measure retrieval performance, we adopted Mean Average Precision (MAP), calculated by
the CLEF organizers using the DIRECT system on the basis of 1,000 retrieved items per request.
Table 1 shows the results for each run with an overview of the exploited features.</p>
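      <sec id="sec-6-1">
        <title>Results</title>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption><p>MAP obtained by each run.</p></caption>
          <table>
            <thead>
              <tr><th>Run</th><th>MAP</th></tr>
            </thead>
            <tbody>
              <tr><td>MONO1TDnus2f</td><td>0.168</td></tr>
              <tr><td>MONO11nus2f</td><td>0.192</td></tr>
              <tr><td>MONO12nus2f</td><td>0.145</td></tr>
              <tr><td>MONO13nus2f</td><td>0.154</td></tr>
              <tr><td>MONO14nus2f</td><td>0.068</td></tr>
              <tr><td>MONOwsd1nus2f</td><td>0.180</td></tr>
              <tr><td>MONOwsd11nus2f</td><td>0.186</td></tr>
              <tr><td>MONOwsd12nus2f</td><td>0.220</td></tr>
              <tr><td>MONOwsd13nus2f</td><td>0.227</td></tr>
              <tr><td>CROSS1TDnus2f</td><td>0.025</td></tr>
              <tr><td>CROSS1nus2f</td><td>0.015</td></tr>
              <tr><td>CROSSwsd1nus2f</td><td>0.071</td></tr>
              <tr><td>CROSSwsd11nus2f</td><td>0.060</td></tr>
              <tr><td>CROSSwsd12nus2f</td><td>0.072</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-6-2">
        <title>Discussion and Conclusions</title>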
        <p>The results confirm our hypothesis: the combination of two levels outperforms a single
level. In particular, the combination of keyword and meaning levels (MONOwsd12nus2f and
MONOwsd13nus2f) is more effective than the keyword level alone (MONO1TDnus2f and
MONO11nus2f). If we consider MONO1TDnus2f as the baseline, we obtain an improvement of 35%
in MAP using the N-levels model (MONOwsd13nus2f). The behavior of the two systems is
shown in Figure 1: the N-levels model outperforms the keyword level at all values of recall.</p>
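        <p>As a check on the reported figure, and assuming that the improvement is measured as the relative gain over the baseline, the MAP values listed for MONO1TDnus2f (0.168) and MONOwsd13nus2f (0.227) give:</p>
        <disp-formula><tex-math>\frac{0.227 - 0.168}{0.168} = \frac{0.059}{0.168} \approx 0.351</tex-math></disp-formula>
        <p>that is, a relative improvement of about 35%.</p>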
        <p>It is interesting to notice that the word meaning level alone is able to outperform
the keyword level. This result has a motivation: we chose to index all the synsets for each word
(not only the synset with the highest confidence factor). This choice makes the retrieval process
easier.</p>
        <p>Regarding the Cross-language track, our system achieves low precision. This was an
expected result because our system is not specifically designed for this kind of task. Moreover,
the method adopted for topic translation is based only on the use of WordNet as a dictionary. In
particular, the performance of the Cross-language runs without WSD (CROSS1TDnus2f and
CROSS1nus2f) is not satisfactory because the system exploits only keywords and the translation
process introduces many wrong terms into the query, producing a noise effect. Conversely, the
word meaning level is able to help the retrieval process, as shown by CROSSwsd1nus2f, where we
used only the word meaning level (without keywords). In the second attempt (CROSSwsd11nus2f)
we combined the keyword level with the word meaning level, obtaining worse results due to the
keyword translation method (as in CROSS1TDnus2f). Finally, in CROSSwsd12nus2f we translated the Spanish
words directly through the synsets, obtaining a better result than in the previous run.</p>
        <p>
          We noticed that our system has low precision with respect to the other participants in the
CLEF competition. This result was expected and is due to the standard relevance function implemented in Lucene.
In particular, Lucene performance decreases when the number of terms in
the query grows. In fact, the experiment MONO14nus2f produces large queries, and the results show
that the system achieves lower precision in this experiment than in the others that rely
exclusively on keywords. This problem also affects the Cross-language experiments because we
translate a Spanish word using all the possible English translations (CROSS1TDnus2f), producing
a query with many terms. Details concerning this well-known behavior of Lucene can be found
in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Nonetheless, the goal of our evaluation was to prove the effectiveness of the N-levels model
and the experiments confirm our hypothesis.
        </p>
        <p>We have described and tested SENSE (SEmantic N-levels Search Engine), a semantic N-levels IR
system which manages documents indexed at multiple separate levels: keywords and meanings.
The system is able to combine keyword search with semantic information provided by the other
indexing level.</p>
        <p>The distinctive feature of the system is that, differently from previous approaches, an
adaptation of the vector space model is proposed to integrate, rather than simply replace, the
lexical space with semantic spaces. We provided a detailed description of the SENSE model,
by defining a local scoring function, a local similarity function for synsets and a global ranking
function in order to merge rankings produced by different levels.</p>
        <p>We performed an intensive evaluation using the CLEF Ad-Hoc Robust WSD dataset. This
dataset supplies both words and synsets for each document and it is the ideal framework to evaluate
the N-levels architecture. The experiments show that the N-levels model is effective when the word
meaning level is involved.</p>
        <p>As future research we plan to improve the performance of the system. We can achieve this
goal by adopting two different strategies: the former involves changing the relevance function
implemented in Lucene; the latter exploits the possibility of replacing the vector space model with a
more effective IR model.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ribeiro-Neto</surname>
          </string-name>
          .
          <source>Modern Information Retrieval</source>
          . Addison-Wesley,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Anna Lisa Gentile, Marco Degemmis, Pasquale Lops, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>Enhancing semantic search using n-levels document representation</article-title>
          . In Stephan Bloehdorn, Marko Grobelnik,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mika</surname>
          </string-name>
          , and Duc Thanh Tran, editors,
          <source>SemSearch</source>
          , volume
          <volume>334</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          , pages
          <fpage>29</fpage>
          –
          <lpage>43</lpage>
          . CEUR-WS.org,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amitay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Carmel</surname>
          </string-name>
          .
          <article-title>Lucene and Juru at TREC 2007: 1-million queries track</article-title>
          . In
          <source>Proceedings of the 16th Text REtrieval Conference (TREC 2007)</source>
          , November
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Farah</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Vanderpooten</surname>
          </string-name>
          .
          <article-title>An outranking approach for rank aggregation in information retrieval</article-title>
          . In Wessel Kraaij, Arjen P. de Vries, Charles L. A.
          <string-name>
            <surname>Clarke</surname>
          </string-name>
          , Norbert Fuhr, and Noriko Kando, editors,
          <source>SIGIR</source>
          , pages
          <fpage>591</fpage>
          –
          <lpage>598</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          .
          <source>The SMART Retrieval System - Experiments in Automatic Document Processing</source>
          . Prentice Hall, Englewood Cliffs, NJ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>