<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Wikification: Mining Structured Queries from Unstructured Information Needs using Wikipedia-based Semantic Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Amir Hossein Jadidinejad</string-name>
          <email>amir@jadidi.info</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fariborz Mahmoudi</string-name>
          <email>mahmoudi@itrc.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Islamic Azad University of Qazvin</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Measurement</institution>
          ,
          <addr-line>Performance, Experimentation</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Combining the language model and the inference network, as implemented in the Indri search engine, is an efficient and well-verified approach. In this retrieval model, the user's information need is expressed in Indri's Structured Query Language (SQL). Although this language allows expert users to represent their information needs richly, its complexity makes it unpopular with ordinary users on the Web. Automatically detecting the concepts in a user's information need and generating a richly structured equivalent query is a good solution. This requires a concept repository and a way of extracting appropriate concepts from the user's information need. We use Wikipedia, a large, multilingual, free-content encyclopedia, as our knowledge base, together with state-of-the-art algorithms for extracting Wikipedia concepts from the user's information need. We call this process “Query Wikification”. Mining the Wikipedia concept repository lets us propose a solution that supports multilingual environments, cross-language retrieval and scalability, and that covers misspellings, variant forms and synonyms of a concept. Experimental results indicate that our automatic structured query construction is an efficient and scalable method with good potential for use on the Web. Our experiments on the TEL corpus in CLEF 2009 achieve a +23% improvement in Mean Average Precision and retrieve over 600 more relevant documents than the Indri baselines. In the Persian track, we evaluated “Perstem”, a stemmer and light morphological analyzer for the Persian language. Our experimental results show that using this stemmer in the indexing and retrieval phases significantly improves both precision (+91%) and recall (+43%).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and Motivation</title>
      <p>Representing the user’s information need is a fundamental part of an information retrieval system.
Most systems accept a list of keywords for each information need. For example, a user interested
in colour therapy and the therapeutic use of colour might formulate the natural language
query “colour therapy”. Not only is it hard for ordinary users to express their information
need as a set of keywords, but a lot of semantics is clearly lost when the information
need is transcribed into keywords. Such a query may retrieve documents about “color” or “therapy”
that are completely irrelevant. The user’s knowledge about the query is also neglected when it is encoded as
a list of keywords. For example, our user may well know that “color” and “colour” are synonymous.</p>
      <p>
        Structured queries can represent the user’s information needs more accurately. A Structured Query
Language (SQL) allows term weighting, the use of proximity information among terms, field
restriction and various ways of combining concepts. Since structured queries can be more expressive
than keywords, retrieval models that can evaluate structured queries [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], such as
Indri [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and InQuery [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], have the potential to retrieve more accurate results.
      </p>
      <p>
        Although structured queries and the related models have achieved very good results in different
experiments [
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        ], they suffer from a drawback that makes them impractical on the Web: constructing a structured
query requires knowledge of the concepts related to the query. Even
if we presume that users know their information need well, learning a complicated
Structured Query Language is unattractive for Web users. Understanding the user’s
information need and automatically generating a richly structured query is a better solution. It requires a large concept
repository that covers all query concepts and a way of extracting appropriate concepts from the
user’s information need. Wikipedia is a multilingual, web-based, free-content encyclopedia that
covers most important concepts in the world. We call the process of extracting a list of Wikipedia
concepts from a natural language information need “Query Wikification”. Wikipedia mining
and state-of-the-art Wikification algorithms [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] are used to generate a rich, efficient
structured query. The contributions of this paper are the following:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>
            We propose a new method for converting a simple natural language information need into a
well-formed, rich, efficient structured query. This is done with the aid of state-of-the-art
algorithms in both “Wikification” [
            <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
            ] and “Structural Retrieval Models” [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. It
could replace keyword-based search on the Web with powerful structured queries
and the related models.
          </p>
        </list-item>
        <list-item>
          <p>
            Usability in multilingual environments and cross-language retrieval. The proposed model
turns Indri [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] into a meta-language search engine that can be applied efficiently in multilingual
environments such as the Web. Our experiments in the CLEF 2009 campaign provide good evidence
for this feature.
          </p>
        </list-item>
        <list-item>
          <p>Scalability. The proposed approach is based on the Indri search engine<sup>1</sup>, a scalable language
modeling search engine that supports structured queries. In addition, newer projects such
as Galago<sup>2</sup>, which supports the Indri Structured Query Language in a distributed computation
framework, make it even more scalable and suitable for the Web.</p>
        </list-item>
        <list-item>
          <p>Our model can extract a vocabulary for each query concept by mining Wikipedia. It
contains misspellings, stemmed equivalents and synonyms of the concept, all of which are embedded
in the structured query. This vocabulary can also work as a semi-stemming algorithm and is
very helpful in multilingual environments or for morphologically complex languages such as Persian
(Sec. 4.2.2). This feature also suits the Web.</p>
        </list-item>
      </list>
      <p><sup>1</sup> http://lemurproject.org/indri/</p>
      <p><sup>2</sup> http://www.galagosearch.org/</p>
      <sec id="sec-1-1">
        <title>Query Wikification</title>
        <p>
          The process of automatically recognizing the topics mentioned in unstructured text and linking
them to the appropriate Wikipedia articles is known as wikification [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. A user’s information
need is a short, informative text, so we can apply wikification to it
in order to map an unstructured query to a weighted list of Wikipedia concepts. We call this
process “Query Wikification”. To our knowledge, there is no prior publication on this
topic.
        </p>
        <p>
          Two wikification methods have been proposed so far. The first is Wikify! [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and the second
is WM-Wikifier [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. WM-Wikifier is distinctive in that it uses Wikipedia articles not only
as a source of information to point to, but also as training data for how best to create links. We
use this algorithm for Query Wikification. More details can be found in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          For example, consider Figure 1, a sample user information need from CLEF 2009.
The result of Query Wikification is shown in Figure 2. As can be seen, the important topics are
extracted and the original query is annotated with Wikipedia concepts. We use the Wikipedia-Miner<sup>3</sup>
toolkit [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] in our experiments.
        </p>
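        <p>The mapping from a query to a weighted list of concepts can be illustrated with a toy sketch. It is not WM-Wikifier, which learns link detection and disambiguation from Wikipedia itself; it swaps in a greedy longest-phrase lookup in a small, hypothetical anchor dictionary. The names ANCHORS and wikify_query, the concept titles and the weights are all assumptions made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical anchor dictionary: surface phrase -> (Wikipedia concept, weight).
# The entries and weights are made up for illustration only.
ANCHORS: Dict[str, Tuple[str, float]] = {
    "colour therapy": ("Chromotherapy", 0.8),
    "colour": ("Color", 0.4),
    "therapy": ("Therapy", 0.4),
}


def wikify_query(text: str, anchors: Dict[str, Tuple[str, float]],
                 max_len: int = 3) -> List[Tuple[str, float]]:
    """Return (concept, weight) pairs found by greedy longest-phrase matching."""
    tokens = text.lower().split()
    found: Dict[str, float] = {}
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest phrase first so "colour therapy" wins over "colour".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in anchors:
                concept, weight = anchors[phrase]
                found[concept] = max(weight, found.get(concept, 0.0))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    # Highest-weighted concepts first.
    return sorted(found.items(), key=lambda item: item[1], reverse=True)
```
]]></preformat>
        <p>With this toy dictionary, wikify_query("colour therapy and the therapeutic use of colour", ANCHORS) ranks Chromotherapy first, followed by Color. A real wikifier would also disambiguate between candidate articles, which this sketch does not attempt.</p>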
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Structured Query Construction</title>
      <p>Once an unstructured user information need has been mapped to a weighted list of Wikipedia concepts,
what can we do with these concepts? They let us move from unstructured, limited and noisy
text to structured, well-known and accurate concepts, which is a breakthrough step in Information
Retrieval. This is exactly what the Wikification algorithms provide.</p>
      <p>
        In our experiments, we use the WM-Wikifier [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] algorithm to extract a weighted
list of Wikipedia concepts, and we mine translations and synonyms of these concepts from the Wikipedia
knowledge base to construct an equivalent structured query. For example, consider Figure 1,
a sample topic from CLEF 2009. In this topic the user is looking for all relevant information
about colour therapy and the therapeutic use of colours. The following is the equivalent Indri [
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        ] structured query after removing redundant words and stop words:
      </p>
      <preformat>#combine(colour therapy therapeutic)</preformat>
      <p><sup>3</sup> http://wikipedia-miner.sourceforge.net/</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Valuable fields of TEL records kept in the preprocessing step.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Title</th><th>Distribution</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>dc:title</td><td>80%</td><td>The record’s title. All records contain this field, and it is a valuable field.</td></tr>
            <tr><td>dcterms:alternative</td><td>little</td><td>In some records, this field contains relevant information.</td></tr>
            <tr><td>dc:subject</td><td>210%</td><td>Manually assigned subject heading.</td></tr>
            <tr><td>dc:abstract</td><td>little</td><td>The record’s abstract.</td></tr>
            <tr><td>dc:description</td><td>42%</td><td>The record’s description. Mostly contains copyright notices and related material.</td></tr>
            <tr><td>dc:contributor</td><td>little</td><td>The record’s contributor.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The following structured query is generated by our approach<sup>4</sup>. It contains some specialist
expressions (“chromotherapy”) and all translations and synonyms of each concept:</p>
      <preformat>#combine(colour therapy therapeutic
  #syn(chromotherapy farbtherapi colourology #1(color therapy))
  #syn(color couleur farb colour colors colours couleur)
  #syn(therapi thrap therapi treatment therapie therapy))</preformat>
      <p>There are various approaches to constructing an equivalent structured query. In the next section, we
describe our experiments.</p>
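      <p>A minimal sketch of how such a query string could be assembled once each concept has a vocabulary. The helper names and the vocab dictionary are hypothetical; only the #combine, #syn and #1 operators come from the Indri query language used above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List


def indri_term(phrase: str) -> str:
    """Render a phrase as an Indri term; multi-word phrases become #1(...) windows."""
    words = phrase.split()
    if not words:
        raise ValueError("empty phrase")
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"


def build_structured_query(keywords: List[str], vocab: Dict[str, List[str]]) -> str:
    """Combine the plain keywords with one #syn(...) group per concept."""
    parts = list(keywords)
    for variants in vocab.values():
        if variants:  # skip concepts without any known variant
            parts.append("#syn(" + " ".join(indri_term(v) for v in variants) + ")")
    return "#combine(" + " ".join(parts) + ")"


# Hypothetical concept vocabularies, loosely following the example query above.
vocab = {
    "Chromotherapy": ["chromotherapy", "farbtherapi", "colourology", "color therapy"],
    "Color": ["color", "couleur", "farb", "colour", "colors", "colours"],
}
query = build_structured_query(["colour", "therapy", "therapeutic"], vocab)
```
]]></preformat>
      <p>Keeping the original keywords outside the #syn groups means the query still matches documents when a concept has no extracted vocabulary.</p>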
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p><bold>4.1 TEL@CLEF2009</bold></p>
      <p><bold>4.1.1 Meta-Language Field Index Construction</bold></p>
      <p>
        TEL is an inherently multilingual corpus. It not only contains records in different languages,
but some records may also have multilingual fields. Detecting a record’s language is a fundamental
step before applying stemming and stop word removal. On the other hand, detecting the different languages
within each record is not only hard but also leads to poor results. Previous experiments
used various language identification approaches to detect each field’s language and then applied
the appropriate stemmer and stop word list [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We use a meta-language index in our experiments:
instead of distinguishing different languages, all fields are indexed without stemming or stop
word removal. In this approach, all valuable content is indexed together without any concern
for the underlying language. Such an indexing strategy is clearly not appropriate in general,
but our experiments show that it works well in tandem with Query
Wikification and the Indri Structured Query Language.
      </p>
      <p>In the preprocessing step, we delete all noisy and uninformative fields from the TEL corpus. After
analyzing TEL’s records, we extracted a list of fields that contain important information. Table 1
shows the fields kept in the preprocessing step. For example, Figure 3 shows a sample record
from the TEL corpus and Figure 4 the equivalent record after preprocessing. As can be seen, we skip all
uninformative fields and store the remaining ones in TREC format. We do not apply any stemming
or stop word removal in the indexing phase. Instead, we apply stop word removal in the retrieval phase,
using a list of stop words provided by UNINE<sup>5</sup>.</p>
      <p>
        We use the Indri [
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        ] field index for indexing because it not only constructs a powerful field
index but also supports indexed fields in its query language. All valuable fields (Table 1) are configured
as a backward field index. Finally, the indexing is done by the “indribuildindex” application in the Lemur
toolkit [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p><sup>4</sup> The generation procedure is discussed in Sec. 4.</p>
      <p><sup>5</sup> http://members.unine.ch/jacques.savoy/clef/englishST.txt</p>
      <p><bold>4.1.2 Indri Baseline</bold></p>
      <p>
        To compare our results, we apply the Indri retrieval model [
        <xref ref-type="bibr" rid="ref12 ref8">12, 8</xref>
        ] to the title and description of each
topic. The query model is as follows:
      </p>
      <preformat>#combine( &lt;title&gt; &lt;description&gt; )</preformat>
      <p>Before topics are passed to the Indri retrieval engine, all common and redundant words are removed. For
example, the query shown in Figure 1 becomes, after removing common and redundant words:</p>
      <preformat>#combine(colour therapy therapeutic)</preformat>
      <p>This run is referred to as “SIM” in our experiments. Table 3 and Figures 5 and 6 compare this
baseline with the proposed approaches.</p>
      <p><bold>4.1.3 Concept Translation</bold></p>
      <p>Wikipedia contains articles in more than 250 natural languages. Each article links to the equivalent
one in other languages. After extracting concepts from the unstructured user information need, we
can use the translation links in Wikipedia to translate each concept. The following
model is applied:</p>
      <preformat>#combine( &lt;title&gt; &lt;description&gt; #syn(#1(EN) #1(FR) #1(GE)) )</preformat>
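      <p>The two query models above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the experiments: the stop list is a tiny stand-in for the UNINE list, “redundant” words are read as repeated terms, and the function names and translated titles are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Iterable, List

# Tiny illustrative stop list; the experiments use the UNINE English list instead.
STOP_WORDS = {"a", "an", "and", "in", "of", "on", "or", "the", "use"}


def content_terms(text: str, stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Lower-case, drop stop words, and keep each remaining term once, in order."""
    stop = set(stop_words)
    seen = set()
    terms = []
    for token in text.lower().split():
        token = token.strip(".,;:!?\"'()")
        if token and token not in stop and token not in seen:
            seen.add(token)
            terms.append(token)
    return terms


def sim_query(title: str, description: str) -> str:
    """Baseline model: #combine over the topic title and description."""
    return "#combine(" + " ".join(content_terms(title + " " + description)) + ")"


def simtr_query(title: str, description: str,
                translations: List[Dict[str, str]]) -> str:
    """Translation model: add one #syn group of exact-phrase titles per concept."""
    parts = content_terms(title + " " + description)
    for titles in translations:
        windows = " ".join("#1(" + t.lower() + ")" for t in titles.values())
        parts.append("#syn(" + windows + ")")
    return "#combine(" + " ".join(parts) + ")"


# Hypothetical topic text and language-link titles for one extracted concept.
concepts = [{"EN": "Chromotherapy", "FR": "Chromothérapie", "GE": "Farbtherapie"}]
baseline = sim_query("Colour therapy", "The therapeutic use of colour.")
translated = simtr_query("Colour therapy", "The therapeutic use of colour.", concepts)
```
]]></preformat>
      <p>Each translated title is wrapped in #1(...) so that multi-word titles are matched as exact phrases, mirroring the #1(EN) #1(FR) #1(GE) pattern of the model.</p>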
      <sec id="sec-3-1">
        <title>For example, for the previous sample query:</title>
        <p>This run is referred to as “SIMTR” in our experiments. Table 3 and Figures 5 and 6 compare this
run with the other approaches and the baseline. Table 2 compares the proposed
approaches and the baseline for the previous example (“colour therapy”). Evaluation results show
that translating concepts using Wikipedia significantly improves both precision (+18%) and recall
(+8%). For the example query (Table 2), Mean Average Precision improves by 62% and
1 more relevant document (+4%) is retrieved.</p>
        <p><bold>4.1.4 Concept Translation and Synonyms Extraction</bold></p>
        <p>Most retrieval systems are simple pattern matchers, so co-occurring terms play an important role
in the ranking algorithm. We are therefore eager to know as many synonyms and related concepts as possible for
each concept. Given an article in Wikipedia, we can mine all other articles to find a list of
synonyms for it. There are two distinct sources: redirect pages<sup>6</sup> and anchors. We prefer
anchor texts since we can rank the vocabulary for each concept, while ranking is not possible for
redirect pages<sup>7</sup>. We assume that all anchor texts that link to the same article are synonyms. This
assumption yields the following structured query:</p>
        <preformat>#combine( &lt;title&gt; &lt;description&gt;
  #syn(#1(EN) #1(FR) #1(GE) &lt;Anchors List&gt;))</preformat>
        <p>For example, the previous sample query becomes:</p>
        <preformat>#combine(colour therapy therapeutic
  #syn(chromotherapy farbtherapi colourology #1(color therapy))
  #syn(color couleur farb colour colors colours couleur)
  #syn(therapi thrap therapi treatment therapie therapy))</preformat>
        <p><sup>6</sup> Redirects are standalone pages in Wikipedia that have only a title and refer to an article. They cover various
equivalents, misspellings, and so on.</p>
        <p><sup>7</sup> Redirect pages could be ranked using Wikipedia query logs.</p>
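        <p>One way to rank the anchor vocabulary of a concept is by how often each anchor text links to the article. The text does not specify the ranking criterion, so frequency is an assumption here, and the function name and the sample link list are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_anchor_vocabulary(links: Iterable[Tuple[str, str]], article: str,
                           top_k: int = 5) -> List[str]:
    """Return the top_k anchor texts that link to `article`, most frequent first.

    `links` is an iterable of (anchor_text, target_article) pairs.
    """
    counts = Counter(
        anchor.strip().lower()
        for anchor, target in links
        if target == article and anchor.strip()
    )
    return [anchor for anchor, _count in counts.most_common(top_k)]


# Hypothetical (anchor text, target article) pairs harvested from other articles.
links = [
    ("color", "Color"), ("Color", "Color"), ("color", "Color"),
    ("colour", "Color"), ("colour", "Color"),
    ("colors", "Color"),
    ("therapy", "Therapy"),
]
vocabulary = rank_anchor_vocabulary(links, "Color", top_k=3)
```
]]></preformat>
        <p>The ranked list can then fill the Anchors List slot of the #syn group, and top_k limits how many low-frequency variants are added to the query.</p>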
        <p>This run is referred to as “SIMEXT” in our experiments. Table 3 and Figures 5 and 6 compare
this run with the other approaches and the baseline. Table 2 compares the
proposed approaches and the baseline for the previous example (“colour therapy”). Evaluation results
show that translating concepts in tandem with extracting synonyms and other equivalents from
Wikipedia significantly improves both precision (+22%) and recall (+13%). For the example query
(Table 2), Mean Average Precision improves by 66% and 2 more relevant documents (+8%)
are retrieved. Our experimental results on the TEL corpus also show that SIMEXT is a better solution
than SIMTR in both precision and recall.</p>
        <p><bold>4.2 Persian@CLEF</bold></p>
        <p><bold>4.2.1 Bilingual</bold></p>
        <p>
          Persian is an Indo-European language spoken in Iran, Afghanistan and Tajikistan. It is also known
as Farsi [
          <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
          ]. In this section we summarize our experiments in the Persian track of CLEF 2009.
Bilingual retrieval in the Persian track follows the same approach as discussed in Sec. 4.1.
Unlike the TEL experiments, the results are very poor, owing to the limited coverage of the Farsi edition of
Wikipedia<sup>8</sup>. For example, most topics are extracted from the query, but since there is no
equivalent article in the Farsi Wikipedia, we cannot translate them. Table 4 shows our different runs.
        </p>
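        <table-wrap id="tab4">
          <label>Table 4</label>
          <table>
            <thead>
              <tr><th>RUN</th><th>Relevant-Retrieved</th><th>MAP</th><th>NDCG</th><th>R-PREC</th></tr>
            </thead>
            <tbody>
              <tr><td>IAUPEREN1</td><td>650/4330</td><td>0.0195</td><td>0.0975</td><td>0.0433</td></tr>
              <tr><td>IAUPEREN2</td><td>659/4330</td><td>0.0202</td><td>0.0984</td><td>0.0427</td></tr>
              <tr><td>IAUPEREN3</td><td>773/4330</td><td>0.0277</td><td>0.1223</td><td>0.0477</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>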
      <sec id="sec-3-6">
        <title>Desc</title>
        <p>
          Perstem<sup>9</sup> is a stemmer and light morphological analyzer for Persian by Jon Dehdari<sup>10</sup>. It is
written in Perl and uses regular expression substitutions to separate inflectional morphemes and
remove affixes. The stemmer currently has 76 substitution rules, each of which replaces one pattern of text
with another [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It offers very good performance and accuracy for stemming and morphological
analysis of Persian text. On a sample dataset, Perstem correctly and efficiently analyzed 97%
of the words [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
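        <p>Rule-based suffix stripping of this kind can be sketched in a few lines. The rules below are hypothetical examples written for this illustration, not Perstem's 76 rules, and the sketch is in Python rather than Perl. It also ignores details such as the zero-width non-joiner that often precedes Persian suffixes.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import List, Tuple

# Hypothetical (pattern, replacement) suffix rules; these are NOT Perstem's rules.
# Longer suffixes come first so that they are tried before their shorter prefixes.
RULES: List[Tuple[str, str]] = [
    (r"ترین$", ""),  # superlative suffix
    (r"تر$", ""),    # comparative suffix
    (r"های$", ""),   # plural suffix followed by a linking vowel
    (r"ها$", ""),    # plural suffix
]
COMPILED = [(re.compile(pattern), replacement) for pattern, replacement in RULES]


def stem(word: str, min_stem_len: int = 2) -> str:
    """Apply the first rule that matches and leaves at least min_stem_len letters."""
    for pattern, replacement in COMPILED:
        candidate = pattern.sub(replacement, word)
        if candidate != word and len(candidate) >= min_stem_len:
            return candidate
    return word  # no rule applied
```
]]></preformat>
        <p>For instance, stem("کتابها") strips the plural suffix and returns "کتاب", while a word that matches no rule is returned unchanged. The min_stem_len guard is a design choice of this sketch to avoid reducing short words to nothing.</p>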
        <p>
          Inconsistent stemming results were reported in CLEF 2008 [
          <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
          ], so we decided to
evaluate Perstem in our CLEF 2009 experiments. Unlike [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], our evaluation is based on overall retrieval performance
(precision/recall) on the Hamshahri corpus with the CLEF 2009 benchmark queries. In other
words, we investigate the application of Perstem to Persian retrieval over a large news corpus.
Table 5 shows our official runs<sup>11</sup>. Experimental results show that the stemming algorithm significantly
improves both precision (+91%) and recall (+43%).
        </p>
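        <p>For readers unfamiliar with the measures, the following sketch shows the standard definition of (mean) average precision and a relative-change helper. It assumes that the reported percentages are relative changes over the baseline run, which the text does not state explicitly; the function names and the toy numbers in the usage note are illustrative only.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of the precision values at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """Mean of the per-query average precision over all judged queries."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)


def relative_change(baseline: float, system: float) -> float:
    """Relative change of `system` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (system - baseline) / baseline
```
]]></preformat>
        <p>As a toy example, a ranking d1, d2, d3, d4 with relevant documents d1 and d3 has an average precision of (1/1 + 2/3) / 2, and a MAP that rises from 0.20 to 0.25 is a relative change of about +25%.</p>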
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Works</title>
      <p>In this paper we propose an efficient approach for extracting relevant concepts together with a vocabulary of
synonyms, translations and other variants, all of which are embedded in a structured
query. We use Wikipedia as our knowledge base and Indri as the structured query language and
retrieval model. Query modification techniques such as query expansion suffer from a problem known as
“query drift”: modifying a query may retrieve more relevant documents
but can hurt precision. Our experiments on the TEL corpus show that our method is an
efficient and robust approach that significantly improves both precision and recall. We believe that
it has good potential for use on the Web. For example, consider the following
query<sup>12</sup>:</p>
      <preformat>Title: Modern Persian Language
Desc: Retrieve publications providing instructions on
learning or teaching modern/contemporary Persian.</preformat>
      <p>The structured query generated by SIMEXT is:</p>
      <preformat>#weight(0.3 #combine(modern teaching instructions persian contemporary learning language)
  0.7 #syn(farsi #1(persian languages) #1(farsi salis language) #1(modern perisan) persian
    #1(modern persian language) #1(parsi language) #1(farsi language) #1(modern persian)
    #1(persian language) #1(persische sprache) ))</preformat>
      <p>“Farsi” and “Parsi” are informal equivalents of “Modern Persian Language” that cannot be
inferred from the original query. Such informal equivalents are very important
evidence on the Web. As another example, consider the structured query for Figure 1:</p>
      <p>As can be seen, without applying a complicated stemmer in our multilingual environment (the TEL
corpus), the vocabulary extracted from anchor texts covers most word variants efficiently. For example,
in the structured query, “color” and “colour” are synonyms. This is a great advantage in highly
multilingual environments such as the Web. A per-query evaluation comparison is shown in
Table 6.</p>
      <p><sup>9</sup> http://sourceforge.net/projects/perstem/</p>
      <p><sup>10</sup> http://www.ling.ohio-state.edu/~jonsafari/</p>
      <p><sup>11</sup> DOI: 10.2415/AH-PERSIAN-MONO-FA-CLEF2009.QAZVINIAU.IAUPERFA&lt;X&gt;</p>
      <p><sup>12</sup> 10.2452/733AH</p>
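      <p>The weighted form shown in the SIMEXT example can be assembled from sub-queries with a small helper. This is a sketch only: the function name is hypothetical, and the 0.3 and 0.7 weights are simply those of the example above, not a recommendation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Tuple


def weight_query(parts: List[Tuple[float, str]]) -> str:
    """Join (weight, subquery) pairs into a single Indri #weight(...) expression."""
    if not parts:
        raise ValueError("at least one (weight, subquery) pair is required")
    if any(weight < 0 for weight, _subquery in parts):
        raise ValueError("weights must be non-negative")
    body = " ".join(f"{weight:g} {subquery}" for weight, subquery in parts)
    return "#weight(" + body + ")"


# Shortened, illustrative version of the query discussed above.
query = weight_query([
    (0.3, "#combine(modern persian language)"),
    (0.7, "#syn(farsi #1(persian language) #1(parsi language))"),
])
```
]]></preformat>
      <p>Giving the #syn group the larger weight lets the mined vocabulary dominate the ranking while the original keywords still contribute.</p>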
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We would like to thank Donald Metzler<sup>13</sup>, one of the main developers of the Indri Structured Query
Language, for his ideas and advice, and the Lemur community<sup>14</sup> for supporting and sharing an
excellent resource. We would also like to thank Jon Dehdari for sharing Perstem, and DBRG<sup>15</sup> for the
Hamshahri corpus. Finally, we must of course acknowledge the tireless efforts of the Wikipedia
community, which has built a valuable knowledge base over the years. We are also indebted to the CLEF
organizers.</p>
      <p><sup>13</sup> http://research.yahoo.com/Don_Metzler</p>
      <p><sup>14</sup> http://sourceforge.net/forum/?group_id=161383</p>
      <p><sup>15</sup> http://ece.ut.ac.ir/DBRG/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>Eneko</given-names> <surname>Agirre</surname></string-name>,
          <string-name><given-names>Giorgio M.</given-names> <surname>Di Nunzio</surname></string-name>,
          <string-name><given-names>Nicola</given-names> <surname>Ferro</surname></string-name>,
          <string-name><given-names>Thomas</given-names> <surname>Mandl</surname></string-name>, and
          <string-name><given-names>Carol</given-names> <surname>Peters</surname></string-name>.
          <article-title>CLEF 2008: Ad hoc track overview</article-title>.
          In <source>Proceedings of the CLEF 2008: Workshop on Cross-Language Information Retrieval and Evaluation</source>, Aarhus, Denmark,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>AleAhmad</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Kamalloo</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Zareh</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Rahgozar</surname></string-name>, and
          <string-name><given-names>F.</given-names> <surname>Oroumchian</surname></string-name>.
          <article-title>Cross language experiments at Persian@CLEF 2008</article-title>.
          In <source>Proceedings of the CLEF 2008: Workshop on Cross-Language Information Retrieval and Evaluation</source>, Aarhus, Denmark,
          <year>2008</year>. CLEF 2008 Organizing Committee.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>James P.</given-names> <surname>Callan</surname></string-name>,
          <string-name><given-names>W. Bruce</given-names> <surname>Croft</surname></string-name>, and
          <string-name><given-names>John</given-names> <surname>Broglio</surname></string-name>.
          <article-title>TREC and TIPSTER experiments with INQUERY</article-title>.
          <source>Inf. Process. Manage.</source>,
          <volume>31</volume>(<issue>3</issue>):<fpage>327</fpage>-<lpage>343</lpage>,
          <year>1995</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>Jon</given-names> <surname>Dehdari</surname></string-name> and
          <string-name><given-names>Deryle</given-names> <surname>Lonsdale</surname></string-name>.
          <article-title>A link grammar parser for Persian</article-title>.
          In Simin Karimi, Vida Samiian, and Don Stilo, editors,
          <source>Aspects of Iranian Linguistics</source>, volume
          <volume>1</volume>. Cambridge Scholars Press,
          <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>Amir Hossein</given-names> <surname>Jadidinejad</surname></string-name> and
          <string-name><given-names>Fariborz</given-names> <surname>Mahmoudi</surname></string-name>.
          <article-title>QIAU at CLEF2009: Persian track</article-title>.
          In <source>Proceedings of the CLEF 2009: Workshop on Cross-Language Information Retrieval and Evaluation</source>, Corfu, Greece,
          <year>September 2009</year>. CLEF 2009 Organizing Committee.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>Simin</given-names> <surname>Karimi</surname></string-name>.
          <article-title>Persian or Farsi?</article-title> 20Farsi.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>R.</given-names> <surname>Karimpour</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ghorbani</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Pishdad</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Mohtarami</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>AleAhmad</surname></string-name>, and
          <string-name><given-names>A.</given-names> <surname>Amiri</surname></string-name>.
          <article-title>Using part of speech tagging in Persian information retrieval</article-title>.
          In <source>Proceedings of the CLEF 2008: Workshop on Cross-Language Information Retrieval and Evaluation</source>, Aarhus, Denmark,
          <year>2008</year>. CLEF 2008 Organizing Committee.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Donald</given-names> <surname>Metzler</surname></string-name> and
          <string-name><given-names>W. Bruce</given-names> <surname>Croft</surname></string-name>.
          <article-title>Combining the language model and inference network approaches to retrieval</article-title>.
          <source>Inf. Process. Manage.</source>,
          <volume>40</volume>(<issue>5</issue>):<fpage>735</fpage>-<lpage>750</lpage>,
          <year>2004</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Rada</given-names> <surname>Mihalcea</surname></string-name> and
          <string-name><given-names>Andras</given-names> <surname>Csomai</surname></string-name>.
          <article-title>Wikify!: linking documents to encyclopedic knowledge</article-title>.
          In <source>CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management</source>, pages
          <fpage>233</fpage>-<lpage>242</lpage>, New York, NY, USA,
          <year>2007</year>. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>David</given-names> <surname>Milne</surname></string-name> and
          <string-name><given-names>Ian H.</given-names> <surname>Witten</surname></string-name>.
          <article-title>Learning to link with Wikipedia</article-title>.
          In <source>CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management</source>, pages
          <fpage>509</fpage>-<lpage>518</lpage>, New York, NY, USA,
          <year>2008</year>. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Paul</given-names> <surname>Ogilvie</surname></string-name> and
          <string-name><given-names>Jamie</given-names> <surname>Callan</surname></string-name>.
          <article-title>Experiments using the Lemur toolkit</article-title>.
          In <source>Proceedings of the Tenth Text Retrieval Conference (TREC-10)</source>, pages
          <fpage>103</fpage>-<lpage>108</lpage>,
          <year>2002</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>Trevor</given-names> <surname>Strohman</surname></string-name>,
          <string-name><given-names>Donald</given-names> <surname>Metzler</surname></string-name>,
          <string-name><given-names>Howard</given-names> <surname>Turtle</surname></string-name>, and
          <string-name><given-names>W. Bruce</given-names> <surname>Croft</surname></string-name>.
          <article-title>Indri: A language-model based search engine for complex queries (extended version)</article-title>.
          <source>IR 407</source>, University of Massachusetts,
          <year>2005</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>Masoud</given-names> <surname>Tashakori</surname></string-name>,
          <string-name><given-names>Mohammad Reza</given-names> <surname>Meybodi</surname></string-name>, and
          <string-name><given-names>Farhad</given-names> <surname>Oroumchian</surname></string-name>.
          <article-title>Bon: The Persian stemmer</article-title>.
          In <source>EurAsia-ICT</source>, pages
          <fpage>487</fpage>-<lpage>494</lpage>,
          <year>2002</year>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>Ian H.</given-names> <surname>Witten</surname></string-name> and
          <string-name><given-names>David</given-names> <surname>Milne</surname></string-name>.
          <article-title>An open-source toolkit for mining Wikipedia</article-title>. In (to appear),
          <year>2009</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>