=Paper=
{{Paper
|id=Vol-1172/CLEF2006wn-QACLEF-AdafreEt2006
|storemode=property
|title=The University of Amsterdam at WiQA 2006
|pdfUrl=https://ceur-ws.org/Vol-1172/CLEF2006wn-QACLEF-AdafreEt2006.pdf
|volume=Vol-1172
|dblpUrl=https://dblp.org/rec/conf/clef/AdafreJR06a
}}
==The University of Amsterdam at WiQA 2006==
The University of Amsterdam at WiQA 2006

Sisay Fissaha Adafre, Valentin Jijkoun, Maarten de Rijke
ISLA, University of Amsterdam
sfissaha,jijkoun,mdr@science.uva.nl

Abstract

This paper describes our participation in the WiQA 2006 pilot on question answering using Wikipedia. We present an analysis of the results of our system for both monolingual and bilingual runs. The system currently works for Dutch and English.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

General Terms

Measurement, Performance, Experimentation

Keywords

Question answering, Questions beyond factoids, Importance ranking, Novelty detection, Duplicate removal

1 Introduction

The WiQA 2006 pilot deals with access to the content of Wikipedia, the free online encyclopedia. At WiQA, information access is considered both from a reader's point of view and from an author's point of view—WiQA derives much of its motivation from the observation that, in the Wikipedia context, the distinction between reader and author has become blurred; see [3] for an overview of the task. The overlap in the roles of the different types of users of Wikipedia motivates new modes of information access, ones that can support the emerging dual roles identified above. WiQA 2006 attempts to address this issue by formulating the task as follows: given a topic (and associated article) in one language, identify relevant snippets on the topic from other articles in the same language or even in other languages. In the context of Wikipedia, having a system that offers this type of functionality is important both for using Wikipedia as a reader and as an author: it provides effective access to additional relevant information in Wikipedia that is not found in the main article on the topic, which, we believe, matters for both use cases.

In this report, we describe our participation in the WiQA 2006 pilot. We took part in both the monolingual and bilingual tasks. Section 2 presents an overview of the system. Section 3 describes the runs that we submitted. Section 4 provides an analysis of the results. Finally, Section 5 presents some concluding remarks.

2 System Description

We briefly describe our mono- and bilingual systems for the WiQA 2006 pilot; for a far more detailed description of our monolingual system, please consult [2].

2.1 Monolingual Task

The main components of the monolingual system address the following subtasks: identifying relevant snippets, estimating sentence importance, and removing redundant snippets. We discuss each of these in turn.

2.1.1 Identifying relevant snippets

We apply two approaches to identifying relevant sentences. The first exploits a peculiar characteristic of Wikipedia, namely its dense link structure, to identify relevant snippets. The second makes use of standard document retrieval techniques. We briefly review the two approaches below.

Link based retrieval. The link structure, in particular the structure of the incoming links (similar to Wikipedia's 'What links here' feature), provides a simple mechanism for identifying relevant articles. If an article contains a citation of the topic, it is likely to contain relevant bits of information about the topic, since the hyperlinks in Wikipedia are part of the content of the article.
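As an illustration of this link based candidate selection, the following is a minimal sketch that inverts the outgoing wiki links of a toy article collection to obtain incoming links; the article store, the link pattern, and the function names are illustrative only and are not part of the actual system.

```python
import re
from collections import defaultdict

# Toy in-memory article store (title -> wiki markup); purely illustrative.
ARTICLES = {
    "Photoelectric effect": "The effect was explained by [[Albert Einstein]] in 1905.",
    "Albert Einstein": "He received the [[Nobel Prize in Physics|Nobel Prize]] in 1921.",
}

# Matches [[Target]] and [[Target|anchor text]] wiki links.
WIKI_LINK = re.compile(r"\[\[([^|\]]+)(?:\|[^\]]*)?\]\]")

def build_incoming_links(articles):
    """Invert outgoing links, mimicking Wikipedia's 'What links here' feature."""
    incoming = defaultdict(set)
    for title, text in articles.items():
        for target in WIKI_LINK.findall(text):
            incoming[target.strip()].add(title)
    return incoming

def candidate_articles(topic, incoming):
    """Articles that cite the topic are taken as candidates for relevant snippets."""
    return sorted(incoming.get(topic, set()))

if __name__ == "__main__":
    incoming = build_incoming_links(ARTICLES)
    print(candidate_articles("Albert Einstein", incoming))  # ['Photoelectric effect']
```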
Since hyperlinks are created manually by humans, this approach tends to produce little noise. However, due to inconsistencies in manual processing and also editing requirements, not all mentions of a topic may be hyperlinked, which may cause some recall problems. Furthermore, coreferences may also need to be resolved in order to improve recall. We devised a simple coreference resolution method for a particular class of coreferences, i.e., last name resolution for a person: the method checks whether a person name appears at least once in the article as a hyperlink; the person's name is then tokenized into words and the last word is taken to be the person's last name; finally, occurrences of the last name are replaced by the full name. This omits a large class of coreferences, such as pronouns and definite descriptions. However, the use of pronouns in Wikipedia is relatively low. Furthermore, current coreference resolution techniques mostly rely on deep linguistic analysis, which is hard to scale up to corpora of the size of Wikipedia.

Lucene based retrieval. We indexed the Wikipedia articles using the Lucene retrieval engine [4], taking each article to constitute a single document in the index. The resulting index is used to retrieve articles that contain information about a topic. We used the title of the topic, which corresponds to the title of a Wikipedia article, as a query to retrieve articles. Since the aim is to identify smaller snippets about the topic, we split articles into sentences and keep only those sentences that contain an occurrence of the topic at hand. For non-person topics, we require the snippets to contain all the tokens of the topic; for person topics, we also consider snippets that contain only one of the tokens in the name. Unlike the link based retrieval approach outlined above, which requires a strict hyperlink relation, this method only checks whether a snippet contains the title of the topic, which does not necessarily imply a hyperlink relation. As a result, the method tends to be recall oriented, identifying more relevant snippets. However, the relaxed criterion also means that the method is likely to pick up more noise.
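The sentence selection and last name resolution described above can be summarised in a small sketch; the sentence splitter, the person test, and the hyperlink check are deliberately naive stand-ins (the actual system works on a Lucene index and on the Wikipedia link structure).

```python
import re

def split_sentences(text):
    """Very rough sentence splitter; a placeholder for a proper one."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def resolve_last_names(text, full_name):
    """Last name resolution (Section 2.1.1): if the full person name occurs in the
    article, replace bare occurrences of the last name by the full name."""
    last = full_name.split()[-1]
    if full_name in text:  # stand-in for "appears at least once as a hyperlink"
        placeholder = "\x00"
        text = text.replace(full_name, placeholder)
        text = re.sub(r"\b" + re.escape(last) + r"\b", full_name, text)
        text = text.replace(placeholder, full_name)
    return text

def topic_snippets(article_text, topic, is_person):
    """Keep sentences mentioning the topic: all topic tokens for non-person topics,
    any single token for person topics."""
    tokens = topic.lower().split()
    selected = []
    for sentence in split_sentences(article_text):
        lowered = sentence.lower()
        match = (any(t in lowered for t in tokens) if is_person
                 else all(t in lowered for t in tokens))
        if match:
            selected.append(sentence)
    return selected

if __name__ == "__main__":
    article = ("Albert Einstein received the Nobel Prize in 1921. "
               "Einstein later moved to Princeton.")
    article = resolve_last_names(article, "Albert Einstein")
    print(topic_snippets(article, "Albert Einstein", is_person=True))
```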
2.1.2 Estimating Sentence Importance

Once we have an initial set of relevant sentences, the next phase is to rank the sentences based on their importance to the topic. For this, we combine several types of evidence: retrieval scores, position in the article, and graph-based ranking scores.

Retrieval scores. The topic is used as a query to retrieve the initial set of relevant articles. Articles containing multiple occurrences of the topic are ranked higher. The assumption is that such higher ranking articles are likely to contain a larger number of relevant sentences, which in turn increases the likelihood of finding important and novel sentences. This provides a relatively weak source of evidence for the importance of a sentence in an article. We convert the retrieval scores to values between 0 and 1 by dividing each retrieval score by the maximum value.

Position in article. The position of a sentence in an article is used as an extra indication of its importance for the topic of the article from which the sentence is extracted. The earlier the sentence appears in the article, the more important it is assumed to be. The sentence positions are converted into a value between 0 and 1, in which the sentence at position 1 receives the maximum graph-based score (explained below) and subsequent sentences receive a score which is a fraction of the maximum graph-based score, using the formula proposed in [5].

Graph-based scoring. The previous methods make use of local context in assigning relative importance to sentences. Graph-based scoring allows us to rank sentences by taking into account a more global context. This global context consists of a representative sample of articles that belong to the same categories as the topic. The resulting corpus serves as an importance model by which we assign each sentence a score: once we have our representative corpus, we rank sentences based on their centrality with respect to this corpus. For a more detailed discussion of the method, see [2]. The overall score of a sentence is the sum of the above scores.

2.1.3 Filtering

After we compute the scores for the snippets, the next step involves computing a redundancy penalty [5]. For this, we sort the snippets in decreasing order of their scores. We compare each candidate sentence with the sentences in the main article and with the sentences that appear before it in the ranked list, using simple word overlap to measure sentence similarity; in all our similarity computations, we remove stopwords. We take the maximum similarity score and subtract it from the sentence score. We then sort the list again in decreasing order of the resulting scores. The top 10 sentences constitute the snippets in our submitted runs.
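A minimal sketch of this filtering step is given below, assuming each candidate snippet already carries its combined importance score as a (sentence, score) pair; the word overlap measure, its normalisation, and the stopword list are simplified stand-ins rather than the exact choices used in the system.

```python
# Toy stopword list; the actual system uses proper stopword lists per language.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "to", "for"}

def content_words(sentence):
    return {w for w in sentence.lower().split() if w not in STOPWORDS}

def word_overlap(s1, s2):
    """Simple word overlap similarity; normalisation to [0, 1] is one possible choice."""
    w1, w2 = content_words(s1), content_words(s2)
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / min(len(w1), len(w2))

def filter_snippets(scored_snippets, main_article_sentences, top_n=10):
    """Redundancy penalty (Section 2.1.3): sort by score, penalise each snippet by its
    maximum similarity to the main article and to higher ranked snippets, re-sort,
    and keep the top 10."""
    ranked = sorted(scored_snippets, key=lambda pair: pair[1], reverse=True)
    seen = list(main_article_sentences)
    penalised = []
    for sentence, score in ranked:
        max_sim = max((word_overlap(sentence, prev) for prev in seen), default=0.0)
        penalised.append((sentence, score - max_sim))
        seen.append(sentence)
    penalised.sort(key=lambda pair: pair[1], reverse=True)
    return penalised[:top_n]
```

For the bilingual run described next, the same kind of duplicate removal is applied to the multilingual ranked list, with the word overlap measure replaced by the bilingual-lexicon similarity of Section 2.2.2.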
2.2 Multilingual Task

The multilingual task extends the problem of finding important snippets in one language to multiple languages in Wikipedia: given a topic in one language, find important snippets in other Wikipedia articles of the same language or of other languages. The major challenge in this task concerns ensuring relevance and novelty across multiple languages. This in turn means that we need some mechanism for measuring similarity among snippets from different languages. This may be achieved using different approaches, e.g., a bilingual dictionary or a machine translation system. For this task, we only used a bilingual dictionary generated from Wikipedia itself. We applied the method to the Dutch-English language pair.

2.2.1 Steps

Identifying important snippets in the multilingual task consists of several monolingual extraction steps plus a multilingual filtering step. Specifically, given a topic in one language, we

• identify important snippets in the language of the topic (the primary language);

• translate the topic into other languages;

• identify important snippets for the translated topics (in the secondary languages);

• filter the resulting multilingual list for redundant information.

The ranked list consists of three types of snippets. The first set contains the snippets extracted from the primary language; these are extracted using the monolingual procedure presented in Section 2.1 (primary snippets). The second set contains snippets coming from the main article in the secondary language (secondary main article snippets); these are automatically considered relevant and are included in the main list. The third type of snippets are extracted from other articles in the secondary language (secondary article snippets), again using the monolingual algorithm, but this time on the secondary language.

The snippets are ordered such that the primary snippets come at the top, followed by the secondary main article snippets, which are in turn followed by the secondary article snippets. The above ranked list and the main article in the primary language are input to the filtering module, which goes through the list and removes duplicates. In the next section we briefly summarize the method we adopt for multilingual similarity, which forms the core of the filtering module.

2.2.2 Using a Bilingual Lexicon

We now describe the method we used for identifying similar text across different languages. It relies on a bilingual lexicon which is generated from Wikipedia using the link structure. The bilingual lexicon consists of Wikipedia page titles, which typically represent concepts or entities that have entries in Wikipedia; the similarity measure is therefore based on concept or article title overlap. We used page title translations as our primitives (as main features) for the computation of multilingual similarity. For each Wikipedia page in one language, translations of the title into other languages, for which there are separate entries, are given. This information can be used to generate a bilingual translation lexicon. Most of these titles are noun phrases and are very useful in multilingual similarity computations; most of them are already disambiguated, they are mostly content bearing terms, and they may consist of single word or multiword units. We used the Wikipedia redirect feature to identify synonymous expressions: in Wikipedia, the redirect feature is used to map several titles onto a canonical form. For a more detailed description see [1].
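As an illustration of this title overlap similarity, here is a small sketch with a toy Dutch-English lexicon and redirect table; the lexicon entries, the substring matching, and the normalisation are illustrative simplifications of the lexicon that is actually generated from the Wikipedia link structure.

```python
# Toy lexicon of Dutch page titles and their English counterparts; in the real
# system this is generated from Wikipedia's cross-language links.
NL_TO_EN = {
    "Verenigde Staten": "United States",
    "Natuurkunde": "Physics",
    "Albert Einstein": "Albert Einstein",
}
# Redirects map title variants onto a canonical form.
EN_REDIRECTS = {"USA": "United States", "U.S.": "United States"}

def concepts(text, titles, redirects):
    """Collect lexicon titles occurring in the text, mapped to their canonical form."""
    found = set()
    for title in list(titles) + list(redirects):
        if title.lower() in text.lower():
            found.add(redirects.get(title, title))
    return found

def cross_lingual_similarity(nl_snippet, en_snippet):
    """Concept (title) overlap between a Dutch and an English snippet."""
    nl_concepts = {NL_TO_EN[t] for t in concepts(nl_snippet, NL_TO_EN, {})}
    en_concepts = concepts(en_snippet, NL_TO_EN.values(), EN_REDIRECTS)
    if not nl_concepts or not en_concepts:
        return 0.0
    return len(nl_concepts & en_concepts) / min(len(nl_concepts), len(en_concepts))

if __name__ == "__main__":
    nl = "Albert Einstein werkte aan natuurkunde in de Verenigde Staten."
    en = "Einstein worked on physics in the United States."
    print(cross_lingual_similarity(nl, en))  # 1.0 for this toy pair
```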
3 Runs

We submitted a total of seven runs: six monolingual runs for Dutch and English (three runs per language), and one Dutch-English bilingual run.

3.1 Monolingual Runs

All the monolingual runs use the approach outlined in Section 2. Observe that, except for the use of stopword lists, the method is generic and can be ported to a new language with relative ease. The runs differ in the methods adopted for acquiring the initial set of relevant sentences. As mentioned in Section 2.1.1, we identified two ways of identifying relevant sentences. The first run uses the link based approach, whereas the second uses the retrieval based approach. The third run combines the retrieval based approach with link based filtering in identifying relevant sentences. The latter two runs use the retrieval score in computing the overall score of sentence importance.

3.2 Bilingual Run

The bilingual run considers the Dutch and English language pair. The source topics can be in either of these languages. As mentioned in Section 2.2, the method is built on top of our monolingual approach: we used the output of the third monolingual run as input for the bilingual filtering. The bilingual run makes use of the bilingual lexicon generated automatically from Wikipedia for identifying duplicates across different languages.

4 Results

We present the results of our seven runs. The results are assessed based on the following attributes: "supported", "important", "novel" and "not repeated". The summary statistics used as evaluation measures are:

• Yield at top 10 snippets: the total number of supported, important, novel, non-repeated snippets for all topics among the top 10 snippets (Yield);

• Average yield per topic (top 10 snippets): the previous measure divided by the number of topics (Avg. Yield);

• MRR (top 10 snippets): the Mean Reciprocal Rank of the first supported, important, novel, non-repeated snippet among the top 10, according to the system's ranking (MRR);

• Precision at top 10 snippets: the number of supported, important, novel, non-repeated snippets among the top 10 snippets per topic, divided by the total number of top 10 snippets per topic (Precision).

The test files for the English and Dutch monolingual tasks consist of 65 and 60 topics, respectively. The test file for the bilingual task consists of 60 topics. Table 1 shows the results of the seven runs. As mentioned in Section 3, our monolingual runs differ in how the initial set of relevant sentences is acquired. In Table 1, the three different approaches are indicated by Ret (retrieval only), Link (link based only), and LinkRet (the combination of the two methods). The columns in Table 1 are as described above.

English         Avg. Yield   MRR     Precision
Ret             2.938        0.523   0.329
Link            3.385        0.579   0.358
LinkRet         2.892        0.516   0.330

Dutch           Avg. Yield   MRR     Precision
Ret             3.200        0.459   0.427
Link            3.800        0.532   0.501
LinkRet         3.500        0.532   0.494

English-Dutch   Avg. Yield   MRR     Precision
LinkRet         5.03         0.518   0.535

Table 1: Results for the English and Dutch monolingual tasks and the Dutch-English bilingual task.

Overall, the scores for the link based monolingual runs are the best. This shows that the link based retrieval approach provides a more accurate initial set of relevant sentences, on which the performance of the whole system largely depends. The retrieval based approach seems to introduce more noise, as shown by the scores of the Ret and LinkRet monolingual runs for both languages. Contrary to our expectation, the combination of the two methods performed worse for English.

In order to see whether the three approaches return similar sets of snippets, we counted the number of snippets that are found in the result sets of all three methods for the English monolingual runs. The three methods have 305 snippets in common in their result sets; of these, 103 snippets are judged to be good snippets. The numbers of good snippets returned by the three methods are 220 (Link), 191 (Ret), and 188 (LinkRet).

Furthermore, the three methods return no good snippets for the following topics: Brooks Williams, Wing Commander (film), Christian County, Illinois, White nationalism, Telenovela database, Oxygen depletion. Although the potential cause of the poor performance may vary across these topics, a brief look at the outputs reveals some possible sources of problems. For some of these topics, the retrieval component returned very few candidate snippets, e.g., Brooks Williams and Oxygen depletion. For others, it returned a large number of similar snippets, e.g., towns and cities for Christian County, Illinois and different entries of Telenovela database, which are judged irrelevant or redundant by the assessors. Some topics have ambiguous titles, e.g., Wing Commander (film), as indicated by its result set, which contains snippets from articles with similar titles, e.g., Wing Commander (computer game). On the other hand, the three methods return more than 5 good snippets for the following topics: Center for American Progress, Atyrau, Saitama Prefecture, Kang Youwei, Philips Records. Further examination of the outputs of the three methods for these topics shows that the methods tend to return similar sets of good snippets. Overall, most of the scores for our English monolingual runs are above the median score.
The scores for our best run are close to the maximum score. A similar analysis of the output of the Dutch monolingual runs shows similar patterns. Some topics are very hard for all methods. For example, all methods returned no snippets at all for the following topics: Václav Havel, MG (auto), Tsjetsjenië, Caïro, Société Nationale des Chemins de fer Français, TVR (auto). This is mainly due to special characters in the titles of these topics, which we did not anticipate and which made it impossible to identify candidate snippets. On the other hand, all methods performed well on the following topics: Gilera, Columbia-universiteit, Albert Einstein, Zwarte rat, NSU, Slangendrager.

The scores for the bilingual run are based on the output of the LinkRet retrieval component. The scores tend to be higher than the monolingual scores. This may be due to the fact that most of the topics are Dutch topics whose main articles are mostly short, so that additional snippets are likely to be new. Furthermore, the snippets can come from both the Dutch and the English encyclopedia, which also contributes to finding good snippets.

5 Conclusion

This paper described our participation in the WiQA 2006 pilot. Our approach consists basically of three steps: retrieving candidate snippets, reranking the snippets, and removing duplicates. We compared two approaches to identifying candidate snippets, i.e., link based retrieval and traditional document based retrieval. The results showed that the link based approach performed better than the document retrieval based approach, and that overall our system performed well. However, there is a lot of room for improvement. In the future, we want to compare different reranking methods and also perform a detailed error analysis of the output of our system.

6 Acknowledgments

Sisay Fissaha Adafre was supported by the Netherlands Organisation for Scientific Research (NWO) under project number 220-80-001. Valentin Jijkoun was supported by NWO under project numbers 220-80-001, 600.065.120 and 612.000.106. Maarten de Rijke was supported by NWO under project numbers 017.001.190, 220-80-001, 264-70-050, 354-20-005, 600.065.120, 612-13-001, 612.000.106, 612.066.302, 612.069.006, 640.001.501, and 640.002.501.

References

[1] Sisay Fissaha Adafre and Maarten de Rijke. Finding similar sentences across multiple languages in Wikipedia. In EACL 2006 Workshop on New Text: Wikis and Blogs and Other Dynamic Text Sources, 2006.

[2] Sisay Fissaha Adafre and Maarten de Rijke. Learning to identify important biographical facts. Submitted, 2006.

[3] Valentin Jijkoun and Maarten de Rijke. Overview of WiQA 2006. In This volume, 2006.

[4] Lucene. The Lucene search engine. http://lucene.apache.org/.

[5] Dragomir R. Radev, Hongyan Jing, Małgorzata Styś, and Daniel Tam. Centroid-based summarization of multiple documents. Information Processing and Management, 40(6):919–938, 2004.