<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Wikipedia entity retrieval for Dutch and Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gosse Bouma</string-name>
          <email>g.bouma@rug.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergio Duarte</string-name>
          <email>sergio.duarte@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Science, University of Groningen</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We developed two systems (for Dutch and Spanish) for the GikiCLEF task, in which Wikipedia pages have to be found that match a description in natural language. We concentrated on linguistic analysis of the query, both for mapping the question onto the most relevant Wikipedia categories and for extracting additional constraints that matching pages have to satisfy. In addition, for Spanish we experimented with query expansion to improve the recall of the IR process. In both the Dutch and the Spanish system we tried to incorporate additional knowledge sources (WordNet, Yago, DBpedia) for better question analysis and retrieval results. The Dutch system obtained a GikiCLEF score of 2.5 (7th overall and 7th for Dutch). The Spanish system was still under development at the time of the official evaluation, and performed poorly. We show that the completed system would have performed well at the 2009 task.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Ranking</kwd>
        <kwd>Wikipedia</kwd>
        <kwd>Linguistic Analysis</kwd>
        <kwd>Dutch</kwd>
        <kwd>Spanish</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>relational information harvested from infoboxes in Dutch Wikipedia and expanded with
information extracted from infoboxes found in English Wikipedia. The Spanish system uses, among
others, cross-language links and the English WordNet for query expansion. For Dutch we were able to
use an existing full parsing system for syntactic analysis of the question. For Spanish, an existing
POS tagger was combined with an NE tagger trained on CoNLL data.</p>
      <p>It should be noted that the Spanish system was still under development when results had to
be submitted. As a consequence, we were not able to submit a single run combining the results for
Dutch and Spanish. We describe the Dutch and Spanish systems in sections 2 and 3, respectively.
Some suggestions for future work are given in section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>Dutch</title>
      <p>The Dutch system for the GikiCLEF task consists of two parts: a module for query analysis, which predicts
appropriate categories for a given query and tries to identify additional constraints that
returned pages must satisfy, and an IR component, which returns the most relevant pages based
on an index that was developed for the QA task of CLEF 2008. We use a simple ranking scheme
that prefers pages that satisfy the categorical constraints, then the additional constraints (if any are
found), and finally ranks all pages that satisfy these conditions on the basis of the IR score. At
most 15 pages are returned.</p>
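      <p>The ranking scheme can be illustrated with a minimal sketch. It reads the preference order as a lexicographic sort (category match first, then constraint match, then IR score), which is one possible interpretation and not necessarily how the system was implemented. The class Candidate and the function rank are hypothetical names introduced for this illustration.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    matches_category: bool     # page belongs to a predicted category
    matches_constraints: bool  # page satisfies the additional constraints
    ir_score: float            # relevance score from the IR component


def rank(candidates, limit=15):
    """Order pages by category match, constraint match, then IR score."""
    ordered = sorted(
        candidates,
        key=lambda c: (c.matches_category, c.matches_constraints, c.ir_score),
        reverse=True,
    )
    # At most `limit` pages are returned (15 in the description above).
    return ordered[:limit]
```
      </preformat>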
      <p>
        The goal of query analysis is to identify words in the input that can be used to predict the
most appropriate Wikipedia category for a query (e.g. African capital), and to identify additional
constraints that matching pages have to satisfy (e.g. more than one million inhabitants). We use
a syntactic parser for Dutch [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to parse the query. From the parse result, we extract root forms,
part-of-speech labels, and dependency relations.
      </p>
      <p>Below, we first describe how category labels are predicted, and next how additional constraints
are identified. We conclude with a discussion of the results obtained for Dutch, including error
analysis.</p>
      <sec id="sec-2-1">
        <title>Predicting Category Labels</title>
        <p>Wikipedia pages are typically classified into one or more categories such as Dutch author. Queries
often contain phrases that are similar to the category labels used to classify Wikipedia pages (e.g.
writers from the Netherlands). However, as queries rarely contain a phrase that literally matches
a Wikipedia category, we try to find the optimal Wikipedia category matching a query.</p>
        <p>
          To this end, we parsed all category labels used in Dutch Wikipedia using the Alpino parser,
and stored the head noun, its stem, and its root form (for compounds). Furthermore, for each
category a list of content words modifying the head noun is stored. Finally, head nouns are linked
to a corresponding word in a Dutch wordnet (Cornetto<sup>1</sup>, [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]). An identity-link is established
if there are no modifiers and the head noun is found in the wordnet. An isa-link is established
if the root form can be found in the wordnet. For instance, the category labels Attractiepark
(Amusement park), Berg in Chili (Mountain in Chile) and Amerikaans stripauteur (American
comic book author) are stored as
⟨Attractiepark, Attractiepark, attractie park, park, [ ], ident, dn24615⟩
⟨Berg in Chili, Berg, berg, berg, [ Chili ], isa, dn32731⟩
⟨Amerikaans stripauteur, stripauteur, strip auteur, auteur, [ Amerikaans ], isa, dn25105⟩
        </p>
        <p><sup>1</sup> http://www2.let.vu.nl/oz/cltl/cornetto/</p>
        <p>Given a query parsed by Alpino, we now determine the most appropriate Wikipedia category
labels as follows:
1. Potential head nouns and their modifiers are identified. Head nouns are linked to all wordnet
ids for this noun,
2. Wikipedia categories are retrieved whose wordnet id matches one of the ids found for
the given head noun,
3. A score for the retrieved category is computed, based on the overlap between modifiers
present in the query and the category label,</p>
        <sec id="sec-2-1-1">
          <title>4. The highest scoring category labels are selected.</title>
          <p>
            Potential head nouns are all nouns in the query which are not themselves part of a phrase
that is a modifier to a noun. Note that we do not use the isa-relationships between categories in
Wikipedia. This was motivated by [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], who observe that the Wikipedia category system contains
many non-taxonomic links. As an alternative, they propose to link categories for individual
Wikipedia pages to (English) WordNet word senses, and to use the WordNet hypernym relations
as an alternative for category isa-relations. We created a similar resource for Dutch [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>Given a query such as Welke Nederlandse violisten... (which Dutch violin players...), the noun
violisten (with stem violist) is identified as a potential head. Matching categories are Duits violist,
Amerikaans jazzviolist, Amerikaans altviolist, Nederlands violist, Nederlands jazzviolist, etc. The
latter two are selected as the most relevant categories, as they contain a modifier Nederlands that
also occurs as a modifier in the query.</p>
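          <p>The selection procedure can be sketched as follows, using the violist example. This is only an illustration under simplifying assumptions: the wordnet id "wn:violist", the toy category list, and the function select_categories are hypothetical, and the score is taken to be the plain count of shared modifiers.</p>
          <preformat>
```python
# Toy category store: (label, wordnet id of the head noun, modifiers).
# The wordnet id "wn:violist" is a made-up placeholder.
CATEGORIES = [
    ("Duits violist", "wn:violist", {"Duits"}),
    ("Amerikaans jazzviolist", "wn:violist", {"Amerikaans"}),
    ("Nederlands violist", "wn:violist", {"Nederlands"}),
    ("Nederlands jazzviolist", "wn:violist", {"Nederlands"}),
    ("Berg in Chili", "wn:berg", {"Chili"}),
]


def select_categories(head_ids, query_modifiers, categories):
    """Return the category labels with the highest modifier overlap."""
    scored = []
    for label, wordnet_id, modifiers in categories:
        # Step 2: keep categories whose wordnet id matches the head noun.
        if wordnet_id in head_ids:
            # Step 3: score by overlap between query and category modifiers.
            overlap = len(modifiers.intersection(query_modifiers))
            scored.append((overlap, label))
    if not scored:
        return []
    # Step 4: select the highest scoring labels.
    best = max(score for score, _ in scored)
    return sorted(label for score, label in scored if score == best)
```
          </preformat>
          <p>For the head id set {"wn:violist"} and the query modifier set {"Nederlands"}, this sketch selects Nederlands jazzviolist and Nederlands violist.</p>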
          <p>
            The link to the wordnet allows us to search using synonym and hypernym relations. That is, if
a query contains the word schrijver (writer), we might still consider Amerikaans stripauteur as a
matching category label, as schrijver and auteur (author) are synonyms. Given the query musikant
(musician), we also find violist. A problem with this approach is that the linking of Wikipedia
labels to wordnet senses requires word sense disambiguation, as most nouns have multiple meanings
in wordnet. We solved this problem for Dutch by choosing the predominant word sense on the
basis of distributional similarity data obtained from a large Dutch corpus, following the idea of [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
This method is not perfect, however, and this has a negative effect on the selection of Wikipedia
categories. All categories for sharks and snakes, for instance, are linked to wordnet senses denoting
negative characterizations of female persons. As a consequence, queries for vrouw (woman) may
match with Wikipedia categories for snakes and sharks.
          </p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Templates</title>
        <p>Apart from a categorical constraint, queries often impose numerical or geographical constraints
on what constitutes a valid answer: geboren in het Bohemer woud, geboren in Alaska (born in
the Bohemian Forest, born in Alaska), met twee of drie Michelinsterren, met meer dan 10.000 studenten (with
two or three Michelin stars, with more than 10,000 students). Such constraints are not easily
checked using a plain IR engine, as a page may contain both the words born and Alaska, without
containing the information that Alaska was the birthplace of the entity described by the page.
Numerical expressions like more than 10,000 students can be satisfied by pages which do not
contain the number 10,000.</p>
        <p>Many Wikipedia pages contain a so-called infobox, expressing the most relevant information
for a given entity. For instance, web pages for universities usually mention the number of students
in the infobox (see figure 1).</p>
        <p>
          We extracted all information from all infoboxes in Dutch Wikipedia and stored the result as relation
tuples ⟨Page, Attribute, Value⟩. As English Wikipedia typically contains more elaborate infoboxes
for a given entity than the corresponding Dutch page, we also automatically expanded the set of
relation tuples with tuples harvested from English Wikipedia. Attribute names were automatically
translated into Dutch as well (see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for details).
        </p>
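        <p>As an illustration of how such relation tuples support a numerical constraint like more than 10,000 students, consider the following minimal sketch. The page names, the attribute name studenten, and the helper functions are hypothetical, and values are assumed to be strings that may use a Dutch-style thousands separator.</p>
        <preformat>
```python
# Toy relation tuples (Page, Attribute, Value); all entries are invented.
TUPLES = [
    ("Universiteit A", "studenten", "25.000"),
    ("Universiteit B", "studenten", "8500"),
    ("Universiteit C", "opgericht", "1614"),
]


def parse_number(value):
    """Read a number such as '25.000'; return None if it is not numeric."""
    digits = value.replace(".", "").strip()
    return int(digits) if digits.isdigit() else None


def satisfies_more_than(page, attribute, threshold, tuples):
    """Check whether some tuple (page, attribute, V) has V above threshold."""
    for tuple_page, tuple_attribute, value in tuples:
        if tuple_page == page and tuple_attribute == attribute:
            number = parse_number(value)
            if number is not None and number > threshold:
                return True
    return False
```
        </preformat>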
        <p>For queries such as Nederlandse universiteiten met meer dan 10.000 studenten (Dutch
universities with more than 10,000 students), question analysis finds the constraint more than(students,
10000). For potentially matching pages P, we can now check whether there exists a tuple ⟨P,
students, V⟩ where V &gt; 10000.</p>
        <p>Our system returned 638 answers, of which 36 were correct. Thus we obtained a precision of 0.05 and a
GikiCLEF score of 2.4 (7th). For Dutch only, we returned 502 answers, of which 22 were correct
(0.04 precision, 0.9 GikiCLEF score).</p>
        <p>Identification of appropriate Wikipedia categories on the basis of the queries turned out to
be hard. For only 17 out of 50 questions, one or more categories were identified that were
considered appropriate for identifying the correct pages. In 9 cases, no category could be found.
This happened for instance for queries containing uncommon (compound) nouns such as basiselementen,
Formule-1-rijders, and talentenjachtwinnaars (basic elements, Formula 1 drivers, talent show winners),
but also for common nouns such as plaatsen, bergtoppen en skioorden (places, mountain tops,
and ski resorts), which are not used as category labels in Wikipedia, and which also could not be linked
to categories using synonyms. Other errors are caused by nouns such as landen and rivieren (countries
and rivers), which correspond to highly general Wikipedia categories, for which several tens of
more specific subcategories exist, covering a large number of Wikipedia pages. Such categories
are not very selective, and easily lead the system to prefer results from a subcategory unrelated
to the input question. Sometimes, the wrong noun in the query was selected as the head noun of
the category.</p>
        <p>Additional numerical and geographical constraints were correctly identified for only two
questions, so this module was not effective during the experiments.</p>
        <p>Finally, we noted that the IR module performed poorly in terms of recall. This means that
many of the pages that fall within one of the categories identified for a question were not in the
set of pages retrieved by means of IR. As a consequence, such pages cannot be ranked on the basis
of their IR score.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Spanish</title>
      <p>The Spanish system was developed to extract entities (wikipages) from the Spanish Wikipedia
collection according to an input topic given in natural language. The input topics are parsed to
obtain noun phrases (NPs) representing two types of information: the target named entity (NE)
type and the restrictions or constraints on the desired NEs.</p>
      <p>The target NE type is defined by a set of Wikipedia categories and a set of Yago and
DBpedia classes. These items are obtained using a text search on an IR engine (Lucene). We
built independent indexes containing the Wikipedia category titles and the list of Yago and
DBpedia types. Although this matching can also be done using a simple look-up table, we think
that this approach is more convenient because the query expansion methods described in section
3.3 can be applied in a transparent fashion.</p>
      <p>A candidate entity set is constructed from the wikipages belonging to these Wikipedia
categories. Given that the wikipages of the same category can have different NE types, we clean the
set by ignoring the entities that do not correspond to the target Yago/DBpedia classes.</p>
      <p>To evaluate the restrictions on the NEs, we first map the NPs that specify constraints to
Wikipedia categories, as was done for the NE type. Additionally, we obtain the
wikipages associated with the NEs mentioned in these NPs. For each NP we construct a set of wikipages
by adding the members of the categories found and the wikipages of the NEs. The phrases
that cannot be mapped to categories or do not mention any NE are matched against the text content
or infobox of the wikipages in the constructed set.</p>
      <p>
        This matching is done using the IR engine; for this purpose a temporary index is created with
the pages of these sets. Pages scored below a threshold are deleted. In the final step, we create
a second set of candidate entities by including the entities that point to or are pointed to by the
entities of the restriction sets, as is done in the WikipediaListQA@wlv system [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This set is
also filtered to include only those entities that match the target NE type. We return as result the
intersection of the two candidate entity sets (the one created with the target NE type noun
phrases and the one created with the restriction noun phrases). We return only one set in case
the other one is empty. In the following sections we describe in further detail the main features
used in the system.
      </p>
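      <p>The final combination step can be summarized in a short sketch. The function name combine_candidates and the example page titles are hypothetical; the logic follows the description above (intersection of both sets, or the non-empty set when the other is empty).</p>
      <preformat>
```python
def combine_candidates(type_candidates, restriction_candidates):
    """Combine the NE-type candidate set with the restriction candidate set."""
    type_set = set(type_candidates)
    restriction_set = set(restriction_candidates)
    # If one set is empty, only the other set is returned.
    if not type_set:
        return restriction_set
    if not restriction_set:
        return type_set
    # Otherwise the result is the intersection of both sets.
    return type_set.intersection(restriction_set)
```
      </preformat>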
      <sec id="sec-3-1">
        <title>Shallow parsing of the input topic</title>
        <p>The input topics are parsed by first tagging the tokens with their POS and then extracting the
NPs. We observed that the NE type is commonly specified in the first NP encountered in the user’s
topic. This is the case for all the GikiCLEF topics in Spanish. Further information concerning
these NEs (restrictions) is specified in the NPs functioning as object of the sentence or as
postmodifiers of the main NP. These two kinds of NPs are frequently separated by relative pronouns such as
that or in which. In more complex sentences the phrases are separated by verb elements. Thus,
the verb phrases or relative pronouns are detected first, in order to split the NPs that refer to the NE type
from the ones that refer to the restrictions. This is done by matching the tokens with the relative
pronoun POS tags and with a list of common particles used to introduce relative clauses, such as
en los cuales (in which), en la que (inside which), por el que (along which), a la que (whom) and
so on. Similarly, verb phrases are extracted by matching the rule (Aux Verb)* (Verb) (Adv)*.
This process leads to a list of NPs in which the first element defines the NE type. Each of
these NPs is further subdivided so that only prepositional phrases are allowed as post-modifiers. This
further splitting is carried out because the aim is to match Wikipedia categories to these
NPs, and the categories are commonly described by short NPs.</p>
        <p>Applying this method to the topic Nombre los lugares de Italia que haya visitado Ernest
Hemingway a lo largo de su vida (List the places in Italy that Ernest Hemingway visited during
his life) leads to the following phrases: {lugares de Italia}, {Ernest Hemingway}, {a lo largo de
su vida}.</p>
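        <p>The first splitting step can be sketched as below. This is a deliberate simplification: every token tagged as a verb, auxiliary, or relative pronoun is treated as a boundary, multiword particles and adverbs are not handled, and the further subdivision into short NPs is not reproduced, so the output is coarser than the phrases listed above. The tag names (V, AUX, REL, and so on) and the function split_noun_phrases are assumptions made for this illustration.</p>
        <preformat>
```python
# Hypothetical coarse tags that mark a boundary between noun phrases.
BOUNDARY_TAGS = {"REL", "AUX", "V"}


def split_noun_phrases(tagged_tokens):
    """Split a list of (word, tag) pairs into chunks at boundary tags."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in BOUNDARY_TAGS:
            # Close the chunk collected so far, if any.
            if current:
                chunks.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


TOPIC = [
    ("Nombre", "V"), ("los", "DET"), ("lugares", "N"), ("de", "PREP"),
    ("Italia", "N"), ("que", "REL"), ("haya", "AUX"), ("visitado", "V"),
    ("Ernest", "N"), ("Hemingway", "N"), ("a", "PREP"), ("lo", "DET"),
    ("largo", "N"), ("de", "PREP"), ("su", "DET"), ("vida", "N"),
]
```
        </preformat>
        <p>With these hand-assigned tags, split_noun_phrases(TOPIC) yields two chunks, los lugares de Italia and Ernest Hemingway a lo largo de su vida, the first of which names the NE type.</p>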
        <p>
          The POS tagging is performed using the OpenNLP Spanish POS tagger, which employs a
maximum entropy model to predict the POS of each word. We encountered some inaccuracies using
this tagger on the GikiCLEF topics and other simple sentences, making it hard to identify phrases
using methods based on the POS. For this reason a procedure was designed to correct the results
and increase the accuracy of the tagger. First, missing nouns and numbers are detected using the
Stanford NER tagger [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]<sup>2</sup>. Misclassified tags belonging to the closed POS classes such as
determiners, conjunctions and prepositions are corrected by looking them up in a table with a comprehensive
list of words with these POS classes. This list is extracted from the EAGLES tag definition of the
Technical University of Catalonia (UPC) and from Wikipedia. Currently this list contains around
420 items. Further inconsistencies are detected by checking contextual and lexical rules to find
sequences of POS tags that are not feasible in Spanish grammar. Similar rules are used to select
the most likely correct tag.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Yago and DBpedia resources</title>
        <p>
          Yago is a semantic knowledge base extracted from Wikipedia and WordNet [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Yago contains
more than 2 million entities and 20 million facts about these entities, which are available to the
public under the GNU license. In particular, we use the Yago type definition, which assigns
a set of types to each wikipage in the English collection. This ontological classification is based
on the conceptual categories of Wikipedia and the WordNet synsets [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. We found 4930 different
types in the data, which were translated to Spanish by using the cross-lingual dictionary and then
cleaning and completing the results by hand. The extracted cross-lingual dictionary contains
112,099 entries.
        </p>
        <p>
          DBpedia, in turn, is a community effort to extract information from Wikipedia and to interlink
this information with other knowledge bases available on the Web [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In particular, we used
the types defined in the DBpedia Ontology and the set of types assigned to the wikipages in
the English collection. This ontology was hand-generated from the Wikipedia infoboxes and
contains 170 classes. Although the hand-generated mapping does not cover all the possible
infoboxes and properties present in the Wikipedia collection, the most frequent infoboxes are
included and normalized. Since the types are defined only for English, we translated the 170 classes
to Spanish manually.
        </p>
        <p>The filtering of entities in a set is performed applying the following steps for each entity:
1. Obtain the English name of the Spanish Wikipage using the cross-lingual links</p>
        <sec id="sec-3-2-1">
          <title>2. Fetch the types of the English Wikipage in the Yago/DBpedia data</title>
          <p>3. Intersect the types fetched in step 2 with the target NE types obtained from the input
question. If the size of the intersection is below n<sup>3</sup>, discard the Spanish Wikipage under
evaluation.</p>
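          <p>The three filtering steps can be sketched as follows. The dictionaries passed in (cross-lingual links and Yago/DBpedia types per English page) and the function name filter_by_type are hypothetical stand-ins for the actual resources; the threshold n defaults to 2, the value used in our system.</p>
          <preformat>
```python
def filter_by_type(spanish_pages, cross_links, page_types, target_types, n=2):
    """Keep Spanish pages whose English counterpart shares n or more types."""
    kept = []
    for page in spanish_pages:
        # Step 1: English name of the Spanish page via cross-lingual links.
        english = cross_links.get(page)
        # Step 2: Yago/DBpedia types of the English page (empty if unknown).
        types = page_types.get(english, set())
        # Step 3: intersect with the target NE types; discard if below n.
        shared = types.intersection(target_types)
        if len(shared) >= n:
            kept.append(page)
    return kept
```
          </preformat>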
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>Query processing and expansion</title>
        <p>The queries are constructed from the nouns, adjectives and adverbs of the NPs. These tokens
are expanded using three query expansion methods:</p>
        <p>Ontological Expansion
In this expansion method we employed the DBpedia ontology to include in the query all the types
under the hierarchy of the DBpedia types found in the input NPs. In this ontology the classes
and subclasses are related by the hypernymy relation. For instance, if the user requires
information about German artists, the algorithm expands the terms to include German writers,
comedians, actors, etc., since writer, comedian and actor are types under the type artist, as
shown in figure 2.</p>
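        <p>A minimal sketch of this expansion is given below. The dictionary SUBCLASSES is a hypothetical toy fragment containing only the artist example mentioned above, not the actual DBpedia ontology, and expand_type is an illustrative name.</p>
        <preformat>
```python
# Toy fragment of a class hierarchy: type mapped to its direct subtypes.
SUBCLASSES = {
    "artist": ["writer", "comedian", "actor"],
}


def expand_type(type_name, subclasses):
    """Return the type followed by all types below it in the hierarchy."""
    expanded, seen = [], set()
    stack = [type_name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        # Push subtypes in reverse so they are visited in listed order.
        stack.extend(reversed(subclasses.get(current, [])))
    return expanded
```
        </preformat>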
        <p>
          <sup>2</sup> This NER was trained using the Spanish data provided in the shared task of CoNLL 2002 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p><sup>3</sup> In our system we set n to 2.</p>
        <p>
          WordNet [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is used to include the synonyms of the nouns in the input sentence. We made use
of the cross-lingual links of Wikipedia to translate the target Spanish word to English. Then we
extracted the synonyms of the English nouns, and these were translated back to Spanish, again using
the cross-lingual links. Although the coverage of this method is low, given that we are only using
the dictionary built from the cross-lingual links, we found that this method led to a slight increase
in the performance of the system.
        </p>
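        <p>The round trip through English can be sketched with plain dictionaries. All three dictionaries in the usage note are invented toy data standing in for the cross-lingual link dictionary and for WordNet synonym sets, and expand_with_synonyms is a hypothetical name.</p>
        <preformat>
```python
def expand_with_synonyms(word, es_to_en, en_synonyms, en_to_es):
    """Translate to English, collect synonyms, and translate them back."""
    english = es_to_en.get(word)
    if english is None:
        return []  # no cross-lingual link for this word
    expansions = []
    for synonym in en_synonyms.get(english, []):
        spanish = en_to_es.get(synonym)
        # Synonyms without a link back to Spanish are dropped.
        if spanish is None or spanish == word or spanish in expansions:
            continue
        expansions.append(spanish)
    return expansions
```
        </preformat>
        <p>For example, with es_to_en = {"escritor": "writer"}, en_synonyms = {"writer": ["author", "scribe"]} and en_to_es = {"author": "autor"}, the word escritor is expanded with autor only; scribe is lost because it has no link back to Spanish, which illustrates the coverage problem.</p>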
        <p>Redirect Links Expansion
We found that the redirect links can be exploited to normalize and nominalize entity names.
These links are used in Wikipedia to let the user refer to an entity using alternative names and
word forms such as pseudonyms (e.g. Samuel Langhorne Clemens redirects to Mark Twain),
abbreviations, common misspellings (e.g. Condoleeza Rice redirects to Condoleezza Rice),
alternative spellings (e.g. colour redirects to color) and adjectival forms (e.g. Peruvian
redirects to Peru). The dictionary of redirect links extracted from the GikiCLEF Wikipedia
collection contains 113,790 Spanish entries. In addition, missing country-demonym pairs were
added to the dictionary, since this type of information is frequently used in the GikiCLEF topics.</p>
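        <p>Normalization through redirects amounts to a dictionary lookup, as in the following sketch. The small REDIRECTS table only repeats examples from the text, and normalize_terms is a hypothetical helper, not part of the described system.</p>
        <preformat>
```python
# Toy redirect dictionary built from the examples given in the text.
REDIRECTS = {
    "Samuel Langhorne Clemens": "Mark Twain",
    "Condoleeza Rice": "Condoleezza Rice",
    "colour": "color",
    "Peruvian": "Peru",
}


def normalize_terms(terms, redirects):
    """Replace each term by its redirect target when one is known."""
    return [redirects.get(term, term) for term in terms]
```
        </preformat>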
      </sec>
      <sec id="sec-3-4">
        <title>Results</title>
        <p>Unfortunately, by the official submission deadline the system was still in the middle of its
development. For this reason it was only possible to submit a result list from a system implementing very
few of the features described above, which led to poor results. Nonetheless, we collected all the
results submitted by the 17 participants to build a list with the correct answers for the 50 topics
included in this edition of the track. We used this list to evaluate the performance of our final system
and to assess the impact of the ontological resources and the query expansion methods. This list
contains 105 correct answers in Spanish.</p>
        <p>We evaluated our system using all the query expansion procedures, the category expansion and
the ontological resources, and we obtained a GikiCLEF score of 4.08 (0.168 precision and 0.238
recall). These results are promising, given that the system would rank fourth
among the 17 participants of the task (considering only Spanish) and that we only processed
the Spanish Wikipedia collection, which is significantly less developed and less complete than the
English collection.</p>
        <p>A baseline system was set up to evaluate the performance gain obtained by the use of the DBpedia
resources, the query expansion methods and the category expansion. This system excludes the
use of all these features. The results are summarized in table 1. From these results we observed
that the query expansion methods and the Yago/DBpedia type filtering provide the greatest overall
performance gain in the system.</p>
        <p>[Table 1. Settings compared: Baseline (B); B + Yago/DBpedia; B + Query Expansion; B + Category Expansion.]</p>
        <p>Similarly, we evaluate the performance gain obtained by each query expansion method. As
baseline we include the Yago/DBpedia filtering because the query expansion techniques are also
used in the matching of types. Results are summarized in table 2. We found that the query
expansion based on the Wikipedia redirect links contributes the most to the overall
performance of the system. Redirect links are commonly used to obtain the country that corresponds
to demonyms or adjectival forms expressed in the topics. This situation appears in 46% of the
Spanish GikiCLEF topics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We have developed two systems for Wikipedia entity retrieval based on linguistic analysis of the
query, query expansion, and incorporation of knowledge harvested from Wikipedia itself, and from
external knowledge sources. Both systems have their own strengths and weaknesses. Linguistic
analysis is smoother for Dutch, given that we could use a full parser in combination with a
Dutch wordnet. The Spanish system uses shallower syntactic analysis and consults WordNet
through cross-language links. On the other hand, the IR component of the Spanish system is
much more sophisticated and more targeted towards the task of entity retrieval.</p>
      <p>An obvious direction for future work is to develop an integrated system that employs
sophisticated IR for both languages, and which also remedies some of the weaker aspects of the linguistic
analysis for Spanish.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          , Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and
          <string-name>
            <given-names>Zachary</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          .
          <source>The Semantic Web</source>
          , pages
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bouma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duarte</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Islam</surname>
          </string-name>
          .
          <article-title>Cross-lingual Alignment and Completion of Wikipedia Templates</article-title>
          .
          <source>In Proceedings of the Third International Workshop on Cross Lingual Information Access: Addressing the Information Need of Multilingual Societies (CLIAWS3)</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>69</lpage>
          , Boulder, Colorado,
          <year>2009</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Gosse</given-names>
            <surname>Bouma</surname>
          </string-name>
          .
          <article-title>Linking Dutch Wikipedia Categories to EuroWordNet</article-title>
          .
          <source>In Proceedings of the 19th Computational Linguistics in the Netherlands meeting (CLIN 19)</source>
          . Groningen, the Netherlands,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Carreras</surname>
          </string-name>
          .
          <article-title>Resources on named entity recognition and classification</article-title>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Santos</surname>
          </string-name>
          et al.
          <article-title>Getting geographical answers from Wikipedia: the GikiP pilot at CLEF</article-title>
          .
          <source>In Cross Language Evaluation Forum: Working Notes for the CLEF 2008 Workshop</source>
          , Aarhus, Denmark,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . MIT, Cambridge,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grenager</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Incorporating non-local information into information extraction systems by Gibbs sampling</article-title>
          .
          <source>In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Diana</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rob</given-names>
            <surname>Koeling</surname>
          </string-name>
          , Julie Weeds, and
          <string-name>
            <given-names>John</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <article-title>Unsupervised acquisition of predominant word senses</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ):
          <fpage>553</fpage>
          -
          <lpage>590</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Fabian M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          , Gjergji Kasneci, and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In WWW '07: Proceedings of the 16th international conference on World Wide Web</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Gertjan</given-names>
            <surname>van Noord</surname>
          </string-name>
          .
          <article-title>At last parsing is now operational</article-title>
          . In Piet Mertens, Cedrick Fairon, Anne Dister, and Patrick Watrin, editors,
          <source>TALN06. Verbum Ex Machina. Actes de la 13e conference sur le traitement automatique des langues naturelles</source>
          , pages
          <fpage>20</fpage>
          -
          <lpage>42</lpage>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Maks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Segers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>van der Vliet</surname>
          </string-name>
          .
          <article-title>Integrating lexical units, synsets, and ontology in the cornetto database</article-title>
          .
          <source>In Proceedings of LREC-2008</source>
          , Marrakech, Morocco,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>