<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Extending the Coverage of DBpedia Properties using Distant Supervision over Wikipedia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Palmero Aprosio</string-name>
          <email>alessio.palmero@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Giuliano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Lavelli</string-name>
          <email>lavelli@fbk.eu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>Via Sommarive 18, 38123 Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Milano</institution>
          ,
          <addr-line>Via Comelico 39/41, 20135 Milano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>DBpedia is a Semantic Web project aiming to extract structured data from Wikipedia articles. Due to the increasing number of resources linked to it, DBpedia plays a central role in the Linked Open Data community. Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, a set of subject-attribute-value triples that represents a summary of the Wikipedia page. These infoboxes are manually compiled by the Wikipedia contributors, and in more than 50% of the Wikipedia articles the infobox is missing. In this article, we use the distant supervision paradigm to extract the missing information directly from the Wikipedia article, using a Relation Extraction tool trained on the information already present in DBpedia. We evaluate our system on a data set consisting of seven DBpedia properties, demonstrating the suitability of the approach in extending the DBpedia coverage.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Wikipedia is one of the most popular web sites in the world and the most used
encyclopedia. In addition, Wikipedia is steadily maintained by a community of thousands of
active contributors, therefore its content represents a good approximation of what people
need and wish to know. Finally, Wikipedia is totally free and it can be downloaded
entirely thanks to periodic dumps made available by the Wikipedia community. For these
reasons, in the last years several large-scale knowledge bases (KB) have been created
exploiting Wikipedia. DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Yago [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and FreeBase [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are relevant examples of
such resources.
      </p>
      <p>
        In this work, we are particularly interested in DBpedia (http://www.dbpedia.org/). Created in 2006, DBpedia
has grown in size and popularity, becoming one of the central interlinking hubs of
the emerging Web of Data. The approach adopted to build DBpedia is the following.
First, the DBpedia project develops and maintains an ontology, available for download
in OWL format. Then, this ontology is populated using a rule-based semi-automatic
approach that relies on Wikipedia infoboxes, a set of subject-attribute-value triples that
represents a summary of some unifying aspect that the Wikipedia articles share. For
example, biographical articles typically have a specific infobox (Persondata in the
English Wikipedia) containing information such as name, date of birth, nationality,
activity, etc. Specifically, the DBpedia project releases an extraction framework used
to extract the structured information contained in the infoboxes and to convert it into
triples. Moreover, crowdsourcing is used to map infoboxes and infobox attributes to
the classes and properties of the DBpedia ontology, respectively. As the number of
required mappings is extremely large, the whole process follows an approach based on
the frequency of infoboxes and infobox attributes. Most frequent items are mapped first.
This guarantees good coverage because infoboxes are distributed according to Zipf’s
law [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This mapping process is divided into two different steps. First, the infobox is
mapped to the corresponding class in the DBpedia ontology; then, infobox attributes
are mapped to the properties owned by that class. Using the extraction framework
provided by the DBpedia community, pages containing such infobox are automatically
assigned to the class, and for each page the infobox attributes are used to populate the
corresponding properties. For example, the Infobox journal infobox is mapped
to the AcademicJournal DBpedia class, and its attributes title, editor, and
discipline are then mapped, respectively, to the properties foaf:name, editor,
and academicDiscipline. Finally, the resulting KB is made available as Linked
Data (http://wiki.dbpedia.org/Downloads) and via DBpedia’s main SPARQL endpoint (http://dbpedia.org/sparql).
      </p>
      <p>At the time of starting the experiments reported in this paper, the available version of the
English DBpedia (3.8) covered around 1.7M entities, against almost 4M articles
in Wikipedia. The main reason for this coverage gap is the lack of infoboxes for
some pages (along with the limited coverage of the mappings mentioned above). In fact,
even when Wikipedia provides an infobox suitable for a page, the users
who write the article may not know how to specify it in the source code of the page, or
may simply not know that infoboxes (or that particular infobox) exist.</p>
      <p>
        Recently, in 2013, a project called Airpedia [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] addressed the problem of extending
the DBpedia coverage with respect to classes. The working hypothesis is that the article
class (i.e. Person, Place, etc.) can be inferred from features of the page, for
example the list of categories or a latent semantic analysis of the text.
      </p>
      <p>
        In this paper, we extend this approach to properties. There are projects aiming to
extract properties from some structured parts of the page different from infoboxes. For
example, Yago exploits categories [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. However, such an approach is feasible only for a
small number of attributes (for example, the Wikipedia page Barack Obama is included
in the 1961 births category, from which it can be inferred that Obama’s birth year
is 1961). Therefore, to populate the whole DBpedia set of properties (over 1,500 in
version 3.8), we need to find the relevant information directly in the article text, using
Natural Language Processing (NLP) tools. This is a relation extraction (RE) task, i.e. the
identification in a text of relevant entities and relationships between them. For example,
given the sentence “Barack Obama was born on August 4, 1961”, we need to identify
“Barack Obama” as a named entity of type person, the value “August 4, 1961” as a date,
and the birthDate relation between the two objects.
      </p>
      <p>
        Supervised machine learning techniques are widely used to approach the RE task, but
the lack of manually annotated texts to use as training data often limits the applicability
of such techniques. In 2009, a new paradigm, called distant supervision [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], was
proposed to deal with the limited availability of manually annotated data. The intuition
behind distant supervision is that any sentence containing a pair of entities that participate
in a known DBpedia property relation is likely to express such relation in some way.
Using the example above, the assumption is that a sentence that includes both “Barack
Obama” and “August 4, 1961” is expressing the birthDate relation. Since there are
thousands of such pairs of entities in the DBpedia resource, we can extract very large
numbers of (potentially noisy [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]) examples for each relation.
      </p>
    </sec>
    <sec id="sec-3">
      <p>
        In this work, we first collect this set of sentences starting from DBpedia and extracting
the relevant sentences from the corresponding Wikipedia articles. Then, we train a RE
tool (jSRE [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], freely available on the web) using positive and negative examples
extracted from such sentences. Finally, we apply the model on unseen text articles and
extract the relations contained in the sentences.
      </p>
      <p>We evaluate our system on seven DBpedia properties, using cross-validation over
a small set of pages excluded from the training, demonstrating the suitability of the
approach with high precision and recall.</p>
      <p>
        The work reported in this paper is part of a wider effort devoted to the automatic
expansion of DBpedia [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16–18</xref>
        ] and tackles the issue of extracting properties for those
pages where the DBpedia paradigm cannot be applied (for example when the infobox is
missing). Table 1 summarizes the different steps of such effort on DBpedia expansion.
The main reference for our work is the DBpedia project [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Started in 2006, its goal is
to build a large-scale knowledge base semi-automatically using Wikipedia. In contrast,
Yago [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], another similar project started in 2007, aims at extracting and mapping
entities from Wikipedia using categories (for fine-grained classes) and WordNet (for
upper-level classes). Yago uses particular categories to map properties (for example
the Wikipedia page Albert Einstein is included in the American physicists
category, from which it can be inferred that Albert Einstein’s occupation is physicist),
but this method can cover only a small number of relations (52 in Yago 2 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]) and a substantial
effort is needed to port it to other languages. Conversely, FreeBase [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and WikiData [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]
are collaborative knowledge bases manually compiled by their community members.
      </p>
      <p>
        The distant supervision paradigm [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], presented in 2009, has been widely used for
RE purposes [
        <xref ref-type="bibr" rid="ref11 ref15 ref7">15, 11, 7</xref>
        ]. The basic assumption, namely that each sentence mentioning the
entities involved in a relation expresses the relation itself, means
that noisy examples can be present in the training data. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] a new approach to distant
supervision is proposed, dealing with the presence of noisy examples in the training. A
survey on noise reduction methods for distant supervision is discussed in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        In addition, distant supervision has been recently tested on the web, to obtain rule sets
large enough to cover the actual range of linguistic variation, thus tackling the long-tail
problem of real-world applications [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], for sentiment analysis in social networks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
fact checking [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and question answering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Finally, the problem of automatically expanding the DBpedia dataset has been tackled from
various perspectives. On the one hand, starting from version 3.7, DBpedia is available
in different languages. As the corresponding versions of Wikipedia do not share the
same infobox structure, the manual effort needed to build mappings needs to
be multiplied by the number of versions of DBpedia. Some automatic approaches to
this problem have been proposed recently [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. On the other hand, DBpedia
only considers Wikipedia pages that contain an infobox (and for which the mapping was
provided). The Airpedia project [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] deals with this problem, using a machine learning
approach to guess the DBpedia ontology class of a page using features extracted from
the entire Wikipedia page.
      </p>
      <sec id="sec-3-1">
        <title>Workflow</title>
        <p>As introduced before, the work presented in this paper relies on the intuition that
jointly exploiting interlinked structured and unstructured data sources can offer great
potential for both NLP and Semantic Web applications. In particular, we focus on the
pair Wikipedia-DBpedia as corpus-KB and on distant supervision as paradigm.</p>
        <p>Wikipedia (http://www.wikipedia.org/) is a collaboratively constructed online encyclopedia: its content is free to
reuse, and can be edited and improved by anyone following the rules of the editing
process. Wikipedia articles can also contain structured elements such as infoboxes, providing
factual data in the form of values associated to fields, which can be easily extracted.</p>
        <p>DBpedia is a database derived from Wikipedia. Since Wikipedia pages contain a lot
of potentially useful data, the Semantic Web community started an effort to extract data
from Wikipedia infoboxes and then to publish them in a structured format (following the
open standards of the Semantic Web) to make them machine-readable. DBpedia releases
its dataset in RDF format, a conceptual model expressed in the form of
subject-predicate-object triples. An example of an instance of the dataset (triple) is
⟨Barack Obama⟩ ⟨is born in⟩ ⟨Honolulu⟩
(1)
where “Barack Obama” is the subject, “is born in” is the predicate, and “Honolulu”
is the object. The set of possible values of subject and object are called domain and
range, respectively. In our experiments, we only consider triples involved in relations
between entities, expressing DBpedia properties, therefore the predicate represents the
relation/property, and the subject-object pair refers to the entities involved in that relation.
In particular, in DBpedia the subject of such a triple is always related to a Wikipedia
page.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <p>The distant supervision paradigm is based on the assumption that there is a high
probability that the structured information present in the infobox is also expressed using
natural language sentences in the same Wikipedia page. Given the triple (1), we expect
that there is a sentence in the text article expressing the same relation with the same
entity mentions, like</p>
      <p>Obama was born on August 4, 1961 at Kapiolani Maternity &amp; Gynecological
Hospital in Honolulu, Hawaii, and is the first President to have been born in
Hawaii.</p>
      <p>Summarizing, for each DBpedia property we apply the following procedure:
1. all the triples expressing the relation are considered;
2. for each triple, the corresponding Wikipedia article is analyzed using Stanford
CoreNLP (Section 4);
3. the sentences containing both the subject and the object of the triple are extracted
and collected as positive examples (Section 5.1);
4. a set of negative examples is collected, too (Section 5.2);
5. a RE tool is trained using the dataset built according to the procedure outlined above
(Section 5.3);
6. the trained model is then applied to extract the desired relation from article pages
where the infobox is missing or where the infobox does not contain such relation.</p>
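      <p>The per-relation procedure above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the actual implementation: plain substring matching stands in for the matching strategies of Section 5.1, and the hard-coded article stands in for the Wikipedia dump and the CoreNLP sentence splitter.</p>
      <preformat>
```python
# Sketch of the per-relation training-data collection loop described above.
# The triples and the sentence source are hypothetical stand-ins for the
# real DBpedia dump and the Stanford CoreNLP pipeline.

def collect_examples(triples, sentences_of):
    """For each (subject, object) triple, scan the subject's article and
    keep sentences mentioning both entities as positives; sentences
    mentioning only the subject become negative candidates (to be
    refined by the strategies of Section 5.2)."""
    positives, negatives = [], []
    for subject, obj in triples:
        for sentence in sentences_of(subject):
            if subject in sentence and obj in sentence:
                positives.append(sentence)
            elif subject in sentence:
                negatives.append(sentence)
    return positives, negatives

if __name__ == "__main__":
    article = {
        "Obama": [
            "Obama was born in Honolulu, Hawaii.",
            "Obama moved to Los Angeles in 1979.",
        ]
    }
    pos, neg = collect_examples([("Obama", "Honolulu")], article.get)
    print(pos)  # sentences likely expressing the relation
    print(neg)  # candidate negatives
```
      </preformat>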
      <p>The main part of the procedure is the RE task. Formally, given a sentence S which
consists of a sequence of words, we need to find a relation R that involves two sequences
of words E1 and E2, respectively the subject and the object of the relation. In our
experiments, we use jSRE (http://hlt.fbk.eu/en/technology/jSRE), a state-of-the-art open source RE tool, freely available
on the web.</p>
      <p>To assess the performance of the approach, we test the accuracy of the entire workflow
on a dataset consisting of seven DBpedia properties (Section 6).</p>
      <sec id="sec-4-1">
        <title>Pre-processing</title>
        <p>The preliminary phase of our approach consists in collecting Wikipedia pages where
a particular relation is (likely to be) expressed. The list of such pages can be easily
extracted from DBpedia.</p>
        <p>Given the Wikipedia page, the plain text article is obtained by removing tables,
images and all the wiki markup. We use JWPL (https://code.google.com/p/jwpl/) for this purpose.</p>
        <p>The cleaned-up text is then analyzed using Stanford CoreNLP (http://nlp.stanford.edu/software/corenlp.shtml) with the following
processors: tokenization, sentence splitting, part-of-speech tagging, lemmatization and
Named Entity Recognition (NER). In particular, the NER module of Stanford CoreNLP
annotates persons, organizations, locations, numbers and dates. In addition, we use the
Stanford CoreNLP tag MISC for all the other DBpedia classes (Work, Event, and so
on). Finally, we connect each of these types to the corresponding DBpedia type/class.
Boolean properties are not considered, as they affect only 4 relations out of over 1,700 in
the DBpedia ontology. See Table 2 for more information.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <p>⟨PER⟩Obama⟨/PER⟩ was born on ⟨DATE⟩August 4, 1961⟨/DATE⟩ at
⟨ORG⟩Kapiolani Maternity &amp; Gynecological Hospital⟨/ORG⟩ in
⟨LOC⟩Honolulu⟨/LOC⟩, ⟨LOC⟩Hawaii⟨/LOC⟩, and is the first President to
have been born in ⟨LOC⟩Hawaii⟨/LOC⟩.</p>
      <p>The sentence contains both ⟨PER⟩ and ⟨LOC⟩, the conversion of the domain and range
types of the relation ⟨is born in⟩ in DBpedia, respectively (see Section 4). Therefore it is
a candidate positive example for the relation. While there is a ⟨LOC⟩ part containing
the range of the relation (Honolulu), the complete string of the domain (Barack Obama)
never appears in the sentence, so an approach based on exact string matching would
erroneously discard this sentence. To avoid this behavior and increase the recall of this
extraction step, we apply different matching strategies. First of all, we perform the
exact match of the entire string as provided by Wikipedia. If the algorithm does not
find such a string in the sentence, we clean it by deleting the part between parentheses, used to
disambiguate pages with the same title, as in “Carrie (novel)” and “Carrie (1976 film)”.
We use the resulting string for matching the domain in the sentence. If this fails, the
original string is tokenized and, given the set of obtained tokens, new strings are built
by combining the tokens (preserving the original word order). For instance, starting
from “John Fitzgerald Kennedy”, we obtain the new strings “John Fitzgerald”, “John
Kennedy”, “Fitzgerald Kennedy”, “John”, “Fitzgerald” and “Kennedy”. Using this rule,
in our example we can identify the ⟨PER⟩ part containing the string “Obama”.</p>
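      <p>The fallback matching just described (strip the bracketed disambiguator, then try every ordered combination of the title’s tokens, longest first) can be sketched as follows; the helper name <italic>name_variants</italic> is illustrative, not part of the released system.</p>
      <preformat>
```python
# Sketch of the matching fallback described above: remove the bracketed
# disambiguator, then generate all ordered token combinations of the
# page title, from the longest to single tokens.
import itertools
import re

def name_variants(title):
    """Return match candidates for a Wikipedia page title, longest first."""
    cleaned = re.sub(r"\s*\(.*?\)", "", title).strip()
    tokens = cleaned.split()
    variants = [cleaned]
    for size in range(len(tokens) - 1, 0, -1):
        for combo in itertools.combinations(tokens, size):
            variants.append(" ".join(combo))
    return variants

print(name_variants("John Fitzgerald Kennedy"))
```
      </preformat>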
      <p>For numeric entities, we do not use exact matching between the value stored in
DBpedia and the number extracted by Stanford CoreNLP, as for some relations (such as
populationTotal) they may be slightly different. Given two numeric values a and
b, we then consider a positive match between them when a and b are both different from
0, and the ratio |a − b| / |b| is less than 0.05 (5%).</p>
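      <p>The numeric criterion above can be written directly; the function name and the <italic>tolerance</italic> parameter are illustrative.</p>
      <preformat>
```python
# Sketch of the numeric-match test above: two values (e.g. a population
# figure in DBpedia vs. the number found by CoreNLP in the text) match
# when both are non-zero and differ by under 5% relative to b.

def numeric_match(a, b, tolerance=0.05):
    if a == 0 or b == 0:
        return False
    # comparison written as "tolerance greater than ratio"
    return tolerance > abs(a - b) / abs(b)

print(numeric_match(1_000_000, 1_020_000))  # True: 2% difference
print(numeric_match(1_000_000, 1_200_000))  # False: about 17% difference
```
      </preformat>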
      <sec id="sec-7-1">
        <title>Selecting sentences</title>
        <p>
          Supervised machine learning tools need annotated data to be trained. Training (and test)
sets consist of both positive and negative examples: the former are examples where the
relation is present; the latter are examples where the relation is not expressed. The distant
supervision paradigm uses structured data to collect positive examples, following the
assumption that, if two entities participate in a relation, then all the sentences containing
the two entities express such relation; however, this is not always true [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (see below).
        </p>
        <p>In our experiment, we use the hypothesis that a sentence containing the domain part
of the relation, and not containing its range but another entity of the type of the range, is
a good negative example for the relation ⟨is born in⟩. For example, the sentence
Following high school, Obama moved to Los Angeles in 1979 to attend
Occidental College.
contains “Obama”, and does not contain “Honolulu” but contains “Los Angeles”.
Therefore we pick this sentence as a negative example for the relation.</p>
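        <p>The negative-selection rule can be sketched as follows, assuming the NER output is available as a map from entity type to mention strings; <italic>is_negative</italic> is a hypothetical helper, and plain substring matching again stands in for the strategies of Section 5.1.</p>
        <preformat>
```python
# Sketch of the negative-example rule above: a sentence mentioning the
# subject and some entity of the range type, but not the actual range
# value, is taken as a negative example. `entities` maps a NER type to
# the mention strings found in the sentence (stand-in for CoreNLP output).

def is_negative(sentence, subject, range_value, entities, range_type):
    mentions = entities.get(range_type, [])
    return (
        subject in sentence
        and range_value not in sentence
        and any(m != range_value for m in mentions)
    )

sentence = "Following high school, Obama moved to Los Angeles in 1979."
entities = {"LOC": ["Los Angeles"]}
print(is_negative(sentence, "Obama", "Honolulu", entities, "LOC"))  # True
```
        </preformat>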
        <p>This simple rule is sufficient for building a training set for relations where there is
no ambiguity. For example, in a biographical article in Wikipedia the date of birth is
usually used in the birth date relation only. In addition, other dates in the same sentences
certainly refer to different relations, as the birth date of a person is unique.</p>
        <p>However, there are relations where these assumptions are not necessarily true, since
the same pair subject-object can be involved in more than one relation. Therefore, we
apply different strategies for the extraction of the training data.</p>
      </sec>
      <sec id="sec-7-2">
        <title>First strategy: positives cannot be negatives</title>
        <p>In the Obama sentence quoted above, the entity “Honolulu” refers to the birth place. Unfortunately, the other ⟨LOC⟩
instance (“Hawaii”) also refers to the birth place (although it may not be included in the
DBpedia resource). To avoid this problem, when collecting our training set we discard
potential negative examples taken from a sentence already used to extract a positive one.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Second strategy: only one sentence per relation</title>
        <p>In the sentence
In 1971, Obama returned to Honolulu to live with his maternal grandparents,
Madelyn and Stanley Dunham.
both “Obama” and “Honolulu” are present, but the relation between them is different
from birth place.</p>
        <p>
          [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] tackle this problem by assuming that, if two entities participate in a relation, at
least one sentence that mentions these two entities might express that relation. Using
this assumption, they trained a graphical model with the optimization of the parameters
to ensure that predictions will satisfy a set of user-defined constraints. In their work, they
use the New York Times corpus, where pages are less standardized than in Wikipedia.
In our experiment we can rely on a stronger assumption: if two entities participate in a
relation, there is one and only one sentence expressing such relation. We can then discard
those pages not complying with this assumption, i.e. having more than one sentence
containing both domain and range of the relation.
        </p>
        <p>Third strategy: only one relation for each value. Finally, we can take advantage of
the complete set of properties available in DBpedia, by removing from the training set
those pages having more than one relation sharing the same object. For instance, the two
relations
⟨Mark Zuckerberg⟩ ⟨works for⟩ ⟨Facebook⟩
⟨Mark Zuckerberg⟩ ⟨founded⟩ ⟨Facebook⟩
involve the same subject-object pair, therefore we cannot disambiguate whether a
sentence in Mark Zuckerberg’s Wikipedia page containing both his name and the company
he founded refers to the former or to the latter relation.</p>
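        <p>The third strategy amounts to a frequency filter over subject-object pairs; a minimal sketch, with <italic>unambiguous_pairs</italic> as an illustrative name and hard-coded triples standing in for the DBpedia dump:</p>
        <preformat>
```python
# Sketch of the third strategy above: drop subject-object pairs that occur
# in more than one DBpedia relation, since a sentence mentioning both
# entities cannot then be attributed to a single relation.
from collections import Counter

def unambiguous_pairs(triples):
    """triples: iterable of (subject, relation, object) tuples."""
    counts = Counter((s, o) for s, _, o in triples)
    return [t for t in triples if counts[(t[0], t[2])] == 1]

triples = [
    ("Mark Zuckerberg", "worksFor", "Facebook"),
    ("Mark Zuckerberg", "founded", "Facebook"),
    ("Barack Obama", "birthPlace", "Honolulu"),
]
print(unambiguous_pairs(triples))  # only the Obama triple survives
```
        </preformat>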
      </sec>
      <sec id="sec-7-4">
        <title>Training algorithm</title>
        <p>
          As learning algorithm, we use jSRE, a state-of-the-art RE tool described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The RE
task is treated as a classification problem in supervised learning, using kernel methods
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] to embed the input data into a suitable feature space, and then running a classical linear
algorithm to discover nonlinear patterns. The learning algorithm used is Support Vector
Machines (SVM) [
          <xref ref-type="bibr" rid="ref25 ref6">25, 6</xref>
          ].
        </p>
        <p>
          The tool uses two families of kernels: global context kernel and local context kernel.
The first one adapts the ideas in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and uses a bag-of-words of three sets of tokens:
fore-between (tokens before and between the two entities), between (only tokens between
the two entities), and between-after (tokens between and after the two entities). The
second kernel represents the local context using basic NLP features such as lemma,
part-of-speech, stem, capitalization, punctuation, and so on.
        </p>
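        <p>The three bag-of-words windows of the global context kernel can be illustrated as follows; this is a simplified sketch of the kernel input, not the actual jSRE implementation, and it assumes each entity occupies a single token position.</p>
        <preformat>
```python
# Sketch of the three token windows used by the global context kernel
# described above, for a tokenized sentence where the two entities sit at
# positions i and j (i before j); the entity tokens themselves are excluded.

def context_bags(tokens, i, j):
    fore = tokens[:i]               # tokens before the first entity
    between = tokens[i + 1 : j]     # tokens strictly between the entities
    after = tokens[j + 1 :]         # tokens after the second entity
    return set(fore + between), set(between), set(between + after)

tokens = "In 1971 Obama returned to Honolulu today".split()
print(context_bags(tokens, 2, 5))
```
        </preformat>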
        <sec id="sec-7-4-1">
          <title>Experiments and evaluation</title>
          <p>We choose seven different DBpedia properties to evaluate the system, covering different
domain-range pairs. For each relation, we extract 50,000 pages for training, 1,000 for
development and 1,000 for test (except for the capital property, for which not enough
instances are available). All the experiments are conducted on a test set built following
the same steps used for training.</p>
          <p>
            The strategy used to calculate precision and recall is One Answer per Slot [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]:
– if the system finds in the document the correct value for the desired relation, we do
not count any false negative for the same relation;
– if the system does not find it (or the value is wrong), we count it as a false negative;
– false positives are not affected by this method, therefore they are all counted when
computing precision and recall.
          </p>
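          <p>The One Answer per Slot counting can be sketched as follows, assuming gold values are stored per (document, relation) slot; the function name and data layout are illustrative, not taken from the cited work.</p>
          <preformat>
```python
# Sketch of One Answer per Slot counting as described above: per slot,
# a correct prediction counts one true positive and suppresses further
# false negatives for that slot; a missing or wrong value counts one
# false negative; every spurious prediction still counts as a false positive.

def score(slots, predictions):
    """slots: dict mapping (doc, relation) to the gold value;
    predictions: dict mapping (doc, relation) to a list of predicted values."""
    tp = fn = fp = 0
    for slot, gold in slots.items():
        preds = predictions.get(slot, [])
        if gold in preds:
            tp += 1
            fp += sum(1 for p in preds if p != gold)
        else:
            fn += 1
            fp += len(preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(score({("doc1", "birthDate"): "1961"},
            {("doc1", "birthDate"): ["1961", "1971"]}))  # (0.5, 1.0)
```
          </preformat>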
          <p>
            The results show that the application of the three strategies increases the F1 value.
In some cases (birthDate, deathDate, populationTotal) the increase is
negligible, because the corresponding relations are not affected by the issue described in
[
            <xref ref-type="bibr" rid="ref19">19</xref>
            ].
          </p>
        </sec>
        <sec id="sec-7-4-2">
          <title>Conclusions and future work</title>
          <p>This paper proposes a method to extract missing DBpedia properties from the article
text of Wikipedia pages. The approach is based on the distant supervision paradigm, and
makes use of supervised machine learning for the extraction.</p>
          <p>
            The accuracy of our approach is comparable to that of other systems. However, a precise
comparison is hard to make, because such systems are applied to different resources and tasks.
In [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], Yago is used as resource to collect training sentences, while [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ] uses DBpedia
and the distant supervision paradigm for the TAC-KBP slot filling task.
          </p>
          <p>Due to the high variability and complexity of the task, much work is still to be done,
and different issues should be addressed:
– Disambiguation tools and Wikipedia links could be used for sentence retrieval (see
Section 5.1).
– In our experiments we have used jSRE as an off-the-shelf tool. We plan to
investigate the use of kernels exploiting Wikipedia-related features, such as internal
links.
– To increase the number of sentences that can be used for training, some approaches
(e.g., [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ]) use shallow coreference resolution based on animate pronouns. In real-world
applications, where the number of relations is high and the number of examples
is not, a more sophisticated coreference resolution tool can help to obtain more
training data.
– Distant supervision is a language-independent paradigm, although most of the
resources and approaches concern only English, and the multi-linguality of the
approach has not been deeply investigated. DBpedia releases its resource in 16
languages, therefore it can in principle be used to apply distant supervision to
languages for which suitable natural language tools are available (such as TextPro (http://textpro.fbk.eu/),
OpenNLP (http://opennlp.apache.org/) or Stanbol (http://stanbol.apache.org/)). There is preliminary work on applying distant
supervision to the Portuguese Wikipedia and DBpedia [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: a nucleus for a web of open data</article-title>
          .
          <source>In: Proceedings of the 6th international Semantic Web Conference and 2nd Asian Semantic Web Conference</source>
          . pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          . ISWC'07/ASWC'07, Springer-Verlag, Berlin, Heidelberg (
          <year>2007</year>
          ), http://dl.acm.org/citation.cfm?id=1785162.1785216
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forte</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Extracção de Relações Semânticas de Textos em Português Explorando a DBpédia e a Wikipédia</article-title>
          .
          <source>Linguamatica</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>57</lpage>
          (
          <year>July 2013</year>
          ), http://www.linguamatica.com/index.php/linguamatica/article/view/157
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Freebase: a collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In: Proceedings of the 2008 ACM SIGMOD international conference on Management of data</source>
          . pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          . SIGMOD '08, ACM, New York, NY, USA (
          <year>2008</year>
          ), http://doi.acm.org/10.1145/1376616.1376746
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bunescu</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mooney</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Subsequence kernels for relation extraction</article-title>
          .
          <source>In: NIPS</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cabrio</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cojan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmero Aprosio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>QAKiS: an open domain QA system based on relational patterns</article-title>
          . In:
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huynh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.)
          <source>International Semantic Web Conference (Posters &amp; Demos)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>914</volume>
          .
          CEUR-WS.org (
          <year>2012</year>
          ), http://dblp.uni-trier.de/db/conf/semweb/iswc2012p.html#CabrioCAMLG12
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cristianini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>An Introduction to Support Vector Machines and Other Kernel-based Learning Methods</article-title>
          . Cambridge University Press (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Exner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nugues</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Entity extraction: From unstructured text to DBpedia RDF triples</article-title>
          .
          <source>In: Proceedings of the Web of Linked Entities Workshop in conjuction with the 11th International Semantic Web Conference</source>
          . pp.
          <fpage>58</fpage>
          -
          <lpage>69</lpage>
          . Boston (
          <year>2012</year>
          ), http://ceur-ws.org/Vol-906/paper7.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Relation extraction and the influence of automatic named-entity recognition</article-title>
          .
          <source>ACM Transactions on Speech and Language Processing</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ), 2:
          <fpage>1</fpage>
          -2:
          <lpage>26</lpage>
          (Dec
          <year>2007</year>
          ), http://doi.acm.org/10.1145/1322391.1322393
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Go</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhayani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Twitter sentiment classification using distant supervision</article-title>
          . Processing pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          (
          <year>2009</year>
          ), http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uszkoreit</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Large-scale learning of relation-extraction rules with distant supervision from the web</article-title>
          . In:
          <string-name>
            <surname>Cudré-Mauroux</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tudorache</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hauswirth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreiber</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web - ISWC 2012, Lecture Notes in Computer Science</source>
          , vol.
          <volume>7649</volume>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>278</lpage>
          . Springer Berlin Heidelberg (
          <year>2012</year>
          ), http://dx.doi.org/10.1007/978-3-642-35176-1_17
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Böhm</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Extracting structured information from Wikipedia articles to populate infoboxes</article-title>
          .
          <source>In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <fpage>1661</fpage>
          -
          <lpage>1664</lpage>
          . CIKM '10, ACM, New York, NY, USA (
          <year>2010</year>
          ), http://doi.acm.org/10.1145/1871437.1871698
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Califf</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitag</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kushmerick</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ireson</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Evaluation of machine learning-based information extraction algorithms: criticisms and recommendations</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>42</volume>
          (
          <issue>4</issue>
          ),
          <fpage>361</fpage>
          -
          <lpage>393</lpage>
          (
          <year>2008</year>
          ), http://dx.doi.org/10.1007/s10579-008-9079-3
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          :
          <article-title>DeFacto - Deep fact validation</article-title>
          .
          <source>In: International Semantic Web Conference (1)</source>
          . pp.
          <fpage>312</fpage>
          -
          <lpage>327</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mintz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bills</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          .
          <source>In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2</source>
          . pp.
          <fpage>1003</fpage>
          -
          <lpage>1011</lpage>
          . ACL '09, Association for Computational Linguistics, Stroudsburg, PA, USA (
          <year>2009</year>
          ), http://dl.acm.org/citation.cfm?id=1690219.1690287
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>End-to-end relation extraction using distant supervision from external semantic repositories</article-title>
          .
          <source>In: ACL (Short Papers)</source>
          . pp.
          <fpage>277</fpage>
          -
          <lpage>282</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Palmero Aprosio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic expansion of DBpedia exploiting Wikipedia cross-language information</article-title>
          .
          <source>In: Proceedings of the 10th Extended Semantic Web Conference</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Palmero Aprosio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic Mapping of Wikipedia Templates for Fast Deployment of Localised DBpedia Datasets</article-title>
          .
          <source>In: Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Palmero Aprosio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards an Automatic Creation of Localized Versions of DBpedia</article-title>
          .
          <source>In: Proceedings of the 12th International Semantic Web Conference</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Modeling relations and their mentions without labeled text</article-title>
          .
          <source>In: Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III</source>
          . pp.
          <fpage>148</fpage>
          -
          <lpage>163</lpage>
          . ECML PKDD'10, Springer-Verlag, Berlin, Heidelberg (
          <year>2010</year>
          ), http://dl.acm.org/citation.cfm?id=1889788.1889799
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klakow</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A survey of noise reduction methods for distant supervision</article-title>
          .
          <source>In: Automated Knowledge Base Construction 2013. Proceedings of the 3rd Workshop on Knowledge Extraction at CIKM 2013</source>
          . California, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Shawe-Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cristianini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Kernel Methods for Pattern Analysis</article-title>
          . Cambridge University Press (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In: Proceedings of the 16th international conference on World Wide Web</source>
          . pp.
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          . WWW '07, ACM, New York, NY, USA (
          <year>2007</year>
          ), http://doi.acm.org/10.1145/1242572.1242667
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sultana</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>Q.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biswas</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Infobox suggestion for Wikipedia entities</article-title>
          .
          <source>In: Proceedings of the 21st ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>2307</fpage>
          -
          <lpage>2310</lpage>
          . CIKM '12, ACM, New York, NY, USA (
          <year>2012</year>
          ), http://doi.acm.org/10.1145/2396761.2398627
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McClosky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>A.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spitkovsky</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>A simple distant supervision approach for the TAC-KBP slot filling task</article-title>
          .
          <source>In: Proceedings of the Third Text Analysis Conference (TAC 2010)</source>
          . Gaithersburg, Maryland, USA (November
          <year>2010</year>
          ), pubs/kbp2010-slotfilling.pdf
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>An overview of statistical learning theory</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          <volume>10</volume>
          (
          <issue>5</issue>
          ),
          <fpage>988</fpage>
          -
          <lpage>999</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a new platform for collaborative data collection</article-title>
          .
          <source>In: Proceedings of the 21st international conference companion on World Wide Web</source>
          . pp.
          <fpage>1063</fpage>
          -
          <lpage>1064</lpage>
          . WWW '12 Companion, ACM, New York, NY, USA (
          <year>2012</year>
          ), http://doi.acm.org/10.1145/2187980.2188242
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>