<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Entity-Relationship Extraction from Wikipedia Unstructured Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Radityo Eko Prasojo</string-name>
          <email>rprasojo@unibz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KRDB Research Centre, Free University of Bozen Bolzano</institution>
          ,
          <addr-line>BZ 39100</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Wikipedia has been the primary source of information for many automatically-generated Semantic Web data sources. However, these sources suffer from incompleteness, since they largely do not cover the information contained in the unstructured text of Wikipedia. Our goal is to extract structured entity-relationships in RDF from such unstructured text, ultimately using them to enrich existing data sources. Our extraction technique aims to be topic-independent, leveraging the grammatical dependencies of sentences and semantic refinement. Preliminary evaluations of the proposed approach have shown some promising results.</p>
      </abstract>
      <kwd-group>
        <kwd>Relation extraction</kwd>
        <kwd>knowledge base generation</kwd>
        <kwd>Wikipedia</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Problem Statement</title>
      <p>
        The Semantic Web envisions a web of data that can be easily processed by
machines. In contrast, only a small portion of the information available on the Web
is in a machine-readable format. For this reason, Semantic Web data sources
suffer from incompleteness, since they do not cover unstructured information,
which represents the major part of the Web. Typical examples are knowledge
bases such as YAGO [27] and DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], extracted from Wikipedia. These
knowledge bases exploit the infoboxes of Wikipedia articles. Hence, they can answer
questions like the birth date or the political party of Barack Obama. However,
they are unable to give further information like Barack Obama's favorite sports
team, because it is present only in the unstructured part of the Wikipedia article.
This problem calls for the development of effective strategies that transform
unstructured Web content into a machine-readable format. Consequently, we
can cover more information and facilitate automatic data processing.
      </p>
      <p>In this work, we consider Wikipedia as our primary source of information
because of its coverage and cleanliness [12]. Being the most popular Internet
encyclopedia, Wikipedia hosts articles related to all kinds of topics. As mentioned
earlier, each article typically also has an infobox, which contains a selected set of
information related to the article. Most RDF data sources nowadays contain
information that is extracted only from the infobox, where an accurate extraction
is guaranteed thanks to its semi-structured nature. Our goal is to go further and
extract RDF triples from the unstructured part of Wikipedia articles. For
example, the Wikipedia article of Barack Obama mentions that "Obama
is a supporter of the Chicago White Sox". This information can be represented
as the following triple: BarackObama supporterOf ChicagoWhiteSox. Such a
triple can then be added to any existing data source, making it more complete.
The extraction of such entity-relationship information from all Wikipedia articles
into RDF triples has the ultimate goal of enriching existing data sources.</p>
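      <p>As a concrete illustration (a sketch of the idea, not the actual extraction pipeline), a data source can be viewed as a set of subject-predicate-object triples to which newly extracted facts are added; the identifiers below are hypothetical:</p>
      <preformat>
```python
# A minimal sketch, assuming a knowledge base modeled as a set of
# (subject, predicate, object) triples. Identifiers are hypothetical.
kb = {
    ("BarackObama", "birthDate", "1961-08-04"),    # typically from the infobox
    ("BarackObama", "party", "DemocraticParty"),   # typically from the infobox
}

# Triple extracted from the unstructured text of the article:
extracted = ("BarackObama", "supporterOf", "ChicagoWhiteSox")
kb.add(extracted)

print(len(kb))  # 3
```
      </preformat>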
    </sec>
    <sec id="sec-2">
      <title>2 Relevancy</title>
      <p>In the Semantic Web, introducing more high-quality RDF data in an efficient
manner is an important issue. Many online applications, such as search engines, social
media, or news websites, have started to utilize information from the Semantic
Web. For example, a news article about Barack Obama can be enriched
with information that is already known outside the news. If this
information is stored in RDF, as opposed to in natural language, it can be retrieved and
processed more quickly and effectively.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Challenges and Related Work</title>
      <p>The main challenge in this work is that there are two different problems that
need to be dealt with simultaneously. The first is the relation extraction (RE)
problem, that is, given a text containing entities, we syntactically extract
relations between them. The second is the knowledge representation (KR) problem,
that is, the extracted relations should always follow a well-defined schema,
semantics, or ontology. This is an important issue because we want to combine
all the extracted relations to enrich existing data sources. Without handling
the knowledge representation properly, one would just extract all possible
relations without considering whether, for example, two relations are equivalent and
should be merged, or whether additional semantic details can be mined
from the sentence in order to correctly represent a more complex fact.</p>
      <p>To illustrate the challenge, consider the previous example about Obama's
favorite sports team. In the same article, it is also mentioned that "in his childhood
and adolescence was a fan of the Pittsburgh Steelers". From this sentence, let us
extract the following triple: BarackObama fanOf PittsburghSteelers. From
the extraction point of view, the result is already correct. However, from the KR
point of view it is not good enough. First, the predicates supporterOf
and fanOf are equivalent in this context and should be merged. Second, since
Obama was a fan of the Pittsburgh Steelers in his childhood, this suggests that the
fact happened in the past, and therefore adding time information to the
representation is necessary to differentiate it from the first example. Now, the next
challenge is how to represent such a complex fact. One solution is to append
time information to the predicate, resulting in wasFanOf, and then to
specify the sub-predicate relation between wasFanOf and fanOf. Another solution
is to keep using fanOf and leverage RDF reification to append
time information as a separate triple. There may be other possible solutions, and
deciding which one is the best is most of the time difficult.</p>
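      <p>The two candidate solutions above can be written out as triples. The following sketch is purely illustrative; identifiers such as subPredicateOf, validDuring, stmt1, and ObamaChildhood are hypothetical, while rdf:Statement, rdf:subject, rdf:predicate, and rdf:object come from the standard RDF reification vocabulary:</p>
      <preformat>
```python
# Option 1: fold the tense into the predicate and relate the two predicates.
option1 = [
    ("BarackObama", "wasFanOf", "PittsburghSteelers"),
    ("wasFanOf", "subPredicateOf", "fanOf"),        # hypothetical sub-predicate link
]

# Option 2: keep fanOf and use RDF reification to attach time information
# to the statement itself as separate triples.
option2 = [
    ("stmt1", "rdf:type", "rdf:Statement"),
    ("stmt1", "rdf:subject", "BarackObama"),
    ("stmt1", "rdf:predicate", "fanOf"),
    ("stmt1", "rdf:object", "PittsburghSteelers"),
    ("stmt1", "validDuring", "ObamaChildhood"),     # hypothetical time qualifier
]

print(len(option1), len(option2))  # 2 5
```
      </preformat>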
      <p>
        Because of the above challenges, extracting relations from Wikipedia's
unstructured text has been overlooked for data-source generation purposes.
Automatically-generated data sources like YAGO [27] and DBpedia [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] extract
relations only from infoboxes, which provide two advantages. First, they are
semi-structured, ensuring accurate extraction. Second, they provide a schema for
ontology building, ensuring a semantically well-defined data source. In YAGO's case,
the ontology is enhanced by exploiting Wikipedia category pages and WordNet.
      </p>
      <p>
        On the other hand, previous work that focused on relation extraction from
unstructured text did not concern itself much with the representation issue, because
it did not have the goal of building or enriching Semantic Web data sources.
Typically, these approaches use some pre-defined schema taken from infoboxes. As
such, they can find relations between entities only if the relations
are also present in infoboxes. Various techniques have been used to achieve this
goal, including grammatical dependency exploitation [13] and anaphoric
coreference [20]. Some other work relied on existing data sources to find relations
in the text [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] [17] [21]. Because of this restriction, they cannot discover
new relations that are not previously known to the infoboxes or the data
sources. Beyond the Wikipedia domain, there are also efforts with similar
objectives [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [19] [26] [29].
      </p>
      <p>
        Some work has tried to deal with the knowledge representation issue to some
extent. Yan et al. [30] tried to detect equivalent relations by leveraging
surface patterns. However, they did not deal with complex fact representation. The
Never Ending Language Learning (NELL) project [18] is an ongoing effort that aims
to build a knowledge base to understand human language on the Web. Part of
this effort is storing relations between entities, which is done not only over Wikipedia
but also over other websites. However, a final result in the form of a
structured RDF data source containing entity relationships has not been finished yet.
Similarly, FRED [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a machine reader that transforms free natural language
text into RDF, but it does not aim to produce a well-constructed KB as a
result. On the other hand, Google Knowledge Vault [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] succeeded in
automatically building a KB from free text. However, it relies on distant supervision from
a pre-existing KB, so it still cannot find new relations. Because representing
relations of topic-independent articles is difficult, some other work focused only on
a specific type of articles, like occupations of people [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or events [14] [22]. Some
other approaches focused on a specific type of relations instead, for example
taxonomic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], hyponymy [28], or numerical relations [15].
      </p>
      <p>
        From the knowledge representation literature (independent of information extraction),
we consider work on expressing temporal information [24], epistemic
modality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and complex facts in general [23] to be relevant.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Research Questions</title>
      <p>Summarizing the challenges explained in Section 3, we define our research
questions as follows:
1. How do we extract relations between entities from the unstructured text of
Wikipedia articles?
2. How should we represent the relations? More specifically, how should we
represent complex facts?
3. How should we deal with the extraction problem and the representation
problem? Is there a way to structure the two problems together in a good way?
In the next section, we explain our hypotheses and proposed approaches to answer
the above research questions.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Hypothesis and Proposed Approaches</title>
      <p>To syntactically extract relations, we plan to leverage grammatical dependency
parsing of sentences. We hypothesize that it should be effective for detecting any
kind of relation between entities in a text, because a relation is always in the form
of subject, predicate, and object (i.e., a triple). Figure 1 shows an example of a
grammatically annotated sentence using Stanford CoreNLP [16]. In this example, the
aim is to extract the triple Obama supporterOf ChicagoWhiteSox. To do this,
first we look at the entity occurrences, which are Barack Obama and Chicago
White Sox. Then, we check how the relation can be extracted by looking at the
dependencies. We observe that both entities are connected via an nsubj
dependency and an nmod:of dependency that share a head at the word "supporter".
By leveraging these two dependencies, we can correctly extract the relation.</p>
      <p>We then observe that this extraction can be applied to a more general case,
forming an extraction rule r1, which goes as follows: for any sentence that contains
a form of [s &lt;-nsubj- h -nmod:prep-&gt; o], where s is an entity subject, prep is some
preposition, o is an entity object, and h is the common head word, we can
extract from the sentence a relation &lt; s concat(h, prep) o &gt;, where concat is the
string concatenation function. Relations that can be extracted by this rule
include, for example, the predicates livedIn, marriedTo, etc. We hypothesize
that this extraction rule should be effective, that is, it can extract simple relations
from simple sentences with high precision. We identify simple relations by the
fact that the relation can be represented using only one triple and the
predicate can be represented correctly by a simple concatenation of words, while
simple sentences are those that have a simple grammatical dependency
structure, as defined by the extraction rule.</p>
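      <p>Rule r1 can be sketched as follows over a toy dependency parse. Representing the parse as (head, relation, dependent) tuples and rendering concat in camel case are our own illustrative assumptions; in practice the arcs would come from a dependency parser:</p>
      <preformat>
```python
# A sketch of extraction rule r1, assuming a parse given as
# (head, relation, dependent) arcs and a set of known entity mentions.
def apply_r1(arcs, entities):
    """Extract (s, concat(h, prep), o) from the pattern s -nsubj- h -nmod:prep- o."""
    triples = []
    for head, rel, subj in arcs:
        if rel != "nsubj" or subj not in entities:
            continue
        for head2, rel2, obj in arcs:
            if head2 == head and rel2.startswith("nmod:") and obj in entities:
                prep = rel2.split(":", 1)[1]
                # concat(h, prep), rendered in camel case: supporter + Of
                triples.append((subj, head + prep.capitalize(), obj))
    return triples

arcs = [
    ("supporter", "nsubj", "Obama"),
    ("supporter", "cop", "is"),
    ("supporter", "nmod:of", "ChicagoWhiteSox"),
]
print(apply_r1(arcs, {"Obama", "ChicagoWhiteSox"}))
# [('Obama', 'supporterOf', 'ChicagoWhiteSox')]
```
      </preformat>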
      <p>We further make two hypotheses. First, there should exist simple extraction
rules other than r1 that leverage other kinds of grammatical dependencies. We
define R as the set of all such simple extraction rules. Second, there should exist
other kinds of sentences such that, if we apply only grammatical parsing,
the resulting relation is not good enough because some details are missing,
so that a more sophisticated representation of the relation is needed. Recall the
example in Section 3 about the Pittsburgh Steelers being Obama's favorite football
team during his childhood. In order to understand that "his childhood" refers to
a time in the past, syntactical parsing is not enough. A semantic refinement
is necessary. From our two hypotheses, we now understand that there are two
factors that determine the complexity of the problem. First, the complexity
of syntactical extraction, which is based on the complexity of the grammatical
dependency. Second, the complexity of knowledge representation, which is based
on the necessity of representing complex facts. Based on this observation, we
define four difficulty classes of the problem, shown in Figure 2. (In Wikipedia,
entity annotations are typically given; in the case of missing annotations, we will
first do a preprocessing step before extracting the relations.)</p>
      <p>[Figure 2: the four difficulty classes (I-IV) of the problem, plotted by extraction complexity (x-axis) and representation complexity (y-axis).]</p>
      <p>Class I contains sentences to which every extraction rule in R can be
applied, which means that all correct relations can be extracted using simple
grammatical extraction and simple representation. Class II contains sentences
that may require a more complex grammatical dependency parsing that is not
covered by R, but still have a simple representation. Figure 3 shows an example
of a sentence in this class. One can observe that its grammatical dependency is
much more complicated than the one shown in Figure 1.</p>
      <p>Classes III and IV both contain sentences from which the extracted relations
require a complex representation. For sentences in Class III, an extraction rule from
R can still be applied, but the result needs further refinement. Figure 4
shows the previous example, where the relation Obama fanOf PittsburghSteelers
can be extracted using extraction rule r1, but it is not precise because it does
not include time information. On the other hand, a sentence in Class IV would
require a possibly complex extraction rule that is not present in R.</p>
      <p>
        We plan to develop our extraction approach by going through the difficulty
classes, starting from the simplest one. First, we will handle Class I sentences.
This is done by adding more rules to the set R. Each rule should have a high
precision, which is set by a certain threshold. An extraction rule that does not
perform above the threshold should not be included in R. Then we proceed
with Class III, since it requires simple extraction like Class I, and apply semantic
refinement to find the missing details. We will investigate possible ways of
doing this, one of which is to use a lexical database like WordNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. At a later
stage, we will address both Class II and Class IV sentences, since they require
more sophisticated extraction. We assume that most sentences fall into either
Class I or Class III; therefore, leaving out the other two cases should keep the
precision high without penalizing the recall. Finally, we will do another round
of semantic refinement after we extract relations from all sentences, primarily to
detect relations that should be merged.
      </p>
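      <p>The rule-admission step described above can be sketched as follows. The threshold value, rule names, and precision scores are assumptions for illustration only:</p>
      <preformat>
```python
# A minimal sketch: admit a candidate rule into R only if its measured
# precision on a labeled development set strictly exceeds a threshold.
THRESHOLD = 0.8  # assumed value; the choice of threshold is left open

def admit_rules(candidate_precisions, threshold=THRESHOLD):
    """Return the set R of rules whose dev-set precision exceeds the threshold."""
    admitted = set()
    for rule, p in candidate_precisions.items():
        # p strictly above threshold
        exceeds = (max(p, threshold) == p) and (p != threshold)
        if exceeds:
            admitted.add(rule)
    return admitted

candidates = {"r1_nsubj_nmod": 0.91, "r2_nsubjpass": 0.85, "r_noisy": 0.40}
R = admit_rules(candidates)
print(sorted(R))  # ['r1_nsubj_nmod', 'r2_nsubjpass']
```
      </preformat>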
    </sec>
    <sec id="sec-6">
      <title>6 Evaluation Plan</title>
      <p>We plan to do two kinds of evaluation. The first one is to evaluate the
performance of our extraction technique over a manually constructed ground truth.
We will separately evaluate our syntactical extraction and our semantic
refinement in order to correctly assess the strengths and weaknesses of our
approach. We also plan to compare our approach to existing related work as
baselines.</p>
      <p>The second one is to evaluate how good our extraction technique is at finding
new relations that are not previously present in a data source. We will use
YAGO and DBpedia as the data sources for evaluation. To do this, we need to
map our extracted relations to the respective data source. For
this purpose, we plan to leverage available schemas such as schema.org.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Preliminary Results</title>
      <p>We have evaluated a part of our proposed approach over a small dataset
containing 25 Wikipedia articles about famous people. Each article was first
preprocessed by cleaning the noisy Wikipedia annotations and completing missing
entity annotations using entity resolution techniques [25]. Then, we applied the
extraction rules for Class I sentences only. We included four extraction rules
in R, one of them being r1. The other three rules are as follows: (1) one that
handles passive sentences similarly to r1 using the nsubjpass dependency, (2) one
that handles the direct object relation using the dobj dependency, and (3) one that also
handles an object relation by using the xcomp dependency instead. By detecting
multiple entity occurrences in a single sentence, we observed that, of a total of 9646
sentences, 4259 contain relations between entities. Among them, 1048 (24.6%)
fall into the Class I category. This is a quite significant amount, given that we can
still add more rules to R. So our first observation is in line with our assumption
that most sentences fall into either Class I or Class III.</p>
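      <p>As a quick cross-check of the reported figure:</p>
      <preformat>
```python
# 1048 of the 4259 relation-bearing sentences fall into Class I,
# i.e. roughly 24.6% (counts as reported above).
total_with_relations = 4259
class_i = 1048
share = round(100.0 * class_i / total_with_relations, 1)
print(share)  # 24.6
```
      </preformat>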
      <p>We have also evaluated the precision and recall of our Class I extraction over
the "Personal Life", "Early Life", "Life", and "Legacy" sections of each
article. We were able to extract 205 relations in total. The result is shown in Table 1.
The precision and recall are shown in two ways: the first is the normal
precision and recall, where the extraction is done solely by leveraging the grammatical
dependency parser. The SR precision and recall, on the other hand, show the
performance of hypothetically doing a 100% accurate semantic refinement on
top of the grammatical parsing result. One can observe that we have a
promising early result in terms of precision, especially if later we can come up with an
effective semantic refinement. However, we still need to improve the recall. We
observed two main problems. First, our preprocessing should be improved, as
there is still noise that hinders our extraction process. Second, we need to add
more rules to R, as the four rules that we currently have are not enough.</p>
    </sec>
    <sec id="sec-8">
      <title>8 Reflections</title>
      <p>We believe that this research work will lead to fruitful results. We have structured
our work around the four difficulty classes, which enables us to focus on the
simplest cases first and later to extend to the more difficult ones. Also, our
preliminary experiments have shown some promising results.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>The author would like to thank Mouna Kacimi and Werner Nutt for their support
and guidance as supervisors. The author would also like to thank Markus Zanker,
Fariz Darari, and Simon Razniewski for their feedback.</p>
      <p>12. B. Han, P. Cook, and T. Baldwin. Lexical normalization for social media text.
ACM Trans. Intell. Syst. Technol., 4(1):5:1-5:27, Feb. 2013.
13. A. Herbelot and A. Copestake. Acquiring ontological relationships from Wikipedia
using RMRS. In ISWC 2006 Workshop on Web Content, 2006.
14. E. Kuzey and G. Weikum. EVIN: building a knowledge base of events. In Proceedings
of the 23rd WWW Conference, pages 103-106. WWW Steering Committee, 2014.
15. A. Madaan, A. Mittal, G. Ramakrishnan, and S. Sarawagi. Numerical relation
extraction with minimal supervision. In Proceedings of the 30th AAAI, 2016.
16. C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky.
The Stanford CoreNLP natural language processing toolkit. In ACL (System
Demonstrations), pages 55-60, 2014.
17. M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation
extraction without labeled data. In Proceedings of the JC of the 47th ACL and the
4th IJCNLP: Vol. 2, pages 1003-1011. ACL, 2009.
18. T. M. Mitchell, W. W. Cohen, E. R. Hruschka Jr., P. P. Talukdar, J. Betteridge,
A. Carlson, B. D. Mishra, M. Gardner, B. Kisiel, J. Krishnamurthy, N. Lao, K. Mazaitis,
T. Mohamed, N. Nakashole, E. A. Platanios, A. Ritter, M. Samadi, B. Settles, R. C.
Wang, D. T. Wijaya, A. Gupta, X. Chen, A. Saparov, M. Greaves, and J. Welling.
Never-ending learning. In Proceedings of the 29th AAAI, pages 2302-2310, 2015.
19. N. Nakashole, G. Weikum, and F. Suchanek. PATTY: a taxonomy of relational
patterns with semantic types. In Proceedings of the 2012 JC on EMNLP and
CoNLL, pages 1135-1145. Association for Computational Linguistics, 2012.
20. D. P. Nguyen, Y. Matsuo, and M. Ishizuka. Relation extraction from Wikipedia
using subtree mining. In Proceedings of the AAAI, page 1414. MIT Press, 2007.
21. T.-V. T. Nguyen and A. Moschitti. End-to-end relation extraction using distant
supervision from external semantic repositories. In Proceedings of the 49th ACL:
Human Language Technologies: short papers, Volume 2, pages 277-282. ACL, 2011.
22. M. Norrby and P. Nugues. Extraction of lethal events from Wikipedia and a
semantic repository. In Workshop on Semantic Resources and Semantic Annotation
for NLP and the Digital Humanities at NODALIDA 2015, 2015.
23. N. Noy and A. Rector. Defining n-ary relations on the Semantic Web. Technical
report, World Wide Web Consortium, Apr. 2006.
24. M. J. O'Connor and A. K. Das. A Method for Representing and Querying Temporal
Information in OWL. Springer Berlin Heidelberg, 2011.
25. R. E. Prasojo, M. Kacimi, and W. Nutt. Entity and aspect extraction for organizing
news comments. In Proceedings of the 24th CIKM, pages 233-242. ACM, 2015.
26. M. Schmitz, R. Bart, S. Soderland, O. Etzioni, et al. Open language learning for
information extraction. In Proceedings of the 2012 JC on EMNLP and CoNLL,
pages 523-534. ACL, 2012.
27. F. M. Suchanek, G. Kasneci, and G. Weikum. YAGO: A large ontology from
Wikipedia and WordNet. Journal of Web Semantics, 6(3):203-217, 2008.
28. B. Wei, J. Liu, J. Ma, Q. Zheng, W. Zhang, and B. Feng. Motif-based hyponym
relation extraction from Wikipedia hyperlinks. IEEE Transactions on Knowledge
and Data Engineering, 26(10):2507-2519, 2014.
29. Y. Xu, M.-Y. Kim, K. Quinn, R. Goebel, and D. Barbosa. Open information
extraction with tree kernels. In HLT-NAACL, pages 868-877, 2013.
30. Y. Yan, N. Okazaki, Y. Matsuo, Z. Yang, and M. Ishizuka. Unsupervised
relation extraction by mining Wikipedia texts using information from the Web. In
Proceedings of the JC of the 47th ACL and the 4th IJCNLP: Vol. 2, pages
1021-1029. ACL, 2009.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>L.</given-names>
            <surname>Alonso-Ovalle</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Menendez-Benito</surname>
          </string-name>
          .
          <article-title>Epistemic Indefinites: Exploring Modality Beyond the Verbal Domain</article-title>
          . Oxford University Press, USA,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Arnold</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>Automatic extraction of semantic relations from wikipedia</article-title>
          .
          <source>International Journal on Artificial Intelligence Tools</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadhead</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Open information extraction from the web</article-title>
          .
          <source>In IJCAI</source>
          , volume
          <volume>7</volume>
          , pages
          <fpage>2670</fpage>
          {
          <fpage>2676</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , G. Kobilarov,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>DBpedia - a crystallization point for the web of data</article-title>
          .
          <source>Web Semantics</source>
          ,
          <volume>7</volume>
          (
          <issue>3</issue>
          ):
          <volume>154</volume>
          {
          <fpage>165</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          , E. Gabrilovich, G. Heitz,
          <string-name>
            <given-names>W.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Knowledge vault: A web-scale approach to probabilistic knowledge fusion</article-title>
          .
          <source>In Proceedings of the 20th SIGKDD</source>
          , pages
          <volume>601</volume>
          {
          <fpage>610</fpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Exner</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nugues</surname>
          </string-name>
          .
          <article-title>Entity extraction: From unstructured text to DBpedia RDF triples</article-title>
          .
          <source>In WoLE 2012</source>
          , pages
          <fpage>58</fpage>
          {
          <fpage>69</fpage>
          .
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In Proceedings of the EMNLP</source>
          , pages
          <volume>1535</volume>
          {
          <fpage>1545</fpage>
          .
          <string-name>
            <surname>ACL</surname>
          </string-name>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          . WordNet. Wiley Online Library,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Firas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Simon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nugues</surname>
          </string-name>
          .
          <article-title>Extraction of career profiles from Wikipedia</article-title>
          .
          <source>In Proceedings of the 1st Conference on Biographical Data in a Digital World</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gangemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Presutti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Nuzzolese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Draicchio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongiov</surname>
          </string-name>
          .
          <article-title>Semantic web machine reading with FRED</article-title>
          .
          <source>SemWeb Journal</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gu</surname>
          </string-name>
          , W. Liu, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          .
          <article-title>Relation extraction from wikipedia leveraging intrinsic patterns</article-title>
          .
          <source>In 2015 IEEE/WIC/ACM WI-IAT</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>181</fpage>
          {
          <fpage>186</fpage>
          ,
          <string-name>
            <surname>Dec</surname>
          </string-name>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>