<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Type Prediction for Entities in DBpedia by Aggregating Multilingual Resources</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thi-Nhu Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hideaki Takeda</string-name>
          <email>takeda@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Khai Nguyen</string-name>
          <email>nhkhai@fit.hcmus.edu.vn</email>
          <email>nhkhai@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryutaro Ichise</string-name>
          <email>ichise@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tuan-Dung Cao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Haiphong University</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hanoi University of Science and Technology</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Institute of Informatics</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCMC</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Entity types are a very important kind of information in DBpedia. Because this information is described inconsistently across languages, it is difficult to recognize the most suitable type of an entity. We propose a method that predicts the entity type based on a novel conformity measure, which combines the specificity of a type with majority voting. The experimental results show that our method suggests informative types and outperforms the baselines.</p>
      </abstract>
      <kwd-group>
        <kwd>DBpedia</kwd>
        <kwd>Ontology</kwd>
        <kwd>Mappings</kwd>
        <kwd>Conformity</kwd>
        <kwd>Consistency</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>DBpedia is built upon a community effort to extract knowledge from Wikipedia
[1]. Contributors from many countries have joined the DBpedia mapping
project, whose goal is to map Wikipedia templates to the types (e.g.,
Species, Person, and Place) of the DBpedia ontology [2]. Despite the maturity of
the DBpedia community, the lack of consensus among contributors working on
different languages remains an issue.</p>
      <p>In DBpedia, a real-world entity is represented by multiple instances. Each instance
is described in a specific language, and its type is determined by the mappings constructed
for that language. Because the mappings are created manually and separately for each language,
the types of instances can differ even when those instances describe the
same entity. Concretely, for a given entity, the assigned types may agree but differ in
specificity, or some of them may simply be incorrect. For example, the entity Barack Obama is
recognized as Person, Politician, President, Artist, and Book across 29
languages. Here, Person, Politician, and President agree with one another
but differ in their level of specificity, whereas Artist and Book
are incorrect. In this situation, choosing the most suitable type of an entity is
necessary to guarantee the consolidation of DBpedia, but it is a difficult task.</p>
      <p>According to a preliminary analysis, the agreement of type assignment among
different languages is low, even when only two particular languages are compared. Table 1
shows the percentage of instances sharing the same type for 10 language pairs, namely
those with the largest number of instances typed in both languages among all
476 language pairs. Overall, only 37% of the pairs have more than 50% of their instances
assigned the same type.</p>
      <p>Entity type prediction has recently come to be regarded as an important problem. It is helpful
for DBpedia language versions whose mapping community is immature. In
addition, it is the core of automatic mapping creation [3].</p>
      <p>Two simple approaches to type prediction are majority voting and most specific ancestor.
The disadvantage of majority voting is that the suggested type is often not specific enough
for the entity. The most specific ancestor returns even more general types.</p>
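      <p>The two simple approaches can be illustrated with a minimal sketch. It assumes that "most specific ancestor" means the deepest type that is an ancestor of (or equal to) every assigned type, and that the ontology has a single root. The hierarchy fragment <monospace>PARENT</monospace>, the function names, and the vote counts are hypothetical and serve only as an illustration.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical child -> parent fragment of a type hierarchy (illustration only).
PARENT = {
    "President": "Politician",
    "Politician": "Person",
    "Artist": "Person",
    "Person": "Agent",
    "Agent": "owl:Thing",
    "Book": "WrittenWork",
    "WrittenWork": "Work",
    "Work": "owl:Thing",
}


def ancestors(t, parent):
    """Return the chain [t, parent(t), ..., root]."""
    chain = []
    while t is not None:
        chain.append(t)
        t = parent.get(t)
    return chain


def majority_vote(assigned):
    """Return the type assigned by the largest number of languages."""
    return Counter(assigned).most_common(1)[0][0]


def most_specific_ancestor(assigned, parent):
    """Return the deepest type shared by the ancestor chains of all assigned types.

    Assumes a single root, so the intersection is never empty.
    """
    common = set(ancestors(assigned[0], parent))
    for t in assigned[1:]:
        common &= set(ancestors(t, parent))
    # The deepest common type has the longest chain up to the root.
    return max(common, key=lambda t: len(ancestors(t, parent)))


# Made-up assignments, one per language (not the counts behind the paper's example).
votes = ["Person"] * 4 + ["Politician"] * 3 + ["President"] * 2 + ["Artist"]
print(majority_vote(votes))                              # Person
print(most_specific_ancestor(votes, PARENT))             # Person
print(most_specific_ancestor(votes + ["Book"], PARENT))  # owl:Thing
```
]]></preformat>
      <p>In this toy example, a single incorrect assignment (Book) is enough to push the most specific ancestor up to the root, while majority voting stays at the general type Person.</p>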
      <p>In this paper, we propose a new method to predict the most suitable type of an
entity. Our method improves on majority voting; in particular, we focus on how
to retrieve more specific types.</p>
    </sec>
    <sec id="sec-2">
      <title>Type suggestion</title>
      <p>
        In this section, we describe how to predict the most suitable type for an entity. Our
idea is to combine the specificity of a type with majority voting. The
input is the set of most specific types assigned to the entity by the different languages. We
define the conformity <italic>C</italic>(<italic>t</italic>) of a most specific type <italic>t</italic> recursively, as
the sum of the frequency of <italic>t</italic> and the conformity of its parent:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math><![CDATA[C(t) = \mathrm{freq}(t) + C(\mathrm{parent}(t))]]></tex-math>
        </disp-formula>
      </p>
      <p>Here, the frequency of a type <italic>t</italic>, freq(<italic>t</italic>), is the number of languages that treat the entity as <italic>t</italic>. For an
entity, we select as the most suitable type the one with the highest
conformity. The chosen type is therefore both widely used across
languages and sufficiently specific. If several types share the
highest conformity, we rank them by the conformity of their parent types.
Consider the example in Fig. 1, in which the entity Barack Obama is
assigned 6 types across 29 languages. The type President has the
highest conformity, 18, and is therefore selected as the prediction result.</p>
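      <p>The selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the recursion is taken to stop at the root (a missing parent contributes 0), frequencies count only the directly assigned types, and the hierarchy fragment <monospace>PARENT</monospace>, the function names, and the vote counts are hypothetical rather than taken from Fig. 1.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical child -> parent fragment of a type hierarchy (illustration only).
PARENT = {
    "President": "Politician",
    "Politician": "Person",
    "Artist": "Person",
    "Person": "Agent",
    "Agent": "owl:Thing",
}


def conformity(t, freq, parent):
    """Conformity of t: its own frequency plus the conformity of its parent.

    Assumption: the recursion ends at the root, where no parent exists.
    Passing t=None returns 0.
    """
    total = 0
    while t is not None:
        total += freq.get(t, 0)
        t = parent.get(t)
    return total


def predict_type(assigned, parent):
    """Pick the assigned type with the highest conformity.

    Ties are broken by the conformity of the parent type.
    """
    freq = Counter(assigned)

    def rank(t):
        return (conformity(t, freq, parent),
                conformity(parent.get(t), freq, parent))

    return max(freq, key=rank)


# Made-up assignments, one per language (not the counts from Fig. 1).
votes = ["Person"] * 4 + ["Politician"] * 3 + ["President"] * 2 + ["Artist"]
freq = Counter(votes)
print(conformity("Person", freq, PARENT))      # 4
print(conformity("Politician", freq, PARENT))  # 7
print(conformity("President", freq, PARENT))   # 9
print(predict_type(votes, PARENT))             # President
```
]]></preformat>
      <p>With these toy counts, majority voting would return Person, whereas the conformity-based choice is President, because the votes for Person and Politician also support their descendant.</p>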
    </sec>
    <sec id="sec-3">
      <title>Experiment and evaluation</title>
      <p>
        We evaluate our method against a manually crafted dataset and compare it with two baselines: (1)
majority voting and (2) most specific ancestor. We build an entity database from all
available language versions of DBpedia, in which an entity is the compilation of the instances
interconnected via owl:sameAs links. The difficulty of type suggestion lies in the
diversity of types. Therefore, we select entities with high inconsistency.
Concretely, we first randomly select 500 entities whose type is available in at least 5
languages, and then pick the 100 most inconsistent ones. Here, the inconsistency
is estimated by the entropy of the type frequencies. Unlike the conformity in Eq.
1, transitive types, i.e., the set of ancestor types, are also counted in order to preserve the hierarchical relations.
After the selection, an expert is
asked to assign the most suitable type among those available for each entity (3). Finally, we
compare the results of our method, (1), and (2) against (3).
      </p>
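      <p>One possible reading of this inconsistency estimate is sketched below. It assumes Shannon entropy (base 2) over the normalized counts of all assigned types together with their ancestors. The exact normalization and logarithm base are not stated in the text, so this is an assumption. The hierarchy fragment, entity identifiers, and function names are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter

# Hypothetical child -> parent fragment of a type hierarchy (illustration only).
PARENT = {"City": "Settlement", "Village": "Settlement", "Settlement": "Place"}


def inconsistency(assigned, parent):
    """Entropy of type counts, where each assignment also counts its ancestors."""
    counts = Counter()
    for t in assigned:
        while t is not None:
            counts[t] += 1
            t = parent.get(t)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def most_inconsistent(entities, parent, k):
    """Return the ids of the k entities with the highest inconsistency.

    entities maps an entity id to the list of types assigned by each language.
    """
    ranked = sorted(entities,
                    key=lambda e: inconsistency(entities[e], parent),
                    reverse=True)
    return ranked[:k]


# Made-up entities with one assigned type per language.
entities = {
    "e1": ["City", "City", "City"],
    "e2": ["City", "Village", "Settlement"],
}
print(most_inconsistent(entities, PARENT, 1))  # ['e2']
```
]]></preformat>
      <p>In the experiment described above, the same ranking would be applied to the 500 sampled entities with k = 100.</p>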
      <p>Table 2 indicates that our method gives the best result. Because entities of high
inconsistency are selected, the most specific ancestor method always chooses
owl:Thing, which is the root of the DBpedia ontology. Majority voting is
better than the most specific ancestor but, in general, its result is not specific enough. This
experiment demonstrates that our prediction method performs well, yet 45% of the predicted
types still differ from the human judgment. Most of these belong to place
entities, because the definitions of administrative regions and
residential areas differ among countries, and the DBpedia ontology currently lacks types to represent all
these differences. For example, Voultegon is a commune in France, but there is no
type for commune; this entity should therefore be mapped to the Settlement type.
However, our method returns the inaccurate type City, because this type is more
specific than Settlement.</p>
    </sec>
    <sec id="sec-4">
      <title>The demo</title>
      <p>We built a tool, named MLDQ, to visualize entity types. A user can enter keywords in any
language, or a URI, to query an entity. The database contains 86,290,758 entities,
which are constructed from 128,866,644 instances across all languages. We use Lucene
to index the entities by all labels (i.e., rdfs:label) provided in all
languages. The tool hierarchically visualizes the types in
different languages, the types suggested by our method and by the baselines, and some
general information about the entity (e.g., the entropy of inconsistency).</p>
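      <p>Compiling entities from instances interconnected via owl:sameAs links amounts to finding connected components. The following is a minimal sketch using a union-find structure. It is not necessarily how the database was built, and the instance identifiers and the function name are hypothetical.</p>
      <preformat><![CDATA[
```python
def group_instances(same_as_links):
    """Group instance ids into entities, given (a, b) pairs linked by owl:sameAs."""
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# Made-up instance ids in the form "language:title".
links = [
    ("en:Barack_Obama", "ja:Barack_Obama"),
    ("ja:Barack_Obama", "vi:Barack_Obama"),
    ("en:Tokyo", "ja:Tokyo"),
]
entities = group_instances(links)
print(sorted(len(e) for e in entities))  # [2, 3]
```
]]></preformat>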
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>In this work, we proposed a new method that combines the specificity of a type
with majority voting to suggest the most suitable type of an entity.
Three methods were evaluated, and the results show that our method is the most
promising one, although some weaknesses remain. For future work, we will
evaluate our method with deeper analyses, including comparisons with more baselines.
We also aim to improve our method by considering the conformity of transitive types
in order to give more accurate predictions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Isele</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; et al.:
          <article-title>DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia</article-title>
          .
          <source>Semantic Web - Interoperability, Usability, Applicability</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A Multilingual Cross-Domain Knowledge Base</article-title>
          . In:
          <source>Proceedings of the Eighth International Conference on Language Resources and Evaluation</source>
          , pp.
          <fpage>1813</fpage>
          -
          <lpage>1817</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Palmero</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Giuliano</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lavelli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automatic Mapping of Wikipedia Templates for Fast Deployment of Localised DBpedia datasets</article-title>
          . In:
          <source>Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies</source>
          , pp.
          <fpage>1:1</fpage>
          -
          <lpage>1:8</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>