<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Updating Wikipedia via DBpedia Mappings and SPARQL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Albin Ahmeti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Javier D. Fernández</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel Polleres</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadim Savenkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Formalization of the OBDM Setting</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>DBpedia is a community effort that has created the most important cross-domain datasets in RDF, a focal point of the Linked Open Data (LOD) cloud. At its core there is a set of declarative mappings extracting data from Wikipedia infoboxes and tables into RDF. However, while DBpedia focuses on publishing knowledge in a machine-readable way, little attention has been paid to the benefits of supporting machine updates. This greatly restricts the possibilities of automatic curation of the DBpedia data that could be semi-automatically propagated to Wikipedia, and also prevents maintainers from evaluating the impact of their edits on the consistency of knowledge. Excluding the DBpedia taxonomy from the editing cycle is a major drawback which we aim to address. This paper starts a discussion of DBpedia, making a case for it as a benchmark for Ontology-Based Data Management (OBDM). As we show, although based on fairly restricted mappings (which we cast as a variant of nested tgds here) and a minimalistic TBox language, accommodating DBpedia updates is intricate from different perspectives, ranging from conceptual ones (what is an adequate semantics for DBpedia SPARQL updates?) to challenges related to user interface design.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Formalization of the OBDM Setting</title>
      <p>
        An extended version of this paper including additional details is available in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Fig. 1. Rules used for the TBox closure:
$A\ \mathsf{sc}\ B$: $A(x) \rightarrow B(x)$;
$P\ \mathsf{sp}\ Q$: $P(x, y) \rightarrow Q(x, y)$;
$P\ \mathsf{dom}\ A$: $P(x, y) \rightarrow A(x)$;
$P\ \mathsf{rng}\ A$: $P(x, y) \rightarrow A(y)$;
$P\ \mathsf{inv}\ Q$: $P(x, y) \rightarrow Q(y, x)$;
$A\ \mathsf{dw}\ B$: $A(x) \wedge B(x) \rightarrow \bot$;
$P\ \mathsf{pdw}\ Q$: $P(x, y) \wedge Q(x, y) \rightarrow \bot$;
$\mathsf{func}$: $P(x, y) \wedge P(x, z) \wedge y \neq z \rightarrow \bot$.</p>
      <p>
we use an auxiliary surrogate key $I$ to horizontally partition the single key-value store
$W_d$. Our schema $W$ assumes the key constraints $UT \rightarrow I$ and $IP \rightarrow V$ and the inclusion
dependency $W_d[I] \subseteq W_i[I]$. Two kinds of values are allowed in $W$: labelled nulls and
constants, of which only constants are transferred to DBpedia by the mappings,
as explained below.
      </p>
      <p>
        Mapping constraints. The specification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] distinguishes several types of DBpedia
mappings summarized in Table 1 along with their figures in the English DBpedia. All
these mappings can be represented as nested tgds [
        <xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>
        ] extended with negation and
constraints in the antecedents, to capture conditional mappings, and with interpreted
functions in the conclusions of implications, to capture calculated mappings that handle,
e.g., dates or geo-coordinates. A crucial limitation of the mapping language (which we
call DBpedia tgds) is the impossibility of comparisons between infobox property values.
The infobox type $W_i.T$ and the property names $W_d.P$ must be specified explicitly.
      </p>
      <p>
        For a Wiki instance $I$, by $M(I)$ we denote the chase of $I$ with the tgds in $M$ [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
and by $M^T(I)$ the closure of $M(I)$ under the rules in Fig. 1.
      </p>
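      <p>Example 1. A tgd formalizing a French DBpedia mapping for clergy:
\[
\begin{array}{l}
\forall U\, \forall I\; W_i(U, \text{'fr:Prélat catholique'}, I) \rightarrow \\
\quad (W_d(I, \text{'titre'}, \text{'Pape'}) \rightarrow \exists Y\; \mathit{Pope}(U) \wedge \mathit{occupation}(U, Y) \wedge \mathit{PersonFunction}(Y) \\
\qquad \wedge\ \mathit{title}(Y, \text{'Pape'})) \qquad \text{// ``Intermediate node mapping''} \\
\quad \wedge\ \ldots \\
\quad \wedge\ \forall X\, (W_d(I, \text{'prédécesseur pape'}, X) \rightarrow \mathit{predecessor}(Y, X)) \\
\quad \ldots \\
\quad \wedge\ (W_d(I, \text{'titre'}, \text{'Prêtre'}) \rightarrow \mathit{Priest}(U)) \\
\quad \wedge\ (\neg W_d(I, \text{'titre'}, \text{'Pape'}) \wedge \ldots \wedge \neg W_d(I, \text{'titre'}, \text{'Prêtre'}) \rightarrow \mathit{Cleric}(U)) \qquad \text{// ``otherwise''} \\
\quad \wedge\ \forall X\, (W_d(I, \text{'nom'}, X) \rightarrow \mathit{foaf{:}name}(U, X)) \\
\quad \ldots \\
\quad \wedge\ \forall X\, (W_d(I, \text{'nom naissance'}, X) \rightarrow \mathit{birthName}(U, X))
\end{array}
\]</p>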
      <p>The specification stipulates that conditions are evaluated in their natural order, so
every subsequent condition has to include the negation of all preceding conditions. In our
case, this is illustrated only by the last, default ("otherwise") case, since the other conditions
are mutually exclusive. Note also that no universally quantified variable (besides the
page URI $U$ and the technical infobox identifier $I$), i.e., no variable representing an
infobox property, called $X$ in the example, can occur in more than two $W_d$ atoms.</p>
      <p>
        One further particularity of the chase with tgds is the handling of existentially
quantified variables. A usual approach is to instantiate such variables with null values, which
could be blank nodes in the case of RDF. The strategy followed by DBpedia is, however,
different: instead of blank nodes, the chase produces fresh IRIs, avoiding clashes with
existing page URIs. Already the following problem is worst-case intractable for WDFs:
ABox source consistency ASCONS [
        <xref ref-type="bibr" rid="ref2 ref6">2, 6</xref>
        ]. Parameter: WDF $(M, T)$. Input: ABox $A$.
Test if $A \cup T \not\models \bot$ and if a Wiki instance $I$ exists such that $M^T(I) = A$.
The ABox source consistency problem demonstrates one source of complexity for
DBpedia update translations, namely accommodating a set of insertions exactly (up to the
facts derivable via the TBox).
      </p>
      <p>Definition 1 (Translation of an infobox update). Let $I$ be a Wiki instance, let $e = (e^-, e^+)$
be an infobox update, and let $M$ be a DBpedia mapping. The translation $M_I(e)$ of $e$
w.r.t. $M$ and $I$ is a DBpedia update $u = (u^-, u^+)$, where $u^- = M^T(I) \setminus M^T(e(I))$
and $u^+ = M^T(e(I)) \setminus M^T(I)$.</p>
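      <p>The two set differences of Definition 1 can be illustrated with a minimal Python sketch. It rests on assumptions that are not part of the paper: facts are plain tuples, an infobox update is applied by deleting its first component and then inserting its second, and toy_mapping is a hypothetical stand-in for the chase followed by the TBox closure (it is not the real DBpedia mapping).</p>
      <preformatted>
```python
def apply_update(instance, e_minus, e_plus):
    """Apply an infobox update: delete e_minus, then insert e_plus (assumed reading of e(I))."""
    return (instance - e_minus) | e_plus


def translate(mapping, instance, e_minus, e_plus):
    """Return (u_minus, u_plus) as in Definition 1; `mapping` plays the role of M^T."""
    before = mapping(instance)
    after = mapping(apply_update(instance, e_minus, e_plus))
    return before - after, after - before


def toy_mapping(instance):
    """Hypothetical stand-in for M^T: maps 'nom' to foaf:name and adds one toy TBox fact."""
    page_of = {f[3]: f[1] for f in instance if f[0] == "Wi"}  # infobox id to page URI
    abox = set()
    for f in instance:
        if f[0] == "Wd" and f[2] == "nom" and f[1] in page_of:
            page = page_of[f[1]]
            abox.add(("foaf:name", page, f[3]))
            abox.add(("Agent", page))  # toy closure step, not a real DBpedia axiom
    return frozenset(abox)


# Wi(U, T, I) and Wd(I, P, V) facts of a one-page toy Wiki instance.
wiki = frozenset({("Wi", "fr:Page", "fr:Prelat catholique", "i1"),
                  ("Wd", "i1", "nom", "Old Name")})
e_minus = frozenset({("Wd", "i1", "nom", "Old Name")})
e_plus = frozenset({("Wd", "i1", "nom", "New Name")})

u_minus, u_plus = translate(toy_mapping, wiki, e_minus, e_plus)
print(sorted(u_minus), sorted(u_plus))
```
      </preformatted>
      <p>In this toy run the derived class fact is present both before and after the update, so it appears in neither component of the translated update; only the name triple changes.</p>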
      <p>The inverse translation, casting a DBpedia update as a Wiki update, can be defined
similarly, with the difference that such a translation is often not unique or may not
exist at all, for various reasons: (i) because of many-to-many relations between Wiki and RDF properties,
modifying just a single fact can be impossible; (ii) updates can cause inconsistencies, both
directly, w.r.t. the previous DBpedia knowledge, and indirectly, by triggering a
conditional mapping rule that causes already existing infobox properties to be transferred to
DBpedia, resulting in a clash. Therefore, we define translations based on containment.</p>
      <p>Definition 2 (Update containment). The syntactic containment $u_1 \sqsubseteq u_2$ holds when
$u_1^+ \subseteq u_2^+$ and $u_1^- \subseteq u_2^-$ is the case. Given an instance $I$ of a WDF $(M, T)$, the WDF
containment $u \sqsubseteq_I e$ between the Wiki update $e$ and the DBpedia update $u$ holds if $u \sqsubseteq M_I(e)$.
The proper update containment relations $\sqsubset$ and $\sqsubset_I$ are defined analogously.</p>
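      <p>As a small illustration of Definition 2, the following Python sketch checks containment componentwise. Updates are represented as (deletions, insertions) pairs of frozensets, which is a representation chosen here for illustration; proper containment is read as containment without equality, which is an assumption about the "analogous" definition; and the translated update is assumed to have been computed elsewhere.</p>
      <preformatted>
```python
def contained_in(u1, u2):
    """Syntactic containment: both components of u1 are subsets of those of u2."""
    return u1[0].issubset(u2[0]) and u1[1].issubset(u2[1])


def properly_contained_in(u1, u2):
    """Proper containment, read here as containment without equality (an assumption)."""
    return contained_in(u1, u2) and not contained_in(u2, u1)


def wdf_contained_in(u, translated_e):
    """WDF containment of u in e, where translated_e stands for M_I(e)."""
    return contained_in(u, translated_e)


# Hypothetical updates: u inserts one fact; the translation of e also deletes one.
u = (frozenset(), frozenset({("Pope", "fr:Page")}))
m_e = (frozenset({("Priest", "fr:Page")}), frozenset({("Pope", "fr:Page")}))

print(wdf_contained_in(u, m_e), properly_contained_in(u, m_e), contained_in(m_e, u))
```
      </preformatted>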
      <p>
        For the heterogeneous pair $u, e$ of updates as above, we say that $e$ minimally
contains $u$, written $u \sqsubseteq_I^{min} e$, if (i) $e(I)$ satisfies the source constraints of $W$ and
$\bot \notin M_I(e)$, written $e \not\models_I \bot$, and (ii) for every Wiki update $e'$ with $e' \sqsubset e$, $u \not\sqsubseteq_I e'$
or $e' \models_I \bot$ is the case. If $e' \sqsubset e$ implies $u \not\sqsubseteq_I e'$ (that is, the option $e' \models_I \bot$ is
eliminated), $e$ is said to faithfully contain $u$, written $u \sqsubseteq_I^{fth} e$. We also use $u \sqsubseteq_I^{ex} e$
("exact") and $u =_I e$ as shorthands for $(u \sqsubseteq M_I(e)) \wedge (M_I(e) \sqsubseteq u)$.
Intuitively, minimal containment ensures that all insertions and deletions performed by
$e$ are necessary either to implement $u$ or to restore the ABox consistency after
implementing $u$. In contrast, faithful containment rules out extending $u$ purely for the sake
of restoring consistency. The notions of minimal and faithful containment adapt the semantics
considered in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which addressed the much simpler setting of SPARQL ABox updates, where no
mappings are present.
      </p>
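      <p>The difference between minimal, faithful and exact containment can be made concrete with a brute-force Python sketch that enumerates all proper sub-updates of a Wiki update. Everything specific in it is an assumption made for illustration: the function name holds, the triple encoding of infobox facts, the toy mapping with its hypothetical 'ordre' property, and the toy disjointness of Pope and Priest. The source constraints of the Wiki schema are not modelled, and the sketch additionally requires the candidate update itself to contain the DBpedia update.</p>
      <preformatted>
```python
from itertools import combinations


def subsets(items):
    """All subsets of a finite set, as frozensets."""
    pool = list(items)
    return [frozenset(c) for r in range(len(pool) + 1) for c in combinations(pool, r)]


def contained_in(u1, u2):
    """Syntactic containment of (deletions, insertions) pairs."""
    return u1[0].issubset(u2[0]) and u1[1].issubset(u2[1])


def apply_update(instance, e):
    """Assumed reading of e(I): delete first, then insert."""
    return (instance - e[0]) | e[1]


def translate(mapping, instance, e):
    """M_I(e): the ABox difference caused by the Wiki update e."""
    before, after = mapping(instance), mapping(apply_update(instance, e))
    return (before - after, after - before)


def holds(mode, mapping, consistent, instance, u, e):
    """Brute-force containment test for mode in {'min', 'fth', 'ex'}."""
    m_e = translate(mapping, instance, e)
    if mode == "ex":
        return contained_in(u, m_e) and contained_in(m_e, u)
    # Condition (i), plus the assumption that e itself must contain u.
    if not contained_in(u, m_e) or not consistent(mapping(apply_update(instance, e))):
        return False
    # Condition (ii): inspect every proper sub-update e2 of e.
    for dels in subsets(e[0]):
        for ins in subsets(e[1]):
            e2 = (dels, ins)
            if e2 == (e[0], e[1]):
                continue
            covers = contained_in(u, translate(mapping, instance, e2))
            clash = not consistent(mapping(apply_update(instance, e2)))
            if covers and (mode == "fth" or not clash):
                return False
    return True


def toy_mapping(instance):
    """Hypothetical conditional mapping over (infobox, property, value) triples."""
    rules = {("titre", "Pape"): "Pope", ("ordre", "Pretre"): "Priest"}
    return frozenset((rules[(p, v)], box) for (box, p, v) in instance if (p, v) in rules)


def toy_consistent(abox):
    """Toy TBox: Pope and Priest are treated as disjoint (illustration only)."""
    popes = {x for (cls, x) in abox if cls == "Pope"}
    priests = {x for (cls, x) in abox if cls == "Priest"}
    return popes.isdisjoint(priests)


wiki = frozenset({("i1", "ordre", "Pretre")})
u = (frozenset(), frozenset({("Pope", "i1")}))
e_add = (frozenset(), frozenset({("i1", "titre", "Pape")}))
e_swap = (frozenset({("i1", "ordre", "Pretre")}), frozenset({("i1", "titre", "Pape")}))

for mode in ("min", "fth", "ex"):
    print(mode, holds(mode, toy_mapping, toy_consistent, wiki, u, e_swap))
```
      </preformatted>
      <p>In this toy setting, the update that only adds the new title yields an inconsistent ABox, while the update that also removes the old property contains the requested insertion minimally but not faithfully: its extra deletion serves only to restore consistency.</p>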
      <p>
        Using the above definition, the decision version of the OBDM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] problem can be
defined as follows:
      </p>
      <p>Source revision SREV$_\circ$ for the WDF $(M, T)$ and $\circ \in \{min, fth, ex\}$. Input: WDF instance $I$, DBpedia update $u$, Wiki update $e$. Test if $u \sqsubseteq_I^{\circ} e$ holds.</p>
      <p>
        Footnote 1: See [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a proof sketch.
      </p>
      <p>The source revision problem is a special case of the belief revision problem tailored to the
OBDM setting, in which the mapping and the TBox are considered fixed and the ABox
is derived: that is, only the infobox data can actually be modified.</p>
    </sec>
    <sec id="sec-3">
      <title>Discussion and Practical Outlook</title>
      <p>
OBDM-related problems tend to be intractable in the worst case even
for simple mapping and ontology languages, such as those underlying DBpedia. Our
initial experiments with the translation of SPARQL updates in this setting (discussed in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) demonstrate, however, that worst-case scenarios leading to intractability of update
handling are seldom realized in the current DBpedia version. From a practical point
of view, the following considerations appear crucial. First, update translation is
inherently ambiguous: mappings often create many-to-one or many-to-many
relationships between infobox and DBpedia properties. Second, concisely presenting a large
number of options to the user is a challenge, hence automatic selection or
ranking of update translations is required. A crucial part of such services is to provide
the user with clear and concise justifications for the ranking or automatic selection,
based on the already present data or previously resolved updates. Finally, being a
curated system, the Wiki also requires curated updates. Thus, splitting a SPARQL update into
small independent pieces to be verified by Wiki maintainers is needed as well.
      </p>
      <p>Little attention has been paid so far to the benefits that the semantic infrastructure
can bring to maintaining the wiki content. In fact, to the best of our knowledge, the DBpedia
mapping language has never been formalized as a rule language, which this paper does. Our
early practical experiments with a DBpedia-based OBDM prototype suggest that the major
challenge in such a system is likely not the worst-case complexity of update translation,
but rather defining a reasonable DBpedia-enabled maintenance process, a comprehensible
user interface, and automatic aid in resolving the ambiguities that arise from the robust design of
DBpedia mappings.</p>
      <p>Acknowledgements. This paper is funded by the Austrian Science Fund (FWF):
M1720G11, and by the Vienna Science and Technology Fund (WWTF), project ICT12-SEE.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>DBpedia</given-names>
            <surname>Mappings</surname>
          </string-name>
          . http://mappings.dbpedia.org/,
          <year>2015</year>
          .
          [accessed 29.02.15].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Duschka</surname>
          </string-name>
          .
          <article-title>Complexity of answering queries using materialized views</article-title>
          .
          <source>In Proc. PODS '98</source>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>263</lpage>
          , ACM,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmeti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Savenkov</surname>
          </string-name>
          .
          <article-title>Towards Updating Wikipedia via DBpedia Mappings and SPARQL</article-title>
          .
          <source>Working Papers on Information Systems, Information Business and Operations</source>
          ,
          <volume>01</volume>
          /
          <year>2016</year>
          . WU Vienna University of Economics and Business. Available at http://epub.wu.ac.at/view/p_series/S1/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmeti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Savenkov</surname>
          </string-name>
          .
          <article-title>Handling inconsistencies due to class disjointness in SPARQL updates</article-title>
          .
          <source>In Proc. ESWC '16</source>
          , to appear. Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fuxman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. Howard</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          .
          <article-title>Nested mappings: Schema mapping reloaded</article-title>
          .
          <source>In VLDB</source>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>78</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>G.</given-names>
            <surname>Grahne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moallemi</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Onet</surname>
          </string-name>
          .
          <article-title>Recovering exchanged data</article-title>
          .
          <source>In PODS '15</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Ph.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sallinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Savenkov</surname>
          </string-name>
          .
          <article-title>Nested dependencies: structure and reasoning</article-title>
          .
          <source>In PODS'14</source>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>187</lpage>
          , ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , et al.
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>167</fpage>
          -
          <lpage>195</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>Ontology-based data management</article-title>
          .
          <source>In Proc. CIKM '11</source>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>6</lpage>
          , ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>