<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The DBpedia Events Dataset</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Magnus Knuth</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimitris Kontokostas</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Steiner</string-name>
          <email>tsteiner@liris.cnrs.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS, Université de Lyon</institution>
          ,
          <addr-line>LIRIS</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hasso Plattner Institute, University of Potsdam</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>UMR5205</institution>
          ,
          <addr-line>Université Lyon 1</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universitat Leipzig, Institut fur Informatik</institution>
          ,
          <addr-line>AKSW</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Wikipedia is the largest encyclopedia worldwide and is frequently updated by thousands of collaborators. A large part of the knowledge in Wikipedia is not static but changes frequently, e.g., political events or new movies. This makes Wikipedia an extremely rich, crowdsourced information hub for events. However, there is currently no structured and standardised way to access information on those events, and it is cumbersome to filter and enrich them manually. We have created a dataset based on a live extraction of Wikipedia, which performs this task via rules for filtering and ranking updates in DBpedia Live.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Since 2007, the DBpedia project has been extracting metadata and structured
data from Wikipedia and making them publicly available as RDF triples [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. DBpedia
also offers a live synchronized version of the extracted data, DBpedia Live [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
The English Wikipedia alone has hundreds of updates per minute [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that are
processed via the Live framework. Changes in Wikipedia articles are often
connected to real-life events, such as news-related events from politics, cultural life,
or sports. Due to the large user base of Wikipedia, these events are often covered
quickly, in many cases more quickly than in other Web sources [
        <xref ref-type="bibr" rid="ref1 ref7">7,1</xref>
        ].
      </p>
      <p>
        However, there is currently no structured and standardised way to access
information about these events, and it is cumbersome to filter and enrich them
manually. While there are previous efforts to extract events from Wikipedia, such
as [
        <xref ref-type="bibr" rid="ref2 ref5 ref7 ref8">2,5,7,8</xref>
        ], associated data about these events is not always available as RDF or
even archived. Providing an RDF dataset has the benefit of being able to rely on
standards for accessing and querying information. Furthermore, events can be
readily combined with background knowledge from DBpedia and other sources,
which enables mashups of events with further structured data.
      </p>
      <p>The most important challenges when extracting events from DBpedia are
(i) detecting events, (ii) providing context, and (iii) ranking events according to
their importance. Since by far not all changes in Wikipedia are events, we need
a mechanism to detect those. In our case, we rely on a semi-automatic approach
based on extensible rule sets, which are executed over DBpedia Live changesets.
If a rule fires, it triggers another query obtaining contextual information. The
detected event is ranked according to the resource's PageRank and the edit
activity of the Wikipedia page. The output of the processing pipeline is stored
in RDF preserving all provenance information.</p>
      <p>[Figure: processing pipeline. Recoverable labels: Wikipedia, extract, DBpedia Live, Changesets, (1) transform, GUO model, (2) query, Digest Templates, SPARQL Endpoint, (3) contextQuery, Digest Items, Snapshots.]</p>
    </sec>
    <sec id="sec-2">
      <title>Conversion Process</title>
      <p>4 http://live.dbpedia.org/changesets/</p>
      <p>5 http://purl.org/hpi/guo#</p>
      <p>dbo:Organisation and having a label. More complex restrictions that can be
expressed with SPARQL may apply. The dbe:descriptionTemplate is used to
generate a natural language headline for the detected event by substituting
the placeholders (enclosed in %%) with the respective resource labels.</p>
      <p>Listing 1.1. The PRESIDENT digest template</p>
      <preformat>dig:PRESIDENT a dbe:DigestTemplate ;
    dcterms:identifier "PRESIDENT" ;
    dcterms:description """President changed."""@en ;
    dbe:queryString """SELECT ?u ?res ?oldPres ?newPres
        { ?u guo:target_subject ?res ;
             guo:delete [ dbo:president ?oldPres ] ;
             guo:insert [ dbo:president ?newPres ] . }""" ;
    dbe:contextQueryString """SELECT ?label
        { %%res%% a dbo:Organisation ; rdfs:label ?label . }""" ;
    dbe:descriptionTemplate """%%newPres%% succeeds %%oldPres%% as the president of %%res%%.""" .</preformat>
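      <p>The placeholder substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name render_headline and the example labels are assumptions, and only the %%name%% placeholder syntax is taken from Listing 1.1.</p>
      <preformat><![CDATA[
```python
import re

# Matches placeholders such as %%newPres%%; the placeholder name is captured.
PLACEHOLDER = re.compile(r"%%\s*(\w+)\s*%%")


def render_headline(template: str, labels: dict) -> str:
    """Replace each %%name%% placeholder with the label bound to that name."""

    def substitute(match):
        name = match.group(1)
        if name not in labels:
            raise KeyError(f"no label bound for placeholder {name!r}")
        return labels[name]

    return PLACEHOLDER.sub(substitute, template)


template = "%%newPres%% succeeds %%oldPres%% as the president of %%res%%."
# Illustrative labels only; they are not taken from the dataset.
labels = {"newPres": "Alice Example", "oldPres": "Bob Example", "res": "Example Org"}
headline = render_headline(template, labels)
print(headline)
```
]]></preformat>
      <p>Raising an error for an unbound placeholder, rather than leaving it in the headline, is a design choice of this sketch; it makes incomplete query results visible early.</p>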
      <p>From the validated result the final event or so-called Digest Item is created.
These items contain all necessary information to understand the change that
occurred in DBpedia Live.</p>
      <p>Listing 1.2. An event created from the LEADER template</p>
      <preformat>item:2015/04/25/Christian_Democrats_(Sweden)-LEADER a dbe:Event ;
    dbe:context snapshot:2015/04/25/Christian_Democrats_(Sweden) ;
    dbe:update update:2015/04/25/Christian_Democrats_(Sweden) ;
    dcterms:description """Ebba Busch succeeded Göran Hägglund as the leader of Christian Democrats (Sweden)."""@en ;
    dbe:rank 1.82421e-06 ;
    prov:generatedAtTime "2015-04-30T13:45:35.798+02:00"^^xsd:dateTime ;
    prov:wasDerivedFrom dig:LEADER ,
        changesets:2015/04/25/14/000201.removed.nt.gz ,
        changesets:2015/04/25/14/000201.added.nt.gz .</preformat>
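      <p>The identifiers in Listing 1.2 appear to follow a date/resource pattern. The following Python sketch composes them under that assumption; the naming scheme is inferred from this single listing only, and the function event_identifiers is hypothetical.</p>
      <preformat><![CDATA[
```python
from datetime import date


def event_identifiers(day: date, resource: str, template_id: str) -> dict:
    """Compose the prefixed names for one event, following the pattern in Listing 1.2."""
    # Assumed layout: <prefix>:<yyyy>/<mm>/<dd>/<resource>, with the template
    # identifier appended to the item name only.
    path = f"{day:%Y/%m/%d}/{resource}"
    return {
        "item": f"item:{path}-{template_id}",
        "snapshot": f"snapshot:{path}",
        "update": f"update:{path}",
    }


ids = event_identifiers(date(2015, 4, 25), "Christian_Democrats_(Sweden)", "LEADER")
print(ids["item"])
```
]]></preformat>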
    </sec>
    <sec id="sec-3">
      <title>Dataset Description</title>
      <p>The dataset consists of daily digest dump files, which contain the descriptions
of the events (cf. Listing 1.2) occurring on that day as well as the resource updates
related to them. The resource snapshots that are linked in the event descriptions
might be relevant for further investigation. Thus, they are kept separately in
individual snapshot dumps. The daily generated event dumps can be accessed at
http://events.dbpedia.org/dataset/, and a SPARQL interface for querying the
full dataset is offered at http://events.dbpedia.org/sparql.
The resource snapshots corresponding to the events are published in a separate
path at http://events.dbpedia.org/snapshot/.</p>
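      <p>As a rough illustration of how the SPARQL interface might be queried, the Python sketch below builds a SPARQL Protocol GET request URL for the endpoint named above. The query itself is an assumption: it relies only on the dcterms:description and prov:generatedAtTime properties visible in Listing 1.2, and it is not sent over the network here.</p>
      <preformat><![CDATA[
```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://events.dbpedia.org/sparql"  # endpoint named in the text

# Hypothetical query: the ten most recently generated event descriptions.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?event ?description ?time
WHERE {
  ?event dcterms:description ?description ;
         prov:generatedAtTime ?time .
}
ORDER BY DESC(?time)
LIMIT 10
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL Protocol GET request."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```
]]></preformat>
      <p>The resulting URL could then be fetched with any HTTP client; whether the endpoint is reachable and which result formats it returns are not covered by this sketch.</p>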
      <p>At the current stage, 12 digest templates have been defined and fired<sup>6</sup>. Table 1
shows the number of events that matched the templates.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper presents an automated means to detect events and extract relevant
data changes within DBpedia Live, and to make these events available as Linked
Data for others to consume and build upon.</p>
      <p>6 Defined digest templates: http://events.dbpedia.org/data/digests.ttl.</p>
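      <table-wrap>
        <label>Table 1.</label>
        <caption>
          <p>Extracted events per template</p>
        </caption>
        <table>
          <thead>
            <tr><th>Count</th><th>Template</th><th>Count</th><th>Template</th></tr>
          </thead>
          <tbody>
            <tr><td>4509</td><td>AWARDED</td><td>146</td><td>LEADER</td></tr>
            <tr><td>3252</td><td>HEADHUNTED</td><td>118</td><td>PODIUM</td></tr>
            <tr><td>2539</td><td>RELEASED</td><td>89</td><td>PRESIDENT</td></tr>
            <tr><td>1991</td><td>JUSTMARRIED</td><td>22</td><td>VOLCANO</td></tr>
            <tr><td>785</td><td>DEADPEOPLE</td><td>7</td><td>EUROPE2015</td></tr>
            <tr><td>782</td><td>JUSTDIVORCED</td><td>1</td><td>INTRODUCED</td></tr>
          </tbody>
        </table>
      </table-wrap>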
      <p>Potential use cases for our constantly growing dataset include (breaking)
news detection systems for news agencies, brand monitoring systems for
so-called digital war rooms, and more mundane applications such
as celebrity trackers (who married whom) or mashups in general. The dataset
provides a comprehensible overview of usually rather complex data changes and
may give valuable insights into dataset dynamics. Having stable identifiers for
events further allows for interesting reasoning use cases.</p>
      <p>Some information simply cannot be deduced from discrete state resource
descriptions. For example, the fact that a person moved from Germany to France
cannot be extracted from the separate facts that she lived in Germany and lives
in France; rather, both states need to be regarded and compared. This is what
this project makes possible. The application supports an individual selection of
changes of interest through the free definition of digest templates, which allows
monitoring customized data change events.</p>
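      <p>The comparison of two resource states can be illustrated with a small Python sketch. It is a simplified stand-in for the changeset-based detection, under the assumption that each state is a mapping from property to a list of values; the property dbo:residence, the example resources, and the function diff_states are hypothetical.</p>
      <preformat><![CDATA[
```python
def diff_states(old: dict, new: dict) -> dict:
    """Return, per property, the values deleted from and inserted into a resource."""
    changes = {}
    for prop in sorted(set(old) | set(new)):
        before = set(old.get(prop, ()))
        after = set(new.get(prop, ()))
        deleted, inserted = before - after, after - before
        # Only properties whose values actually changed are reported.
        if deleted or inserted:
            changes[prop] = {"deleted": sorted(deleted), "inserted": sorted(inserted)}
    return changes


# Hypothetical resource states; dbo:residence stands in for "lives in".
old_state = {"dbo:residence": ["dbr:Germany"], "rdfs:label": ["Jane Example"]}
new_state = {"dbo:residence": ["dbr:France"], "rdfs:label": ["Jane Example"]}

changes = diff_states(old_state, new_state)
print(changes)
```
]]></preformat>
      <p>Neither state alone reveals the move; only the paired deletion and insertion on the same property does, which mirrors the guo:delete and guo:insert pattern used in the digest template queries.</p>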
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          .
          <article-title>How much is Wikipedia lagging behind news?</article-title>
          <source>In Proceedings of the 2015 ACM conference on Web science. ACM</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Georgescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kanhabua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Siersdorfer</surname>
          </string-name>
          .
          <article-title>Extracting event-related information from article updates in Wikipedia</article-title>
          .
          <source>In Proceedings of the 35th European Conference on Advances in Information Retrieval</source>
          , pages
          <fpage>254</fpage>
          –
          <lpage>266</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakob</surname>
          </string-name>
          , et al.
          <article-title>DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>167</fpage>
          –
          <lpage>195</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          .
          <article-title>DBpedia and the live extraction of structured data from Wikipedia</article-title>
          .
          <source>Program</source>
          ,
          <volume>46</volume>
          (
          <issue>2</issue>
          ):
          <fpage>157</fpage>
          –
          <lpage>181</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Petrovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>McCreadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          .
          <article-title>Bieber no more: First story detection using Twitter and Wikipedia</article-title>
          .
          <source>In Proceedings of the SIGIR Workshop on Time-aware Information Access</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Steiner</surname>
          </string-name>
          .
          <article-title>Bots vs. Wikipedians, anons vs. logged-ins (redux): A global study of edit activity on Wikipedia and Wikidata</article-title>
          .
          <source>In Proceedings of The International Symposium on Open Collaboration, OpenSym '14</source>
          , pages
          <fpage>25:1</fpage>
          –
          <lpage>25:7</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>T.</given-names>
            <surname>Steiner</surname>
          </string-name>
          et al.
          <article-title>MJ no more: Using concurrent Wikipedia edit spikes with social network plausibility checks for breaking news detection</article-title>
          .
          <source>In Proceedings of the 22nd Int. Conference on World Wide Web Companion</source>
          , pages
          <fpage>791</fpage>
          –
          <lpage>794</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Tran</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Alrifai</surname>
          </string-name>
          .
          <article-title>Indexing and analyzing Wikipedia's current events portal, the daily news summaries by the crowd</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web Companion</source>
          , pages
          <fpage>511</fpage>
          –
          <lpage>516</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>