<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Oktie Hassanzadeh</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Building a Knowledge Graph of Events and Consequences Using Wikidata</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0001</volume>
      <abstract>
        <p>In this short paper, we present our preliminary results on building a Knowledge Graph (KG) of events and consequences with application to event forecasting and analysis. A base KG is first constructed using existing concepts and relations in Wikidata. Using an automated unsupervised knowledge extraction pipeline, causal knowledge is extracted from Wikipedia articles to augment the base KG. We show examples from the base and the augmented KG, and discuss a few challenges in building a high-quality KG. We also discuss a few potential directions that the Wikidata community can work on to improve the representation of event-related knowledge in Wikidata.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        While prior work has considered knowledge-driven forecasting of future events [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], curating large collections of causes and effects [
        <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
        ], and event-based
knowledge graphs (e.g., GDELT GKG [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), there are no rich structured sources of
knowledge around major societal events that can be queried directly to reason
about the potential consequences of ongoing events. In this paper, we report
on our initial results on curating such a source of knowledge from Wikidata
and Wikipedia.
      </p>
      <p>Wikipedia is a rich source of knowledge about major events and their
consequences. Major newsworthy events often result in many additions and new pages
describing various aspects of the events in detail. In particular, there are often
descriptions of causes and effects of events, either explicitly in text, or implicitly
in statements, sections, or descriptions of timelines of events. An effective
representation of this knowledge in the form of a rich knowledge graph can enable
a deep analysis of past events and their consequences. This can in turn be used
as a mechanism for predicting the potential consequences of ongoing events by
mapping them to similar past events in the knowledge graph.</p>
      <p>
        Wikidata [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] aims at representing the rich knowledge available in Wikipedia
in structured form. As shown in Figure 1, there are existing causal relations such
as has cause and has effect between many event-related concepts. However,
there are many explicit and implicit causal relations described in Wikipedia
articles that are missing from Wikidata. In what follows, we first show how we can
turn the existing event-related concepts and causal relations in Wikidata into a
base knowledge graph of events and consequences geared towards future event
prediction and analysis. We then describe an unsupervised knowledge
extraction pipeline that uses the textual descriptions of events in Wikipedia articles
to augment the base knowledge graph. Using a few examples, we discuss the
strengths and weaknesses of the approach. Finally, we discuss a few directions
for future work.
      </p>
      <p>[Figure 1 label: extracted causal relation (from: https://en.wikipedia.org/wiki/COVID-19_pandemic)]</p>
    </sec>
    <sec id="sec-2">
      <title>Knowledge Graph of Events and Consequences</title>
      <p>We first construct a base knowledge graph of events and consequences from
existing concepts and links in Wikidata. Since our goal is to analyze major
newsworthy events and their consequences, the base knowledge graph includes
only those event types for which at least one instance has an existing
link to a Wikinews article. This way, we ensure that out of the thousands of
subclasses of type occurrence (Q1190554) and their instances, we only include
events that are likely to receive news coverage. We then query for all the
existing causal relations in Wikidata using properties such as has effect (P1542),
contributing factor of (P1537), immediate cause of (P1536), and their
inverse properties. We then group the event types that are linked directly or
through their instances. The result is a collection of event objects that are
event types (classes in Wikidata), each associated with a list of consequences,
which are also event types. Each consequence of an event has a list of examples,
with each example consisting of a cause event instance and an effect event instance.
Events and consequences are also annotated with a set of base scores derived
from simple frequency analysis, e.g., the number of example pairs of instances,
the number of triples for the event type and its instances, and the number of
Wikipedia pages linked to instances of the type. These scores can be used to
rank the potential consequences of a given event.</p>
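      <p>The querying and grouping steps described above can be sketched as follows. This is a minimal illustration and not the implementation used for the base KG: the function names, the record layout, and the sample instance pairs are assumptions made for the example, the query text is only built and never sent to an endpoint, and only one base score (the number of example pairs) is computed.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Causal properties named in the text (inverse properties are omitted here).
CAUSAL_PROPERTIES = {
    "P1542": "has effect",
    "P1537": "contributing factor of",
    "P1536": "immediate cause of",
}


def causal_pairs_query(prop_id):
    """Build SPARQL text for (cause, effect) pairs linked by one property.

    The cause is restricted to instances of subclasses of occurrence
    (Q1190554). The text is only built here, not sent to any endpoint.
    """
    return (
        "SELECT ?cause ?effect WHERE { "
        "?cause wdt:P31/wdt:P279* wd:Q1190554 . "
        f"?cause wdt:{prop_id} ?effect . "
        "}"
    )


# Hypothetical instance-level pairs, each annotated with the event type of
# the cause and of the effect. The instance names are placeholders.
PAIRS = [
    {"cause": "coup instance A", "cause_type": "coup d'état (Q45382)",
     "effect": "conflict instance A", "effect_type": "conflict (Q180684)"},
    {"cause": "coup instance B", "cause_type": "coup d'état (Q45382)",
     "effect": "conflict instance B", "effect_type": "conflict (Q180684)"},
    {"cause": "coup instance B", "cause_type": "coup d'état (Q45382)",
     "effect": "other instance A", "effect_type": "other event type (placeholder)"},
]


def build_base_kg(pairs):
    """Group instance pairs into event types, consequences, and examples."""
    grouped = defaultdict(lambda: defaultdict(list))
    for p in pairs:
        grouped[p["cause_type"]][p["effect_type"]].append(
            {"cause": p["cause"], "effect": p["effect"]}
        )
    events = []
    for event_type, consequences in grouped.items():
        # Rank consequences by a simple frequency score: the example count.
        ranked = sorted(
            (
                {"consequence": c, "examples": ex, "example_count": len(ex)}
                for c, ex in consequences.items()
            ),
            key=lambda item: item["example_count"],
            reverse=True,
        )
        events.append({"event": event_type, "consequences": ranked})
    return events


base_kg = build_base_kg(PAIRS)
```
]]></preformat>
      <p>In this sketch, each entry of base_kg is an event type with its consequences ordered by example count, which mirrors the ranking use of the base scores; the other scores mentioned in the text would be added as further fields.</p>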
      <p>Our current version of the base KG contains 50 source events (classes), 427
consequences, and 563 examples (instances). This output is the result of running
2,762 SPARQL queries to retrieve all the concepts and relations as well as their
included properties and statistics. Figure 2 shows a few event types linked to
coup d'état (Q45382), instances (examples) for one of the consequences, and
their JSON representations. This Wikidata-based representation of events and
consequences not only enables retrieval of potential consequences for a given type
of event, but also enables a deeper analysis of potential consequences using the
rich structured knowledge around the Wikidata concepts. As a simple example,
one can group potential consequences by the geographic locations associated with
the cause and effect events.</p>
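      <p>The geographic grouping mentioned above could look roughly like the following sketch. The field names (cause_country, effect_country) and the country labels are hypothetical placeholders chosen for the example; they are not taken from the actual JSON representation of the KG.</p>
      <preformat><![CDATA[
```python
import json
from collections import Counter

# Hypothetical example records: each pairs a consequence type with a
# country label for the cause instance and for the effect instance.
EXAMPLES = [
    {"consequence": "conflict (Q180684)",
     "cause_country": "Country A", "effect_country": "Country A"},
    {"consequence": "conflict (Q180684)",
     "cause_country": "Country B", "effect_country": "Country C"},
    {"consequence": "conflict (Q180684)",
     "cause_country": "Country A", "effect_country": "Country A"},
]


def group_by_location(examples):
    """Count examples per (consequence, cause country, effect country)."""
    counts = Counter(
        (e["consequence"], e["cause_country"], e["effect_country"])
        for e in examples
    )
    # most_common() returns the groups ordered by descending count.
    return [
        {"consequence": c, "cause_country": cc, "effect_country": ec, "count": n}
        for (c, cc, ec), n in counts.most_common()
    ]


summary = group_by_location(EXAMPLES)
print(json.dumps(summary, indent=2))
```
]]></preformat>
      <p>In practice the location of each instance would have to be retrieved from Wikidata statements on the instance; which property to use for that is left open here.</p>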
    </sec>
    <sec id="sec-3">
      <title>Causal Knowledge Extraction</title>
      <p>
        As mentioned earlier, there are many causal relations expressed explicitly or
implicitly in Wikipedia articles that cannot be found on Wikidata. We use an
automated unsupervised causal knowledge extraction pipeline to augment the
base KG using natural language understanding. The pipeline, shown in Figure 3,
relies on pre-trained neural Question Answering (QA) and Entity Linking (EL)
models. It consists of the following steps: a) a collection of causal questions is
generated using a set of templates, such as "What could X cause?" or "What was
a major consequence of X?", where X is a label of an event type or instance, b) a
pre-trained neural QA model is used to find the answer in the Wikipedia articles
associated with the target event, and c) the answers are linked to Wikidata using
pre-trained neural entity linking models based on BLINK [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
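      <p>Steps a) to c) above can be sketched as a small driver loop. This is only an outline under stated assumptions: the two templates are the ones quoted in the text, while the function names, the callable signatures, and the dummy stand-ins for the QA and EL models are hypothetical and return fixed placeholder values so that the sketch runs offline.</p>
      <preformat><![CDATA[
```python
# The two templates quoted in the text; {x} is the label of an event type
# or instance.
TEMPLATES = [
    "What could {x} cause?",
    "What was a major consequence of {x}?",
]


def generate_questions(label, templates=TEMPLATES):
    """Step a): instantiate every causal question template for one label."""
    return [t.format(x=label) for t in templates]


def extract(label, passage, answer_fn, link_fn):
    """Steps b) and c): answer each question, then link each answer.

    answer_fn(question, passage) returns (answer_text, answer_score).
    link_fn(answer_text) returns (wikidata_id, linking_score).
    Both callables are placeholders for pre-trained QA and EL models.
    """
    rows = []
    for question in generate_questions(label):
        mention, answer_score = answer_fn(question, passage)
        entity, linking_score = link_fn(mention)
        rows.append({
            "question": question,
            "mention": mention,
            "answer_score": answer_score,
            "entity": entity,
            "linking_score": linking_score,
        })
    return rows


# Dummy stand-ins so the sketch runs without any model; they do not
# reproduce real model behavior or real scores.
def dummy_answer(question, passage):
    return passage.split(".")[0], 0.0


def dummy_link(mention):
    return None, 0.0


rows = extract("coup d'état", "Placeholder passage text. More text.",
               dummy_answer, dummy_link)
```
]]></preformat>
      <p>Passing the models in as callables keeps the template logic independent of any particular QA or EL library, so either component can be swapped without touching the loop.</p>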
      <p>[Figure: Causal Knowledge Extraction Pipeline. Recoverable labels: Knowledge Graph of Events &amp; Consequences; Unsupervised Causal Knowledge Extraction; Question Answering (Neural Model); Entity Linking (Neural Model); e1, e2; Examples.]</p>
      <p>At the time of this writing, we have applied the causal knowledge extraction
pipeline only to the opening paragraphs of a collection of Wikipedia articles
that describe instances of events that can be found in the base KG. Figure 3
shows examples of the extracted causes and consequences for the same coup
d'état (Q45382) event used in the base KG example in Figure 2. Out of the
six extracted consequences, one was also in the base KG (conflict (Q180684)),
one is a superclass of an event in the base KG (murder (Q132821), which is
a superclass of political murder (Q1139665)), and the other four extractions
were not in the base KG. The figure also shows an example for the discovered
consequence bomb attack (Q891854). The examples in the output KG from
this pipeline also include a list of mentions, which are answers from the QA model.
Each mention comes with an answer score, which is the confidence score of the QA
model, and a linking score, which is
the confidence score of linking the mention text to the Wikidata entity.</p>
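      <p>One way to make use of the two scores attached to each mention is to filter on both. The sketch below is an assumption for illustration: the Mention record, the threshold values, and the sample scores are made up, and the text does not state that the pipeline applies such a filter.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str             # answer span returned by the QA model
    answer_score: float   # QA confidence for the span
    entity: str           # Wikidata entity the span was linked to
    linking_score: float  # EL confidence for the link


def keep_confident(mentions, min_answer=0.5, min_linking=0.5):
    """Keep mentions whose two scores both reach the assumed thresholds."""
    return [
        m for m in mentions
        if m.answer_score >= min_answer and m.linking_score >= min_linking
    ]


# Made-up mentions and scores, used only to exercise the filter.
mentions = [
    Mention("mention A", 0.9, "bomb attack (Q891854)", 0.8),
    Mention("mention B", 0.9, "bomb attack (Q891854)", 0.2),
    Mention("mention C", 0.3, "conflict (Q180684)", 0.9),
]
kept = keep_confident(mentions)
```
]]></preformat>
      <p>Keeping the two scores separate, rather than merging them into one number, makes it possible to tell a poor answer span apart from a poor entity link when inspecting rejected mentions.</p>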
      <p>
        As the examples show, the pipeline is capable of extracting some
interesting causal relations that could not be found on Wikidata. We have found the
overall quality of the output to be high even without any effort spent on tuning the
models and parameters. One major quality issue in the current output is the
wrong direction of edges, which is also evident in Figure 3. This is mainly
a result of the question answering model returning a cause instead of an effect and
vice versa. A potential solution is to apply a custom classifier on top of the
output, possibly building on our prior work on binary causal
question answering [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and on using Natural Language Inference (NLI) for causal
relation classification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Lessons Learned &amp; Future Work</title>
      <p>Our current results show a number of challenges, some of which could be
addressed by the Wikidata community:</p>
      <list list-type="bullet">
        <list-item>
          <p>One simple but classic problem we are facing in using event-related
Wikidata entities is the inconsistency in instance of (P31) statements.
One example, as shown in Figure 3, is the event death of Eduardo Frei
Montalva (Q5247432), which at the time of this writing is an instance
of murder (Q132821), death (Q4), and certain aspects of a person's
life (Q20127274), whereas the right class consistent with the base KG
would be political murder (Q1139665) (which is a subclass of
murder (Q132821)).</p>
        </list-item>
        <list-item>
          <p>Some causal relations expressed in text cannot be represented using the
existing entities and relations on Wikidata. For example, the Wikipedia article
in the example in Figure 1 states that the pandemic has caused
"temporary decreases in emissions of pollutants and greenhouse gases". There are
currently no events or event types representing a decrease or a change in
pollutants and greenhouse gases. One potential solution could be a "has effect
on" relation that links the pandemic concept to, e.g., carbon dioxide
emissions (Q3588927), along with attributes that state whether the
effect is temporary and whether it is a decrease or an increase.</p>
        </list-item>
        <list-item>
          <p>Another direction that could have a significant effect on the community
would be a tighter integration between Wikinews and Wikidata. For
example, the authors of Wikinews articles could be encouraged to create
related Wikidata items and specify causes of the events being described.
On the Wikidata side, better representation of event classes and
event-related concepts, along with better alignment with Wikinews categories and
sitelinks, can provide the community with improved retrieval and news
analysis capabilities.</p>
        </list-item>
      </list>
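      <p>The instance of (P31) inconsistency described in the first item could be surfaced automatically by checking whether a class used in the base KG specializes one of the classes declared on an instance. The sketch below is a hypothetical illustration: the dictionary layout and function names are assumptions, and the only subclass edge used is the one stated in the text. It flags candidates for review and does not decide which class is correct.</p>
      <preformat><![CDATA[
```python
# Subclass-of edges (child to parents). Only this edge is stated in the
# text; the dictionary layout is an assumption for the sketch.
SUBCLASS_OF = {
    "political murder (Q1139665)": ["murder (Q132821)"],
}


def ancestors(cls, subclass_of):
    """Return all transitive superclasses of cls."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def more_specific_candidates(instance_classes, kg_classes, subclass_of):
    """KG classes that specialize one of the instance's declared classes."""
    declared = set(instance_classes)
    return [
        c for c in kg_classes
        if ancestors(c, subclass_of).intersection(declared)
    ]


# Classes declared on death of Eduardo Frei Montalva (Q5247432), per the text.
declared_classes = [
    "murder (Q132821)",
    "death (Q4)",
    "certain aspects of a person's life (Q20127274)",
]
kg_classes = ["political murder (Q1139665)", "coup d'état (Q45382)"]
candidates = more_specific_candidates(declared_classes, kg_classes, SUBCLASS_OF)
```
]]></preformat>
      <p>For the example from the text, the only candidate is political murder (Q1139665), because murder (Q132821) is among the declared classes and is its superclass.</p>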
      <p>
        We are currently working on improving our causal knowledge extraction
pipeline in several ways, and performing a thorough evaluation of the quality
of the extracted knowledge. A major challenge in using state-of-the-art causal
relation extraction solutions [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and benchmarks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is their focus on
commonsense reasoning as the end application. One direction we are pursuing is publicly
releasing our base KG along with a linked corpus of text from Wikipedia, which
can be used as a benchmark for causal relation extraction and generic
knowledge base completion solutions (e.g., IntKB [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). We also plan to investigate the
application of the knowledge graph in event forecasting [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and enterprise risk
management [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This research is based upon work supported in part by U.S. DARPA KAIROS
Program No. FA8750-19-C-0206. The views and conclusions contained herein are
those of the authors and should not be interpreted as necessarily representing the
official policies, either expressed or implied, of DARPA, or the U.S. Government.
The U.S. Government is authorized to reproduce and distribute reprints for
governmental purposes notwithstanding any copyright annotation therein.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bhandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feblowitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohrabi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Unsupervised causal knowledge extraction from text using natural language inference (student abstract)</article-title>
          .
          <source>In: AAAI</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Predicting the future with Wikidata and Wikipedia</article-title>
          .
          <source>In: Proceedings of the ISWC 2021 Posters &amp; Demonstrations Tracks co-located with 20th International Semantic Web Conference (ISWC)</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhattacharjya</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feblowitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrone</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohrabi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Answering binary causal questions through large-scale text mining: An evaluation using cause-effect pairs from human experts</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Heindorf</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scholten</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wachsmuth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>CauseNet: Towards a causality graph extracted from the web</article-title>
          .
          <source>In: CIKM. ACM</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hosseini</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broniatowski</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Predicting directionality in causal relations in text</article-title>
          .
          <source>arXiv preprint arXiv:2103.13606</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kratzwald</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kunpeng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feuerriegel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diefenbach</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>IntKB: A verifiable interactive framework for knowledge base completion</article-title>
          .
          <source>In: Proceedings of the 28th International Conference on Computational Linguistics (COLING)</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Leetaru</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schrodt</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>GDELT: Global data on events, location, and tone, 1979-2012</article-title>
          .
          <source>In: ISA Annual Convention</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sha</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Commonsense causal reasoning between short texts</article-title>
          .
          <source>In: Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Radinsky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidovich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markovitch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning causality for news events prediction</article-title>
          .
          <source>In: WWW</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Radinsky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvitz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Mining the web to predict future events</article-title>
          .
          <source>In: WSDM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sohrabi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Udrea</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feblowitz</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>IBM scenario planning advisor: Plan recognition as AI planning in practice</article-title>
          .
          <source>In: IJCAI</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <issue>10</issue>
          ),
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petroni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Josifoski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Zero-shot entity linking with dense entity retrieval</article-title>
          .
          <source>In: EMNLP</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A survey on extraction of causal relations from natural language text</article-title>
          .
          <source>CoRR abs/2101.06426</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>