<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Oktie Hassanzadeh</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predicting the Future with Wikidata and Wikipedia</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0001</volume>
      <abstract>
        <p>In this demonstration, we present a prototype knowledge-based event forecasting system powered by Wikidata and Wikipedia. The system relies on existing event-related concepts and relations in Wikidata to build a base knowledge graph of events and consequences. It then uses a combination of state-of-the-art knowledge extraction methods to augment the base knowledge graph using natural language descriptions of events and their consequences that can be found in Wikipedia articles. Using a number of use case scenarios, we show how the resulting knowledge graph can be used as part of a human-in-the-loop explainable solution for event forecasting and analysis.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Wikipedia is a rich source of knowledge about major events and their
consequences. Major newsworthy events often result in many additions and new pages
describing various aspects of the events in detail. In particular, there are often
descriptions of causes and effects of events, either explicitly in text, or implicitly
in statements, sections, or descriptions of timelines of events. Figure 1 shows
a few examples of such sources of causal knowledge around COVID-19 related
events. An effective representation of this knowledge in the form of a rich
knowledge graph can enable a deep analysis of past events and their consequences. This
can in turn be used as a mechanism for predicting the potential consequences of
ongoing events by mapping them to similar past events in the knowledge graph.</p>
      <p>
        Wikidata [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] aims at representing the rich knowledge available in Wikipedia
in structured form. As shown in Figure 1, there are existing causal relations
such as has cause and has effect between many event-related concepts. We
will show in our demonstration how these existing links can be used for an
analysis of potential effects of a given type of event. More importantly, we show how
we can turn the existing links into a base knowledge graph of events and
consequences, and then use the textual descriptions of events in Wikipedia articles to
augment the base knowledge graph. In what follows, we describe the
architecture of our prototype forecasting solution. We then present a brief sketch of our
demonstration plan.
      </p>
      <p>[Figure 1: examples of sources of causal knowledge around COVID-19 related events (from: https://en.wikipedia.org/wiki/COVID-19_pandemic)]</p>
      <p>Knowledge Graph of Events and Consequences. A base knowledge graph
is curated from existing concepts and links in Wikidata. Since our goal in this
work is the analysis of major newsworthy events, we only include in the base
knowledge graph those event types for which at least one instance has an existing
link to a Wikinews article. This way, we ensure that out of the thousands of
subclasses of type occurrence (Q1190554) and their instances, we only include
events that are likely to receive news coverage. We then query for all the
existing causal relations in Wikidata using properties such as has effect (P1542),
contributing factor of (P1537), immediate cause of (P1536) and their
inverse properties. We then group the event types that are linked directly or
through their instances. Each link between event types is also annotated with
a set of base scores derived from simple frequency analysis, e.g., the number of
example pairs of instances, the number of triples for the event type and its
instances, and the number of Wikipedia pages linked to instances of the type. The
result is a collection of event types and their consequences, along with examples
for each cause-effect pair and scores that can be used for ranking the potential
consequences of a given event.</p>
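      <p>As a rough illustration of the construction described above, and not the implementation used in the prototype, the following Python sketch builds a SPARQL query over the three causal properties named in the text and lifts instance-level cause-effect pairs to event-type links scored by their number of example pairs. The function names, the input data layout, and the use of the example count as the only base score are assumptions made for illustration; the Wikinews filter and the inverse properties are omitted for brevity.</p>
      <preformat>
```python
from collections import defaultdict

# Causal properties named in the text (Wikidata property IDs).
CAUSAL_PROPERTIES = {
    "P1542": "has effect",
    "P1537": "contributing factor of",
    "P1536": "immediate cause of",
}


def build_causal_query(limit=1000):
    """Return a SPARQL query string for cause-effect pairs (hypothetical helper).

    It relies on the wdt: prefix that the Wikidata Query Service predefines.
    """
    values = " ".join(f"wdt:{pid}" for pid in CAUSAL_PROPERTIES)
    return (
        "SELECT ?cause ?prop ?effect WHERE {\n"
        f"  VALUES ?prop {{ {values} }}\n"
        "  ?cause ?prop ?effect .\n"
        f"}} LIMIT {limit}"
    )


def group_by_event_type(instance_pairs, instance_types):
    """Lift instance-level (cause, effect) pairs to event-type links.

    instance_pairs: iterable of (cause_id, effect_id) tuples.
    instance_types: dict mapping an instance ID to a list of its type IDs;
        an ID missing from the dict is treated as an event type itself.
    Returns a dict mapping (cause_type, effect_type) to its example pairs.
    """
    links = defaultdict(list)
    for cause, effect in instance_pairs:
        for cause_type in instance_types.get(cause, [cause]):
            for effect_type in instance_types.get(effect, [effect]):
                links[(cause_type, effect_type)].append((cause, effect))
    return dict(links)


def rank_consequences(links, cause_type):
    """Rank effect types of cause_type by their number of example pairs."""
    scored = [
        (effect_type, len(examples))
        for (c_type, effect_type), examples in links.items()
        if c_type == cause_type
    ]
    # Highest count first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```
      </preformat>
      <p>Keeping the example pairs alongside each type-level link means the same structure can supply both a score and the supporting examples for a given consequence.</p>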
      <p>[Figure: architecture of the prototype. Components: Knowledge Graph of Events and Consequences; Causal Knowledge Extraction Pipelines, comprising Unsupervised Causal Knowledge Extraction (Neural QA Models; Pattern Matching + Neural NLI Models), Distantly Supervised Models for Causal Knowledge Extraction (Neural Relation Extraction Models), and Temporal Event Analysis (Temporal Event Models; Timeline Extraction); Dashboard (News Analysis, Profile, Effects Analysis, Event Analysis, Cause-Effect Analysis).]</p>
      <p>Causal Knowledge Extraction Pipelines. The base knowledge graph is
augmented with causal knowledge extracted from Wikipedia articles using a number
of causal knowledge extraction pipelines. This augmentation can take the form
of a) finding new consequences for a given event of interest, b) finding new
example cause-effect pairs of instances for a pair of event types, and c) calculating
scores reflecting the likelihood or significance of a causal relation between two
events. Given the variety of ways that causal knowledge can be captured in
Wikipedia documents, as depicted in Figure 1, we need a number of different
knowledge extraction approaches. In this demonstration, we show examples of
three different kinds of pipelines we have implemented in our prototype.</p>
      <p>
        Unsupervised Causal Knowledge Extraction: 1) An approach relying on pattern
matching and neural Natural Language Inference (NLI) models. Briefly, the
approach we show is an adaptation of the approach of Bhandari et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which
is a fully unsupervised pipeline with a high precision of nearly 80% in manual
evaluations. We link the output phrases to Wikidata concepts using BLINK [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
keeping only high-confidence links. 2) An approach relying on neural Question
Answering (QA) models that a) generates questions using a set of templates,
such as "What could X cause?" or "What was a major consequence of X?",
where X is the label of an event type or instance, b) uses pre-trained neural QA
models and Wikipedia articles associated with the target event to retrieve an
answer to each generated question, and c) performs entity linking to link the
answer to Wikidata.
      </p>
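      <p>The three steps of the QA-based approach can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the QA model and the entity linker are passed in as plain callables (answer_fn and link_fn), and these names, their return shapes, and the min_score threshold are hypothetical and do not describe the models used in the prototype. Only the two question templates are taken from the text.</p>
      <preformat>
```python
# The two templates quoted in the text; {x} is replaced by an event label.
QUESTION_TEMPLATES = (
    "What could {x} cause?",
    "What was a major consequence of {x}?",
)


def generate_questions(event_label, templates=QUESTION_TEMPLATES):
    """Step a): instantiate every template with the event label."""
    return [template.format(x=event_label) for template in templates]


def extract_consequences(event_label, article_text, answer_fn, link_fn,
                         min_score=0.5):
    """Steps b) and c): answer each question, then link the answers.

    answer_fn(question, context) returns (answer_text, score) and stands in
    for a pre-trained neural QA model. link_fn(answer_text) returns a
    Wikidata ID or None and stands in for an entity linker. Both callables
    and the min_score threshold are assumptions of this sketch.
    """
    results = []
    for question in generate_questions(event_label):
        answer, score = answer_fn(question, article_text)
        if score >= min_score:
            entity_id = link_fn(answer)
            # Keep only answers that the linker could map to Wikidata.
            if entity_id is not None:
                results.append((question, answer, entity_id, score))
    return results
```
      </preformat>
      <p>Passing the models in as callables keeps the sketch independent of any particular QA or entity linking library.</p>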
      <p>
        Supervised Models for Causal Knowledge Extraction: We use neural models [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ] trained on existing annotated data such as the BECauSE Corpus 2.0 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
for extraction of cause-effect phrases from a corpus of event-related Wikipedia
articles, and perform entity linking [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on the phrases to map them to Wikipedia
and then Wikidata concepts in the base knowledge graph. We are also exploring
distantly supervised models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] by constructing a training set through finding
passages containing labels of pairs of events in the base knowledge graph, and
using a neural model of relation extraction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to extract new causal relations.
      </p>
      <p>
        Temporal Event Analysis: This pipeline first extracts event timelines from
timeline sections and pages (examples shown in Figure 1), then maps the extracted
sequences to Wikidata events, and then applies existing and novel temporal event
models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to the sequences of events. This will facilitate more complex analysis
of potential temporal and causal relations between event types, along with
likelihood scores that better support the ranking of potential consequences for
a given event and context.
      </p>
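      <p>The training-set construction for distant supervision described above, finding passages that contain the labels of both events of a known pair, can be illustrated with the short Python sketch below. It is a simplified assumption of how such matching might work: the function name, the input layout, and the whole-word, case-insensitive label match are illustrative choices, and a real pipeline would likely also need aliases and entity linking.</p>
      <preformat>
```python
import re


def find_training_passages(passages, event_pairs, labels):
    """Collect passages that mention the labels of both events in a known pair.

    passages: list of strings (e.g., paragraphs of Wikipedia articles).
    event_pairs: list of (cause_id, effect_id) pairs from the base graph.
    labels: dict mapping an event ID to its label.
    Returns a list of (passage, cause_id, effect_id) weakly labeled examples.
    """
    def mentions(text, label):
        # Whole-word, case-insensitive match of the label.
        pattern = r"\b" + re.escape(label) + r"\b"
        return re.search(pattern, text, re.IGNORECASE) is not None

    examples = []
    for passage in passages:
        for cause_id, effect_id in event_pairs:
            cause_found = mentions(passage, labels[cause_id])
            effect_found = mentions(passage, labels[effect_id])
            if cause_found and effect_found:
                examples.append((passage, cause_id, effect_id))
    return examples
```
      </preformat>
      <p>The resulting examples are only weakly labeled: a passage that mentions both events does not necessarily state a causal relation between them, which is why a relation extraction model is trained on top of them.</p>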
      <p>Dashboard. The user dashboard exposes a number of API functions that use
the knowledge graph to assist the user with event analysis and forecasting. The
APIs allow the user to 1) retrieve the latest news events and their context, 2)
retrieve a list of potential consequences for a given event/context, along with
an explanation in the form of examples of similar past events and their consequences, and
3) rank and re-rank the consequences based on different criteria. The user can
optionally define a profile that will be used for ranking the consequences based
on how interesting or surprising each consequence could be for the user.</p>
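      <p>One way to picture the ranking and profile-based re-ranking step is the Python sketch below. It is a hypothetical illustration only: the record fields ("label", "topics", "base_score"), the topic-weight profile, and the multiplicative boost are assumptions, and the text does not specify how the prototype combines scores with a user profile.</p>
      <preformat>
```python
def rank_with_profile(consequences, profile_weights=None, criterion="base_score"):
    """Rank consequence records, optionally boosted by a user profile.

    consequences: list of dicts with a "label", a "topics" list, and one or
        more numeric score fields (the field names are hypothetical).
    profile_weights: dict mapping a topic to a weight expressing user interest.
    criterion: name of the score field used for ranking.
    """
    profile_weights = profile_weights or {}

    def final_score(item):
        # Sum the weights of all topics the user's profile cares about.
        boost = sum(profile_weights.get(topic, 0.0) for topic in item["topics"])
        return item[criterion] * (1.0 + boost)

    return sorted(consequences, key=final_score, reverse=True)
```
      </preformat>
      <p>Calling the function again with a different criterion or profile re-ranks the same list without recomputing the underlying scores.</p>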
    </sec>
    <sec id="sec-2">
      <title>Demonstration Plan</title>
      <p>
        We plan to use a number of use cases involving different recent or ongoing events,
and show the ranked list of consequences according to the base knowledge graph
as well as different versions of the knowledge graph based on the extraction
method used for knowledge augmentation. For this initial prototype
demonstration, our primary focus will be on showing the quality and coverage of different
versions of the knowledge graph, and how simple major consequences of different
types of events are present or missing in different versions. We will use examples
of different types of events, including: 1) a "protest" event, e.g., recent protests
in Myanmar, highlighting some of the high-ranked extracted causes and
consequences from past protest events, which include a coup d'état; 2) a "disease
outbreak" event as the cause while excluding COVID-19 related articles from
our source, showing how some actual consequences of the COVID-19 outbreak
show up in the ranked results of different pipelines; 3) a hypothetical natural
disaster event with context-specific forecasts, e.g., similar to the work
of Radinsky et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we show how an earthquake (Q7944) at a location near an
ocean would result in a forecast of a tsunami (Q8070). We will also highlight a
number of challenging examples and wrong forecasts and discuss a number of
directions for future work that could turn this simple prototype into a powerful
and reliable AI assistant for analysts.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>This research is based upon work supported in part by U.S. DARPA KAIROS
Program No. FA8750-19-C-0206. The views and conclusions contained herein are
those of the authors and should not be interpreted as necessarily representing the
official policies, either expressed or implied, of DARPA, or the U.S. Government.
The U.S. Government is authorized to reproduce and distribute reprints for
governmental purposes notwithstanding any copyright annotation therein.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Awasthy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ni</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Florian</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>IBM MNLP IE at CASE 2021 task 1: Multigranular and multilingual event detection on protest news</article-title>
          .
          In:
          <source>Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021)</source>
          . pp.
          <fpage>138</fpage>
          –
          <lpage>146</lpage>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bhandari</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feblowitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sohrabi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Unsupervised causal knowledge extraction from text using natural language inference (student abstract)</article-title>
          .
          In:
          <source>Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021</source>
          . pp.
          <fpage>15759</fpage>
          –
          <lpage>15760</lpage>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bhattacharjya</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Order-dependent event models for agent interactions</article-title>
          .
          In:
          <source>Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020</source>
          . pp.
          <fpage>1977</fpage>
          –
          <lpage>1983</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandelwal</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhary</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wenzek</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guzmán</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          .
          In:
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          . pp.
          <fpage>8440</fpage>
          –
          <lpage>8451</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dunietz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levin</surname>
            ,
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carbonell</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          :
          <article-title>The BECauSE corpus 2.0: Annotating causality and overlapping relations</article-title>
          .
          In:
          <source>Proceedings of the 11th Linguistic Annotation Workshop, LAW@EACL</source>
          . pp.
          <fpage>95</fpage>
          –
          <lpage>104</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>B.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Efficient one-pass end-to-end entity linking for questions</article-title>
          .
          In:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020</source>
          . pp.
          <fpage>6433</fpage>
          –
          <lpage>6441</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mintz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bills</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Distant supervision for relation extraction without labeled data</article-title>
          .
          In:
          <source>ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP</source>
          . pp.
          <fpage>1003</fpage>
          –
          <lpage>1011</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Radinsky</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidovich</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markovitch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Learning to predict from textual data</article-title>
          .
          <source>J. Artif. Intell. Res</source>
          .
          <volume>45</volume>
          ,
          <fpage>641</fpage>
          –
          <lpage>684</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Soares</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>FitzGerald</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwiatkowski</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Matching the blanks: Distributional similarity for relation learning</article-title>
          .
          In:
          <source>Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019</source>
          . pp.
          <fpage>2895</fpage>
          –
          <lpage>2905</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Wikidata: a free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <issue>10</issue>
          ),
          <fpage>78</fpage>
          –
          <lpage>85</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petroni</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Josifoski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Scalable zero-shot entity linking with dense entity retrieval</article-title>
          .
          In:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020</source>
          . pp.
          <fpage>6397</fpage>
          –
          <lpage>6407</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>