<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Novel Resource for NLP Downstream Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lars Michaelis</string-name>
          <email>lars.michaelis@hitec-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Junbo Huang</string-name>
          <email>junbo.huang@uni-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Usbeck</string-name>
          <email>ricardo.usbeck@uni-hamburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hamburger Informatik Technologie-Center (HITeC) e.V.</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hamburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Efficient Natural Language Processing (NLP) models require large amounts of training data. Manually creating training data is time-consuming. We present WikiEvents, an automatically curated dataset based on Wikipedia's Current events portal. WikiEvents is a novel knowledge graph that aims to provide data for various event-centric NLP tasks, such as event-related location extraction and entity linking. To this end, WikiEvents includes event summaries with linked entities and locations. WikiEvents also provides spatial and temporal information about extracted events for various use case analyses. We leverage the NLP Interchange Format (NIF) ontology and a novel event-specific ontology, CoyPu. We evaluate the suitability for NLP tasks by (1) training three BERT models on event-related location extraction with data queried from WikiEvents and (2) comparing WikiEvents to the existing entity linking dataset AIDA-YAGO2. Qualitative, event-related research capabilities are explored by querying data from WikiEvents for multiple use cases and visualizing it.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>With the rise of machine learning came an increasing need for large-scale training data.
Event-centric NLP comprises several subtasks. The NLP task of event-related location extraction (LE)
concentrates on annotating only the named location where an event is taking place within
a document. Multiple LE datasets were created using human annotation of different sources:
Lingad et al. [1] and Ji et al. [2] annotated tweets, Gupta and Nishu [3] annotated news articles,
and Wang et al. [4] annotated Wikipedia articles. The NLP task of entity linking (EL) involves
linking named entities in texts to their entities in knowledge bases. Existing EL datasets such as
AIDA-YAGO2 [5] and SAT-300 [6] do not focus on (crisis) event reports but rather on news or
encyclopedic articles in general. For more works w.r.t. EL, we refer the interested reader to Möller
et al. [7].</p>
      <p>The creation of datasets through human annotation is time and cost-intensive.
With
WikiEvents, we leverage an automatic extraction based on an existing source of
summarized events with manually linked named entities. That is, our system extracts content from
Wikipedia’s Current events portal1 to create a novel KG-based dataset for LE and EL tasks.
Furthermore, our system uses Wikipedia articles, Wikidata2 resources, the Nominatim API 3
and the Falcon 2.0 entity linker4 [8] to extract additional temporal, spatial and event information
about events and entities mentioned in the Current events portal.</p>
      <p>Our goal is to use WikiEvents for the NLP tasks of EL and event-related LE, as well as provide
the extracted event data for event-related research. For a sub-graph example based on the
Marshall Fires see Figure 1. To the best of our knowledge, no comprehensive dataset for these
tasks existed before.</p>
      <p>
        We evaluate WikiEvents on the mentioned NLP tasks by (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) comparing it to the existing EL
dataset AIDA-YAGO2 using performances of current EL models on both datasets as well as (
        <xref ref-type="bibr" rid="ref2">2</xref>
        )
training and evaluating event-related transformer-based location extractors. Finally, we explore
several use cases regarding possible event-related research questions.
      </p>
<p>The source code and a dataset sample are available at:</p>
<p>• The extractor of the dataset:
https://github.com/semantic-systems/current-events-to-kg</p>
<p>• The machine learning code for EL and LE:
https://github.com/semantic-systems/coypu-current-events-for-ml</p>
<p>• A sample of the dataset including training and test samples for EL and LE:
https://www.fdr.uni-hamburg.de/record/11447</p>
<p>1https://en.wikipedia.org/wiki/Portal:Current_events
2https://www.wikidata.org/
3https://nominatim.openstreetmap.org/ui/search.html
4https://labs.tib.eu/falcon/falcon2/</p>
    </sec>
    <sec id="sec-2">
      <title>2. WikiEvents Knowledge Graph</title>
<p>The WikiEvents knowledge graph is automatically generated with the Wikipedia Current events
portal as its primary data source. It is stored in the Resource Description Framework (RDF) 5 and serialized
as JSON-LD 6.</p>
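To make the serialization concrete, the following is a minimal sketch of how a single event node might look as JSON-LD. The identifier scheme and the property names below are illustrative assumptions for this example, not the exact CoyPu/NIF terms used in the published graph.

```python
import json

# Hypothetical sketch of one WikiEvents event summary serialized as JSON-LD.
# Prefixes, identifiers, and property names are assumptions for illustration.
event_node = {
    "@context": {
        "coy": "https://schema.coypu.org/global/",
        "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    },
    "@id": "wikievents:event/2022-01-01/1",        # assumed identifier scheme
    "@type": "coy:Event",
    "coy:hasMentionDate": "2022-01-01",             # assumed property name
    "nif:isString": "Example event summary text.",
}

doc = json.dumps(event_node, indent=2)
print(doc)
```

Serializing RDF as JSON-LD keeps the graph consumable both by RDF tooling and by plain JSON parsers.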
      <p>First, we give an overview of the general structure of WikiEvents. Afterward, the following
subsections will provide details about each kind of included information.</p>
<p>Two types of events are extracted from the Current events portal: event summaries and topics.
Both event types link to Wikipedia articles. Articles linked to topics further describe the topic
event, while articles in event summaries describe named entities. The event summaries are
grouped into sections of different categories. Entities for locations are created if an article is
identified as describing a location, see Figure 2.</p>
      <sec id="sec-2-1">
        <title>2.1. Event Information</title>
        <p>The CoyPu ontology7 is used to encode the event-related data of topics and event summaries. It
was developed as part of the CoyPu8 project which aims to increase the resilience of companies
during crises. We acknowledge the existence of ontologies which could replace the CoyPu
Ontology, such as CIDOC-CRM [9] or Simple Event Model (SEM) [10]. However, we decided
to use a niche ontology because using established ontologies would contradict the project
requirements.</p>
<p>Event summaries are short summaries of significant real-world events. They include
hyperlinks from named entities to Wikipedia articles. The event summary entities are linked
to the category under which they were extracted, which can be used to roughly classify them
(e.g., Armed conflicts and attacks or Disasters and accidents). More specific event types
for event summaries (flood, election, ...) are provided by using the event types linked to the
Wikidata entity of the parent topic.
5https://www.w3.org/TR/rdf11-concepts/
6https://www.w3.org/TR/json-ld11/
7https://schema.coypu.org/global/
8https://coypu.org/</p>
<p>The NLP Interchange Format (NIF) ontology [11] is used to encode the links from named
entities in event summaries to their entities, i.e., Wikipedia articles. Each event summary is
first split into sentences, to which the included named entities are linked. These named-entity
entities link to the entity of the Wikipedia article that their hyperlink originally referenced. The
news sources mentioned in the event summaries are linked to the respective event summary.</p>
<p>Topics can, but do not need to, reference a Wikipedia article and are therefore mapped to two
classes. The class for topic entities with a linked article is a subclass of the second class, since
the additional article only extends the set of possible properties. Topics mentioned at
different points are mapped to identical entities if they either link to the same article or have
the same headline when no article is linked.</p>
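This merging rule amounts to a deduplication key: the linked article if one exists, otherwise the headline. A minimal sketch, with assumed field names:

```python
# Illustrative sketch of the topic-merging rule described above: two topic
# mentions map to the same entity if they link to the same article, or, when
# no article is linked, if they share the same headline. The field names
# "article" and "headline" are assumptions for this example.
def topic_key(topic: dict) -> tuple:
    if topic.get("article"):
        return ("article", topic["article"])
    return ("headline", topic["headline"])

mentions = [
    {"headline": "2022 storms", "article": "Storm_Malik"},
    {"headline": "January storms", "article": "Storm_Malik"},  # same article
    {"headline": "Protests", "article": None},
    {"headline": "Protests", "article": None},                 # same headline
]
unique = {topic_key(m) for m in mentions}
print(len(unique))  # → 2
```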
<p>Both topics and event summaries link to the event under which they are listed in the Current
events portal as their parent event. Event summaries and topics can only be sub-events of
topics, since event summaries are leaves within a tree-structured list of mentioned events.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Wikipedia Articles</title>
<p>The Geonames ontology9 is used to generate entities for Wikipedia articles referenced by topics
and linked in event summaries. Each article entity is linked to the metadata from the Wikipedia article's schema
graph and to its Wikidata entity. The one-hop graph extracted around the Wikidata entity is
additionally included to avoid the need for simple queries to Wikidata endpoints.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Spatial Information</title>
<p>WikiEvents includes location entities, coordinates of locations, and WKT-encoded boundaries
of locations. In addition to marking an article as a Wikipedia link/entity, we create a location entity if the
Wikipedia article is identified as describing a location or if the article is referenced by a topic.</p>
<p>Our method of identifying location articles follows the toponym identification method of
Wang et al. [4]. They concluded that Wikipedia articles about locations
should be identified by determining whether an article uses specific infobox templates 10. Additionally, we
check the infobox's HTML table element for whether its class attribute includes location-related template
classes, e.g., ib-island. This addition increased the recall of the location identification process
from 93.3% to 94.3% when evaluated on identifying locations for January 2022.</p>
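The class-attribute check can be sketched with the standard-library HTML parser. This is a minimal illustration, not the dataset's actual extractor, and the set of location-related classes below is a small assumed subset:

```python
from html.parser import HTMLParser

# Sketch of the infobox check described above: look at the class attribute of
# an infobox's <table> element for location-related template classes such as
# "ib-island". The class list is an illustrative subset only.
LOCATION_CLASSES = {"ib-island", "ib-settlement", "ib-country"}

class InfoboxClassScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.is_location = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = set((dict(attrs).get("class") or "").split())
            if "infobox" in classes and classes & LOCATION_CLASSES:
                self.is_location = True

def describes_location(article_html: str) -> bool:
    scanner = InfoboxClassScanner()
    scanner.feed(article_html)
    return scanner.is_location

html_doc = '<table class="infobox ib-island"><tr><td>...</td></tr></table>'
print(describes_location(html_doc))  # → True
```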
        <p>
We encode hierarchical information about the locations by linking location entities together.
This is useful for event-related LE, so that only the most specific location in an event
summary is labeled as the location of the event. We extract location hierarchies through (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) links in article
infoboxes under location-describing keys, (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) entities linked by Falcon 2.0 under these keys, and
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) querying Wikidata for parent locations of entities.
        </p>
<p>Boundaries of locations are queried from the OpenStreetMap database using the Nominatim
geocoding service. Additionally, we check the Wikidata entity of the Wikipedia article for
linked spatial information from OpenStreetMap.
9https://www.geonames.org/ontology/documentation.html
10Listed here: https://en.wikipedia.org/wiki/Wikipedia:List_of_infoboxes/Place</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Temporal Information</title>
        <p>
          All events are linked to dates on which they have been mentioned in the Current events
portal. Additionally, we employ the infoboxes of topic articles to extract more specific temporal
information about the topic event. We developed a pattern-based parser to parse the values of
specific infobox keys. The parser extracts (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) dates, (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) times, (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) spans of date and/or time, and
(
          <xref ref-type="bibr" rid="ref4">4</xref>
) UTC timezones. Similar to the CIDOC-CRM ontology, the temporal information extracted
from the infobox is saved to a timespan entity, since this enables the inference of timespan-timespan
relations. We also include the parsed source strings from the infobox to enable supervised
machine learning on timespan extraction in the future. In WikiEvents, the period from January 2022 to
December 2022 contains 3683 unique topics, 1303 of which have a linked timespan entity, drawn
from a total of 1167 unique timespan entities.
        </p>
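The pattern-based parsing described above can be sketched with a few regular expressions. The real parser handles many more formats; the patterns below (one date format, a date span, and a UTC timezone) are purely illustrative:

```python
import re

# Minimal sketch of a pattern-based infobox value parser in the spirit of the
# one described above; formats covered here are an assumed subset.
DATE = r"\d{1,2} \w+ \d{4}"

def parse_infobox_value(value: str) -> dict:
    span = re.fullmatch(rf"({DATE})\s*[-–]\s*({DATE})", value)
    if span:
        return {"type": "span", "start": span.group(1), "end": span.group(2)}
    if re.fullmatch(DATE, value):
        return {"type": "date", "value": value}
    utc = re.fullmatch(r"UTC([+-]\d{1,2}(:\d{2})?)", value)
    if utc:
        return {"type": "timezone", "offset": utc.group(1)}
    return {"type": "unknown", "value": value}

print(parse_infobox_value("1 January 2022 – 3 January 2022"))
print(parse_infobox_value("UTC+5:30"))
```

Keeping the raw source string alongside the parsed result, as the paper describes, is what makes later supervised learning on timespan extraction possible.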
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
<p>We evaluated the usefulness of WikiEvents w.r.t. NLP downstream task applicability via
event-related location extraction and entity linking tasks.</p>
      <sec id="sec-3-1">
        <title>3.1. Event-Related Location Extraction</title>
        <p>Event-related location extraction is a special subtask of LE, where only the event location
is extracted. For example, the following sentence “The United States Embassy in Kyiv calls
on Russia to ”fully comply” with the ceasefire in Donbas after pro-Russian forces shelled the
strategic Hnutove entry-exit checkpoint and a humanitarian road corridor.”11 contains multiple
location mentions. Event-related LE only extracts where the event took place (Kyiv) and ignores
other location mentions (United States, Russia, Donbas, Hnutove).</p>
<p>Since WikiEvents is a KG, various information can be retrieved using custom SPARQL queries.
Therefore, we created a query to retrieve training data for event-related LE and tested it by
fine-tuning multiple transformer-based models and evaluating their performance.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Querying Training Samples</title>
          <p>
            Our query has three logical steps: (
            <xref ref-type="bibr" rid="ref1">1</xref>
            ) selecting named location entities for each event summary
(candidate locations), (
            <xref ref-type="bibr" rid="ref2">2</xref>
            ) filtering out less specific candidate locations, e.g., Germany in Hamburg,
Germany, and (
            <xref ref-type="bibr" rid="ref3">3</xref>
) choosing the location of the event from the remaining candidate locations. The
first step is done via a SPARQL query. The second step (filtering) is facilitated by the extracted
hierarchy between locations. This reduced the number of event summaries with multiple
location candidates by 31.7% in the queried data. The third step is performed using a heuristic
that takes the first location candidate as the event location.
          </p>
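The three steps can be sketched as follows. Step (1) is a SPARQL query (the predicate names in the string are placeholders, not the exact CoyPu/NIF terms); steps (2) and (3) run in plain Python, dropping a candidate when it is an ancestor (i.e., less specific) of another candidate and then taking the first remaining link:

```python
# Step (1): candidate selection via SPARQL (placeholder predicates).
CANDIDATE_QUERY = """
SELECT ?summary ?location WHERE {
  ?summary a :EventSummary ;
           :mentionsLocation ?location .   # placeholder predicate names
}
"""

def choose_event_location(candidates: list[str], parents: dict[str, str]) -> str:
    """Steps (2) and (3): filter less specific candidates, take the first."""
    def ancestors(loc):
        seen = set()
        while loc in parents:
            loc = parents[loc]
            seen.add(loc)
        return seen
    all_ancestors = set().union(*(ancestors(c) for c in candidates))
    specific = [c for c in candidates if c not in all_ancestors]  # step (2)
    return specific[0]                                            # step (3)

parents = {"Hamburg": "Germany"}
print(choose_event_location(["Hamburg", "Germany"], parents))  # → Hamburg
```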
<p>To analyze the quality of our heuristics, we looked at the event summaries from January
2023 (n=276). In 77.5% of event summaries, the first link was a correct event location entity. In
85.5% of event summaries, the first link was part of a set of correct event location entities. We
are aware that choosing only one location as the assumed true event location is a systematic
problem with this approach, since multiple affected locations can be mentioned. Thus, further
research into selecting the right candidates is required to utilize this dataset's full
potential.
11Source: https://en.wikipedia.org/wiki/Portal:Current_events/February_2022</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Fine-Tuning Setting</title>
<p>Three different uncased BERT models (DistilBERT, BERT base, and BERT large) were fine-tuned on the
data to evaluate how model size affects performance. The uncased models were chosen since
we assumed that the case of the text holds limited value for the detection of location information.
We modeled the task as a token classification task. Hyperparameters were taken from those
suggested by Devlin et al. [12] (learning rate = 3e-5, batch size = 16, AdamW optimizer). No epoch
limit in combination with early stopping was used to ensure full training. Following Mosbach
et al. [13], a warm-up phase of 10% of total training steps was employed (taking 4 training
epochs as a prediction of total training epochs).</p>
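The warm-up computation is simple arithmetic: 10% of the total training steps, with 4 epochs as the predicted training length (early stopping makes the true length unknown in advance). A sketch, where the ceiling division and the sample count plugged in are our own illustration:

```python
# Warm-up steps = warmup_ratio * predicted_epochs * steps_per_epoch,
# following the setting described above (ratio 0.1, 4 predicted epochs,
# batch size 16).
def warmup_steps(n_samples: int, batch_size: int = 16,
                 predicted_epochs: int = 4, warmup_ratio: float = 0.1) -> int:
    steps_per_epoch = -(-n_samples // batch_size)  # ceiling division
    return int(warmup_ratio * predicted_epochs * steps_per_epoch)

# With the 16451 queried samples: ceil(16451 / 16) = 1029 steps per epoch.
print(warmup_steps(16451))  # → 411
```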
<p>The training samples from WikiEvents were queried from January 2020 to December 2022,
resulting in 16451 samples. We based the following experiments on an 80/10/10 train-eval-test
split for fine-tuning.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Results</title>
          <p>The performance metrics of each model are shown in Table 1. From the minor differences in
performance between the models, one can conclude that model size is not a major limiting factor for
performance.</p>
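The 80/10/10 train-eval-test split used for these experiments can be sketched as follows; the shuffling seed and the integer-based sample representation are assumptions for illustration:

```python
import random

# Illustrative 80/10/10 split of the 16451 queried samples described above.
def split_samples(samples: list, seed: int = 42):
    rng = random.Random(seed)
    samples = samples[:]          # avoid mutating the caller's list
    rng.shuffle(samples)
    n_train = int(0.8 * len(samples))
    n_eval = int(0.1 * len(samples))
    return (samples[:n_train],
            samples[n_train:n_train + n_eval],
            samples[n_train + n_eval:])

train, eval_, test = split_samples(list(range(16451)))
print(len(train), len(eval_), len(test))  # → 13160 1645 1646
```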
<p>Additionally, we evaluated the DistilBERT model on the event summaries from January
2023, which were previously used to evaluate the heuristic for selecting the best location link.
One limitation of reusing this evaluation dataset is that the correct location is not always the
most specific one, in cases where the most specific location was not hyperlinked (e.g., “Hamburg” could not
be annotated in “Hamburg, Germany” when only “Germany” is hyperlinked). To counteract
this, wrong location predictions were manually checked for such location annotation limitations
(13 were found). The model identified all event locations in 70.3% of event summaries, while at
least one location was identified in 79% of event summaries (65.6% and 75.4% without manual
reevaluation).</p>
          <p>In the future, the next step will be improving the quality of the data samples for training. In
particular, we suspect that we need a better third step, i.e., identifying the correct event location.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Entity Linking</title>
        <p>The second analyzed NLP task for the WikiEvents dataset is entity linking. The hypothesis is
that entity linking data samples can be queried with enough quality and quantity to train entity
linking models. To test this hypothesis, a query was constructed to get data samples usable for
entity linking tasks. Moreover, the constructed dataset, together with AIDA-YAGO2 (AIDA)
dataset, was evaluated with two existing EL models.</p>
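To fix ideas, the following is a hypothetical shape for one queried entity-linking sample: a summary text plus character-offset mention spans linked to entity identifiers. All field names and identifiers are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical entity-linking sample: text plus mention spans with entity IDs.
sample = {
    "text": "Storm Malik hits Denmark.",
    "mentions": [
        {"start": 0, "end": 11, "entity": "Storm_Malik"},
        {"start": 17, "end": 24, "entity": "Denmark"},
    ],
}

# Mention offsets should reproduce the surface strings exactly.
for m in sample["mentions"]:
    assert sample["text"][m["start"]:m["end"]] == m["entity"].replace("_", " ")
print("offsets consistent")
```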
        <sec id="sec-3-2-1">
          <title>3.2.1. Experimental Setting</title>
          <p>The used EL models are BLINK [14] and ELQ [15]. BLINK is an entity linking model based on
a two-stage fine-tuned BERT architecture. ELQ uses an end-to-end (one-stage) entity linking
BERT model for linking entities, primarily in short texts such as questions. These models were
used since no openly accessible trained models were found for other entity linking models
(DeepType [16], BERT-Entity [17]) and training would have required considerable resources.</p>
          <p>The data samples representing WikiEvents were queried from a WikiEvents dataset extracted
from 01/2020 to 12/2022 12. In total, we created 20630 samples with a list of mentions each.
Since AIDA consists of longer news articles as source texts, it only has 1392 samples but with
more mentions per sample. The queried WikiEvents entity linking data has 70241 mentions
with on average 3.5 mentions in each sample. The AIDA dataset has only 27812 mentions but
with 20 mentions per sample on average.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Experimental Results</title>
<p>[Table 2: support, accuracy, precision, and recall of the evaluated EL models on WikiEvents and AIDA; only fragments of the original table (WikiEvents 75.8, AIDA 81.6) are recoverable here.]</p>
<p>As shown in Table 2, BLINK clearly outperforms ELQ. One limiting factor of ELQ in prediction
mode is the truncation of longer input texts. Since AIDA has many longer articles, entity
mentions are cut off, which results in a lower recall. Since BLINK receives mentions individually
with the surrounding context, it has a significantly higher recall on AIDA. An additional
explanation for the higher recall of BLINK on AIDA could be unlinked entities in WikiEvents. A
third explanation is that entities that are unknown to both models are included in WikiEvents.
Since both models were trained on Wikipedia dumps from August 2019, they do not include the
events from 2020 onward present in this WikiEvents dataset. The significantly better precision
of ELQ on AIDA provides evidence for this. Our evaluation shows that WikiEvents can be
a useful alternative dataset. It easily scales compared to the static, human-annotated AIDA
dataset.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Event-Related Use Cases</title>
      <p>
Beyond its use as an NLP dataset, WikiEvents can be used as an event knowledge base. This
section explores this through two example use cases: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) using WikiEvents for sub-event analysis
and (
        <xref ref-type="bibr" rid="ref2">2</xref>
) recognizing areas affected by an event. A newer WikiEvents dataset was used here than
in Section 3 to include the latest event data up to February 2023.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Sub-Event Analysis</title>
<p>The relation between parent events and sub-events can give insights into how events and their effects
and implications are related. WikiEvents extracts these relations from the Current events portal
structure. For example, Figure 3 shows all sub-events used for grouping event summaries
(topics) of the 2022 Russian invasion of Ukraine. Analyzing the number of sub-events regarding
intergovernmental relations of an event could be beneficial for estimating the magnitude of an
event.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Analyzing Event Development</title>
        <p>
          To estimate how an event develops, you can analyze the number of sub-events over time.
The Current events portal often relies on media coverage of the event. This coverage is then
summarized by volunteers. Thus, the frequency and amount of sub-events regarding one
event could indicate the coverage and presence of this event in people’s lives. Following the
previous example event, Figure 4 shows the number of event summaries linked to the 2022
Russian invasion of Ukraine. You can observe (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) a substantial decrease in the number of event
summaries after the start of the invasion and (
          <xref ref-type="bibr" rid="ref2">2</xref>
) slight temporary increases starting around September, when the Ukrainian counteroffensive
took place, and at the beginning of 2023.
        </p>
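Counting linked event summaries per time bucket, as in Figure 4, can be sketched with a simple frequency count; the dates below are illustrative, not taken from the dataset:

```python
from collections import Counter

# Sketch of the sub-event frequency analysis described above: count event
# summaries linked to one parent topic per month (ISO dates assumed).
linked_summary_dates = ["2022-02-24", "2022-02-25", "2022-03-01",
                        "2022-09-10", "2022-09-11", "2022-09-12"]

per_month = Counter(date[:7] for date in linked_summary_dates)
print(per_month.most_common())  # → [('2022-09', 3), ('2022-02', 2), ('2022-03', 1)]
```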
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Locating Events Geographically</title>
        <p>Using the included spatial information in WikiEvents, events with a linked location correspond
to a specific area. WikiEvents’ spatial data also enables filtering for area-specific events and,
thus, entities in this area, e.g., companies in cities under attack.</p>
<p>Following the previous example again, you could visualize the number of events mentioning
specific areas at different times during February and August 2022, as shown in Figure 5.</p>
        <p>From these maps, you can observe that WikiEvents is able to link larger areas to events
and vice versa. The detail (size of areas) is dictated by how much the authors summarized the
original events in the Current events portal. The shown areas match information from the
Institute for the Study of War about both months of the invasion [18, 19, 20].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Related Work</title>
<p>Multiple event-related datasets have been created for different goals. The GDELT Project [21]
generates event data from news articles to study human societal-scale behavior and beliefs across
the world. The ACLED Project [22] collects crisis and political events to map and analyze them.
EM-DAT [23] is a dataset of mass disaster events to improve disaster-related decision-making
at the (inter-)national level. EventWiki [24] manually identified Wikipedia articles about major
events. They extracted event-related data from infoboxes and article texts while classifying each
event into 95 event types based on the used infobox template. The first three use a relational
data model for storing extracted events, while this can only be assumed for EventWiki. In
contrast, EventKG [25] uses the knowledge graph data model to consolidate event data from
multiple sources into a common format. EventKG focuses on the completeness of temporal
information regarding events. Intended use cases are Digital Humanities and NLP tasks like
question answering, timeline generation, and language- or community-specific cross-cultural
studies. Table 3 shows an overview of the mentioned datasets.</p>
      <p>
The closest dataset to WikiEvents is EventKG. Both are knowledge graphs and include events
with temporal and spatial information. The main differences of WikiEvents compared to EventKG
are: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) the inclusion of links between event summaries and mentioned entities, (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) the inclusion
of boundary data of locations, (
        <xref ref-type="bibr" rid="ref3">3</xref>
) only identifying named entities as locations but not as actors, (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) not being multilingual, only including information in English, and (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) having a
less abstract ontology in order to include source-specific information. Figure 6 compares the
graph structures of WikiEvents and EventKG.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Summary and Future Work</title>
      <p>
We presented the novel WikiEvents knowledge graph, which is extracted automatically from
the Wikipedia Current events portal and other data sources. It mainly targets NLP tasks
such as EL and event-related LE. We evaluated these capabilities by (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) comparing it to the
existing human-annotated EL dataset AIDA-YAGO2 and (
        <xref ref-type="bibr" rid="ref2">2</xref>
) training three BERT models on
event-related LE. We also described multiple use cases of WikiEvents for event-related
research, exploring the 2022 Russian invasion of Ukraine. The use cases included sub-event
analysis, event development analysis, and event localization. Finally, to highlight differences,
we compared the dataset to existing event knowledge graphs such as EventKG.
      </p>
      <p>In the near future, we will improve the event location identification heuristic in news
summaries and continuously improve the temporal and spatial information extractor since
Wikipedia’s website constantly evolves. To foster machine learning research, we will create
larger task-specific datasets with dedicated train-validation-test splits. Our approach to training
event-related LE BERT models using WikiEvents can be further evaluated by comparing the
model performances to models trained on comparable datasets.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>
        This research was supported by grants from NVIDIA and utilized 2 x NVIDIA RTX A5000 24GB.
Furthermore, we acknowledge the financial support from the Federal Ministry for Economic
Affairs and Energy of Germany in the project CoyPu (project number 01MK21007[G]).
      </p>
      <p>[13] M. Mosbach, M. Andriushchenko, D. Klakow, On the stability of fine-tuning BERT:
misconceptions, explanations, and strong baselines, in: ICLR, OpenReview.net, 2021.
[14] L. Wu, F. Petroni, M. Josifoski, S. Riedel, L. Zettlemoyer, Scalable zero-shot entity linking
with dense entity retrieval, in: EMNLP (1), Association for Computational Linguistics,
2020, pp. 6397–6407.
[15] B. Z. Li, S. Min, S. Iyer, Y. Mehdad, W. Yih, Efficient one-pass end-to-end entity linking for
questions, in: EMNLP (1), Association for Computational Linguistics, 2020, pp. 6433–6441.
[16] J. Raiman, O. Raiman, DeepType: Multilingual entity linking by neural type system
evolution, in: AAAI, AAAI Press, 2018, pp. 5406–5413.
[17] S. Broscheit, Investigating entity knowledge in BERT with simple neural end-to-end entity
linking, in: CoNLL, Association for Computational Linguistics, 2019, pp. 677–685.
[18] M. Clark, G. Barros, K. Stepanenko, Russian Offensive Campaign Assessment, February 28, 2022,
https://understandingwar.org/backgrounder/russian-offensive-campaign-assessment-february-28-2022,
2022. Accessed: 2022-09-27.
[19] K. Stepanenko, L. Philipson, K. Lawlor, F. W. Kagan, Russian Offensive Campaign Assessment, August 1,
https://understandingwar.org/backgrounder/russian-offensive-campaign-assessment-august-1,
2022. Accessed: 2022-09-27.
[20] K. Stepanenko, K. Hird, G. Barros, F. W. Kagan, Russian Offensive Campaign Assessment, August 31,
https://understandingwar.org/backgrounder/russian-offensive-campaign-assessment-august-31,
2022. Accessed: 2022-09-27.
[21] P. A. Schrodt, Automated production of high-volume, real-time political event data, in:
APSA 2010 annual meeting paper, 2010.
[22] C. Raleigh, R. Kishi, Updates to the Armed Conflict Location &amp; Event Data Project,
2020. URL: https://acleddata.com/acleddatanew/wp-content/uploads/2020/10/ACLED_UpdatesOverview_2020.pdf.
Accessed: 2022-11-14.
[23] EM-DAT, Disaster profile for floods. EM-DAT: International Disaster Database, 2006.
[24] T. Ge, L. Cui, B. Chang, Z. Sui, F. Wei, M. Zhou, EventWiki: A knowledge base of major
events, in: LREC, European Language Resources Association (ELRA), 2018.
[25] S. Gottschalk, E. Demidova, EventKG - the hub of event knowledge on the web - and
biographical timeline generation, CoRR abs/1905.08794 (2019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lingad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Location extraction from disaster-related microblogs</article-title>
          , in: WWW (Companion Volume),
          <source>International World Wide Web Conferences Steering Committee / ACM</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>1017</fpage>
          -
          <lpage>1020</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sun</surname>
          </string-name>
          , G. Cong, J. Han,
          <article-title>Joint recognition and linking of fine-grained locations from tweets</article-title>
          , in: WWW, ACM,
          <year>2016</year>
          , pp.
          <fpage>1271</fpage>
          -
          <lpage>1281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nishu</surname>
          </string-name>
          ,
          <article-title>Mapping local news coverage: Precise location extraction in textual news content using fine-tuned BERT based language model</article-title>
          ,
          <source>in: Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>162</lpage>
          . URL: https://aclanthology.org/2020.nlpcss-1.17. doi:10.18653/v1/2020.nlpcss-1.17.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <article-title>Neurotpr: A neuro-net toponym recognition model for extracting locations from social media messages</article-title>
          ,
          <source>Trans. GIS</source>
          <volume>24</volume>
          (
          <year>2020</year>
          )
          <fpage>719</fpage>
          -
          <lpage>735</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Yosef</surname>
          </string-name>
          , I. Bordino,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fürstenau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinkal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spaniol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Taneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thater</surname>
          </string-name>
          , G. Weikum,
          <article-title>Robust disambiguation of named entities in text</article-title>
          , in: EMNLP,
          ACL
          ,
          <year>2011</year>
          , pp.
          <fpage>782</fpage>
          -
          <lpage>792</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mandalios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tzamaloukas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chortaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stamou</surname>
          </string-name>
          ,
          <article-title>GEEK: incremental graph-based entity disambiguation</article-title>
          ,
          <source>in: LDOW@WWW</source>
          , volume
          <volume>2073</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Usbeck</surname>
          </string-name>
          ,
          <article-title>Survey on English entity linking on Wikidata: Datasets and approaches</article-title>
          ,
          <source>Semantic Web</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>925</fpage>
          -
          <lpage>966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <article-title>Falcon 2.0: An entity and relation linking tool over Wikidata</article-title>
          , in: CIKM, ACM,
          <year>2020</year>
          , pp.
          <fpage>3141</fpage>
          -
          <lpage>3148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Doerr</surname>
          </string-name>
          ,
          <article-title>The CIDOC CRM, an Ontological Approach to Schema Heterogeneity</article-title>
          , in: Y. Kalfoglou,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schorlemmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          , M. Uschold (Eds.),
          <source>Semantic Interoperability and Integration</source>
          , volume
          <volume>4391</volume>
          <source>of Dagstuhl Seminar Proceedings (DagSemProc)</source>
          ,
          <source>Schloss Dagstuhl - Leibniz-Zentrum für Informatik</source>
          , Dagstuhl, Germany,
          <year>2005</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://drops.dagstuhl.de/opus/volltexte/2005/35. doi:10.4230/DagSemProc.04391.22.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W. R.</given-names>
            <surname>van Hage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Malaisé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Segers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hollink</surname>
          </string-name>
          , G. Schreiber,
          <article-title>Design and use of the simple event model (SEM)</article-title>
          ,
          <source>J. Web Semant</source>
          .
          <volume>9</volume>
          (
          <year>2011</year>
          )
          <fpage>128</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hellmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brümmer</surname>
          </string-name>
          ,
          <article-title>Integrating NLP using linked data</article-title>
          ,
          <source>in: ISWC (2)</source>
          , volume
          <volume>8219</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2013</year>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          , in: NAACL-HLT (1), Association for Computational Linguistics
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>