=Paper=
{{Paper
|id=None
|storemode=property
|title=Denoting Data in the Grounded Annotation Framework
|pdfUrl=https://ceur-ws.org/Vol-1035/iswc2013_poster_3.pdf
|volume=Vol-1035
|dblpUrl=https://dblp.org/rec/conf/semweb/ErpFVTHSSH13
}}
==Denoting Data in the Grounded Annotation Framework==
Marieke van Erp (1), Antske Fokkens (1), Piek Vossen (1), Sara Tonelli (2), Willem
Robert van Hage (3), Luciano Serafini (2), Rachele Sprugnoli (2), and Jesper
Hoeksema (1)
(1) VU University Amsterdam, {marieke.van.erp,antske.fokkens,piek.vossen,j.e.hoeksema}@vu.nl
(2) Fondazione Bruno Kessler, {satonelli,serafini,sprugnoli}@fbk.eu
(3) SynerScope B.V., willem.van.hage@synerscope.com
Abstract. Semantic web applications are integrating data from more
and more different types of sources about events. However, most data annotation
frameworks do not translate well to the semantic web. We describe
the grounded annotation framework (GAF), a two-layered framework
that aims to build a bridge between mentions of events in a data source
such as a text document and their formal representation as instances.
By choosing a two-layered approach, neither the mention layer, nor the
semantic layer needs to compromise on what can be represented. We
demonstrate the strengths of GAF in flexibility and reasoning through a
use case on earthquakes in Southeast Asia.
1 Introduction
Semantic web applications are ingesting data from more and more different
sources such as output from natural language processing applications, sensor
data, videos or financial transactions. Each of these domains has its own data
annotation practices, which first need to be reconciled with semantic web standards.
One issue with integrating information from different sources is that representation
formats tend to look at their domain in isolation, making it difficult
to integrate information that comes from other domains.
The Grounded Annotation Framework (GAF) [1] aims at addressing this
problem by distinguishing instance mentions, which can be domain-specific, from
instances, which conform to domain-independent semantic web standards. In this manner,
we can integrate information extracted, for example, by NLP tools or from
sensor data in a formal context that can be shared by different applications
and over which we can perform reasoning. This paper addresses the advantages
of using GAF from the point of view of users of Linked Data.
We will describe GAF in Section 2, present an example in Section 3 and
conclude with pointers for future work in Section 4.
2 The Grounded Annotation Framework
The main property of GAF is that it distinguishes instances from instance
mentions. A mention is the act of referring to an object, whereas an instance
is the object itself. The relation between instances and mentions is defined by
gaf:denotedBy, which is the only new predicate GAF introduces. Different resources
(or even the same resource) may refer to an instance in different ways
and each of these references may have properties of its own. This is quite common
in natural language, where authors tend to alternate terms to refer to the
same object for stylistic reasons, but it can also play a role in other sources of
information. If, for instance, a sensor displays a measured temperature, this displayed
value has properties of its own that are clearly not properties of the value
that was measured, such as the instrument that was used to measure it and its
error rate. In the remainder of this contribution, we will illustrate GAF through
the example of representing instances in the Simple Event Model (SEM) [2] and
mentions in the TERENCE Annotation Format (TAF) [3], which represents linguistic
properties.
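The instance/mention distinction can be illustrated with a minimal sketch that stores triples as plain Python tuples. It follows the sensor example above; apart from gaf:denotedBy, every identifier (the ex: names, ex:measuredWith, ex:errorRate and the values) is a hypothetical placeholder and not part of GAF.

```python
# Minimal sketch: triples as plain (subject, predicate, object) tuples.
# All identifiers except gaf:denotedBy are illustrative, not taken from GAF.
DENOTED_BY = "gaf:denotedBy"

triples = {
    # Instance layer: one temperature value, described independently of any source.
    ("ex:temperature_1", "rdf:type", "ex:Temperature"),
    # The same instance is denoted by two different mentions.
    ("ex:temperature_1", DENOTED_BY, "ex:mention_display_A"),
    ("ex:temperature_1", DENOTED_BY, "ex:mention_display_B"),
    # Mention layer: properties that belong to each reading, not to the value itself.
    ("ex:mention_display_A", "ex:measuredWith", "ex:sensor_A"),
    ("ex:mention_display_A", "ex:errorRate", "0.5"),
    ("ex:mention_display_B", "ex:measuredWith", "ex:sensor_B"),
    ("ex:mention_display_B", "ex:errorRate", "0.1"),
}


def mentions_of(instance, graph):
    """Return the sorted mentions that denote the given instance."""
    return sorted(o for s, p, o in graph if s == instance and p == DENOTED_BY)


def properties_of(node, graph):
    """Return {predicate: object} for a node, excluding gaf:denotedBy links."""
    return {p: o for s, p, o in graph if s == node and p != DENOTED_BY}
```

Calling mentions_of("ex:temperature_1", triples) lists both readings, while properties_of on the instance returns only its type: the error rates stay on the mentions.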
SEM is a model to express who did what, where, and when. It is not the
only RDF model to describe events, but as SEM is not tied to any domain
and is among the most flexible, we chose this model as the core of our semantic
layer. It should be noted, however, that in principle any RDF schema can be
integrated into GAF. TAF is designed to annotate coreference relations between
event mentions as well as participants, locations and temporal expressions, which
covers the kind of information also represented in SEM. TAF has the additional
advantage that it already distinguishes between instances and instance mentions
for participants and locations. We use a slightly adapted variant of TAF that
extends this distinction to events and temporal expressions as described in [1].
We chose TAF as it is based on the ISO-TimeML standard and fits our event
use case; however, any representation format can be used in GAF.
The gaf:denotedBy relation is used to link events represented in SEM to
specific mentions represented in TAF. If a linguistic analysis identifies a syntactic
relation between an event mention and the mention of a person, we can derive
that this person is an Actor of the event in SEM according to the analysis of
a specific text. Mentions thus play an important role in modelling provenance
of information. To model provenance we use the PROV-O ontology [4] as it
is compatible with our RDF representation and is recommended by W3C for
provenance modelling. When we represent alternative views in SEM, these views
are linked to the mentions they were derived from. This leads us back to the original
source and hence to information on who expressed which view.
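The derivation described above, from a relation between mentions to an Actor relation between instances, might be sketched as follows. This is an illustration under assumptions, not the authors' conversion code: the ex: identifiers are hypothetical, and taf:hasParticipant is used only because it appears as a label in Figure 2; its exact semantics are assumed.

```python
DENOTED_BY = "gaf:denotedBy"

# Instance-to-mention links (identifiers are hypothetical).
denoted_by = {
    ("ex:event_1", DENOTED_BY, "ex:mention_e1"),
    ("ex:person_1", DENOTED_BY, "ex:mention_p1"),
}

# Mention-level relation found by a linguistic analysis of one text.
# The predicate name is taken from a Figure 2 label; its semantics are assumed.
mention_relations = {("ex:mention_e1", "taf:hasParticipant", "ex:mention_p1")}


def instance_for(mention, links):
    """Look up the instance denoted by a mention, or None if it is unlinked."""
    for s, p, o in links:
        if p == DENOTED_BY and o == mention:
            return s
    return None


def derive_actors(relations, links):
    """Lift participant relations between mentions to sem:hasActor between instances.

    Each derived triple is paired with the mentions it came from, so the
    statement can be traced back to its source text.
    """
    derived = []
    for ev_mention, pred, part_mention in sorted(relations):
        if pred != "taf:hasParticipant":
            continue
        event = instance_for(ev_mention, links)
        actor = instance_for(part_mention, links)
        if event and actor:
            derived.append(((event, "sem:hasActor", actor), (ev_mention, part_mention)))
    return derived
```

Keeping the originating mentions next to each derived triple is one simple way to preserve the link from a semantic statement back to the text that supports it.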
Creating GAF Annotations
GAF annotations can be created by starting from either the linguistic layer or
the semantic layer. When starting from text for the mention layer, TAF
annotations are first added to the text using the CELCT Annotation Tool (CAT) [5]; these are
then translated to SEM relations using a conversion script. Instances extracted
from a particular source (for example a document) are grouped into named
graphs, to which provenance information is added. We use manually defined
rules for mapping TAF to SEM, but plan to use machine learning in the future.
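Grouping extracted statements into one named graph per source, with provenance attached to the graph, could look roughly like the sketch below. The graph naming scheme and all ex: identifiers are hypothetical; prov:wasAttributedTo is the PROV-O property name, and choosing it here is an assumption (Figure 2 labels the corresponding edge prov:attributedTo).

```python
from collections import defaultdict


def group_into_named_graphs(extracted):
    """Group (source, triple) pairs into one named graph per source.

    Returns (quads, provenance): quads are (s, p, o, graph_id) tuples and
    provenance holds one attribution triple per graph.
    """
    by_source = defaultdict(set)
    for source, triple in extracted:
        by_source[source].add(triple)

    quads, provenance = [], []
    for index, source in enumerate(sorted(by_source), start=1):
        graph_id = f"ex:graph_{index}"  # hypothetical naming scheme
        quads.extend((s, p, o, graph_id) for s, p, o in sorted(by_source[source]))
        # Attribute the whole graph, not each triple, to its source (assumed property).
        provenance.append((graph_id, "prov:wasAttributedTo", source))
    return quads, provenance


# Hypothetical output of a TAF-to-SEM conversion for two documents.
extracted = [
    ("ex:document_A", ("ex:event_1", "sem:hasActor", "ex:person_1")),
    ("ex:document_A", ("ex:event_1", "sem:hasPlace", "ex:place_1")),
    ("ex:document_B", ("ex:event_1", "sem:hasActor", "ex:person_2")),
]
quads, provenance = group_into_named_graphs(extracted)
```

Attaching provenance at graph level means that every statement inherits its source from the graph it sits in, without repeating the attribution per triple.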
When starting from the semantic layer, events and event properties are linked
to textual mentions. We are currently working towards an annotation environment
based on CROMER [6], which will allow the user to switch easily between
the linguistic and semantic layers.
3 Examples
The example sentences shown in Figure 1 both contain information about the
2004 Indian Ocean Earthquake and Tsunami. The articles disagree on the cause
of the earthquake: Bloomberg ascribes it to moving tectonic plates, whereas
Veterans Today sees a stealth attack submarine as the likely cause. Figure 2 shows
that these two declarations can co-exist within the GAF representation of the
earthquake. It is up to the application or user accessing the information to interpret
the fact that there is a contradiction and, for example, select only particular
sources for further processing. GAF provides the glue to connect non-semantic
web data to semantic web representation formats. The rdfs:isDefinedBy relation
at the top of Figure 2 shows how RDF predicates can be used to link GAF
representations to external resources such as the Linked Open Data cloud.1
"Indonesia lies in a zone where the Indo-Australian, Eurasian, Philippine and Pacific plates
meet and occasionally shift, causing earthquakes and sometimes generating tsunamis. There
have been hundreds of earthquakes in Indonesia since a 9.1 temblor in 2004 caused a
tsunami that swept across the Indian Ocean, devastating coastal communities and leaving more
than 220,000 people dead in Indonesia, Sri Lanka, India, Thailand and other countries."
(Bloomberg, 2009-01-07 01:55 EST)
"...were most concerned about the cause, scope, and consequences of the December 26, 2004
Indian Ocean tsunamis because they were far bigger and more destructive than they had
anticipated. More important, it had no clear alibi that their most likely source of the
disaster, the Multi-Mission Platform of the new stealth attack submarine, the USS Jimmy
Carter, had not been the culprit."
(Veterans Today, 2011-10-02)
Fig. 1. Sample sentences mentioning the December 2004 Indonesian earthquake
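How two contradicting causal claims can sit side by side, and how an application might select sources, can be sketched with quads as plain tuples. Apart from the two dbpedia source names, all identifiers (including ex:causes and the graph names) are placeholders rather than the instance IDs of Figure 2, and the functions are hypothetical helpers, not part of GAF.

```python
# Quads: (subject, predicate, object, named_graph). Identifiers other than the
# two dbpedia source names are placeholders, not the instance IDs of Figure 2.
quads = [
    ("ex:plate_shift", "ex:causes", "ex:earthquake_2004", "ex:graph_bloomberg"),
    ("ex:submarine", "ex:causes", "ex:earthquake_2004", "ex:graph_veterans_today"),
]

# Which source each named graph is attributed to.
attribution = {
    "ex:graph_bloomberg": "dbpedia:Bloomberg",
    "ex:graph_veterans_today": "dbpedia:Veterans_Today",
}


def causes_by_source(event, quads, attribution):
    """Map each source to the set of causes it asserts for the event."""
    result = {}
    for s, p, o, g in quads:
        if p == "ex:causes" and o == event:
            result.setdefault(attribution[g], set()).add(s)
    return result


def is_contested(event, quads, attribution):
    """True if at least two sources assert different cause sets for the event."""
    views = causes_by_source(event, quads, attribution).values()
    return len({frozenset(v) for v in views}) > 1


def select_sources(quads, attribution, trusted):
    """Keep only the quads whose named graph is attributed to a trusted source."""
    return [q for q in quads if attribution[q[3]] in trusted]
```

Both claims stay in the data; deciding which to use is left to the consumer, for instance by calling select_sources with a set of trusted sources.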
4 Conclusions and Future Work
We have presented GAF, a grounded annotation framework for integrating information
from various sources. We have shown its flexibility in representing
contradicting information from different textual sources.
We are currently developing an annotation tool that allows users to easily
switch between linguistic and semantic annotation layers. After that, we plan
to develop tools supporting easy integration of other types of information, such
as data from the Linked Open Data cloud, video metadata or sensor data.
1 http://groundedannotationframework.org/ provides full examples and the GAF definition.
Acknowledgements
This research is supported by the European Union's 7th Framework Programme
via the NewsReader Project (ICT-316404) and by the BiographyNet project,
funded by the Netherlands eScience Center (http://esciencecenter.nl/).
[Figure 2: graph diagram; only its labels survive in the extracted text.
Instances: gaf:INSTANCE_179, gaf:INSTANCE_181, gaf:INSTANCE_186, gaf:INSTANCE_188, gaf:INSTANCE_197, gaf:INSTANCE_200, gaf:INSTANCE_201, gaf:INSTANCE_202.
Mentions: taf:INSTANCE_MENTION_40, taf:INSTANCE_MENTION_112, taf:INSTANCE_MENTION_118, taf:INSTANCE_MENTION_120, anchored (str:anchorOf) to "plates"@en, "shift"@en, "earthquakes"@en, "temblor"@en, "tsunami"@en.
Graph labels: gaf:G2, gaf:G3, gaf:G4.
External resources: dbpedia:2004_Indian_Ocean_earthquake_and_tsunami, dbpedia:Tectonic_Plate, dbpedia:Sundra_Trunch, dbpedia:USS_Jimmy_Carter_(SSN_23), dbpedia:Bloomberg, dbpedia:Veterans_Today, wn30:synset-shift-verb-4, wn30:synset-tsunami-noun-1, wn30:synset-earthquake-noun-1, wn30:synset-stable-adjective-1.
Other labels: sem:Event, sem:EventType, sem:Place, sem:hasActor, sem:hasLocation, sem:subEventOf, sem+:causes, gaf:causes, gaf:denotedBy, taf:hasParticipant, taf:causal_c, _nsubj, rdf:type, rdfs:isDefinedBy, skos:exactMatch, owl:objectProperty, prov:attributedTo, prov:wasGeneratedBy, taf:annotation_2013_03_24.]
Fig. 2. GAF representation of Earthquake example
References
1. Fokkens, A., van Erp, M., Vossen, P., Tonelli, S., van Hage, W.R., Serafini, L.,
Sprugnoli, R., Hoeksema, J.: GAF: A grounded annotation framework for events.
In: Proceedings of the first Workshop on Events: Definition, Detection, Coreference
and Representation, Atlanta, USA (2013)
2. Van Hage, W.R., Malaisé, V., Segers, R., Hollink, L., Schreiber, G.: Design and use
of the simple event model (SEM). Journal of Web Semantics (2011)
3. Moens, M.F., Kolomiyets, O., Pianta, E., Tonelli, S., Bethard, S.: D3.1: State-of-
the-art and design of novel annotation languages and technologies: Updated version.
Technical report, TERENCE project-ICT FP7 Programme-ICT-2010-25410 (2011)
4. Moreau, L., Missier, P., Belhajjame, K., B’Far, R., Cheney, J., Coppens, S.,
Cresswell, S., Gil, Y., Groth, P., Klyne, G., Lebo, T., McCusker, J., Miles, S., Myers, J.,
Sahoo, S., Tilmes, C.: PROV-DM: The PROV Data Model. Technical report, W3C
(2012)
5. Bartalesi Lenzi, V., Moretti, G., Sprugnoli, R.: CAT: the CELCT Annotation Tool.
In: Proceedings of LREC 2012. (2012)
6. Bentivogli, L., Girardi, C., Pianta, E.: Creating a Gold Standard for Person Cross-
Document Coreference Resolution in Italian News. In: Workshop on Resources
and Evaluation for Identity Matching, Entity Resolution and Entity Management.
(2008)