<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology Design Pattern for Microblog Entries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cogan Shimizu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michelle Cheatham</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Semantics Laboratory, Wright State University</institution>
          ,
          <addr-line>Dayton, OH</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Motivation &amp; Scope</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Due to the exponential growth of the Internet of Things and use of Social Media Platforms, observers have an unprecedented level of detailed information available on the behavior of communities. However, due to the highly heterogeneous nature and the immense volume of the data, a composite view is difficult to generate. Such a composite view would be exceptionally useful in the realms of insider threat detection, after-action forensics, and hazardous situation detection and avoidance. The Semantic Web, via ontology modeling, offers a powerful tool for fusing the disparate data sources and formats. To this end, we have created an ontology design pattern (ODP) for the modeling of a simple microblog entry. This ODP is intended to fit within an ecosystem for fusing social media, support advanced visualization, and provide a preliminary framework for trust assessment.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation &amp; Scope</title>
      <p>investigating other visualizations in collaboration with domain experts from the
United States Air Force. In this paper, we describe a pattern for a microblog entry (MBE) as an
entry point into developing the ecosystem.</p>
      <p>The MBE pattern is important for a number of reasons. First, microblog
entries are representative of a fairly large subset of publicly available social media
data. For example, Twitter<sup>2</sup>, the popular public-facing microblogging platform,
allows a Tweet's payload to contain text, hyperlinks, images, or video. The
entries may also be geotagged and may explicitly refer to other users. Additionally,
there are many existing datasets that capture Tweets during natural disasters
and humanitarian crises (e.g., CrisisLex<sup>3</sup>).</p>
      <p>Second, by definition and intent, microtext<sup>4</sup> is simple; its model is relatively
straightforward and requires little of the complexity that OWL brings to the table.
Even so, this pattern is a fundamental building block
of the intended ODP ecosystem, and its simplicity makes it relatively
straightforward to fit with many existing patterns. Specifically, we foresee easy
integration with the ModifiedHazardousSituation Design Pattern [4] and
ReportingEvent [7]. As the ecosystem matures, we also foresee including existing
patterns regarding maps, climate, and public infrastructure.</p>
      <p>Finally, the MBE pattern has some components that allow for interesting
interaction: spatiotemporal extent and author trustworthiness. The spatiotemporal
extent of information is of particular interest to the modeling community, as there
are still many open questions on its handling, yet it is an integral part of
any sort of response or intelligence operation. In a perfect world, we could assume
that no author seeks to mislead or to propagate lies. However, in light of
recent events, as well as the ODP's relevance to crisis and operational intelligence
management, it is necessary to include a component for the trustworthiness of
an author. Thus, the model for the microblog entry seeks to answer, at least,
the following competency questions. Due to the strong emphasis on the geospatial
and temporal components of the fused data, we assume that these queries will
be executed using GeoSPARQL<sup>5</sup>.</p>
      <p>1. Who is the author of entry x?
2. What are all the entries authored by y?
3. What entries from time A to time B originate from region of interest C with
radius D?
4. What is the trust value v for author y?
5. What is the trust value v for entry x?
6. What entries from authors with a trust value greater than v originate from
region of interest C with radius D?
7. What entries relate to topic T?</p>
      <p>2 https://twitter.com
3 http://crisislex.org/
4 Microtext is any sufficiently short parcel of information in natural language. An
MBE is an instance of microtext.
5 http://www.opengeospatial.org/standards/geosparql</p>
      <p>(a) A Circle Packing visualization generated by D3<sup>6</sup>. Smaller circles
are related to the superimposed circle via subsumption, and proximity within the
same level of circles denotes a short semantic distance.
(b) A standard view of geographic information: pins on a map background. This
visualization can be updated in real time and allows the user to see incoming data.</p>
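      <p>As a rough illustration of what the competency questions ask of the data, questions 1, 2, 3, and 6 can be mimicked over in-memory records in plain Python. This is only a sketch and not a GeoSPARQL formulation: the class names, sample authors, coordinates, and trust values are all hypothetical, and a haversine distance stands in for a proper geospatial index.</p>

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass(frozen=True)
class Author:
    name: str
    trust: float  # hypothetical trust value in [0, 1]


@dataclass(frozen=True)
class Entry:
    entry_id: str
    author: Author
    payload: str
    posted: datetime
    location: Optional[Point] = None  # entries need not be geotagged


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def author_of(entries: List[Entry], entry_id: str) -> Author:
    """Question 1: who is the author of entry x?"""
    return next(e.author for e in entries if e.entry_id == entry_id)


def entries_by_author(entries: List[Entry], name: str) -> List[Entry]:
    """Question 2: all entries authored by y."""
    return [e for e in entries if e.author.name == name]


def entries_in_window_and_region(entries, start, end, centre, radius_km):
    """Question 3: entries posted between start and end within radius_km of centre."""
    return [
        e for e in entries
        if e.location is not None
        and end >= e.posted >= start
        and radius_km >= haversine_km(e.location, centre)
    ]


def trusted_entries_in_region(entries, min_trust, centre, radius_km):
    """Question 6: entries whose author's trust exceeds min_trust, within the region."""
    return [
        e for e in entries
        if e.location is not None
        and e.author.trust > min_trust
        and radius_km >= haversine_km(e.location, centre)
    ]


# Hypothetical sample data, for illustration only.
alice = Author("alice", 0.9)
bob = Author("bob", 0.3)
DAYTON = (39.7589, -84.1916)
COLUMBUS = (39.9612, -82.9988)
ENTRIES = [
    Entry("e1", alice, "Road closed near downtown", datetime(2017, 9, 6, 12, 0), DAYTON),
    Entry("e2", bob, "All clear here", datetime(2017, 9, 6, 13, 0), COLUMBUS),
    Entry("e3", alice, "No location on this one", datetime(2017, 9, 7, 9, 0)),
]
```

      <p>Entries without a location are skipped by the two regional queries, which mirrors the fact that a location is optional in the pattern.</p>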
      <p>Microtext is a valuable resource in the Semantic Web community, as
evidenced by [2, 8, 9, 10]. However, to our knowledge, this is the first attempt at
modeling an MBE as an entity, instead of only modeling extracted information.</p>
      <p>The rest of the paper is organized as follows. Section 2 addresses the design
decisions in the structure of the pattern and the accompanying axioms. Section 3
provides a motivating example and interaction with real data. Section 4 addresses
future work and collaborations.</p>
    </sec>
    <sec id="sec-2">
      <title>Pattern Overview</title>
      <p>This pattern was directly informed by the competency questions in the
preceding section; the competency questions are fairly straightforward and have a
one-to-one correspondence with the concepts in the pattern. As such, the microblog
entry pattern must capture both the entry's payload and its provenance. In
addition, it must capture any information extracted from the payload and from analysis
of the author, such as answers to the questions "To what is the microblog entry
referring?" or "How trusted is the author by their peers?"</p>
      <p>We discuss the main design aspects of this pattern by referring to its class
diagram, depicted in Figure 2. Yellow boxes indicate datatypes, and light blue
boxes with dashed borders indicate external patterns. Purple is used for external
classes belonging to PROV-O [5]. Green depicts external classes belonging to [7].
White arrowheads represent the rdfs:subClassOf relation.</p>
      <p>6 Circle Packing is an arrangement of circles on a surface so that all circles touch one
another. D3 is a powerful JavaScript library used for generating visualizations.</p>
      <p>Fig. 2: A graphical representation of the microblog entry design pattern. Yellow
boxes indicate datatypes, and light blue boxes with dashed borders indicate external
patterns. Purple is used for external classes belonging to PROV-O [5]. Green
is used for external classes belonging to [7]. White arrowheads represent the
rdfs:subClassOf relation.</p>
      <p>By indicating several of the classes as "external," we intend to convey that
the models for said classes are not indicative of the functionality of the
MicroblogEntry pattern. For example, in our implementation<sup>7</sup> the light blue boxes
are currently wrappers for datatypes. However, it is not hard to imagine
increasingly complex models for each class. Below, we discuss our implementation
and future iterations. We consider the pattern in the context of our use case:
event detection during a crisis. Furthermore, we assume that any microblog
entry populating the ontology occurs within the time frame of the crisis situation
and has been shown to be relevant to it.</p>
      <p>7 The OWL file can be found at https://raw.githubusercontent.com/cogan-shimizu-wsu/MicroblogEntryOWL/master/MicroblogEntry.owl</p>
      <p><bold>MicroblogEntry</bold> The MicroblogEntry is the core class. Here, we describe
a few restrictions placed upon its relations.
$\mathsf{MicroblogEntry} \sqsubseteq {=}1\,\mathsf{hasPayload}.\mathsf{Payload}$ (1)
$\mathsf{MicroblogEntry} \sqsubseteq {=}1\,\mathsf{hasAuthor}.\mathsf{Author}$ (2)
$\mathsf{MicroblogEntry} \sqsubseteq {\leq}1\,\mathsf{hasLocation}.\mathsf{Location}$ (3)
1. A MicroblogEntry has exactly one Payload.
2. A MicroblogEntry has exactly one Author.
3. A MicroblogEntry has at most one Location; it might not have a location attached to it.</p>
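      <p>The three restrictions above can be read as a simple data-quality check. The following Python sketch is only an illustration under stated assumptions: the EntryRecord class and its field names are hypothetical, restriction (3) is read as "at most one Location" as its description suggests, and the check is a closed-world count over asserted values, which is not the same as open-world OWL reasoning.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntryRecord:
    """Hypothetical flat record: each list holds the values asserted for one property."""
    payloads: List[str] = field(default_factory=list)
    authors: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)


def axiom_violations(record: EntryRecord) -> List[str]:
    """Return a description of each restriction the record violates (closed-world count)."""
    violations = []
    if len(record.payloads) != 1:
        violations.append("(1) exactly one Payload")
    if len(record.authors) != 1:
        violations.append("(2) exactly one Author")
    if len(record.locations) > 1:
        violations.append("(3) at most one Location")
    return violations
```

      <p>A record with one payload, one author, and no location passes all three checks, whereas an OWL reasoner would treat a missing payload as unknown rather than as an error.</p>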
      <p><bold>ReportingEvent</bold> The ReportingEvent pattern is documented in [7]. This
established pattern offers considerable interplay with MicroblogEntry and
provides structure for how information is shared.</p>
      <p>As ReportingEvent is itself a subclass of Situation, it will be reasonably
straightforward to integrate the ModifiedHazardousSituation [4] pattern with the
MicroblogEntry. Additionally, ReportingEvent provides a framework for connecting the
"report" to an ActualEvent; thus, along with Topic, it grounds the MicroblogEntry
in reality. Finally, the fact that a ReportingEvent isBasedOn a Source provides
us a vehicle for capturing the fact that a MicroblogEntry has been re-Tweeted or
shared (without modification).</p>
      <p><bold>Media</bold> The Media class allows us to represent the platform on which the
MicroblogEntry was posted. In the case of our example in the next section, this
would be Twitter. However, it is also conceivable that Media may represent
CNN, Fox News, BBC, and so on. Obviously, these establishments are fairly
complex in their own right.</p>
      <p>Media is also drawn from [7], though it is largely left for others to implement.
Monitoring different Media will be very important in our use case scenario,
especially when considering the TrustMetric for provenance and author. To this
point, it seems reasonable to expect the trustworthiness of the platform and
corporation to affect the trustworthiness of the reported data.</p>
      <p><bold>Payload</bold> The Payload is the content of the MicroblogEntry. In Figure 3, this is the
content in Box 2. For the general pattern, we opted to leave this as an external
pattern due to the expected heterogeneity of MBEs of different platforms and
even the high variance of content on the same platform. That is, Twitter allows for
many different payloads: text, hyperlinks, images, and videos. Facebook, on the
other hand, offers a superset of content types and no length restriction on text
payloads.</p>
      <p>In addition, we see the Payload playing a large role in defining how MBEs will
interact with each other. In the case of Tweets, a Tweet may be "Retweeted,"
thus embedding a Tweet inside of a Payload. Furthermore, a Payload may
"mention" another user or author. Our next steps will include ways to more accurately
model these relationships between Authors, Payloads, and MicroblogEntries.</p>
      <p>For our initial implementation, as our test sets do not include Tweets with
pictures or hyperlinks, Payload wraps an xsd:string. Additionally, a
MicroblogEntry is considered relevant only if its Payload is relevant, that is, if the
Payload refers to some Topic relevant to the crisis situation.</p>
      <p><bold>Topic</bold> In some cases, it may make sense to have Topic include a targeted list
of terms from a controlled vocabulary or, instead, to have the Topic act as a
category. For example, in [3], Tweets were partitioned into the following
categories: affected individuals, infrastructures and utilities, donations and
volunteer, caution and advice, sympathy and emotional support, useful information,
and unknown.</p>
      <p>Our implementation currently wraps an xsd:string. This allows us to
dynamically generate a Topic as Tweets are encountered. As the intended ODP
ecosystem matures, it is conceivable that this Topic sub-pattern will be more fully
fleshed out, allowing for more interesting interaction between MicroblogEntries
referencing the same Topic.</p>
      <p>
        <bold>Location</bold> There are many methods for representing location, e.g., the POI:Place
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] pattern or Well-Known Text (WKT) from OpenGIS, among others. To
promote reusability, we do not constrain the top-level pattern to use one or
another. In our implementation, however, we opted to use a WKT literal for
simplicity's sake. In the future, we expect to be able to augment this part of the
model by including relevant descriptors, such as the name of the location taken
from a gazetteer.
      </p>
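      <p>To make the WKT choice concrete, the sketch below formats and parses a two-dimensional WKT point in plain Python. It is a minimal illustration, not part of the published implementation: the function names are hypothetical, only plain decimal coordinates are handled (no exponent notation, no Z or M values), and the longitude-then-latitude order assumes the CRS84 axis order that GeoSPARQL uses by default.</p>

```python
import re
from typing import Tuple

# Matches a 2D WKT point with plain decimal coordinates, e.g. "POINT (-84.1916 39.7589)".
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def to_wkt_point(lon: float, lat: float) -> str:
    """Format a point as WKT, x (longitude) first and y (latitude) second."""
    return f"POINT ({lon} {lat})"


def parse_wkt_point(text: str) -> Tuple[float, float]:
    """Parse a 2D WKT point and return (lon, lat); raise ValueError otherwise."""
    match = _POINT.match(text)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {text!r}")
    return float(match.group(1)), float(match.group(2))
```

      <p>For instance, to_wkt_point(-84.1916, 39.7589) yields the string POINT (-84.1916 39.7589), and parse_wkt_point recovers the same coordinate pair from it.</p>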
      <p><bold>TrustMetric</bold> The TrustMetric sub-pattern has the potential to be the most
complex due to its far-reaching effects on the interplay between Author, Payload,
and Media. In addition, the actual metric for trust will need its own provenance
and uncertainty measures. Until the system is actually implemented, it will be
difficult to completely model. Thus, in our implementation, we assume we are
getting a value between 0 and 1 from some black-box system. As such, we wrap
xsd:double.</p>
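      <p>A wrapper of this kind can be sketched in a few lines of Python. The class and method names below are hypothetical, and the range check simply encodes the assumption stated above that the black-box system returns a value between 0 and 1; the literal it produces assumes the xsd: prefix is declared wherever it is used.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustMetric:
    """Hypothetical wrapper around the score returned by the black-box trust system."""
    value: float

    def __post_init__(self) -> None:
        # Reject anything outside [0, 1]; NaN fails the comparison and is rejected too.
        if not (1.0 >= self.value >= 0.0):
            raise ValueError(f"trust value must lie in [0, 1], got {self.value!r}")

    def as_turtle_literal(self) -> str:
        """Render the score as a typed literal in Turtle syntax."""
        return f'"{self.value}"^^xsd:double'
```

      <p>Keeping the score behind a small type like this leaves room to attach provenance and uncertainty later without changing the code that consumes the value.</p>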
    </sec>
    <sec id="sec-3">
      <title>Example Triples</title>
      <p>kast:Evacuation
## Extracted from Box 2
t:Topic;
"Car Accident"^^xsd:string;
## Extracted from Box 2
kast:examplepayload
rdf:type
kast:hasvalue
kast:referencesTopic
## As trust in author is distinct from trust in the MBE.
tm:TrustMetric;
.89^^xsd:double;
kast:CoganShimizu ## Extracted from Box 1
a prov:Person, prov:Agent;
foaf:givenName "Cogan Shimizu"^^xsd:string;
kast:hasTrustMetric kast:cogantm;
pz:Media, prov:Entity;</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>The Microblog Entry Ontology Design Pattern is a useful model for a very
commonplace structure, especially as the amount of social media data available for
inspection continues to increase. The potential applications of this pattern are
widespread, ranging from determining public sentiment and measuring affect to
investigating community formation and evolution on social media networks.</p>
      <p>The Microblog Entry pattern is foundational. On its own, it is not particularly
remarkable. However, in the ecosystem it plays a fundamental role. In similar
systems, it is analogous to entity extraction. Knowing the entities in play is
important, but ultimately provides only a small facet of a crisis situation. The
Microblog Entry pattern serves a similar role: it provides the threads to weave
a more comprehensive picture. At this time, the pattern relies heavily on many
external patterns, though many of them can be implemented as simple wrappers
for datatypes. Future work will focus on developing the ecosystem of ODPs
for building a Common Operating Picture for a crisis situation. We will also
investigate how the different visualizations can be affected by the trust metric.
As the work progresses, we will be working closely with domain experts in the
United States Air Force.</p>
      <p>Acknowledgement. The authors acknowledge support by the Dayton Area
Graduate Studies Institute (DAGSI) and input from Vincent Schmidt, Ph.D.</p>
      <p>2. S. P. Bhatt, H. Purohit, A. Hampton, V. Shalin, A. Sheth, and J. Flach. Assisting
coordination during crisis: A domain ontology based approach to infer resource
needs from tweets. In Proceedings of the 2014 ACM Conference on Web Science,
WebSci '14, pages 297–298, New York, NY, USA, 2014. ACM.</p>
      <p>3. G. Burel, H. Saif, M. Fernandez, and H. Alani. On semantics and deep learning for
event detection in crisis situations. 2017. Available from
http://semdeep.iiia.csic.es/files/SemDeep-17_paper_5.pdf on September 6, 2017.</p>
      <p>4. M. Cheatham, H. Ferguson, C. Vardeman, and C. Shimizu. Modified hazardous
situation ODP. 2017. Available from
http://www.michellecheatham.com/files/modification-hazardous-situation.pdf on September 6, 2017.</p>
      <p>5. P. Groth and L. Moreau, editors. PROV-Overview: An Overview of the PROV
Family of Documents. W3C Working Group Note 30 April 2013, 2013.</p>
      <p>6. P. Hitzler, A. Gangemi, K. Janowicz, A. Krisnadhi, and V. Presutti, editors.
Ontology Engineering with Ontology Design Patterns: Foundations and Applications.
Studies on the Semantic Web. IOS Press, Amsterdam/AKA Verlag, Heidelberg,
2016.</p>
      <p>7. E. Kowalczuk and A. Lawrynowicz. The reporting event ontology
design pattern and its extension to report news events. 2017. Available from
http://ontologydesignpatterns.org/wiki/images/a/ac/WOP2016_paper_18.pdf on September 6, 2017.</p>
      <p>8. M. B. Lazreg, M. Goodwin, and O. Granmo. Information abstraction from crises
related tweets using recurrent neural network. In L. S. Iliadis and I. Maglogiannis,
editors, Artificial Intelligence Applications and Innovations - 12th IFIP WG
12.5 International Conference and Workshops, AIAI 2016, Thessaloniki, Greece,
September 16-18, 2016, Proceedings, volume 475 of IFIP Advances in Information
and Communication Technology, pages 441–452. Springer, 2016.</p>
      <p>9. R. Nithish, S. Sabarish, M. N. Kishen, A. M. Abirami, and A. Askarunisa. An
ontology based sentiment analysis for mobile products using tweets. In 2013 Fifth
International Conference on Advanced Computing (ICoAC), pages 342–347, Dec
2013.</p>
      <p>10. P. Thakor and S. Sasi. Ontology-based sentiment analysis process for social media
content. Procedia Computer Science, 53:199–207, 2015. INNS Conference on Big
Data 2015 Program San Francisco, CA, USA 8-10 August 2015.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Alves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Antunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Pereira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bento</surname>
          </string-name>
          .
          <article-title>Semantic enrichment of places: Ontology learning from web</article-title>
          .
          <source>Int. J. Know.-Based Intell. Eng. Syst.</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>19</fpage>
          –
          <lpage>30</lpage>
          ,
          <month>Jan</month>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>