An Ontology Design Pattern for Microblog Entries

Cogan Shimizu and Michelle Cheatham

Data Semantics Laboratory, Wright State University, Dayton, OH, USA

Abstract. Due to the exponential growth of the Internet of Things and use of Social Media Platforms, observers have an unprecedented level of detailed information available on the behavior of communities. However, due to the highly heterogeneous nature and the immense volume of the data, a composite view is difficult to generate. Such a composite view would be exceptionally useful in the realms of insider threat detection, after-action forensics, and hazardous situation detection and avoidance. The Semantic Web, via ontology modeling, offers a powerful tool for fusing the disparate data sources and formats. To this end, we have created an ontology design pattern (ODP) for the modeling of a simple microblog entry. This ODP is intended to fit within an ecosystem for fusing social media, support advanced visualization, and provide a preliminary framework for trust assessment.

1 Motivation & Scope

In recent years, access to data has become increasingly trivial as Social Media Platforms and the Internet of Things (IoT) continue to grow. However, important latent or implicit information runs the risk of being obscured by the sheer volume of collected data. Further, the data is presented and accessed via highly disparate vectors (e.g. microblog entries, visual media, and geotagged textual data). Thus, it is increasingly necessary to identify and develop methods for seamless fusion and visualization of information extracted from heterogeneous social media data.

Such methods are especially important for obtaining an accurate and comprehensive view of a crisis theater or battlespace (e.g. formulating a “Common Operating Picture”¹). For these use cases, it is also important to take into account the provenance and trustworthiness of the acquired data and of any conclusions drawn from such data.
To support the fusion of such heterogeneous data and the capture of its metadata, we will build an ecosystem of ontology design patterns [6]. ODPs enable sophisticated visualizations that leverage the inherent concept hierarchy, such as models displaying varying levels of granularity and interconnectedness. Figure 1 provides two examples of possible visualization methods that the microblog entry (MBE) pattern will help support. We are currently investigating other visualizations in collaboration with domain experts from the United States Air Force. In this paper, we describe a pattern for an MBE as an entry point into developing the ecosystem.

¹ A Common Operating Picture is a single identical display of relevant operational information on materiel shared by more than one Command. This term is frequently

The MBE pattern is important for a number of reasons. First, microblog entries are representative of a fairly large subset of publicly available social media data. For example, Twitter², the popular, public-facing microblogging platform, allows a Tweet’s payload to contain text, hyperlinks, images, or video. The entries may also be geotagged and may explicitly refer to other users. Additionally, there are many existing datasets that capture Tweets during natural disasters and humanitarian crises (e.g. CrisisLex³).

By definition and intent, microtext⁴ is simple; its model is relatively straightforward and requires little of the complexity that OWL brings to the table. Regardless, it is important to note that this pattern is a fundamental building block of the intended ODP ecosystem. Because of its simplicity, it is also relatively straightforward to fit with many existing patterns. Specifically, we foresee easy integration with the ModifiedHazardousSituation Design Pattern [4] and ReportingEvent [7]. As the ecosystem matures, we also foresee including existing patterns regarding maps, climate, and public infrastructure.
Finally, the MBE pattern has some components that allow for interesting interaction: spatiotemporal extent and author trustworthiness. Spatiotemporal extent of information is of particular interest to the modeling community, as there are still many open questions on its handling. However, it is an integral part of any sort of response or intelligence operation. In a perfect world, we could assume that no author seeks to mislead or to propagate lies. However, in light of recent events, as well as the ODP’s relevance to crisis and operational intelligence management, it is necessary to include a component for the trustworthiness of an author.

Thus, the model for the microblog entry seeks to answer, at least, the following competency questions. Due to the strong emphasis on geospatial and temporal components of the fused data, we assume that these queries will be executed using geoSPARQL⁵.

1. Who is the author of entry x?
2. What are all the entries authored by y?
3. What entries from time A to time B originate from region of interest C with radius D?
4. What is the trust value v for author y?
5. What is the trust value v for entry x?
6. What entries from authors with a trust value greater than v originate from region of interest C with radius D?
7. What entries relate to topic T?

² https://twitter.com
³ http://crisislex.org/
⁴ Microtext is any sufficiently short parcel of information in natural language. An MBE is an instance of microtext.
⁵ http://www.opengeospatial.org/standards/geosparql

(a) A Circle Packing visualization generated by D3⁶. Smaller circles are related to the superimposed circle via subsumption, and proximity within the same level of circles denotes a short semantic distance.

(b) A standard view of geographic information: pins on a map background. This visualization can be updated in real-time and allows the user to see incoming data.

Fig. 1: Both visualizations will utilize the MBE pattern at the most granular level (i.e.
smallest circles and map pins).

Microtext is a valuable resource in the Semantic Web community, as evidenced by [2, 8, 9, 10]. However, to our knowledge this is the first attempt at modeling an MBE as an entity, instead of only modeling extracted information.

The rest of the paper is organized as follows. Section 2 addresses the design decisions in the structure of the pattern and accompanying axioms. Section 3 provides a motivating example and interaction with real data. Section 4 addresses future work and collaborations.

2 Pattern Overview

This pattern was directly informed by the competency questions in the preceding section; the competency questions are fairly straightforward and have a one-to-one correspondence with the concepts in the pattern. As such, the microblog entry pattern must capture both the entry’s payload and its provenance. In addition, it must capture any information extracted from the payload and from analysis of the author, such as answers to the questions “To what is the microblog entry referring?” or “How trusted is the author by their peers?”

We will discuss the main design aspects of this pattern by referring to its class diagram as depicted in Figure 2. Yellow boxes indicate datatypes, and light blue boxes with dashed borders indicate external patterns. Purple is used for external classes belonging to PROV-O [5]. Green depicts external classes belonging to [7]. White arrowheads represent the rdfs:subClassOf relation.

⁶ Circle Packing is an arrangement of circles on a surface so that all circles touch one another. D3 is a powerful JavaScript library used for generating visualizations.

Fig. 2: A graphical representation of the microblog entry design pattern. Yellow boxes indicate datatypes, and light blue boxes with dashed borders indicate external patterns. Purple is used for external classes belonging to PROV-O [5]. Green is used for external classes belonging to [7]. White arrowheads represent the rdfs:subClassOf relation.
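To illustrate how the competency questions map onto the pattern's concepts, the following is a minimal Python sketch of questions 3 and 6 from Section 1 (entries from a region, optionally restricted by time window and author trust). It is an illustration only: it is not geoSPARQL and is not taken from the published OWL file. The `Entry` record, its field names, the haversine distance used for "region C with radius D", and the sample coordinates are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Entry:
    """Hypothetical flat record standing in for one MicroblogEntry."""
    author: str
    payload: str
    timestamp: datetime
    author_trust: float               # assumed to lie in [0, 1]
    location: Optional[Point] = None  # None when the entry is not geotagged


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def entries_in_region(
    entries: List[Entry],
    center: Point,
    radius_km: float,
    min_trust: Optional[float] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> List[Entry]:
    """Filter entries by region, plus optional trust threshold and time window."""
    hits = []
    for e in entries:
        if e.location is None:  # entries without a location cannot match
            continue
        if haversine_km(e.location, center) > radius_km:
            continue
        # Question 6 asks for a trust value strictly greater than v.
        if min_trust is not None and e.author_trust <= min_trust:
            continue
        if start is not None and e.timestamp < start:
            continue
        if end is not None and e.timestamp > end:
            continue
        hits.append(e)
    return hits


# Illustrative data only; the coordinates are approximate city centres.
DAYTON = (39.7589, -84.1916)
COLUMBUS = (39.9612, -82.9988)
SAMPLE = [
    Entry("a", "Car accident on 4th and Main", datetime(2017, 7, 12, 10, 1), 0.99, DAYTON),
    Entry("b", "Roads look fine", datetime(2017, 7, 13, 9, 0), 0.30, DAYTON),
    Entry("c", "Shelter open downtown", datetime(2017, 7, 12, 11, 0), 0.95, COLUMBUS),
    Entry("d", "Stay safe", datetime(2017, 7, 12, 12, 0), 0.99, None),
]
```

For example, `entries_in_region(SAMPLE, DAYTON, 50.0, min_trust=0.5)` keeps only the entry by author "a". An entry without a location never matches a regional query, which is one practical consequence of location being optional in the pattern.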
By indicating several of the classes as “external,” we intend to convey that the models for said classes are not indicative of the functionality of the MicroblogEntry pattern. For example, in our implementation⁷ the light blue boxes are currently wrappers for datatypes. However, it is not hard to imagine increasingly complex models for each class. Below, we will discuss our implementation and future iterations. We will consider the pattern in the context of our use-case: event detection during a crisis. Furthermore, we assume that any microblog entry populating the ontology occurs within the time-frame of the crisis situation and is shown to be relevant to it.

⁷ The OWL file can be found at https://raw.githubusercontent.com/cogan-shimizu-wsu/MicroblogEntryOWL/master/MicroblogEntry.owl

MicroblogEntry

The MicroblogEntry is the core class. Here, we will describe a few limitations placed upon its relations.

$\mathsf{MicroblogEntry} \sqsubseteq\ {=}1\,\mathsf{hasPayload}.\mathsf{Payload}$ (1)
$\mathsf{MicroblogEntry} \sqsubseteq\ {=}1\,\mathsf{hasAuthor}.\mathsf{Author}$ (2)
$\mathsf{MicroblogEntry} \sqsubseteq\ {\leq}1\,\mathsf{hasLocation}.\mathsf{Location}$ (3)

1. A MicroblogEntry has exactly one Payload.
2. A MicroblogEntry has exactly one Author.
3. A MicroblogEntry has at most one Location and might not have a location attached to it at all.

ReportingEvent

The ReportingEvent pattern is documented in [7]. This established pattern provides for a lot of interplay with MicroblogEntry, as well as providing structure for how information is shared.

As ReportingEvent is itself a subclass of Situation, it will be reasonably straightforward to integrate the ModifiedHazardousSituation [4] pattern with the MicroblogEntry pattern. Additionally, ReportingEvent provides a framework for connecting the “report” to an ActualEvent; thus, along with Topic, it grounds the MicroblogEntry in reality. Finally, the fact that a ReportingEvent isBasedOn a Source provides us a vehicle for capturing the fact that a MicroblogEntry has been re-Tweeted or shared (without modification).
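The three restrictions on MicroblogEntry given in axioms (1)-(3) above can be made concrete with a small Python sketch that counts property values on a single entry. This is a closed-world validity check written for illustration; an OWL reasoner works under the open-world assumption and would not treat a missing Payload as an error. The dictionary layout and the function name are assumptions introduced here, not part of the published OWL file.

```python
from typing import Dict, List, Tuple

# (minimum, maximum) number of values per property, mirroring axioms (1)-(3).
CARDINALITY: Dict[str, Tuple[int, int]] = {
    "hasPayload": (1, 1),   # axiom (1): exactly one Payload
    "hasAuthor": (1, 1),    # axiom (2): exactly one Author
    "hasLocation": (0, 1),  # axiom (3): at most one Location
}


def cardinality_violations(entry: Dict[str, List[str]]) -> List[str]:
    """Return one message per restriction that the entry violates.

    `entry` maps a property name to the list of values asserted for it;
    a property that is absent is treated as having zero values.
    """
    problems = []
    for prop, (low, high) in CARDINALITY.items():
        count = len(entry.get(prop, []))
        if count < low or count > high:
            problems.append(f"{prop}: expected {low}..{high} values, found {count}")
    return problems
```

An entry with one payload, one author, and no location yields an empty list, while an entry with two locations yields a single message for `hasLocation`.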
Media

The Media class allows us to represent the platform on which the MicroblogEntry was posted. In the case of our example in the next section, this would be Twitter. However, it is also conceivable that Media may represent CNN, Fox News, BBC, and so on. Obviously, these establishments are fairly complex in their own right. Media is also drawn from [7], though it is largely left for others to implement.

Monitoring different Media will be very important in our use case scenario, especially when considering the TrustMetric for provenance and author. To this point, it seems reasonable to expect the trustworthiness of the platform and corporation to affect the trustworthiness of the reported data.

Payload

The Payload is the content of the MicroblogEntry. In Figure 3, this is the content in Box 2. For the general pattern, we opted to leave this as an external pattern due to the expected heterogeneity of MBEs on different platforms and even the high variance of content on the same platform. That is, Twitter allows for many different payloads: text, hyperlinks, images, and videos. Facebook, on the other hand, offers a superset of content types and no length restriction on text payloads.

In addition, we see the Payload playing a large role in defining how MBEs will interact with each other. In the case of Tweets, a Tweet may be “Retweeted,” thus embedding a Tweet inside of a Payload. Furthermore, a Payload may “mention” another user or author. Our next steps will include ways to more accurately model these relationships between Authors, Payloads, and MicroblogEntries.

For our initial implementation, as our test sets do not include Tweets with pictures or hyperlinks, Payload wraps an xsd:string. Additionally, relevant MicroblogEntries must have a relevant Payload. That is, the Payload must refer to some Topic relevant to the crisis situation.

Topic

In some cases, it may make sense to have Topic include a targeted list of terms from a controlled vocabulary.
Alternatively, the Topic could act as a category. For example, in [3], Tweets were partitioned into the following categories: affected individuals, infrastructures and utilities, donations and volunteer, caution and advice, sympathy and emotional support, useful information, and unknown.

Our implementation currently wraps an xsd:string. This allows us to dynamically generate a Topic as Tweets are encountered. As the intended ODP ecosystem matures, it is conceivable that this Topic sub-pattern will be more fully fleshed out, allowing for more interesting interaction between MicroblogEntries referencing the same Topic.

Location

There are many methods for representing location, e.g. the POI:Place [1] pattern or using Well-Known Text (WKT) from OpenGIS, among others. To promote reusability, we do not constrain the top-level pattern to use one or another. In our implementation, however, we opted to use a WKT literal for simplicity’s sake. In the future, we expect to be able to augment this part of the model by including relevant descriptors, such as the name of the location taken from a gazetteer.

TrustMetric

The TrustMetric sub-pattern has the potential to be the most complex due to its far-reaching effects on the interplay between Author, Payload, and Media. In addition, the actual metric for trust will need its own provenance and uncertainty measures. Until the system is actually implemented, it will be difficult to model completely. Thus, in our implementation, we assume we are getting a value between 0 and 1 from some black-box system. As such, we wrap xsd:double.

3 Example Triples

Figure 3 shows an example Tweet. The relevant data that will be extracted has been boxed in red.

Fig. 3: An example Tweet with extracted data highlighted in red. Note, this example does not have a geolocation.

kast:CarAccident ## Extracted from Box 2
    rdf:type t:Topic;
    t:hasName "Car Accident"^^xsd:string;
    .

kast:Evacuation ## Extracted from Box 2
    rdf:type t:Topic;
    t:hasName "Evacuation"^^xsd:string;
    .

kast:examplepayload ## Extracted from Box 2
    rdf:type pl:Payload;
    kast:hasvalue "There is a car accident on 4th and Main. Be careful out there! #evac"^^xsd:string;
    kast:referencesTopic kast:CarAccident, kast:Evacuation;
    .

kast:cogantm ## Note here that there are two trust metrics.
    rdf:type tm:TrustMetric;
    tm:hasValue "0.99"^^xsd:double;
    .

kast:mbetm ## As trust in author is distinct from trust in the MBE.
    rdf:type tm:TrustMetric;
    tm:hasValue "0.89"^^xsd:double;
    .

kast:CoganShimizu ## Extracted from Box 1
    a prov:Person, prov:Agent;
    foaf:givenName "Cogan Shimizu"^^xsd:string;
    kast:hasTrustMetric kast:cogantm;
    .

kast:Twitter
    rdf:type pz:Media, prov:Entity;
    .

kast:examplets ## Extracted from Box 3
    rdf:type time:Instant;
    time:inXSDDateTimeStamp "2017-07-12T10:01:00-05:00"^^xsd:dateTimeStamp;
    .

And finally,

kast:exampletweet
    rdf:type kast:MicroblogEntry, pz:ReportingEvent;
    kast:hasPayload kast:examplepayload;
    kast:writtenBy kast:CoganShimizu;
    kast:presentedon kast:Twitter;
    kast:hasTrustMetric kast:mbetm;
    kast:kastTimestamp kast:examplets;
    .

4 Conclusions and Future Work

The Microblog Entry Ontology Design Pattern is a useful model for a very commonplace structure, especially as the amount of social media data available for inspection continues to increase. The potential applications of this pattern are widespread, from determining public sentiment and measuring affect to investigating community formation and evolution on social media networks.

The Microblog Entry pattern is foundational. On its own, it is not particularly remarkable. However, in the ecosystem it plays a fundamental role. In similar systems, it is analogous to entity extraction. Knowing the entities in play is important, but ultimately provides only a small facet of a crisis situation. The Microblog Entry pattern serves a similar role. It provides the threads to weave a more comprehensive picture.
At this time, the pattern relies heavily on many external patterns, though many of them can be implemented as simple wrappers for datatypes. Future work will be focused on developing the ecosystem of ODPs for building a Common Operating Picture for a crisis situation. We will also investigate how the different visualizations can be affected by the trust metric. As the work progresses, we will be working closely with domain experts in the United States Air Force.

Acknowledgement. The authors acknowledge support by the Dayton Area Graduate Studies Institute (DAGSI) and input from Vincent Schmidt, Ph.D.

References

1. A. Alves, B. Antunes, F. C. Pereira, and C. Bento. Semantic enrichment of places: Ontology learning from web. Int. J. Know.-Based Intell. Eng. Syst., 13(1):19–30, Jan. 2009.
2. S. P. Bhatt, H. Purohit, A. Hampton, V. Shalin, A. Sheth, and J. Flach. Assisting coordination during crisis: A domain ontology based approach to infer resource needs from tweets. In Proceedings of the 2014 ACM Conference on Web Science, WebSci ’14, pages 297–298, New York, NY, USA, 2014. ACM.
3. G. Burel, H. Saif, M. Fernandez, and H. Alani. On semantics and deep learning for event detection in crisis situations. 2017. Available from http://semdeep.iiia.csic.es/files/SemDeep-17_paper_5.pdf on September 6, 2017.
4. M. Cheatham, H. Ferguson, C. Vardeman, and C. Shimizu. Modified hazardous situation odp. 2017. Available from http://www.michellecheatham.com/files/modification-hazardous-situation.pdf on September 6, 2017.
5. P. Groth and L. Moreau, editors. PROV-Overview: An Overview of the PROV Family of Documents. W3C Working Group Note 30 April 2013, 2013.
6. P. Hitzler, A. Gangemi, K. Janowicz, A. Krisnadhi, and V. Presutti, editors. Ontology Engineering with Ontology Design Patterns: Foundations and Applications. Studies on the Semantic Web. IOS Press, Amsterdam/AKA Verlag, Heidelberg, 2016.
7. E. Kowalczuk and A. Lawrynowicz.
The reporting event ontology design pattern and its extension to report news events. 2017. Available from http://ontologydesignpatterns.org/wiki/images/a/ac/WOP2016_paper_18.pdf on September 6, 2017.
8. M. B. Lazreg, M. Goodwin, and O. Granmo. Information abstraction from crises related tweets using recurrent neural network. In L. S. Iliadis and I. Maglogiannis, editors, Artificial Intelligence Applications and Innovations - 12th IFIP WG 12.5 International Conference and Workshops, AIAI 2016, Thessaloniki, Greece, September 16-18, 2016, Proceedings, volume 475 of IFIP Advances in Information and Communication Technology, pages 441–452. Springer, 2016.
9. R. Nithish, S. Sabarish, M. N. Kishen, A. M. Abirami, and A. Askarunisa. An ontology based sentiment analysis for mobile products using tweets. In 2013 Fifth International Conference on Advanced Computing (ICoAC), pages 342–347, Dec 2013.
10. P. Thakor and S. Sasi. Ontology-based sentiment analysis process for social media content. Procedia Computer Science, 53:199–207, 2015. INNS Conference on Big Data 2015 Program San Francisco, CA, USA 8-10 August 2015.