<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Demonstration: Real-Time Semantic Analysis of Sensor Streams</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harshal Patni</string-name>
          <email>harshal@knoesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cory Henson</string-name>
          <email>cory@knoesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Cooney</string-name>
          <email>michael@knoesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amit Sheth</string-name>
          <email>amit@knoesis.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krishnaprasad Thirunarayan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kno.e.sis - Ohio Center of Excellence in Knowledge-enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton</institution>
          ,
          <addr-line>OH 45435</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The emergence of dynamic information sources, including sensor networks, has led to large streams of real-time data on the Web. Research studies suggest that these dynamic networks have created more data in the last three years than in the entire history of civilization, and that this trend will only increase in the coming years [1]. With this coming data explosion, real-time analytics software must either adapt or die [2]. This paper focuses on the task of integrating and analyzing multiple heterogeneous streams of sensor data with the goal of creating meaningful abstractions, or features. These features are then temporally aggregated into feature streams. We will demonstrate an implemented framework, based on Semantic Web technologies, that creates feature streams from sensor streams in real time and publishes these streams as Linked Data. The generation of feature streams can be accomplished in reasonable time and results in massive data reduction.</p>
      </abstract>
      <kwd-group>
        <kwd>Streaming Sensor Data</kwd>
        <kwd>Abstraction</kwd>
        <kwd>Semantic Web</kwd>
        <kwd>Semantic Sensor Web</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Sensors produce huge amounts of low-level data about our environment that arrives in
the form of rapid, continuous, and time-varying streams [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. These data streams could
quickly overwhelm any system not capable of effectively detecting and analyzing the
most important data. Analyzing such sensor data streams and providing meaningful
abstractions in real-time presents a significant research challenge. An abstraction, also
called a feature, is a high-level representation of low-level sensor data.
      </p>
      <p>
        The database community has studied the analysis and mining of
real-time streaming data extensively. Most current approaches within that
community provide mathematical summaries (e.g., minimum, maximum, average, and count)
for a single-modality stream (such as a temperature stream) over time (i.e., within a time
window) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These summaries are necessary and useful, but provide little help in
answering questions involving real-world events, such as: Which weather stations are
currently detecting a Blizzard? Or: What event (or sequence of events) is currently
being detected by a weather station?
      </p>
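      <p>To make the contrast concrete, the following is a minimal sketch of the kind of windowed summary described above, computed over a single-modality stream. The function name window_summaries, the count-based window, and the sample temperature readings are illustrative assumptions, not part of any cited system.</p>
      <preformat>
```python
from collections import deque
from statistics import mean


def window_summaries(readings, window_size):
    """Yield (minimum, maximum, average, count) for each sliding window.

    `readings` is an iterable of numeric values from one single-modality
    stream; `window_size` is the number of most recent readings to keep.
    """
    window = deque(maxlen=window_size)  # oldest reading is evicted automatically
    for value in readings:
        window.append(value)
        yield min(window), max(window), mean(window), len(window)


# Hypothetical temperature readings in degrees Fahrenheit.
temperatures = [30, 31, 29, 28]
summaries = list(window_summaries(temperatures, window_size=3))
for summary in summaries:
    print(summary)
```
</preformat>
      <p>Such a summary describes one stream numerically but says nothing about which real-world event the readings indicate, which is the gap the feature streams are meant to fill.</p>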
      <p>Answering such questions requires semantic integration of, and
inference over, data from multiple single-modality sensor streams using external domain
knowledge. A feature stream can be generated by aggregating a sequence of features
detected by a particular sensor (or set of sensors) over a period of time.
Feature streams provide a clear and intuitive representation of how events evolve over time.
Such a representation of trends in features presents decision makers with
actionable situation awareness.</p>
      <p>Consider the following question: What weather events are currently being detected
near Dayton James Cox Airport? To answer this question, we would first
need to find sensors near Dayton James Cox Airport, then access the data streams for
these sensors, integrate the streams capable of detecting the weather events, and
finally detect and represent the events.</p>
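      <p>The aggregation of detected features over time can be illustrated with a short sketch. It assumes that detections arrive as time-ordered (timestamp, feature) pairs and collapses consecutive identical features into runs; this run-based representation, the function name to_feature_stream, and the sample detections are assumptions made for illustration only.</p>
      <preformat>
```python
from itertools import groupby


def to_feature_stream(detections):
    """Collapse time-ordered (timestamp, feature) pairs into runs.

    Returns a list of (feature, first_timestamp, last_timestamp) tuples,
    one per maximal run of consecutive identical features.
    """
    stream = []
    for feature, run in groupby(detections, key=lambda pair: pair[1]):
        timestamps = [timestamp for timestamp, _ in run]
        stream.append((feature, timestamps[0], timestamps[-1]))
    return stream


# Hypothetical hourly detections for one weather station.
detections = [
    ("2010-05-11T12:50:00", "Clear"),
    ("2010-05-11T13:50:00", "Flurry"),
    ("2010-05-11T14:50:00", "Flurry"),
    ("2010-05-11T15:50:00", "Clear"),
]
feature_stream = to_feature_stream(detections)
for entry in feature_stream:
    print(entry)
```
</preformat>
      <p>In this sketch four detections reduce to three entries, and the middle entry shows at a glance when the Flurry began and ended.</p>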
      <p>The generation of feature streams requires a framework that can generate,
integrate, and reason over multiple heterogeneous sensor streams. Reasoning over the
integrated streams uses background knowledge and rules to generate feature streams
that represent events in the real world. The feature-stream framework is divided into
four parts (see Figure 1): (1) raw data generation, (2) data stream generation, (3)
feature stream generation, and (4) feature stream access.</p>
      <p>Raw Data Generation: The framework begins with the collection of raw
streaming data from sensors within an environment. In this demonstration, we use
MesoWest<sup>1</sup>, a project within the Department of Meteorology at the University of
Utah, which provides near real-time access to weather sensor streams through a
service API. Observations provided by MesoWest are encoded as CSV text and
include measurements of temperature, visibility, precipitation, pressure, wind
speed, humidity, etc. Example data provided by MesoWest can be seen below.
The example contains the date and time of the observation,
along with temperature (TMPF), wind speed (SKNT), and precipitation (PREC)
observation values.</p>
      <p>PARAMETER = MON, DAY, YEAR, HR, MIN, TMZN, TMPF, SKNT, PREC</p>
      <p>VALUE = 11, 5, 2010, 13, 50, PDT, 30, 37, snow</p>
      <p>Data Stream Generation: The second phase converts the stream of raw sensor
data into an RDF stream. The raw sensor stream obtained from MesoWest is
first converted to the Observations and Measurements (O&amp;M)<sup>2</sup> format. O&amp;M is a
well-accepted XML standard in the sensor community. The SAX (Simple API
for XML) parser<sup>3</sup> is used to generate the O&amp;M XML stream. Below is an
example encoding of the temperature, wind speed, and precipitation observations in
O&amp;M. The observation values for different time instants are separated by the
block separator @@.</p>
      <preformat>&lt;swe:encoding&gt;
  &lt;swe:TextBlock decimalSeparator="." tokenSeparator="," blockSeparator="@@"/&gt;
&lt;/swe:encoding&gt;
&lt;swe:values&gt;2010-5-11T13:50:00,30,37,snow@@&lt;/swe:values&gt;</preformat>
      <p>
        The O&amp;M stream is then converted to an RDF<sup>4</sup> stream. RDF is a Semantic Web
standard model for the representation and interchange of data on the Web. XSLT<sup>5</sup> is
used to convert the O&amp;M to RDF conformant to the W3C Semantic Sensor
Network (SSN) ontology [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Below is an example RDF encoding of a temperature
observation. Note that ssn, weather, and time are the prefixes for the
SSN ontology, weather ontology, and OWL-Time ontology, respectively;
ssn-weather is the prefix for individuals generated by the system.
      </p>
      <preformat>ssn-weather:Observation_Temperature_KDAY_2010_05_11_13_50
    a ssn:Observation ;
    ssn:observedProperty weather:Temperature ;
    ssn:observedBy ssn-weather:System_KDAY ;
    ssn:observationResult ssn-weather:MeasureData_Temperature_KDAY_2010_05_11_13_50 ;
    ssn:observationSamplingTime ssn-weather:Instant_2010_05_11_13_50_00 .

ssn-weather:MeasureData_Temperature_KDAY_2010_05_11_13_50
    a ssn:SensorOutput ;
    ssn:hasValue "30.0" ;
    weather:uom weather:fahrenheit .

ssn-weather:Instant_2010_05_11_13_50_00
    a time:Instant ;
    time:inXSDDateTime "2010-05-11T13:50:00" .</preformat>
      <p>1 http://mesowest.utah.edu/
2 http://www.opengeospatial.org/standards/om
3 http://www.saxproject.org/</p>
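      <p>The two conversion steps of this phase can be approximated with the sketch below. It is not the SAX and XSLT pipeline used in the framework: it uses plain string formatting to show the shape of the data at each step. The function names, the choice of station KDAY, and the naming pattern for individuals are assumptions modeled on the examples above. The sketch reads MON and DAY literally in the order given by the PARAMETER header, so the timestamp it produces follows that order.</p>
      <preformat>
```python
import csv
import io

# Raw observation text, following the PARAMETER / VALUE example.
RAW = """MON, DAY, YEAR, HR, MIN, TMZN, TMPF, SKNT, PREC
11, 5, 2010, 13, 50, PDT, 30, 37, snow
"""


def parse_rows(text):
    """Parse MesoWest-style CSV text into a list of dicts keyed by parameter."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


def timestamp(row):
    """Build a zero-padded timestamp from the date and time columns."""
    return "{:04d}-{:02d}-{:02d}T{:02d}:{:02d}:00".format(
        int(row["YEAR"]), int(row["MON"]), int(row["DAY"]),
        int(row["HR"]), int(row["MIN"]))


def to_text_block(rows, token_sep=",", block_sep="@@"):
    """Encode rows as the text content of a swe:values element."""
    blocks = [token_sep.join([timestamp(r), r["TMPF"], r["SKNT"], r["PREC"]])
              for r in rows]
    return block_sep.join(blocks) + block_sep


def to_turtle(row, station):
    """Render one temperature observation as Turtle text (prefixes omitted)."""
    stamp = timestamp(row)
    key = stamp[:16].replace("-", "_").replace("T", "_").replace(":", "_")
    obs = "ssn-weather:Observation_Temperature_{}_{}".format(station, key)
    result = "ssn-weather:MeasureData_Temperature_{}_{}".format(station, key)
    instant = "ssn-weather:Instant_{}_00".format(key)
    lines = [
        "{} a ssn:Observation ;".format(obs),
        "    ssn:observedProperty weather:Temperature ;",
        "    ssn:observedBy ssn-weather:System_{} ;".format(station),
        "    ssn:observationResult {} ;".format(result),
        "    ssn:observationSamplingTime {} .".format(instant),
        "{} a ssn:SensorOutput ;".format(result),
        '    ssn:hasValue "{:.1f}" ;'.format(float(row["TMPF"])),
        "    weather:uom weather:fahrenheit .",
        "{} a time:Instant ;".format(instant),
        '    time:inXSDDateTime "{}" .'.format(stamp),
    ]
    return "\n".join(lines)


rows = parse_rows(RAW)
values_text = to_text_block(rows)
turtle_text = to_turtle(rows[0], station="KDAY")
print(values_text)
print(turtle_text)
```
</preformat>
      <p>Keeping the token and block separators as parameters mirrors the swe:TextBlock declaration, where they are attributes of the encoding rather than fixed by the format.</p>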
      <p>Feature Stream Generation: The third phase integrates the RDF sensor streams
and reasons over the integrated streams to detect features. Feature definitions are
obtained from the National Oceanic and Atmospheric Administration (NOAA)<sup>6</sup> and
defined in the weather ontology. The feature definitions are first used to select
the sensors capable of detecting a feature. A sensor is capable of detecting a
feature if it can observe all the phenomena that compose the feature.
Filtering improves performance by reducing the number of sensor streams that are
reasoned over. SPARQL<sup>7</sup> is used for reasoning over the integrated sensor streams.
An example SPARQL rule for detecting a Flurry over weather station KDAY is
given below.</p>
      <preformat>PREFIX ssn-weather: &lt;http://knoesis.wright.edu/ssw/ont/ssn-weather.owl#&gt;
PREFIX ssn: &lt;http://purl.oclc.org/NET/ssnx/ssn/&gt;
PREFIX weather: &lt;http://knoesis.wright.edu/ssw/ont/weather.owl#&gt;
ASK
{
  ?windSpeedObs ssn:observedBy ssn-weather:System_KDAY .
  ?windSpeedObs ssn:observedProperty weather:WindSpeed .
  ?windSpeedObs ssn:observationResult ?windSpeedResult .
  ?windSpeedResult ssn:hasValue ?windSpeedValue .
  ?snowObs ssn:observedBy ssn-weather:System_SB1 .
  ?snowObs ssn:observedProperty weather:Snowfall .
  ?snowObs ssn:observationResult ?snowResult .
  ?snowResult ssn:hasValue ?snowValue .
  FILTER(?windSpeedValue &lt; 35)
  FILTER(?snowValue = "true")
}</preformat>
      <p>The SPARQL rule is used to detect the most recent (current) feature. A sequence of
features detected over time results in a feature stream.</p>
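      <p>The logic of the Flurry rule can also be read as ordinary code. The sketch below mirrors the two graph patterns and two filters of the ASK query over an in-memory list of observations. It is an illustration under stated assumptions, not the reasoner used in the framework: the Observation class, the detects_flurry function, and the sample values are hypothetical, while the station names, the wind speed threshold of 35, and the snowfall value "true" are taken from the rule.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One observation: who observed it, which property, and the value."""
    station: str
    observed_property: str
    value: object


def detects_flurry(observations, wind_station, snow_station, max_wind=35):
    """Return True when both patterns of the ASK rule are satisfied.

    A wind speed observation below `max_wind` must exist for
    `wind_station`, and a snowfall observation with value "true" must
    exist for `snow_station`.
    """
    wind_ok = any(
        o.station == wind_station
        and o.observed_property == "WindSpeed"
        and max_wind > float(o.value)
        for o in observations)
    snow_ok = any(
        o.station == snow_station
        and o.observed_property == "Snowfall"
        and o.value == "true"
        for o in observations)
    return wind_ok and snow_ok


# Hypothetical current observations.
calm_and_snowing = [
    Observation("KDAY", "WindSpeed", "20"),
    Observation("SB1", "Snowfall", "true"),
]
windy_and_snowing = [
    Observation("KDAY", "WindSpeed", "40"),
    Observation("SB1", "Snowfall", "true"),
]
print(detects_flurry(calm_and_snowing, "KDAY", "SB1"))
print(detects_flurry(windy_and_snowing, "KDAY", "SB1"))
```
</preformat>
      <p>Running the check at each time step and recording the outcome yields the sequence of detections from which a feature stream is built.</p>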
      <p>
        Feature Stream Access: Finally, the feature stream is published as Linked Data
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The features can be accessed either directly, by issuing SPARQL
queries against the RDF, or through a map-based GUI<sup>8</sup>.
      </p>
      <p>4 http://www.w3.org/RDF/
5 http://www.w3.org/TR/xslt
6 http://www.noaa.gov/
7 http://www.w3.org/TR/rdf-sparql-query/</p>
    </sec>
    <sec id="sec-2">
      <title>3 Demonstration</title>
      <p>During the workshop, a Google Maps-based GUI will be demonstrated, showcasing
the generated feature streams. The user can either select all the weather stations in a
state or search for a station by named location (using GeoNames). Next, the user is
given the option to select features of interest. The system can currently detect
blizzard, flurry, rain shower, and rain storm. Selecting features filters the
stations down to those that are able to detect the features of interest. Clicking on a station
shows the features detected over time along with the associated lower-level sensor
observations. Because these features may not be occurring at the time of the
demonstration, we will have a backup providing examples of interesting past events.</p>
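      <p>The station filtering step follows the capability criterion of the framework: a station can detect a feature only if it observes every phenomenon that composes the feature. A minimal sketch of that subset test is given below. The Flurry entry reflects the two properties used in the Flurry rule; the RainShower entry, the station names other than KDAY, and the function name capable_stations are hypothetical and serve only to exercise the filter.</p>
      <preformat>
```python
# Phenomena that compose each feature. Only the Flurry entry is based on
# the rule shown in the paper; RainShower is a hypothetical placeholder.
FEATURE_PHENOMENA = {
    "Flurry": {"WindSpeed", "Snowfall"},
    "RainShower": {"WindSpeed", "Rainfall"},
}


def capable_stations(station_properties, feature, definitions=FEATURE_PHENOMENA):
    """Return the sorted names of stations able to detect `feature`.

    `station_properties` maps a station name to the set of phenomena it
    observes. A station qualifies when that set contains every phenomenon
    required by the feature definition.
    """
    required = definitions[feature]
    return sorted(name for name, observed in station_properties.items()
                  if required.issubset(observed))


# Hypothetical stations and the phenomena each one observes.
stations = {
    "KDAY": {"Temperature", "WindSpeed", "Snowfall", "Rainfall"},
    "STN_A": {"Temperature", "WindSpeed"},
    "STN_B": {"WindSpeed", "Snowfall"},
}
flurry_stations = capable_stations(stations, "Flurry")
rain_stations = capable_stations(stations, "RainShower")
print(flurry_stations)
print(rain_stations)
```
</preformat>
      <p>Applying this filter before reasoning means that streams from stations such as the hypothetical STN_A are never loaded for Flurry detection.</p>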
    </sec>
    <sec id="sec-3">
      <title>4 Evaluation</title>
      <p>To evaluate the performance of this system, we collected 120 hours of data from
sensors in (and around) Utah from February 2nd to 6th, 2003. Figure 2 shows the
average time (in ms) taken by each phase during feature generation. On
average, for each hour, 427 sensors provided data during the evaluation and produced
an average of 1104 observations. The evaluation detected 9 flurries, 1 rain shower, and
417 clear features. We found an order of magnitude difference between
the number of observations and the number of features generated, which means that storing only the
features (where applicable) would result in massive data reduction. A demonstration page<sup>9</sup>
provides more details, including the storage evaluation.</p>
      <p>Figure 2: Average time (ms) taken by each phase of feature generation (Raw Data Collection, Raw Data to O&amp;M, O&amp;M to RDF, RDF to Features); the time axis ranges from 0 to 300,000 ms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1].
          <source>Gigaom Article on Big Data</source>
          ,
          <year>2010</year>
          , http://gigaom.com/cloud/sensor-networks-top-socialnetworks-for-big-data-2/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2].
          <source>Software must adapt or Die</source>
          ,
          <year>2010</year>
          , http://www.readwriteweb.com/archives/data_analytics_software_must_adapt.php
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]. Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom.
          <article-title>Models and Issues in Data Stream Systems</article-title>
          .
          <source>In Proceedings of the 21st ACM Symposium on Principles of Database Systems</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]. Refik Samet and
          <string-name>
            <given-names>Serhat</given-names>
            <surname>Tural</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Web based real-time meteorological data analysis and mapping information system</article-title>
          .
          <source>In Proceedings of WSEAS Transactions of Information Science and Applications</source>
          ,
          <year>September 2010</year>
          ,
          <fpage>1115</fpage>
          -
          <lpage>1125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5].
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          . Elsevier.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6].
          <string-name>
            <surname>Lefort</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henson</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barnaghi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Compton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Castro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graybeal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herzog</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neuhaus</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Page</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <source>Semantic Sensor Network XG Final Report, W3C Incubator Group Report</source>
          (
          <year>2011</year>
          ). Available at http://www.w3.org/2005/Incubator/ssn/XGR-ssn/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>