<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Interactive Learning Scenario for Real-time Environmental State Estimation Based on Heterogeneous and Dynamic Sensor Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agnes Tegen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Davidsson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan A. Persson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Malmö University, School of Technology, Sweden; Internet of Things and People Research Center</institution>
        </aff>
      </contrib-group>
      <fpage>77</fpage>
      <lpage>79</lpage>
      <abstract>
        <p>With ongoing advances in the Internet of Things, the number of sensor-equipped devices streaming data in our surroundings is growing rapidly. This creates new possibilities for continuously monitoring the state of the environment. However, this increasingly complex setting also poses new challenges, e.g. how to properly fuse data from different types of sensors with uncertain availability. We focus on a setting where the task is to make real-time, continuous estimates of certain aspects of the state of an environment. These estimates are based on data streams from a heterogeneous and dynamic set of sensors in that environment. Typically, data from different types of sensors needs to be fused in order to estimate the aspect. For instance, in an office setting, the aspect could be the type of activity currently taking place in a room or the number of people in a certain area of a building. In previous work [<xref ref-type="bibr" rid="ref1">1</xref>], the concept of dynamic intelligent virtual sensors was suggested as a framework for data fusion. Common to many scenarios of this type is that no model is available that fuses the data and estimates the desired aspect of the state of the environment. Thus, such a model needs to be learned from the data streamed by the sensors. Although the sensors may generate large amounts of data, there is typically a lack of labeled data that can be used for supervised learning. The interactive learning challenge described above has been identified within an ongoing project with a number of industrial partners, which partially validates its relevance to many real-world applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In a given environment, we define the set of all sensors that generate data
as <inline-formula><tex-math>S = \{s_1, s_2, \ldots\}</tex-math></inline-formula>. While every sensor in S produces at least one data instance,
a sensor does not necessarily have data available at all times. We therefore also introduce the set
<inline-formula><tex-math>S_t \subseteq S</tex-math></inline-formula>, containing all sensors for which data is available
at the current point in time t. The sensors generate data sequentially. Each data instance contains
the following information: id (unique identifier of the sensor), data (numerical or categorical
measurement of the environment) and ts (timestamp, the point in time when the data was measured
according to the device).
</p>
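      <p>As an illustration of this data model, the following Python sketch represents a data instance as a record with the fields id, data and ts, and derives <inline-formula><tex-math>S_t</tex-math></inline-formula> from a list of readings. The names <monospace>SensorReading</monospace>, <monospace>available_sensors</monospace> and <monospace>horizon</monospace> are hypothetical. Since the formulation does not say how recent a reading must be for its sensor to count as available at time t, the sketch assumes that a sensor belongs to <inline-formula><tex-math>S_t</tex-math></inline-formula> if it produced a reading within the last <monospace>horizon</monospace> time units.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, Set, Union


@dataclass(frozen=True)
class SensorReading:
    """One data instance, mirroring the fields id, data and ts."""
    id: str                   # unique identifier of the sensor
    data: Union[float, str]   # numerical or categorical measurement
    ts: float                 # timestamp according to the device's own clock


def available_sensors(readings: Iterable[SensorReading], t: float,
                      horizon: float) -> Set[str]:
    """Return S_t as the ids of sensors with a reading in (t - horizon, t].

    `horizon` is an assumed parameter, not part of the original formulation.
    """
    return {r.id for r in readings if t - horizon < r.ts <= t}


# Usage: three sensors, one of which has not reported recently.
readings = [
    SensorReading("s1", 21.5, 10.0),         # numerical, e.g. temperature
    SensorReading("s2", "door_open", 14.0),  # categorical door state
    SensorReading("s3", 0.3, 2.0),           # stale reading
]
print(sorted(available_sensors(readings, t=15.0, horizon=10.0)))  # ['s1', 's2']
```
]]></preformat>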
      <p>We define a set of states, Y, representing the possible values of the aspect
of the environment that we are interested in. If the entire state set is known
beforehand, it can be defined along with the task; otherwise, the state set has to be
built up over time as labeled data becomes available. The labels provided by a
user, denoted <inline-formula><tex-math>y_{\mathit{ts}} \in Y</tex-math></inline-formula>, may be used as training data and can be seen as the ground
truth for the state of the environment at time ts. Labels can be provided on the
user's own initiative or when queried by the learner. The labeled data are stored
for a limited time <inline-formula><tex-math>c_t</tex-math></inline-formula>, which can be set based on, e.g., the available data storage.</p>
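      <p>A minimal sketch of how the stored labels and a growing state set could be kept, assuming that labels arrive in timestamp order. <monospace>LabelStore</monospace> and its methods are hypothetical names; the sketch also assumes that a state, once added to Y, is kept even after all of its labels have been discarded.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import deque
from typing import Deque, Hashable, Iterable, List, Set, Tuple


class LabelStore:
    """Hypothetical store for user labels y_ts, kept for a retention time c_t.

    The state set Y grows whenever a previously unseen label arrives, which
    covers the case where Y is not known beforehand.
    """

    def __init__(self, c_t: float, initial_states: Iterable[Hashable] = ()):
        self.c_t = c_t
        self.states: Set[Hashable] = set(initial_states)        # Y, possibly empty
        self._labels: Deque[Tuple[float, Hashable]] = deque()   # (ts, label)

    def add(self, ts: float, label: Hashable) -> None:
        self.states.add(label)            # extends Y if the label is new
        self._labels.append((ts, label))  # assumes labels arrive in ts order

    def evict(self, t: float) -> None:
        """Discard labels with ts < t - c_t; the state set is left unchanged."""
        while self._labels and self._labels[0][0] < t - self.c_t:
            self._labels.popleft()

    def labels(self) -> List[Tuple[float, Hashable]]:
        return list(self._labels)


store = LabelStore(c_t=60.0, initial_states={"meeting"})
store.add(10.0, "meeting")
store.add(50.0, "lecture")    # previously unseen state: Y grows
store.evict(t=100.0)          # drops labels with ts < 40
print(sorted(store.states))   # ['lecture', 'meeting']
print(store.labels())         # [(50.0, 'lecture')]
```
]]></preformat>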
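      <p>The problem discussed in this paper can now be stated as follows: at any
given point in time t, the task is to maximise the accuracy of the estimate
<inline-formula><tex-math>\hat{y}_t \in Y</tex-math></inline-formula> of an aspect of the current state of the environment, using as input
the data from <inline-formula><tex-math>S_t</tex-math></inline-formula> together with the stored labels and data whose timestamps satisfy
<inline-formula><tex-math>(t - c_t) \le \mathit{ts} &lt; t</tex-math></inline-formula>.</p>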
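      <p>The input described in the problem statement can be made concrete with a small Python sketch. The paper does not prescribe an estimation model, so a 1-nearest-neighbour rule is swapped in purely for illustration; the functions <monospace>window</monospace> and <monospace>estimate</monospace> and the record layout (ts, readings, label) are assumptions. Because the set of available sensors changes over time, distances are computed only over the sensors that two readings have in common.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Hashable, List, Optional, Tuple

# A labeled record: (ts, {sensor id: numeric value}, label)
Record = Tuple[float, Dict[str, float], Hashable]


def window(history: List[Record], t: float, c_t: float) -> List[Record]:
    """Records whose timestamps satisfy (t - c_t) <= ts < t."""
    return [rec for rec in history if t - c_t <= rec[0] < t]


def estimate(current: Dict[str, float], history: List[Record],
             t: float, c_t: float) -> Optional[Hashable]:
    """Hypothetical 1-nearest-neighbour estimate of y_hat_t.

    `current` holds the readings of the sensors in S_t. The distance is the
    mean squared difference over the sensors shared by both readings.
    Returns None when no record in the window shares a sensor with `current`.
    """
    best_label, best_dist = None, float("inf")
    for _, features, label in window(history, t, c_t):
        shared = current.keys() & features.keys()
        if not shared:
            continue
        dist = sum((current[k] - features[k]) ** 2 for k in shared) / len(shared)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


history = [
    (5.0, {"s1": 0.0, "s2": 1.0}, "empty"),
    (20.0, {"s1": 5.0, "s2": 9.0}, "meeting"),
    (40.0, {"s1": 4.0}, "meeting"),
]
print(estimate({"s1": 0.1, "s3": 2.0}, history, t=45.0, c_t=30.0))  # meeting
```
]]></preformat>
      <p>In this example, the record labeled empty at ts = 5 would be the nearest neighbour, but it lies outside the window; with a longer retention time of 50, it falls inside the window and the estimate becomes empty.</p>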
    </sec>
    <sec id="sec-2">
      <title>Discussion</title>
      <p>
        In active learning the learner asks an oracle to label data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Compared to the
more common pool-based setting, where the learner starts with all unlabeled
data and can ask for labels in any order, our setting is stream-based. Starting
with little or no labeled data, the algorithm must learn and adapt over time
as labeled data gradually becomes available. At each point in time, the learner
must decide whether or not to query, since it is not possible to query for past
data points. Still, how often and when to query must be balanced, as each query
has a cost. The reliability of the provided labels also has to be considered,
since human feedback is typically noisy: different users do not always agree on
definitions, and a single user may not even agree with themselves over time.
      </p>
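      <p>The query decision in this stream-based setting can be sketched as least-confidence uncertainty sampling combined with a query budget, one common strategy in stream-based active learning [<xref ref-type="bibr" rid="ref3">3</xref>] but not necessarily the one the authors would use. The class <monospace>StreamQueryPolicy</monospace> and its parameters <monospace>threshold</monospace>, <monospace>budget</monospace> and <monospace>period</monospace> are hypothetical, and label noise is not modeled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import deque
from typing import Deque, Dict, Hashable


class StreamQueryPolicy:
    """Hypothetical stream-based query rule.

    A query is issued when the largest class probability is below `threshold`,
    but at most `budget` queries are allowed within any window of `period`
    time units. The decision is made when the data point arrives, since past
    data points cannot be queried later.
    """

    def __init__(self, threshold: float, budget: int, period: float):
        self.threshold = threshold
        self.budget = budget
        self.period = period
        self._query_times: Deque[float] = deque()

    def should_query(self, t: float, probs: Dict[Hashable, float]) -> bool:
        # Forget queries that fall outside the current budget period.
        while self._query_times and self._query_times[0] <= t - self.period:
            self._query_times.popleft()
        uncertain = max(probs.values(), default=0.0) < self.threshold
        if uncertain and len(self._query_times) < self.budget:
            self._query_times.append(t)
            return True
        return False


policy = StreamQueryPolicy(threshold=0.6, budget=2, period=60.0)
stream = [
    (0.0, {"meeting": 0.9, "empty": 0.1}),     # confident: no query
    (10.0, {"meeting": 0.5, "empty": 0.5}),    # uncertain: query
    (20.0, {"meeting": 0.55, "empty": 0.45}),  # uncertain: query
    (30.0, {"meeting": 0.5, "empty": 0.5}),    # uncertain, but budget used up
    (75.0, {"meeting": 0.5, "empty": 0.5}),    # query at t=10 has expired
]
print([policy.should_query(t, p) for t, p in stream])
# [False, True, True, False, True]
```
]]></preformat>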
      <p>Another complicating factor is that the entire state set might not be known
from the beginning. The learner must therefore query not only to obtain training
data, but also to learn the state space. If, for instance, the algorithm cannot
classify the state at a given time with sufficient certainty, it could query the
user for the current state.</p>
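      <p>One way to let the learner discover new states is a classifier with a reject option, sketched below under several assumptions: feature vectors are already aligned and of fixed length, each state is summarized by the running mean of its labeled samples, and the hypothetical parameter <monospace>reject_radius</monospace> decides when the nearest known state is too far away to be trusted. A rejected sample (returned as None) is the cue to query the user, and a label naming a previously unseen state creates a new entry in the state set.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from typing import Dict, Hashable, List, Optional, Sequence


class NearestCentroidWithReject:
    """Hypothetical nearest-centroid classifier with a reject option for a
    state set that is not known in advance."""

    def __init__(self, reject_radius: float):
        self.reject_radius = reject_radius
        self.centroids: Dict[Hashable, List[float]] = {}  # state -> running mean
        self.counts: Dict[Hashable, int] = {}             # state -> labeled samples

    def learn(self, x: Sequence[float], state: Hashable) -> None:
        if state not in self.centroids:       # previously unseen state: extend Y
            self.centroids[state] = [float(v) for v in x]
            self.counts[state] = 1
            return
        self.counts[state] += 1
        n = self.counts[state]
        c = self.centroids[state]
        for i, xi in enumerate(x):
            c[i] += (xi - c[i]) / n           # incremental mean update

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        best, best_d = None, math.inf
        for state, c in self.centroids.items():
            d = math.dist(x, c)               # Euclidean distance (Python 3.8+)
            if d < best_d:
                best, best_d = state, d
        if best is None or best_d > self.reject_radius:
            return None                       # not certain enough: query the user
        return best


clf = NearestCentroidWithReject(reject_radius=2.0)
print(clf.predict([0.0, 0.0]))      # None: no state known yet, so query
clf.learn([0.0, 0.0], "empty")
clf.learn([10.0, 4.0], "meeting")
clf.learn([12.0, 4.0], "meeting")   # centroid moves to [11.0, 4.0]
print(clf.predict([0.5, 0.5]))      # empty
print(clf.predict([5.0, 5.0]))      # None: far from every known state
```
]]></preformat>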
      <p>In general, the sensors do not provide data at synchronized points in time:
some are time-triggered, with non-standardized time intervals, while others are
event-triggered. Furthermore, the timestamp attached to each data point is based
on the sensor's individual clock. The learner, or a preprocessing step, needs to
account for this and align the data in time.</p>
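      <p>A possible preprocessing step for this alignment is sketched below: device timestamps are first corrected by a per-sensor clock offset, and the most recent reading of each sensor is then held onto a common time grid (sample-and-hold). The function <monospace>align</monospace>, the offsets (which in practice would have to be estimated) and the staleness limit <monospace>max_age</monospace> are all assumptions made for this illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import bisect
from typing import Dict, List, Optional, Tuple, Union

Value = Union[float, str]
Series = List[Tuple[float, Value]]  # (device timestamp, value), sorted by time


def align(readings: Dict[str, Series], offsets: Dict[str, float],
          grid: List[float], max_age: float) -> List[Dict[str, Optional[Value]]]:
    """Hypothetical sample-and-hold alignment onto a common time grid.

    Each device timestamp is corrected by an assumed per-sensor clock offset.
    For every grid point, each sensor contributes its latest reading at or
    before that point, or None if that reading is older than `max_age`.
    """
    corrected = {sid: [(ts + offsets.get(sid, 0.0), v) for ts, v in series]
                 for sid, series in readings.items()}
    times = {sid: [ts for ts, _ in series] for sid, series in corrected.items()}
    rows = []
    for g in grid:
        row: Dict[str, Optional[Value]] = {}
        for sid, series in corrected.items():
            i = bisect.bisect_right(times[sid], g) - 1  # last reading at or before g
            fresh = i >= 0 and g - series[i][0] <= max_age
            row[sid] = series[i][1] if fresh else None
        rows.append(row)
    return rows


readings = {
    "temp": [(0.0, 21.0), (10.0, 21.4)],        # time-triggered sensor
    "door": [(3.0, "open"), (12.5, "closed")],  # event-triggered sensor
}
offsets = {"door": -1.0}  # door sensor's clock assumed to run 1 time unit ahead
for row in align(readings, offsets, grid=[5.0, 12.0, 25.0], max_age=8.0):
    print(row)
# {'temp': 21.0, 'door': 'open'}
# {'temp': 21.4, 'door': 'closed'}
# {'temp': None, 'door': None}
```
]]></preformat>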
      <p>Sensor-intensive systems can produce large amounts of data, and while it might
be possible to store some of the data for a specified length of time, it is unlikely
that all data can be stored indefinitely. The learner must therefore incorporate the
knowledge obtained from the data, so that it is not lost when the data is later discarded.</p>
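      <p>The requirement that knowledge outlive the raw data fits incremental learning, where each sample is folded into a model on arrival so that discarding it later loses nothing the model needs. In the hypothetical sketch below, the model is reduced to per-state summary statistics of a single scalar feature (count, mean and sample variance, updated with Welford's online algorithm), which are, for example, the quantities a Gaussian naive Bayes classifier needs; <monospace>RetainingLearner</monospace> and <monospace>retention</monospace> are illustrative names.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable, List, Tuple


class RetainingLearner:
    """Hypothetical learner that keeps raw samples only for `retention` time
    units, but folds every sample into per-state statistics on arrival."""

    def __init__(self, retention: float):
        self.retention = retention
        self.buffer: Deque[Tuple[float, float, Hashable]] = deque()  # (ts, x, state)
        # state -> [count, mean, M2], the running sums of Welford's algorithm
        self.stats: DefaultDict[Hashable, List[float]] = defaultdict(lambda: [0, 0.0, 0.0])

    def add(self, ts: float, x: float, state: Hashable) -> None:
        n, mean, m2 = self.stats[state]
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)            # uses the updated mean
        self.stats[state] = [n, mean, m2]
        self.buffer.append((ts, x, state))  # assumes samples arrive in ts order

    def discard_old(self, t: float) -> None:
        """Drop raw samples with ts < t - retention; the statistics are kept."""
        while self.buffer and self.buffer[0][0] < t - self.retention:
            self.buffer.popleft()

    def summary(self, state: Hashable) -> Tuple[int, float, float]:
        """Return (count, mean, sample variance) for a state."""
        n, mean, m2 = self.stats[state]
        return int(n), mean, (m2 / (n - 1) if n > 1 else 0.0)


learner = RetainingLearner(retention=10.0)
for ts, x in [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0)]:
    learner.add(ts, x, "meeting")
learner.discard_old(t=100.0)
print(len(learner.buffer))         # 0: all raw samples discarded
print(learner.summary("meeting"))  # (3, 4.0, 4.0): knowledge retained
```
]]></preformat>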
      <p>
        Transfer learning is another way to handle the shortage of labeled
data. For instance, if a model is trained to classify an aspect based on data from
sensors in room A, the model could be adapted to perform the corresponding task in
room B. While transfer learning has been used successfully in many applications,
the case where the input feature spaces of the source and target (e.g. the sensors in
rooms A and B, respectively) differ is still a relatively new field of research [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-ack" sec-type="acknowledgements">
      <title>Acknowledgements</title>
      <p>This work was partially financed by the Knowledge Foundation through the Internet of Things and People research profile.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Mihailescu</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Persson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davidsson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eklund</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Towards collaborative sensing using dynamic intelligent virtual sensors</article-title>
          . In:
          <source>International Symposium on Intelligent and Distributed Computing</source>
          . pp.
          <fpage>217</fpage>
          -
          <lpage>226</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1345</fpage>
          -
          <lpage>1359</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Active learning literature survey</article-title>
          .
          <source>Computer Sciences Technical Report 1648</source>
          , University of Wisconsin-Madison (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khoshgoftaar</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A survey of transfer learning</article-title>
          .
          <source>Journal of Big Data</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>9</elocation-id>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>