<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-Modal Subjective Context Modelling and Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qiang Shen</string-name>
          <email>shenqiang19@mails.jlu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Teso</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fausto Giunchiglia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer Science and Technology, Jilin University</institution>
          ,
          <addr-line>Changchun</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>32</fpage>
      <lpage>36</lpage>
      <abstract>
        <p>Applications like personal assistants need to be aware of the user's context, e.g., where they are, what they are doing, and with whom. Context information is usually inferred from sensor data, like GPS sensors and accelerometers on the user's smartphone. This prediction task is known as context recognition. A well-defined context model is fundamental for successful recognition. Existing models, however, have two major limitations. First, they focus on a few aspects, like location or activity, meaning that recognition methods based on them can only compute and leverage a few inter-aspect correlations. Second, existing models typically assume that context is objective, whereas in most applications context is best viewed from the user's perspective. Neglecting these factors limits the usefulness of the context model and hinders recognition. We present a novel ontological context model that captures four dimensions, namely time, location, activity, and social relations. Moreover, our model defines three levels of description (objective context, machine context, and subjective context) that naturally support subjective annotations and reasoning. An initial context recognition experiment on real-world data hints at the promise of our model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        The term “context” refers to any kind of information necessary to
describe the situation that an individual is in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The automatic recognition
of personal context is key in applications like personal assistants,
smart environments, and health monitoring apps, because it enables
intelligent agents to respond proactively and appropriately based on
(an estimate of) their user’s context. For instance, a personal
assistant aware that its user is at home, alone, doing housework, could
suggest ordering a take-away lunch. Since context
information is usually not available, the machine has to infer it from
sensor data, like GPS coordinates, acceleration, and nearby Bluetooth
devices measured by the user’s smartphone. The standard approach
to context recognition is to train a machine learning model on a large
set of sensor readings and corresponding context annotations to
predict the latter from the former. Existing implementations are quite
diverse, and range from shallow models like logistic regression [14]
to deep neural networks like feed-forward networks [15], LSTMs [7],
and CNNs [12].
      </p>
      <p>
        A context model defines how context data are structured. A good
context model should capture all kinds of situational information
relevant to the application at hand [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and use the right level of
abstraction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Ontology is a widely accepted tool for formalizing
context information [10], and several context ontologies have been
proposed. Typical examples include CONON [16] and CaCONT [17].
CONON focuses on modeling locations by providing an upper
ontology and lower domain-specific ontologies organized into a
hierarchy. CaCONT defines several types of entities, and provides different
levels of abstraction for specifying location of entities, e.g., GPS and
location hierarchies. Focusing on the semantic information of places,
the work in [18] proposed a place-oriented ontology model that
represents different levels of place and related activities to improve
the performance of place recognition. The authors of [9] proposed an
ontology model involving social situations and the interactions between people.
      </p>
      <p>These models, however, suffer from two main limitations. First,
in order to support context recognition, the model should account
for the subjectivity of context descriptions. For instance, the objective
location “hospital” plays different roles for different people: for
patients it is a “place for recovering”, while for nurses it is a “work
place”. This makes all the difference for personal assistants because
the services that a user needs strongly depend on his or her
subjective viewpoint. Most context models ignore this fact, with a few
exceptions, cf. [8]. Second, answers to four basic questions
– “what time is it?”, “where are you?”, “what are you doing?”, and
“who are you with?” – are arguably necessary to define human contexts.
Correlations between these aspects are also fundamental in recognition
and reasoning: if the user is in her room, a personal assistant should
be more likely to guess that she is “studying” or “resting”, rather than
“swimming”. In stark contrast, most models are restricted to one or
a few of the above four aspects and therefore fail to capture important
correlations, like those between activity and location or between time
and social context.</p>
      <p>
        As a remedy, we introduce a novel ontological context model that
supports both reasoning and recognition from a subjective
perspective, that captures time, location, activity, and social relations,
and that enables downstream context recognition tools to leverage
correlations between these four fundamental dimensions. Our model
also incorporates three levels of description for each aspect, namely
objective, machine-level, and subjective, which naturally support
different kinds of annotations. We apply and test our approach by
collaborating with sociology experts within the SmartUnitn-One
project [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We empirically validate our model by evaluating context
recognition performance on the SmartUnitn-One context and sensor
annotation data set [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which was annotated consistently with our
context model. Our initial results show that handling correlations
across aspects substantially improves recognition performance and
makes it possible to predict activities that are otherwise very hard to
recognize.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 CONTEXT MODELLING</title>
      <p>
        Context is a theory of the world that encodes an individual’s
subjective perspective about it [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Individuals have a limited and partial
view of the world at all times in their everyday life. For instance,
consider a classroom with a teacher and a few students. Despite all
the commonalities, each person in the room has a different context
because they focus on different elements of their personal
experience (the students focus on the teacher while the teacher focuses on
the students) and ignore others (like the sound of the projector, the
weather outside, and so on). Given the diversity and complexity of
individual experiences, formalizing the notion of context in its
entirety is essentially impossible. For this reason, simpler but useful
application-specific solutions are necessary.
      </p>
      <p>
        Previous work has observed that reasoning in terms of questions
like “what time is it?”, “where are you?”, “what are you doing?”,
“who are you with?”, “what are you with?” is fundamental for
describing and collecting the behavior of individuals [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Motivated by
this observation and our previous work [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5, 11</xref>
        ], we designed an
ontology-based context model organized according to the
aforementioned dimensions of the world: time, location, activity, social
relations, and objects. Formally, context is defined as a tuple:
      </p>
      <p>Context = ⟨TIME, WE, WA, WO, WI⟩
where:
TIME captures the exact time of context, e.g., “morning”. We refer
to it as the temporal context. Informally, it answers the question
“When did this context occur?”.</p>
      <p>WE captures the exact location of context, e.g., “classroom”. We
refer to it as the endurant context. Informally, it answers the question
“Where are you?”.</p>
      <p>WA captures the activity of context, e.g., “studying”. We refer to
it as the perdurant context. Informally, it answers the question
“What are you doing?”.</p>
      <p>WO captures the social relations of context, e.g., “friend”. We
refer to it as the social context. Informally, it answers the question
“Who are you with?”.</p>
      <p>WI captures the materiality of context, e.g., “smartphone”. We
refer to it as the object context. Informally, it answers the question
“What are you with?”.</p>
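      <p>As a minimal sketch, the tuple ⟨TIME, WE, WA, WO, WI⟩ and its five dimensions can be mirrored as a plain record type; the field names and example values below are our own illustration, not identifiers from the paper’s ontology:</p>
      <p>
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    """One personal context, mirroring Context = <TIME, WE, WA, WO, WI>."""
    time: str                 # temporal context: "When did this context occur?"
    we: str                   # endurant (location) context: "Where are you?"
    wa: str                   # perdurant (activity) context: "What are you doing?"
    wo: str                   # social context: "Who are you with?"
    wi: Optional[str] = None  # object context: "What are you with?"

# The classroom example from the text, phrased in subjective terms.
ctx = Context(time="morning", we="classroom", wa="studying",
              wo="friend", wi="smartphone")
```
      </p>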
      <p>Figure 1 shows a scenario as a knowledge graph representing the
personal context of an individual in the class. For instance, attributes of
WO are “Class”, “Name”, and “Role”, and their values are “Person”,
“Shen”, and “PhD student”, respectively. Edges represent relations
between entities, e.g., “Shen” is in the relation “Attend” with “Lesson”.</p>
      <p>The example in Figure 1 is presented in objective terms, that is,
facts are stated as if they were independent of personal conscious
experiences. However, each person interprets the world and her
surroundings from her personal privileged point of view, which accounts
for her personal knowledge, mental characteristics, states, etc. For
instance, while in Figure 1 “Shen” objectively has the role of a Ph.D.
student, for other people “Shen” subjectively plays the role of a “friend”
or a “classmate”. The subjective context, which is related to
personal consciousness, knowledge, etc., can provide additional
information that applications such as personal assistants can use to offer
more intelligent services.</p>
      <p>Notice that a person’s view of her context is radically different
from what her handheld personal assistant observes. In fact,
machines interpret the world via sensors, while humans interpret the
world not only through their perceptions but also through their
knowledge. For instance, while a machine views a location (e.g., a building)
as a set of coordinates, humans interpret it based on its function (e.g.,
whether the building is their home or office).</p>
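      <p>The difference between the two views of location can be sketched as a lookup from machine-level coordinates to a personally meaningful label; the coordinates and place names below are invented for illustration:</p>
      <p>
```python
# A machine represents a location as raw GPS coordinates, while a person
# interprets the same place through its function for them. The table and
# labels are hypothetical examples, not data from the paper.
KNOWN_PLACES = {
    (46.07, 11.15): "office",   # ~1 km grid cell -> subjective meaning
    (46.05, 11.12): "home",
}

def subjective_location(lat: float, lon: float) -> str:
    """Snap coordinates to a coarse grid and look up their personal meaning."""
    key = (round(lat, 2), round(lon, 2))
    return KNOWN_PLACES.get(key, "unknown place")

print(subjective_location(46.071, 11.149))  # coordinates in, "office" out
```
      </p>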
      <p>To model context precisely and completely, in addition to
considering the dimensions discussed above, we also model three
perspectives: objective context, subjective context, and machine
context. Table 1 shows the above example viewed through these three
perspectives. The objective context captures the fact that at the
University of Trento, Italy, at 11:00 AM, a person is attending a
class together with Shen. When moving from objective to
subjective, things change dramatically. From the perspective of the
machine, the temporal context “11:00 AM” is viewed as the
timestamp “1581938718026”, and in subjective terms it becomes
“morning”; similarly, “University of Trento” becomes the coordinates
“46°04′N, 11°09′E” for the machine and “classroom” from a
subjective perspective. For the perdurant context, the activity of
taking a lesson can be subjectively annotated as “study” by the user,
while the machine describes it through observations such as the
classroom WiFi connection and readings from sensors such as the gyroscope.</p>
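      <p>For the temporal context, the translation from machine-level timestamps to subjective labels like “morning” can be sketched as follows; the hour boundaries are our own arbitrary choice, not part of the model:</p>
      <p>
```python
from datetime import datetime, timezone

def subjective_time(epoch_ms: int) -> str:
    """Map a machine-level timestamp (epoch milliseconds) to a
    subjective temporal label. Boundaries are illustrative only."""
    hour = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"

# The timestamp from the example above falls in the morning (in UTC).
print(subjective_time(1581938718026))
```
      </p>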
    </sec>
    <sec id="sec-5">
      <title>3 EXPERIMENTS</title>
      <p>Data Collection. The SmartUnitn-One data set consists of sensor
readings and context annotations obtained from 72 volunteers
(university students) over a period of two weeks. All participants
were required to install the i-Log app [19], which simultaneously
records data from several sensors (cf. Table 2) and context
annotations. During the first week, students were asked to report
their own context every 30 minutes by answering questionnaires
comprising three questions about location, activity, and social
relations, while the i-Log app collected sensor data at the same time.
During the second week, the participants were only required to keep
the application running for sensor data collection. All records
were timestamped automatically. The questions were designed
according to our context model, and the possible answers were modelled
following the American Time Use Survey (ATUS) [13], leading to
an ontology with over 80 candidate labels; see Figure 2 for the
full list. Object context (WI) information was not collected, as it is
too hard to track without disrupting the volunteers’ routines. All
records were processed as in [20]. This resulted in 23,309 records,
each comprising 122 sensor readings (henceforth, features) and
self-reported annotations about location, activity, and social context.</p>
      <p>Experimental Setup. For every aspect in {WA, WE, WO}, we
trained a random forest to predict that aspect from sensor
measurements. We randomly split the data set into training (75% of the
records) and validation (25% of the records) subsets and selected the
maximum depth of the forest using the validation set only. Classifier
performance was then evaluated using 5-fold cross-validation: the data
set was randomly partitioned into 5 folds, and each fold was held out
in turn as the test set while a classifier was trained on the remaining
folds. We then compared this model to another random forest (with the
same maximum depth) that was supplied both sensor data and annotations
for (a subset of) the other aspects as inputs. To account for label
skew (e.g., some locations and activities are much more frequent than
others), performance was measured using the micro-averaged F1 score.</p>
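      <p>The protocol can be sketched with standard-library code only; here a stand-in majority-class predictor replaces the random forest, and all names and data are ours. The sketch shows the 5-fold partitioning and the micro-averaged F1 score, which pools true positives, false positives, and false negatives over all classes:</p>
      <p>
```python
import random
from collections import Counter

def kfold_indices(n: int, k: int = 5, seed: int = 0):
    """Randomly partition record indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def micro_f1(y_true, y_pred) -> float:
    """Micro-averaged F1. In single-label classification every error
    counts as one false positive and one false negative."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - tp
    return 2 * tp / (2 * tp + errors + errors)

def majority_class(train_labels):
    """Stand-in classifier: always predict the most frequent label.
    In the paper, annotations for the other aspects would instead be
    concatenated to the 122 sensor features as extra inputs."""
    return Counter(train_labels).most_common(1)[0][0]

# Tiny synthetic run over activity (WA) labels.
y = ["lesson"] * 12 + ["moving"] * 5 + ["sport"] * 3
scores = []
for test_fold in kfold_indices(len(y), k=5):
    held_out = set(test_fold)
    train = [y[i] for i in range(len(y)) if i not in held_out]
    pred = majority_class(train)
    scores.append(micro_f1([y[i] for i in test_fold], [pred] * len(test_fold)))
print(sum(scores) / len(scores))  # average micro-F1 across the 5 folds
```
      </p>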
      <p>Results and Discussion. The average F1 scores across users are
reported in Figure 3. The plots show very clearly that knowledge of
other aspects substantially improves recognition performance
regardless of the aspect being predicted: supplying the other aspects as
inputs increases the F1 score of predicting WA and WE by more than
10% and for WO by more than 5%. A breakdown of performance
increase can be viewed in Table 3. The table shows that all aspects
are correlated, as expected, especially activity and location, and that
providing more aspects as inputs increases F1 almost additively.</p>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption>
          <p>Increase in F1 score when annotations for other aspects are supplied as inputs, per predicted aspect.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Inputs</th><th>WA</th><th>WE</th><th>WO</th></tr>
          </thead>
          <tbody>
            <tr><td>Sensors + WA</td><td>–</td><td>+8.80%</td><td>+2.36%</td></tr>
            <tr><td>Sensors + WE</td><td>+8.27%</td><td>–</td><td>+3.09%</td></tr>
            <tr><td>Sensors + WO</td><td>+3.34%</td><td>+3.27%</td><td>–</td></tr>
            <tr><td>Sensors + Other Aspects</td><td>+11.25%</td><td>+11.57%</td><td>+5.31%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Figure 4 shows F1 scores (again, averaged across users) for each
label. For WO, some labels are clearly easier to predict than others.
The performance improvement is usually in the 5– 10% range, with
the notable exception of “other”, which improves by about 20%.
It seems that location information always facilitates recognition of
WO, while activity does not. Their combination, however, is always
beneficial. For WE, looking at either WO or WA helps recognition
performance in all cases, and providing both WO and WA gives a
larger improvement than providing them separately. The
exceptions are “library”, “study room”, and “shop”, for which knowing
WA alone improves performance more than knowing both WO and WA.
This is somewhat surprising, as we expect social context to be
moderately indicative of location, and it deserves further investigation.
Some locations (“canteen”, “on foot”, “auto”, “shop”, and “workplace”)
receive a major increase in recognition performance, from
approximately 25% to 40%. This is partly due to the rarity of these
classes in the data set, which shows that inter-aspect correlations
compensate for the lack of supervision. Finally, for WA, some activities (like “housework”,
“cultural activities”, and “hobbies”) are very hard to predict, as their
F1 score is below 30%, while others (“work”, “moving”, and
“lesson”) are much easier to predict, with more than 80% F1 score. This
mostly shows that rare activities are harder to predict,
understandably, although other factors might play a role. Using the full
context (with WE and WO) always improves performance, except for
“housework”. For all the other activities, the improvement is from
5% to 20%, and even larger for “Shopping”, “Sport” and
“Traveling”, for which the improvement is up to 30%.</p>
      <p>This analysis provides ample support for our context model:
correlations between different aspects improve context recognition
performance for most users and, even more importantly, some values (like
“Canteen”) that are essentially impossible to recognize suddenly
become much easier when full context information is provided.
</p>
    </sec>
    <sec id="sec-3">
      <title>4 CONCLUSION</title>
      <p>We designed a novel context model that captures situational
information about time, location, activity, and social relations of
individuals using subjective—rather than objective—terms. An initial
context recognition experiment on real-world data showed that machine
learning models built using our context model produce higher quality
predictions than models based on less complete context models. As
for future work, we plan to study the effects of subjectivity more
in detail, to migrate our architecture to more refined learning
approaches (e.g., deep neural nets), and to carry out an extensive
comparison against the state-of-the-art in context recognition.
</p>
    </sec>
    <sec id="sec-4">
      <title>5 ACKNOWLEDGEMENT</title>
      <p>The research of FG has received funding from the European Union’s
Horizon 2020 FET Proactive project “WeNet – The Internet of us”,
grant agreement No 823783. The research of ST and WZ has
received funding from the “DELPhi - DiscovEring Life Patterns”
project funded by the MIUR Progetti di Ricerca di Rilevante
Interesse Nazionale (PRIN) 2017 – DD n. 1062 del 31.05.2019.</p>
    </sec>
    <sec id="sec-6">
      <title>REFERENCES</title>
      <p>[7] Nils Y Hammerla et al., ‘Deep, convolutional, and recurrent
models for human activity recognition using wearables’, arXiv preprint
arXiv:1604.08880, (2016).</p>
      <p>[8] Mieczyslaw M Kokar, Christopher J Matheus, and Kenneth Baclawski,
‘Ontology-based situation awareness’, Information Fusion, 10(1),
83–98, (2009).</p>
      <p>[9] Ilir Kola, Catholijn M Jonker, and M Birna van Riemsdijk, ‘Who’s
that? - Social situation awareness for behaviour support agents’, in
International Workshop on Engineering Multi-Agent Systems, pp. 127–151,
Springer, (2019).</p>
      <p>[10] Reto Krummenacher and Thomas Strang, ‘Ontology-based context
modeling’, in Proceedings, (2007).</p>
      <p>[11] Nardine Osman, Carles Sierra, Ronald Chenu-Abente, Qiang Shen, and
Fausto Giunchiglia, ‘Open social systems’, in 17th European
Conference on Multi-Agent Systems (EUMAS), Thessaloniki, Greece, (2020).</p>
      <p>[12] Aaqib Saeed et al., ‘Learning behavioral context recognition
with multi-stream temporal convolutional networks’, arXiv preprint
arXiv:1808.08766, (2018).</p>
      <p>[13] Kristina J Shelley, ‘Developing the American Time Use Survey activity
classification system’, Monthly Lab. Rev., (2005).</p>
      <p>[14] Yonatan Vaizman, Katherine Ellis, and Gert Lanckriet,
‘Recognizing detailed human context in the wild from smartphones and
smartwatches’, IEEE Pervasive Computing, (2017).</p>
      <p>[15] Yonatan Vaizman et al., ‘Context recognition in-the-wild: Unified
model for multi-modal sensors and multi-label classification’,
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous
Technologies, (2018).</p>
      <p>[16] Xiaohang Wang et al., ‘Ontology based context modeling and reasoning
using OWL’, in PerCom Workshops, (2004).</p>
      <p>[17] Nan Xu et al., ‘CACOnt: A ontology-based model for context modeling
and reasoning’, in Applied Mechanics and Materials, (2013).</p>
      <p>[18] Laura Zavala, Pradeep K Murukannaiah, Nithyananthan Poosamani,
Tim Finin, Anupam Joshi, Injong Rhee, and Munindar P Singh, ‘Platys:
From position to place-oriented mobile computing’, AI Magazine,
36(2), 50–62, (2015).</p>
      <p>[19] Mattia Zeni et al., ‘Multi-device activity logging’, in Proceedings of the
2014 ACM International Joint Conference on Pervasive and Ubiquitous
Computing: Adjunct Publication, pp. 299–302, (2014).</p>
      <p>[20] Mattia Zeni et al., ‘Fixing mislabeling by human annotators leveraging
conflict resolution and prior knowledge’, Proceedings of the ACM on
Interactive, Mobile, Wearable and Ubiquitous Technologies, (2019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Bettini</surname>
          </string-name>
          et al.,
          <article-title>'A survey of context modelling and reasoning techniques'</article-title>
          ,
          <source>Pervasive and Mobile Computing</source>
          , (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Anind K</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <article-title>'Understanding and using context'</article-title>
          ,
          <source>Personal and Ubiquitous Computing</source>
          , (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Fausto</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          ,
          <article-title>'Contextual reasoning'</article-title>
          ,
          <source>Epistemologia, special issue on I Linguaggi e le Macchine</source>
          , (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Fausto</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          , Enrico Bignotti, and Mattia Zeni, '
          <article-title>Human-like context sensing for robot surveillance'</article-title>
          ,
          <source>International Journal of Semantic Computing</source>
          ,
          <volume>12</volume>
          (
          <issue>01</issue>
          ),
          <fpage>129</fpage>
          -
          <lpage>148</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Fausto</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          , Enrico Bignotti, and Mattia Zeni, '
          <article-title>Personal context modelling and annotation'</article-title>
          ,
          <source>in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)</source>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Fausto</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          et al.,
          <article-title>'Mobile social media usage and academic performance'</article-title>
          ,
          <source>Computers in Human Behavior</source>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>