=Paper=
{{Paper
|id=Vol-2787/paper5
|storemode=property
|title=Multi-Modal Subjective Context Modelling and Recognition
|pdfUrl=https://ceur-ws.org/Vol-2787/paper5.pdf
|volume=Vol-2787
|authors=Qiang Shen,Stefano Teso,Wanyi Zhang,Hao Xu,Fausto Giunchiglia
|dblpUrl=https://dblp.org/rec/conf/ecai/ShenTZXG20
}}
==Multi-Modal Subjective Context Modelling and Recognition==
''Qiang Shen¹², Stefano Teso², Wanyi Zhang², Hao Xu¹, and Fausto Giunchiglia¹²''

¹ College of Computer Science and Technology, Jilin University, Changchun, China, email: shenqiang19@mails.jlu.edu.cn, xuhao@jlu.edu.cn

² University of Trento, Italy, email: {stefano.teso, wanyi.zhang, fausto.giunchiglia}@unitn.it

Eleventh International Workshop Modelling and Reasoning in Context (MRC) @ECAI 2020. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

'''Abstract.''' Applications like personal assistants need to be aware of the user's context, e.g., where they are, what they are doing, and with whom. Context information is usually inferred from sensor data, like GPS coordinates and accelerometer readings from the user's smartphone. This prediction task is known as context recognition. A well-defined context model is fundamental for successful recognition. Existing models, however, have two major limitations. First, they focus on only a few aspects, like location or activity, meaning that recognition methods based on them can compute and leverage only a few inter-aspect correlations. Second, existing models typically assume that context is objective, whereas in most applications context is best viewed from the user's perspective. Neglecting these factors limits the usefulness of the context model and hinders recognition. We present a novel ontological context model that captures four dimensions, namely time, location, activity, and social relations. Moreover, our model defines three levels of description (objective context, machine context, and subjective context) that naturally support subjective annotations and reasoning. An initial context recognition experiment on real-world data hints at the promise of our model.

===1 INTRODUCTION===

The term "context" refers to any kind of information necessary to describe the situation that an individual is in [2]. Automatic recognition of personal context is key in applications like personal assistants, smart environments, and health monitoring apps, because it enables intelligent agents to respond proactively and appropriately based on (an estimate of) their user's context. For instance, a personal assistant aware that its user is at home, alone, doing housework, could suggest ordering a take-away lunch. Since context information is usually not available, the machine has to infer it from sensor data, like GPS coordinates, acceleration, and nearby Bluetooth devices measured by the user's smartphone. The standard approach to context recognition is to train a machine learning model on a large set of sensor readings and corresponding context annotations to predict the latter from the former. Existing implementations are quite diverse, and range from shallow models like logistic regression [14] to deep neural networks like feed-forward networks [15], LSTMs [7], and CNNs [12].

A context model defines how context data are structured. A good context model should capture all kinds of situational information relevant to the application at hand [2] and use the right level of abstraction [1]. Ontologies are a widely accepted tool for formalizing context information [10], and several context ontologies have been proposed. Typical examples include CONON [16] and CACOnt [17]. CONON focuses on modeling locations by providing an upper ontology and lower domain-specific ontologies organized into a hierarchy. CACOnt defines several types of entities, and provides different levels of abstraction for specifying the location of entities, e.g., GPS coordinates and location hierarchies. Focusing on the semantic information of places, the work in [18] proposed a place-oriented ontology model that represents different levels of place and related activities and improves the performance of place recognition. The work in [9] proposed an ontology model involving social situations and the interactions between people.

These models, however, suffer from two main limitations. First, in order to support context recognition, the model should account for the subjectivity of context descriptions. For instance, the objective location "hospital" plays different roles for different people: for patients it is a "place for recovering", while for nurses it is a "work place". This makes all the difference for personal assistants, because the services that a user needs strongly depend on his or her subjective viewpoint. Most context models ignore this fact, with few exceptions, cf. [8]. Second, answers to four basic questions – "what time is it?", "where are you?", "what are you doing?", and "who are you with?" – are arguably necessary to define human contexts. Correlations between these aspects are also fundamental in recognition and reasoning: if the user is in her room, a personal assistant should be more likely to guess that she is "studying" or "resting", rather than "swimming". In stark contrast, most models are restricted to one or a few of the above four aspects and therefore fail to capture important correlations, like those between activity and location or between time and social context.
As a remedy, we introduce a novel ontological context model that supports both reasoning and recognition from a subjective perspective, that captures time, location, activity, and social relations, and that enables downstream context recognition tools to leverage correlations between these four fundamental dimensions. Our model also incorporates three levels of description for each aspect, namely objective, machine-level, and subjective, which naturally support different kinds of annotations. We apply and test our approach in collaboration with sociology experts within the SmartUnitn-One project [6]. We validate our model empirically by evaluating context recognition performance on the SmartUnitn-One context and sensor annotation data set [6], which was annotated consistently with our context model. Our initial results show that handling correlations across aspects substantially improves recognition performance and makes it possible to predict activities that are otherwise very hard to recognize.

===2 CONTEXT MODELLING===

Context is a theory of the world that encodes an individual's subjective perspective about it [3]. Individuals have a limited and partial view of the world at all times in their everyday life. For instance, consider a classroom with a teacher and a few students. Despite all the commonalities, each person in the room has a different context, because they focus on different elements of their personal experience (the students focus on the teacher while the teacher focuses on the students) and ignore others (like the sound of the projector, the weather outside, and so on). Given the diversity and complexity of individual experiences, formalizing the notion of context in its entirety is essentially impossible.
For this reason, simpler but useful application-specific solutions are necessary. Previous work has observed that reasoning in terms of questions like "what time is it?", "where are you?", "what are you doing?", "who are you with?", and "what are you with?" is fundamental for describing and collecting the behavior of individuals [3]. Motivated by this observation and our previous work [4, 5, 11], we designed an ontology-based context model organized according to the aforementioned dimensions of the world: time, location, activity, social relations, and objects. Formally, context is defined as a tuple:

Context = ⟨TIME, WE, WA, WO, WI⟩

where:
* TIME captures the exact time of the context, e.g., "morning". We refer to it as the ''temporal context''. Informally, it answers the question "When did this context occur?".
* WE captures the exact location of the context, e.g., "classroom". We refer to it as the ''endurant context''. Informally, it answers the question "Where are you?".
* WA captures the activity of the context, e.g., "studying". We refer to it as the ''perdurant context''. Informally, it answers the question "What are you doing?".
* WO captures the social relations of the context, e.g., "friend". We refer to it as the ''social context''. Informally, it answers the question "Who are you with?".
* WI captures the materiality of the context, e.g., "smartphone". We refer to it as the ''object context''. Informally, it answers the question "What are you with?".

Figure 1. Illustration of our context model.

Figure 1 shows a scenario as a knowledge graph representing the personal context of an individual in the class. For instance, attributes of WO are "Class", "Name", and "Role", and their values are "Person", "Shen", and "PhD student", respectively. Edges represent relations between entities, e.g., "Shen" is in relation "Attend" with "Lesson".
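To make the five-dimensional tuple concrete, the following minimal sketch (ours, not from the paper; the class and field names are illustrative assumptions) shows one way a single context record could be represented in Python:

<syntaxhighlight lang="python">
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    """One context record: Context = <TIME, WE, WA, WO, WI>.
    Names are illustrative; the paper defines the model as an ontology,
    not as a concrete data structure."""
    time: str            # temporal context, e.g., "morning"
    we: str              # endurant context (location), e.g., "classroom"
    wa: str              # perdurant context (activity), e.g., "studying"
    wo: Optional[str]    # social context, e.g., "friend"
    wi: Optional[str]    # object context, e.g., "smartphone"

# The classroom scenario of Figure 1, stated in subjective terms:
ctx = Context(time="morning", we="classroom", wa="studying",
              wo="friend", wi="smartphone")
</syntaxhighlight>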
The example in Figure 1 is presented in objective terms, that is, facts are stated as if they were independent of personal conscious experiences. However, each person interprets the world and her surroundings from her personal privileged point of view, which accounts for her personal knowledge, mental characteristics, states, etc. For instance, while in Figure 1 "Shen" has the objective role of PhD student, for other people "Shen" subjectively plays the role of a "friend" or a "classmate". The subjective context, which is related to personal consciousness, knowledge, etc., can provide more information to applications such as personal assistants, enabling them to offer more intelligent services.

Notice that a person's view of her context is radically different from what her handheld personal assistant observes. In fact, machines interpret the world via sensors, while humans interpret the world not only via their perceptions but also with their knowledge. For instance, while a machine views a location (e.g., a building) as a set of coordinates, humans interpret it based on its function (e.g., whether the building is their home or office).

To model context precisely and completely, in addition to considering the four dimensions discussed above, we also model three perspectives: objective context, subjective context, and machine context. Table 1 shows the above example viewed through these three perspectives. The objective context captures the fact that at the University of Trento, Italy, at 11:00 AM, a person is attending a class together with Shen. When moving from objective to subjective, things change dramatically. From the perspective of the machine, the temporal context "11:00 AM" is viewed as the timestamp "1581938718026", while in subjective terms it becomes "morning"; similarly, "University of Trento" becomes the coordinates "46°04'N, 11°09'E" for the machine and "classroom" from a subjective perspective. For the perdurant context, the activity of taking a lesson can be subjectively annotated as "study" by the user, but to the machine it can be described as "connected to the classroom WIFI, sensors such as the gyroscope and accelerometer sensed as static". For the social context, "Shen" is subjectively described as a friend by the user, while the machine senses that "Shen" is in the user's contact list.

{| class="wikitable"
! Level !! TIME !! WE !! WA !! WO
|-
| Objective Context || 2020-02-17 11am || Via Sommarive, 9, 38123 Povo TN || Lesson || Shen
|-
| Machine Context || 1581938718026 || 46°04'01.9"N 11°09'02.4"E || Accelerometer: 0g, 0g, 0g || "Shen" is in contact list
|-
| Subjective Context || Morning || Classroom || Studying || Friend
|}

Table 1. An example of our three-partitioned context model. Each row gives a different description of the same underlying situation from the perspective of the world (top), the machine (middle), and the user (bottom).
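To illustrate how the machine level in Table 1 relates to the subjective level, here is a hedged sketch (our own; the place lookup table, tolerance, and time bands are assumptions, not part of the paper) of lifting machine context to subjective context:

<syntaxhighlight lang="python">
from datetime import datetime, timezone

# Hypothetical lookup from approximate coordinates to a subjective place
# label; a real system would query the user's personal ontology instead.
KNOWN_PLACES = {(46.067, 11.151): "classroom"}  # Povo, University of Trento

def subjective_time(epoch_ms: int) -> str:
    """Lift a machine-level timestamp (milliseconds) to a subjective label."""
    hour = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour
    if 5 <= hour < 12:
        return "morning"
    return "afternoon" if hour < 18 else "evening"

def subjective_place(lat: float, lon: float, tol: float = 0.01) -> str:
    """Map machine-level coordinates to the nearest known subjective place."""
    for (plat, plon), label in KNOWN_PLACES.items():
        if abs(lat - plat) <= tol and abs(lon - plon) <= tol:
            return label
    return "unknown"

print(subjective_time(1581938718026))      # "morning" (the Table 1 timestamp)
print(subjective_place(46.0672, 11.1507))  # "classroom"
</syntaxhighlight>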
===3 EMPIRICAL EVALUATION===

In order to evaluate the proposed context model, we carried out a context recognition experiment using the SmartUnitn-One data set [6], and studied whether recognition of subjective context is feasible and whether taking inter-aspect correlations into account helps recognition performance.

'''Data Collection.''' The SmartUnitn-One data set consists of sensor readings and context annotations obtained from 72 volunteers (university students) over a period of two weeks. All participants were required to install the i-Log app [19], which simultaneously records data from several sensors (cf. Table 2) and context annotations. During the first week, students were asked to report their own context every 30 minutes via questionnaires comprising three questions about location, activity, and social relations; the i-Log app collected sensor data at the same time. During the second week, the participants were only required to keep the application running for sensor data collection. All records were timestamped automatically. The questions were designed according to our context model, and the possible answers were modelled following the American Time Use Survey (ATUS) [13], leading to an ontology with over 80 candidate labels; see Figure 2 for the full list. Object context (WI) information was not collected, as it is too hard to track without disrupting the volunteers' routines. All records were processed as in [20]. This resulted in 23309 records, each comprising 122 sensor readings (henceforth, features) and self-reported annotations about location, activity, and social context.

Figure 2. Questions and answers in the SmartUnitn-One questionnaire.

{| class="wikitable"
! Sensor !! Frequency !! Unit
|-
| Acceleration || 20 Hz || m/s²
|-
| Linear Acceleration || 20 Hz || m/s²
|-
| Gyroscope || 20 Hz || rad/s
|-
| Gravity || 20 Hz || m/s²
|-
| Rotation Vector || 20 Hz || Unitless
|-
| Magnetic Field || 20 Hz || µT
|-
| Orientation || 20 Hz || Degrees
|-
| Temperature || 20 Hz || °C
|-
| Atmospheric Pressure || 20 Hz || hPa
|-
| Humidity || 20 Hz || %
|-
| Proximity || On change || 0/1
|-
| Position || Every minute || Lat./Lon.
|-
| WIFI Network Connected || On change || Unitless
|-
| WIFI Networks Available || Every minute || Unitless
|-
| Running Application || Every 5 seconds || Unitless
|-
| Battery Level || On change || %
|-
| Audio from the internal mic || 10 seconds per minute || Unitless
|-
| Notifications received || On change || Unitless
|-
| Touch event || On change || 0/1
|-
| Cellular network info || Once per minute || Unitless
|-
| Screen Status, Flight Mode, Battery Charge, Doze Mode, Headset Plugged in, Audio Mode, Music Playback || On change || 0/1
|}

Table 2. List of sensors. Proximity triggers when the phone detects very close objects, e.g., the user's ear during a phone call.

'''Experimental Setup.''' For every aspect in {WA, WE, WO}, we trained a random forest to predict that aspect from sensor measurements. We randomly split the data set into training (75% of the records) and validation (25% of the records) subsets and selected the maximum depth of the forest using the validation set only. Classifier performance was then evaluated using 5-fold cross validation: the data set was randomly partitioned into 5 folds, and each fold was in turn held out as the test set while a classifier was trained on the remaining folds and its performance computed on the held-out fold. We then compared this model to a second random forest (with the same maximum depth) that was supplied both sensor data and annotations for (a subset of) the other aspects as inputs. In order to account for label skew (e.g., some locations and activities are much more frequent than others), performance was measured using the micro-averaged F1 score.
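A minimal sketch of this protocol follows (ours, using scikit-learn; the names X, wa, we, wo are hypothetical stand-ins for the 23309 x 122 feature matrix and the per-record labels, and this is not the authors' actual code):

<syntaxhighlight lang="python">
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import OneHotEncoder

def evaluate(X_sensors, y_target, other_aspects=None, max_depth=20, seed=0):
    """5-fold cross-validated micro-F1 for predicting one aspect, optionally
    also feeding one-hot encoded annotations of other aspects as inputs.
    max_depth would be tuned beforehand on a 75/25 train/validation split."""
    X = X_sensors
    if other_aspects is not None:
        onehot = OneHotEncoder(handle_unknown="ignore")
        X = np.hstack([X_sensors, onehot.fit_transform(other_aspects).toarray()])
    scores = []
    for train_idx, test_idx in KFold(5, shuffle=True, random_state=seed).split(X):
        clf = RandomForestClassifier(max_depth=max_depth, random_state=seed)
        clf.fit(X[train_idx], y_target[train_idx])
        scores.append(f1_score(y_target[test_idx], clf.predict(X[test_idx]),
                               average="micro"))
    return float(np.mean(scores))

# Hypothetical usage: compare sensors-only against sensors plus other aspects.
# f1_sensors = evaluate(X, wa)
# f1_full    = evaluate(X, wa, other_aspects=np.column_stack([we, wo]))
</syntaxhighlight>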
'''Results and Discussion.''' The F1 scores, averaged across users, are reported in Figure 3. The plots show very clearly that knowledge of other aspects substantially improves recognition performance regardless of the aspect being predicted: supplying the other aspects as inputs increases the F1 score when predicting WA and WE by more than 10%, and when predicting WO by more than 5%. A breakdown of the performance increase is given in Table 3. The table shows that all aspects are correlated, as expected, especially activity and location, and that providing more aspects as inputs increases F1 almost additively.

{| class="wikitable"
! Inputs !! WA !! WE !! WO
|-
| Sensors + WA || – || +8.80% || +2.36%
|-
| Sensors + WE || +8.27% || – || +3.09%
|-
| Sensors + WO || +3.34% || +3.27% || –
|-
| Sensors + Other Aspects || +11.25% || +11.57% || +5.31%
|}

Table 3. Improvement in F1 score when using other aspects as inputs to the recognition model. Columns indicate the aspect being predicted.

Figure 3. F1 of our context recognition model. From left to right: perdurant (WA), endurant (WE), and social context (WO), respectively. The leftmost column refers to a predictor that uses sensor data only, while the other columns refer to predictors that in addition have access to context annotations.

Figure 4. F1 of individual labels (averaged over users). From left to right: perdurant, endurant, and social context, respectively.

Figure 4 shows F1 scores (again, averaged across users) for each label. For WO, some labels are clearly easier to predict than others. The performance improvement is usually in the 5–10% range, with the notable exception of "other", which improves by about 20%. It seems that location information always facilitates recognition of WO, while activity does not; their combination, however, is always beneficial. For WE, looking at either WO or WA helps recognition performance in all cases, and providing both WO and WA gives a larger improvement than providing them separately. The exceptions are "library", "study room", and "shop", for which knowing WA alone helps more than knowing both WO and WA. This is somewhat surprising, as we expect social context to be moderately indicative of location, and it deserves further investigation. Some locations ("canteen", "on foot", "auto", "shop", and "workplace") receive a major increase in recognition performance, from approximately 25% to 40%. This is partly due to the rarity of these classes in the data set, which shows that inter-aspect correlations compensate for the lack of supervision. Finally, for WA, some activities (like "housework", "cultural activities", and "hobbies") are very hard to predict, as their F1 score is below 30%, while others ("work", "moving", and "lesson") are much easier to predict, with F1 scores above 80%. This mostly shows that rare activities are, understandably, harder to predict, although other factors might play a role. Using the full context (with WE and WO) always improves performance, except for "housework". For all the other activities, the improvement ranges from 5% to 20%, and is even larger for "Shopping", "Sport", and "Traveling", for which the improvement is up to 30%.

This analysis provides ample support for our context model: correlations between different aspects improve context recognition performance for most users and, even more importantly, some values (like "Canteen") that are essentially impossible to recognize otherwise become much easier to predict when full context information is provided.

===4 CONCLUSION===

We designed a novel context model that captures situational information about the time, location, activity, and social relations of individuals using subjective, rather than objective, terms. An initial context recognition experiment on real-world data showed that machine learning models built using our context model produce higher quality predictions than models based on less complete context models. As for future work, we plan to study the effects of subjectivity in more detail, to migrate our architecture to more refined learning approaches (e.g., deep neural nets), and to carry out an extensive comparison against the state of the art in context recognition.

===5 ACKNOWLEDGEMENT===

The research of FG has received funding from the European Union's Horizon 2020 FET Proactive project "WeNet – The Internet of us", grant agreement No 823783. The research of ST and WZ has received funding from the "DELPhi - DiscovEring Life Patterns" project funded by the MIUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) 2017 – DD n. 1062 del 31.05.2019.

===REFERENCES===

[1] Claudio Bettini et al., 'A survey of context modelling and reasoning techniques', Pervasive and Mobile Computing, (2010).

[2] Anind K. Dey, 'Understanding and using context', Personal and Ubiquitous Computing, (2001).

[3] Fausto Giunchiglia, 'Contextual reasoning', Epistemologia, special issue on I Linguaggi e le Macchine, (1993).

[4] Fausto Giunchiglia, Enrico Bignotti, and Mattia Zeni, 'Human-like context sensing for robot surveillance', International Journal of Semantic Computing, 12(01), 129–148, (2017).

[5] Fausto Giunchiglia, Enrico Bignotti, and Mattia Zeni, 'Personal context modelling and annotation', in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), (2017).

[6] Fausto Giunchiglia et al., 'Mobile social media usage and academic performance', Computers in Human Behavior, (2018).
[7] Nils Y. Hammerla et al., 'Deep, convolutional, and recurrent models for human activity recognition using wearables', arXiv preprint arXiv:1604.08880, (2016).

[8] Mieczyslaw M. Kokar, Christopher J. Matheus, and Kenneth Baclawski, 'Ontology-based situation awareness', Information Fusion, 10(1), 83–98, (2009).

[9] Ilir Kola, Catholijn M. Jonker, and M. Birna van Riemsdijk, 'Who's that? Social situation awareness for behaviour support agents', in International Workshop on Engineering Multi-Agent Systems, pp. 127–151. Springer, (2019).

[10] Reto Krummenacher and Thomas Strang, 'Ontology-based context modeling', in Proceedings, (2007).

[11] Nardine Osman, Carles Sierra, Ronald Chenu-Abente, Qiang Shen, and Fausto Giunchiglia, 'Open social systems', in 17th European Conference on Multi-Agent Systems (EUMAS), Thessaloniki, Greece, (2020).

[12] Aaqib Saeed et al., 'Learning behavioral context recognition with multi-stream temporal convolutional networks', arXiv preprint arXiv:1808.08766, (2018).

[13] Kristina J. Shelley, 'Developing the American Time Use Survey activity classification system', Monthly Labor Review, (2005).

[14] Yonatan Vaizman, Katherine Ellis, and Gert Lanckriet, 'Recognizing detailed human context in the wild from smartphones and smartwatches', IEEE Pervasive Computing, (2017).

[15] Yonatan Vaizman et al., 'Context recognition in-the-wild: Unified model for multi-modal sensors and multi-label classification', Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, (2018).

[16] Xiaohang Wang et al., 'Ontology based context modeling and reasoning using OWL', in PerCom Workshops, (2004).

[17] Nan Xu et al., 'CACOnt: An ontology-based model for context modeling and reasoning', in Applied Mechanics and Materials, (2013).

[18] Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee, and Munindar P. Singh, 'Platys: From position to place-oriented mobile computing', AI Magazine, 36(2), 50–62, (2015).

[19] Mattia Zeni et al., 'Multi-device activity logging', in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 299–302, (2014).

[20] Mattia Zeni et al., 'Fixing mislabeling by human annotators leveraging conflict resolution and prior knowledge', Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, (2019).