=Paper=
{{Paper
|id=Vol-2787/paper5
|storemode=property
|title=Multi-Modal Subjective Context Modelling and Recognition
|pdfUrl=https://ceur-ws.org/Vol-2787/paper5.pdf
|volume=Vol-2787
|authors=Qiang Shen,Stefano Teso,Wanyi Zhang,Hao Xu,Fausto Giunchiglia
|dblpUrl=https://dblp.org/rec/conf/ecai/ShenTZXG20
}}
==Multi-Modal Subjective Context Modelling and Recognition==
''Qiang Shen¹², Stefano Teso², Wanyi Zhang², Hao Xu¹, and Fausto Giunchiglia¹²''

¹ College of Computer Science and Technology, Jilin University, Changchun, China, email: shenqiang19@mails.jlu.edu.cn, xuhao@jlu.edu.cn

² University of Trento, Italy, email: {stefano.teso, wanyi.zhang, fausto.giunchiglia}@unitn.it

Eleventh International Workshop Modelling and Reasoning in Context (MRC) @ECAI 2020. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

'''Abstract.''' Applications like personal assistants need to be aware of the user's context, e.g., where they are, what they are doing, and with whom. Context information is usually inferred from sensor data, like GPS coordinates and accelerometer readings from the user's smartphone. This prediction task is known as context recognition. A well-defined context model is fundamental for successful recognition. Existing models, however, have two major limitations. First, they focus on only a few aspects, like location or activity, meaning that recognition methods based on them can compute and leverage only a few inter-aspect correlations. Second, existing models typically assume that context is objective, whereas in most applications context is best viewed from the user's perspective. Neglecting these factors limits the usefulness of the context model and hinders recognition. We present a novel ontological context model that captures four dimensions, namely time, location, activity, and social relations. Moreover, our model defines three levels of description (objective context, machine context, and subjective context) that naturally support subjective annotations and reasoning. An initial context recognition experiment on real-world data hints at the promise of our model.

===1 INTRODUCTION===

The term "context" refers to any kind of information necessary to describe the situation that an individual is in [2]. Automatic recognition of personal context is key in applications like personal assistants, smart environments, and health monitoring apps, because it enables intelligent agents to respond proactively and appropriately based on (an estimate of) their user's context. For instance, a personal assistant aware that its user is at home, alone, doing housework, could suggest ordering a take-away lunch. Since context information is usually not available, the machine has to infer it from sensor data, like GPS coordinates, acceleration, and nearby Bluetooth devices measured by the user's smartphone. The standard approach to context recognition is to train a machine learning model on a large set of sensor readings and corresponding context annotations to predict the latter from the former. Existing implementations are quite diverse, and range from shallow models like logistic regression [14] to deep neural networks like feed-forward networks [15], LSTMs [7], and CNNs [12].

A context model defines how context data are structured. A good context model should capture all kinds of situational information relevant to the application at hand [2] and use the right level of abstraction [1]. Ontologies are a widely accepted tool for formalizing context information [10], and several context ontologies have been proposed. Typical examples include CONON [16] and CACOnt [17]. CONON focuses on modeling locations by providing an upper ontology and lower domain-specific ontologies organized into a hierarchy. CACOnt defines several types of entities, and provides different levels of abstraction for specifying the location of entities, e.g., GPS coordinates and location hierarchies. Focusing on the semantic information of places, the work in [18] proposed a place-oriented ontology model that represents different levels of place and related activities and improves the performance of place recognition. The work in [9] proposed an ontology model involving social situations and the interactions between people.

These models, however, suffer from two main limitations. First, in order to support context recognition, the model should account for the subjectivity of context descriptions. For instance, the objective location "hospital" plays different roles for different people: for patients it is a "place for recovering", while for nurses it is a "work place". This makes all the difference for personal assistants, because the services that a user needs strongly depend on his or her subjective viewpoint. Most context models ignore this fact, with few exceptions, cf. [8]. Second, answers to four basic questions – "what time is it?", "where are you?", "what are you doing?", and "who are you with?" – are arguably necessary to define human contexts. Correlations between these aspects are also fundamental in recognition and reasoning: if the user is in her room, a personal assistant should be more likely to guess that she is "studying" or "resting", rather than "swimming". In stark contrast, most models are restricted to one or a few of the above four aspects and therefore fail to capture important correlations, like those between activity and location or between time and social context.
As a remedy, we introduce a novel ontological context model that supports both reasoning and recognition from a subjective perspective, that captures time, location, activity, and social relations, and that enables downstream context recognition tools to leverage correlations between these four fundamental dimensions. Our model also incorporates three levels of description for each aspect, namely objective, machine-level, and subjective, which naturally support different kinds of annotations. We apply and test our approach in collaboration with sociology experts within the SmartUnitn-One project [6]. We validate our model empirically by evaluating context recognition performance on the SmartUnitn-One context and sensor annotation data set [6], which was annotated consistently with our context model. Our initial results show that handling correlations across aspects substantially improves recognition performance and makes it possible to predict activities that are otherwise very hard to recognize.

===2 CONTEXT MODELLING===

Context is a theory of the world that encodes an individual's subjective perspective about it [3]. Individuals have a limited and partial view of the world at all times in their everyday life. For instance, consider a classroom with a teacher and a few students. Despite all the commonalities, each person in the room has a different context, because they focus on different elements of their personal experience (the students focus on the teacher while the teacher focuses on the students) and ignore others (like the sound of the projector, the weather outside, and so on). Given the diversity and complexity of individual experiences, formalizing the notion of context in its entirety is essentially impossible.
For this reason, simpler but useful application-specific solutions are necessary. Previous work has observed that reasoning in terms of questions like "what time is it?", "where are you?", "what are you doing?", "who are you with?", and "what are you with?" is fundamental for describing and collecting the behavior of individuals [3]. Motivated by this observation and our previous work [4, 5, 11], we designed an ontology-based context model organized according to the aforementioned dimensions of the world: time, location, activity, social relations, and objects. Formally, context is defined as a tuple:

Context = ⟨TIME, WE, WA, WO, WI⟩

where:
* TIME captures the exact time of the context, e.g., "morning". We refer to it as the ''temporal context''. Informally, it answers the question "When did this context occur?".
* WE captures the exact location of the context, e.g., "classroom". We refer to it as the ''endurant context''. Informally, it answers the question "Where are you?".
* WA captures the activity of the context, e.g., "studying". We refer to it as the ''perdurant context''. Informally, it answers the question "What are you doing?".
* WO captures the social relations of the context, e.g., "friend". We refer to it as the ''social context''. Informally, it answers the question "Who are you with?".
* WI captures the materiality of the context, e.g., "smartphone". We refer to it as the ''object context''. Informally, it answers the question "What are you with?".

Figure 1. Illustration of our context model.

Figure 1 shows a scenario as a knowledge graph representing the personal context of an individual in the class. For instance, attributes of WO are "Class", "Name", and "Role", and their values are "Person", "Shen", and "PhD student", respectively. Edges represent relations between entities, e.g., "Shen" is in relation "Attend" with "Lesson".
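To make the five-dimensional tuple concrete, the following minimal sketch (ours, not from the paper; the class and field names are illustrative assumptions) shows one way a single context record could be represented in Python:

<syntaxhighlight lang="python">
from dataclasses import dataclass
from typing import Optional

@dataclass
class Context:
    """One context record: Context = <TIME, WE, WA, WO, WI>.
    Names are illustrative; the paper defines the model as an ontology,
    not as a concrete data structure."""
    time: str            # temporal context, e.g., "morning"
    we: str              # endurant context (location), e.g., "classroom"
    wa: str              # perdurant context (activity), e.g., "studying"
    wo: Optional[str]    # social context, e.g., "friend"
    wi: Optional[str]    # object context, e.g., "smartphone"

# The classroom scenario of Figure 1, stated in subjective terms:
ctx = Context(time="morning", we="classroom", wa="studying",
              wo="friend", wi="smartphone")
</syntaxhighlight>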
The example in Figure 1 is presented in objective terms, that is, facts are stated as if they were independent of personal conscious experiences. However, each person interprets the world and her surroundings from her personal privileged point of view, which accounts for her personal knowledge, mental characteristics, states, etc. For instance, while in Figure 1 "Shen" has the objective role of PhD student, for other people "Shen" subjectively plays the role of a "friend" or a "classmate". The subjective context, which is related to personal consciousness, knowledge, etc., can provide more information to applications such as personal assistants, enabling them to offer more intelligent services.

Notice that a person's view of her context is radically different from what her handheld personal assistant observes. In fact, machines interpret the world via sensors, while humans interpret the world not only via their perceptions but also with their knowledge. For instance, while a machine views a location (e.g., a building) as a set of coordinates, humans interpret it based on its function (e.g., whether the building is their home or office).

To model context precisely and completely, in addition to considering the four dimensions discussed above, we also model three perspectives: objective context, subjective context, and machine context. Table 1 shows the above example viewed through these three perspectives. The objective context captures the fact that at the University of Trento, Italy, at 11:00 AM, a person is attending a class together with Shen. When moving from objective to subjective, things change dramatically. From the perspective of the machine, the temporal context "11:00 AM" is viewed as the timestamp "1581938718026", while in subjective terms it becomes "morning"; similarly, "University of Trento" becomes the coordinates "46°04'N, 11°09'E" for the machine and "classroom" from a subjective perspective. For the perdurant context, the activity of taking a lesson can be subjectively annotated as "study" by the user, but to the machine it can be described as "connected to the classroom WIFI, sensors such as the gyroscope and accelerometer sensed as static". For the social context, "Shen" is subjectively described as a friend by the user, while the machine senses that "Shen" is in the user's contact list.

{| class="wikitable"
! Level !! TIME !! WE !! WA !! WO
|-
| Objective Context || 2020-02-17 11am || Via Sommarive, 9, 38123 Povo TN || Lesson || Shen
|-
| Machine Context || 1581938718026 || 46°04'01.9"N 11°09'02.4"E || Accelerometer: 0g, 0g, 0g || "Shen" is in contact list
|-
| Subjective Context || Morning || Classroom || Studying || Friend
|}

Table 1. An example of our three-partitioned context model. Each row gives a different description of the same underlying situation from the perspective of the world (top), the machine (middle), and the user (bottom).
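To illustrate how the machine level in Table 1 relates to the subjective level, here is a hedged sketch (our own; the place lookup table, tolerance, and time bands are assumptions, not part of the paper) of lifting machine context to subjective context:

<syntaxhighlight lang="python">
from datetime import datetime, timezone

# Hypothetical lookup from approximate coordinates to a subjective place
# label; a real system would query the user's personal ontology instead.
KNOWN_PLACES = {(46.067, 11.151): "classroom"}  # Povo, University of Trento

def subjective_time(epoch_ms: int) -> str:
    """Lift a machine-level timestamp (milliseconds) to a subjective label."""
    hour = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour
    if 5 <= hour < 12:
        return "morning"
    return "afternoon" if hour < 18 else "evening"

def subjective_place(lat: float, lon: float, tol: float = 0.01) -> str:
    """Map machine-level coordinates to the nearest known subjective place."""
    for (plat, plon), label in KNOWN_PLACES.items():
        if abs(lat - plat) <= tol and abs(lon - plon) <= tol:
            return label
    return "unknown"

print(subjective_time(1581938718026))      # "morning" (the Table 1 timestamp)
print(subjective_place(46.0672, 11.1507))  # "classroom"
</syntaxhighlight>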
===3 EMPIRICAL EVALUATION===

In order to evaluate the proposed context model, we carried out a context recognition experiment using the SmartUnitn-One data set [6], and studied whether recognition of subjective context is feasible and whether taking inter-aspect correlations into account helps recognition performance.

'''Data Collection.''' The SmartUnitn-One data set consists of sensor readings and context annotations obtained from 72 volunteers (university students) over a period of two weeks. All participants were required to install the i-Log app [19], which simultaneously records data from several sensors (cf. Table 2) and context annotations. During the first week, students were asked to report their own context every 30 minutes via questionnaires comprising three questions about location, activity, and social relations; the i-Log app collected sensor data at the same time. During the second week, the participants were only required to keep the application running for sensor data collection. All records were timestamped automatically. The questions were designed according to our context model, and the possible answers were modelled following the American Time Use Survey (ATUS) [13], leading to an ontology with over 80 candidate labels; see Figure 2 for the full list. Object context (WI) information was not collected, as it is too hard to track without disrupting the volunteers' routines. All records were processed as in [20]. This resulted in 23309 records, each comprising 122 sensor readings (henceforth, features) and self-reported annotations about location, activity, and social context.

Figure 2. Questions and answers in the SmartUnitn-One questionnaire.

{| class="wikitable"
! Sensor !! Frequency !! Unit
|-
| Acceleration || 20 Hz || m/s²
|-
| Linear Acceleration || 20 Hz || m/s²
|-
| Gyroscope || 20 Hz || rad/s
|-
| Gravity || 20 Hz || m/s²
|-
| Rotation Vector || 20 Hz || Unitless
|-
| Magnetic Field || 20 Hz || µT
|-
| Orientation || 20 Hz || Degrees
|-
| Temperature || 20 Hz || °C
|-
| Atmospheric Pressure || 20 Hz || hPa
|-
| Humidity || 20 Hz || %
|-
| Proximity || On change || 0/1
|-
| Position || Every minute || Lat./Lon.
|-
| WIFI Network Connected || On change || Unitless
|-
| WIFI Networks Available || Every minute || Unitless
|-
| Running Application || Every 5 seconds || Unitless
|-
| Battery Level || On change || %
|-
| Audio from the internal mic || 10 seconds per minute || Unitless
|-
| Notifications received || On change || Unitless
|-
| Touch event || On change || 0/1
|-
| Cellular network info || Once per minute || Unitless
|-
| Screen Status, Flight Mode, Battery Charge, Doze Mode, Headset Plugged in, Audio Mode, Music Playback || On change || 0/1
|}

Table 2. List of sensors. Proximity triggers when the phone detects very close objects, e.g., the user's ear during a phone call.

'''Experimental Setup.''' For every aspect in {WA, WE, WO}, we trained a random forest to predict that aspect from sensor measurements. We randomly split the data set into training (75% of the records) and validation (25% of the records) subsets and selected the maximum depth of the forest using the validation set only. Classifier performance was then evaluated using 5-fold cross validation: the data set was randomly partitioned into 5 folds, and each fold was in turn held out as the test set while a classifier was trained on the remaining folds and its performance computed on the held-out fold. We then compared this model to a second random forest (with the same maximum depth) that was supplied both sensor data and annotations for (a subset of) the other aspects as inputs. In order to account for label skew (e.g., some locations and activities are much more frequent than others), performance was measured using the micro-averaged F1 score.
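A minimal sketch of this protocol follows (ours, using scikit-learn; the names X, wa, we, wo are hypothetical stand-ins for the 23309 x 122 feature matrix and the per-record labels, and this is not the authors' actual code):

<syntaxhighlight lang="python">
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import OneHotEncoder

def evaluate(X_sensors, y_target, other_aspects=None, max_depth=20, seed=0):
    """5-fold cross-validated micro-F1 for predicting one aspect, optionally
    also feeding one-hot encoded annotations of other aspects as inputs.
    max_depth would be tuned beforehand on a 75/25 train/validation split."""
    X = X_sensors
    if other_aspects is not None:
        onehot = OneHotEncoder(handle_unknown="ignore")
        X = np.hstack([X_sensors, onehot.fit_transform(other_aspects).toarray()])
    scores = []
    for train_idx, test_idx in KFold(5, shuffle=True, random_state=seed).split(X):
        clf = RandomForestClassifier(max_depth=max_depth, random_state=seed)
        clf.fit(X[train_idx], y_target[train_idx])
        scores.append(f1_score(y_target[test_idx], clf.predict(X[test_idx]),
                               average="micro"))
    return float(np.mean(scores))

# Hypothetical usage: compare sensors-only against sensors plus other aspects.
# f1_sensors = evaluate(X, wa)
# f1_full    = evaluate(X, wa, other_aspects=np.column_stack([we, wo]))
</syntaxhighlight>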
'''Results and Discussion.''' The F1 scores, averaged across users, are reported in Figure 3. The plots show very clearly that knowledge of other aspects substantially improves recognition performance regardless of the aspect being predicted: supplying the other aspects as inputs increases the F1 score when predicting WA and WE by more than 10%, and when predicting WO by more than 5%. A breakdown of the performance increase is given in Table 3. The table shows that all aspects are correlated, as expected, especially activity and location, and that providing more aspects as inputs increases F1 almost additively.

{| class="wikitable"
! Inputs !! WA !! WE !! WO
|-
| Sensors + WA || – || +8.80% || +2.36%
|-
| Sensors + WE || +8.27% || – || +3.09%
|-
| Sensors + WO || +3.34% || +3.27% || –
|-
| Sensors + Other Aspects || +11.25% || +11.57% || +5.31%
|}

Table 3. Improvement in F1 score when using other aspects as inputs to the recognition model. Columns indicate the aspect being predicted.

Figure 3. F1 of our context recognition model. From left to right: perdurant (WA), endurant (WE), and social context (WO), respectively. The leftmost column refers to a predictor that uses sensor data only, while the other columns refer to predictors that in addition have access to context annotations.

Figure 4. F1 of individual labels (averaged over users). From left to right: perdurant, endurant, and social context, respectively.

Figure 4 shows F1 scores (again, averaged across users) for each label. For WO, some labels are clearly easier to predict than others. The performance improvement is usually in the 5–10% range, with the notable exception of "other", which improves by about 20%. It seems that location information always facilitates recognition of WO, while activity does not; their combination, however, is always beneficial. For WE, looking at either WO or WA helps recognition performance in all cases, and providing both WO and WA gives a larger improvement than providing them separately. The exceptions are "library", "study room", and "shop", for which knowing WA alone helps more than knowing both WO and WA. This is somewhat surprising, as we expect social context to be moderately indicative of location, and it deserves further investigation. Some locations ("canteen", "on foot", "auto", "shop", and "workplace") receive a major increase in recognition performance, from approximately 25% to 40%. This is partly due to the rarity of these classes in the data set, which shows that inter-aspect correlations compensate for the lack of supervision. Finally, for WA, some activities (like "housework", "cultural activities", and "hobbies") are very hard to predict, as their F1 score is below 30%, while others ("work", "moving", and "lesson") are much easier to predict, with F1 scores above 80%. This mostly shows that rare activities are, understandably, harder to predict, although other factors might play a role. Using the full context (with WE and WO) always improves performance, except for "housework". For all the other activities, the improvement ranges from 5% to 20%, and is even larger for "Shopping", "Sport", and "Traveling", for which the improvement is up to 30%.

This analysis provides ample support for our context model: correlations between different aspects improve context recognition performance for most users and, even more importantly, some values (like "Canteen") that are essentially impossible to recognize otherwise become much easier to predict when full context information is provided.

===4 CONCLUSION===

We designed a novel context model that captures situational information about the time, location, activity, and social relations of individuals using subjective, rather than objective, terms. An initial context recognition experiment on real-world data showed that machine learning models built using our context model produce higher quality predictions than models based on less complete context models. As for future work, we plan to study the effects of subjectivity in more detail, to migrate our architecture to more refined learning approaches (e.g., deep neural nets), and to carry out an extensive comparison against the state of the art in context recognition.

===5 ACKNOWLEDGEMENT===

The research of FG has received funding from the European Union's Horizon 2020 FET Proactive project "WeNet – The Internet of us", grant agreement No 823783. The research of ST and WZ has received funding from the "DELPhi - DiscovEring Life Patterns" project funded by the MIUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) 2017 – DD n. 1062 del 31.05.2019.

===REFERENCES===

[1] Claudio Bettini et al., 'A survey of context modelling and reasoning techniques', Pervasive and Mobile Computing, (2010).

[2] Anind K. Dey, 'Understanding and using context', Personal and Ubiquitous Computing, (2001).

[3] Fausto Giunchiglia, 'Contextual reasoning', Epistemologia, special issue on I Linguaggi e le Macchine, (1993).

[4] Fausto Giunchiglia, Enrico Bignotti, and Mattia Zeni, 'Human-like context sensing for robot surveillance', International Journal of Semantic Computing, 12(01), 129–148, (2017).

[5] Fausto Giunchiglia, Enrico Bignotti, and Mattia Zeni, 'Personal context modelling and annotation', in 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), (2017).

[6] Fausto Giunchiglia et al., 'Mobile social media usage and academic performance', Computers in Human Behavior, (2018).
[7] Nils Y. Hammerla et al., 'Deep, convolutional, and recurrent models for human activity recognition using wearables', arXiv preprint arXiv:1604.08880, (2016).

[8] Mieczyslaw M. Kokar, Christopher J. Matheus, and Kenneth Baclawski, 'Ontology-based situation awareness', Information Fusion, 10(1), 83–98, (2009).

[9] Ilir Kola, Catholijn M. Jonker, and M. Birna van Riemsdijk, 'Who's that? Social situation awareness for behaviour support agents', in International Workshop on Engineering Multi-Agent Systems, pp. 127–151. Springer, (2019).

[10] Reto Krummenacher and Thomas Strang, 'Ontology-based context modeling', in Proceedings, (2007).

[11] Nardine Osman, Carles Sierra, Ronald Chenu-Abente, Qiang Shen, and Fausto Giunchiglia, 'Open social systems', in 17th European Conference on Multi-Agent Systems (EUMAS), Thessaloniki, Greece, (2020).

[12] Aaqib Saeed et al., 'Learning behavioral context recognition with multi-stream temporal convolutional networks', arXiv preprint arXiv:1808.08766, (2018).

[13] Kristina J. Shelley, 'Developing the American Time Use Survey activity classification system', Monthly Labor Review, (2005).

[14] Yonatan Vaizman, Katherine Ellis, and Gert Lanckriet, 'Recognizing detailed human context in the wild from smartphones and smartwatches', IEEE Pervasive Computing, (2017).

[15] Yonatan Vaizman et al., 'Context recognition in-the-wild: Unified model for multi-modal sensors and multi-label classification', Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, (2018).

[16] Xiaohang Wang et al., 'Ontology based context modeling and reasoning using OWL', in PerCom Workshops, (2004).

[17] Nan Xu et al., 'CACOnt: An ontology-based model for context modeling and reasoning', in Applied Mechanics and Materials, (2013).

[18] Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee, and Munindar P. Singh, 'Platys: From position to place-oriented mobile computing', AI Magazine, 36(2), 50–62, (2015).

[19] Mattia Zeni et al., 'Multi-device activity logging', in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 299–302, (2014).

[20] Mattia Zeni et al., 'Fixing mislabeling by human annotators leveraging conflict resolution and prior knowledge', Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, (2019).