Micro-environment Recognition in the context of Environmental Crowdsensing Mohammad Abboud Hafsa El Hafyani Jingwei Zuo DAVID Lab DAVID Lab DAVID Lab UVSQ - Université Paris-Saclay UVSQ - Université Paris-Saclay UVSQ - Université Paris-Saclay Versailles, France Versailles, France Versailles, France mohammad.abboud.2496@gmail. hafsa.el-hafyani@uvsq.fr jingwei.zuo@uvsq.fr com Karine Zeitouni Yehia Taher DAVID Lab DAVID Lab UVSQ - Université Paris-Saclay UVSQ - Université Paris-Saclay Versailles, France Versailles, France karine.zeitouni@uvsq.fr yehia.taher@uvsq.fr ABSTRACT new paradigm, which empowers volunteers to contribute data With the rapid advancements of sensor technologies and mobile (i.e., GTS) acquired by their personal sensor-enhanced mobile computing, Mobile Crowd-Sensing (MCS) has emerged as a new devices. Polluscope1 , a French project deployed in Île-de-France paradigm to collect massive-scale rich trajectory data. Nomadic (i.e., Paris region), is a typical use case study on MCS. It aims at sensors empower people and objects with the capability of re- getting insights constantly on individual exposure to pollution porting and sharing observations on their state, their behavior everywhere (indoor and outdoor), while enriching the traditional and/or their surrounding environments. Processing and mining monitoring system with the collected data by the crowd. The multi-source sensor data in MCS raise several challenges due recruited participants, on a voluntary basis, collect air quality to their multi-dimensional nature where the measured parame- measurements. Each participant is equipped with a sensor kit, and ters (i.e., dimensions) may differ in terms of quality, variabilty, a mobile device which allows the transmission of collected mea- and time scale. We consider the context of air quality MCS, and surements together with the GPS coordinates as a geo-referenced focus on the task of mining the context from the MCS data. Re- data stream containing (timestamp, longitude, and latitude). In lating the measures to their context is crucial to interpret them addition, the participants are asked to annotate their environ- and analyse the participant’s exposure. This paper investigates ment type through a custom mobile application. This will allow the feasibility of recognizing the human’s context (called herein participants to have personalized insights about their exposure micro-environment) in an environmental MCS scenario. We put to pollution everywhere, either in indoor and outdoor environ- forward a multi-view learning approach, that we adapt to our ments (e.g., Home, Work, Transportation, Streets, Park, etc.), and context, and implement it along with other time series classifica- at a higher resolution along their trajectories, thereby, allowing tion approaches. The experimental results, applied to real MCS to capture local variability and peaks of pollution, depending on data, not only confirm the power of MCS data in characterizing participants’ whereabouts, i.e., micro-environments. the micro-environment, but also show a moderate impact of the It is worth mentioning that air quality strongly depends on integration of mobility data in this recognition. Furthermore, the context2 , and so is the individual exposure to pollution. For multi-view learning shows similar performance as the reference this reason, there is a great interest of making exposure analysis deep learning algorithm, without requiring specific hardware. context-aware. However, the context annotation is by far the most difficult information to collect in a real-life application setting, KEYWORDS since a very few participants thoroughly annotate their micro- environment. Therefore, there is a great interest in unburdening Activity Recognition, Multivariate Time Series Classification, the participants by automatically detecting the context. Multi-view Learning, Mobile Crowd Sensing, Air Quality Moni- When exploring visually the data, we noticed that micro- toring environments preserve a certain pattern. Besides, we observe the existence of an inter-sensor correlation and with the con- 1 INTRODUCTION text. Figure 1 shows the evolution of three dimensions (i.e. Black Nowadays, the Internet of Things (IoT) basically relies on ad- Carbon (BC), NO2 and Particulate Matters (PM)) with micro- vanced sensor technologies to bridge the physical world and environments identification. As shown in Figure 1, BC and NO2 information systems. In particular, along with the widespread preserve the same shapes and statistical characteristics in the use of GPS, various mobile sensors bring rich information col- micro-environment “Car”. Plus, we note that PM values keep lected from both the surrounding environment and human activ- the same statistical characteristics in the micro-environment “In- ities, which are generally represented as Geo-referenced Time door”. Moreover, we can observe the existence of a correlation Series (GTS). Mobile Crowed Sensing (MCS) [12] emerges as a between the three dimensions during the whole timeline. The idea we promote in this paper is to utilize a wisely chosen © 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed- annotated dataset, in order to train a model on all the combination ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) 1 http://polluscope.uvsq.fr on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 2 In this paper, the terms "context" and "micro-environment" are used International (CC BY 4.0) interchangeably. of air quality, and mobility dimensions as predictors of the micro- One-Nearest Neighbor (1-NN) classifier with different distance environment. We hypothesize that the multi-variate time series measures, such as Euclidean Distance (ED) or Dynamic Time collected by the MCS campaigns not only depends on the micro- Wrapping (DTW) [2], is always considered as the benchmark to environment but could be a proxy of it. give a preliminary evaluation in the MTSC problem. Considering the real-life scenarios, where it is difficult or ex- pensive to obtain a large amount of labeled data for training, some studies used both labeled and unlabeled data to learn the human activity, that is Semi-Supervised Learning (SSL) [25] on MTSC. The pioneering work by [25] propose a semi-supervised technique for time series classification. The authors demonstrated that semi-supervised learning requires less human effort and gen- erally achieves higher accuracy than training on limited labels. The semi-supervised model [25] is based on the Self-Learning concept with the One-Nearest-Neighbor (1-NN) classifier. First, the labeled set, denoted by 𝑃 (as positively labeled) is applied to train the 1-NN classifier 𝐶. Then, the unlabeled samples 𝑈 are given the pseudo labels progressively based on their distance to the samples in 𝑃. Thereafter, the enriched labeled set 𝑃 allows it- eratively repeating the previous step and improving the classifier. Figure 1: Inter-sensor and micro-environment correla- More recently, the deep learning-based models on MTSC show tions. promising performance under weak supervision. For instance, Zhang et al. [29] propose a novel semi-supervised MTSC model The question that arises itself now is how to combine all these named Time series attentional prototype network (TapNet), to ex- different aspects of the data (geo-location, sensors) to identify plore the valuable information in the unlabeled samples. TapNet the user’s context automatically? And how much a model can dis- projects the raw MTS data into a low-dimensional representation criminate the observations in different micro-environments? To space. The unlabeled samples approach themselves to the class this end, we envision a holistic approach of activity recognition, prototype in the representation space, where the distance-based as depicted in [7]. probability and the labeled samples allow training the model pro- The micro-environment recognition is a crucial figure to ex- gressively. Moreover, the hybrid Convolutional Neural Network posure interpretation. Once the data are correctly annotated, our (CNN) and Long Short-Term Memory (LSTM) structure adopted ultimate goal is to get insight into all the dimensions (spatial, in TapNet allows modeling, respectively, the variable interactions temporal, individual, and contextual) of the exposure to pollution. and the temporal features of MTS. In this paper, we evaluate different approaches and provide a framework dedicated to the preparation, the application and the comparison of different machine learning algorithms. 2.2 Multi-View Learning The rest of this paper is organized as follows. We introduce Another line of studies propose multi-view learning to classify the related work in Section 2. The formal presentation of our time series data originated from multiple sensors to recognize micro-environment recognition model is discussed in Section 3. users activities. Garcia-Ceja et al. [11] propose a method based on Section 4 presents the experimental results and evaluation of the multi-view learning and stacked generalization for fusing audio micro-environment recognition model in the context of environ- and accelerometer sensor data for human activity recognition mental crowd sensing. Section 5 gives the extensive discussion using wearable devices. Each sensor’s data is seen as a different for the perspectives of this work. In section 6, we summarize our “view”, and they are combined using stacked generalization [26]. conclusions and provide directions for future work. The approach trains a specific classification model over each view and an extra meta-learner using the view models as input. The 2 RELATED WORK general idea of the authors is to combine data from heterogeneous Human activity recognition involves a wide range of applications types of sensors to complement each other and thus, increase from smart homes activities [1] to daily human activities [4, recognition accuracy. 17, 28], to human mobility [6, 30] to cite a few. It represents a Wang et al. [23] propose a framework based on deep learning typical scenario of machine learning, and some public datasets to learn features from different aspects of the data based on are widely used in the benchmarks. In this section, we introduce features of sequence and visualization. In order to imitate the a summary of two main topics of related work to our approach. human brain, which can classify data based on visualization, We focus mainly on multi-variate time series (MTS) classification the authors transform the time series into an Area Graph. They and multi-view learning. use well-trained LSTM-A neural networks and CNN-A neural networks to extract the features of time series data. LSTM-A is 2.1 Multi-Variate Time Series Classification used to extract sequence features, while CNN-A is used to extract Human activity recognition falls in the problem of labelling data visual features from the time series. Then, based on the fusion segments with the type of activity, which leads to a multi-variate of features, the authors carry out the time series classification time series classification (MTSC) problem based on data collected task. Although the approach gained promising results, it did not by multiple wearable sensors. There is a wide range of time series outperform deep learning methods such as InceptionTime [10]. classification approaches that can be classified into four cate- Li et al. [15] propose a Multi-view Discriminative Bilinear Pro- gories: distance-based methods [2], feature-based methods [19], jections (MDBP) for multi-view MTSC. The proposed approach ensemble methods [11] and deep learning models [3, 9, 24]. The is a multi-view dimensionality reduction method for time series classification which aims to extract discriminative features from Table 1: An example of the new generated dataset 𝐷 ′ multi-view MTS data. MDBP mainly projects multi-view data to a shared subspace through view-specific bilinear projections that First-Level Learners Associated Prediction Probabilities True Label 𝑙 1 𝑙 2 ... 𝑙𝑖 ... 𝑙𝑛 𝑝 1 𝑝 2 ... 𝑝𝑖 ... 𝑝𝑛 𝑦 preserve the temporal structure of MTS, and learns discriminative features by incorporating a novel supervised regularization. 3 MICRO-ENVIRONMENT RECOGNITION MODEL In step 3, the first-level learner takes as input the time series In this section, we provide an overview of our proposed frame- data coming from each view. Then, each view will generate its work for micro-environment recognition in the context of MCS. own predicted labels with associated prediction probabilities Our proposed approach contains six steps as shown in Figure 2. with the form [𝑙𝑖 , 𝑝 1, 𝑝 2, ..., 𝑝 𝑗 , ..., 𝑝𝑘 , 𝑦], where 𝑙𝑖 is the predicted label of the first-level learner 𝑖, 𝑝 𝑗 is the associated prediction probability for each class 𝑗 of the 𝑘 possible classes, and 𝑦 is the 3.1 Data Collection true label. The first step of our micro-environment recognition process is One of the advantages of multi-view learning is its versatility the data collection. During three campaigns, around one hundred on first and second level learners’ choices. One can flexibly substi- participants have been recruited to collect ambient air measure- tute classifier choices as wished. We opt for k-nearest-neighbor ments along with geo-location for one week 24 hours a day, (kNN) classifier coupled with the Dynamic Time Warping (DTW) while performing their daily activities. Each participant carries distance as first-level learner to be trained on each view of the a multi sensor box and a tablet empowered with GPS chipset. data. kNN is one of the most popular and traditional TSC ap- The sensors collect time annotated measurements of Particulate proaches. kNN with DTW metric was considered for a long time Matter (PM1.0, PM10, PM2.5), nitrogen dioxide (𝑁𝑂 2 ), Black Car- the state-of-the-art in the time series classification problem [9]. bon (BC), Temperature and Relative Humidity, and the tablet Furthermore, most classification approaches require parameters records participants’ geo-locations and allows them to annotate settings, whereas kNN with DTW is parameter free. kNN classi- their context by using a self-reporting mobile app. They report fier has shown to be a strong baseline [9], and there is no single every transition to a micro-environment (e.g., Home, Office, Park, distance measure that significantly outperforms DTW [9]. Hence, Restaurant, etc.), as well as events, which are temporary activities recent research has focused on developing ensemble methods for a brief period (e.g., Start cooking, Open a window, Close a that significantly outperforms the NN coupled with DTW [9]. window, Smoking, Turn on a chimney, etc.). In step 4, we aimed at giving a weight for each learner, thus a new dataset 𝐷 ′ is generated by joining the first-level learner 3.2 Data Preparation predictions and the probability of each prediction, Table 1 shows The second step consists of pre-processing the data which is two the feature vector in this dataset, where 𝑙𝑖 is the predicted label folds. On the one hand, most sensor data are noisy, and require of the first-level learner 𝑖, 𝑝𝑖 is the probability of this prediction, a prepossessing phase to clean them from irrelevant measure- and 𝑦 is the true label. ments. We have observed this especially in the GPS (due to signal In step 5, after generating a new dataset 𝐷 ′ , a second-level loss), and in air quality data even though the sensor data quality classifier, or meta-learner, is trained over 𝐷 ′ through ensemble is a permanent preoccupation of the project, by careful evalu- learning [31]. This approach allows to preserve the statistical ation before their selection, and periodic qualification during properties of each view and learn the classes of the MTS instances the campaign [14]. The sensors for climatic parameters do not with a significant improvement in the classification accuracy. show such defects. Therefore, a de-noising process is applied to Many ensemble methods [31] have been proposed to further clean the data. On the other hand, the highest quality sample of enhance the algorithm’s accuracy by combining learners rather annotated data is selected as a baseline to validate the process than trying to find the best single learner. Due to their versatility of micro-environment recognition. The idea is to generalize the and flexibility, ensemble methods attract many researchers and micro-environment recognition to all participants’ data, by using can be applied in different domains, for example, but not limited the model derived from a good-quality dataset. to, time series classification [11] and time series segmentation [8]. In a previous work [8], we used a multi-view approach for 3.3 Multi-View Learning Model segmenting MCS data, where we employed an unsupervised learning for change detection on each view. We were interested in the stack generalization approach pro- In this work, we conduct our experiments using Random Forest posed in [11], but we have adapted it to best fit for solving our classifier since it has shown high performance when it is applied problem. We propose to learn the micro-environment of partici- in the human activity recognition domain [11]. pants from multi-variate time series through a two-stage model based on multi-view learning. Our multi-view classification ap- proach consists of training a first-level learner on each view (i.e. step 3 in Figure 2), and then train a second-level learner or meta-learner (i.e. step 5 in Figure 2) to combine the output of 4 EXPERIMENTS AND RESULTS each view and enhance the global accuracy of the classification. The experiments are carried out on different environments. The We assume that 𝑌𝑖𝑡 is a dimension of the n-dimenstional time multi-view learning model was implemented in Python 3.6 using series 𝑌𝑡 = (𝑌1𝑡 , 𝑌2𝑡 , ..., 𝑌𝑖𝑡 , ..., 𝑌𝑛𝑡 ). In our model, each view 𝑉𝑖 , scikit-learn 0.23.2 and tslearn [21]. The deep-learning models where 𝑉 = (𝑉1, 𝑉2, ..., 𝑉𝑖 , ..., 𝑉 𝑛) is the set of views, represents a (MLSTM-FCN [13], TapNet [29]) were trained on a single Tesla dimension 𝑌𝑖𝑡 of the multi-variate time series 𝑌𝑡 . Thus, we have V100 GPU of 32 Go memory with CUDA 10.2, using respectively as many views as dimensions. Keras 2.2.4 and PyTorch 1.2.0. Figure 2: Overview of the Micro-Environment Recognition Process. 4.1 Experimental Settings 4.2 Classification Results We evaluated the used models in these experiments using real This section details the experimental results. The micro-environment life data collected within the scope of Polluscope project. For recognition is formulated as a MTSC problem. We used as a base- the experiments we have used the participants’ ambient air data line a basic kNN classifier with DTW as distance metric. To containing (PM10, PM1.0, PM2.5, NO2, BC, Temperature, and compare our multi-view learning approach (2NN-DTW for the Relative Humidity), in addition to the speed dimension derived first-level learners and Random Forest as a meta-learner) with from the geo-locational data. Moreover, we have 8 classes (i.e., state-of-the-art techniques, we implemented the MLSTM-FCN micro-environment to recognize), which can be divided into two [13] and run it on a GPU, since it requires more computational categories indoor such as (“Home”, “Office”, “Restaurant”, and resources than the multi-view approach. Both kNN-DTW and "Store"), and outdoor such as (“Street”, “Bus”, “Car”, and “Train”). MLSTM-FCN were applied on an aggregated feature vector con- We have selected the data of six participants who have thor- taining all the dimensions together. oughly annotated their activities within the campaign. Data were As shown in Table 2, kNN-DTW classifier is not able to dis- split into two third for training and one third for testing, with criminate correctly between the micro-environments, while the care taken to keep the data of each participant grouped either accuracy improves when applying the multi-view approach that in the training or in the testing sets. We used the cross valida- treats each view independently, thus preserving the statistical tion score with "Repeated Stratified K-fold" in order to split the features of each view. For the third experiment, a Long Short training set into training and validation, while we test the overall Term Memory network is generated in order to learn a map- accuracy using the test dataset. ping between the input vector and the classes. MLSTM-FCN has To account for the temporal feature of the data, we segment shown promising results in the experiments. Table 2 shows the them into samples of 10 minutes’ length at maximum. Usually, accuracy of experiments carried out with different conditions. people spend most of their time indoors, thus we should take into consideration other outdoor activities that have a short period of Table 2: Performance of different classifiers on Multi- time compared to indoor activities. For example, the average time source Polluscope data spent in "Bus" is around 8 minutes, for "Car" the average time is 20 minutes, etc. Globally, the distribution of data samples is Model Condition Accuracy highly imbalanced over the different classes, as shown in Figure 3a, which reflects the imbalance of time spent in different micro- Speed 0.450 environments. Imbalanced classes usually cause low performance No speed 0.440 kNN-DTW classification for the minority classes. To cope with this problem, Speed & Re-smp. 0.587 we apply a resampling strategy. Figure 3 shows the distribution No speed & Re-smp. 0.597 of the data in both the original and the re-sampled dataset. We Speed 0.716 have used the random over/under-sampler in order to balance No speed 0.710 Multi-view Based our dataset. Speed & Re-smp. 0.729 To consider the valuable variable of the mobility information, No speed & Re-smp. 0.640 we carry out our experiments on the datasets while considering Speed 0.808 or not the speed variable. We also compare the classifiers’ perfor- No speed 0.784 MLSTM-FCN mance on both resampled and original (i.e., un-resampled) data. Speed & Re-smp. 0.703 Finally, we introduce and evaluate a two-steps approach, by first No speed & Re-smp. 0.691 discriminating indoor from outdoor environments, followed by Speed & Re-smp. 0.83 Grouping Step a refinement step to learn a more specific class. No speed & Re-smp. 0.82 Table 3: Performance of Multi-view Learner (Raw data with/out speed) With Speed Without Speed class F1 F1 Precision Recall Precision Recall Score Score Street 0.65 0.52 0.57 0.69 0.40 0.50 Bus 0.18 0.12 0.14 0.60 0.33 0.42 Office 0.86 0.93 0.89 0.58 0.43 0.49 Restaurant 1.0 0.22 0.36 1.0 0.10 0.18 Home 0.67 0.65 0.66 0.71 0.87 0.78 Car 0.61 0.81 0.69 0.55 0.75 0.63 Store 0.25 0.05 0.08 0.00 0.00 0.00 (a) Original Dataset Train 0.00 0.00 0.00 0.40 0.08 0.13 Table 4: Performance of Multi-view Learner (Re-sampled data with/out speed) With Speed Without Speed class F1 F1 Precision Recall Precision Recall Score Score Street 0.72 0.49 0.59 0.59 0.36 0.45 Bus 0.00 0.00 0.00 0.25 0.04 0.06 Office 0.94 0.94 0.94 0.71 0.76 0.73 Restaurant 0.92 0.80 0.86 0.43 0.20 0.27 Home 0.72 0.78 0.75 0.54 0.70 0.61 Car 0.64 0.80 0.71 0.56 0.63 0.59 Store 0.09 0.05 0.06 0.12 0.05 0.07 (b) Re-sampled Dataset Train 0.54 0.47 0.50 0.26 0.33 0.29 Figure 3: Class Distribution Table 5: Performance of MLSTM-FCN (Raw data with/out speed) Next, we focus on the comparison between the proposed ap- With Speed Without Speed proach and MLSTM-FCN. We also study the impact of using or class F1 F1 not the mobility data, as well as the learning from the original Precision Recall Precision Recall Score Score or the re-sampled data. We report the performances in terms of Street 0.55 0.60 0.57 0.53 0.54 0.53 recall, and F1 score. These results are grouped in Table 2 to Table Bus 0.88 0.70 0.78 0.76 0.65 0.70 7 and Figure 4 to Figure 6. Table 2 represents the overall accuracy Office 0.96 0.88 0.92 0.91 0.85 0.88 of the different classifiers used in these experiments while or not Restaurant 0.78 0.88 0.82 0.78 0.88 0.82 using the Speed data, and with or without re-sampling. Table Home 0.83 0.87 0.85 0.82 0.90 0.86 Car 0.81 0.83 0.82 0.78 0.83 0.80 3 and 4 reports the precision, recall, and the F1-Score metrics Store 0.50 0.60 0.55 0.62 0.32 0.42 of the Multi-view learner for raw data and re-sampled data re- Train 1.0 0.30 0.46 0.38 0.30 0.33 spectively with and without Speed. Table 5 and 6 reports the precision, recall, and F1-Score metrics of the MLSTM-FCN for raw data and re-sampled data respectively with and without Table 6: Performance of MLSTM-FCN (Re-sampled data Speed. Figure 4 shows the accuracy among different views used with/out speed) within the experiment of the Multi-view approach. Moreover, fig- ure 5a and 5b shows the confusion matrix when applying the With Speed Without Speed class Multi-view approach on the re-sampled data with/out the speed F1 F1 Precision Recall Precision Recall Score Score respectively. Figure 6 shows the procedure used for the Grouping Street 0.43 0.54 0.48 0.50 0.42 0.46 step approach which is also based on the multi-view approach, Bus 0.41 0.65 0.50 0.52 0.65 0.58 and table 7 reports the precision, recall, and F1-Score metrics for Office 0.90 0.88 0.89 0.87 0.87 0.87 this approach. Restaurant 0.75 0.75 0.75 0.50 0.62 0.56 The multi-view learner proposed in these experiments em- Home 0.80 0.80 0.80 0.81 0.76 0.78 ploys the stacked generalization approach, which combines the Car 0.80 0.60 0.69 0.65 0.68 0.67 predictions of each independent view in order to get the final Store 0.38 0.48 0.42 0.50 0.24 0.32 classification result. As shown in figure 4 although the first level Train 0.22 0.40 0.29 0.12 0.30 0.17 learners may have a low accuracy but the combination of their predictions, by generating a new dataset D’ and feeding it to train, the meta-learner can improve the accuracy a lot. 5a and 5b and the recall and F1 score metrics in table 4 that the We observe an improvement of the accuracy of the overall clas- model can easily discriminate between the “indoor” and “outdoor” sification when adding the speed dimension to the ambient air activities, but it cannot perfectly distinguish between the micro- dimensions. We also notice from the confusion matrix in figures environments inside each category. For example, even though Figure 4: Accuracy among different views (Re-sampled data) Table 7: Performance of Grouping Step (Re-sampled data 5.1 Multi-view Learner with/out speed) The multi-view learner adopted in this paper is composed by the base learner (i.e., kNN-DTW) and the meta-learner (i.e., Random With Speed Without Speed class Forest), which has greatly improved the performance compared to F1 F1 Precision Recall Precision Recall the single kNN-DTW classifier. The objective of this paper is not Score Score to propose the best classifier for MTS classification, but to provide Street 0.73 0.59 0.65 0.59 0.35 0.44 an insight that the multi-view learner is capable of coordinating Bus 0.60 0.12 0.20 0.00 0.00 0.00 Office 0.92 0.93 0.92 0.80 0.87 0.83 effectively the information from different variables and achieving Restaurant 0.91 0.67 0.77 0.20 0.07 0.10 more reliable performance than a single base learner. Moreover, Home 0.86 0.94 0.90 0.75 0.86 0.80 the results of the grouping approach which is based on the multi- Car 0.71 0.87 0.78 0.66 0.94 0.77 view approach confirms that there is a clear signature for each Store 0.46 0.30 0.36 0.40 0.10 0.16 micro-environment, thus we can have an effective prediction Train 0.29 0.33 0.31 0.33 0.20 0.25 with this approach. Nevertheless, the kNN-DTW is considered as the baseline for MTS classification and is widely outpaced by the advanced ap- proaches such as Shapelets [27, 32, 33] or the frequent patterns [18]. Essentially, the kNN-DTW captures the global feature based most of the samples in the “Train” micro-environment is falsely on the distance measure between the entire sequences, while predicted as “Car”, both “Car” and “Train” micro-environments the local features (e.g., the frequent patterns [18], the interval can be classified as outdoor. Based on this observation, we intro- features [5], Shapelets [27], etc.) are more appropriate when a duced a grouping step before recognizing the micro-environment. specific pattern characterizes a class. More specifically, a combi- In this step we classify the sample into either an “Indoor” or “Out- nation of features extracted from different domains may improve door” environment. Based on the classification result, a model will dramatically the performance of the base learner [16]. Therefore, be specialized for each indoor or outdoor micro-environments. one of the perspectives consists in the optimization of the base Figure 6 shows the added step and the procedure for the classifi- learner and the exploration of the explainability of the multi- cation. view learner on both the feature interpretation and the variable The accuracy of the classifier in the grouping phase (“Indoor” importance for building the classifier. The visual representation or “Outdoor”) showed a good result when using the resampled of Shapelets make them good candidates for such improvement. data. It reaches 0.82 for data without speed dimension, while for data with the speed dimension it reaches around 0.83. Table7 5.2 Label Shortage Issue shows the recall and F1 score for both models trained on resam- pled data with and without speed. These experiments did not The label shortage is a practical issue when building the learning consider the case with original data due to its low performance, model. In particular, in the context of Polluscope, post-labelling in particular, for the minority classes. for time series sensor data is much more costly than classic data (e.g., image, text, etc.) due to the low interpretability over the real- valued sequence. Therefore, the data need to be annotated during 5 DISCUSSIONS & PERSPECTIVES the data collection process. However, certain practical factors In this section, we discuss the perspectives for improving our limit the availability of labels. For instance, the participants are multi-view learning model and the possibility for tackling the not always conscious in annotating their micro-environment. practical label issue in the context of Polluscope. Therefore, for certain time periods, no annotations were marked. we conduct a preliminary test on the Polluscope data with the newly proposed semi-supervised MTSC model TapNet [29]. TapNet [29] is a deep learning based approach designed for multivariate time series classification. By adopting the proto- typical network [20], TapNet allows learning a low-dimensional embeddings for the input MTS where the unlabelled samples help adjusting the class prototype (i.e., class centroid), which leads to a better classifier than using only the labelled samples. Table 8 shows the semi-supervised learning results on Pollus- cope data considering or not the speed variable. We evaluate the performance of TapNet under different supervision ratios in the training set. The results show that the unlabeled samples and the speed variable do improve the performance of the classifier. Besides, the accuracy didn’t drop a lot when eliminating the an- notations in training set (from ratio=1 for fully labelled to 0.5, and even for 0.2 when only 20% data in labelled), indicating that the collected data within each class is not sparsely distributed, thus learning under weak supervision is reliable with the aid of the unlabeled samples. (a) With Speed Table 8: The accuracy results of TapNet on Polluscope data under different supervision ratios Condition Sup_ratio=1 Sup_ratio=0.5 Sup_ratio=0.2 Speed 0.746 0.725 0.717 No speed 0.713 0.703 0.695 Giving the promising results on the data distribution consis- tency, another avenue worth exploring is to consider and in- tegrate a semi-supervised model into our multi-view learner. Various semi-supervised frameworks are applicable to our model, such as applying self-learning [25] to produce the pseudo labels on the multi-view learner, or adopting the label propagation and manifold regularization techniques [22] on the base learner. 6 CONCLUSION Activity recognition has gained the interest of many researchers nowadays, due to the widespread use of mobility sensors. Micro- environment recognition is essential in MCS projects such as Polluscope, in order to be able to analyse the individual’s expo- sure to air pollution and to relate it to her context. The major (b) Without Speed finding of our study is to show to some extent that the ambient air can characterize the micro-environment. Moreover, the accuracy Figure 5: Confusion Matrix (Re-sampled Data) of the model is high enough to consider an automatic detection of the micro-environment without burdening the participants with self-reporting. By using the mobility feature, the accuracy improves slightly though the gain is moderate. Therefore, we can keep characterizing the micro-environment even in the absence of the speed dimension. We employed different approaches and learners, and con- ducted a thorough experimental study, which shows the effi- ciency of MLSTM-FCN and the multi-view approach for time series classification. We have also compared the results with the kNN-DTW classifier which was considered as the baseline. Figure 6: Grouping Process We have also identified several perspectives of this work, and explored the application of semi-supervised learning to cope with the lack of labels for some classes. In future work, we can In order to give an insight for the consistency between the use various algorithms for the first level learner and the meta- labeled and unlabeled data, and to see if the unlabeled data are learner, as multi-view learning is flexible. Finally, we intend to valuable for improving the classifier’s performance in our context, improve the performance of the learned classes by integrating some a priori rules, like the unlikelihood of being in some micro- 116 (2019), 237–245. environment at some time of day, or of transitions between some [14] Baptiste Languille, Valérie Gros, Nicolas Bonnaire, Clément Pommier, Cécile Honoré, Christophe Debert, Laurent Gauvin, Salim Srairi, Isabella Annesi- micro-environments. Maesano, Basile Chaix, et al. 2020. A methodology for the characterization of portable sensors for air quality measure with the goal of deployment in citizen science. Science of the Total Environment 708 (2020), 134698. ACKNOWLEDGMENTS [15] Sheng Li, Y. Li, and Yun Fu. 2016. Multi-View Time Series Classification: A Discriminative Bilinear Projection Approach. Proceedings of the 25th ACM This work has supported by the French National Research Agency International on Conference on Information and Knowledge Management (2016), (ANR) project Polluscope, funded under the grant agreement 989–998. ANR-15-CE22-0018, by the H2020 EU GO GREEN ROUTES funded [16] Jason Lines, Sarah Taylor, and Anthony Bagnall. 2016. HIVE-COTE: The Hierarchical Vote Collective of Transformation-based Ensembles for Time under the research and innovation programme H2020- EU.3.5.2 Series Classification. In 2016 IEEE 16th international conference on data mining grant agreement No 869764, and by the DATAIA convergence in- (ICDM). 1041–1046. stitute project StreamOps, as part of the Programme d’ Investisse- [17] Li Liu, Yuxin Peng, Shu Wang, Ming Liu, and Zigang Huang. 2016. Com- plex activity recognition using time series pattern dictionary learned from ment d’Avenir, ANR-17-CONV-0003. Part of the equipment was ubiquitous sensors. Information Sciences 340-341 (May 2016), 41–57. https: funded by iDEX Paris-Saclay, in the framework of the IRS project //doi.org/10.1016/j.ins.2016.01.020 [18] Guruprasad Nayak, Varun Mithal, Xiaowei Jia, and Vipin Kumar. 2018. Clas- ACE-ICSEN, and by the Communauté d’agglomération Versailles sifying multivariate time series by learning sequence-level discriminative Grand Parc – VGP - (www.versaillesgrandparc.fr). We are thank- patterns. In Proceedings of the 2018 SIAM International Conference on Data ful to VGP (Thomas Bonhoure) for facilitating the campaign. We Mining. SIAM, 252–260. [19] Juha Pärkkä, Miikka Ermes, Panu Korpipää, Jani Mäntyjärvi, Johannes Peltola, would like to thank all the members of the Polluscope consortia and Ilkka Korhonen. 2006. Activity classification using realistic data from who contributed in one way or another to this work: Salim Srairi wearable sensors. IEEE transactions on information technology in biomedicine: and Jean-Marc Naude (CEREMA) who conducted the campaign; a publication of the IEEE Engineering in Medicine and Biology Society 10, 1 (Jan. 2006), 119–128. https://doi.org/10.1109/titb.2005.856863 Boris Dessimond and Isabella Annesi-Maesano (Sorbonne Uni- [20] Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical Networks versity) for their contribution to the campaign; Valerie Gros and for Few-shot Learning. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc., 4077–4087. Nicolas Bonnaire (LSCE), and Anne Kauffman and Christophe [21] Romain Tavenard, Johann Faouzi, Gilles Vandewiele, Felix Divo, Guillaume Debert (Airparif) for their contribution in the periodic qualifica- Androz, Chester Holtz, Marie Payne, Roman Yurchak, Marc Rußwurm, Kushal tion of the sensors and their active involvement in the project. Kolar, and Eli Woods. 2020. Tslearn, A Machine Learning Toolkit for Time Series Data. Journal of Machine Learning Research 21, 118 (2020), 1–6. http: Finally, we would like to thank the participants for their great //jmlr.org/papers/v21/20-091.html effort in carrying the sensors, without whom this work would [22] Jesper E. van Engelen and H. Hoos. 2019. A survey on semi-supervised not be possible. learning. Machine Learning 109 (2019), 373–440. [23] Baoquan Wang, Tonghai Jiang, Xi Zhou, Bo Ma, Fan Zhao, and Yi Wang. 2020. Time-Series Classification Based on Fusion Features of Sequence and REFERENCES Visualization. Applied Sciences 10, 12 (Jan. 2020), 4124. https://doi.org/10. 3390/app10124124 [1] Samaneh Aminikhanghahi and Diane J. Cook. 2019. Enhancing activity recog- [24] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. 2019. nition using CPD-based activity segmentation. Pervasive and Mobile Comput- Deep Learning for Sensor-based Activity Recognition: A Survey. Pattern ing 53 (2019), 75 – 89. Recognition Letters 119 (March 2019), 3–11. https://doi.org/10.1016/j.patrec. [2] Donald J Berndt and James Clifford. 1994. Using Dynamic Time Warping to 2018.02.010 arXiv: 1707.03502. Find Patterns in Time Series. In KDD workshop, Vol. 10. Seattle, WA, USA:, [25] Li Wei and Eamonn Keogh. 2006. Semi-supervised time series classification. 359–370. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge [3] Kaixuan Chen, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, and Yunhao discovery and data mining (KDD ’06). Association for Computing Machinery, Liu. 2020. Deep Learning for Sensor-based Human Activity Recognition: New York, NY, USA, 748–753. https://doi.org/10.1145/1150402.1150498 Overview, Challenges and Opportunities. arXiv:2001.07416 [cs] (Jan. 2020). [26] David H. Wolpert. 1992. Stacked generalization. Neural Networks 5, 2 (1992), http://arxiv.org/abs/2001.07416 arXiv: 2001.07416. 241 – 259. https://doi.org/10.1016/S0893-6080(05)80023-1 [4] Heeryon Cho and Sang Min Yoon. 2018. Divide and conquer-based 1D CNN [27] Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: A New Primitive human activity recognition using test data sharpening. Sensors 18, 4 (2018), for Data Mining. Proceedings of the 15th ACM SIGKDD international conference 1055. on Knowledge discovery and data mining - KDD ’09 (2009), 947–956. [5] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. 2013. A [28] Mi Zhang and Alexander A. Sawchuk. 2012. Motion primitive-based human time series forest for classification and feature extraction. Information Sciences activity recognition using a bag-of-features approach. In Proceedings of the 2nd 239 (2013), 142–153. ACM SIGHIT International Health Informatics Symposium (IHI ’12). Association [6] T. M. T. Do and D. Gatica-Perez. 2014. The Places of Our Lives: Visiting for Computing Machinery, New York, NY, USA, 631–640. https://doi.org/10. Patterns and Automatic Labeling from Longitudinal Smartphone Data. IEEE 1145/2110363.2110433 Transactions on Mobile Computing 13, 3 (March 2014), 638–648. https://doi. [29] Xuchao Zhang, Yifeng Gao, Jessica Lin, and Chang-Tien Lu. 2020. TapNet: org/10.1109/TMC.2013.19 Multivariate Time Series Classification with Attentional Prototypical Network. [7] Hafsa El Hafyani. 2020. In 2020 21st IEEE International Conference on Mobile In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 6845– Data Management (MDM). IEEE, 246–247. 6852. [8] Hafsa El Hafyani, Karine Zeitouni, Yehia Taher, and Mohammad Abboud. [30] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. Under- 2020. Leveraging Change Point Detection for Activity Transition Mining in standing mobility based on GPS data. In Proceedings of the 10th international the Context of Environmental Crowdsensing. The 9th SIGKDD International conference on Ubiquitous computing. Association for Computing Machinery, Workshop on Urban Computing (2020). New York, NY, USA, 312–321. https://doi.org/10.1145/1409635.1409677 [9] Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane [31] Zhi-Hua Zhou. 2012. Ensemble Methods: Foundations and Algorithms. CRC Idoumghar, and Pierre-Alain Muller. 2019. Deep learning for time series press. classification: a review. Data Mining and Knowledge Discovery 33, 4 (2019), [32] Jingwei Zuo, Karine Zeitouni, and Yehia Taher. 2019. Exploring Interpretable 917–963. Features for Large Time Series with SE4TeC. In Proc. EDBT. 606–609. [10] Hassan Ismail Fawaz, B. Lucas, G. Forestier, Charlotte Pelletier, D. Schmidt, [33] Jingwei Zuo, Karine Zeitouni, and Yehia Taher. 2019. Incremental and Adap- Jonathan Weber, Geoffrey I. Webb, L. Idoumghar, Pierre-Alain Muller, and tive Feature Exploration over Time Series Stream. In 2019 IEEE International Franccois Petitjean. 2020. InceptionTime: Finding AlexNet for Time Series Conference on Big Data (Big Data). 593–602. Classification. ArXiv abs/1909.04939 (2020). [11] Enrique Garcia-Ceja, Carlos E. Galván-Tejada, and Ramon Brena. 2018. Multi- view stacking for activity recognition with sound and accelerometer data. Information Fusion 40 (March 2018), 45–56. https://doi.org/10.1016/j.inffus. 2017.06.004 [12] Bin Guo, Zhu Wang, Zhiwen Yu, Yu Wang, Neil Y Yen, Runhe Huang, and Xingshe Zhou. 2015. Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm. ACM computing surveys (CSUR) 48, 1 (2015), 1–31. [13] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. 2019. Multivariate LSTM-FCNs for time series classification. Neural Networks