Micro-environment Recognition in the context of
                           Environmental Crowdsensing
            Mohammad Abboud                                              Hafsa El Hafyani                                   Jingwei Zuo
              DAVID Lab                                                   DAVID Lab                                           DAVID Lab
      UVSQ - Université Paris-Saclay                             UVSQ - Université Paris-Saclay                      UVSQ - Université Paris-Saclay
           Versailles, France                                          Versailles, France                                  Versailles, France
      mohammad.abboud.2496@gmail.                                  hafsa.el-hafyani@uvsq.fr                              jingwei.zuo@uvsq.fr
                  com

                                             Karine Zeitouni                                         Yehia Taher
                                            DAVID Lab                                           DAVID Lab
                                    UVSQ - Université Paris-Saclay                      UVSQ - Université Paris-Saclay
                                         Versailles, France                                  Versailles, France
                                      karine.zeitouni@uvsq.fr                               yehia.taher@uvsq.fr

ABSTRACT                                                                               new paradigm, which empowers volunteers to contribute data
With the rapid advancements of sensor technologies and mobile                          (i.e., GTS) acquired by their personal sensor-enhanced mobile
computing, Mobile Crowd-Sensing (MCS) has emerged as a new                             devices. Polluscope1 , a French project deployed in Île-de-France
paradigm to collect massive-scale rich trajectory data. Nomadic                        (i.e., Paris region), is a typical use case study on MCS. It aims at
sensors empower people and objects with the capability of re-                          getting insights constantly on individual exposure to pollution
porting and sharing observations on their state, their behavior                        everywhere (indoor and outdoor), while enriching the traditional
and/or their surrounding environments. Processing and mining                           monitoring system with the collected data by the crowd. The
multi-source sensor data in MCS raise several challenges due                           recruited participants, on a voluntary basis, collect air quality
to their multi-dimensional nature where the measured parame-                           measurements. Each participant is equipped with a sensor kit, and
ters (i.e., dimensions) may differ in terms of quality, variabilty,                    a mobile device which allows the transmission of collected mea-
and time scale. We consider the context of air quality MCS, and                        surements together with the GPS coordinates as a geo-referenced
focus on the task of mining the context from the MCS data. Re-                         data stream containing (timestamp, longitude, and latitude). In
lating the measures to their context is crucial to interpret them                      addition, the participants are asked to annotate their environ-
and analyse the participant’s exposure. This paper investigates                        ment type through a custom mobile application. This will allow
the feasibility of recognizing the human’s context (called herein                      participants to have personalized insights about their exposure
micro-environment) in an environmental MCS scenario. We put                            to pollution everywhere, either in indoor and outdoor environ-
forward a multi-view learning approach, that we adapt to our                           ments (e.g., Home, Work, Transportation, Streets, Park, etc.), and
context, and implement it along with other time series classifica-                     at a higher resolution along their trajectories, thereby, allowing
tion approaches. The experimental results, applied to real MCS                         to capture local variability and peaks of pollution, depending on
data, not only confirm the power of MCS data in characterizing                         participants’ whereabouts, i.e., micro-environments.
the micro-environment, but also show a moderate impact of the                              It is worth mentioning that air quality strongly depends on
integration of mobility data in this recognition. Furthermore,                         the context2 , and so is the individual exposure to pollution. For
multi-view learning shows similar performance as the reference                         this reason, there is a great interest of making exposure analysis
deep learning algorithm, without requiring specific hardware.                          context-aware. However, the context annotation is by far the most
                                                                                       difficult information to collect in a real-life application setting,
KEYWORDS                                                                               since a very few participants thoroughly annotate their micro-
                                                                                       environment. Therefore, there is a great interest in unburdening
Activity Recognition, Multivariate Time Series Classification,
                                                                                       the participants by automatically detecting the context.
Multi-view Learning, Mobile Crowd Sensing, Air Quality Moni-
                                                                                           When exploring visually the data, we noticed that micro-
toring
                                                                                       environments preserve a certain pattern. Besides, we observe
                                                                                       the existence of an inter-sensor correlation and with the con-
1    INTRODUCTION                                                                      text. Figure 1 shows the evolution of three dimensions (i.e. Black
Nowadays, the Internet of Things (IoT) basically relies on ad-                         Carbon (BC), NO2 and Particulate Matters (PM)) with micro-
vanced sensor technologies to bridge the physical world and                            environments identification. As shown in Figure 1, BC and NO2
information systems. In particular, along with the widespread                          preserve the same shapes and statistical characteristics in the
use of GPS, various mobile sensors bring rich information col-                         micro-environment “Car”. Plus, we note that PM values keep
lected from both the surrounding environment and human activ-                          the same statistical characteristics in the micro-environment “In-
ities, which are generally represented as Geo-referenced Time                          door”. Moreover, we can observe the existence of a correlation
Series (GTS). Mobile Crowed Sensing (MCS) [12] emerges as a                            between the three dimensions during the whole timeline.
                                                                                           The idea we promote in this paper is to utilize a wisely chosen
© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed-   annotated dataset, in order to train a model on all the combination
ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus)
                                                                                       1 http://polluscope.uvsq.fr
on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0
                                                                                       2 In this paper, the terms "context" and "micro-environment" are used
International (CC BY 4.0)
                                                                                       interchangeably.
of air quality, and mobility dimensions as predictors of the micro-   One-Nearest Neighbor (1-NN) classifier with different distance
environment. We hypothesize that the multi-variate time series        measures, such as Euclidean Distance (ED) or Dynamic Time
collected by the MCS campaigns not only depends on the micro-         Wrapping (DTW) [2], is always considered as the benchmark to
environment but could be a proxy of it.                               give a preliminary evaluation in the MTSC problem.
                                                                         Considering the real-life scenarios, where it is difficult or ex-
                                                                      pensive to obtain a large amount of labeled data for training,
                                                                      some studies used both labeled and unlabeled data to learn the
                                                                      human activity, that is Semi-Supervised Learning (SSL) [25] on
                                                                      MTSC. The pioneering work by [25] propose a semi-supervised
                                                                      technique for time series classification. The authors demonstrated
                                                                      that semi-supervised learning requires less human effort and gen-
                                                                      erally achieves higher accuracy than training on limited labels.
                                                                      The semi-supervised model [25] is based on the Self-Learning
                                                                      concept with the One-Nearest-Neighbor (1-NN) classifier. First,
                                                                      the labeled set, denoted by 𝑃 (as positively labeled) is applied to
                                                                      train the 1-NN classifier 𝐶. Then, the unlabeled samples 𝑈 are
                                                                      given the pseudo labels progressively based on their distance to
                                                                      the samples in 𝑃. Thereafter, the enriched labeled set 𝑃 allows it-
                                                                      eratively repeating the previous step and improving the classifier.
Figure 1: Inter-sensor and micro-environment correla-
                                                                      More recently, the deep learning-based models on MTSC show
tions.
                                                                      promising performance under weak supervision. For instance,
                                                                      Zhang et al. [29] propose a novel semi-supervised MTSC model
   The question that arises itself now is how to combine all these    named Time series attentional prototype network (TapNet), to ex-
different aspects of the data (geo-location, sensors) to identify     plore the valuable information in the unlabeled samples. TapNet
the user’s context automatically? And how much a model can dis-       projects the raw MTS data into a low-dimensional representation
criminate the observations in different micro-environments? To        space. The unlabeled samples approach themselves to the class
this end, we envision a holistic approach of activity recognition,    prototype in the representation space, where the distance-based
as depicted in [7].                                                   probability and the labeled samples allow training the model pro-
   The micro-environment recognition is a crucial figure to ex-       gressively. Moreover, the hybrid Convolutional Neural Network
posure interpretation. Once the data are correctly annotated, our     (CNN) and Long Short-Term Memory (LSTM) structure adopted
ultimate goal is to get insight into all the dimensions (spatial,     in TapNet allows modeling, respectively, the variable interactions
temporal, individual, and contextual) of the exposure to pollution.   and the temporal features of MTS.
   In this paper, we evaluate different approaches and provide a
framework dedicated to the preparation, the application and the
comparison of different machine learning algorithms.                  2.2    Multi-View Learning
   The rest of this paper is organized as follows. We introduce       Another line of studies propose multi-view learning to classify
the related work in Section 2. The formal presentation of our         time series data originated from multiple sensors to recognize
micro-environment recognition model is discussed in Section 3.        users activities. Garcia-Ceja et al. [11] propose a method based on
Section 4 presents the experimental results and evaluation of the     multi-view learning and stacked generalization for fusing audio
micro-environment recognition model in the context of environ-        and accelerometer sensor data for human activity recognition
mental crowd sensing. Section 5 gives the extensive discussion        using wearable devices. Each sensor’s data is seen as a different
for the perspectives of this work. In section 6, we summarize our     “view”, and they are combined using stacked generalization [26].
conclusions and provide directions for future work.                   The approach trains a specific classification model over each view
                                                                      and an extra meta-learner using the view models as input. The
2     RELATED WORK                                                    general idea of the authors is to combine data from heterogeneous
Human activity recognition involves a wide range of applications      types of sensors to complement each other and thus, increase
from smart homes activities [1] to daily human activities [4,         recognition accuracy.
17, 28], to human mobility [6, 30] to cite a few. It represents a        Wang et al. [23] propose a framework based on deep learning
typical scenario of machine learning, and some public datasets        to learn features from different aspects of the data based on
are widely used in the benchmarks. In this section, we introduce      features of sequence and visualization. In order to imitate the
a summary of two main topics of related work to our approach.         human brain, which can classify data based on visualization,
We focus mainly on multi-variate time series (MTS) classification     the authors transform the time series into an Area Graph. They
and multi-view learning.                                              use well-trained LSTM-A neural networks and CNN-A neural
                                                                      networks to extract the features of time series data. LSTM-A is
2.1    Multi-Variate Time Series Classification                       used to extract sequence features, while CNN-A is used to extract
Human activity recognition falls in the problem of labelling data     visual features from the time series. Then, based on the fusion
segments with the type of activity, which leads to a multi-variate    of features, the authors carry out the time series classification
time series classification (MTSC) problem based on data collected     task. Although the approach gained promising results, it did not
by multiple wearable sensors. There is a wide range of time series    outperform deep learning methods such as InceptionTime [10].
classification approaches that can be classified into four cate-         Li et al. [15] propose a Multi-view Discriminative Bilinear Pro-
gories: distance-based methods [2], feature-based methods [19],       jections (MDBP) for multi-view MTSC. The proposed approach
ensemble methods [11] and deep learning models [3, 9, 24]. The        is a multi-view dimensionality reduction method for time series
classification which aims to extract discriminative features from                  Table 1: An example of the new generated dataset 𝐷 ′
multi-view MTS data. MDBP mainly projects multi-view data to
a shared subspace through view-specific bilinear projections that                 First-Level Learners   Associated Prediction Probabilities     True Label
                                                                                𝑙 1 𝑙 2 ... 𝑙𝑖 ... 𝑙𝑛    𝑝 1 𝑝 2 ... 𝑝𝑖 ...        𝑝𝑛                𝑦
preserve the temporal structure of MTS, and learns discriminative
features by incorporating a novel supervised regularization.

3     MICRO-ENVIRONMENT RECOGNITION
      MODEL                                                                       In step 3, the first-level learner takes as input the time series
In this section, we provide an overview of our proposed frame-                 data coming from each view. Then, each view will generate its
work for micro-environment recognition in the context of MCS.                  own predicted labels with associated prediction probabilities
Our proposed approach contains six steps as shown in Figure 2.                 with the form [𝑙𝑖 , 𝑝 1, 𝑝 2, ..., 𝑝 𝑗 , ..., 𝑝𝑘 , 𝑦], where 𝑙𝑖 is the predicted
                                                                               label of the first-level learner 𝑖, 𝑝 𝑗 is the associated prediction
                                                                               probability for each class 𝑗 of the 𝑘 possible classes, and 𝑦 is the
3.1     Data Collection                                                        true label.
The first step of our micro-environment recognition process is                    One of the advantages of multi-view learning is its versatility
the data collection. During three campaigns, around one hundred                on first and second level learners’ choices. One can flexibly substi-
participants have been recruited to collect ambient air measure-               tute classifier choices as wished. We opt for k-nearest-neighbor
ments along with geo-location for one week 24 hours a day,                     (kNN) classifier coupled with the Dynamic Time Warping (DTW)
while performing their daily activities. Each participant carries              distance as first-level learner to be trained on each view of the
a multi sensor box and a tablet empowered with GPS chipset.                    data. kNN is one of the most popular and traditional TSC ap-
The sensors collect time annotated measurements of Particulate                 proaches. kNN with DTW metric was considered for a long time
Matter (PM1.0, PM10, PM2.5), nitrogen dioxide (𝑁𝑂 2 ), Black Car-              the state-of-the-art in the time series classification problem [9].
bon (BC), Temperature and Relative Humidity, and the tablet                    Furthermore, most classification approaches require parameters
records participants’ geo-locations and allows them to annotate                settings, whereas kNN with DTW is parameter free. kNN classi-
their context by using a self-reporting mobile app. They report                fier has shown to be a strong baseline [9], and there is no single
every transition to a micro-environment (e.g., Home, Office, Park,             distance measure that significantly outperforms DTW [9]. Hence,
Restaurant, etc.), as well as events, which are temporary activities           recent research has focused on developing ensemble methods
for a brief period (e.g., Start cooking, Open a window, Close a                that significantly outperforms the NN coupled with DTW [9].
window, Smoking, Turn on a chimney, etc.).                                        In step 4, we aimed at giving a weight for each learner, thus
                                                                               a new dataset 𝐷 ′ is generated by joining the first-level learner
3.2     Data Preparation                                                       predictions and the probability of each prediction, Table 1 shows
The second step consists of pre-processing the data which is two               the feature vector in this dataset, where 𝑙𝑖 is the predicted label
folds. On the one hand, most sensor data are noisy, and require                of the first-level learner 𝑖, 𝑝𝑖 is the probability of this prediction,
a prepossessing phase to clean them from irrelevant measure-                   and 𝑦 is the true label.
ments. We have observed this especially in the GPS (due to signal                 In step 5, after generating a new dataset 𝐷 ′ , a second-level
loss), and in air quality data even though the sensor data quality             classifier, or meta-learner, is trained over 𝐷 ′ through ensemble
is a permanent preoccupation of the project, by careful evalu-                 learning [31]. This approach allows to preserve the statistical
ation before their selection, and periodic qualification during                properties of each view and learn the classes of the MTS instances
the campaign [14]. The sensors for climatic parameters do not                  with a significant improvement in the classification accuracy.
show such defects. Therefore, a de-noising process is applied to                  Many ensemble methods [31] have been proposed to further
clean the data. On the other hand, the highest quality sample of               enhance the algorithm’s accuracy by combining learners rather
annotated data is selected as a baseline to validate the process               than trying to find the best single learner. Due to their versatility
of micro-environment recognition. The idea is to generalize the                and flexibility, ensemble methods attract many researchers and
micro-environment recognition to all participants’ data, by using              can be applied in different domains, for example, but not limited
the model derived from a good-quality dataset.                                 to, time series classification [11] and time series segmentation
                                                                               [8]. In a previous work [8], we used a multi-view approach for
3.3     Multi-View Learning Model                                              segmenting MCS data, where we employed an unsupervised
                                                                               learning for change detection on each view.
We were interested in the stack generalization approach pro-                      In this work, we conduct our experiments using Random Forest
posed in [11], but we have adapted it to best fit for solving our              classifier since it has shown high performance when it is applied
problem. We propose to learn the micro-environment of partici-                 in the human activity recognition domain [11].
pants from multi-variate time series through a two-stage model
based on multi-view learning. Our multi-view classification ap-
proach consists of training a first-level learner on each view
(i.e. step 3 in Figure 2), and then train a second-level learner or
meta-learner (i.e. step 5 in Figure 2) to combine the output of                4    EXPERIMENTS AND RESULTS
each view and enhance the global accuracy of the classification.               The experiments are carried out on different environments. The
We assume that 𝑌𝑖𝑡 is a dimension of the n-dimenstional time                   multi-view learning model was implemented in Python 3.6 using
series 𝑌𝑡 = (𝑌1𝑡 , 𝑌2𝑡 , ..., 𝑌𝑖𝑡 , ..., 𝑌𝑛𝑡 ). In our model, each view 𝑉𝑖 ,   scikit-learn 0.23.2 and tslearn [21]. The deep-learning models
where 𝑉 = (𝑉1, 𝑉2, ..., 𝑉𝑖 , ..., 𝑉 𝑛) is the set of views, represents a       (MLSTM-FCN [13], TapNet [29]) were trained on a single Tesla
dimension 𝑌𝑖𝑡 of the multi-variate time series 𝑌𝑡 . Thus, we have              V100 GPU of 32 Go memory with CUDA 10.2, using respectively
as many views as dimensions.                                                   Keras 2.2.4 and PyTorch 1.2.0.
                               Figure 2: Overview of the Micro-Environment Recognition Process.


4.1    Experimental Settings                                           4.2    Classification Results
We evaluated the used models in these experiments using real           This section details the experimental results. The micro-environment
life data collected within the scope of Polluscope project. For        recognition is formulated as a MTSC problem. We used as a base-
the experiments we have used the participants’ ambient air data        line a basic kNN classifier with DTW as distance metric. To
containing (PM10, PM1.0, PM2.5, NO2, BC, Temperature, and              compare our multi-view learning approach (2NN-DTW for the
Relative Humidity), in addition to the speed dimension derived         first-level learners and Random Forest as a meta-learner) with
from the geo-locational data. Moreover, we have 8 classes (i.e.,       state-of-the-art techniques, we implemented the MLSTM-FCN
micro-environment to recognize), which can be divided into two         [13] and run it on a GPU, since it requires more computational
categories indoor such as (“Home”, “Office”, “Restaurant”, and         resources than the multi-view approach. Both kNN-DTW and
"Store"), and outdoor such as (“Street”, “Bus”, “Car”, and “Train”).   MLSTM-FCN were applied on an aggregated feature vector con-
    We have selected the data of six participants who have thor-       taining all the dimensions together.
oughly annotated their activities within the campaign. Data were          As shown in Table 2, kNN-DTW classifier is not able to dis-
split into two third for training and one third for testing, with      criminate correctly between the micro-environments, while the
care taken to keep the data of each participant grouped either         accuracy improves when applying the multi-view approach that
in the training or in the testing sets. We used the cross valida-      treats each view independently, thus preserving the statistical
tion score with "Repeated Stratified K-fold" in order to split the     features of each view. For the third experiment, a Long Short
training set into training and validation, while we test the overall   Term Memory network is generated in order to learn a map-
accuracy using the test dataset.                                       ping between the input vector and the classes. MLSTM-FCN has
    To account for the temporal feature of the data, we segment        shown promising results in the experiments. Table 2 shows the
them into samples of 10 minutes’ length at maximum. Usually,           accuracy of experiments carried out with different conditions.
people spend most of their time indoors, thus we should take into
consideration other outdoor activities that have a short period of     Table 2: Performance of different classifiers on Multi-
time compared to indoor activities. For example, the average time      source Polluscope data
spent in "Bus" is around 8 minutes, for "Car" the average time
is 20 minutes, etc. Globally, the distribution of data samples is
                                                                                  Model               Condition          Accuracy
highly imbalanced over the different classes, as shown in Figure
3a, which reflects the imbalance of time spent in different micro-                                     Speed               0.450
environments. Imbalanced classes usually cause low performance                                        No speed             0.440
                                                                                kNN-DTW
classification for the minority classes. To cope with this problem,                               Speed & Re-smp.          0.587
we apply a resampling strategy. Figure 3 shows the distribution                                  No speed & Re-smp.        0.597
of the data in both the original and the re-sampled dataset. We                                        Speed               0.716
have used the random over/under-sampler in order to balance                                           No speed             0.710
                                                                             Multi-view Based
our dataset.                                                                                      Speed & Re-smp.          0.729
    To consider the valuable variable of the mobility information,                               No speed & Re-smp.        0.640
we carry out our experiments on the datasets while considering                                         Speed               0.808
or not the speed variable. We also compare the classifiers’ perfor-                                   No speed             0.784
                                                                              MLSTM-FCN
mance on both resampled and original (i.e., un-resampled) data.                                   Speed & Re-smp.          0.703
Finally, we introduce and evaluate a two-steps approach, by first                                No speed & Re-smp.        0.691
discriminating indoor from outdoor environments, followed by                                      Speed & Re-smp.          0.83
                                                                              Grouping Step
a refinement step to learn a more specific class.                                                No speed & Re-smp.        0.82
                                                                       Table 3: Performance of Multi-view Learner (Raw data
                                                                       with/out speed)

                                                                                             With Speed                   Without Speed
                                                                           class
                                                                                                             F1                             F1
                                                                                     Precision    Recall           Precision   Recall
                                                                                                           Score                          Score
                                                                        Street       0.65         0.52     0.57    0.69        0.40       0.50
                                                                        Bus          0.18         0.12     0.14    0.60        0.33       0.42
                                                                        Office       0.86         0.93     0.89    0.58        0.43       0.49
                                                                        Restaurant   1.0          0.22     0.36    1.0         0.10       0.18
                                                                        Home         0.67         0.65     0.66    0.71        0.87       0.78
                                                                        Car          0.61         0.81     0.69    0.55        0.75       0.63
                                                                        Store        0.25         0.05     0.08    0.00        0.00       0.00
                         (a) Original Dataset                           Train        0.00         0.00     0.00    0.40        0.08       0.13


                                                                       Table 4: Performance of Multi-view Learner (Re-sampled
                                                                       data with/out speed)

                                                                                             With Speed                   Without Speed
                                                                           class
                                                                                                             F1                             F1
                                                                                     Precision    Recall           Precision   Recall
                                                                                                           Score                          Score
                                                                        Street       0.72         0.49     0.59    0.59        0.36       0.45
                                                                        Bus          0.00         0.00     0.00    0.25        0.04       0.06
                                                                        Office       0.94         0.94     0.94    0.71        0.76       0.73
                                                                        Restaurant   0.92         0.80     0.86    0.43        0.20       0.27
                                                                        Home         0.72         0.78     0.75    0.54        0.70       0.61
                                                                        Car          0.64         0.80     0.71    0.56        0.63       0.59
                                                                        Store        0.09         0.05     0.06    0.12        0.05       0.07
                       (b) Re-sampled Dataset                           Train        0.54         0.47     0.50    0.26        0.33       0.29

                 Figure 3: Class Distribution
                                                                       Table 5: Performance of MLSTM-FCN (Raw data with/out
                                                                       speed)
    Next, we focus on the comparison between the proposed ap-
                                                                                             With Speed                   Without Speed
proach and MLSTM-FCN. We also study the impact of using or                 class
                                                                                                             F1                             F1
not the mobility data, as well as the learning from the original                     Precision    Recall           Precision    Recall
                                                                                                           Score                          Score
or the re-sampled data. We report the performances in terms of          Street       0.55         0.60     0.57    0.53         0.54      0.53
recall, and F1 score. These results are grouped in Table 2 to Table     Bus          0.88         0.70     0.78    0.76         0.65      0.70
7 and Figure 4 to Figure 6. Table 2 represents the overall accuracy     Office       0.96         0.88     0.92    0.91         0.85      0.88
of the different classifiers used in these experiments while or not     Restaurant   0.78         0.88     0.82    0.78         0.88      0.82
using the Speed data, and with or without re-sampling. Table            Home         0.83         0.87     0.85    0.82         0.90      0.86
                                                                        Car          0.81         0.83     0.82    0.78         0.83      0.80
3 and 4 reports the precision, recall, and the F1-Score metrics
                                                                        Store        0.50         0.60     0.55    0.62         0.32      0.42
of the Multi-view learner for raw data and re-sampled data re-
                                                                        Train        1.0          0.30     0.46    0.38         0.30      0.33
spectively with and without Speed. Table 5 and 6 reports the
precision, recall, and F1-Score metrics of the MLSTM-FCN for
raw data and re-sampled data respectively with and without             Table 6: Performance of MLSTM-FCN (Re-sampled data
Speed. Figure 4 shows the accuracy among different views used          with/out speed)
within the experiment of the Multi-view approach. Moreover, fig-
ure 5a and 5b shows the confusion matrix when applying the                                   With Speed                   Without Speed
                                                                           class
Multi-view approach on the re-sampled data with/out the speed                                                F1                            F1
                                                                                      Precision   Recall           Precision   Recall
                                                                                                           Score                         Score
respectively. Figure 6 shows the procedure used for the Grouping
                                                                        Street        0.43        0.54     0.48    0.50        0.42      0.46
step approach which is also based on the multi-view approach,
                                                                        Bus           0.41        0.65     0.50    0.52        0.65      0.58
and table 7 reports the precision, recall, and F1-Score metrics for
                                                                        Office        0.90        0.88     0.89    0.87        0.87      0.87
this approach.
                                                                        Restaurant    0.75        0.75     0.75    0.50        0.62      0.56
    The multi-view learner proposed in these experiments em-            Home          0.80        0.80     0.80    0.81        0.76      0.78
ploys the stacked generalization approach, which combines the           Car           0.80        0.60     0.69    0.65        0.68      0.67
predictions of each independent view in order to get the final          Store         0.38        0.48     0.42    0.50        0.24      0.32
classification result. As shown in figure 4 although the first level    Train         0.22        0.40     0.29    0.12        0.30      0.17
learners may have a low accuracy but the combination of their
predictions, by generating a new dataset D’ and feeding it to
train, the meta-learner can improve the accuracy a lot.                5a and 5b and the recall and F1 score metrics in table 4 that the
    We observe an improvement of the accuracy of the overall clas-     model can easily discriminate between the “indoor” and “outdoor”
sification when adding the speed dimension to the ambient air          activities, but it cannot perfectly distinguish between the micro-
dimensions. We also notice from the confusion matrix in figures        environments inside each category. For example, even though
                                     Figure 4: Accuracy among different views (Re-sampled data)

Table 7: Performance of Grouping Step (Re-sampled data                   5.1    Multi-view Learner
with/out speed)
                                                                         The multi-view learner adopted in this paper is composed by the
                                                                         base learner (i.e., kNN-DTW) and the meta-learner (i.e., Random
                      With Speed                   Without Speed
    class                                                                Forest), which has greatly improved the performance compared to
                                      F1                           F1
               Precision   Recall           Precision   Recall           the single kNN-DTW classifier. The objective of this paper is not
                                    Score                        Score
                                                                         to propose the best classifier for MTS classification, but to provide
 Street        0.73        0.59     0.65    0.59        0.35     0.44
                                                                         an insight that the multi-view learner is capable of coordinating
 Bus           0.60        0.12     0.20    0.00        0.00     0.00
 Office        0.92        0.93     0.92    0.80        0.87     0.83    effectively the information from different variables and achieving
 Restaurant    0.91        0.67     0.77    0.20        0.07     0.10    more reliable performance than a single base learner. Moreover,
 Home          0.86        0.94     0.90    0.75        0.86     0.80    the results of the grouping approach which is based on the multi-
 Car           0.71        0.87     0.78    0.66        0.94     0.77    view approach confirms that there is a clear signature for each
 Store         0.46        0.30     0.36    0.40        0.10     0.16    micro-environment, thus we can have an effective prediction
 Train         0.29        0.33     0.31    0.33        0.20     0.25    with this approach.
                                                                            Nevertheless, the kNN-DTW is considered as the baseline for
                                                                         MTS classification and is widely outpaced by the advanced ap-
                                                                         proaches such as Shapelets [27, 32, 33] or the frequent patterns
                                                                         [18]. Essentially, the kNN-DTW captures the global feature based
most of the samples in the “Train” micro-environment is falsely          on the distance measure between the entire sequences, while
predicted as “Car”, both “Car” and “Train” micro-environments            the local features (e.g., the frequent patterns [18], the interval
can be classified as outdoor. Based on this observation, we intro-       features [5], Shapelets [27], etc.) are more appropriate when a
duced a grouping step before recognizing the micro-environment.          specific pattern characterizes a class. More specifically, a combi-
In this step we classify the sample into either an “Indoor” or “Out-     nation of features extracted from different domains may improve
door” environment. Based on the classification result, a model will      dramatically the performance of the base learner [16]. Therefore,
be specialized for each indoor or outdoor micro-environments.            one of the perspectives consists in the optimization of the base
Figure 6 shows the added step and the procedure for the classifi-        learner and the exploration of the explainability of the multi-
cation.                                                                  view learner on both the feature interpretation and the variable
   The accuracy of the classifier in the grouping phase (“Indoor”        importance for building the classifier. The visual representation
or “Outdoor”) showed a good result when using the resampled              of Shapelets make them good candidates for such improvement.
data. It reaches 0.82 for data without speed dimension, while for
data with the speed dimension it reaches around 0.83. Table7
                                                                         5.2    Label Shortage Issue
shows the recall and F1 score for both models trained on resam-
pled data with and without speed. These experiments did not              The label shortage is a practical issue when building the learning
consider the case with original data due to its low performance,         model. In particular, in the context of Polluscope, post-labelling
in particular, for the minority classes.                                 for time series sensor data is much more costly than classic data
                                                                         (e.g., image, text, etc.) due to the low interpretability over the real-
                                                                         valued sequence. Therefore, the data need to be annotated during
5   DISCUSSIONS & PERSPECTIVES                                           the data collection process. However, certain practical factors
In this section, we discuss the perspectives for improving our           limit the availability of labels. For instance, the participants are
multi-view learning model and the possibility for tackling the           not always conscious in annotating their micro-environment.
practical label issue in the context of Polluscope.                      Therefore, for certain time periods, no annotations were marked.
                                                                      we conduct a preliminary test on the Polluscope data with the
                                                                      newly proposed semi-supervised MTSC model TapNet [29].
                                                                         TapNet [29] is a deep learning based approach designed for
                                                                      multivariate time series classification. By adopting the proto-
                                                                      typical network [20], TapNet allows learning a low-dimensional
                                                                      embeddings for the input MTS where the unlabelled samples
                                                                      help adjusting the class prototype (i.e., class centroid), which
                                                                      leads to a better classifier than using only the labelled samples.
                                                                      Table 8 shows the semi-supervised learning results on Pollus-
                                                                      cope data considering or not the speed variable. We evaluate the
                                                                      performance of TapNet under different supervision ratios in the
                                                                      training set. The results show that the unlabeled samples and
                                                                      the speed variable do improve the performance of the classifier.
                                                                      Besides, the accuracy didn’t drop a lot when eliminating the an-
                                                                      notations in training set (from ratio=1 for fully labelled to 0.5,
                                                                      and even for 0.2 when only 20% data in labelled), indicating that
                                                                      the collected data within each class is not sparsely distributed,
                                                                      thus learning under weak supervision is reliable with the aid of
                                                                      the unlabeled samples.

                           (a) With Speed                             Table 8: The accuracy results of TapNet on Polluscope data
                                                                      under different supervision ratios

                                                                          Condition   Sup_ratio=1     Sup_ratio=0.5    Sup_ratio=0.2
                                                                           Speed          0.746           0.725             0.717
                                                                          No speed        0.713           0.703             0.695


                                                                         Giving the promising results on the data distribution consis-
                                                                      tency, another avenue worth exploring is to consider and in-
                                                                      tegrate a semi-supervised model into our multi-view learner.
                                                                      Various semi-supervised frameworks are applicable to our model,
                                                                      such as applying self-learning [25] to produce the pseudo labels
                                                                      on the multi-view learner, or adopting the label propagation and
                                                                      manifold regularization techniques [22] on the base learner.

                                                                      6    CONCLUSION
                                                                      Activity recognition has gained the interest of many researchers
                                                                      nowadays, due to the widespread use of mobility sensors. Micro-
                                                                      environment recognition is essential in MCS projects such as
                                                                      Polluscope, in order to be able to analyse the individual’s expo-
                                                                      sure to air pollution and to relate it to her context. The major
                         (b) Without Speed
                                                                      finding of our study is to show to some extent that the ambient air
                                                                      can characterize the micro-environment. Moreover, the accuracy
      Figure 5: Confusion Matrix (Re-sampled Data)
                                                                      of the model is high enough to consider an automatic detection
                                                                      of the micro-environment without burdening the participants
                                                                      with self-reporting. By using the mobility feature, the accuracy
                                                                      improves slightly though the gain is moderate. Therefore, we can
                                                                      keep characterizing the micro-environment even in the absence
                                                                      of the speed dimension.
                                                                         We employed different approaches and learners, and con-
                                                                      ducted a thorough experimental study, which shows the effi-
                                                                      ciency of MLSTM-FCN and the multi-view approach for time
                                                                      series classification. We have also compared the results with the
                                                                      kNN-DTW classifier which was considered as the baseline.
                 Figure 6: Grouping Process
                                                                         We have also identified several perspectives of this work, and
                                                                      explored the application of semi-supervised learning to cope
                                                                      with the lack of labels for some classes. In future work, we can
   In order to give an insight for the consistency between the        use various algorithms for the first level learner and the meta-
labeled and unlabeled data, and to see if the unlabeled data are      learner, as multi-view learning is flexible. Finally, we intend to
valuable for improving the classifier’s performance in our context,   improve the performance of the learned classes by integrating
some a priori rules, like the unlikelihood of being in some micro-                             116 (2019), 237–245.
environment at some time of day, or of transitions between some                           [14] Baptiste Languille, Valérie Gros, Nicolas Bonnaire, Clément Pommier, Cécile
                                                                                               Honoré, Christophe Debert, Laurent Gauvin, Salim Srairi, Isabella Annesi-
micro-environments.                                                                            Maesano, Basile Chaix, et al. 2020. A methodology for the characterization
                                                                                               of portable sensors for air quality measure with the goal of deployment in
                                                                                               citizen science. Science of the Total Environment 708 (2020), 134698.
ACKNOWLEDGMENTS                                                                           [15] Sheng Li, Y. Li, and Yun Fu. 2016. Multi-View Time Series Classification: A
                                                                                               Discriminative Bilinear Projection Approach. Proceedings of the 25th ACM
This work has supported by the French National Research Agency                                 International on Conference on Information and Knowledge Management (2016),
(ANR) project Polluscope, funded under the grant agreement                                     989–998.
ANR-15-CE22-0018, by the H2020 EU GO GREEN ROUTES funded                                  [16] Jason Lines, Sarah Taylor, and Anthony Bagnall. 2016. HIVE-COTE: The
                                                                                               Hierarchical Vote Collective of Transformation-based Ensembles for Time
under the research and innovation programme H2020- EU.3.5.2                                    Series Classification. In 2016 IEEE 16th international conference on data mining
grant agreement No 869764, and by the DATAIA convergence in-                                   (ICDM). 1041–1046.
stitute project StreamOps, as part of the Programme d’ Investisse-                        [17] Li Liu, Yuxin Peng, Shu Wang, Ming Liu, and Zigang Huang. 2016. Com-
                                                                                               plex activity recognition using time series pattern dictionary learned from
ment d’Avenir, ANR-17-CONV-0003. Part of the equipment was                                     ubiquitous sensors. Information Sciences 340-341 (May 2016), 41–57. https:
funded by iDEX Paris-Saclay, in the framework of the IRS project                               //doi.org/10.1016/j.ins.2016.01.020
                                                                                          [18] Guruprasad Nayak, Varun Mithal, Xiaowei Jia, and Vipin Kumar. 2018. Clas-
ACE-ICSEN, and by the Communauté d’agglomération Versailles                                    sifying multivariate time series by learning sequence-level discriminative
Grand Parc – VGP - (www.versaillesgrandparc.fr). We are thank-                                 patterns. In Proceedings of the 2018 SIAM International Conference on Data
ful to VGP (Thomas Bonhoure) for facilitating the campaign. We                                 Mining. SIAM, 252–260.
                                                                                          [19] Juha Pärkkä, Miikka Ermes, Panu Korpipää, Jani Mäntyjärvi, Johannes Peltola,
would like to thank all the members of the Polluscope consortia                                and Ilkka Korhonen. 2006. Activity classification using realistic data from
who contributed in one way or another to this work: Salim Srairi                               wearable sensors. IEEE transactions on information technology in biomedicine:
and Jean-Marc Naude (CEREMA) who conducted the campaign;                                       a publication of the IEEE Engineering in Medicine and Biology Society 10, 1 (Jan.
                                                                                               2006), 119–128. https://doi.org/10.1109/titb.2005.856863
Boris Dessimond and Isabella Annesi-Maesano (Sorbonne Uni-                                [20] Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical Networks
versity) for their contribution to the campaign; Valerie Gros and                              for Few-shot Learning. In Advances in Neural Information Processing Systems,
                                                                                               Vol. 30. Curran Associates, Inc., 4077–4087.
Nicolas Bonnaire (LSCE), and Anne Kauffman and Christophe                                 [21] Romain Tavenard, Johann Faouzi, Gilles Vandewiele, Felix Divo, Guillaume
Debert (Airparif) for their contribution in the periodic qualifica-                            Androz, Chester Holtz, Marie Payne, Roman Yurchak, Marc Rußwurm, Kushal
tion of the sensors and their active involvement in the project.                               Kolar, and Eli Woods. 2020. Tslearn, A Machine Learning Toolkit for Time
                                                                                               Series Data. Journal of Machine Learning Research 21, 118 (2020), 1–6. http:
Finally, we would like to thank the participants for their great                               //jmlr.org/papers/v21/20-091.html
effort in carrying the sensors, without whom this work would                              [22] Jesper E. van Engelen and H. Hoos. 2019. A survey on semi-supervised
not be possible.                                                                               learning. Machine Learning 109 (2019), 373–440.
                                                                                          [23] Baoquan Wang, Tonghai Jiang, Xi Zhou, Bo Ma, Fan Zhao, and Yi Wang.
                                                                                               2020. Time-Series Classification Based on Fusion Features of Sequence and
REFERENCES                                                                                     Visualization. Applied Sciences 10, 12 (Jan. 2020), 4124. https://doi.org/10.
                                                                                               3390/app10124124
 [1] Samaneh Aminikhanghahi and Diane J. Cook. 2019. Enhancing activity recog-            [24] Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. 2019.
     nition using CPD-based activity segmentation. Pervasive and Mobile Comput-                Deep Learning for Sensor-based Activity Recognition: A Survey. Pattern
     ing 53 (2019), 75 – 89.                                                                   Recognition Letters 119 (March 2019), 3–11. https://doi.org/10.1016/j.patrec.
 [2] Donald J Berndt and James Clifford. 1994. Using Dynamic Time Warping to                   2018.02.010 arXiv: 1707.03502.
     Find Patterns in Time Series. In KDD workshop, Vol. 10. Seattle, WA, USA:,           [25] Li Wei and Eamonn Keogh. 2006. Semi-supervised time series classification.
     359–370.                                                                                  In Proceedings of the 12th ACM SIGKDD international conference on Knowledge
 [3] Kaixuan Chen, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, and Yunhao                       discovery and data mining (KDD ’06). Association for Computing Machinery,
     Liu. 2020. Deep Learning for Sensor-based Human Activity Recognition:                     New York, NY, USA, 748–753. https://doi.org/10.1145/1150402.1150498
     Overview, Challenges and Opportunities. arXiv:2001.07416 [cs] (Jan. 2020).           [26] David H. Wolpert. 1992. Stacked generalization. Neural Networks 5, 2 (1992),
     http://arxiv.org/abs/2001.07416 arXiv: 2001.07416.                                        241 – 259. https://doi.org/10.1016/S0893-6080(05)80023-1
 [4] Heeryon Cho and Sang Min Yoon. 2018. Divide and conquer-based 1D CNN                 [27] Lexiang Ye and Eamonn Keogh. 2009. Time series shapelets: A New Primitive
     human activity recognition using test data sharpening. Sensors 18, 4 (2018),              for Data Mining. Proceedings of the 15th ACM SIGKDD international conference
     1055.                                                                                     on Knowledge discovery and data mining - KDD ’09 (2009), 947–956.
 [5] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. 2013. A              [28] Mi Zhang and Alexander A. Sawchuk. 2012. Motion primitive-based human
     time series forest for classification and feature extraction. Information Sciences        activity recognition using a bag-of-features approach. In Proceedings of the 2nd
     239 (2013), 142–153.                                                                      ACM SIGHIT International Health Informatics Symposium (IHI ’12). Association
 [6] T. M. T. Do and D. Gatica-Perez. 2014. The Places of Our Lives: Visiting                  for Computing Machinery, New York, NY, USA, 631–640. https://doi.org/10.
     Patterns and Automatic Labeling from Longitudinal Smartphone Data. IEEE                   1145/2110363.2110433
     Transactions on Mobile Computing 13, 3 (March 2014), 638–648. https://doi.           [29] Xuchao Zhang, Yifeng Gao, Jessica Lin, and Chang-Tien Lu. 2020. TapNet:
     org/10.1109/TMC.2013.19                                                                   Multivariate Time Series Classification with Attentional Prototypical Network.
 [7] Hafsa El Hafyani. 2020. In 2020 21st IEEE International Conference on Mobile              In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 6845–
     Data Management (MDM). IEEE, 246–247.                                                     6852.
 [8] Hafsa El Hafyani, Karine Zeitouni, Yehia Taher, and Mohammad Abboud.                 [30] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. Under-
     2020. Leveraging Change Point Detection for Activity Transition Mining in                 standing mobility based on GPS data. In Proceedings of the 10th international
     the Context of Environmental Crowdsensing. The 9th SIGKDD International                   conference on Ubiquitous computing. Association for Computing Machinery,
     Workshop on Urban Computing (2020).                                                       New York, NY, USA, 312–321. https://doi.org/10.1145/1409635.1409677
 [9] Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane                     [31] Zhi-Hua Zhou. 2012. Ensemble Methods: Foundations and Algorithms. CRC
     Idoumghar, and Pierre-Alain Muller. 2019. Deep learning for time series                   press.
     classification: a review. Data Mining and Knowledge Discovery 33, 4 (2019),          [32] Jingwei Zuo, Karine Zeitouni, and Yehia Taher. 2019. Exploring Interpretable
     917–963.                                                                                  Features for Large Time Series with SE4TeC. In Proc. EDBT. 606–609.
[10] Hassan Ismail Fawaz, B. Lucas, G. Forestier, Charlotte Pelletier, D. Schmidt,        [33] Jingwei Zuo, Karine Zeitouni, and Yehia Taher. 2019. Incremental and Adap-
     Jonathan Weber, Geoffrey I. Webb, L. Idoumghar, Pierre-Alain Muller, and                  tive Feature Exploration over Time Series Stream. In 2019 IEEE International
     Franccois Petitjean. 2020. InceptionTime: Finding AlexNet for Time Series                 Conference on Big Data (Big Data). 593–602.
     Classification. ArXiv abs/1909.04939 (2020).
[11] Enrique Garcia-Ceja, Carlos E. Galván-Tejada, and Ramon Brena. 2018. Multi-
     view stacking for activity recognition with sound and accelerometer data.
     Information Fusion 40 (March 2018), 45–56. https://doi.org/10.1016/j.inffus.
     2017.06.004
[12] Bin Guo, Zhu Wang, Zhiwen Yu, Yu Wang, Neil Y Yen, Runhe Huang, and
     Xingshe Zhou. 2015. Mobile crowd sensing and computing: The review of an
     emerging human-powered sensing paradigm. ACM computing surveys (CSUR)
     48, 1 (2015), 1–31.
[13] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford.
     2019. Multivariate LSTM-FCNs for time series classification. Neural Networks