Towards Continuous Monitoring of Environment under Uncertainty: A Fuzzy Granular Decision Tree Approach Preetham N. Reddy Sahith N. Dambekodi Tirtharaj Dash Department of EEE Department of EEE Data Science Research Group BITS Pilani, Goa Campus BITS Pilani, Goa Campus BITS Pilani, Goa Campus Goa 403726, India Goa 403726, India Goa 403726, India f2015174@goa.bits-pilani.ac.in f2015192@goa.bits-pilani.ac.in tirtharaj@goa.bits-pilani.ac.in ABSTRACT complexity of the environment or availability of the required equip- Smart monitoring of environment has been an essential area of ment for the process. In the past, this has led the theoretical and research where decision-making process is inevitable. Reliability computational computer scientists to develop mathematical models of the whole system depends on the stability and consistency of that could almost simulate the actual target environment. its decision-making unit. Real-time decision making is another Towards advancing the field forward, a research was carried out challenge in the field on which the research community has been very recently by Huerta et al.[2] that focused on online decorrela- focusing on improving the performance of the underlying models. tion of humidity and temperature in chemical sensors for continu- The underlying models are usually the learning models, that act ous monitoring. The work focused on automated processing of data as a smart engine after being sufficiently trained for the process. from simultaneous and continuous readings of the variations in In this paper, we propose to use a decision tree model that has humidity and temperature in the targeted environment. Their work the capability of handling uncertainty in the acquired data from included eight(8) metal-oxide sensors (MOX) sensors that were the environment. The resulting model is called as Fuzzy Granular continuously sensing the environment for 537 days with a sam- Decision Tree (FGDT). Series of evaluation of FGDT shows that the pling rate of 1 sample per second [2]. They estimated the effects of model is stable and powerful for the presently considered problem. variations in air humidity and temperature on the chemical sensors signals by using standard energy band model with an assumption CCS CONCEPTS that the variations in sensor conductivity can be expressed as a nonlinear function of changes in the energy bands in the presence •Computing methodologies →Machine learning approaches; of external humidity and temperature. Their study showed that Classification and regression trees; Uncertainty quantifica- the major factors that were affecting the environment were the tion; changes in humidity and the correlated changes in temperature and humidity. To visualize the process, they used a gas discrimination KEYWORDS system that could discriminate among banana, wine, and baseline Environment Monitoring; Decision Tree; Fuzzy Logic; Gas Sensors response. They had used a variant of Support Vector Machine (SVM) ACM Reference format: algorithm to build the discriminatory model for the process. Preetham N. Reddy, Sahith N. Dambekodi, and Tirtharaj Dash. 2017. To- In the process of continuous environment monitoring, there lies wards Continuous Monitoring of Environment under Uncertainty: A Fuzzy a certain degree of uncertainty in the acquired data. An uncertainty Granular Decision Tree Approach. In Proceedings of Workshop on Develop- in the acquired data could lead to the failure or adverse functioning ment aspects of Intelligent Adaptive Systems, Co-located with 10th ISEC-2017, of the system. The uncertainty could arise due to the following Jaipur, India, (DIAS 2017), 5 pages. situations such as failures of sensors, a sudden change in environ- DOI: ment due to additional and unknown factors. In such a case, the discriminatory model that was built may not be trustworthy and 1 INTRODUCTION hence the results should, in turn, be imprecise. This impreciseness or degree of uncertainty in the acquired data or desired outcome Smart monitoring of environment is increasingly becoming popu- is called ‘fuzziness’. An adaptive system that is meant to serve lar because of the obvious fact that very little human intervention for the purpose of continuous environment monitoring should not is required for such systems to perform. Moreover, it has been a only just adapt to the environmental changes with time but also very challenging and yet interesting area of research in the last should be robust with respect to the uncertainties as read through several years [1]. This field has motivated the research community its sensors. To solve the purpose of handling uncertainty or the to design automated and intelligent models (or systems) towards fuzziness present in the acquired data, and to make a robust decision continuous monitoring of an environment in industrial plants, med- based on the data, we propose a granular method – that specifically ical environment or biological processes. However, designing an captures the uncertain inputs – to be induced within an adaptive almost-accurate system has been a challenge given the background system. The research carried out by [2] was experimentally sound and our present work mainly serves the purpose of embedding the DIAS 2017, Jaipur, India Copyright ©2017 for the individual papers by the papers’ authors. Copying permitted uncertainty aspect of the work. In our present work, we redesign a for private and academic purposes. This volume is published and copyrighted by its discriminatory model that also incorporates possible uncertainty editors. . in the acquired data. It should be noted that our work is validated DIAS 2017, , Jaipur, India Preetham N. Reddy, Sahith N. Dambekodi, and Tirtharaj Dash with the experimental data from [2], further information of which has been presented in the following sections. The rest of the paper is organized as follows. The proposed and implemented model has been discussed in the section 2. The results that have been obtained for the present problem has been elaborated in section 3. The obtained results are discussed in the section 4 and the paper has been concluded in the section 5. 2 METHODOLOGY Our proposed work has following major phases: data acquisition phase, uncertainty handling phase, discrimination phase. 2.1 Data acquisition phase In the data acquisition phase, the sensors acquire the data from the environment for further processing in later phases. For more information, one could refer [2]. Figure 1: A sample Gaussian fuzzy membership transforma- tion function (X-axis: value of a sample sensor reading, Y– 2.2 Handling uncertainty in acquired data axis: degree of belongingness, µ) In this work, uncertainty in the acquired data has been taken care with the use of feature transformation using a special fuzzy mem- 2.3 Discrimination phase bership function [3, 4]. The membership function transforms a Discrimination is the process of assigning a class level to a set of single sensor reading (datum) to granules containing the informa- inputs i.e. a set of sensor readings at a particular time. Consider tion of different belongingness. For example, a temperature reading a set of sensor readings at any particular time be represented as of 38° can be called as a low temperature, medium temperature or {x 1 , x 2 , . . . , x n } whose class is unknown. It should be noted that high temperature. Hence, it can have some degree of belongingness there are three classes in the present work: banana, wine, and to each of low, medium and high temperature. In this work, we baseline response. After the fuzzy transformation, the vector can chose to use the Gaussian fuzzy membership function that is given be represented as in equation (1) where a is the maximum value of the membership hiдh hiдh function, σ is the standard deviation of the readings obtained by x = {x 1low , x 1medium , x 1 , x 2low , x 2medium , x 2 ,..., a sensor. The value of x̂ is different for each of the low, medium (2) hiдh and high belongingness. For transformation to low degree of be- x nlow , x nmedium , x n } longingness, x̂ = min{x}; for transformation to medium degree The discrimination phase works with a model that has been built of belongingness x̂ = mean{x}; and for transformation to higher with prior knowledge, called training data, to discriminate the degree of belongingness x̂ = max {x}. Please note that x is a vector sensor readings in the real time, called test data. In this work, that contains all the sensor readings for any particular feature. For we implement a decision tree (DT) classifier that works with the ith feature (alternatively, for ith sensor), x should be represented transformed fuzzy granular feature space that has been obtained as xi . from the second phase. Therefore, the whole model can be known as (x − x̂)2 Fuzzy Granular Decision Tree (FGDT). The principal reason behind   µ(x) = a exp − (1) 2σ 2 choosing DT over other machine learning models is that it is non- The full transformation can be visualized in Figure 1 that rep- parametric that can learn from the training data in a supervised resents the curve for low, medium and high membership function manner. More specifically, we implement CART that is very similar (from left to right). The belongingness can have a maximum value to C4.5 decision tree [6]. The FGDT predicts the class of a set of of 1 (a = 1) and a minimum value of 0. So, for a single datum, the sensor readings by learning simple decision rules inferred from the transformation function generates three different granules. At a features. For completeness and clarity of the readers, we present particular time t, if there are n sensors, there can be n sensor read- the working principle of the FGDT as follows. ings, called features. After the transformation, the feature vector Given a training dataset in fuzzy granular input feature space gets transformed to a higher dimension of 3n. where each of the features is represented as xi ∈ R 3n . Note that A major intuition behind such a high dimensional transformation each xi is a vector in 3n dimension represented as equation (2). Let is that a discriminatory model is expected to perform better with the class labels be represented as d. The FGDT recursively partitions high dimension than in low dimension. This is as per Cover’s the training pattern space such that the patterns with same class theorem on the separability of patterns [5]. Since this work is labels are grouped together. The following Algorithm 1 explains focused on discrimination of the sensor readings into different the generic steps of the FGDT that take the fuzzy granular sensor classes, the transformation should make the patterns separable in readings training instances (D et r ain ) and generates a decision tree higher dimensional space, if they are not easily separable in a lower from that (eT).1 dimension. In a further section, we shall see that this is indeed the 1 The e · symbol is used to represent uncertainty in input data and the tree is generated case for this specific problem. from this fuzzy granular data. Towards Continuous Monitoring of Environment under Uncertainty: A Fuzzy Granular Decision Tree Approach DIAS 2017, , Jaipur, India Data: A transformed training set, D et r ain node of the tree into two sub-trees (binary split). It is also possible Result: A decision tree, T e to split a node into multiple sub-trees (multiple splits) based on if All the instances belong to same class then more than one thresholds at the node [6]. Return a e T with a leaf node labelled with the class; Effect on computational complexity? – It is quite obvious that the end size of the feature set is increased by the transformation into the et r ain = {ϕ} then fuzzy feature space under the three membership functions such if D as low, medium and high. If we consider that there are ninput Return e T = ϕ with warning; number of features in the original data, after the transformation end into the fuzzy pattern space, the size becomes 3 ×ninput . If the time if If there is no feature in D et r ain then complexity for the classical decision tree learning is T (ninput ), the Returns a leaf labeled with the most frequent class or the FGDT has a time complexity of T (3 × ninput ). However, it should disjunction of all the classes) be noted that a fuzzy granular decision tree learner would have to end learn once with the available training data; and the learned model while Until stopping condition is not met do is just deployed for its testing. So, this cost over time would occur Find the feature j ∈ D et r ain with highest informational just once. Moreover, with the availability of high-performance gain or lowest impurity; computing architectures, this complexity could be lowered and Based on feature j, Split the present node of e T to form scalability of the proposed model could be improved. Moreover, sub-trees eTl e f t , e Tr iдht ; although a decision tree learner learns from the data by partitioning Repeat for e Tl e f t , e Tr iдht ; the space into multiple subspaces with conditions, it is already end shown that the decision tree scales well with higher dimensional Algorithm 1: FGDT algorithm (The first three if statements are data [7]. called base cases. An impurity function can also be used for the information gain: lower the impurity, higher the information gain and vice versa.) 3 RESULTS An adaptive system should not only just adapt to the environmental changes but also should be robust with respect to the uncertainties The stopping conditions while generating the e T from Det r ain as reading through the sensors. This could be done via robust mod- could be one of the following. A fully grown most generalized eling of the environment or via robust decision making withing tree has been obtained, the training pattern space has been well the system. However, our present work demonstrates the former partitioned into multiple sub-spaces based on classes, the generated category where the implemented model adapts to the changes in tree returns lowest error on validation data2 . For measuring the the environment by also incorporating the possible uncertainties by impurity, an ‘entropy’ estimate is used which is given in equation transforming the problem into an imprecise problem. To support (3). our claim about the improvement, we used the experimental data m Õ available from the work of Huerta et al. [2] to study the perfor- H (X, e T) = − P(Xi ) log2 (P(Xi )) (3) mance of our proposed fuzzy granular model, FGDT. To elaborate, i=1 to evaluate the impact on discrimination performance of our pro- Here, P(·) is the probability estimate. Based on the estimation in posed FGDT due to decorrelation of signals from temperature and equation (3), the entropy before split and entropy after splitting a humidity sensors, four different feature sets were used. The first node is computed for each feature. The feature which would provide set of features is a set of raw sensor time series (RS); the second lowest entropy after split is considered to be split up based on some feature set is a set of raw sensor data with humidity and tempera- condition, usually a threshold. Entropy after split is computed from ture (RS, T, H); the third set of features is a set of filtered data (FS) the entropy estimate of each of the new sub-trees as by decorrelating sensors; the fourth set of features is a set of raw k Õ sensor data with filtered sensor data (RS, FS). We used these same Haf t er Spl it (X, e T) = − P(X) log2 (Hi (Xi ), e Ti ), (4) feature set for a proper evaluation of our proposed model. However, i=1 additionally, we also use a new dataset that also contains ‘t0’ and where, k is number of splits at a node of the tree, Hi (Xi ), e Ti ) is the ‘dt’ along with filtered sensor data (FS) for evaluation. entropy measure for the sub-tree that would be produced after split. To properly estimate the generalization ability of the proposed By using these entropy estimates that is entropy before split and FGDT model, we used the standard procedures in machine learning entropy after split, the information gain which is denoted as G can to evaluate the performance of FGDT when discriminating sam- be computed as ples not used for training the classifier so as to remove evaluation G = Hbe f or eSpl it (X, e T) − Haf t er Spl it (X, e T) (5) bias during the testing phase. There are 919438 number of sensor- reading instances in the dataset. The available data was randomly The feature that would produce the highest information gain should partitioned into 80% and 20%. The partition with 20% of data was be used to split at a particular node in the tree e T. It should be noted used for the independent test. The 80% partition was used for 5-fold that the discussion and algorithm implemented in this work splits a cross validation (5CV). For a fair comparison, we used accuracy as 2 A set of instances that were not used during the training process but are used to the performance evaluation measure for this work as the same was check the performance of the generated model after training. These are not test data. also used in [2], although accuracy should not be always considered DIAS 2017, , Jaipur, India Preetham N. Reddy, Sahith N. Dambekodi, and Tirtharaj Dash reliable [8]. However, as the number of instances in the dataset is results during independent real-time tests. Moreover, the number very large, at the order of 9 × 105 , it would not have a large effect on of instances in a training set and the number of features do also play the accuracy measure. We considered the fold-wise performance in crucial roles in the process. With a higher number of instances with 5CV along with independent test performance for the evaluation a small set of features might not generate an adequate function that purpose. The results have been noted and depicted in Table 1. To would generate output from the inputs. After transformation to have a better evaluation of the model, the 20% partition was not fuzzy space and hence increasing the dimension of the input space, made fixed from the beginning, rather, with each fold evaluation, such a function would be possible for the purpose. This could lead to the test partition has been chosen. the proper generalization of the FGDT and hence the independent A comparison of the result obtained by our proposed FGDT test performance. One should note that the present work uses the with the results obtained by Huerta et al.[2], as shown in Table 2, CART algorithm available in Scikit-learn [9]. However, one could suggests that the proposed FGDT model is superior to the ISVM try many other machine learning approaches for the same task model for this problem for most of the datasets with selected fea- after incorporating our proposed fuzzy granular transformation tures. However, for the dataset with only FS as feature set, the approach as a step in feature engineering before training of the performance of FGDT is far lower than the ISVM model. This is learning model. probably because of the fact that the FS dataset is quite difficult to be separable which could be revealed from a sample plot between 5 CONCLUSIONS F1 vs F2 of the FS feature set (see Figure 2). The uncertainty arising in the acquired data from the environment in the process of continuous home monitoring has been handled by developing a new fuzzy granular approach that has been combined with a decision tree for decision making. The proposed model FGDT has been evaluated with regard to experimentally validated data from continuous monitoring environment. The performance of our FGDT model is found to be superior to a recently published work on the same problem. Moreover, the statistical results also show that FGDT has not only better discrimination capability but also it is quite stable and consistent. Although the present work discusses the applicability of fuzzy modeling towards the uncertainty handling in the aspect of envi- ronmental monitoring systems; there are numerous possible appli- cations of such techniques in real-world problems such as medical diagnosis [10], robotic navigation control mechanisms [11, 12], handling uncertainty in software testing [13]. All these mentioned real-world problems may not always be subject to precise computa- Figure 2: Scatter plot between F1 and F2 tion, and hence uncertainty handling methods such as fuzzy gran- ular decision tree (FGDT) approach would aid to achieve greater performance of the intended adaptive systems. 4 DISCUSSION The FGDT algorithm does handle uncertainty in the acquired data REFERENCES [1] Joey F Boatman and Bryan S Reichel. Environment monitoring system, April 6 by transforming it to a feature space that is higher in dimension 1999. US Patent 5,892,690. than that of the original data. This could led the decision tree [2] Ramon Huerta, Thiago Mosqueiro, Jordi Fonollosa, Nikolai F Rulkov, and Irene classifier capture the underlying relationship between the inputs Rodriguez-Lujan. Online decorrelation of humidity and temperature in chemical sensors for continuous monitoring. Chemometrics and Intelligent Laboratory and outputs in a better way. This is evident from the fact that Systems, 157:169–176, 2016. the higher dimension of the transformed data makes the visibility [3] Sushmita Mitra, Rajat K De, and Sankar K Pal. Knowledge-based fuzzy mlp for classification and rule generation. IEEE Transactions on Neural Networks, sparse and hence drawing a clear boundary between two different 8(6):1338–1350, 1997. classes becomes easy [5], which could not have been possible by [4] Tirtharaj Dash, Sanjib Kumar Nayak, and HS Behera. Hybrid gravitational using SVM classifier as used in [2]. More specifically, when a lower search and particle swarm based fuzzy mlp for medical data classification. In Computational Intelligence in Data Mining-Volume 1, pages 35–43. Springer, 2015. dimensional input space is projected onto a higher dimensional [5] Thomas M Cover. Geometrical and statistical properties of systems of linear in- space, the sparsity in the projected space increases and thereby equalities with applications in pattern recognition. IEEE transactions on electronic increasing the chance of learning a possibly-optimal hyperplane computers, (3):326–334, 1965. [6] Leo Breiman, Jerome Friedman, Charles J Stone, and Richard A Olshen. Classifi- that would be serving as the boundary among various resulting cation and regression trees. CRC press, 1984. groups or classes of patterns. Technically, the distance between [7] Guosheng Lin, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, and David Suter. Fast supervised hashing with decision trees for high-dimensional data. In any pattern in the project space and the learned hyperplane does Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, improve which makes a strong impact on the training of the decision pages 1963–1970, 2014. tree learner. This interpretation could probably be true because of [8] Siddharth Dinesh and Tirtharaj Dash. Reliable evaluation of neural network for multiclass classification of real-world data. arXiv preprint arXiv:1612.00671, 2016. the fact that a more generalized model that is built after training and [9] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, validation would possess a higher capability of generating accurate Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Towards Continuous Monitoring of Environment under Uncertainty: A Fuzzy Granular Decision Tree Approach DIAS 2017, , Jaipur, India Table 1: Performance (accuracy) of FGDT with different feature set for 5CV and independent test Features→ RS RS, T, H FS FS, RS FS, t0, dt Fold CV Test CV Test CV Test CV Test CV Test 1 96.99 97.01 97.03 97.03 37.30 37.32 96.96 96.94 99.10 99.09 2 96.95 96.99 96.96 97.03 37.20 37.32 96.95 96.97 99.06 99.09 3 96.91 97.01 96.96 97.04 37.27 37.31 96.99 96.96 99.06 99.09 4 96.97 97.00 96.99 97.04 37.36 37.32 97.01 96.98 99.11 99.09 5 96.93 97.01 96.96 97.04 37.38 37.32 96.90 96.98 99.11 99.09 average 95.95 97.01 96.98 97.04 37.30 37.33 96.96 96.97 99.10 99.09 ±std.dev. ±0.03 ±0.01 ±0.03 ±0.01 ±0.01 ±0.01 ±0.04 ±0.01 ±0.02 ±0.01 Table 2: Comparison of FGDT with results obtained by Huerta et al. [2] ISVM [2] FGDT Feature Set CV Test CV Test RS 78.5 76.5 95.95 ± 0.03 97.01 ± 0.01 RS, T, H 73.3 71.1 96.98 ± 0.03 96.98 ± 0.03 FS 72.4 71.2 37.30 ± 0.01 37.33 ± 0.01 RS, FS 82.6 80.9 96.96 ± 0.04 96.97 ± 0.01 FS, t0, dt – – 96.96 ± 0.04 96.97 ± 0.01 Vincent Dubourg, et al. Scikit-learn: Machine learning in python. Journal of network. In Proceedings of the 2nd international conference on perception and Machine Learning Research, 12(Oct):2825–2830, 2011. machine intelligence, pages 196–200. ACM, 2015. [10] Tanistha Nayak, Tirtharaj Dash, D Chandrasekhar Rao, and Prabhat K Sahu. [12] Tirtharaj Dash. Automatic navigation of wall following mobile robot using Evolutionary neural networks versus adaptive resonance theory net for breast adaptive resonance theory of type-1. Biologically Inspired Cognitive Architectures, cancer diagnosis. In Proceedings of the International Conference on Informatics 12:1–8, 2015. and Analytics, page 97. ACM, 2016. [13] Salman Abdul Moiz. Uncertainty in software testing. In Trends in Software [11] Tirtharaj Dash, Tanistha Nayak, and Rakesh Ranjan Swain. Controlling wall Testing, pages 67–87. Springer, 2017. following robot navigation based on gravitational search and feed forward neural