<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A tool for emergency detection with deep learning neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emanuele Cipolla</string-name>
          <email>emanuele.cipolla@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Rizzo</string-name>
          <email>riccardo.rizzo@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dario Stabile</string-name>
          <email>dario.stabile@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Vella</string-name>
          <email>filippo.vella@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for High Performance Computing and Networking - ICAR National Research Council of Italy - Palermo</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The ubiquitous presence of sensor networks, control units and detection devices allows for a significant availability of data. The increased computational power also encourages a wider development of deep neural networks, which represent data at multiple levels of abstraction. In this contribution we present a tool that processes the daily precipitation amounts in the Tuscany region and the emergency situations reported in web news, in order to detect emergency situations. The results are encouraging and show how machine learning can help to predict emergency situations and reduce the impact of critical events.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The possibility to collect and store large amounts of data captured by sensor
networks enables the search for connections between data and the effects of
natural events that cause great damage to people and property. Here we aim at
detecting emergency situations by processing data sampled by a set of
pluviometers through a Deep Convolutional Neural Network. A recent survey about Deep
Neural Networks has been published by Schmidhuber [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Many different
application domains are taken into account in that work, with the notable exception
of meteorological emergency alerts and risks. Kang et al. proposed an
emergency alert system based on a deep learning architecture that takes as
input images from closed-circuit cameras and detects events related to
fire or car accidents [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Basha et al. use data from a sensor network to predict
river floods through linear regression models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Emergency situations can
also be characterized as outliers in a network of sensors analyzed as a minimum
spanning tree [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>In this work we consider the data from a set of pluviometers and aim
to assess whether a given input pattern will lead to an emergency
or not. We use a new deep convolutional architecture and compare it with a
more traditional neural network, the multilayer perceptron (MLP), in order to
understand whether there are applications that require one or the other kind of
network in the field of time-series processing.</p>
      <p>We trained the networks using freely available measurements by the Servizio
Idrogeologico Regionale della Toscana (SIR, http://www.sir.toscana.it/), gathered by
sensor networks, together with emergency notifications of the kind commonly found
in online newspapers and weblogs. We tested the technique on a dataset of Tuscan
meteorological data ranging from 2012 to 2014, and compared these values with the
emergency detections in the same region over the same years, with promising results [5].</p>
      <p>The paper is organized as follows. The next section presents a description
of deep learning neural networks. Section 3 describes the approach used
for the classification of emergencies and the pre-processing operations for the
construction of the dataset; Section 4 presents the experimental results.
Finally, Section 5 discusses the future directions of this work.</p>
      <p>NEURAL NETWORKS FOR EMERGENCY CLASSIFICATION</p>
      <p>Deep-learning methods typically employ from 5 to 20 non-linear modules, each of
which extracts a set of features from the input and passes them to the next module [6].
The weights of the layers of features are learned directly from data, allowing
intricate structures in high-dimensional data to be discovered, regardless of their
domain (science, business, etc.). Very complex functions can be learned by combining
these modules: the resulting networks are often very sensitive to minute details and
insensitive to large irrelevant variations.</p>
      <sec id="sec-1-1">
        <title>MLP Multilayer Perceptron</title>
        <p>A multilayer perceptron (MLP) is a feedforward network that maps sets of
input data onto a set of appropriate outputs; it consists of at least three layers
of fully connected nodes in a directed graph: an input layer, a hidden layer
and an output layer. Except for the input nodes, each node is a neuron (or
processing element) with a nonlinear activation function - usually a sigmoid or
the hyperbolic tangent, chosen to model the bioelectrical behaviour of biological
neurons in a natural brain. Learning occurs through the backpropagation algorithm,
which modifies connection weights in order to minimize the difference between
the actual network output and the expected result.</p>
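        <p>As a minimal numpy sketch (not the authors' code) of the network just described, the following trains a one-hidden-layer sigmoid MLP by backpropagation on a toy problem (XOR); layer sizes, learning rate and iteration count are illustrative assumptions:</p>

```python
# Minimal MLP sketch: one hidden sigmoid layer, trained by
# backpropagation on XOR. All hyperparameters are assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

lr, losses = 0.5, []
for _ in range(5000):
    # forward pass through the three layers
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(((out - y) ** 2).mean()))
    # backpropagation: error gradients for each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # update connection weights to reduce the output error
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```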
      </sec>
      <sec id="sec-1-2">
        <title>CNN-Convolutional Neural Network</title>
        <p>Convolutional Neural Networks (CNN) are a variant of multilayer perceptrons
inspired by visual mechanisms found in living organisms, where arrangements
of dedicated cells of very different complexities are found: each one of them is
assigned to small, overlapping sub-regions of the visual field. This processing
behaviour can be reproduced by applying a convolutional filter over a given signal,
hence the name of the network configuration.</p>
        <p>With respect to an MLP, neurons in a CNN are arranged in three dimensions, and
only some of the layers they form are fully connected to each other; connectivity
patterns reflecting local spatial correlation are actively sought. Moreover, to
achieve translational invariance - very useful in image processing - each filter is
replicated across the entire visual field.</p>
        <p>In a CNN we can find four kinds of layers:
- Convolutional layers are driven by a set of learnable filters with a very
specific purpose, spanning the whole input matrix. The convolution of
each filter across the input gives an activation map as output, which is used
to determine which feature is detected by a given neuron;
- Pooling layers perform a non-linear downsampling to progressively reduce
the spatial size of the representation, in order to use fewer parameters and
computations and to prevent overfitting. It is common to use a pooling layer
between two convolutional layers; the dropout technique is often used as well;
- ReLU layers increase the non-linearity of the decision function by applying
the activation function f(x) = max(0, x). Other functions may be used;
- Fully connected layers are stacked after several nonlinear layers to
perform high-level reasoning.</p>
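        <p>An illustrative numpy sketch of the layer operations listed above - a valid 2-D convolution, the ReLU activation f(x) = max(0, x), and non-overlapping 2x2 max pooling; the input matrix and the kernel are toy assumptions, not the paper's data:</p>

```python
# Toy implementations of the three CNN layer operations.
import numpy as np

def conv2d(x, k):
    """Slide kernel k over x and sum the element-wise products."""
    h, w = k.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * k)
    return out

def relu(x):
    return np.maximum(0, x)  # f(x) = max(0, x)

def max_pool(x, s=2):
    """Non-linear downsampling: keep the max of each s x s block."""
    h, w = x.shape[0] // s, x.shape[1] // s
    return x[:h * s, :w * s].reshape(h, s, w, s).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input "image"
k = np.array([[-1.0, 1.0], [-1.0, 1.0]])       # horizontal-gradient filter
fmap = max_pool(relu(conv2d(x, k)))            # 5x5 activation map -> 2x2
print(fmap)
```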
      </sec>
    </sec>
    <sec id="sec-2">
      <title>EMERGENCY DETECTION THROUGH CLASSIFICATION</title>
      <p>A neural network-based approach is presented for the detection of emergency
situations through rainfall level measurements. The approach is based on the
training of a neural network with a set of rainfall level
measurements coming from a network of sensors, together with a series of emergency
notifications of the kind commonly found in online newspapers and weblogs. A
neural network uses a mathematical pattern recognition paradigm to learn complex
interactions between inputs and outputs. The purpose of the implemented neural
network is to detect potential risk situations, taking the rainfall levels as input.</p>
      <p>The methodology is applied to actual data obtained from a set of hundreds
of meteorological stations placed in Tuscany, made available by SIR-Toscana
for the 2012-2014 period. This sensor and surveillance network can provide both
real-time and historical samples from hydrometric, pluviometric, thermometric,
hygrometric, phreatimetric and mareographic sensors, allowing a general
characterization of hydroclimatic phenomena.</p>
      <p>In this work we focused on data relating to rainfall levels. We assumed
that traces of past emergency situations can be found on the World Wide Web
as online newspaper articles, forums or personal blogs. We collected two sets of
words that we used to compose web queries: set A is formed by
keywords about hydrogeological emergencies such as esondazione (overflow),
violento temporale (cloud burst), diluvio (deluge), allagamento (flooding),
inondazione (flood), rovinosa tempesta (severe storm), violento acquazzone (violent
downpour); set B is formed by the names of the cities in the Tuscany region,
such as Firenze, Pisa, Livorno, Grosseto, Lucca, Siena, Massa, Carrara, Pistoia.
We automatically queried the Bing™ search engine, through its Search API,
using the keywords in the Cartesian product of set A and
set B: C = A × B. We dated the resulting pages using a supervised approach,
after duplicate URLs had been removed. An approach based on regular
expressions applied to the text-only description extracted by the Bing bot was first
used, with mixed outcomes. Besides erroneous and mixed matches, there
was no real guarantee that the writers had not altered a date for their
own reasons, so a further control based on the Last-Modified HTTP header was
used. Finally, we used the subset of search results that employed so-called
pretty URLs, in particular those with day, month and year information separated
by forward slashes, as these require somewhat more expertise to alter after
publication.</p>
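      <p>The pretty-URL dating step described above can be sketched with a regular expression that pulls year, month and day separated by forward slashes out of a result URL; the example URLs below are invented purely for illustration:</p>

```python
# Sketch of dating a page from its "pretty URL". The URL layout
# (year/month/day in the path) is the one the text describes.
import re
from datetime import date

# year/month/day anywhere in the path, e.g. .../2013/11/22/...
PRETTY_URL_DATE = re.compile(r"/(20\d{2})/(\d{1,2})/(\d{1,2})/")

def date_from_url(url):
    m = PRETTY_URL_DATE.search(url)
    if not m:
        return None  # no pretty-URL date: fall back to other heuristics
    y, mth, d = (int(g) for g in m.groups())
    try:
        return date(y, mth, d)
    except ValueError:
        return None  # e.g. month 13: treat as a failed match

print(date_from_url("http://example.it/cronaca/2013/11/22/alluvione-firenze"))
# -> 2013-11-22
print(date_from_url("http://example.it/about"))  # -> None
```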
      <p>To define the pattern classification problem, we arranged a set of input
vectors as rows in a matrix. Each row contains a label for the input sequence
(typically its detection date), a k-day-long sequence of measurements for all the
stations in a given area, a label to indicate emergency/not emergency, and a label
to identify the quadrant in which the emergency occurred.</p>
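      <p>As a purely hypothetical illustration of one such row - station identifiers, the window length k, and all measurement values below are invented, not taken from the dataset:</p>

```python
# One row of the input matrix described above: a date label, a k-day
# window of measurements for every station, an emergency flag and a
# quadrant label. All identifiers and numbers are invented.
k = 3                          # days per window (assumption)
stations = ["TOS01", "TOS02"]  # station ids (invented)
window = {                     # mm of rain per station per day (invented)
    "TOS01": [0.0, 12.4, 30.1],
    "TOS02": [0.2, 15.0, 41.7],
}
row = (
    ["2013-11-22"]                              # detection-date label
    + [v for s in stations for v in window[s]]  # k values per station
    + [1]                                       # 1 = emergency
    + [3]                                       # quadrant of the emergency
)
print(row)
```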
      <p>In order to test the behaviour and efficacy of the Convolutional Neural
Network, 4 different experiments were performed:
- The first experiment, called "original" in the following section, uses raw data
without any preprocessing;
- The second experiment uses a "balanced" version of the original dataset,
having an equal number of emergencies and "quiet days". This dataset has
been formed by randomly discarding a set of negative days in order to have
comparable numbers of positive and negative examples;
- The third, "quantized" experiment, was performed on the result of applying
the Adaptive Extended Local Ternary Patterns (AELTP) quantization
algorithm to the original dataset. This processing tends to enhance the differences
among nearby values and highlight the derivatives [7];
- The fourth and final "balanced-quantized" experiment combines the
balancing technique of the second experiment and the quantization of the
third.</p>
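      <p>The "balanced" dataset construction of the second experiment can be sketched as random undersampling of the negative class; the data and the seed below are assumptions for illustration:</p>

```python
# Balance a dataset by randomly discarding negative (quiet-day)
# samples until their count matches the positives.
import random

def balance(samples, seed=0):
    """samples: list of (features, label) with label 1 = emergency."""
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    rng = random.Random(seed)
    kept = rng.sample(neg, k=len(pos))  # discard surplus negatives
    out = pos + kept
    rng.shuffle(out)
    return out

# 90 quiet days, 10 emergencies (invented counts)
data = [([i], 0) for i in range(90)] + [([i], 1) for i in range(10)]
balanced = balance(data)
print(len(balanced))  # 20: ten positives, ten negatives
```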
      <p>To implement the above architectures and perform the tests, we used Keras,
a high-level Python neural network library capable of running on top of two of
the most important numerical computation libraries used for deep learning:
TensorFlow and Theano. The use of higher-level libraries like Keras allows
developers and data scientists to rapidly produce and test prototypes, while delegating
most implementation details to the chosen lower-level library.</p>
      <p>Its modular structure makes it easy to build both convolutional networks and
recurrent networks, as well as combinations of the two. The
models built with Keras are understood as a sequence of modules - neural
layers, cost functions, optimizers, initialization schemes, activation functions and
regularization schemes - that can be plugged together.</p>
      <p>The convolutional neural network we adopted takes as input the collection
of pluviometric data for a single day, in the form of a matrix. Fig. 1 shows a
schematic representation of the network.</p>
      <p>The network architecture has two main convolutional stages followed by
subsampling and a fully connected stage. The first convolutional stage is performed
with thirty-two kernels of size 3x3, followed by rectified linear
units and a dropout with parameter equal to 0.25. The second convolutional stage
is performed with sixty-four kernels of size 3x3, followed by
rectified linear units. Before the fully connected step, a dropout with parameter
equal to 0.25 is performed. The last stage of the network, with fully connected units,
is formed by rectified linear units, followed by a dropout step with parameter
equal to 0.5 and a set of units with a softmax activation function.</p>
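      <p>A sketch of the architecture just described, written with the Keras Sequential API the paper uses; the input shape (here a hypothetical 24x24 single-channel matrix of daily pluviometric data), the placement of the pooling layer, and the number of fully connected units are assumptions not stated in the text:</p>

```python
# Sketch of the described CNN in Keras. Input shape, pooling
# placement and dense width are assumptions, not the paper's values.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(24, 24, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 kernels, 3x3 + ReLU
    layers.Dropout(0.25),                          # dropout 0.25
    layers.Conv2D(64, (3, 3), activation="relu"),  # 64 kernels, 3x3 + ReLU
    layers.MaxPooling2D((2, 2)),                   # subsampling stage
    layers.Flatten(),
    layers.Dropout(0.25),                          # before the dense step
    layers.Dense(128, activation="relu"),          # fully connected + ReLU
    layers.Dropout(0.5),                           # dropout 0.5
    layers.Dense(2, activation="softmax"),         # emergency / quiet day
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```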
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTAL RESULTS</title>
      <p>Experiments were conducted using different batch sizes in the training phase to
evaluate whether the training is stable across different training settings. The
experimental results are reported in tabular form in Tables 1 and 2, in
terms of Accuracy, Precision, Recall and F1 score. For these quantities the
mean and the variance over the varied batch sizes have been calculated. In Table 1,
the experiments with the original dataset and the dataset with quantized
values, obtained through the algorithm of [7], are compared. The results with the original
dataset are better than the results with the "quantized" dataset for both
networks. The CNN does not detect any emergency (positive) sample, showing that
this processing is detrimental to the learning performance of these models.
Accuracy is high for both models, including the experiment with the quantized
dataset. However, since the dataset is strongly asymmetrical, this measure
loses importance, as the overwhelming number of negative samples hides the
performance on the limited set of positive samples. It can be seen that the MLP
network has a better performance than the CNN, obtaining a maximum peak
F1 score of 31.8% with batch size equal to nine.</p>
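      <p>The four quantities reported in the tables can be computed from confusion-matrix counts as follows; the counts below are invented solely to exercise the formulas and are not the paper's results:</p>

```python
# Accuracy, Precision, Recall and F1 from confusion-matrix counts.
def scores(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# With very few positives, accuracy stays high even when recall is
# poor - which is why the text prefers F1 on this asymmetric dataset.
acc, prec, rec, f1 = scores(tp=5, fp=10, fn=15, tn=970)
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```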
      <p>Although the performance of the MLP decreases when using the quantized dataset
(third experiment), it is still higher than that of the CNN, which
produced no true positives or false positives at all, because the
network did not learn properly.</p>
      <p>Results of the other experiments are shown in Table 2. Using the balanced
dataset, in which the negative samples have been randomly reduced to match
the number of positive samples, the CNN always has better
performance than the MLP, obtaining a maximum peak with batch size
equal to twelve.</p>
      <p>Considering the F1 measure alone as a synthetic summary of the results, the value of
71.9% is the highest across all the experiments, and lets us conclude
that the CNN, although at an increased computational cost, gives the best results
for this problem. In general, F1 has been chosen since it takes into account both
Precision and Recall and allows a general comparison. A classification with
Support Vector Machines [8] has also been performed, and the results are shown in Table 3.
The F1 score increases when the balanced dataset is used. Moreover,
the performance with the balanced and quantized dataset is
comparable. In both cases the results with SVM (55.2% and 48.7%) are lower
than the CNN results on the balanced dataset, showing that
a linear separation is not the best solution for this problem.</p>
      <p>Figure 2 shows the plots of the F1 measure for all experiments with the
neural networks. The plots are a more detailed version of the values shown
in the above tables, and they show how the best results are obtained by the CNN
with a balanced dataset, in particular with a batch size equal to twelve; but
also for other values of the batch size the F1 is higher than 60%, and in every
case the performance of the CNN exceeds that of the MLP. For all the other
experiments the values obtained with the MLP are more stable, and the statistics
of the box plots show values that are near the average. The results of the
CNN are more variable and can also be worse than the results of the MLP. The
experimental setting with the CNN on a balanced dataset gives the best
results and should be chosen for this kind of problem.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSIONS</title>
      <p>Looking at the overall experimental results, we can say that, with this specific
dataset and the quantization method that we have chosen, both types of neural
network showed a significant loss of performance in the classification of
emergencies. The best results are obtained when a reduced number of negative samples
is used and the numbers of positive and negative samples are quite
similar. Since the dataset is unbalanced, we cut off a set of negative samples so
that the quantities of samples in the two sets are equal. In this case less is more,
since the best results we obtained came from limiting the overfitting on the
negative samples while properly learning the positive (emergency) cases. Moreover,
in this first approach we worked with data from a wide geographic area. Given
the promising results obtained, we would like to extend the tests using a larger
dataset, for example relating to hourly measurements of rainfall, and using
multiple parallel networks, to be able to classify the various stages that precede a
possible emergency and alert a specific geographic area.
</p>
      <p>5. E. Cipolla, U. Maniscalco, R. Rizzo, D. Stabile, and F. Vella, "Analysis and visualization of meteorological emergencies," Journal of Ambient Intelligence and Humanized Computing, pp. 1-12, 2016. [Online]. Available: http://dx.doi.org/10.1007/s12652-016-0351-x</p>
      <p>6. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.</p>
      <p>7. A. A. Mohamed and R. V. Yampolskiy, "Adaptive extended local ternary pattern (AELTP) for recognizing avatar faces," in Machine Learning and Applications (ICMLA), 2012 11th International Conference on, vol. 1. IEEE, 2012, pp. 57-62.</p>
      <p>8. C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.</p>
      <p>[Figure 2 panels: (c) MLP with Original Dataset; (d) CNN with Quantized Dataset; (e) MLP with Quantized Dataset; (f) CNN with Balanced Dataset; (g) MLP with Quantized and Balanced Dataset; (h) CNN with Quantized and Balanced Dataset]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          , "
          <article-title>Deep learning in neural networks: An overview,"</article-title>
          <source>Neural Networks</source>
          , vol.
          <volume>61</volume>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Kang</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Choo</surname>
          </string-name>
          , "
          <article-title>A deep-learning-based emergency alert system,"</article-title>
          <source>ICT Express</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Basha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ravela</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Rus</surname>
          </string-name>
          , "
          <article-title>Model-based monitoring for early warning flood detection,"</article-title>
          <source>in Proceedings of the 6th ACM conference on Embedded network sensor systems. ACM</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>308</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>E.</given-names>
            <surname>Cipolla</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Vella</surname>
          </string-name>
          , "
          <article-title>Identification of spatio-temporal outliers through minimum spanning tree," in Signal-Image Technology and Internet-Based Systems</article-title>
          (SITIS),
          <source>2014 Tenth International Conference on. IEEE</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>