=Paper=
{{Paper
|id=Vol-2769/63
|storemode=property
|title=Monitoring Social Media to Identify Environmental Crimes through NLP. A preliminary study
|pdfUrl=https://ceur-ws.org/Vol-2769/paper_63.pdf
|volume=Vol-2769
|authors=Raffaele Manna,Antonio Pascucci,Wanda Punzi Zarino,Vincenzo Simoniello,Johanna Monti
|dblpUrl=https://dblp.org/rec/conf/clic-it/MannaPZSM20
}}
==Monitoring Social Media to Identify Environmental Crimes through NLP. A preliminary study==
Monitoring Social Media to Identify Environmental Crimes through NLP
A Preliminary Study
Raffaele Manna, Antonio Pascucci, Wanda Punzi Zarino, Vincenzo Simoniello, Johanna Monti
UNIOR NLP Research Group
University “L’Orientale”
Naples, Italy
[rmanna, apascucci, jmonti]@unior.it
[w.zarino, vincenzosimoniello]@gmail.com
Abstract

This paper presents the results of research carried out on the UNIOR Eye corpus, a corpus built by downloading tweets related to environmental crimes. The corpus is made up of 228,412 tweets organized into four different subsections, each one concerning a specific environmental crime. For the current study we focused on the subsection of waste crimes, composed of 86,206 tweets which were tagged according to the two labels alert and no alert. The aim is to build a model able to detect which class a tweet belongs to.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

In the current era, social media represent the most common means of communication, especially thanks to the speed with which a post can go viral and reach every corner of the globe in no time. The speed with which information is produced creates an abundance of (linguistic) data, which can be monitored and handled with the use of hashtags (#). Hashtags are user-generated labels which allow other users to track posts on a specific theme on Twitter. Moreover, social media such as Twitter can be powerful tools for identifying a variety of information sources related to people's actions, decisions and opinions before, during and after broad-scope events, such as environmental disasters like earthquakes, typhoons, volcanic eruptions, floods, droughts, forest fires and landslides (Imran et al., 2015; Maldonado et al., 2016; Corvey et al., 2010). In light of the above, our aim is to monitor social media in order to detect environmental crimes.

Our research is guided by the following question: can Natural Language Processing (NLP) represent a valuable ally to identify these kinds of crimes through the monitoring of social media? For this purpose, we compiled a corpus of tweets starting from a list of 41 terms related to environmental crimes, e.g. combustione illecita (illicit combustion), rifiuti radioattivi (radioactive waste), discarica abusiva (illegal dumping), and we used the Twitter API to download all the tweets (specifically 228,412) related to these terms introduced by hashtag. In this research, a special focus is dedicated to the tweets related to La terra dei fuochi (literally, the Land of Fires) (Peluso, 2015), a large area located between Naples and Caserta (in the South of Italy) that has been the victim of illegal toxic waste dumped by organized crime for about fifty years and routinely burned to make space for new toxic waste.

In order to achieve our purpose, we trained different machine learning algorithms to classify emergency texts and user-generated reports. The paper is organized as follows: in Section 2 we discuss Related Work and in Section 3 we present the UNIOR Earth your Estate (UNIOR Eye) corpus. The case study is described in Section 4 and Results are discussed in Section 5. Conclusions are in Section 6, along with directions for Future Work.

2 Related Work

As previously mentioned, hashtags are one of the most important resources - if not the most important - in text data such as those of Twitter. The possibility to aggregate data according to their content allows users to monitor all the discussion about a specific subject in real time (an emblematic case is the hashtag #Covid19).
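As an illustration of this kind of hashtag-based aggregation, the snippet below counts hashtag frequencies across a batch of tweet texts (a minimal sketch; the regular expression and the sample tweets are our own illustration, not part of the original study):

```python
import re
from collections import Counter

def hashtag_frequencies(tweets):
    """Count how often each hashtag occurs across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Hashtags are user-generated labels introduced by '#'
        counts.update(tag.lower() for tag in re.findall(r"#\w+", text))
    return counts

tweets = [
    "Roghi tossici ad Acerra #terradeifuochi #rifiuti",
    "Discarica abusiva segnalata #rifiuti",
]
print(hashtag_frequencies(tweets).most_common(2))
# → [('#rifiuti', 2), ('#terradeifuochi', 1)]
```

Sorting such counts is how the most productive hashtags of a collection, like those discussed below, can be identified.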
Concerning the topic of our research, namely environmental issues, the most representative and productive hashtags have proved to be #terradeifuochi and #rifiuti (with frequencies of 92,322 and 62,750 occurrences respectively), which directly refer to circumstances that have a strong impact on the environment and on people's health. The use of hashtags has also proved useful in monitoring natural disasters, such as earthquakes, floods and hurricanes.

For a survey on information processing and management of social media contents to study natural disasters, see (Imran et al., 2016). (Neubig et al., 2011) focused on the 2011 East Japan earthquake. The scholars built a system able to extract the status of people involved in the disaster (e.g. whether they declared to be alive, their requests for help, their information requests, information about missing people). About one hundred scholars participated spontaneously in the project ANPI NLP (ANPI means Safety in Japanese) and the results show convincing performances by the classifier they built. (Maldonado et al., 2016) investigated natural disasters in Ecuador, monitoring Twitter to filter contents according to four different categories: volcanic, telluric, fires and climatological. The filtering process is based on keywords related to the four categories. The scholars released a web application that graphically shows the evolution of the database. The efficiency of the tweet filtering algorithm that they developed is expressed in terms of precision (93.55%). (Tarasconi et al., 2017) investigated tweets related to eight different event types (floods, wildfires, storms, extreme weather conditions, earthquakes, landslides, drought and snow) in Italian, English and Spanish. The corpus is composed of 9,695 tweets and can be extremely useful for performing information extraction in the aforementioned three languages. (Sit et al., 2019) used Hurricane Irma, which devastated the Caribbean Islands and Florida in September 2017, as a case study: the scholars demonstrate that by monitoring tweets it is possible to detect potential areas with a high density of affected individuals and infrastructure damage throughout the temporal progression of the disaster. By focusing on tweets generated before, during and after Hurricane Sandy, a superstorm which severely impacted New York in 2012, (Stowe et al., 2016) proposed an annotation schema to identify relevant Twitter data (within a corpus of 22.2M unique tweets from 8M unique Twitter users), categorizing these tweets into fine-grained categories, such as preparation and evacuation. (Imran et al., 2016) presented Twitter corpora composed of over 52 million crisis-related tweets, collected during 19 different crises that took place from 2013 to 2015. These corpora were manually annotated by volunteers and crowd-sourced workers providing two types of annotations, the first one related to a set of categories, the second one concerning out-of-vocabulary words (e.g. slang, place names, abbreviations, misspellings). The scholars then built machine-learning classifiers in order to demonstrate the effectiveness of the annotated datasets, also publishing word2vec word embeddings trained on more than 52 million messages. The preliminary results of this study posit that a high-precision classification of tweets relevant to a disaster is possible to assist crisis managers and first responders. Our study is not devoted to monitoring natural disasters but to monitoring human-caused disasters. More specifically, the aim is to exploit NLP techniques to contribute to the identification of intentional environmental crimes through social media analysis. To the best of our knowledge, this perspective of investigation is rather novel in the field.

3 The UNIOR Eye Corpus

This section outlines the way the UNIOR Eye corpus was created and how it is internally structured. The research has been carried out in the framework of the C4E - Crowd for the Environment (Progetto PON Ricerca e Innovazione 2014-2020) project (http://www.unior.it/ateneo/20574/1/c4e-crowd-for-the-environment-progetto-pon-ricerca-e-innovazione-2014-2020.html).

The UNIOR Eye corpus is made up of 228,412 tweets related to environmental crimes downloaded through the Twitter API, covering the period from 01 January 2013 to 06 August 2020. The compilation of the corpus was divided into two steps: the creation of a vocabulary containing keywords related to environmental crimes and the creation of the corpus itself. During this phase, the data was structured and organized according to the different keywords, obtained from glossaries and documents specific to the topic. Precisely, the following resources

• Glossario di termini sull'ambiente (FIMP, 2017) (a guide from A to Z concerning the complex issue of environmental pollution);
• Glossario dinamico per l'Ambiente ed il Paesaggio (ISPRA, 2012) (a glossary supplied by the Italian Institute for Environmental Protection and Research);

• Glossario ambientale (http://www.arpat.toscana.it/glossario-ambientale) (a glossary supplied by the national agency for the environmental protection of Tuscany);

• BeSafeNet (http://www.besafenet.net/it-it/glossary) (a glossary based on the Glossary on Emergency Management, developed in 2001 by the European Centre of Technological Safety (TESEC) of the Euro-Mediterranean network of Centres of the EUR-OPA Major Hazards Agreement of the Council of Europe, in collaboration with other centres of the network);

• HERAmbiente (http://ha.gruppohera.it/glossario_ambiente/) (a glossary provided by Herambiente, the largest company in the waste management sector);

• Enciclopediambiente (http://www.enciclopediambiente.com) (the first freely available online Encyclopedia on the Environment, designed by a group of four engineers with the aim of spreading "environmental knowledge")

and the following two web sources

• a dossier containing important provisions aimed at dealing with environmental and industrial emergencies and encouraging the development of the affected areas (https://www.senato.it/japp/bgt/showdoc/17/DOSSIER/0/740667/index.html?part=dossier_dossier1-sezione_sezione12-h2_h28);

• a document on environmental crimes and environmental protection (https://scuola21.fermi.mn.it/documenti/reati_ambientali.pdf)

were consulted. All of these language resources contain information and definitions of the basic terms related to environmental disasters and crimes, e.g. Rifiuti pericolosi (hazardous waste): waste products which can generate potential/substantial risk to human health or the environment if handled improperly. Hazardous waste has at least one of the following characteristics: flammability, corrosivity or toxicity (https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:C:2018:124:FULL&from=IT), and is included in special lists. Here are some examples:

• HASHTAG HASHTAG Fiumicino: eternit e rifiuti pericolosi al Passo della Sentinella URL HASHTAG (HASHTAG HASHTAG Fiumicino: eternit and hazardous waste at Passo della Sentinella URL HASHTAG);

• Cani in gabbia in discarica abusiva: Due animali tra rifiuti pericolosi, amianto e bombole gas URL (Caged dogs in an illegal dump: two pets among hazardous waste, asbestos and gas cylinders URL)

After this phase it was possible to create the corpus by downloading from Twitter all the tweets containing these keywords preceded by the hashtag. These hashtags helped us to gather the information needed to detect crimes against the environment. More specifically, the corpus is internally divided into semantic areas, each one concerning a specific environmental crime: rifiuti e terra dei fuochi (waste and Terra dei fuochi); reati contro le acque (water-related crimes); materiali e sostanze pericolose (hazardous substances and materials); incendi e roghi ambientali (environmental fires). These sets are further divided into more specific subsets, e.g. the folder reati contro le acque (water-related crimes) contains the subsets acque di scarico, acque reflue, fiumi inquinati, liquami (sewage, wastewater, polluted rivers, slurry). The resulting corpus contains, therefore, a total of 228,412 tweets, 22,780,746 tokens and 569,905 types, with a type/token ratio (TTR) of 0.025.

4 Case Study

This section describes the steps taken to perform the preliminary experiments on a selected part of the UNIOR Eye corpus. First, the dataset on which the experiments and data preparation operations were carried out is presented, then the pre-processing steps are listed and, finally, the different machine learning approaches used are described.

4.1 Dataset

As described in Section 3, the UNIOR Eye corpus is divided into four semantic areas related to the most common crimes against the environment. Among the four semantic areas, we decided to use the waste crimes subsection to test a specific use case: whether an NLP system can understand and report emergency texts and user-generated reports. Therefore, for the experiments described in this paper, we focus our investigation on a sub-section of the UNIOR Eye corpus, namely tweets about waste-related crimes and tweets with the hashtag #terradeifuochi contained in the corresponding semantic area: waste and Terra dei fuochi. This sub-section of the corpus contains 86,206 tweets. As a first step, hashtags, mentions and URLs were replaced with placeholder words in all of these tweets.
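This placeholder substitution can be sketched as follows (a minimal sketch with our own regular expressions; the actual patterns used in the study are not specified):

```python
import re

def normalize(tweet):
    """Replace URLs, mentions and hashtags with placeholder words."""
    tweet = re.sub(r"https?://\S+", "URL", tweet)  # URLs first, so '#' inside them is not touched
    tweet = re.sub(r"@\w+", "MENTION", tweet)      # user mentions
    tweet = re.sub(r"#\w+", "HASHTAG", tweet)      # hashtags
    return tweet

print(normalize("@minambiente roghi tossici #terradeifuochi https://example.com/foto"))
# → MENTION roghi tossici HASHTAG URL
```

Applying the URL pattern before the hashtag pattern avoids mangling fragment identifiers inside links.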
Then the tweets were annotated by the paper authors on the basis of two labels: i) alert and ii) no alert, i.e. whether or not the tweet contains a message aimed at reporting and locating a waste-related crime. Below, we provide a sample of tweets annotated with our two labels, alert - no alert:

• Ore 11:40 autostrada A1 altezza Afragola Acerra direzione Roma. Roghi Tossici indisturbati, la HASHTAG... URL HASHTAG HASHTAG (11:40 am, A1 motorway near Afragola Acerra towards Rome. Undisturbed toxic fires, the HASHTAG... URL HASHTAG HASHTAG) — ALERT

• MENTION ministro, piuttosto che pensare alla HASHTAG pensi ai continui roghi MENTION (MENTION Minister, rather than thinking about the HASHTAG, think about the continuous fires MENTION) — NO ALERT

During the annotation phase, we noted that the no alert class is the one which contains the majority of tweets and includes examples of hate speech, satirical texts, news about emergency actions as well as politically oriented texts. Consequently, the dataset built in this way is unbalanced across the two classes, counting 81,235 tweets for the no alert class and 4,970 alert tweets. In order to visualize alert tweets, we exploit Carto (carto.com), a cloud computing platform that provides a geographic information system, web mapping and spatial data science tools (a map showing toxic fires alert tweets in the UNIOR Eye corpus is available at https://uniornlp.carto.com/builder/04f2cca9-08cd-4b9f-90cd-79fc0d93af42/embed).

4.2 Inter-annotator Agreement

When different annotators label a corpus, it is important to calculate the inter-annotator agreement (IAA) with a twofold objective: i) to make sure that annotators agree and ii) to test the clarity of the guidelines. As previously mentioned, the dataset (composed of 86,206 tweets) was annotated by four of the paper authors on the basis of two labels: i) alert and ii) no alert. This implies that each author annotated about 21,000 tweets. Then, to calculate inter-annotator agreement, we randomly selected 10% of the tweets (i.e. 8,620), which were tagged by all annotators.

The agreement among the four annotators is measured using Krippendorff's α coefficient, while the agreement between pairs of annotators is estimated with Cohen's κ coefficient (Artstein and Poesio, 2008). Taking into account the recommendations set out in (Artstein and Poesio, 2008; Krippendorff, 2004), we interpret the κ values obtained for each pair of annotators according to the strength-of-agreement criteria described in (Landis and Koch, 1977), whereas for the agreement among all four annotators we follow the standard suggested in (Krippendorff, 2004). The calculated value of Krippendorff's α is 0.706. Considering the standard value in (Krippendorff, 2004), our value of α = 0.706 can be considered acceptable and expresses good data reliability. In Table 1 we show the results for pairs of annotators.

Pair of annotators | Value of κ
a1 - a2 | 0.691
a1 - a3 | 0.742
a1 - a4 | 0.841
a2 - a3 | 0.676
a2 - a4 | 0.644
a3 - a4 | 0.641

Table 1: Cohen's κ values for pairs of annotators.

According to (Landis and Koch, 1977), five out of six Cohen's κ values show a "substantial" strength of agreement, while one pair (a1-a4) shows a κ value considered "almost perfect" in the cited research.

4.3 Preprocessing

Before feeding the machine learning algorithms, some pre-processing steps are performed. Since the majority of mentions and hashtags are shared by both alert and no alert samples, we focus on the tweet itself by removing any reference to people, entities and organizations conveyed through hashtags and mentions. Therefore, the placeholder words related to hashtags, URLs and mentions are removed. Then, punctuation is removed from the tweets, along with a custom list of function words such as determiners, prepositions and conjunctions. Finally, the tweets are lower-cased and tokenization is performed.
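These steps can be sketched as follows (a minimal sketch; the placeholder names and the tiny function-word list are illustrative, not the actual resources used in the study):

```python
import string

PLACEHOLDERS = {"HASHTAG", "MENTION", "URL"}
FUNCTION_WORDS = {"il", "la", "di", "a", "e", "che", "per", "in"}  # illustrative subset

def preprocess(tweet):
    """Remove placeholder words and punctuation, drop function words,
    then lower-case and tokenize a tweet."""
    tokens = [t for t in tweet.split() if t not in PLACEHOLDERS]
    text = " ".join(tokens).translate(str.maketrans("", "", string.punctuation))
    return [t for t in text.lower().split() if t not in FUNCTION_WORDS]

print(preprocess("MENTION Roghi tossici, la HASHTAG brucia URL"))
# → ['roghi', 'tossici', 'brucia']
```

The resulting token lists are what the feature extractors described below operate on.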
4.4 Machine Learning Approaches

We set the problem of detecting tweets related to waste crimes as a supervised binary classification problem between different textual contents. To tackle the problem as a first task within the C4E Project, we selected a machine learning approach using Support Vector Machines (SVM) with a linear kernel and C=1 and Multinomial Naive Bayes (MNB) as classification algorithms (Imran et al., 2015). Since the task concerns the classification of tweets belonging to the alert class, to deal with the unbalanced dataset we use an undersampling technique, automatically reducing the number of samples of the majority class (no alert) (Li et al., 2009) until they were balanced with the samples of the alert class. We used the tf-idf technique to extract the features used by both algorithms. To build the algorithms and extract the features, we used the Python scikit-learn library.

In addition to the MNB and SVM with the tf-idf technique, we built two models with sentence embeddings as features and an SVM with a tuned C parameter as the classification algorithm. In the first model (FT-SVM), we used the Italian pre-trained word vectors from fastText (https://fasttext.cc/docs/en/pretrained-vectors.html) (Bojanowski et al., 2017) to build our sentence embeddings by averaging the word embeddings of all tokens in each tweet; C=10 was then found to be the best C parameter value using a GridSearchCV (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) instance. In the second model (mDB-SVM), we generated sentence embeddings using the pre-trained multilingual DistilBERT (Sanh et al., 2019) model from Transformers (https://huggingface.co/transformers/pretrained_models.html). To accomplish this, each tweet is represented as a list of tokens and each list is padded to the same size (max len = 94); the attention mask is used. Before fitting the sentence embeddings thus constructed in the SVM classifier, the best value of the C parameter is searched for and set to C=0.1. For both models (FT-SVM and mDB-SVM), the pre-processing steps described above are performed.

5 Results

In this section, we show the results obtained by our models in terms of Precision, Recall, F-Measure and Accuracy. For all models, the results are obtained on 30% of the dataset set aside as a test set, keeping the samples balanced between the two classes. Furthermore, our models were evaluated using 10-fold cross-validation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html).

As a baseline for comparison, we used a Dummy classifier, which achieves an accuracy of 0.501. On the test set, the SVM classifier achieves an accuracy of 0.870, while the MNB classifier reaches 0.839. Regarding the evaluation by 10-fold cross-validation, our SVM reaches a mean accuracy of 0.868 with a standard deviation of 0.008, while the MNB reaches a mean accuracy of 0.841 with a standard deviation of 0.010. In Table 2 we show the performances achieved by both models.

MNB | Precision | Recall | F-Measure
alert | 0.871 | 0.816 | 0.843
no alert | 0.807 | 0.864 | 0.835

SVM | Precision | Recall | F-Measure
alert | 0.857 | 0.878 | 0.867
no alert | 0.883 | 0.862 | 0.873

Table 2: Results in terms of Precision, Recall and F-Measure.

Both classifiers with tf-idf achieve good accuracy and seem to have a good ability to classify a considerable amount of tweets, providing good results in terms of precision and recall. One of the reasons for these performances may be a discriminating lexical composition of the samples belonging to the alert and no alert classes.

Regarding the accuracy of the sentence embedding models on the test set, FT-SVM reaches an accuracy of 0.822, while mDB-SVM reaches 0.774. Evaluating the predictive performance of the two models with 10-fold cross-validation, FT-SVM achieves a mean accuracy of 0.825 with a standard deviation of 0.011, while mDB-SVM reaches a mean accuracy of 0.773 with a standard deviation of 0.013. In Table 3, the results in terms of Precision, Recall and F-Measure are shown.

FT-SVM | Precision | Recall | F-Measure
alert | 0.826 | 0.817 | 0.821
no alert | 0.818 | 0.827 | 0.822

mDB-SVM | Precision | Recall | F-Measure
alert | 0.785 | 0.766 | 0.775
no alert | 0.765 | 0.783 | 0.774

Table 3: Classification reports for FT-SVM and mDB-SVM.

Both models fed with sentence embeddings, constructed with different techniques, seem to perform well in this classification task. In particular, the FT-SVM model, based on sentence embeddings built with fastText, achieves better scores in terms of Precision and F-Measure than the mDB-SVM model. One of the reasons could be that sentence embeddings built with fastText benefit from a resource tailored to the Italian language, compared to the multilingual one used in mDB-SVM. Nevertheless, mDB-SVM also achieved good results in terms of precision and F-measure for the alert class. In terms of Recall, both models retrieve a high proportion of relevant instances for the no alert class.
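The averaging step behind these sentence embeddings can be sketched as follows (a minimal sketch; the toy two-dimensional word-vector table stands in for the 300-dimensional Italian fastText vectors used in the study):

```python
def sentence_embedding(tokens, vectors, dim=2):
    """Average the word vectors of all tokens in a tweet;
    out-of-vocabulary tokens are skipped."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    if not vecs:
        return [0.0] * dim  # no known token: fall back to a zero vector
    # Component-wise mean over all token vectors
    return [sum(component) / len(vecs) for component in zip(*vecs)]

# Toy word-vector table (illustrative values, not real fastText vectors).
toy_vectors = {
    "roghi":   [0.25, 0.5],
    "tossici": [0.75, 0.0],
}
print(sentence_embedding(["roghi", "tossici"], toy_vectors))
# → [0.5, 0.25]
```

The resulting fixed-length vectors can then be fed to an SVM regardless of tweet length.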
5.1 Confusion Matrices

In this section we show the four confusion matrices in order to graphically display the performances achieved by the different models. In Figure 1 we show the confusion matrix of the MNB model, while in Figure 2 we show that of the SVM model. The confusion matrices of the FT-SVM and mDB-SVM models are shown in Figure 3 and Figure 4 respectively.

Figure 1: MNB model confusion matrix.

Figure 2: SVM model confusion matrix.

Figure 3: FT-SVM model confusion matrix.

Figure 4: mDB-SVM model confusion matrix.
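The counts behind matrices like these can be reproduced from the gold and predicted labels in a few lines (a sketch in pure Python; the sample labels are illustrative):

```python
def confusion_matrix(gold, predicted, labels=("alert", "no alert")):
    """Count (gold, predicted) label pairs into a nested dict:
    matrix[g][p] = number of samples with gold label g predicted as p."""
    matrix = {g: {p: 0 for p in labels} for g in labels}
    for g, p in zip(gold, predicted):
        matrix[g][p] += 1
    return matrix

gold      = ["alert", "alert", "no alert", "no alert"]
predicted = ["alert", "no alert", "no alert", "no alert"]
print(confusion_matrix(gold, predicted)["alert"])
# → {'alert': 1, 'no alert': 1}
```

The diagonal entries are the correctly classified tweets; the off-diagonal entries show which class each model confuses with the other.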
6 Conclusions and Future Work

We presented a case study within the C4E project aimed at monitoring social media to provide support against environmental crimes. In particular, we described the UNIOR Eye corpus, some sections of which are still in progress, and tested four models with three different feature extraction and construction techniques on a part of the corpus. We proposed two classifiers, namely SVM and MNB, with tf-idf features as the first experiment; then, an SVM with C parameter tuning fed with sentence embeddings. These embeddings were built both using the Italian pre-trained fastText model and using the pre-trained multilingual DistilBERT model. Our purpose was to classify alert tweets related to waste crimes vs no alert tweets. Future research will include the enlargement of the corpus, applications of NLP in the field of environmental protection, as well as the analysis of contextual features related to environmental issues used as a medium to polarize public opinion (Karol, 2018).

Acknowledgements

This research has been carried out within the framework of two Innovative Industrial PhD projects supported by the PON Ricerca e Innovazione 2014/20 and the POR Campania FSE 2014/2020 funds, and two research grants supported by the PON Ricerca e Innovazione 2014/20 in the context of the C4E project. Authorship contribution is as follows: Raffaele Manna is the author of Section 4. Section 2 is by Antonio Pascucci. Section 5 is by Raffaele Manna and Antonio Pascucci. Sections 1, 3 and 6 are by Wanda Punzi Zarino and Vincenzo Simoniello. We are grateful to Prof. Johanna Monti for supervising the research.

References

Ron Artstein and Massimo Poesio. 2008. Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

William J. Corvey, Sarah Vieweg, Travis Rood, and Martha Palmer. 2010. Twitter in mass emergency: What NLP can contribute. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, pages 23–24.

FIMP. 2017. FIMP Ambiente - Federazione Italiana Medici Pediatri, Glossario di termini sull'ambiente. Una guida dalla A alla Z per orientarsi nel complesso tema dell'inquinamento ambientale.

Muhammad Imran, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. 2015. Processing social media messages in mass emergency: A survey. ACM Computing Surveys (CSUR), 47(4):1–38.

Muhammad Imran, Prasenjit Mitra, and Carlos Castillo. 2016. Twitter as a lifeline: Human-annotated Twitter corpora for NLP of crisis-related messages. arXiv preprint arXiv:1605.05894.

ISPRA. 2012. ISPRA – L'Istituto Superiore per la Protezione e la Ricerca Ambientale, Glossario dinamico per l'Ambiente ed il Paesaggio.

David Karol. 2018. Party polarization on environmental issues: Toward prospects for change. Research Paper. Niskanen Center, Washington, DC.

Klaus Krippendorff. 2004. Reliability in content analysis: Some common misconceptions and recommendations. Human Communication Research, 30(3):411–433.

J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, pages 159–174.

Yaoyong Li, Kalina Bontcheva, and Hamish Cunningham. 2009. Adapting SVM for data sparseness and imbalance: A case study in information extraction. Natural Language Engineering, 15(2):241–271.

Miguel Maldonado, Darwin Alulema, Derlin Morocho, and Marida Proaño. 2016. System for monitoring natural disasters using natural language processing in the social network Twitter. In 2016 IEEE International Carnahan Conference on Security Technology (ICCST), pages 1–6. IEEE.

Graham Neubig, Yuichiroh Matsubayashi, Masato Hagiwara, and Koji Murakami. 2011. Safety information mining — what can NLP do in a disaster —. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 965–973.

Pasquale Peluso. 2015. Dalla terra dei fuochi alle terre avvelenate: lo smaltimento illecito dei rifiuti in Italia. Rivista di Criminologia, Vittimologia e Sicurezza, 9(2):13–30.

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

Muhammed Ali Sit, Caglar Koylu, and Ibrahim Demir. 2019. Identifying disaster-related tweets and their semantic, spatial and temporal context using deep learning, natural language processing and spatial analysis: A case study of Hurricane Irma. International Journal of Digital Earth, 12(11):1205–1229.

Kevin Stowe, Michael Paul, Martha Palmer, Leysia Palen, and Kenneth M. Anderson. 2016. Identifying and categorizing disaster-related tweets. In Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media, pages 1–6.

Francesco Tarasconi, Michela Farina, Antonio Mazzei, and Alessio Bosca. 2017. The role of unstructured data in real-time disaster-related social media monitoring. In 2017 IEEE International Conference on Big Data (Big Data), pages 3769–3778. IEEE.