<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Long-term Social Media Data Collection at the University of Turin</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Basile</string-name>
          <email>basile@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Lai</string-name>
          <email>mirko.lai@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Sanguinetti</string-name>
          <email>msanguin@di.unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We report on the collection of social media messages in the Italian language, from Twitter in particular, that has been going on continuously since 2012 at the University of Turin. A number of smaller datasets have been extracted from the main collection and enriched with different kinds of annotations for linguistic purposes. Moreover, a few extra datasets have been collected independently and are now in the process of being merged with the main collection. We aim to make the resource available to the community to the best of our ability, in accordance with the Terms of Service of the platforms from which the data have been gathered.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>(Italian abstract, in translation) In this article we describe the
ongoing effort to collect messages in the Italian language, from
Twitter in particular, which has continued without interruption since
2012 at the University of Turin.
Several datasets have been extracted from the
main collection and enriched with
different types of annotation for linguistic
purposes. In addition, further datasets have been
collected independently and are now part
of the main collection. Our aim is to
make this resource available to the
community as completely as possible,
given the terms of use imposed by the
platforms from which the data were extracted.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>The online micro-blogging platform Twitter<sup>1</sup> has
been a popular source of natural language data
since the second half of the 2010s, due to the
enormous quantity of public messages exchanged
by its users and the relative ease of collecting
them through the official API.</p>
      <p>Many researchers have implemented systems to
collect large datasets of tweets and share them
with the community. Among them, the
Content-centered Computing group at the University of
Turin<sup>2</sup> maintains a large, diversified
collection of datasets of tweets in the Italian language<sup>3</sup>.
Although the Twitter datasets in Italian
make up the majority of our collection, over the years,
including the recent past, several resources have
been created in other languages and with data
retrieved from sources other than Twitter.</p>
      <p>In this paper, we report on the current status of
the collection (Section 2) and we give an overview
of several annotated datasets included in it
(Section 3). Finally, we describe our current and future
plans to make the data and annotations available to
the research community (Section 4).</p>
    </sec>
    <sec id="sec-4">
      <title>2 TWITA: Long-term Collection of Italian Tweets</title>
      <p>
        The current effort to collect tweets in the
Italian language started in 2012 at the University of
Groningen
        <xref ref-type="bibr" rid="ref19 ref3 ref6">(Basile and Nissim, 2013)</xref>
        . Taking
inspiration from the large collection of Dutch tweets
by Tjong Kim Sang and van den Bosch (2013),
Basile and Nissim (2013) implemented a pipeline
to collect and automatically annotate a large set
of tweets in Italian by leveraging the Twitter API.
The process queries the streaming API with a set
of keywords designed to capture the Italian
language while excluding other
languages. At the time of its publication, the resource
contained about 100 million tweets in Italian from
the first year of collection (from February 2012 to February
2013). The automatic collection, however,
continued, and in 2015 it was transferred from the
University of Groningen to the University of Turin. Since
June 2018, a new filter based on the five Italian
vowels has been added to the pipeline, along with
the language filter provided by the Twitter API,
which was not previously available, in order to
limit the number of accidentally captured tweets
in other languages. In the latest version of the
data collection pipeline, a Python script
employing the tweepy library<sup>4</sup> gathers JSON tweets
using the following filter: track=["a","e","i","o","u"]
and languages=["it"]. We store the raw, complete
JSON tweet structures in zipped files for backup.
Meanwhile, we store the text and the most useful
metadata (username, timestamp, geolocalization,
retweet and reply status) in a relational database in
order to perform efficient queries.
      </p>
      <p><sup>2</sup> http://beta.di.unito.it/index.php/english/research/groups/content-centered-computing/people</p>
      <p><sup>3</sup> Some of the datasets included in this report and their
annotation methodology are described in Sanguinetti et al.
(2014).</p>
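      <p>The storage step described above can be illustrated with a minimal sketch. It is not the code of the actual pipeline: the text only states that a relational database is used, so the choice of SQLite, the table layout, and the function names extract_row and store are assumptions made for illustration. The JSON field names (id_str, text, created_at, user.screen_name, coordinates, retweeted_status, in_reply_to_status_id_str) are those of the Twitter API v1.1 tweet object.</p>
      <preformat><![CDATA[
```python
import json
import sqlite3

# Hypothetical table layout: one row per tweet, holding the metadata
# fields listed in the text (username, timestamp, geolocalization,
# retweet and reply status) next to the tweet text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    id TEXT PRIMARY KEY,
    username TEXT,
    created_at TEXT,
    longitude REAL,
    latitude REAL,
    is_retweet INTEGER,
    in_reply_to TEXT,
    text TEXT
)
"""


def extract_row(raw_json):
    """Reduce one raw JSON tweet (a string) to a tuple matching SCHEMA."""
    tweet = json.loads(raw_json)
    # "coordinates" is null for tweets without a geotag.
    point = tweet.get("coordinates") or {}
    lon, lat = point.get("coordinates") or (None, None)
    return (
        tweet["id_str"],
        tweet["user"]["screen_name"],
        tweet["created_at"],
        lon,
        lat,
        int("retweeted_status" in tweet),
        tweet.get("in_reply_to_status_id_str"),
        tweet["text"],
    )


def store(connection, raw_tweets):
    """Insert the given raw tweets, skipping IDs that are already stored."""
    connection.execute(SCHEMA)
    rows = [extract_row(raw) for raw in raw_tweets]
    connection.executemany(
        "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    connection.commit()
    return len(rows)
```
]]></preformat>
      <p>Keeping the tweet ID as primary key makes repeated insertions harmless, which is convenient when the collection software is restarted and part of the stream is received twice.</p>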
      <p>At the time of this writing, the collection
comprises more than 500 million tweets in the
Italian language, spanning 7 years (57 months) from
February 2012 to July 2018. There are a few
gaps in the collection, sometimes spanning entire
months, due to incidents involving the server
infrastructure or changes in the Twitter API that
required manual adjustment of the collection
software. Figure 1 shows, for each month, the
percentage of days for which the collection has
data.</p>
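      <p>A per-month coverage statistic of the kind plotted in Figure 1 can be computed along the following lines. This is a minimal sketch under stated assumptions, not the code used for the collection: the function name monthly_coverage and the input format (an iterable of datetime.date objects, one for each day on which at least one tweet was collected) are hypothetical. Months without any data are absent from the result rather than reported as 0%.</p>
      <preformat><![CDATA[
```python
import calendar
from collections import defaultdict
from datetime import date


def monthly_coverage(days_with_data):
    """Map (year, month) to the percentage of days of that month with data."""
    covered = defaultdict(set)
    for day in days_with_data:
        # A set absorbs duplicates, so repeated dates are counted once.
        covered[(day.year, day.month)].add(day.day)
    return {
        (year, month): 100.0 * len(days) / calendar.monthrange(year, month)[1]
        for (year, month), days in sorted(covered.items())
    }


# Example: every day of February 2012 and the first half of June 2018.
example_days = [date(2012, 2, d) for d in range(1, 30)]
example_days += [date(2018, 6, d) for d in range(1, 16)]
coverage = monthly_coverage(example_days)
```
]]></preformat>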
    </sec>
    <sec id="sec-5">
      <title>3 Annotated Datasets</title>
      <p>In the past years, the TWITA collection has been
made available to many research teams interested
in studying social media in the Italian language
with computational methods. Several such studies
focused on creating new linguistic resources
starting from the raw tweets and basic metadata
provided by TWITA, including a number of datasets
created for shared tasks in computational
linguistics. In this section, we give an overview of such
resources. Moreover, some datasets were created
independently of TWITA and are now
managed under the same infrastructure; we therefore
include them in this report.</p>
      <p>For each dataset, we provide a summary
infobox with basic information, including the type
of annotation performed on the dataset and
how it was achieved, i.e., by means of expert
annotators or a crowdsourcing platform.</p>
      <sec id="sec-5-1">
        <title>3.1 Datasets From TWITA</title>
        <p>The datasets described in this section are subsets
of the main TWITA dataset, obtained by sampling
the collection according to different criteria, and
annotated for several purposes.</p>
        <p>
          TWitterBuonaScuola
          <xref ref-type="bibr" rid="ref17">(Stranisci et al., 2016)</xref>
          is a corpus of Italian tweets on the topic
of the national educational and training
systems. The tweets were retrieved by means of a
specific hashtag (#labuonascuola, the nickname of
an education reform, meaning "the good
school") and a set of related keywords: "la
buona scuola" (the good school), "buona scuola"
(good school), "riforma scuola" (school
reform), "riforma istruzione" (education reform).
        </p>
        <p>Name: TWitterBuonaScuola
Size: 35,148 total tweets, 7,049 annotated tweets
Time period: February 22, 2014–December 31, 2014
Annotation: polarity, irony and topic
Annotation method: crowdsourcing
URL: http://twita.dipinfo.di.unito.it/tw-bs</p>
        <p>
          TW-SWELLFER
          <xref ref-type="bibr" rid="ref18">(Sulis et al., 2016)</xref>
          is a
corpus of Italian tweets on subjective
well-being, in particular regarding the topics of
fertility and parenthood. The tweets were
collected by searching for 11 hashtags: #papa
(father), #mamma (mother), #babbo (dad),
#incinta (pregnant), #primofiglio (first child),
#secondofiglio (second child), #futuremamme
(future moms), #maternita (motherhood), #paternità
(fatherhood), #allattamento (nursing),
#gravidanza (pregnancy), as well as 19 related keywords.
        </p>
        <p>Name: TW-SWELLFER
Size: 2,760,416 total tweets, 1,508 annotated tweets
Time period: 2014
Annotation: polarity, irony and sub-topic
Annotation method: crowdsourcing
URL: http://twita.dipinfo.di.unito.it/tw-swellfer
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Italian Hate Speech Corpus</title>
        <p>The Italian Hate Speech Corpus (Sanguinetti et al.,
2018b; Poletto et al., 2017) is a corpus of hate
speech on social media towards migrants and
ethnic minorities, developed in the context of the Hate Speech
Monitoring Program of the University of Turin<sup>5</sup>.
The tweets were collected according to a set
of keywords: invadere (invade), invasione
(invasion), basta (enough), fuori (out), comunist*
(communist*), african* (African*), barcon* (migrant
boat*).</p>
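        <p>The keyword list above mixes full words with stems ending in an asterisk. How the asterisk was interpreted during the actual collection is not stated here; the following minimal sketch assumes that it stands for any word ending, and that matching is done on whole, lowercased words. The function name matched_keywords is hypothetical.</p>
        <preformat><![CDATA[
```python
import re
from fnmatch import fnmatchcase

# Keyword list as given in the text; a trailing "*" is assumed to
# stand for any (possibly empty) word ending.
KEYWORDS = ["invadere", "invasione", "basta", "fuori",
            "comunist*", "african*", "barcon*"]


def matched_keywords(text, patterns=KEYWORDS):
    """Return the patterns matching at least one whole word of the text."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        pattern
        for pattern in patterns
        if any(fnmatchcase(token, pattern) for token in tokens)
    ]


def is_candidate(text, patterns=KEYWORDS):
    """True if the tweet text would be retained by the keyword filter."""
    return bool(matched_keywords(text, patterns))
```
]]></preformat>
        <p>Matching on whole words rather than substrings keeps a general keyword such as basta from also retrieving unrelated words that merely contain it, such as abbastanza.</p>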
        <p>
          Name: Italian Hate Speech Corpus
Size: 236,193 total tweets, 6,965 annotated tweets
Time period: October 1st, 2016–April 25th, 2017
Annotation: hate speech, aggressiveness, offensiveness, stereotype, irony, intensity
Annotation method: crowdsourcing and experts
URL: http://twita.dipinfo.di.unito.it/ihsc
        </p>
        <p>
          TWITTIRÒ
          <xref ref-type="bibr" rid="ref10">(Cignarella et al., 2017)</xref>
          is a
dataset of tweets overlapping with other datasets
included in the University of Turin collection,
on which a finer-grained annotation of irony
is superimposed. The TWITTIRÒ tweets are
taken from TWitterBuonaScuola, SENTIPOLC
(see Section 3.2), and TWSpino (see Section 3.3).
        </p>
        <p>Name: TWITTIRÒ
Size: 1,600 total tweets: 400 tweets from TWSpino, 600 tweets from SENTIPOLC, 600 tweets from TWitterBuonaScuola
Time period: 2012–2016
Annotation: fine-grained irony
Annotation method: experts
URL: http://twita.dipinfo.di.unito.it/twittiro
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>3.2 Shared Task Datasets</title>
        <p>
          The large collection of Italian tweets of the
University of Turin has been used on several
occasions to extract datasets for shared tasks organized
for the Italian community, in particular under the
umbrella of the EVALITA evaluation campaign<sup>6</sup>.
In this section, we describe such datasets.
        </p>
        <p>
          SENTIPOLC The SENTIment POLarity
Classification task was proposed in two editions of
the EVALITA campaign, namely in 2014
          <xref ref-type="bibr" rid="ref4">(Basile et al., 2014)</xref>
          and 2016
          <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2016)</xref>
          .
Both editions were organized into three
sub-tasks: subjectivity classification, polarity classification,
and irony detection. The data for SENTIPOLC
2014 were gathered from TWITA and Senti-TUT
(see Section 3.3), while for the 2016 edition the
dataset was further expanded by including other
data sources, such as TWitterBuonaScuola (see
Section 3.1) and a subset of TWITA overlapping
with the dataset used for the shared task on Named
Entity Recognition and Linking in Italian Tweets
          <xref ref-type="bibr" rid="ref2 ref5">(Basile et al., 2016, NEEL-it)</xref>
          .
        </p>
        <p><sup>5</sup> http://hatespeech.di.unito.it/</p>
        <p><sup>6</sup> http://www.evalita.it/</p>
        <p>
          Name: SENTIPOLC
Size: 6,448 (SENTIPOLC 2014), 9,410 (SENTIPOLC 2016) tweets
Time period: 2012 (SENTIPOLC 2014), 2014 (SENTIPOLC 2016)
Annotation: subjectivity, polarity, irony
Annotation method: experts (SENTIPOLC 2014), crowdsourcing and experts (SENTIPOLC 2016)
URL: http://twita.dipinfo.di.unito.it/sentipolc
        </p>
        <p>
          PoSTWITA
          <xref ref-type="bibr" rid="ref17 ref18 ref8 ref9">(Bosco et al., 2016b)</xref>
          is the shared
task on Part-of-Speech tagging of Twitter posts
held at EVALITA 2016. Its content was extracted
from the SENTIPOLC corpus described above.
The PoSTWITA dataset consists of Italian tweets
tokenized and annotated at the PoS level with a tagset
inspired by the Universal Dependencies scheme<sup>7</sup>.
        </p>
        <p>Name: PoSTWITA
Size: 6,738 tweets
Time period: 2012
Annotation: part of speech
Annotation method: experts
URL: http://twita.dipinfo.di.unito.it/postwita
        </p>
        <p>
          After the task took place, the PoSTWITA
corpus was used in a new, independent project
on the development of a Twitter-based Italian
treebank fully compliant with Universal
Dependencies, thus becoming PoSTWITA-UD
          <xref ref-type="bibr" rid="ref15 ref16">(Sanguinetti et al., 2018a)</xref>
          . In particular, the first core
of the resource was automatically annotated through
out-of-domain parsing experiments using different
parsers. The output with the best results was then
revised by two annotators for the final version of
the resource.
        </p>
        <p>PoSTWITA-UD has been available in the
official UD repository<sup>8</sup> since the v2.1 release.</p>
        <p>
          Name: PoSTWITA-UD
Size: 6,712 tweets
Time period: 2012
Annotation: dependency-based syntactic annotation
Annotation method: experts
URL: http://twita.dipinfo.di.unito.it/postwita-ud
        </p>
        <p><sup>7</sup> http://universaldependencies.org/</p>
        <p><sup>8</sup> https://github.com/UniversalDependencies/UD_Italian-PoSTWITA</p>
        <p>IronITA The irony detection task proposed for
EVALITA 2018<sup>9</sup> consists in automatically
classifying tweets according to the presence of irony
(sub-task A) and sarcasm (sub-task B). Given the
array of situations and topics where ironic or
sarcastic devices can be used, the corpus was
created by drawing on multiple annotated sources,
such as the already mentioned TWITTIRÒ,
SENTIPOLC, and the Italian Hate Speech Corpus.</p>
        <p>Name: IronITA
Size: 4,877 tweets
Time period: 2012–2016
Annotation: irony, sarcasm
Annotation method: crowdsourcing and experts
URL: http://twita.dipinfo.di.unito.it/ironita</p>
        <p>
          HaSpeeDe The Hate Speech Detection task<sup>10</sup> at
EVALITA 2018 consists in automatically
annotating messages from Twitter and Facebook. The
dataset proposed for the task is the result of a
joint effort by two research groups to harmonize
the annotation previously applied to two different
datasets: the first is a collection of Facebook
comments developed by the group from CNR-Pisa
and created in 2016
          <xref ref-type="bibr" rid="ref11">(Del Vigna et al., 2017)</xref>
          , while
the other is a subset of the Italian Hate Speech
Corpus (described in Section 3.1). The
annotation scheme has thus been simplified, and it only
includes a binary value indicating whether
hateful content is present in a given tweet or
Facebook comment. The task organizers created
this harmonized scheme also with a view to
cross-domain evaluation, with one dataset used for
training and the other for testing the system.
        </p>
        <p>It is worth pointing out, however, that despite
their joint use in the task, the resources are
maintained separately, thus only the Twitter section of
the dataset is part of TWITA.</p>
        <p>Name: HaSpeeDe
Size: 4,000 tweets and 4,000 Facebook comments
Time period: 2016–2017 for the Twitter dataset, May 2016
for the Facebook dataset
Annotation: hate speech
Annotation method: crowdsourcing and experts for the
Twitter dataset, experts for the Facebook dataset
URL: http://twita.dipinfo.di.unito.it/haspeede</p>
      </sec>
      <sec id="sec-5-4">
        <title>3.3 Independently-collected Datasets</title>
        <p>To complete the overview of the social media
datasets, in this section we describe collections
of tweets that have been compiled independently
of TWITA. However, they are now hosted on
the same infrastructure and can therefore be
considered part of the same collection.</p>
        <p><sup>9</sup> http://www.di.unito.it/~tutreeb/ironita-evalita18</p>
        <p><sup>10</sup> http://www.di.unito.it/~tutreeb/haspeede-evalita18</p>
        <p>
          Senti-TUT
          <xref ref-type="bibr" rid="ref1 ref6">(Bosco et al., 2013)</xref>
          is a dataset
of Italian tweets with a focus on politics and
irony. Senti-TUT includes two corpora: TWNews
contains tweets retrieved by querying the
Twitter search API with a series of hashtags related
to Mario Monti (the Italian Prime Minister at the
time); TWSpino contains tweets from Spinoza<sup>11</sup>, a
popular satirical Italian blog on politics.
        </p>
        <p>
          Name: Senti-TUT
Size: 3,288 tweets (TWNews), 1,159 tweets (TWSpino)
Time period: October 16th, 2011–February 3rd, 2012 (TWNews), July 2009–February 2012 (TWSpino)
Annotation: polarity, irony
Annotation method: experts
URL: http://twita.dipinfo.di.unito.it/senti-tut
        </p>
        <p>
          Felicittà
          <xref ref-type="bibr" rid="ref1">(Allisio et al., 2013)</xref>
          was a project on
the development of a platform that aimed to
estimate and interactively display the degree of
happiness in Italian cities, based on the analysis of data
from Twitter. For its evaluation, a gold corpus was
created by Bosco et al. (2014), using the same
annotation scheme provided for Senti-TUT.
        </p>
        <p>Name: Felicittà
Size: 1,500 tweets
Time period: November 1st, 2013–July 7th, 2014
Annotation: polarity, irony
Annotation method: experts
URL: http://twita.dipinfo.di.unito.it/felicitta</p>
        <p>
          ConRef-STANCE-ita
          <xref ref-type="bibr" rid="ref12">(Lai et al., 2018)</xref>
          is a
collection of tweets on the topic of the Referendum
held in Italy on December 4, 2016, about a reform
of the Italian Constitution. This is arguably a
highly controversial topic, chosen to highlight
language features useful for the study of stance
detection. The tweets were collected by searching
for specific hashtags: #referendumcostituzionale
(constitutional referendum), #iovotosi (I vote yes),
#iovotono (I vote no). Subsequently, the collection
was enriched by recovering the conversation chain
from each retrieved tweet to its source, and by
annotating triplets consisting of one tweet, one retweet,
and one reply posted by the same user in a specific
temporal window. The aim of the collection is to
monitor the evolution of the stance of 248 users
during the debate across four temporal
windows and to inspect their social network.
        </p>
        <p><sup>11</sup> http://www.spinoza.it</p>
        <p>Name: ConRef-STANCE-ita
Size: 2,976 tweets (963 triplets)
Time period: November 24th, 2016–December 7th, 2016
Annotation: stance
Annotation method: crowdsourcing and experts
URL: http://twita.dipinfo.di.unito.it/conref-stance-ita</p>
        <p>Finally, a number of additional datasets
hosted in our infrastructure are being actively
developed at the time of this writing. They
include a collection of geo-localized
tweets on the 2016 edition of the "Giro d'Italia"
cycling competition, a dataset of tweets
concerning the 2016 local elections in 10 major Italian
cities, and an addendum to the
ConRef-STANCE-ita dataset described above.
        </p>
        <p>
          Furthermore, we limited this report to the
datasets of tweets in the Italian language, which
make up the majority of our collection.
However, we curate several datasets in other languages,
often as a result of collaborations with
international research teams and projects, such as, for
instance, TwitterMariagePourTous
          <xref ref-type="bibr" rid="ref17 ref18 ref8 ref9">(Bosco et al., 2016a)</xref>
          , a corpus of 2,872 French tweets extracted
in the period from 16th December 2010 to 20th July 2013
on the topic of same-sex marriage. In addition,
several new corpora have been developed within
the Hate Speech Monitoring program (see Section
3.1), with the aim of studying the phenomenon of hate speech
against different targets, such as women and the
LGBTQ community, and drawing on data
sources other than Twitter (Facebook and online
newspapers in particular). Such resources are
still under construction, so it is not yet possible
to provide any corpus statistics; nevertheless, our goal is to
include them in our resource infrastructure, thus
improving it also in terms of diversity of data sources.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4 Data Availability</title>
      <p>The main goal of collecting and organizing
datasets such as the ones described in this paper is,
generally speaking, to provide the NLP research
community with powerful tools to advance the
state of the art of language technologies.
Therefore, our default policy is to share as much data
as possible, as freely as possible. Twitter has
proven to behave cooperatively towards the
scientific community, relaxing over time the limits imposed on
data sharing for non-commercial use<sup>12</sup>.</p>
      <p>However, there are considerations about the
privacy of users that must be taken into account when
releasing Twitter data. In particular, the EU General
Data Protection Regulation (GDPR)<sup>13</sup>, applicable since 2018,
strictly regulates data and user privacy. For
instance, if a tweet has been deleted by a user, it
should not be published in other forms (Article
17), although it can still be used for scientific
purposes.</p>
      <p><sup>12</sup> https://developer.twitter.com/en/developer-terms/agreement-and-policy.html</p>
      <p>Technically, we address these considerations by
implementing an interface to download the IDs of
the tweets in our collection, and tools to retrieve
the original tweets (if still available). The
annotated datasets can instead be shared in their
entirety, given their limited size; thus, we provide
links to download them in tabular format. Finally,
we are developing interactive interfaces to select
and download samples of the collection based on
the time period and sets of keywords and hashtags.</p>
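      <p>The selection of a sample by time period and by keywords or hashtags, returning only tweet IDs as required by the sharing policy, could look like the following minimal sketch. It does not describe the interface under development: the record layout (dictionaries with the keys id, date and text) and the function names matches_any and select_ids are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import re
from datetime import date


def matches_any(text, terms):
    """True if any keyword or hashtag occurs in the text as a whole word or phrase."""
    return any(
        re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE)
        for term in terms
    )


def select_ids(records, start, end, terms):
    """Return the IDs of tweets posted between start and end (inclusive)
    whose text matches at least one of the given terms."""
    return [
        record["id"]
        for record in records
        if start <= record["date"] <= end and matches_any(record["text"], terms)
    ]


# Hypothetical records, shaped like rows read from the tweet database.
records = [
    {"id": "1", "date": date(2014, 3, 1), "text": "La riforma scuola secondo #labuonascuola"},
    {"id": "2", "date": date(2014, 3, 2), "text": "Niente a che vedere"},
    {"id": "3", "date": date(2015, 1, 10), "text": "#LaBuonaScuola di nuovo"},
]
sample = select_ids(records, date(2014, 2, 22), date(2014, 12, 31), ["#labuonascuola"])
```
]]></preformat>
      <p>Since only IDs leave the system, a tweet that its author has deleted in the meantime can no longer be retrieved by whoever downloads the sample.</p>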
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Valerio Basile and Manuela Sanguinetti are
partially supported by Progetto di Ateneo/CSP 2016
(Immigrants, Hate and Prejudice in Social Media,
S1618 L2 BOSC 01).</p>
      <p>
        Mirko Lai is partially supported by Italian
Ministry of Labor
        <xref ref-type="bibr" rid="ref11">(Contro l’odio: tecnologie
informatiche, percorsi formativi e story telling
partecipativo per combattere l’intolleranza, avviso
n.1/2017 per il finanziamento di iniziative e
progetti di rilevanza nazionale ai sensi dell’art. 72 del
d.l. 3 luglio 2017, n. 117 - anno 2017)</xref>
        .
      </p>
      <p><sup>13</sup> https://gdpr-info.eu/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Leonardo</given-names>
            <surname>Allisio</surname>
          </string-name>
          , Valeria Mussa, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Giancarlo</given-names>
            <surname>Ruffo</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Felicittà: Visualizing and estimating happiness in Italian cities from geotagged tweets</article-title>
          .
          <source>In Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI</source>
          (ESSEM
          <year>2013</year>
          ), pages
          <fpage>95</fpage>
          -
          <lpage>106</lpage>
          , Turin, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the Evalita 2016 SENTIment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          ) &amp;
          <article-title>Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Naples, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Sentiment analysis on Italian tweets</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Andrea Bolioli, Malvina Nissim, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overview of the Evalita 2014 SENTIment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of the Fourth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2014</year>
          ), Pisa, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Anna Lisa Gentile, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Rizzo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian tweets (NEEL-IT) task</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          )
          <article-title>&amp; the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Naples, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Viviana Patti, and
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Bolioli</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Developing corpora for sentiment analysis: The case of irony and Senti-TUT</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Leonardo Allisio, Valeria Mussa, Viviana Patti, Giancarlo Ruffo, Manuela Sanguinetti, and
          <string-name>
            <given-names>Emilio</given-names>
            <surname>Sulis</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Detecting happiness in Italian tweets: Towards an evaluation dataset for sentiment analysis in Felicittà</article-title>
          .
          <source>In Proceedings of the 5th International Workshop on EMOTION, SOCIAL SIGNALS, SENTIMENT &amp; LINKED OPEN DATA</source>
          , pages
          <fpage>56</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Mirko Lai, Viviana Patti, and
          <string-name>
            <given-names>Daniela</given-names>
            <surname>Virone</surname>
          </string-name>
          . 2016a.
          <article-title>Tweeting and being ironic in the debate about a political reform: the French annotated corpus Twitter-MariagePourTous</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC</source>
          <year>2016</year>
          , Portorož, Slovenia.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Fabio Tamburini, Andrea Bolioli, and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Mazzei</surname>
          </string-name>
          . 2016b.
          <article-title>Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian task</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          )
          <article-title>&amp; the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</article-title>
          .
          <source>Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Naples, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Cristina Bosco, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Twittirò: a social media corpus with a multi-layered annotation for irony</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ), Rome, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Del Vigna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Cimino</surname>
          </string-name>
          , Felice Dell'Orletta,
          <string-name>
            <given-names>Marinella</given-names>
            <surname>Petrocchi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Tesconi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate me, hate me not: Hate speech detection on Facebook</article-title>
          .
          <source>In Proceedings of the First Italian Conference on Cybersecurity (ITASEC17)</source>
          , pages
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          , Venice, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Mirko</given-names>
            <surname>Lai</surname>
          </string-name>
          , Viviana Patti, Giancarlo Ruffo, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Stance evolution and Twitter interactions in an Italian political debate</article-title>
          .
          <source>In NLDB</source>
          , volume
          <volume>10859</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Poletto</surname>
          </string-name>
          , Marco Stranisci, Manuela Sanguinetti, Viviana Patti, and
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Hate speech annotation: Analysis of an Italian Twitter corpus</article-title>
          .
          <source>In Proceedings of the Fourth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2017</year>
          ), Rome, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Emilio Sulis, Viviana Patti, Giancarlo Ruffo, Leonardo Allisio, Valeria Mussa, and
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Developing corpora and tools for sentiment analysis: the experience of the University of Turin group</article-title>
          .
          <source>In First Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2014</year>
          ), pages
          <fpage>322</fpage>
          -
          <lpage>327</lpage>
          , Pisa, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Cristina Bosco, Alberto Lavelli, Alessandro Mazzei, Oronzo Antonelli, and
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2018a</year>
          .
          <article-title>PoSTWITA-UD: an Italian Twitter treebank in Universal Dependencies</article-title>
          .
          <source>In Proceedings of the 11th Language Resources and Evaluation Conference (LREC</source>
          <year>2018</year>
          ), pages
          <fpage>1768</fpage>
          -
          <lpage>1775</lpage>
          , Miyazaki, Japan.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Fabio Poletto, Cristina Bosco, Viviana Patti, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Stranisci</surname>
          </string-name>
          .
          <year>2018b</year>
          .
          <article-title>An Italian Twitter Corpus of Hate Speech against Immigrants</article-title>
          .
          <source>In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC</source>
          <year>2018</year>
          ), Miyazaki, Japan. European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Stranisci</surname>
          </string-name>
          , Cristina Bosco, Delia Irazú Hernández Farías, and
          <string-name>
            <given-names>Viviana</given-names>
            <surname>Patti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Annotating sentiment and irony in the online Italian political debate on #labuonascuola</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC</source>
          <year>2016</year>
          ), Paris, France, May.
          <source>European Language Resources Association (ELRA).</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Emilio</given-names>
            <surname>Sulis</surname>
          </string-name>
          , Cristina Bosco, Viviana Patti, Mirko Lai, Delia Irazú Hernández Farías, Letizia Mencarini, Michele Mozzachiodi, and
          <string-name>
            <given-names>Daniele</given-names>
            <surname>Vignoli</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Subjective well-being and social media. A semantically annotated Twitter corpus on fertility and parenthood</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2016</year>
          )
          <source>&amp; the Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2016</year>
          ), Naples, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Tjong Kim Sang</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>van den Bosch</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Dealing with big data: The case of Twitter</article-title>
          .
          <source>Computational Linguistics in the Netherlands Journal</source>
          ,
          <volume>3</volume>
          (
          <issue>12</issue>
          /
          <year>2013</year>
          ):
          <fpage>121</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>