    Overview of the EVALITA 2018 Aspect-based Sentiment Analysis task
                               (ABSITA)
                 Pierpaolo Basile                                   Valerio Basile
            University of Bari Aldo Moro                          University of Turin
         pierpaolo.basile@uniba.it                            basile@di.uniroma1.it

                     Danilo Croce                              Marco Polignano
           University of Rome “Tor Vergata”               University of Bari Aldo Moro
           croce@info.uniroma2.it                       marco.polignano@uniba.it
Abstract

English. ABSITA is the Aspect-based Sentiment Analysis task at EVALITA 2018 (Caselli et al., 2018). This task aimed to foster research in the field of aspect-based sentiment analysis for the Italian language: the goal is to identify the aspects of given target entities and the sentiment expressed for each aspect. Two subtasks are defined, namely Aspect Category Detection (ACD) and Aspect Category Polarity (ACP). In total, 20 runs were submitted by 7 teams comprising 11 individual participants. The best system achieved a micro F1-score of 0.810 for ACD and 0.767 for ACP.

Italiano. ABSITA is the aspect-based sentiment analysis evaluation exercise of EVALITA 2018 (Caselli et al., 2018). The task aims to promote research in the field of sentiment analysis for the Italian language: participants were asked to identify the aspects relevant to the entities provided as input and the sentiment expressed for each of them. In particular, we defined Aspect Category Detection (ACD) and Aspect Category Polarity (ACP) as subtasks. In total, 20 runs were submitted by 7 teams comprising 11 individual participants. The best system achieved a micro F1-score of 0.810 for ACD and 0.767 for ACP.

1 Introduction

In recent years, many websites started offering a high level of interaction with users, who are no longer a passive audience but can actively produce new content. For instance, platforms like Amazon1 or TripAdvisor2 allow people to express their opinions on products, such as food, electronic items and clothes, and on services, such as hotels and restaurants.

In such a social context, Sentiment Analysis (SA) is the task of automatically extracting subjective opinions from a text. In its most basic form, a SA system takes as input a text written in natural language and assigns it a label indicating whether the text expresses a positive or a negative sentiment, or neither (neutral, or objective, text). However, reviews are often quite detailed in expressing the reviewer's opinion on several aspects of the target entity. Aspect-based Sentiment Analysis (ABSA) is an evolution of Sentiment Analysis that aims at capturing the aspect-level opinions expressed in natural language texts (Liu, 2007).

At the international level, ABSA was introduced as a shared task at SemEval, the most prominent evaluation campaign in the Natural Language Processing field, in 2014 (SE-ABSA14), providing a benchmark dataset of reviews in English (Pontiki et al., 2014). Datasets of computer laptop and restaurant reviews were annotated with aspect terms (both fine-grained, e.g., "hard disk", "pizza", and coarse-grained, e.g., "food") and their polarity (positive or negative). The task was repeated in SemEval 2015 (SE-ABSA15) and 2016 (SE-ABSA16), aiming to facilitate more in-depth research by providing a new ABSA framework to investigate the relations between the identified constituents of the expressed opinions, and growing to include languages other than English and different domains (Pontiki et al., 2015; Pontiki et al., 2016).

ABSITA (Aspect-based Sentiment Analysis on Italian) aims at providing a similar evaluation for texts in Italian.

1 http://www.amazon.com
2 http://www.tripadvisor.com
In a nutshell, participants are asked to detect, within sentences expressing opinions about accommodation services, some of the aspects considered by the writer. These aspects belong to a closed set, ranging from the cleanliness of the room to the price of the accommodation. Moreover, for each detected aspect, participants are asked to detect a specific polarity class, expressing appreciation or criticism towards it.

During the organization of the task, we collected a dataset composed of more than 9,000 sentences and annotated them with aspect and polarity labels. During the task, 20 runs were submitted by 7 teams comprising 11 individual participants.

In the rest of the paper, Section 2 provides a detailed definition of the task. Section 3 describes the dataset made available in the evaluation campaign, while Section 4 reports the official evaluation measures. In Sections 5 and 6, the results obtained by the participants are reported and discussed, respectively. Finally, Section 7 draws the conclusions.

2 Definition of the task

In ABSITA, Aspect-based Sentiment Analysis is decomposed into a cascade of two subtasks: Aspect Category Detection (ACD) and Aspect Category Polarity (ACP). For example, let us consider the following sentence describing a hotel:

I servizi igienici sono puliti e il personale cordiale e disponibile. (The toilets are clean and the staff is friendly and helpful.)

In the ACD task, one or more "aspect categories" evoked in a sentence are identified, e.g., the pulizia (cleanliness) and staff categories in the sentence above. In the Aspect Category Polarity (ACP) task, the polarity of each expressed category is recognized, e.g., a positive category polarity is expressed concerning both the pulizia and the staff categories in the example above.

In our evaluation framework, the set of aspect categories is known and given to the participants, so the ACD task can be seen as a multi-class, non-exclusive classification task where each input text has to be classified as evoking or not evoking each aspect category. The participant systems are asked to return a binary vector where each dimension corresponds to an aspect category and the values 0 (false) and 1 (true) indicate whether each aspect has been detected in the text. Table 1 shows examples of annotation for the ACD task.

For the ACP task, the input is the review text paired with the set of aspects identified in the text within the ACD subtask, and the goal is to assign polarity labels to each of the aspect categories. Two binary polarity labels are expected for each aspect: POS and NEG, indicating a positive and a negative sentiment expressed towards a specific aspect, respectively. Note that the two labels are not mutually exclusive: in addition to the annotation of positive aspects (POS:true, NEG:false) and negative aspects (POS:false, NEG:true), there can be aspects with no polarity, or neutral polarity (POS:false, NEG:false). This is also the default polarity annotation for the aspects that are not detected in a text. Finally, the polarity of an aspect can be mixed (POS:true, NEG:true), in cases where both sentiments are expressed towards a certain aspect in a text. Table 2 summarizes the possible annotations with examples.

The participants could choose to submit only the results of the ACD subtask, or of both subtasks. In the latter case, the output of the ACD task is used as input for the ACP. As a constraint on the results submitted for the ACP task, the polarity of an aspect for a given sentence can be different from (POS:false, NEG:false) only if the aspect is detected in the ACD step.
Sentence                                                                 CLEANLINESS   STAFF   COMFORT   LOCATION
I servizi igienici sono puliti e il personale cordiale e disponibile          1           1        0          0
La posizione è molto comoda per il treno e la metro.                          0           0        0          1
Ottima la disponibilità del personale, e la struttura della stanza           0           1        1          0

Table 1: Examples of aspect category detection (ACD).

Sentence                                                 Aspect         POS   NEG
Il bagno andrebbe ristrutturato                          CLEANLINESS     0     0
Camera pulita e spaziosa.                                CLEANLINESS     1     0
Pulizia della camera non eccelsa.                        CLEANLINESS     0     1
Il bagno era pulito ma lasciava un po' a desiderare      CLEANLINESS     1     1

Table 2: Examples of polarity annotations with respect to the cleanliness aspect.
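To make the annotation scheme concrete, the following minimal Python sketch (names and data structures are illustrative assumptions, not part of the official tooling) shows the ACD output vector, the mapping from the two binary polarity flags to the four polarity classes, and a check of the constraint that a non-neutral polarity is only allowed for aspects detected in the ACD step.

ASPECTS = ["cleanliness", "comfort", "amenities", "staff",
           "value", "wifi", "location", "other"]

def acd_vector(presence: dict) -> list:
    # Binary output vector for the ACD task: one dimension per aspect,
    # 1 if the aspect is detected in the sentence, 0 otherwise.
    return [presence.get(a, 0) for a in ASPECTS]

def decode_polarity(pos: int, neg: int) -> str:
    # Map the two binary polarity flags to the four polarity classes.
    return {(0, 0): "neutral", (1, 0): "positive",
            (0, 1): "negative", (1, 1): "mixed"}[(pos, neg)]

def is_consistent(presence: dict, pos: dict, neg: dict) -> bool:
    # ACP constraint: a polarity other than (POS:false, NEG:false) is
    # allowed only for aspects detected in the ACD step.
    return all(presence.get(a, 0) == 1
               or (pos.get(a, 0), neg.get(a, 0)) == (0, 0)
               for a in ASPECTS)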


3 Dataset

The data source chosen for creating the ABSITA datasets is the popular website booking.com3. The platform allows users to share their opinions about the hotels they visited through a positive/negative textual review and a fine-grained rating system that can be used to assign a score to each different aspect: cleanliness, comfort, facilities, staff, value for money, free/paid WiFi, and location. The website therefore provides a large number of reviews in many languages.

We extracted the textual reviews in Italian, labeled on the website with one of the eight considered aspects. The dataset contains reviews left by users for hotels situated in several major Italian cities, such as Rome, Milan, Naples, Turin and Bari. We split the reviews into groups of sentences which describe the positive and the negative characteristics of the selected hotel. The reviews were collected between the 16th and the 17th of April 2018 using Scrapy4, a Python web crawler. We collected in total 4,121 distinct reviews in the Italian language.

3 https://www.booking.com
4 https://scrapy.org
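The organizers describe the crawler only as a Scrapy spider; the skeleton below is a hypothetical sketch of such a collection script, with a placeholder start URL and invented CSS selectors (div.review, .review_pos, .review_neg), illustrating how the positive and negative parts of each review could be scraped separately.

import scrapy

class HotelReviewSpider(scrapy.Spider):
    # Hypothetical spider: the actual URLs and page structure used by
    # the organizers are not published.
    name = "hotel_reviews"
    start_urls = ["https://www.booking.com/reviews/..."]  # placeholder

    def parse(self, response):
        # Yield the positive and negative parts of each review block
        # separately, mirroring the split described above.
        for review in response.css("div.review"):  # invented selector
            yield {
                "positive": review.css(".review_pos ::text").getall(),
                "negative": review.css(".review_neg ::text").getall(),
            }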
The reviews have been manually checked to verify the annotation of the aspects provided by booking.com, and to add missing links between sentences and aspects. We started by annotating a small portion of the whole dataset, split into sentences (250 randomly chosen sentences), using four annotators (the task organizers), in order to check the agreement of the annotation. For the ACD task, we asked the annotators to answer straightforward questions in the form "Is aspect X mentioned in the sentence Y?" (Tab. 1).

The set of Italian aspects is the direct translation of those used by booking.com: PULIZIA (cleanliness), COMFORT, SERVIZI (amenities), STAFF, QUALITÀ-PREZZO (value), WIFI (wireless Internet connection) and POSIZIONE (location). Similarly, for the ACP subtask, the annotation is performed at the sentence level, but with the set of aspects already provided by the ACD annotation, and with checkboxes to indicate the positive and negative polarity of each aspect (Tab. 2).

The result of the pilot annotation has been used to compute an inter-annotator agreement measure, in order to understand whether it was possible to allow the annotators to work independently of each other on different sets of sentences. We found agreement ranging from 82.8% to 100%, with an average value of 94.4%, obtained by counting the number of sentences annotated with the same label by all the annotators.
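The exact agreement computation is not published; the following sketch is a reconstruction based on the description above, measuring the share of sentences that received identical labels from all annotators.

def percentage_agreement(per_annotator_labels):
    # per_annotator_labels[i] is the sequence of labels assigned by
    # annotator i, aligned by sentence index.
    columns = list(zip(*per_annotator_labels))
    agreed = sum(1 for labels in columns if len(set(labels)) == 1)
    return agreed / len(columns)

# Two annotators, three sentences, one disagreement: 2/3 agreement.
print(percentage_agreement([["POS", "NEG", "POS"],
                            ["POS", "NEG", "NEG"]]))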
In order to complete the annotation, we assigned a different set of 1,000 reviews to each annotator (about 2,500 sentences on average). We split the dataset among the annotators so that each of them received a uniformly balanced distribution of positive and negative aspects, based on the scores provided by the original review platform. Incomplete, irrelevant, and incomprehensible sentences have been discarded from the dataset during the annotation. At the end of the annotation process, we obtained the gold standard dataset with the associations among sentence, sentiment and aspect. The entire annotation process took a few weeks to complete.

The positive and negative polarities are annotated independently, thus for each aspect the four sentiment combinations discussed in Section 2 are possible: positive, negative, neutral and mixed. The resulting classes are: cleanliness_positive, cleanliness_negative, comfort_positive, comfort_negative, amenities_positive, amenities_negative, staff_positive, staff_negative, value_positive, value_negative, wifi_positive, wifi_negative, location_positive, location_negative, other_positive, other_negative. For each aspect, the sentiment is encoded in two binary labels:

  • negative = (*_positive = 0, *_negative = 1)
  • positive = (*_positive = 1, *_negative = 0)
  • neutral = (*_positive = 0, *_negative = 0)
  • mixed = (*_positive = 1, *_negative = 1)

Please note that the special topic OTHER has been added for completeness, to annotate sentences with opinions on aspects not among the seven considered by the task. The aspect OTHER is provided additionally and is not part of the evaluation of the results for the task.
Dataset        Description                                                                      #Sentences
Trial set      A small set of sentences used for checking the format of the files.              30 (0.34% of total)
Training set   Sentences provided for training, selected using a random stratification          6,337 (69.75% of total)
               of the whole dataset.
Test set       Sentences provided for testing, released without the aspect annotations.         2,718 (29.91% of total)

Table 3: List of datasets released for the ABSITA task at EVALITA 2018.

Dataset        clean_pos   comf_pos   amen_pos   staff_pos   value_pos   wifi_pos   loca_pos
Trial set              2          8          6           3           1          1          5
Training set         504        978        948         937         169         43      1,184
Test set             193        474        388         411          94         18        526

Dataset        clean_neg   comf_neg   amen_neg   staff_neg   value_neg   wifi_neg   loca_neg
Trial set              1          2          3           1           1          0          1
Training set         383      1,433        920         283         251         86        163
Test set             196        666        426         131         126         52        103

Table 4: Distribution of the sentences in the datasets among the aspects and polarities.

We released the data in Comma-Separated Values (CSV) format with UTF-8 encoding and semicolon as separator. The first attribute is the id of the review. Note that on booking.com the order of positive and negative sentences is strictly defined, and this could make the task too easy. To overcome this issue, we randomly assigned each sentence a new position in the review. As a consequence, the final positional ids shown in the data file do not reflect the real order of the sentences in the review. The text of the sentence is provided at the end of the line, delimited by double quotes. It is preceded by three binary values for each aspect, indicating respectively: the presence of the aspect in the sentence (aspectX_presence: 0/1), the positive polarity for that aspect (aspectX_pos: 0/1) and the negative polarity (aspectX_neg: 0/1). Fig. 1 shows an example of the annotated dataset in the proposed format.

The list of the datasets released for the task is provided in Tab. 3, and the distribution of the sentences among aspects and polarities is provided in Tab. 4. The subdivision is respectively 0.34%, 69.75% and 29.91% for trial, training and test data. The datasets can be freely downloaded from http://sag.art.uniroma2.it/absita/ and reused in non-commercial projects and research. After the submission deadline, we also distributed the gold standard test set and the evaluation script.

4 Evaluation measures and baselines

We evaluate the ACD and ACP subtasks separately, by comparing the classifications provided by the participant systems to the gold standard annotations of the test set. For the ACD task, we compute Precision, Recall and F1-score, defined as

  F1_a = 2 * P_a * R_a / (P_a + R_a),   P_a = |S_a ∩ G_a| / |S_a|,   R_a = |S_a ∩ G_a| / |G_a|,

where S_a is the set of aspect category annotations that a system returned for all the test sentences, and G_a is the set of the gold (correct) aspect category annotations. For instance, if a review is labeled in the gold standard with the two aspects G_a = {CLEANLINESS, STAFF}, and the system predicts the two aspects S_a = {CLEANLINESS, COMFORT}, we have |S_a ∩ G_a| = 1, |G_a| = 2 and |S_a| = 2, so that P_a = 1/2, R_a = 1/2 and F1_a = 1/2. For the ACD task, the baseline is computed by considering a system which assigns the most frequent aspect category (estimated over the training set) to each sentence.

For the ACP task we evaluate the entire chain, thus considering the aspect categories detected in the sentences together with their corresponding polarity, in the form of (aspect, polarity) pairs. We again compute Precision, Recall and F1-score, now defined as

  F1_p = 2 * P_p * R_p / (P_p + R_p),   P_p = |S_p ∩ G_p| / |S_p|,   R_p = |S_p ∩ G_p| / |G_p|,

where S_p is the set of (aspect, polarity) pairs that a system returned for all the test sentences, and G_p is the set of the gold (correct) pair annotations. For instance, if a review is labeled in the gold standard with the pairs G_p = {(CLEANLINESS, POS), (STAFF, POS)}, and the system predicts the three pairs S_p = {(CLEANLINESS, POS), (CLEANLINESS, NEG), (COMFORT, POS)}, we have |S_p ∩ G_p| = 1, |G_p| = 2 and |S_p| = 3, so that P_p = 1/3, R_p = 1/2 and F1_p = 0.4.
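The following minimal Python sketch mirrors these definitions (it is an illustration, not the official evaluation script): annotations are represented as sets of tuples, the most-frequent-class baseline is included, and the snippet reproduces the ACP worked example above.

from collections import Counter

def precision_recall_f1(system: set, gold: set):
    # Micro-averaged scores over the annotation sets S and G.
    hits = len(system & gold)                       # |S ∩ G|
    p = hits / len(system) if system else 0.0       # P = |S ∩ G| / |S|
    r = hits / len(gold) if gold else 0.0           # R = |S ∩ G| / |G|
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def most_frequent_baseline(train_labels, test_ids):
    # Assign the most frequent training label (an aspect category, or
    # an (aspect, polarity) pair) to every test sentence.
    mfc = Counter(train_labels).most_common(1)[0][0]
    return {(sid, mfc) for sid in test_ids}

# ACP worked example (a single review with id 0):
gold = {(0, "CLEANLINESS", "POS"), (0, "STAFF", "POS")}
system = {(0, "CLEANLINESS", "POS"), (0, "CLEANLINESS", "NEG"),
          (0, "COMFORT", "POS")}
print(precision_recall_f1(system, gold))  # (0.333..., 0.5, 0.4)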
sentence_id; aspect1_presence; aspect1_pos; aspect1_neg; ...; sentence
201606240;0;0;0;0;0;0;0;0;0;0;0;0;1;1;0;0;0;0;1;1;0;"Considerato il prezzo e per una sola notte,va    ..."
201606241;1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;"Almeno i servizi igienici andrebbero rivisti e   ..."
201606242;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;"La struttura purtroppo è vecchia e ci vorrebbero ..."

Figure 1: Sample of the annotated dataset in CSV format.
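A minimal reader for the format shown in Fig. 1 might look as follows; the aspect order is an assumption based on Table 4, and the sample rows contain seven aspect triples, so the list should be adjusted if the released files also encode the OTHER aspect.

import csv

ASPECTS = ["cleanliness", "comfort", "amenities", "staff",
           "value", "wifi", "location"]  # assumed column order

def read_absita(path):
    # Semicolon-separated, UTF-8; the quoted sentence text may itself
    # contain semicolons, which csv.reader handles via the quotes.
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter=";"):
            if not row or not row[0].strip().isdigit():
                continue  # skip a header line, if present
            sid, flags, text = row[0], row[1:-1], row[-1]
            labels = {}
            for i, aspect in enumerate(ASPECTS):
                presence, pos, neg = map(int, flags[3 * i: 3 * i + 3])
                labels[aspect] = {"presence": presence,
                                  "pos": pos, "neg": neg}
            yield sid, labels, text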

For the ACP task, the baseline is computed by considering a system which assigns the most frequent (aspect, polarity) pair (estimated over the training set) to each sentence.

We produced separate rankings for the two tasks, based on the F1 scores. Participants who submitted only the results of the ACD task appear in the first ranking only.

System         Micro-P   Micro-R   Micro-F1
ItaliaNLP 1    0.8397    0.7837    0.8108
gw2017 1       0.8713    0.7504    0.8063
gw2017 2       0.8697    0.7481    0.8043
X2Check gs     0.8626    0.7519    0.8035
UNIPV          0.8819    0.7378    0.8035
X2Check w      0.8980    0.6937    0.7827
ItaliaNLP 2    0.8658    0.6970    0.7723
SeleneBianco   0.7902    0.7181    0.7524
VENSES 1       0.6232    0.6093    0.6162
VENSES 2       0.6164    0.6134    0.6149
ilc 2          0.5443    0.5418    0.5431
ilc 1          0.6213    0.4330    0.5104
mfc baseline   0.4111    0.2866    0.3377

Table 5: Results of the submissions for the ACD subtask.

System         Micro-P   Micro-R   Micro-F1
ItaliaNLP 1    0.8264    0.7161    0.7673
UNIPV          0.8612    0.6562    0.7449
gw2017 2       0.7472    0.7186    0.7326
gw2017 1       0.7387    0.7206    0.7295
ItaliaNLP 2    0.8735    0.5649    0.6861
SeleneBianco   0.6869    0.5409    0.6052
ilc 2          0.4123    0.3125    0.3555
ilc 1          0.5452    0.2511    0.3439
mfc baseline   0.2451    0.1681    0.1994

Table 6: Results of the submissions for the ACP subtask.

5 Results

We received submissions from several teams that participated in past editions of EVALITA, in particular in SENTIPOLC (Sentiment Polarity Classification (Barbieri et al., 2016)) and NEEL-it (Named Entity Recognition (Basile et al., 2016)), but also from some new entries in the community. In total, 20 runs were submitted by 7 teams comprising 11 individual participants. The task allowed each participant team to send up to 2 submissions. In particular, 12 runs were submitted to the ACD task and 8 runs to the ACP task.

We also provide the result of a baseline system that assigns to each instance the most frequent class in each task, i.e., the most frequent aspect (COMFORT) and the most frequent polarity for that aspect (positive), according to the frequency of classes in the training set. The results of the submissions for the two tasks, and of the baseline (namely mfc baseline), are reported in Tab. 5 and Tab. 6. Of the seven teams who participated in the ACD task, five also participated in the ACP task.

The results obtained by the teams largely outperform the baseline, demonstrating the efficacy of the proposed solutions and the feasibility of both tasks. The results obtained for the ACD task (Tab. 5) show a small range of variability, at least in the first part of the ranking (the top results are concentrated around an F1 score of 0.80). On the contrary, the values of precision and recall show higher variability, indicating significant differences among the proposed approaches.

6 Discussion

The teams of the ABSITA challenge were invited to describe their solutions in a technical report and to fill in a questionnaire, in order to gain insight into their approaches and to support their replicability. Five systems (ItaliaNLP, gw2017, X2Check, UNIPV, SeleneBianco) are based on supervised machine learning, that is, all the systems for which we have access to the implementation details, with the exception of VENSES, which is a rule-based unsupervised system. Among the systems that use supervised approaches, three (ItaliaNLP, gw2017, UNIPV) employ deep learning (in particular LSTM networks, often in their bi-directional variant).

All submitted runs can be considered "constrained runs", that is, the systems were trained on the provided dataset only. While no additional training data was used, some systems employ different kinds of external resources.
Among these, pre-trained word embeddings are used as word representations by UNIPV (fastText5) and gw2017 (word embeddings provided by the spaCy framework6). The ItaliaNLP system employs word embeddings created from the ItWaC corpus (Baroni et al., 2009) and from a corpus extracted from Booking.com.

Some of the systems are ABSA extensions built on top of custom or pre-existing NLP pipelines. This is the case for ItaliaNLP, VENSES and X2Check. Other systems make use of off-the-shelf NLP tools for preprocessing the data, such as spaCy (gw2017, UNIPV) and FreeLing7 (SeleneBianco).

Finally, the additional resources used by the systems often include domain-specific or affective lexicons. ItaliaNLP employed the MPQA affective lexicon (Wilson et al., 2005), and further developed an affective lexicon from a large corpus of tweets by distant supervision. The UNIPV system makes use of the affective lexicon for Italian developed in the framework of the OpeNER project8.

In the ACD task, the precision of the second ranked system (gw2017) is significantly higher than that of the first system (ItaliaNLP), although the latter ranks at the top because of a higher recall. This imbalance between precision and recall is mainly due to the high number of aspects that can be assigned to a sentence at the same time: a system returning too many aspects is exposed to low precision but higher recall, while a more conservative system would achieve the opposite situation. Further details about the systems developed for the task can be found in the technical reports of the participants: ItaliaNLP (Cimino et al., 2018), UNIPV (Nicola, 2018), VENSES (Delmonte, 2018), X2Check (Di Rosa and Durante, 2018) and gw2017 (Bennici and Portocarrero, 2018).

7 Conclusion

The large availability of user-generated content over the Web that characterizes the current tendency to share opinions with others online has promoted the diffusion of platforms able to analyze and reuse it for personalized services. A challenging task is the analysis of the users' opinions about a product, service or topic of discussion. In particular, the ABSA (Aspect-based Sentiment Analysis) task concerns the association of a polarity (positive, negative, neutral/objective) with the piece of the sentence that refers to an aspect of interest. In ABSITA, we proposed to automatically extract users' opinions about aspects in hotel reviews. The complexity of the task has been successfully faced by the submitted solutions. Systems that used supervised machine learning approaches, based on semantic and morpho-syntactic feature representations of textual contents, demonstrate encouraging performance on the task. Good results have also been obtained using rule-based systems, even though they suffer from generalization issues and need to be tailored to the set of sentences to classify. The decision to use additional resources, such as lexicons, in conjunction with semantic word embeddings has been demonstrated to be successful. More details about the implementation of the systems that participated in the task can be found in their specific reports. In conclusion, we consider the ABSITA 2018 task a success and an advancement of the state of the art for the ABSA task in the Italian language.

5 https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
6 https://spacy.io/
7 http://nlp.lsi.upc.edu/freeling/node/
8 https://github.com/opener-project/VU-sentiment-lexicon

References

Francesco Barbieri, Valerio Basile, Danilo Croce, Malvina Nissim, Nicole Novielli, and Viviana Patti. 2016. Overview of the Evalita 2016 SENTIment POLarity Classification Task. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Naples, Italy, December.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43:209–226.

Pierpaolo Basile, Annalina Caputo, Anna Lisa Gentile, and Giuseppe Rizzo. 2016. Overview of the EVALITA 2016 Named Entity Recognition and Linking in Italian Tweets (NEEL-IT) Task. In 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016), Naples, Italy, December.

Mauro Bennici and Xileny Seijas Portocarrero. 2018. Ensemble for aspect-based sentiment analysis. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.
Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso. 2018. EVALITA 2018: Overview of the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2018), Turin, Italy. CEUR.org.

Andrea Cimino, Lorenzo De Mattei, and Felice Dell'Orletta. 2018. Multi-task Learning in Deep Neural Networks at EVALITA 2018. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Rodolfo Delmonte. 2018. ItVENSES - a symbolic system for aspect-based sentiment analysis. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Emanuele Di Rosa and Alberto Durante. 2018. Aspect-based sentiment analysis: X2Check at ABSITA 2018. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Bing Liu. 2007. Web Data Mining. Springer.

Giancarlo Nicola. 2018. Bidirectional attentional LSTM for aspect based sentiment analysis on Italian. In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors, Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18), Turin, Italy. CEUR.org.

Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35, Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University.

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 486–495, Denver, Colorado, June. Association for Computational Linguistics.

Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad AL-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia Loukachevitch, Evgeny Kotelnikov, Nuria Bel, Salud María Jiménez-Zafra, and Gülşen Eryiğit. 2016. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval '16), San Diego, California, June. Association for Computational Linguistics.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 347–354. Association for Computational Linguistics.