<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ATE ABSITA @ EVALITA2020: Overview of the Aspect Term Extraction and Aspect-based Sentiment Analysis Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo De Mattei</string-name>
          <email>lorenzo.demattei@di.unipi.it</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graziella De Martino</string-name>
          <email>graziella.demartino@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Miaschi</string-name>
          <email>alessio.miaschi@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Polignano</string-name>
          <email>marco.polignano@uniba.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Iovine</string-name>
          <email>andrea.iovine@uniba.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Rambelli</string-name>
          <email>giulia.rambelli@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bari A. Moro, Dept. Computer Science</institution>
          ,
          <addr-line>E. Orabona 4, Bari</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bari A. Moro, Dept. Computer Science</institution>
          ,
          <addr-line>E. Orabona 4, Bari</addr-line>
          ,
          <institution>SWAP Research Group</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Pisa, Ist. di Ling. Comp. “Antonio Zampolli”, Pisa, ItaliaNLP Lab</institution>
          ;
          <institution>University of Bari A. Moro, Dept. Computer Science, SWAP Research Group</institution>
          ,
          <addr-line>E. Orabona 4, Bari</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Pisa, CoLing Lab, Pisa; Aix-Marseille University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Pisa, Ist. di Ling. Comp., “Antonio Zampolli”, Pisa ItaliaNLP Lab</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Over the last years, aspect-based sentiment analysis of product reviews has become a key source of valuable insights for both consumers and businesses. To this end, we propose ATE ABSITA: the EVALITA 2020 shared task on Aspect Term Extraction and Aspect-Based Sentiment Analysis. In particular, we approach the task as a cascade of three subtasks: Aspect Term Extraction (ATE), Aspect-based Sentiment Analysis (ABSA) and Sentiment Analysis (SA). Therefore, we invited participants to submit systems designed to automatically identify the “aspect terms” in each review and to predict the sentiment expressed for each aspect, along with the sentiment of the entire review. The task received broad interest, with 27 teams registered and more than 45 participants. However, only three teams submitted their working systems. The results obtained underline the task's difficulty, but they also show how it is possible to deal with it using innovative approaches and models. Indeed, two of them are based on large pre-trained language models, as is typical in the current state of the art for the English language. (de Mattei et al., 2020)</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>“Copyright © 2020 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).”</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction and motivation</title>
      <p>
        Leaving comments and reviews on the Web has
become a common practice for users to express
their opinions about products, experiences, and
more. Thus, companies need to deal with
increasingly large amounts of textual data, which
can be useful to identify their products’ strengths
and weaknesses. However, the automatic
analysis of reviews poses numerous problems related
to its processing. First of all, reviewers often use
informal language, with a wide variety of
colloquialisms and contractions, which make review
analysis through lexicon-based techniques
difficult. Second, automatically identifying aspects of
the product within a sentence is not easy, due to the
intrinsic subjectivity in the definition of “aspect”.
These issues have already been addressed in the
area of Text Mining and Sentiment Analysis.
Recently, the sentiment analysis and opinion mining
tasks have seen a surge in interest, thanks to the
large quantity of data available for analysis and the
new natural language processing techniques based
on language models such as BERT
        <xref ref-type="bibr" rid="ref10">(Devlin et al.,
2019)</xref>
        and GPT
        <xref ref-type="bibr" rid="ref19">(Radford et al., 2019)</xref>
        . Thus, we
proposed ATE ABSITA: the EVALITA 2020
        <xref ref-type="bibr" rid="ref3">(Basile et al., 2020)</xref>
        shared task on Aspect Term
Extraction and Aspect-Based Sentiment Analysis.
      </p>
      <p>
        Sentiment Analysis (or Opinion Mining) is the
task of identifying what the user thinks about a
particular element. It often takes the form of a
classification task with the purpose of annotating a
portion of text with a positive, negative, or neutral
label. Aspect-based Sentiment Analysis (ABSA)
is an evolution of Sentiment Analysis that aims
at capturing the aspect-level opinions expressed
in natural language texts
        <xref ref-type="bibr" rid="ref13">(Liu, 2007)</xref>
        . Very
often, the ABSA task is performed on a set of
aspects defined a priori, limiting its applicability in
real scenarios. Aspect Term Extraction (ATE) is
the task of identifying the “aspect terms” in a text
without knowing their list a priori.
According to the literature definition, a term/phrase
is considered as an aspect when it co-occurs with
some “opinion words” that indicate a sentiment
polarity on it
        <xref ref-type="bibr" rid="ref17 ref18">(Pontiki et al., 2016a)</xref>
        .
      </p>
      <p>
        At the international level, SemEval, the most
prominent evaluation campaign in the Natural
Language Processing field, provided in 2014
SEABSA14
        <xref ref-type="bibr" rid="ref16">(Pontiki et al., 2014)</xref>
        a benchmark
dataset of reviews in the English language for the
ABSA task. Given a set of sentences with
pre-identified entities (e.g., restaurants), the task
consisted in identifying the aspect terms occurring in the
sentences and returning a list containing all the
distinct aspect terms. For each retrieved aspect term,
systems then had to determine whether its
polarity was positive, negative,
neutral, or conflict. The same task was replicated in
2015 and 2016, consolidating the four subtasks of
SEABSA14
        <xref ref-type="bibr" rid="ref16">(Pontiki et al., 2014)</xref>
        within a unified
framework. In addition, SE-ABSA16
        <xref ref-type="bibr" rid="ref17 ref18">(Pontiki et al.,
2016b)</xref>
        included an out-of-domain ABSA subtask,
involving test data from a domain unknown to the
participants.
      </p>
      <p>
        ABSA is not a novel task at EVALITA. A first
edition was proposed at EVALITA 2018 by
        <xref ref-type="bibr" rid="ref4">(Basile et
al., 2018)</xref>
        . The task was subdivided into two
subtasks: Aspect Category Detection (ACD) and
Aspect Category Polarity (ACP). The former
concerned the identification of the categories
mentioned in the review, with the categories known
a priori; the latter concerned detecting the
polarity of the user's opinion about the previously
detected categories. However, it bears some similarities
with at least two other tasks from previous
editions of the campaign. SENTIPOLC
        <xref ref-type="bibr" rid="ref1 ref16">(Basile et al.,
2014)</xref>
        , featured in the 2014 and 2016 editions of
EVALITA, is a shared task on the polarity
classification of social media content. The other is
NEEL-it
        <xref ref-type="bibr" rid="ref17 ref18 ref2">(Basile et al., 2016)</xref>
        , held at EVALITA
2016. NEEL-it is the task of Named Entity
Recognition and Linking, that is, the task of identifying
the spans of an input text that refer to named
entities, and linking them to entries in a knowledge
base, e.g., pages of Wikipedia.
aspect term
mantenere la
temperatura
costruzione
We define the ATE ABSITA task as a cascade of
three subtasks: Aspect Term Extraction (ATE),
      </p>
    </sec>
    <sec id="sec-3">
      <title>Aspect-based Sentiment Analysis (ABSA), Sentiment Analysis (SA).</title>
      <p>For example, let us consider the sentence
describing a review of a metallic bottle:</p>
      <p>La borraccia termica svolge egregiamente il
proprio compito di mantenere la temperatura,
calda o fredda che sia. La costruzione è ottimale
e ben rifinita. Acquisto straconsigliato!
The thermal water bottle does its job very well to
keep the temperature, whether hot or cold. The
construction is optimal and well finished.</p>
      <p>Purchase highly recommended!</p>
    </sec>
    <sec id="sec-4">
      <title>In the Aspect Term Extraction (ATE) task,</title>
      <p>one or more “aspect terms” mentioned in a
sentence are identified, e.g. mantenere la
temperatura (keep the temperature)
and costruzione (construction) in the
sentence above. Given a sequence X = x1, …, xT of
T words, the ATE task can be formulated as a
token/word-level sequence labeling problem:
predict an aspect label sequence Y = y1, …, yT,
where each yi comes from the finite label set
{B, I, O}, which describes the possible aspect
labels (begin, inside, outside). An example of
ATE annotation is provided in Fig. 1.</p>
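      <p>As a concrete illustration, the BIO formulation above can be sketched in a few lines of Python; the tokenization, the span indices, and the helper name spans_to_bio are our own illustrative choices, not part of the released data.

```python
# A minimal sketch of the BIO formulation of the ATE subtask.
# Tokens are taken from the example review in the paper; the
# token-span representation is an assumption for illustration.
tokens = ["La", "borraccia", "termica", "svolge", "egregiamente", "il",
          "proprio", "compito", "di", "mantenere", "la", "temperatura"]

# Aspect term "mantenere la temperatura" covers token indices 9-11.
aspect_spans = [(9, 12)]  # half-open [start, end) token ranges

def spans_to_bio(n_tokens, spans):
    """Convert token-level aspect spans to a B/I/O label sequence."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

print(spans_to_bio(len(tokens), aspect_spans))
# last three labels are B, I, I; everything before is O
```
</p>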
      <p>In the Aspect-based Sentiment Analysis
(ABSA) task, the polarity of each expressed aspect
is recognized, e.g. a positive category
polarity is expressed concerning the mantenere la
temperatura aspect. The two labels are not
mutually exclusive: in addition to the annotation
of positive aspects (POS:true, NEG:false) and
negative aspects (POS:false, NEG:true), there can be
aspects with mixed polarity (POS:true, NEG:true),
or neutral polarity (POS:false, NEG:false). An
example of ABSA annotation is shown in Tab. 1.</p>
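      <p>A minimal sketch of the four polarity classes induced by the two binary labels described above; the helper name polarity_class is ours.

```python
# Map the two binary polarity labels (POS, NEG) to the four
# polarity classes described in the task definition.
def polarity_class(pos, neg):
    """Return the polarity class for a (POS, NEG) flag pair."""
    return {(True, False): "positive",
            (False, True): "negative",
            (True, True): "mixed",
            (False, False): "neutral"}[(pos, neg)]

print(polarity_class(True, False))  # positive
print(polarity_class(True, True))   # mixed
```
</p>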
      <p>In the Sentiment Analysis (SA) task, the
polarity of the review is provided. In particular, we
decided to use the score left by the user for the item
as the polarity value. It is defined as an integer
in the [1, 5] range. An example is
provided in Tab. 2.</p>
      <p>Tab. 2. Review: “La borraccia termica svolge egregiamente
il proprio compito di mantenere la temperatura, calda o fredda
che sia. La costruzione è ottimale e ben rifinita. Acquisto
straconsigliato!” Score: 5</p>
      <p>In the ATE task here described, the set of
aspects is not defined in advance, and the task
itself is formalized as a Sequence Labeling task.
The ABSA task can, instead, be formalized as a
multi-class classification task. Finally,
Sentiment Analysis is treated as a regression task.
For each review, participants will be asked to
return a vector of aspects, a vector of aspect:polarity
pairs, and a review:score pair. Two binary
polarity labels are expected for each aspect: POS and
NEG, indicating a positive and negative sentiment
expressed towards a specific aspect, respectively.
The participants may choose to submit each of the
three subtasks independently.
</p>
    </sec>
    <sec id="sec-5">
      <title>3 Dataset</title>
      <p>The data source chosen for creating the datasets is
a world-famous eCommerce platform. The
platform allows users to share their opinions about
the items that they bought through a textual
review and a final score of satisfaction. Therefore,
the website provides a large number of reviews in
many languages, including Italian (Fig. 2). We
have collected 4364 real user reviews, written in
the Italian language, involving 23 products. The
training, dev and test sets were randomly
generated in the following ratios: 70% training, 2.5%
dev, 27.5% test. This means that the test set
is not out-of-domain. The items cover very
different domains of use. In particular, the
existing objects refer to: SD Memory Cards, Irons,
Water Bottles, Action Cameras, Razors, Phones,
Printer Cartridges, Coffee Capsules, Backpacks,
Hair Dryers, 2 different Movies, 2 different Books,
Toy Phones, Car Light bulbs, Sweatshirts, Boots,
Fans, Storage Chest, Shoe Cabinets, Personal
Digital Assistants, TV streaming boxes/sticks. A
portion of the collected data has been manually
annotated by three different subjects. Then, we
measured the inter-annotator agreement as
an indicator of the quality of the annotations. In
particular, we obtained a score of 73.2% over 100
reviews. Given this good score, we decided
to continue the annotation process by annotating
each review individually (i.e. one annotator per
review). At the end of the annotation process, we
obtained the gold annotated dataset. We randomly
split the gold dataset to create a
training/validation/test partition of it.</p>
      <p>We do not provide any unique ID that could be
used to retrieve more information about the
writers. Consequently, we neither violate copyrights
nor raise privacy issues.
Furthermore, in order to avoid harming the interests of the
manufacturers, we do not disclose any information
about the specific items for which the reviews have
been issued.</p>
      <p>The data format used is NDJSON 1 with UTF-8
encoding and newline as delimiter. Note that some
reviews may not contain any aspect, but the final
review score is always available. An example of
annotated data is shown in Fig. 3.</p>
      <sec id="sec-5-1">
        <title>1http://ndjson.org/</title>
        <p>{"sentence":"L’attore...e le musiche indimenticabili", "id_sentence":"4c0b","score":5,
"polarities":[[0,0],[1,0]], "aspects_position":[[2,8],[16,23]], "aspects":["attore","musiche"]}
{"sentence":"Schermo guasto dopo appena due settimane,...","id_sentence":"4e1671","score":1,
"polarities":[[0,1]],"aspects_position":[[0,7]],"aspects":["Schermo"]}
{"sentence":"Ottimo telefono belle foto","id_sentence":"4eca9d08","score":4,"polarities":[[1,0]],
"aspects_position":[[22,26]],"aspects":["foto"]}</p>
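        <p>As an illustration, NDJSON records with the fields shown in Fig. 3 can be read with a few lines of Python; the helper name read_reviews and the file path are hypothetical.

```python
# A minimal sketch of reading NDJSON annotations: one JSON object
# per line, UTF-8 encoded, newline-delimited.
import json

def read_reviews(path):
    """Yield one review dict per NDJSON line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Each record pairs aspects with [POS, NEG] polarity flags and
# character-offset positions into the sentence:
record = json.loads('{"sentence":"Ottimo telefono belle foto",'
                    '"score":4,"polarities":[[1,0]],'
                    '"aspects_position":[[22,26]],"aspects":["foto"]}')
for aspect, (pos, neg) in zip(record["aspects"], record["polarities"]):
    print(aspect, "POS" if pos else "", "NEG" if neg else "")
```
</p>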
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4 Annotation Schema</title>
      <p>This section describes the protocol that will be
used to annotate the datasets for the three subtasks.
The objective of this protocol is to get a reasonably
objective definition of the characteristics of an
aspect term. Due to the highly subjective nature of
aspects, it does not encompass all conceivable
aspect terms. We define an aspect term as:
(a) An attribute (characteristic, property,
feature, quality) of the object itself; (b) a tangible or
abstract part of the object, for which an opinion
can be inferred from the review; (c) the activities
that the object is able (or not able) to perform; (d)
the object’s ability to be suitable for certain
categories of people.</p>
      <p>Judgment can be assigned in three ways: 1.
Directly: the aspect term occurs with an opinion term
(i.e., “la durata della batteria e` ottima”); 2.
Indirectly: the judgment about the product is transferred
to a quality or part of the object. In other words, if
an opinion is expressed about the object itself, and
it is then stated for which characteristic the
judgment is applied, these characteristics are annotated
as an aspect term (i.e., “questo telefono e` ottimo,
soprattutto per la durata della batteria”); 3.
Inferable: the opinion is not expressed directly but
it is inferable from the review or from the
knowledge of the reference domain.</p>
      <p>The aspect term must represent the product
characteristics, but it cannot represent a concept
that is larger than the product itself. An aspect
term does not identify opinions regarding
elements external to the object, such as: (a) The
shipment (it is not an intrinsic property of the object);
(b) the company that produced it, the series to
which the product belongs or other products with
which the object is compared; (c) the elements that
refer to the action of purchasing the item; (d) the
elements that refer to the customer care.
Moreover, in the case of aspect terms composed of
several words, all the words that make up the aspect
term must be contiguous. If they are
separated by one or more words that are not part of
the aspect term, the whole expression is discarded.
More details and examples of annotations are
available on the task website (http://www.di.uniba.it/~swap/ate_absita/examples.html).</p>
    </sec>
    <sec id="sec-7">
      <title>5 Evaluation measures and baselines</title>
      <p>We evaluate the three subtasks (ATE, ABSA and
SA) separately by comparing the results obtained
by the participant systems on the gold standard
annotations of the test set.</p>
      <p>For the ATE task, we compute Precision, Recall,
and F1-score, defined as follows.</p>
      <p>F1a = 2 · Pa · Ra / (Pa + Ra)</p>
      <sec id="sec-7-1">
        <title>Precision, Recall and F1 with partial matches</title>
        <p>In order to account for both exact and partial
matches of aspect terms, we define Precision (Pa)
and Recall (Ra) as:</p>
        <p>Pa = (|Sa ∩ Ga| + 0.5 · |PARa|) / |Sa|</p>
        <p>Ra = (|Sa ∩ Ga| + 0.5 · |PARa|) / |Ga|.
Here, Sa is the set of aspect term annotations that
a system returned for all the test sentences, Ga
is the set of gold (correct) aspect term
annotations, and PARa is the set of partial matches
(predicted and gold aspect terms have some
overlapping text). For instance, if a review is
labeled in the gold standard with the two aspect terms
Ga = {costruzione, mantenere la temperatura},
and the system predicts the two aspects
Sa = {costruzione, temperatura}, we have
|Sa ∩ Ga| = 1, |PARa| = 1, |Ga| = 2 and |Sa| = 2,
so that Pa = 1.5/2 = 0.75, Ra = 1.5/2 = 0.75 and
F1a = 0.75. For the ATE task, we
considered a simple baseline approach that
treats every named entity as an aspect term. The
algorithm is based on the Named Entity Recognition
(NER) annotation obtained through the spaCy3
tool with the Italian model ’it_core_news_sm’. The
implementation of the baseline on the training set
is available as a Python 3 Notebook on our website.</p>
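        <p>A sketch of the ATE metric with partial-match credit, following the definitions of Pa, Ra and F1a above; comparing aspect terms as character spans and counting any text overlap as a partial match is our reading of the paper, and the helper name ate_scores and the span offsets are illustrative.

```python
# ATE evaluation with 0.5 credit for partial matches, following
# Pa = (|S∩G| + 0.5|PAR|)/|S| and Ra = (|S∩G| + 0.5|PAR|)/|G|.
def ate_scores(predicted, gold):
    """predicted, gold: sets of (start, end) character spans."""
    exact = len(predicted & gold)
    # Partial matches: non-exact predicted spans overlapping a
    # non-exact gold span (assumed overlap criterion).
    partial = sum(
        1 for p in predicted - gold
        if any(p[0] < g[1] and g[0] < p[1] for g in gold - predicted)
    )
    precision = (exact + 0.5 * partial) / len(predicted) if predicted else 0.0
    recall = (exact + 0.5 * partial) / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Worked example from the text, with illustrative character offsets:
gold = {(0, 11), (20, 44)}    # costruzione, mantenere la temperatura
pred = {(0, 11), (34, 44)}    # costruzione, temperatura (partial)
print(ate_scores(pred, gold))  # (0.75, 0.75, 0.75)
```
</p>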
        <p>For the ABSA task (Task 2), we evaluate
the entire chain, thus considering the
aspect terms detected in the sentences together with
their corresponding polarities, in the form of
(aspect, polarity) pairs. We again compute
Precision (Pp), Recall (Rp) and F1-score (F1p),
defined analogously to the ATE metrics:
F1p = 2 · Pp · Rp / (Pp + Rp),
Pp = (|Sp ∩ Gp| + 0.5 · |PARp|) / |Sp|,
Rp = (|Sp ∩ Gp| + 0.5 · |PARp|) / |Gp|,
where Sp is the set of (aspect, polarity) pairs that
a system returned for all the test sentences, Gp is
the set of gold (correct) pairs, and PARp is the set
of (aspect, polarity) pairs
with a partial match. For instance, if a review is
labeled in the gold standard with the pairs
Gp = {(mantenere la temperatura, POS),
(costruzione, POS)},
and the system predicts the three pairs
Sp = {(temperatura, NEG), (costruzione, POS),
(acquisto, POS)},
we have |Sp ∩ Gp| = 1, |PARp| = 0,
|Gp| = 2 and |Sp| = 3, so that Pp = 1/3, Rp = 1/2
and F1p = 0.4. As a baseline for the ABSA task,
we decided to assign the most frequent polarity
class (i.e. the positive one) to each aspect found
by the baseline strategy for Task 1.</p>
        <p>
          To evaluate the SA task (Task 3), we compute
the Root Mean Squared Error (RMSE) between
the scores predicted by the participant systems and
those found in the gold dataset. For this task, we
employed three different baselines. The first
predicts the most frequent value in the training set:
5. The second predicts the average value of the
scores found in the training set (4.46299). The
third one uses AlBERTo
          <xref ref-type="bibr" rid="ref15">(Polignano et al., 2019)</xref>
to address the task as a regression problem.
        </p>
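        <p>The SA evaluation and the first two baselines can be sketched as follows; the gold scores used here are illustrative, not the real test data.

```python
# RMSE between predicted and gold review scores, plus the two
# trivial baselines described above: most-frequent and mean score.
import math

def rmse(predicted, gold):
    """Root Mean Squared Error between two equal-length score lists."""
    assert len(predicted) == len(gold)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(predicted, gold))
                     / len(gold))

gold = [5, 4, 1, 5, 3]                             # illustrative scores
most_frequent = [5] * len(gold)                    # first baseline
mean_score = [sum(gold) / len(gold)] * len(gold)   # second baseline
print(rmse(most_frequent, gold), rmse(mean_score, gold))
```
</p>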
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6 Task statistics</title>
      <p>The task has generated great interest in the
scientific community. We obtained 27 registered teams,
for a total of 45 separate participants.
Nevertheless, the difficulty of the task discouraged many
of them. At the end of the evaluation phase, we
obtained 8 submissions from 3 different teams.
</p>
    </sec>
    <sec id="sec-9">
      <title>7 Submitted systems</title>
      <p>
        The three teams participating in the task are the
following:
• A2C
        <xref ref-type="bibr" rid="ref11 ref20">(Rosa and Durante, 2020)</xref>
        : the
team is composed of two members of the
App2Check company, who developed a
classification model based on state-of-the-art
language models. In particular, they
investigate the ATE task through the use of four
different configurations of language models:
(1) native Italian pre-trained language
models with no specific NER fine-tuning and
(3) with NER fine-tuning; (2) multilingual
pre-trained language models with no
specific NER fine-tuning and (4) with NER
fine-tuning. For the first and the third
configurations, they considered
dbmdz/bert-base-
      </p>
      <sec id="sec-9-1">
        <title>ABSA metric definitions</title>
        <p>F1p = 2 · Pp · Rp / (Pp + Rp), with
Pp = (|Sp ∩ Gp| + 0.5 · |PARp|) / |Sp| and
Rp = (|Sp ∩ Gp| + 0.5 · |PARp|) / |Gp|,
where Sp is the set of (aspect, polarity) pairs
that a system returned for all the test sentences,
Gp is the set of gold (correct) pair annotations,
and PARp is the set of (aspect, polarity) pairs
with a partial match.</p>
        <sec id="sec-9-1-1">
          <title>3https://spacy.io/</title>
          <p>
            italian-xxl-uncased4 and GilBERTo5. For the
second configuration, they considered two
implementations of RoBERTa:
xlm-roberta-large
            <xref ref-type="bibr" rid="ref8">(Conneau et al., 2019)</xref>
            , xlm-roberta-base
            <xref ref-type="bibr" rid="ref12">(Liu et al., 2019)</xref>
            , and multilingual
BERT
            <xref ref-type="bibr" rid="ref14">(Pires et al., 2019)</xref>
            . The XLM-RoBERTa
Large multilingual model was chosen as the
competition model. The ABSA task has
been performed by fine-tuning a multilingual
BERT model in order to assign the polarity
label to each portion of text that contains at
least one previously detected aspect.
Similarly, the SA task has been approached using
a multilingual BERT model on a 1 to 5
sentiment scale. The system submitted by the
A2C team obtained the best results overall.
• SentNa
            <xref ref-type="bibr" rid="ref11 ref20">(Mele and Vettigli, 2020)</xref>
            :
the authors proposed a hybrid model that
joins rule-based and machine learning
methodologies in order to combine their
respective advantages. The main idea for
dealing with the ATE task is to identify a
set of plausible aspects via some predefined
rules. Then, a classifier is used to filter
out the wrong candidates. The rules are
defined on POS-Tagging patterns. The
authors defined a set of about 3000 rules.
The sentiment analysis problem has been
solved by building the features representing
the text using n-grams, and adding a set of
features annotated in SenticNet
            <xref ref-type="bibr" rid="ref7">(Cambria et
al., 2010)</xref>
            . Then, a regressor composed of
800 Decision Trees with 4 layers has been
trained using Gradient Boosting. The final
prediction is computed by averaging the
output of each tree.
• ghostwriter19
            <xref ref-type="bibr" rid="ref5">(Bennici, 2020)</xref>
            : the team,
composed of one member of the
YouAreMyGuide company, proposes a solution
based on mixing transfer learning, zero-shot
learning
            <xref ref-type="bibr" rid="ref6">(Brown et al., 2020)</xref>
            , and ONNX6,
in order to access the power of BERT while
using limited resources. In order to deal
with the ATE and ABSA tasks, the author
uses the AlBERTo
            <xref ref-type="bibr" rid="ref15">(Polignano et al., 2019)</xref>
            language model and an auto training system
          </p>
        </sec>
        <sec id="sec-9-1-2">
          <title>4https://github.com/dbmdz/berts</title>
        </sec>
        <sec id="sec-9-1-3">
          <title>5https://github.com/idb-ita/GilBERTo</title>
        </sec>
        <sec id="sec-9-1-4">
          <title>6https://microsoft.github.io/onnxruntime/</title>
          <p>
such as Ktrain7 for fine-tuning the system. At
this point, the model has been exported with
ONNX in maximum compatibility mode with
the original. The optimization options have
been set to a minimum for CPU usage. The
performances have remained unchanged, but
the speed of inference has significantly
improved. For the sentiment analysis task, the
author uses a zero-shot learning strategy as a
way to make predictions without prior
training. In particular, he reuses the embedding of
AlBERTo for encoding the sentence and a
BiLSTM as classification model to predict a
class from 1 to 5.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>8 Discussion of results</title>
      <p>The results in tables from 3-5 show the optimal
performances of the system developed by the A2C
team, which obtained first place in all three
subtasks. The use of pre-trained language models has
proven to be the winning strategy. In particular,
the differences between the results of A2C and
ghostwriter19 show how a large RoBERTa model
can strongly outperform a smaller language model
such as AlBERTo, even though the latter has been
specifically trained on the Italian language. This
result was expected, since the AlBERTo baseline
also obtained low results. We hypothesize that
the difference in style between the tweets that were
used to train AlBERTo and the reviews contained
in this dataset is a significant factor in the low
applicability of this model. Additionally, the results
obtained by the A2C system also show that</p>
      <sec id="sec-10-1">
        <title>7https://github.com/amaiya/ktrain</title>
        <p>pre-training the language model for the Named Entity
Recognition (NER) task is also useful for
identifying aspect terms. This is because aspect
terms share some properties with named entities.
For example, they are often configured as a noun,
an adjective, or a combination of both.</p>
        <p>The results obtained by SentNa are also
interesting. Their model, which is based on decision
trees, has obtained a good final score for the SA
task. This confirms the findings obtained in
earlier Sentiment Analysis tasks in Italian campaigns
such as EVALITA, which already demonstrated
that techniques such as Decision Trees, Random
Forests, and SVMs can be effective solutions to this
task. Moreover, the SentNa system
demonstrates that an enriched encoding of the sentences,
including lexical features such as polarity value,
attention, pleasantness, and sensitivity of its
composing n-grams, can support a more accurate
prediction of the whole sentence polarity.
</p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>9 Conclusion</title>
      <p>In the ATE ABSITA task at EVALITA 2020, we
focused the attention of research groups that work
on computational linguistics for the Italian
language on the problem of analyzing user reviews.
Specifically, we subdivided the problem into three
parts: Aspect Term Extraction (ATE),
Aspect-Based Sentiment Analysis (ABSA), and Sentence
Sentiment Analysis (SA). In the ATE task, the goal
was to identify one or more “aspect terms”
discussed in the review. The second task was about
identifying the sentiment evoked by the user while
talking about a specific aspect (ABSA). Finally,
we asked participants to identify the polarity
associated with the entire review (SA). The dataset we
released has been collected from a world-famous
eCommerce platform. In particular, we extracted
and manually annotated 4364 real user reviews,
written in the Italian language, about 23
different products. Although the results obtained by
the systems that participated in the task are very
close to those available in the English language
literature, the F1 scores for the ATE and ABSA
subtasks demonstrate its complexity. It is evident
that an F1 score of about 0.60 leaves a
non-negligible margin of prediction error. The
diversity in terms, linguistic expressions, and in the
physical characteristics of the products themselves
makes the automatic extraction of “aspect terms” a
task that is far from being resolved. This
complexity can also explain the low number of
participants. It is easy to see a substantial
discrepancy between the number of people enrolled in the
task and those who have proposed a solution for
it. In our opinion, this is caused by the difficulty
in addressing the problem with the current natural
language analysis techniques. However, this also
means that there is still a wide margin for
improvement in this area, and that this problem can be
addressed again in the next edition of EVALITA. We
firmly believe that extracting fine-grained opinions
from user reviews can be a great asset for
improving products, processes, and software systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Andrea Bolioli, Malvina Nissim, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Overview of the evalita 2014 sentiment polarity classification task</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Annalina Caputo, Anna Lisa Gentile, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Rizzo</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Overview of the evalita 2016 named entity recognition and linking in italian tweets (neel-it) task</article-title>
          .
          <source>Proceedings of the Final Workshop, 7 December</source>
          <year>2016</year>
          , Naples, page
          <volume>40</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          , Danilo Croce, Maria Di Maro, and
          <string-name>
            <surname>Lucia</surname>
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Passaro</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          et al.
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 aspect-based sentiment analysis task (absita)</article-title>
          .
          <source>EVALITA Evaluation of NLP and Speech Tools for Italian</source>
          ,
          <volume>12</volume>
          :
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Mauro</given-names>
            <surname>Bennici</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ghostwriter19 @ ATE ABSITA: Zero-Shot and ONNX to speed up BERT on sentiment analysis tasks at EVALITA 2020</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Tom B.</given-names>
            <surname>Brown</surname>
          </string-name>
          , Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry,
          <string-name>
            <given-names>Amanda</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>Language models are few-shot learners</article-title>
          . arXiv preprint arXiv:2005.14165.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Erik</given-names>
            <surname>Cambria</surname>
          </string-name>
          , Robert Speer, Catherine Havasi, and
          <string-name>
            <given-names>Amir</given-names>
            <surname>Hussain</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Senticnet: A publicly available semantic resource for opinion mining</article-title>
          .
          <source>In AAAI fall symposium: commonsense knowledge</source>
          , volume
          <volume>10</volume>
          . Citeseer.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          , Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unsupervised cross-lingual representation learning at scale</article-title>
          . arXiv preprint arXiv:1911.02116.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Lorenzo</given-names>
            <surname>De Mattei</surname>
          </string-name>
          , Graziella De Martino, Andrea Iovine, Alessio Miaschi, Marco Polignano, and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Rambelli</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ATE ABSITA@EVALITA2020: Overview of the Aspect Term Extraction and Aspect-based Sentiment Analysis Task</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In NAACL-HLT (1).</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          Antonio Sorgente, Francesco Mele, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Vettigli</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>SentNA@ATE ABSITA: Sentiment Analysis of customer reviews using Boosted Trees with lexical and lexicon-based features</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          . arXiv preprint arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Bing</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Web data mining</article-title>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Telmo</given-names>
            <surname>Pires</surname>
          </string-name>
          , Eva Schlinger, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Garrette</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>How multilingual is multilingual bert?</article-title>
          arXiv preprint arXiv:1906.01502.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Polignano</surname>
          </string-name>
          , Pierpaolo Basile, Marco de Gemmis, Giovanni Semeraro, and
          <string-name>
            <given-names>Valerio</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Alberto: Italian bert language understanding model for nlp challenging tasks based on tweets</article-title>
          .
          <source>In Proceedings of the Sixth Italian Conference on Computational Linguistics</source>
          (CLiC-it
          <year>2019</year>
          ). CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Pontiki</surname>
          </string-name>
          et al.
          <year>2014</year>
          .
          <article-title>Semeval-2014 task 4: Aspect based sentiment analysis</article-title>
          .
          <source>In Proceedings of the 8th International Workshop on Semantic Evaluation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Pontiki</surname>
          </string-name>
          et al. 2016a.
          <article-title>Semeval-2016 task 5: Aspect based sentiment analysis</article-title>
          .
          <source>In 10th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Maria</given-names>
            <surname>Pontiki</surname>
          </string-name>
          et al. 2016b.
          <article-title>SemEval-2016 task 5: Aspect based sentiment analysis</article-title>
          .
          <source>In Proceedings of the 10th International Workshop on Semantic Evaluation</source>
          , SemEval '16, San Diego, California, June. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jeff Wu, Rewon Child, David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Di Rosa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Durante</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>App2Check@ATE ABSITA 2020: Aspect Term Extraction and Aspect-based Sentiment Analysis</article-title>
          . In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors,
          <source>Proceedings of the 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2020)</source>
          , Online. CEUR.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>