=Paper=
{{Paper
|id=Vol-2693/invited2
|storemode=property
|title=Universal, Unsupervised (rule-based), Uncovered Sentiment Analysis
|pdfUrl=https://ceur-ws.org/Vol-2693/invited2.pdf
|volume=Vol-2693
|authors=Carlos Gómez-Rodríguez
|dblpUrl=https://dblp.org/rec/conf/ecai/Gomez20
}}
==Universal, Unsupervised (rule-based), Uncovered Sentiment Analysis==
Proceedings of the Workshop on Hybrid Intelligence for Natural Language Processing Tasks (HI4NLP), co-located with ECAI 2020
Santiago de Compostela, August 29, 2020, published at http://ceur-ws.org
Syntactically Enriched Multilingual Sentiment Analysis
Carlos Gómez-Rodríguez
Universidade da Coruña, CITIC
Elviña, 15071 A Coruña, Spain
carlos.gomez@udc.es
Abstract
Sentiment analysis of natural language texts needs to deal with linguistic phenomena like negation, intensification or adversative clauses. In this talk, I present an approach to tackle such phenomena by means of syntactic information. Our approach combines machine learning and symbolic processing: the former is used to obtain dependency trees for input sentences, and the latter to obtain the sentiment polarity for each sentence using handwritten rules that traverse the tree. Thanks to universal guidelines for syntactic annotation, our approach is applicable to multiple languages without rewriting the rules. Additionally, very accurate parsing is not needed for our approach to be helpful: fast and simple parsers will do, even if they lag behind state-of-the-art accuracy.
The contributions presented in this talk are joint work with David Vilares, Miguel A. Alonso and Iago Alonso-Alonso.
1 Background
Polarity classification is a basic task in sentiment analysis of natural language texts, which consists in determining
whether the opinion expressed in a piece of text is positive, negative or neutral. Since the seminal work by Pang
et al. [3], many approaches have addressed the task by using machine learning models on features extracted
from individual words or n-grams. However, a sentence is more than a set of words. For example, the following
sentences contain the same words, but have opposite polarities:
This phone is expensive, and not really very good. / This phone is good, and not really very expensive.
What sets them apart is word order, which in turn determines their syntactic structure, i.e., the way in which
words interact with each other to form meaningful sentences. Namely, their polarities differ mainly due to the
different roles played by the negation “not”. In the first example, it modifies the positive word “good”, making
the phone “not good” and thus expressing a negative view, while in the second example, it modifies the negative
word “expensive”, making it “not expensive”, which is positive. Since syntactic relations like these can happen
at arbitrarily long distances, n-grams cannot always capture them – but a syntactic dependency tree can.
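As a concrete illustration (my own addition, not part of the original system, which works with any off-the-shelf dependency parser), the sketch below uses spaCy to parse both sentences and report which word the negation “not” attaches to; the choice of spaCy and of the en_core_web_sm model is an assumption made only for this example:

```python
# Illustrative only: any off-the-shelf dependency parser would serve.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

sentences = [
    "This phone is expensive, and not really very good.",
    "This phone is good, and not really very expensive.",
]

for text in sentences:
    doc = nlp(text)
    for token in doc:
        if token.lower_ == "not":
            # The syntactic head reveals which word the negation modifies,
            # regardless of how far away it is in the surface string.
            print(f"{text!r}: 'not' attaches to '{token.head.text}'")
```

Whatever parser is used, the key point is that the attachment of “not” is recovered from the tree rather than from adjacency in the word sequence.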
Syntax can be incorporated into a sentiment analysis model in various ways: for example, by decomposing
trees into features for a supervised classifier [2] or by training a neural network on a syntactic treebank augmented
with sentiment information [4]. Here, I present a different approach, where a dependency parse tree is used in
conjunction with sentiment lexicons to propagate polarity according to a set of handwritten rules.
2 Sentiment analysis with syntactic rules
Our approach is described in detail in [6] (for Spanish) and [7] (for multiple languages). In brief, given a sentence,
our sentiment analysis process has the following steps:
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
• The sentence is tokenized, part-of-speech tagged and parsed to a syntactic dependency tree. Any off-the-shelf
dependency parser can be used for this purpose.
• Standard third-party sentiment lexicons are used to assign individual polarities to subjective words (for
example, on a scale from -5 to +5, the word “excellent” could be assigned a polarity of +5, and “good” could
be assigned +3).
• Handwritten rules are used to propagate the polarity in a top-down fashion, from individual words to the
root of the tree (which yields the global polarity of the sentence). Rules are written to deal with syntactic
phenomena that influence polarity. For example, given a negation, we use rules on the dependency tree to
identify the scope of the negation, and we modify the polarity of the negated element by subtracting 4 if it is
positive, or adding 4 if it is negative (e.g. “not good” will be assigned 3 − 4 = −1). Rules are also written to deal
with adversative clauses, intensifiers and conditional statements (a minimal sketch of this propagation follows the list).
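The following is a minimal, self-contained Python sketch of this kind of rule-based propagation over a dependency tree. It is not the authors' implementation: the lexicon entries, intensifier weights and tree representation are simplified stand-ins, and only the ±4 negation shift mirrors the example above.

```python
# A minimal, illustrative sketch of rule-based polarity propagation over a
# dependency tree. Not the authors' implementation: lexicon entries and
# intensifier weights are invented; only the +/-4 negation shift follows
# the example in the text.
from dataclasses import dataclass, field
from typing import List

LEXICON = {"good": 3.0, "excellent": 5.0, "expensive": -2.0}  # toy entries
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.3, "really": 1.2}                   # toy weights


@dataclass
class Node:
    form: str
    children: List["Node"] = field(default_factory=list)


def polarity(node: Node) -> float:
    """Compute the polarity of the subtree rooted at this node."""
    score = LEXICON.get(node.form.lower(), 0.0)
    child_words = [c.form.lower() for c in node.children]

    # Intensifiers amplify the head word's own polarity.
    for word in child_words:
        if word in INTENSIFIERS:
            score *= INTENSIFIERS[word]

    # Negation shifts the polarity 4 points towards the opposite sign
    # (e.g. "not good": 3 - 4 = -1).
    if any(word in NEGATORS for word in child_words):
        score += -4 if score > 0 else 4

    # Remaining dependents contribute their own propagated polarity.
    for child in node.children:
        word = child.form.lower()
        if word not in NEGATORS and word not in INTENSIFIERS:
            score += polarity(child)
    return score


# "not really very good" with "good" as the syntactic head of the phrase:
tree = Node("good", [Node("not"), Node("really"), Node("very")])
print(round(polarity(tree), 2))  # small positive value after the -4 shift
```

For the phrase “not really very good”, headed by “good”, the sketch first amplifies the lexical score of “good” and then applies the negation shift; a real system would additionally determine the negation scope from the tree and handle adversatives and conditionals, as described above.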
Two conditioning factors need to be taken into account when writing the rules to ensure that the system
generalizes across languages. First, rules depend on the syntactic annotation criteria, but the emergence of
universal annotation guidelines that apply cross-linguistically, such as Universal Dependencies, means that
this is not an obstacle to multilinguality. Second, lists of negation words, intensifiers, adversative conjunctions
and words introducing conditionals are needed to make the rules applicable to each language, but these are
available with standard sentiment lexicons.
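To make this concrete, here is a small illustrative sketch (again my own, not the authors' code): the negation rule is written once against a Universal Dependencies relation label (negation markers such as “not” typically attach as advmod in UD v2), and only the language-specific cue lists change; the lists below are toy examples rather than the ones shipped with the lexicons.

```python
# Illustrative sketch: one universal rule, per-language cue lists.
from collections import namedtuple

# Minimal token representation: surface form plus its UD dependency relation.
Token = namedtuple("Token", ["form", "deprel"])

# Toy per-language negation cue lists; in practice these come with
# standard sentiment lexicons.
NEGATION_CUES = {
    "en": {"not", "no", "never"},
    "es": {"no", "nunca", "jamás"},
}

def is_negation(token: Token, lang: str) -> bool:
    # The rule refers only to the universal relation label (advmod);
    # switching languages just switches the cue list.
    return token.deprel == "advmod" and token.form.lower() in NEGATION_CUES.get(lang, set())

print(is_negation(Token("not", "advmod"), "en"))    # True
print(is_negation(Token("nunca", "advmod"), "es"))  # True
```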
Our experiments show that our approach outperforms existing unsupervised alternatives, as well as supervised
models when evaluated outside of their corpus of origin. Using a high-accuracy parser is not vital [1], as we
obtain good results even with modest parsing accuracy. Thus, very fast parsers, like those based on sequence
labeling [5], can be used to obtain an overall efficient and green sentiment analysis system.
Acknowledgements
This work has received funding from the European Research Council (ERC), under the European Union’s Horizon
2020 research and innovation programme (FASTPARSE, grant agreement No 714150), from the ANSWER-ASAP
project (TIN2017-85160-C2-1-R) from MINECO, and from Xunta de Galicia and ERDF (ED431B 2017/01,
ED431G 2019/01).
References
[1] Carlos Gómez-Rodríguez, Iago Alonso-Alonso, and David Vilares, ‘How important is syntactic parsing accuracy? An empirical evaluation on rule-based sentiment analysis’, Artificial Intelligence Review, 52(3), 2081–2097, (Oct 2019).
[2] Mahesh Joshi and Carolyn Penstein-Rosé, ‘Generalizing dependency features for opinion mining’, in Proc. of ACL-IJCNLP Short Papers, pp. 313–316, Suntec, Singapore, (August 2009). ACL.
[3] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, ‘Thumbs up? Sentiment classification using machine learning techniques’, in Proc. of EMNLP, pp. 79–86. ACL, (July 2002).
[4] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts, ‘Recursive deep models for semantic compositionality over a sentiment treebank’, in Proc. of EMNLP, pp. 1631–1642, Seattle, Washington, USA, (October 2013). ACL.
[5] Michalina Strzyz, David Vilares, and Carlos Gómez-Rodríguez, ‘Viable dependency parsing as sequence labeling’, in Proc. of NAACL, pp. 717–723, Minneapolis, Minnesota, (June 2019). ACL.
[6] David Vilares, Miguel A. Alonso, and Carlos Gómez-Rodríguez, ‘A syntactic approach for opinion mining on Spanish reviews’, Natural Language Engineering, 21(01), 139–163, (2015).
[7] David Vilares, Carlos Gómez-Rodríguez, and Miguel A. Alonso, ‘Universal, unsupervised (rule-based), uncovered sentiment analysis’, Knowledge-Based Systems, 118, 45–55, (2017).