=Paper=
{{Paper
|id=None
|storemode=property
|title=Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results
|pdfUrl=https://ceur-ws.org/Vol-1422/95.pdf
|volume=Vol-1422
|dblpUrl=https://dblp.org/rec/conf/itat/TamchynaFV15
}}
==Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results==
J. Yaghob (Ed.): ITAT 2015, pp. 95–99, Charles University in Prague, Prague, 2015

Aleš Tamchyna, Ondřej Fiala, Kateřina Veselovská
Charles University in Prague, Faculty of Mathematics and Physics
Institute of Formal and Applied Linguistics
Malostranské náměstí 25, Prague, Czech Republic
{tamchyna,fiala,veselovska}@ufal.mff.cuni.cz

Abstract: This work focuses on aspect-based sentiment analysis, a relatively recent task in natural language processing. We present a new dataset for Czech aspect-based sentiment analysis which consists of segments from user reviews of IT products. We also describe our work in progress on the task of aspect term extraction. We believe that this area can be of interest to other workshop participants and that this paper can inspire a fruitful discussion on the topic with researchers from related fields.

1 Introduction

Sentiment analysis (or opinion mining) is a field related to natural language processing (NLP) which studies how people express emotions (or opinions, sentiments, evaluations) in language and which develops methods to automatically identify such opinions.

The most typical task of sentiment analysis is to look at a short text (a sentence, paragraph, or short review) and determine its polarity: positive, negative or neutral.

Aspect-based sentiment analysis (ABSA) refers to discovering aspects (aspect terms, opinion targets) in text and classifying their polarity. The prototypical scenario is product reviews: we assume that products have several aspects (such as size or battery life for cellphones) and we attempt to identify users' opinions on these individual aspects. This is a more fine-grained approach than the standard formulation of sentiment analysis, where the goal would be to classify the polarity of entire sentences (or even whole reviews) without regard for internal structure.

Recently, ABSA has been gaining researchers' interest, as evidenced e.g. by the two consecutive shared tasks organized within SemEval in 2014 and 2015 [7, 6].

ABSA can be roughly divided into two subtasks: (i) identification of aspects (aspect term extraction), i.e. marking (occurrences of) words which are evaluated; and (ii) polarity classification, i.e. deciding whether the opinions about the identified words are positive, negative or neutral.

In this work, we introduce a new Czech dataset of product reviews annotated for ABSA and describe a preliminary method of aspect term identification which combines a rule-based approach and machine learning.

2 Dataset of IT Product Reviews

We downloaded a number of user product reviews which are publicly available on the website of an established Czech online shop selling electronic devices. Each review lists negative and positive aspects of the product; this setting pushes customers to rate the product's important characteristics.

The dataset consists of two parts: (i) random short segments and (ii) the longest reviews. The difference in length is also reflected in the use of language.

The first part of the dataset contains 1000 positive and 1000 negative review segments selected from the source data; their targets were manually tagged. These targets were either aspects of the evaluated product or some general attributes (e.g. price, ease of use). The polarity of each aspect is based on whether the user submitted the segment as negative or positive. These short reviews often contain only the aspect without any evaluative phrase.

The second part of the dataset consists of the longest reviews; we chose 100 of them for each polarity. These reviews represent more usual text and tend to keep proper sentence structure. The longest review has 7057 characters.

The whole dataset provides a consistent view of the language used in the on-line environment, preserving both specific word forms and language structures. There is also a large amount of domain-specific slang due to the origin of the text.

  Dataset part        #targets   #reviews   Avg. length
  Random, positive         640       1000         34.17
  Random, negative         508       1000         39.72
  Longest, positive        484        100        953.35
  Longest, negative        353        100        855.04

Table 1: Statistics of the annotated data.

The data was annotated by a single annotator. The basic instruction was to mark all aspects or general characteristics of the product. The span of the annotated term should be as small as possible (often a single noun). For evaluation, the span can be expanded, e.g. to the immediate dependency subtree of the target. Any part of speech can be marked; e.g. both "funkčnost" ("functionality") and "funkční" ("functional") should be marked.

The whole dataset contains 1985 target tags; 1124 of these are positive and 861 are negative. Detailed target statistics are shown in Table 1.

The dataset is freely available for download at the following URL: http://hdl.handle.net/11234/1-1507.
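To make the annotation scheme concrete, here is a minimal sketch of how such annotated segments can be represented and how Table-1-style statistics can be derived from them. The in-memory format is hypothetical (it is not the layout of the released files); the example texts are short segments of the kind described above.

```python
# Each segment: review text, polarity of the submitted segment, and character
# spans of the annotated aspect terms (kept as small as possible, per the
# annotation guidelines). This representation is our own illustration.
segments = [
    {"text": "Rychlé dodání", "polarity": "positive", "targets": [(7, 13)]},
    {"text": "Fotky fakt parádní.", "polarity": "positive", "targets": [(0, 5)]},
    {"text": "Slabá baterie", "polarity": "negative", "targets": [(6, 13)]},
]

def stats(segments, polarity):
    """Number of target tags, number of reviews, and average character length
    for one polarity, mirroring the columns of Table 1."""
    part = [s for s in segments if s["polarity"] == polarity]
    n_targets = sum(len(s["targets"]) for s in part)
    avg_len = sum(len(s["text"]) for s in part) / len(part)
    return n_targets, len(part), avg_len

print(stats(segments, "positive"))  # → (2, 2, 16.0)
```

Character spans rather than token indices keep the annotation independent of any particular tokenizer.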
3 Pipeline

Our work is inspired by the pipeline of [15]. We run morphological analysis and tagging on the data to identify the parts of speech of words and their morphological features (e.g. case or gender for Czech). We also obtain dependency parses of the sentences. Then, we use several hand-crafted syntactic rules to mark the likely aspects in the data. Figure 1 shows a sample dependency parse tree and rule application.

Unlike [15], the core of our approach is a machine-learning model; the outputs of the rules only serve as additional "hints" (features) which help the model identify aspects.

3.1 Syntactic Rules

We use the same rules as [15]; Table 2 contains their description. Here, we categorize the rules somewhat differently: their types correspond to the actual features presented to the model.

The rules are designed for opinion target identification, i.e. discovering targets of evaluative statements. (The underlying assumption of this approach is that opinion targets tend to be the sought-after aspects.) They are based on syntactic relations with evaluative words, i.e. words listed in a subjectivity lexicon for the given language. In the example in Figure 1, the rule vbnm_sb_adj is triggered because "amazing" is an evaluative word and it is a predicative adjective; the word "rice", as the subject of this syntactic construction, is then marked as a likely aspect term.

[Figure 1: parse tree not reproduced in this text version.]
Figure 1: Dependency tree for the sentence "The fried rice is amazing." Morphological tags (such as NN for nouns) and analytical functions (e.g. Sb for sentence subject) are shown in the parse tree. The positive evaluative word "amazing" triggers a rule which marks "rice" as a possible aspect.

Originally, the rules were written for English. Their adaptation to Czech proved very simple. We modified expressions which involved morphological tags to work with the Czech positional tagset [1]. Some of the rules included lexical items, such as the lemma "be" for identifying the linking verbs of predicate nominals; simple translation of these few words to Czech sufficed in such cases.

  ID               Description                                           Example
  adverb           Actor or patient of a verb with a subjective adverb.  The pizza tastes so good.
  but_opposite     Words coordinated with an aspect with "but".          The food is outstanding, but everything else sucks.
  coord            Words coordinated with an aspect are also aspects.    The excellent mussels, goat cheese and salad.
  sub_adj          Nouns modified by subjective adjectives.              A very capable kitchen.
  subj_of_pat      Subject of a clause with a subjective patient.        The bagel have an outstanding taste.
  verb_actant_pat  Patient of a transitive evaluative verb.              I liked the beer selection.
  verb_actant_act  Actor of an intransitive evaluative verb.             Their wine sucks.
  vbnm_patn        Predicative nominal (patient).                        Our favourite meal is the sausage.
  vbnm_sb_adj      Subject of a predicative adjective.                   The fried rice is amazing.

Table 2: List of syntactic rules.
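The rule application described above can be illustrated with a toy sketch. This is our own simplification, not the authors' Treex blocks: a dependency parse is represented as a list of tokens with POS tags, analytical functions, and head indices, and the small lexicon stands in for a subjectivity lexicon such as SubLex.

```python
SUBJECTIVE = {"amazing"}  # toy stand-in for a subjectivity lexicon

# "The fried rice is amazing." (head = index of the governing token, -1 = root),
# following the analytical functions shown in Figure 1.
tokens = [
    {"form": "The",     "tag": "DT",  "afun": "AuxA", "head": 2},
    {"form": "fried",   "tag": "JJ",  "afun": "Atr",  "head": 2},
    {"form": "rice",    "tag": "NN",  "afun": "Sb",   "head": 3},
    {"form": "is",      "tag": "VBZ", "afun": "Pred", "head": -1},
    {"form": "amazing", "tag": "JJ",  "afun": "Pnom", "head": 3},
]

def vbnm_sb_adj(tokens):
    """Mark subjects whose clause has a subjective predicative adjective."""
    aspects = set()
    for t in tokens:
        # a subjective adjective in predicative position (Pnom)...
        if t["afun"] == "Pnom" and t["tag"] == "JJ" and t["form"].lower() in SUBJECTIVE:
            verb = t["head"]
            # ...marks the subject of the same (copular) verb as a likely aspect
            for j, s in enumerate(tokens):
                if s["head"] == verb and s["afun"] == "Sb":
                    aspects.add(j)
    return aspects

print([tokens[i]["form"] for i in sorted(vbnm_sb_adj(tokens))])  # → ['rice']
```

Adapting such a rule to Czech then amounts to swapping the tag tests (here Penn-style "JJ") for the corresponding positions of the Czech positional tagset and translating the few lexical items involved.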
3.2 Model

We chose linear-chain conditional random fields (CRFs) for our work [2]. In this model, aspect identification is viewed as a sequence labeling task: the input x are the words of a sentence and the output is a labeling y of the same length, where each word is marked as either the beginning of an aspect (B), inside an aspect (I), or outside an aspect (O). (This "BIO" labeling scheme is common for CRFs. In practice, it brings us a consistent slight improvement over using only binary classification, i.e. inside vs. outside an aspect.)

A linear-chain CRF is a statistical model related to hidden Markov models (HMMs); however, it is a discriminative model, not a generative one – it directly models the conditional probability of the labeling, P(y|x). Linear-chain CRFs assume that the probability of the current label (B, I or O) depends only on the previous label and on the input words x.

Formally, a linear-chain CRF is the following conditional probability distribution:

  P(y|x) = (1 / Z(x)) · exp{ Σ_{t=1..T} Σ_{k=1..K} λ_k f_k(y_t, y_{t−1}, t, x) }    (1)

Roughly speaking, P(y|x) is the score of the sentence labeling y, exponentiated and normalized. The score of y corresponds to the sum of scores for the labels y_t at each position t ∈ {1, ..., T} in the sentence. The score at position t is the dot product between the values of the feature functions f_k(y_t, y_{t−1}, t, x) and their associated weights λ_k, which are estimated in the learning stage. Feature functions can look at the current label y_t, the previous label y_{t−1}, and the whole input sentence x (which is constant).

Z(x) is the normalization function which sums over all possible label sequences:

  Z(x) = Σ_{y′} exp{ Σ_{t=1..T} Σ_{k=1..K} λ_k f_k(y′_t, y′_{t−1}, t, x) }    (2)

To train the model, we require training data, i.e. sentences with the labeling already assigned by a human annotator. During CRF learning, the weights λ_k are optimized to maximize the likelihood of the observed labelings in the dataset. Gradient-based optimization techniques are usually applied for learning.

At prediction time, the weights λ_k are fixed and we look for the labeling ŷ which is most probable according to the model:

  ŷ = argmax_y P(y|x)    (3)

ŷ can be found efficiently using a variant of the Viterbi algorithm (dynamic programming). In our work, we use the CRF++ toolkit (http://taku910.github.io/crfpp/) both for training and prediction.
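The decoding step of eq. (3) can be sketched as follows. The emission and transition scores here are toy stand-ins for the weighted feature sums λ_k f_k of a trained model; this is a self-contained illustration, not the CRF++ implementation.

```python
LABELS = ["B", "I", "O"]

def viterbi(emit, trans):
    """emit[t][label]: score of a label at position t;
    trans[(prev, cur)]: transition score (default 0).
    Returns the highest-scoring label sequence (eq. 3)."""
    T = len(emit)
    best = [{lab: (emit[0][lab], None) for lab in LABELS}]  # (score, backptr)
    for t in range(1, T):
        best.append({})
        for lab in LABELS:
            score, prev = max(
                (best[t - 1][p][0] + trans.get((p, lab), 0.0) + emit[t][lab], p)
                for p in LABELS
            )
            best[t][lab] = (score, prev)
    # backtrack from the best final label
    lab = max(LABELS, key=lambda l: best[T - 1][l][0])
    path = [lab]
    for t in range(T - 1, 0, -1):
        lab = best[t][lab][1]
        path.append(lab)
    return path[::-1]

# Toy two-word sentence: the second token looks like an aspect beginning.
emit = [{"B": 0.1, "I": 0.0, "O": 1.0}, {"B": 2.0, "I": 0.1, "O": 0.5}]
trans = {("O", "I"): -5.0}  # penalize I directly after O (invalid BIO sequence)
print(viterbi(emit, trans))  # → ['O', 'B']
```

The dynamic program is linear in sentence length and quadratic in the number of labels, which is what makes exact decoding cheap for the three BIO labels.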
3.3 Feature Set

We now describe the various feature sets evaluated in this work.

Surface features. We use the surface forms of the current word, the two preceding and the two following words as separate features. Additionally, we extract all (four) bigrams and (three) trigrams of surface forms from this window. We also use the CRF++ bigram feature template without any arguments; this simply produces the concatenation of the previous and current label (y_{t−1}, y_t).

Morpho-syntactic features. We extract unigrams, bigrams and trigrams from a limited context window (identical to the above) around the current token, but instead of surface forms, we look at:

• lemma,
• morphological tag,
• analytical function.

Analytical functions are assigned by the dependency parser; their values include "Sb" for subject, "Pred" for predicate, etc.

Sublex features. We mark all words in the data whose lemma is found in the subjectivity lexicon. For each token in the window of size 4 around the current token (included), we extract a feature indicating whether it was marked as subjective. We also concatenate these indicator features with the surface form of the current token.

Rule features. Finally, for each type of rule, we extract features for the current token and for the preceding and following tokens, indicating whether the rule marked that token. Again, these features have two versions: one standalone and one concatenated with the surface form of the current token.
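As an illustration, here is a rough sketch of this kind of feature extraction: surface unigrams and bigrams from a context window plus subjectivity-lexicon indicators. It is our own simplification (the feature names and the window handling are hypothetical), not the actual CRF++ feature templates.

```python
PAD = "<PAD>"  # placeholder for positions outside the sentence

def token_features(words, i, sublex):
    """Feature strings for position i, using two words of context per side."""
    w = lambda j: words[j] if 0 <= j < len(words) else PAD
    feats = []
    # surface unigrams in the window
    for off in range(-2, 3):
        feats.append(f"w[{off}]={w(i + off)}")
    # surface bigrams over the same window
    for off in range(-2, 2):
        feats.append(f"w[{off}]|w[{off + 1}]={w(i + off)}|{w(i + off + 1)}")
    # subjectivity indicators: standalone and concatenated with the current form
    for off in range(-2, 3):
        if w(i + off).lower() in sublex:
            feats.append(f"subj[{off}]")
            feats.append(f"subj[{off}]|w[0]={w(i)}")
    return feats

feats = token_features(["Fotky", "fakt", "parádní"], 0, sublex={"parádní"})
print("subj[2]" in feats)  # → True
```

In CRF++ itself, such features are generated declaratively via template files rather than code; the sketch only shows which information the model gets to see.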
4 Experiments

We analyze our data using Treex [8], a modular NLP toolkit. Sentences are first tokenized and tagged using MorphoDiTa [12]. Then we obtain their dependency parses using the MST parser [4]. We use Czech SubLex [14] as our subjectivity lexicon, both for the CRF sublex features and for the rules. The rules are implemented as blocks within the Treex platform.

4.1 Results

Table 3 shows the obtained precision (P), recall (R) and f-measure (F1) for both parts of the dataset. The results in all cases were acquired using 5-fold cross-validation on the training data.

                     Random segments (2000)    Longest reviews (200)
  Feature set        P      R      F1          P      R      F1
  surface            85.22  36.85  51.45       47.18   8.05  13.76
  +morpho-syntactic  75.88  54.17  63.21       40.17  23.08  29.31
  +sublex            78.19  55.09  64.64       58.74  18.99  28.70
  +rules             76.54  57.69  65.79       51.74  21.39  30.27

Table 3: Precision, recall and f-measure obtained using various feature sets on the two parts of the dataset.

Random segments. The baseline (surface-only) features achieve the best precision but the recall is very low. Morpho-syntactic features lower the precision by a significant margin but improve recall considerably. As the review data come from the "wild", they are quite noisy; many segments are written without punctuation, reducing the benefit of morphological analysis, let alone dependency parsing. (This issue could perhaps be addressed by using a spell-checker; we leave that to future work.)

Often, the segments are rather short, such as "Rychlé dodání" ("fast delivery") or "Fotky fakt parádní." ("Photos really awesome."). This also considerably limits the benefit that a parser can bring – there is a major mismatch both in text topic and in sentence types between the parser's training data and this dataset, so we cannot expect parsing accuracy to be high. Most of the improvement from adding morpho-syntactic features thus probably comes from the availability of word lemmas – this allows the CRF to learn which words are frequently marked as aspects in this domain and to generalize this information beyond their current inflected form.

Adding the information from the sentiment lexicon further improves performance, though not as much as we would expect. We could possibly further increase its impact through more careful feature engineering – so far, the features only capture whether a subjective term is present in a small linear context. For example, the lemma of the evaluative word could be included in the feature. (CRF++ feature templates do not offer a simple way to achieve this without also generating a large number of uninformative feature types.)

Finally, adding the output of the syntactic rules further improves the results. Due to the uncommon syntactic structure of the segments, most rules were not triggered very often, so the space for improvement is quite limited. Yet the results show that when the rules do trigger, their output can be a useful signal for the CRF. The observed improvement in recall at the slight expense of precision is in line with the results of [15], where the system based on the same rules achieved high recall and rather low precision.

Long reviews. It is immediately apparent that the long reviews are a much more difficult dataset than the review segments – the best f-measure achieved on the short segments is 65.79, while here it is only 30.27. This can be explained by the lower density of aspect terms compared to the random review segments and a much higher sentence length – after sentence segmentation, the average sentence length is over 29 words, compared to only 6 words for the random segments.

When using only the baseline features, the recall is extremely low. Adding morpho-syntactic features has a similar effect as for the random segments – precision is lowered but recall nearly triples.

Interestingly, adding features from the subjectivity lexicon changes the picture considerably. This feature set obtains the highest precision, but its recall is lower compared to both +morpho-syntactic and +rules. It may be that due to the high sentence length, sublex features help identify aspects within the short window but their presence pushes the model to ignore the more distant ones. A more thorough manual evaluation would be required to confirm this.

Finally, the addition of the syntactic rules leads to the highest f-measure, even though neither its recall nor its precision is the best. In this dataset, possibly again thanks to the length of the sentences, the rules are triggered much more often than for the random segments. Rule features can therefore have a more prominent effect on the model.

5 Related Work

In terms of using rules for ABSA, our work is inspired by [15]. Such rules can also be used iteratively to expand both the aspects and the evaluative terms using the double propagation algorithm [10]. Other methods of discovering opinion targets are described, inter alia, in [3, 9, 5]. Linear-chain CRFs have been applied in sentiment analysis and they are also well suited for ABSA; they were used e.g. by the winning submission of [13] to the SemEval 2014 Task 4.

For Czech, a dataset for ABSA was published by [11]. That dataset is in the domain of restaurant reviews and closely follows the methodology of [7]. Our work focuses on reviews of IT products, naturally complementing this dataset. It should further support research in this area and enable researchers to evaluate their approaches on diverse domains.

6 Conclusion

We have presented a new dataset for ABSA in the Czech language and we have described a baseline system for the subtask of aspect term extraction.

The dataset consists of segments from user reviews of IT products with annotation of aspects and their polarity.

The system for aspect term extraction is based on linear-chain CRFs and uses a number of surface and linguistically informed features. On top of these features, we have shown that task-specific syntactic rules can provide useful input to the model.

The utility of the syntactic rules could be further evaluated on other domains (such as the Czech restaurant reviews) or languages (e.g. using the official SemEval datasets), and the impact of individual rules could be thoroughly analyzed across these datasets.

Acknowledgements

This research was supported by the grant GA15-06894S of the Grant Agency of the Czech Republic and by the SVV project number 260 224. This work has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).

References

[1] Hajič, J., Vidová-Hladká, B.: Tagging inflective languages: Prediction of morphological categories for a rich, structured tagset. In: Proceedings of the COLING–ACL Conference, 483–490, 1998
[2] Lafferty, J. D., McCallum, A., Pereira, F. C. N.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, ICML'01, 282–289, San Francisco, CA, USA, 2001, Morgan Kaufmann Publishers Inc.
[3] Liu, B.: Web data mining: Exploring hyperlinks, contents, and usage data (Data-centric systems and applications). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006
[4] McDonald, R., Pereira, F., Ribarov, K., Hajič, J.: Non-projective dependency parsing using spanning tree algorithms. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 523–530, 2005
[5] Mei, Q., Ling, X., Wondra, M., Su, H., Zhai, C.: Topic sentiment mixture: Modeling facets and opinions in weblogs. In: Proceedings of the 16th International Conference on World Wide Web, WWW'07, 171–180, New York, NY, USA, 2007, ACM
[6] Pontiki, M., Galanis, D., Papageorgiou, H., Manandhar, S., Androutsopoulos, I.: SemEval-2015 task 12: Aspect based sentiment analysis. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 486–495, Denver, Colorado, June 2015, Association for Computational Linguistics
[7] Pontiki, M., Galanis, D., Pavlopoulos, J., Papageorgiou, H., Androutsopoulos, I., Manandhar, S.: SemEval-2014 task 4: Aspect based sentiment analysis. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 27–35, Dublin, Ireland, August 2014, Association for Computational Linguistics and Dublin City University
[8] Popel, M., Žabokrtský, Z.: TectoMT: Modular NLP framework. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.), IceTAL 2010, volume 6233 of Lecture Notes in Computer Science, 293–304, Iceland Centre for Language Technology (ICLT), Springer, 2010
[9] Popescu, A.-M., Etzioni, O.: Extracting product features and opinions from reviews. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT'05, 339–346, Stroudsburg, PA, USA, 2005
[10] Qiu, G., Liu, B., Bu, J., Chen, C.: Opinion word expansion and target extraction through double propagation. Computational Linguistics 37 (1) (March 2011), 9–27
[11] Steinberger, J., Brychcín, T., Konkol, M.: Aspect-level sentiment analysis in Czech. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Baltimore, USA, June 2014, Association for Computational Linguistics
[12] Straková, J., Straka, M., Hajič, J.: Open-source tools for morphology, lemmatization, POS tagging and named entity recognition. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 13–18, Baltimore, Maryland, June 2014, Association for Computational Linguistics
[13] Toh, Z., Wang, W.: DLIREC: Aspect term extraction and term polarity classification system. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 235–240, Dublin, Ireland, August 2014, Association for Computational Linguistics and Dublin City University
[14] Veselovská, K., Bojar, O.: Czech SubLex 1.0, 2013
[15] Veselovská, K., Tamchyna, A.: ÚFAL: Using hand-crafted rules in aspect based sentiment analysis on parsed data. In: Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 694–698, Dublin, Ireland, 2014, Association for Computational Linguistics and Dublin City University