=Paper= {{Paper |id=Vol-2174/paper6 |storemode=property |title=Negation Cues Detection Using CRF on Spanish Product Review Texts |pdfUrl=https://ceur-ws.org/Vol-2174/paper6.pdf |volume=Vol-2174 |authors=Henry Loharja,Lluís Padró,Jordi Turmo }} ==Negation Cues Detection Using CRF on Spanish Product Review Texts== https://ceur-ws.org/Vol-2174/paper6.pdf
    Negation Cues Detection Using CRF on Spanish
               Product Review Texts
Detección de Claves de Negación Usando CRF en El Texto de
              Revisión de Productos en Español
                  Henry Loharja¹, Lluís Padró¹, and Jordi Turmo¹
                         ¹Universitat Politècnica de Catalunya
                                 https://www.upc.edu/
                       {loharja, padro, turmo}@cs.upc.edu

      Abstract: This article describes the negation cue detection approach designed and
      built by UPC's team for the NEGES 2018 Workshop on Negation in Spanish. The
      approach uses supervised CRFs to train a model with several features engineered
      for the task of negation cue detection in Spanish. The results are evaluated by
      means of precision, recall, and F1 score. The approach was ranked in 1st position
      in the official testing results, with average precision around 91%, average recall
      around 82%, and average F1 score around 86%.
      Keywords: negation cue detection, conditional random field, product review
      Resumen: Este artículo describe el enfoque de detección de claves de negación
      diseñado y construido por el equipo de la UPC que participa en el Taller
      NEGES 2018: Identificación de Claves de Negación. El enfoque usa el CRF
      supervisado como base para el entrenamiento del modelo, con varias características
      diseñadas para resolver la tarea de detección de claves de negación en español. El
      resultado se evalúa mediante precisión, exhaustividad y Valor-F para medir el
      rendimiento del enfoque. El enfoque se clasificó en primera posición en los
      resultados de las pruebas oficiales, con una media de precisión cercana al 91 %, una
      media de exhaustividad cercana al 82 % y una media de Valor-F cercana al 86 %.
      Palabras clave: detección de claves de negación, campo aleatorio condicional,
      revisión de productos



1   Introduction

This paper describes the negation cue detection approach presented by UPC's team for Task 2 (negation cue detection) of the NEGES 2018 workshop (Jiménez-Zafra et al., 2018a). The aim of the task is to automatically detect negation cues in Spanish product review texts. To do this, participants must develop a system able to identify all the negation cues present in the documents. The SFU ReviewSP-NEG corpus (Jiménez-Zafra et al., 2018b) is used to train and test the systems. Our approach relies on a supervised model based on Conditional Random Fields (written as CRF in the following), with features engineered specifically for detecting negation cues in Spanish. The approach is implemented in Python, and we use NLTK, the Natural Language Toolkit (Bird, Klein, and Loper, 2009), to build the system. The results are measured using the widely used metrics of precision, recall, and F1 score.

The article is organized as follows. Section 2 describes the approach used to learn the negation cue detection model. Section 3 describes the system built on that approach and the details of its implementation. The results achieved by our approach are presented and briefly analyzed in Section 4. Finally, Section 5 gives conclusions about the work.


                     Proceedings of NEGES 2018: Workshop on Negation in Spanish, pages 49-54
                                       Seville, Spain, September, 18, 2018
2   Negation Cues Detection Approach

Before describing the approach, let us begin with some definitions. A negative sentence $n$ is defined as a vector of words $(w_1, w_2, \ldots, w_{|n|})$ containing one or more negation cues, where a cue can be a word (e.g. no), a morpheme (e.g. in-capaz), or a multi-word expression (e.g. ya no, todavía no) that inherently expresses negation. The goal of negation cue detection is to predict a vector $c$ given the sentence $n$, where $c \in \{1, 0\}^{|n|}$ has the same length as $n$, with $c_i = 1$ if $w_i$ is part of a negation cue and $c_i = 0$ otherwise.

More than one negation cue can appear in a sentence. In Spanish, a special characteristic of negation cues is that a cue can consist of more than one word, not necessarily consecutive. This makes it harder to decide whether two words recognized as cues are two separate cues or parts of the same non-contiguous cue. It also makes negation cue detection in Spanish more challenging than in English, where such cases are scarce.

The approach we use in this work is one of the state-of-the-art approaches: CRF-based negation detection. We try to reproduce the approach of previous work (Agarwal and Yu, 2010), which uses CRF as its base, and we apply it to the corpus provided by the task to see how it performs on Spanish data. Conditional random fields (CRFs) are a type of discriminative undirected probabilistic graphical model used for structured prediction (Lafferty, McCallum, and Pereira, 2001). The most important property of a CRF model is that it can take context into account: the linear-chain CRF predicts sequences of labels for sequences of input samples. Thus, the model does not work with local probabilities such as $p(y_t \mid x_t)$, where $t$ is the position of $x$ within the sequence; instead, it estimates the conditional probability of the whole sequence:

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left\{ \sum_{t} \sum_{j=1}^{K} \lambda_j f_j(x, y_t, y_{t-1}) \right\}$$

Here $Z(x)$ is the normalization factor obtained by summing over all possible label sequences. The weights $\lambda_j$ of the features $f_j$ are estimated by maximizing the conditional log likelihood:

$$\max l(\lambda) = \max_{\lambda} \sum_{i=1}^{N} \log p(y^{(i)} \mid x^{(i)})$$

where $N$ is the number of observation sequences $x^{(i)}$ and label sequences $y^{(i)}$.

Training CRFs can be time-consuming for some tasks, since training time depends quadratically on the number of class labels and linearly on the number of training instances and the average sequence length. However, state-of-the-art solutions use CRF models for many NLP tasks where the time consumption is still tolerable.

As discussed above, the goal of negation cue detection is to obtain, for each sentence, a vector indicating whether each token is part of a negation cue. From the point of view of named entity recognition (NER), negation cue detection can be treated as a type of NER in which the entities to recognize are the parts of negation cues. In other words, we classify each word in a sentence as part of a negation cue or not. We therefore define a three-class classification problem for each observed word: Begin-Cue (B-C), Inside-Cue (I-C), or Out (O). A word classified as Out is not part of a cue. To handle cues that consist of more than one word, we use two classes for cue words: Begin-Cue for the first word of a cue, and Inside-Cue for the remaining words of the same cue.

3   Negation Cues Detection System

3.1   Data Preprocessing

As a preliminary step, we preprocess the provided corpus to match the input format of the system we built. The corpus provided by the task is in CoNLL format: each line corresponds to a token, each annotation is provided in a column, and empty lines indicate the end of a sentence. We produce two sets of data in different formats, one for each step:

    1. Data format with BIO tagging (Ramshaw and Marcus, 1999) in
       order to be used as input for training. An annotated token is tagged "B-C" if it begins a negation cue, "I-C" if it is part of a negation cue but not its first word, and "O" if it is outside any cue. An example sentence in this format is:

        • El|O coche|O funciona|O estupendamente|O ,|O es|O muy|O manejable|O ,|O por|O cierto|O ,|O casi|B-C no|I-C consume|O gasolina|O algo|O que|O para|O mi|O es|O muy|O importante|O .|O

    2. Raw data format without any tagging, to be used as input for testing.

        • Las ruedas a los 15000 kms las tuve que cambiar , todas , las cuatro , por ser de una marca coreana , que no da mucho resultado .

After preprocessing, the documents are ready to be used as input for the respective next steps. The preprocessing does not alter any important information in the data; its only purpose is to change the format so that the data is easier to use in the following steps.

3.2   Baseline System

Before implementing the approach described above, we developed a baseline system to serve as a starting point and a point of comparison. Our aim is to see whether our approach performs better than a baseline that uses simple, common techniques: dictionary lookup combined with some rules for detecting negation cues.

The first step in the baseline was to create a dictionary from the training dataset of the corpus. We collected all the words tagged as negation cues in all the training documents, together with their frequencies. After sorting the collected cues from most to least frequent, we chose the 25 most frequent words. These 25 words became the dictionary of negation cues in our baseline system. We also developed several rules to capture the characteristics of negation cues in Spanish explained above. These rules decide whether several cue words appearing in a sentence belong to a single cue or are separate cues. The rule checks whether a word is in a list of special words that we created, and then checks whether another cue precedes it. The rule in the baseline system is described by the following algorithm:

    if word in DICTIONARY then
        if word in SPECIAL then
            if exist cue before then
                word is part of cue
            end if
        else
            word is new cue
        end if
    end if

After implementing the combination of dictionary lookup and rules, we used the baseline system to tag the documents of the development testing dataset. We use its output as a preliminary result, to be compared later with the result of the system built on our proposed approach. This lets us see whether our approach offers an advantage over simple techniques.

3.3   Learning the Model for Negation Cue Detection Using CRF

The system we built uses NLTK, a Python-based toolkit for building Python programs that work with human language data. NLTK provides easy-to-use text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. One of the modules in NLTK is a CRF tagger, which can be used for tagging text and uses Python CRFSuite (Python bindings to CRFSuite; Okazaki, 2007) as its core. This module is what we mainly used in our approach for negation detection by adapting a point of view

of named entity recognition. There are two main parts in the system we built: training and testing. Training is the part in which we use the CRF tagger module in NLTK to train a model on the training data prepared earlier. The result of training is a model for detecting negation.

Figure 1: Flow of the negation detection approach in the system.

Training uses an orthographic feature set designed for negation cue detection and for capturing the characteristics of negation cues in Spanish. The simplest and clearest feature is the vocabulary of the training data. We also include part-of-speech information as a feature to enrich the feature set. Generalizations over how words are written (capitalization, affixes, etc.) are also important information and are included as features. The present approach includes the training vocabulary, several orthographic features based on regular expressions, and prefixes and suffixes of two to four characters. To model local context, neighboring words in the window [-6, 1] are also added as features. This window size was selected after several experiments with various window sizes to obtain the best result. We use bigrams to encode the local context of the six words before and the one word after the observed word. The complete set of features used in training is:

 1. WORD: the vocabulary of the word.
 2. POS: the part of speech of the word.
 3. INIT_CAP: word starts with a capital letter.
 4. ALPHANUM: word consists of alphanumeric characters.
 5. HAS_NUM: word contains a number.
 6. HAS_CAP: word contains a capital letter.
 7. HAS_DASH: word contains a dash (-).
 8. HAS_US: word contains an underscore (_).
 9. PUNCTUATION: word contains punctuation.
10. SUFn: suffixes of length n characters, with n ranging from two to four.
11. PREFn: prefixes of length n characters, with n ranging from two to four.
12. 2GRAMBEFORE: bigrams of up to 6 words before the observed word.
13. 2GRAMAFTER: bigram of up to 1 word after the observed word.
14. BEFOREPOS: the part of speech of up to 6 words before the observed word.
15. AFTERPOS: the part of speech of up to 1 word after the observed word.
16. SPECIAL: word is one of the special words in the special dictionary. The words we included as special words are: "nada", "ni", "nunca", "ningun", "ninguno", "ninguna", "alguna", "apenas", "para nada", and "ni siquiera". These words tend to be part of negation cues with multiple words. This feature is included to capture the characteristic that a negation cue can consist of more than one word, separated by other non-cue words in between.

Using the features listed above, we train on the given data with the CRF module in NLTK to produce a model that can detect negation cues in Spanish. The CRF is trained with the default parameters of the NLTK toolkit. This model is one of the inputs to the next step, testing. Testing is the process of detecting negation in the testing data (raw data in which negations are not annotated) using the model obtained from training as the knowledge base. The result of testing is an annotated version of the testing data in which the words of each sentence are classified as either part of a cue or outside any cue. The result we obtain after
the testing process is in BIO-tagged format, since this is the format we use to represent our data. We therefore post-process the result to convert it back into the original format of the input (training data): we take the original CoNLL data and add the negation cue information obtained from testing.

4   Results

After the testing process, we obtain the result of negation cue detection as annotated versions of the testing documents given as input. To evaluate the performance of the approach used in the system, we use precision, recall, and F1 score. We use the evaluation script provided by the organizers to make sure that our output matches the requirements. In the first phase, we use the provided development testing data to measure the performance of our system. We run the test on each document of the development testing dataset, which is divided by domain. Each document is processed and evaluated separately. To give a general view of the performance, we also calculate the micro average over the whole development testing result. Table 1 shows the result of the baseline system on the development testing data, while Table 2 shows the result of the system based on our proposed approach on the same development dataset.

 Domain           Precision (%)   Recall (%)   F1 (%)
 Coches               88.89          85.11      86.96
 Hoteles              86             70.49      77.48
 Lavadoras            94.74          80         86.75
 Moviles              94.9           85.32      89.86
 Musica               79.31          88.46      83.64
 Ordenadores          85.71          69.23      76.59
 Libros               88.65          86.81      87.72
 Peliculas            92.55          79.09      85.29
 Micro Average        90.06          81.31      85.32

Table 1: Measurement result of development testing using the baseline system

 Domain           Precision (%)   Recall (%)   F1 (%)
 Coches               83.33          74.47      78.65
 Hoteles              96.08          80.33      87.5
 Lavadoras            97.3           80         87.81
 Moviles              95.1           88.99      91.94
 Musica               83.33          96.15      89.28
 Ordenadores          89.13          78.85      83.68
 Libros               90.85          89.58      90.21
 Peliculas            92.93          83.64      88.04
 Micro Average        91.97          84.85      88.14

Table 2: Measurement result of development testing using the CRF-based approach

As can be observed from Tables 1 and 2, our proposed approach gives better results on average than the baseline system, which uses simple techniques (micro-averaged F1 of 88.14 versus 85.32). The results are also fairly high, with most values above 80%. Precision in particular averages more than 90%. This is plausibly because detecting negation cues is a fairly simple task. Most cues consist of words such as "no", "ni", "nada" and a few other words that express negation, with little variability in vocabulary. This makes cues fairly easy to detect and keeps the number of false positives small. Recall, on the other hand, is considerably lower, in some domains even below 80%. This is due to the higher number of false negatives caused by the difficulty of detecting non-contiguous multi-token cues. Most false negatives are cases that our system has difficulty detecting, for example:

  • No es cosa del paralelo ni del equilibrado.

In this example, no...ni is a single negation cue, whereas our system recognizes it as two separate cues. Another kind of false negative is the opposite, where two separate cues are recognized as one cue. These cases account for most of the false negatives in the development testing result.

The official testing result can be observed in Table 3. This result is obtained using our model and the official testing dataset provided by the organizers. The evaluation is done directly by the organizers, and we received the measurements shown in Table 3 after submitting our testing result.

Based on the evaluation from organizers, our result is ranked first compared to other
participants in the same task. As can be seen in the table, the official testing result follows the same pattern as the development testing result, with higher precision and lower recall. Although we cannot inspect individual cases in the official testing result, we can infer from the figures that cases similar to those found in development testing probably occur as well. The percentages are also similar, with precision around 91%, recall around 82%, and F1 score around 86%. The averages in official testing are slightly lower than those in development testing, but the difference is small.

 Domain           Precision (%)   Recall (%)   F1 (%)
 Coches               95.08          85.29      89.92
 Hoteles              94             79.66      86.24
 Lavadoras            94.74          78.26      85.72
 Moviles              89.8           77.19      83.02
 Musica               92.96          75.86      83.54
 Ordenadores          91.36          91.36      91.36
 Libros               84.19          84.52      84.35
 Peliculas            89.68          85.28      87.42
 Average              91.48          82.18      86.45

Table 3: Measurement result of official testing

5   Conclusion

In this article we have described the approach and system we built for participation in Task 2 (negation cue detection) of NEGES 2018: Workshop on Negation in Spanish, on Spanish product review texts. Our approach to detecting negation cues is a supervised approach that combines CRF with several features designed for negation cue detection in Spanish. The trained model is then used to classify whether each word in the observed or testing data is part of a negation cue. This approach was ranked in 1st position in the official testing results, with average precision around 91%, average recall around 82%, and average F1 score around 86%.

Acknowledgements

This work has been partially funded by the Spanish Government and by the European Union through the GRAPHMED project (TIN2016-77820-C3-3-R and AEI/FEDER,UE.)

References

Agarwal, S. and H. Yu. 2010. Biomedical negation scope detection with conditional random fields. 17:696–701, 11.

Bird, S., E. Klein, and E. Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O'Reilly Media, Inc.

Jiménez-Zafra, S. M., N. P. Cruz-Díaz, R. Morante, and M. T. Martín-Valdivia. 2018a. Resumen de la Tarea 2 del Taller NEGES 2018: Detección de Claves de Negación. In Proceedings of NEGES 2018: Workshop on Negation in Spanish, volume 2174, pages 35–41.

Jiménez-Zafra, S. M., M. Taulé, M. T. Martín-Valdivia, L. A. Ureña-López, and M. A. Martí. 2018b. SFU Review SP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Language Resources and Evaluation, 52(2):533–569.

Lafferty, J., A. McCallum, and F. C. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

Okazaki, N. 2007. CRFsuite: a fast implementation of Conditional Random Fields (CRFs).

Ramshaw, L. A. and M. P. Marcus. 1999. Text chunking using transformation-based learning. In Natural language processing using very large corpora. Springer, pages 157–176.
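
As an illustration of the feature set listed in Section 3.3, the following minimal Python sketch builds string features for one token. It is not the authors' code: the function name `extract_features`, the input shape (a list of `(word, pos)` pairs), the feature string layout, the reading of the bigram features as adjacent word pairs inside the window [-6, 1], and the POS tags in the example are all assumptions made for illustration. Only the single-token special words are covered; "para nada" and "ni siquiera" would need phrase matching.

```python
import string

# Single-token special words from the SPECIAL feature in Section 3.3.
SPECIAL = {"nada", "ni", "nunca", "ningun", "ninguno", "ninguna", "alguna", "apenas"}


def extract_features(sentence, idx):
    """Return string features for the token at idx.

    sentence is a list of (word, pos) pairs (an assumed input shape).
    """
    word, pos = sentence[idx]
    low = word.lower()
    feats = ["WORD=" + low, "POS=" + pos]

    # Orthographic features.
    if word[:1].isupper():
        feats.append("INIT_CAP")
    if word.isalnum():
        feats.append("ALPHANUM")
    if any(ch.isdigit() for ch in word):
        feats.append("HAS_NUM")
    if any(ch.isupper() for ch in word):
        feats.append("HAS_CAP")
    if "-" in word:
        feats.append("HAS_DASH")
    if "_" in word:
        feats.append("HAS_US")
    if any(ch in string.punctuation for ch in word):
        feats.append("PUNCTUATION")

    # Suffixes and prefixes of two to four characters.
    for n in (2, 3, 4):
        if len(low) >= n:
            feats.append("SUF%d=%s" % (n, low[-n:]))
            feats.append("PREF%d=%s" % (n, low[:n]))

    # Part of speech of neighbours in the window [-6, 1].
    for off in range(-6, 2):
        j = idx + off
        if off != 0 and 0 <= j < len(sentence):
            name = "BEFOREPOS" if off < 0 else "AFTERPOS"
            feats.append("%s[%d]=%s" % (name, off, sentence[j][1]))

    # Word bigrams: adjacent pairs ending at or before the observed word,
    # plus the pair formed with the next word (an assumed interpretation).
    for j in range(max(0, idx - 6), idx):
        pair = sentence[j][0].lower() + "_" + sentence[j + 1][0].lower()
        feats.append("2GRAMBEFORE[%d]=%s" % (j - idx, pair))
    if idx + 1 < len(sentence):
        feats.append("2GRAMAFTER=%s_%s" % (low, sentence[idx + 1][0].lower()))

    if low in SPECIAL:
        feats.append("SPECIAL")
    return feats


# Example tokens from the sentence in Section 3.1; the POS tags are placeholders.
sent = [("casi", "RG"), ("no", "RN"), ("consume", "VMIP3S0"), ("gasolina", "NCFS000")]
print(extract_features(sent, 1))
```

NLTK's CRFTagger accepts a custom feature function through its `feature_func` argument, so a function of this kind could in principle be adapted to it; how the features were actually wired into the tagger is not described in the paper.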
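
To make the evaluation measures of Section 4 concrete, the sketch below computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts, and a micro average that pools the counts of all domains before scoring. The function names and the per-domain counts are hypothetical and are not taken from the paper; the official scores were produced by the organizers' evaluation script, whose exact matching criteria are not reproduced here.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 (as percentages) from raw counts."""
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_average(counts):
    """Pool the counts of all domains, then compute the scores once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


# Hypothetical (tp, fp, fn) counts per domain, for illustration only.
counts = {"domain_a": (8, 2, 2), "domain_b": (9, 1, 3)}
for name, (tp, fp, fn) in counts.items():
    print(name, ["%.2f" % v for v in prf(tp, fp, fn)])
print("micro", ["%.2f" % v for v in micro_average(counts)])
```

Pooling counts weights each domain by its number of cues, so a micro average can differ from the plain mean of the per-domain scores.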