Deep Learning Approach for Negation Cue Detection in Spanish

Hermenegildo Fabregat¹, Juan Martinez-Romo¹,², Lourdes Araujo¹,²
¹ Universidad Nacional de Educación a Distancia (UNED)
² IMIENS: Instituto Mixto de Investigación
{gildo.fabregat, lurdes, juaner}@lsi.uned.es

Abstract: This paper describes the negation cue detection model presented by the UNED group for Task 2 (Negation cues detection) of the NEGES workshop, co-located with the SEPLN conference (Seville, 2018). The task deals with the detection of negation cues in Spanish reviews from domains such as cars, music and books. In order to capture semantic and syntactic patterns as well as contextual patterns, we propose a model based on the combination of several dense neural networks and a bidirectional Long Short-Term Memory network (Bi-LSTM). The evaluation is divided by domain, and in terms of the inter-domain average the results obtained are acceptable.

Keywords: Negation detection, negation cues, Deep Learning, Bi-LSTM

1 Introduction

To understand the meaning of a sentence through natural language processing techniques, it is necessary to take into account that a sentence can express a negated fact. In some languages, such as English, the detection and processing of negation is a recurrent research area. It is a particularly interesting field of study considering the influence of negation on tasks such as sentiment analysis and relation extraction (Reitan et al., 2015; Chowdhury and Lavelli, 2013). NegEx (Chapman et al., 2001) is one of the most popular algorithms for negation detection in English. Its adaptation to other languages has been addressed by several recent works, such as Chapman et al. (2013) (French, German and Swedish), Skeppstedt (2011) (Swedish) and Cotik et al. (2016) (Spanish); the latter also explores other syntactic approaches for negation detection in Spanish, based on rules derived from PoS tags and dependency-tree patterns.
Task 2 of the NEGES workshop (Jiménez-Zafra et al., 2018a) focuses on the detection of negation cues in Spanish. For this purpose, the organizers provide the SFU ReviewSP-NEG corpus (Jiménez-Zafra et al., 2018b), which consists of 400 reviews from 8 different domains (cars, hotels, washing machines, books, cell phones, music, computers and movies), 221866 words and 9455 sentences, out of which 3022 sentences contain at least one negation structure. The organizers released the corpus divided into three sets: training, development and test. As can be seen in Figure 1, the corpus is distributed in the CoNLL format (Hajič et al., 2009).

hoteles 21 1  Y          y          cc      coordinating  -  - -
hoteles 21 2  no         no         rn      negative      no - -
hoteles 21 3  hay        haber      vmip3s0 main          -  - -
hoteles 21 4  en         en         sps00   preposition   -  - -
hoteles 21 5  la         el         da0fs0  article       -  - -
hoteles 21 6  habitación habitación ncfs000 common        -  - -
hoteles 21 7  ni         ni         rn      negative      ni - -
hoteles 21 8  una        uno        di0fs0  indefinite    -  - -
hoteles 21 9  triste     triste     aq0cs0  qualificative -  - -
hoteles 21 10 hoja       hoja       ncfs000 common        -  - -

Figure 1: Corpus SFU ReviewSP-NEG, annotation format.

Each line corresponds to a token, an empty line marks the end of a sentence, and each column holds an annotation of a specific term (for instance, column one contains the name of the domain file, and columns three and four contain the word and its lemma). Column eight onwards shows the annotations related to negation. If the sentence has no negations, column eight has the value "***" and there are no further columns. Otherwise, each negation is annotated in three columns: the first contains the word that belongs to the negation cue, and the second and third contain "-".

This work is organized as follows: Section 2 describes the proposed model and the features and resources used. In Section 3 we report and discuss the results obtained during the evaluation stage. Finally, in Section 4, conclusions and future work are presented.

2 Proposed model

Inspired by the model presented by Fancellu, Lopez, and Webber (2016), we address the problem as a sequence labeling task.

The proposed model has been implemented using Python's Keras library (Chollet et al., 2015) with a TensorFlow backend. It is a supervised approach that uses the following embedded features: words, lemmas, PoS tags and casing information. Words and lemmas are encoded using pre-trained Spanish word embeddings (Cardellino, 2016), while the PoS-tag and casing embeddings are implemented as two Keras embedding layers¹ initialized with a random uniform distribution. In order to avoid cascading errors, we use the lemmas and PoS tags provided in the corpus.

¹ https://keras.io/layers/embeddings/

The casing embedding matrix is a one-hot encoding matrix of size 8, computed for each input token using the following encoder dictionary: { 0: input token is numeric; 1: -; 2: -; 3: initial character is upper case; 4: input token is mainly numeric; 5: contains at least one digit; 6: other case }.
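As an illustration, the following is a minimal sketch of such a casing encoder in Python. The descriptions of codes 1 and 2 are not given above; all-lowercase and all-uppercase, as well as the order of the checks, are assumptions based on common casing-feature schemes for sequence labeling.

import numpy as np

def casing_code(token):
    digits = sum(ch.isdigit() for ch in token)
    if token.isdigit():                    # 0: token is numeric
        return 0
    if digits / max(len(token), 1) > 0.5:  # 4: token is mainly numeric
        return 4
    if token.islower():                    # 1: all lower case (assumed)
        return 1
    if token.isupper():                    # 2: all upper case (assumed)
        return 2
    if token[:1].isupper():                # 3: initial character is upper case
        return 3
    if digits > 0:                         # 5: contains at least one digit
        return 5
    return 6                               # 6: other case

def casing_one_hot(token):
    # one-hot vector of size 8; the eighth slot is assumed to be reserved
    # for padding positions
    vec = np.zeros(8, dtype=np.float32)
    vec[casing_code(token)] = 1.0
    return vec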
To ensure that corpus words joined by an underscore, such as "ya_que", are not left out of the embedding, we carry out a preprocessing step that splits these expressions according to the number of underscores they contain. To standardize the sentences to a common length, after splitting expressions with more than one term, padding of up to 200 positions is applied. The targets are labeled following the standard IOB scheme (Ramshaw and Marcus, 1999): the first cue of a negation phrase is denoted by B (Begin) and the remaining cues, if any, by I (Inside), while O (Out) indicates that the word does not correspond to any kind of entity considered. For example:

Del (O) buffet (O) del (O) desayuno (O) no (B) puedo (O) opinar (O) ya que (B) no (I) lo (O) incluia (O) nuestro (O) regimen (O) . (O)

Figure 2 shows the proposed model architecture. The first layer is a densely connected hidden layer (dense neural network) whose activation function is the hyperbolic tangent (tanh); it takes as input the concatenation of the different embeddings. The output of this first layer is connected to an LSTM (Long Short-Term Memory) enveloped in a bidirectional wrapper (forward and backward processing networks). For each direction, this second layer uses a hidden state to process the current step while taking into account information from previous steps. Next, and connected to the output layer, another dense hidden layer is used to reduce the dimensionality of the bidirectional LSTM output. To avoid possible over-fitting, we apply a dropout factor of 0.25 to the output of this dense layer. Finally, another dense hidden layer, using the softmax activation function, computes the probabilities of all tags for each word in a sentence. The most probable label is selected as the final tag.

Figure 2: Architecture of the proposed model, where XL and XW (L: lemma, W: raw word) are the encoded word inputs and XP and XC are the encoded inputs representing the PoS-tagging and casing information. The Bi-LSTM inputs (Yx) are the concatenated embedded features of each word. In the output layer, Tx represents the assigned tag.

The parameters of the pre-trained resources and the model hyper-parameters are the following:

– Pre-trained Spanish word embedding dimension: 300
– Embedding dimensions (casing / PoS-tagging): 8 / 50
– Hidden dense units (output dimension / activation function): 200 / tanh
– LSTM output dimension: 300
– Dropout (for each dense unit): 0.25
– Batch size / model optimizer: 32 / AdaGrad (Duchi, Hazan, and Singer, 2011)

The model has been trained with data from all the categories, and training was limited to 25 epochs in order to avoid possible over-fitting. During the training phase we evaluated the generated model after each epoch, using the script provided by the organizers (Morante and Blanco, 2012) on the development set, and observed that for most of the domains 20 epochs are enough to reach the best results (Figure 3).

Figure 3: Training phase, temporal evaluation for each domain using the development set.

Once the model configuration was fixed and showed a stable and similar performance for all categories, the model was re-trained with the data of the development set.
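Putting the architecture description and the hyper-parameter list together, the following is a minimal sketch of the network in Keras. It is not the authors' released code: the vocabulary sizes and the tag-set size are placeholders, and the pre-trained Spanish embedding matrices are assumed to be loaded into the word and lemma embedding layers.

from keras.models import Model
from keras.layers import (Input, Embedding, Concatenate, Dense, Dropout,
                          LSTM, Bidirectional, TimeDistributed)

MAX_LEN = 200   # padding length used in preprocessing
N_TAGS = 3      # B, I, O

# token-index inputs: words, lemmas, PoS tags and casing codes
x_w = Input(shape=(MAX_LEN,), name="words")
x_l = Input(shape=(MAX_LEN,), name="lemmas")
x_p = Input(shape=(MAX_LEN,), name="pos")
x_c = Input(shape=(MAX_LEN,), name="casing")

# vocabulary sizes (100000 words/lemmas, 60 PoS tags) are placeholders;
# the default Keras embedding initializer is random uniform
e_w = Embedding(100000, 300)(x_w)   # pre-trained Spanish embeddings, dim 300
e_l = Embedding(100000, 300)(x_l)   # pre-trained Spanish embeddings, dim 300
e_p = Embedding(60, 50)(x_p)        # PoS-tag embedding, dim 50
e_c = Embedding(8, 8)(x_c)          # casing embedding, dim 8

y = Concatenate()([e_w, e_l, e_p, e_c])
y = TimeDistributed(Dense(200, activation="tanh"))(y)    # first dense layer
y = Bidirectional(LSTM(300, return_sequences=True))(y)   # Bi-LSTM
y = TimeDistributed(Dense(200, activation="tanh"))(y)    # reduce Bi-LSTM output
y = Dropout(0.25)(y)
out = TimeDistributed(Dense(N_TAGS, activation="softmax"))(y)

model = Model(inputs=[x_w, x_l, x_p, x_c], outputs=out)
model.compile(optimizer="adagrad", loss="categorical_crossentropy")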
3 Evaluation

In this section we describe the results obtained, taking into account the following evaluation criteria proposed by the organizers:

– Punctuation tokens are ignored.
– True positives are counted when the system produces negation elements exactly as they appear in the gold standard.
– Partial matches are not counted as FP, only as FN.
– False negatives are counted when the system either fails to identify negation elements present in the gold standard or identifies them only partially.
– False positives are counted when the system produces a negation element not present in the gold standard.
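To make these criteria concrete, the following is a minimal per-sentence counting sketch. It assumes, purely for illustration, that cues are represented as sets of token spans; the official scores are computed by the organizers' script on the CoNLL files.

def count_matches(gold_cues, pred_cues):
    # gold_cues / pred_cues: sets of cue spans for one sentence, each span a
    # frozenset of (token_index, token) pairs with punctuation tokens removed
    tp = len(gold_cues & pred_cues)    # exact matches only
    fn = len(gold_cues - pred_cues)    # missed or partially matched gold cues
    # partial matches are not FP: only predictions that overlap no gold span
    # at all are counted as false positives
    fp = sum(1 for p in pred_cues - gold_cues
             if not any(p & g for g in gold_cues))
    return tp, fp, fn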
In order to study the performance of the presented system, it has been compared with a baseline based on a lookup of a filtered list of terms extracted from the training set. To take into account the scope of the negation, the sentences are divided according to the following delimiters: ".", "," and ";". The list of terms was tuned in order to improve the results obtained by this baseline.
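The following is a minimal sketch of such a lookup baseline. The way B and I tags are assigned within each delimiter-bounded segment, and the name cue_terms for the filtered term list, are assumptions made for illustration.

DELIMITERS = {".", ",", ";"}

def baseline_tag(tokens, cue_terms):
    # Tag the first term found in each delimiter-bounded segment as B and
    # further matches in the same segment as I; everything else is O.
    # Multi-word cues are assumed to have been split during preprocessing.
    tags, cue_seen = [], False
    for tok in tokens:
        if tok in DELIMITERS:
            cue_seen = False           # a delimiter starts a new segment
            tags.append("O")
        elif tok.lower() in cue_terms:
            tags.append("I" if cue_seen else "B")
            cue_seen = True
        else:
            tags.append("O")
    return tags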
Table 1 shows the results obtained using the baseline (evaluated on the development set) and Table 2 shows the results obtained using the proposed approach.

Domain            Precision  Recall   F-measure
Cars              44.74 %    72.34 %  55.29 %
Hotels            51.32 %    63.93 %  56.94 %
Washing machines  55.36 %    68.89 %  61.39 %
Books             53.11 %    65.28 %  58.57 %
Phones            54.62 %    65.14 %  59.42 %
Music             43.59 %    65.38 %  52.31 %
Computers         38.57 %    51.92 %  44.26 %
Films             50.00 %    59.09 %  54.17 %

Table 1: Baseline, evaluation per domain on the development set.

Domain            Precision          Recall             F-measure
Cars              94.23 % (88.37 %)  72.06 % (80.85 %)  81.67 % (84.44 %)
Hotels            97.67 % (90.62 %)  71.19 % (47.54 %)  82.35 % (62.36 %)
Washing machines  92.00 % (96.88 %)  66.67 % (68.89 %)  77.31 % (80.52 %)
Books             79.52 % (91.00 %)  66.27 % (63.19 %)  72.29 % (74.59 %)
Phones            93.33 % (94.20 %)  73.68 % (59.63 %)  82.35 % (73.03 %)
Music             92.59 % (85.19 %)  57.47 % (88.46 %)  70.92 % (86.79 %)
Computers         – (84.62 %)        – (63.46 %)        – (72.53 %)
Films             86.26 % (93.33 %)  69.33 % (63.64 %)  76.87 % (75.68 %)

Table 2: Evaluation per domain on the test set (development-set scores in parentheses).

As can be seen, Table 2 presents two scores for each evaluation metric (precision, recall and F-measure): the evaluation of the system on the development set during the training phase, and the evaluation carried out by the organizers on the unannotated test set. Due to an error when submitting the system output, there are no test results for the computers category. On the one hand, the results obtained in a preliminary analysis (development set) show that the proposed system significantly improves on the baseline. On the other hand, as shown in Table 2, the difference between precision and recall on the test set is remarkable. Taking into account that, given the requirements of the presented system, we did not build a specific model for each domain, the differences between precision and recall observed in the test-set evaluation may indicate, among other things, that the system suffers from some over-fitting and is adjusting to very recurrent patterns, or that some expressions were not processed correctly (for example, there may be expressions that are not correctly covered by the word embeddings used). The drop in recall for the music domain between development and test is also notable.

Because the gold standard has not been published, we have not been able to perform an exhaustive analysis of the recognition mistakes made on the test set. However, some of the errors detected during the training phase that affect recall correspond to situations in which the model fails to recognize multi-word negation expressions such as "a no ser que" and "no hay mas que".

4 Concluding Remarks

The detection of negation cues is an important task in natural language processing. We present a deep learning model for the detection of negation cues, inspired by named entity recognition architectures and negation scope detection models. The model achieves good performance without any sophisticated feature extraction process and, although it has some weaknesses in terms of coverage, the results are acceptable and comparable with those obtained by the UPC-TALP team (average results: 91.47 % precision, 82.17 % recall and 86.44 % F-measure).

As future work, given the low recall obtained, we will explore other regularization methods, such as the use of a regularization function (Cogswell et al., 2015), as well as model modifications such as adding a semantic vector representation of the whole sentence and replacing the current dense output layer with a CRF-based layer. Finally, the study of the patterns generated by the current model can lead to a rule-based auxiliary model for re-labeling negation-beginning cues (label B). Taking into account that the model was trained without handcrafted features, the results obtained indicate that the system is capable of reaching more competitive levels of precision and recall.

Acknowledgments

This work has been partially supported by the projects EXTRECM (TIN2013-46616-C2-2-R), PROSA-MED (TIN2016-77820-C3-2-R), and EXTRAE (IMIENS 2017).

References

Cardellino, C. 2016. Spanish billion words corpus and embeddings.

Chapman, W. W., W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan. 2001. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 34(5):301–310.

Chapman, W. W., D. Hilert, S. Velupillai, M. Kvist, M. Skeppstedt, B. E. Chapman, M. Conway, M. Tharp, D. L. Mowery, and L. Deleger. 2013. Extending the NegEx lexicon for multiple languages. Studies in Health Technology and Informatics, 192:677.

Chollet, F. et al. 2015. Keras. https://github.com/fchollet/keras.

Chowdhury, M. F. M. and A. Lavelli. 2013. Exploiting the scope of negations and heterogeneous features for relation extraction: A case study for drug-drug interaction extraction. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 765–771.

Cogswell, M., F. Ahmed, R. B. Girshick, L. Zitnick, and D. Batra. 2015. Reducing overfitting in deep networks by decorrelating representations. CoRR, abs/1511.06068.

Cotik, V., V. Stricker, J. Vivaldi, and H. Rodríguez Hontoria. 2016. Syntactic methods for negation detection in radiology reports in Spanish. In Proceedings of the 15th Workshop on Biomedical Natural Language Processing, BioNLP 2016, pages 156–165. Association for Computational Linguistics.

Duchi, J., E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159.

Fancellu, F., A. Lopez, and B. Webber. 2016. Neural networks for negation scope detection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 495–504.

Hajič, J., M. Ciaramita, R. Johansson, D. Kawahara, M. A. Martí, L. Màrquez, A. Meyers, J. Nivre, S. Padó, J. Štěpánek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, pages 1–18. Association for Computational Linguistics.

Jiménez-Zafra, S. M., N. P. Cruz-Díaz, R. Morante, and M. T. Martín-Valdivia. 2018a. Resumen de la Tarea 2 del Taller NEGES 2018: Detección de Claves de Negación. In Proceedings of NEGES 2018: Workshop on Negation in Spanish, volume 2174, pages 35–41.

Jiménez-Zafra, S. M., M. Taulé, M. T. Martín-Valdivia, L. A. Ureña-López, and M. A. Martí. 2018b. SFU ReviewSP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Language Resources and Evaluation, 52(2):533–569.

Morante, R. and E. Blanco. 2012. *SEM 2012 shared task: Resolving the scope and focus of negation. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), pages 265–274. Association for Computational Linguistics.

Ramshaw, L. A. and M. P. Marcus. 1999. Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora, pages 157–176. Springer.

Reitan, J., J. Faret, B. Gambäck, and L. Bungum. 2015. Negation scope detection for Twitter sentiment analysis. In Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 99–108.

Skeppstedt, M. 2011. Negation detection in Swedish clinical text: An adaption of NegEx to Swedish. Journal of Biomedical Semantics, 2(Suppl 3):S3. BioMed Central.