The impact of phrases on Italian lexical simplification

Sara Tonelli, Alessio Palmero Aprosio
Fondazione Bruno Kessler, Trento, Italy
{satonelli,aprosio}@fbk.eu

Marco Mazzon
Dept. of Psychology and Cognitive Science, University of Trento
marco.mazzon@studenti.unitn.it

Abstract

Automated lexical simplification has been performed so far focusing only on the replacement of single tokens with single tokens, and this choice has affected both the development of systems and the creation of benchmarks. In this paper, we argue that lexical simplification in real settings should deal with both single-token and multi-token terms, and we present a benchmark created for the task. Besides, we describe how a freely available system can be tuned to also cover the simplification of phrases, and we perform an evaluation comparing different experimental settings.

1 Introduction

Lexical simplification is a well-studied topic within the NLP community, dealing with the automatic replacement of complex terms with simpler ones in a sentence, in order to improve its clarity and readability. Thanks to the development of benchmarks (Paetzold and Specia, 2016a) and of freely available tools for lexical simplification (Paetzold and Specia, 2015), a number of works have focused on this challenge; see for example the systems participating in the simplification shared task at SemEval-2012 (Specia et al., 2012). However, the task has been designed as an exercise in replacing complex single tokens with simpler single tokens, and the most widely used benchmarks and systems all follow this paradigm. We believe, however, that this setting covers only a limited share of the lexical simplifications that would be performed in a real scenario. In particular, we advocate the need to shift the lexical simplification paradigm from single tokens to phrases, and to develop datasets and tools that also deal with these cases. This is the main contribution of this work, which covers four points:

• We analyse existing corpora of simplified texts, not specifically developed for a shared task or for system evaluation, and we measure the impact of phrases on lexical simplification
• We modify a state-of-the-art tool for lexical simplification so that it supports phrases
• We compare different strategies for phrase extraction and evaluate them over a benchmark
• We perform all the above on Italian, for which no lexical simplification system was available.

Besides, we make freely available the first benchmark for the evaluation of Italian lexical simplification, with the goal of supporting research on this task and fostering the development of Italian simplification systems.
2 Corpus analysis and benchmark creation

We first analyse existing simplification corpora in Italian to study the impact of phrases on lexical simplification. There are only two such manually created corpora, which contain different types of data but have been annotated following the same scheme: the Simpitiki corpus (Tonelli et al., 2016) and the one developed by the ItaNLP Lab in Pisa (Brunato et al., 2015). The former contains 1,163 sentence pairs [1], where one is the original sentence and the other is the simplified one. The pairs were created starting from Wikipedia edits and from documents in the public administration domain. The ItaNLP corpus, instead, contains 1,393 pairs extracted from children's stories and from educational material. Both corpora were annotated following the scheme proposed in (Brunato et al., 2015), in which simplifications were classified as Split, Merge, Reordering, Insert, Delete and Transformation (plus a set of subclasses for the Insert, Delete and Transformation cases). Since our goal was to isolate a benchmark of pairs containing only the lexical cases, we discarded the classes not compatible with lexical simplification (e.g. Delete, Reordering) and then manually checked the others to identify the cases of interest. When, as in the majority of cases, a lexical simplification was present together with other simplification types, we rewrote the target sentence in order to retain only the lexical cases. For example, in the examples below, a) is the original sentence and b) is the simplified one in the Simpitiki corpus, which contains a lexical simplification of 'include' and a shift in the position of 'per convenzione'. We created version c), so that only the lexical simplification is present:

a) Eurasia è il termine con cui per convenzione si definisce la zona geografica che include l'Europa e l'Asia.

b) Eurasia è, per convenzione, il termine con cui si definisce la zona geografica che comprende l'Europa e l'Asia.

c) Eurasia è il termine con cui per convenzione si definisce la zona geografica che comprende l'Europa e l'Asia.

(Versions a) and c) differ only in the replacement of 'include' with 'comprende', both roughly meaning 'includes/comprises'.)

This revision process led to the creation of a benchmark with pairs extracted from the two original corpora, where only cases of lexical simplification are present [2]. Some statistics on the benchmark are reported in Table 1. We identify four possible lexical simplification types: a single token is replaced by a single token (ST→ST), a single token is simplified through a phrase (ST→P), a phrase is simplified through a single token (P→ST), and a phrase is replaced by another phrase (P→P).

            ST→ST   ST→P   P→ST   P→P   Total
ItaNLP        369    112    139    87     707
Simpitiki     112     24     30    28     194
Total         481    136    169   115     901

Table 1: Statistics on the lexical simplification benchmark (ST = single token, P = phrase)

We observe that the most frequent lexical simplification type is ST→ST, on which most systems and shared tasks are based. However, this simplification type covers only about half of the cases included in our benchmark (481 out of 901). This confirms the need to include cases of phrase-based simplification in the creation of benchmarks. It also corroborates the importance of developing lexical simplification systems that support phrase replacement, so as to make them work in real settings and not only on ad-hoc test sets. Another interesting remark is that single tokens are not necessarily simpler than phrases, or vice versa: in our data there are 136 ST→P and 169 P→ST cases, showing that no general rule can be applied to favour (or demote) phrases over single tokens.

We use the final benchmark [3], containing 901 sentence pairs, to evaluate a system for lexical simplification that takes phrases into account, as described in the following section.

[1] The number is slightly different from what was reported in the original paper because the corpus was revised after the first release.
[2] In Simpitiki we focused only on the pairs in the public administration domain due to project constraints. We plan to include the pairs from Wikipedia in the next benchmark version.
[3] Available at https://drive.google.com/file/d/0B4QAWZllD-egYS0yNWZ5dTdYQVE/view?usp=sharing
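The four-way typology of Table 1 depends only on whether each side of a gold pair is a single token or a multi-token phrase. As a minimal illustration, the following Python sketch assigns each pair to one of the four classes; the input format (one tab-separated source/target term pair per line in a file named benchmark_pairs.tsv) is a hypothetical assumption for this example, not the layout of the released benchmark.

```python
# Minimal sketch: classify gold simplification pairs into the four types
# (ST->ST, ST->P, P->ST, P->P) defined in Section 2. The tab-separated
# input format and the file name are illustrative assumptions, not the
# actual layout of the released benchmark.
from collections import Counter

def simplification_type(source_term: str, target_term: str) -> str:
    """Label a pair by whether each side is a single token (ST) or a phrase (P)."""
    src = "ST" if len(source_term.split()) == 1 else "P"
    tgt = "ST" if len(target_term.split()) == 1 else "P"
    return f"{src}->{tgt}"

def count_types(path: str) -> Counter:
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            source, target = line.rstrip("\n").split("\t")
            counts[simplification_type(source, target)] += 1
    return counts

if __name__ == "__main__":
    # e.g. the pair ("include", "comprende") from example a)/c) counts as ST->ST
    print(count_types("benchmark_pairs.tsv"))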
3 Automated lexical simplification

In this section we describe the experiments we carried out to perform automated lexical simplification using the benchmark presented in Section 2. We describe the tool used and how it was modified to deal with phrases. We also detail the resources (language model and word embeddings) created for the task.

3.1 The Lexenstein system

We use Lexenstein (Paetzold and Specia, 2015), an open source tool for lexical simplification, to collect a list of candidates that could replace a given word in the text. In particular, the Paetzold generator (Paetzold and Specia, 2016b) is based on an unsupervised approach that produces simplification candidates using a context-aware word embeddings model: the features used for candidate selection include word2vec vectors (Mikolov et al., 2013), a language model created with SRILM (Stolcke, 2002), and the conditional probability of a candidate given the PoS tag of the target word. So far, no evaluation of Lexenstein for Italian is available.

For each complex word, five candidate replacements are first retrieved, ranked according to several features, such as n-gram frequencies and word vector similarity with the target word, and then re-ranked according to their average rankings (Glavaš and Štajner, 2015).

Since we wanted to test different strategies for creating the embeddings (i.e. with and without phrases), we created the word/phrase vectors and the language model starting from freely available corpora (1.3 billion words in total): the Italian Wikipedia [4], OpenSubtitles2016 (Lison and Tiedemann, 2016) [5], PAISÀ [6], and the Gazzetta Ufficiale [7], a collection of Italian laws. Due to the size of the data, both the corpus and the model are available upon request from the authors.

[4] https://it.wikipedia.org/wiki/Pagina_principale
[5] http://www.opensubtitles.org/
[6] http://www.corpusitaliano.it/
[7] http://www.gazzettaufficiale.it/

3.2 Experimental setup

We conduct several experiments to evaluate the quality of lexical simplification when taking phrases into account (or not), and we compare different strategies for phrase recognition. We compare different variants of creating the embeddings and the language model (LM) that were then used by Lexenstein.

The baseline model relies on the standard Lexenstein setting: word embeddings are created using the word2vec package, and the LM considers each token separately.

The first system variant (word2phrase) includes phrase recognition, i.e. before extracting the embeddings and creating the LM, the documents are analysed by the word2phrase module in the word2vec package. This is an implementation of the algorithm presented in (Mikolov et al., 2013), which identifies words that appear frequently together, and infrequently in other contexts, and treats them as single tokens (connected by an underscore); a minimal sketch of this step is given at the end of this section.

The second system variant (word2phrase+LemmaPos) adds another information layer: each document is first lemmatized and PoS tagged using the Tint NLP Suite (Aprosio and Moretti, 2016), which works at token level; then word2phrase is run, and finally the embeddings and the LM are created. In this way we obtain so-called 'context-aware' embeddings, which is the recommended setting in (Paetzold and Specia, 2016b).
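As a concrete illustration of the word2phrase step, the following Python sketch uses gensim's Phrases module, which implements the same frequency-based collocation scoring as word2phrase in (Mikolov et al., 2013). This is a minimal sketch under stated assumptions: the corpus file name and the min_count/threshold values are illustrative, not the settings used in our experiments.

```python
# Minimal sketch of the word2phrase step using gensim's Phrases (same
# collocation scoring as Mikolov et al., 2013). The corpus path and the
# hyperparameters are illustrative assumptions, not our exact settings.
from gensim.models.phrases import Phrases
from gensim.models.word2vec import LineSentence, Word2Vec

sentences = LineSentence("corpus.tokenized.txt")  # one tokenized sentence per line

# Detect frequent bigrams, e.g. "gazzetta ufficiale" -> "gazzetta_ufficiale"
bigrams = Phrases(sentences, min_count=5, threshold=10.0)

# A second pass over the transformed stream also captures longer phrases
trigrams = Phrases(bigrams[sentences], min_count=5, threshold=10.0)

# Train embeddings on the phrase-merged corpus, as in the word2phrase variant
model = Word2Vec(trigrams[bigrams[sentences]], vector_size=300, window=5, min_count=5)
model.wv.save("phrase_embeddings.kv")
```

In the word2phrase+LemmaPos variant, the same pipeline applies; the only difference is that each token in the input file is first replaced by its lemmatized and PoS-tagged form before phrase detection, and the LM is built over the same transformed stream.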
4 Evaluation

The evaluation of automated simplification is an open issue since, as in machine translation, there may be several acceptable simplifications for a term, while a benchmark usually provides only one solution. Therefore, we perform two evaluations: the first is based on an automated comparison between the Lexenstein output and the gold simplifications in the benchmark; the second is a manual evaluation aimed at scoring the fluency, adequacy and simplicity of the output.

For the first evaluation, we compute the Mean Reciprocal Rank (MRR), which is usually adopted to evaluate a list of possible responses ordered by probability of correctness against a gold answer. We use this metric because Lexenstein returns 5 possible simplifications, ranked by relevance, and with MRR it is possible to weight a response matching the gold simplification according to its rank (a short implementation sketch is given at the end of this section). In particular, MRR is computed as:

MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}

where |Q| is the number of simplifications to be performed (901) and rank_i is the position of the correct simplification in the ranking returned by Lexenstein.

We run the system in the three configurations described in Section 3.2 on each source sentence in the benchmark. The single- or multi-token term to be simplified is given. If it is found in the LM, the system suggests 5 ranked simplification candidates; otherwise, no output is given.

Results show that the baseline model, i.e. the standard Lexenstein configuration replacing only single tokens with single tokens, yields MRR = 0.036. The one using word2phrase achieves MRR = 0.042, while the version also including lemma and PoS information yields MRR = 0.050. A detailed evaluation is reported in Table 2: for each of the three experimental settings, we report the number of cases in which the gold simplification matches the first-ranked replacement returned by Lexenstein (1st), the second, the third, and so on. In the last column, we report how many times (out of 901) the ranked list returned by Lexenstein does not contain the gold simplification present in the benchmark.

                        1st   2nd   3rd   4th   5th   none
Baseline                 23    12     7     3     2    854
word2phrase              30     8     8     4     1    850
word2phrase+LemmaPos     32    16    11     4     4    834

Table 2: Rank of correct simplifications returned by Lexenstein

This evaluation shows that, although the improvement is limited, using word2phrase in combination with lemma and PoS information outperforms the baseline. However, the informativeness of this automated evaluation is limited, because the cases labeled as 'none' include both wrong simplifications and correct simplifications that are not present in the benchmark. Besides, they also include cases in which the word to be simplified was not found in the LM.

In order to better understand where the approach fails, we also perform a manual evaluation. Following the standard scheme for human evaluation of automatic text simplification (Saggion and Hirst, 2017), we judge the Fluency (grammaticality), Adequacy (meaning preservation) and Simplicity of lexical simplifications on a five-point Likert scale (the higher the score, the better the output). For the setting using lemma and PoS, we do not judge Fluency, since the output is lemmatized and not converted back into the original inflected form of the source term (we plan to add this in the near future). The evaluation is performed on a set of 150 sentence pairs randomly extracted from the benchmark. We introduce this kind of evaluation in order to obtain a fine-grained analysis of the system output. For example, in the original sentence d) below, 'tempestivamente' ('promptly') was simplified with 'periodicamente' ('periodically'), which is grammatically correct (high Fluency) but does not preserve the meaning of the original sentence (low Adequacy).

d) Il richiedente dovrà comunicare tempestivamente l'esattezza dei recapiti forniti.

When using word2phrase without lemmatization, the average Fluency is 3.72, Adequacy is 2.60 and Simplicity is 2.95. This shows that, while the PoS and form of a simplified term are generally correct even without any preprocessing, the preservation of meaning is a critical issue. Simplicity achieves better scores than Adequacy, but it still needs improvement. Results obtained using lemma and PoS in combination with word2phrase are slightly better, with 2.64 Adequacy and 3.01 Simplicity.
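To make the automated metric above concrete, here is a minimal MRR sketch. The data layout (a gold term paired with the ranked candidate list returned for it) is an assumption for illustration, not code taken from Lexenstein.

```python
# Minimal MRR sketch for the automated evaluation in Section 4. Each item
# pairs a gold simplification with the ranked candidate list returned by
# the system (at most 5 candidates; an empty list means the term was not
# found in the LM). The data layout is an illustrative assumption.
from typing import List, Tuple

def mean_reciprocal_rank(items: List[Tuple[str, List[str]]]) -> float:
    """MRR = (1/|Q|) * sum of 1/rank of the gold candidate over all items.

    Items whose ranked list does not contain the gold term contribute 0,
    which also covers the 'none' cases of Table 2.
    """
    total = 0.0
    for gold, ranked_candidates in items:
        if gold in ranked_candidates:
            total += 1.0 / (ranked_candidates.index(gold) + 1)  # ranks start at 1
    return total / len(items) if items else 0.0

# Toy usage: the gold term 'comprende' is ranked second -> (1/2 + 0) / 2 = 0.25
print(mean_reciprocal_rank([
    ("comprende", ["contiene", "comprende", "copre"]),
    ("tempestivamente", []),  # no output: term not found in the LM
]))
```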
In general, the above evaluations show that using word2phrase with lemma and PoS information is a promising approach to improving the performance of lexical simplification in real settings. The performance of Lexenstein could be further improved by adding other corpora to the LM and by post-processing the output of the system, so as to discard inconsistent simplifications, for example when a verb is simplified through an adverb. However, some linguistic phenomena, such as non-local dependencies, cannot be addressed with this approach, and a separate strategy to simplify them should be taken into account.

5 Conclusions

In this work, we presented a first analysis of the role of phrases in Italian lexical simplification. We also introduced an adaptation of Lexenstein, an existing lexical simplification system, so as to take phrases into account. In the future, we plan to test other approaches to the extraction of phrases, for example by applying algorithms for recognising multiword expressions. We also plan to integrate our best model for phrase simplification into ERNESTA (Barlacchi and Tonelli, 2013), a system for the syntactic simplification of Italian documents. Furthermore, within the H2020 SIMPATICO project, we will integrate our phrase simplification approach into the existing services of the Trento Municipality and perform a pilot study with real users.

Acknowledgments

The research leading to this paper was supported by the EU Horizon 2020 Programme via the SIMPATICO Project (H2020-EURO-6-2015, n. 692819).

References
Alessio Palmero Aprosio and Giovanni Moretti. 2016. Italy goes to Stanford: A collection of CoreNLP modules for Italian. CoRR, abs/1609.06204.

Gianni Barlacchi and Sara Tonelli. 2013. ERNESTA: A sentence simplification tool for children's stories in Italian. In Computational Linguistics and Intelligent Text Processing: 14th International Conference, CICLing 2013, Samos, Greece, Proceedings, Part II, pages 476–487. Springer, Berlin/Heidelberg.

Dominique Brunato, Felice Dell'Orletta, Giulia Venturi, and Simonetta Montemagni. 2015. Design and annotation of the first Italian corpus for text simplification. In Proceedings of the 9th Linguistic Annotation Workshop, pages 31–41, Denver, Colorado, USA. Association for Computational Linguistics.

Goran Glavaš and Sanja Štajner. 2015. Simplifying lexical simplification: Do we need simplified corpora? In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 63–68, Beijing, China. Association for Computational Linguistics.

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119, Lake Tahoe, Nevada, USA.

Gustavo Paetzold and Lucia Specia. 2015. LEXenstein: A framework for lexical simplification. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, pages 85–90, Beijing, China. Association for Computational Linguistics.

Gustavo Paetzold and Lucia Specia. 2016a. Benchmarking lexical simplification systems. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).

Gustavo H. Paetzold and Lucia Specia. 2016b. Unsupervised lexical simplification for non-native speakers. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 3761–3767, Phoenix, Arizona, USA. AAAI Press.

H. Saggion and G. Hirst. 2017. Automatic Text Simplification. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Lucia Specia, Sujay Kumar Jauhar, and Rada Mihalcea. 2012. SemEval-2012 Task 1: English lexical simplification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 347–355. Association for Computational Linguistics.

Andreas Stolcke. 2002. SRILM — an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing (ICSLP 2002), pages 901–904.

Sara Tonelli, Alessio Palmero Aprosio, and Francesca Saltori. 2016. SIMPITIKI: A simplification corpus for Italian. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016), volume 1749 of CEUR Workshop Proceedings.