=Paper= {{Paper |id=Vol-1650/smbm16SeguraBedmar |storemode=property |title=Simplifying Drug Package Leaflets |pdfUrl=https://ceur-ws.org/Vol-1650/smbm16SeguraBedmar.pdf |volume=Vol-1650 |authors=Isabel Segura Bedmar,Luis Núñez-Gómez,Paloma Martínez Fernández,Maribel Quiroz |dblpUrl=https://dblp.org/rec/conf/smbm/Segura-BedmarNF16 }} ==Simplifying Drug Package Leaflets== https://ceur-ws.org/Vol-1650/smbm16SeguraBedmar.pdf
                             Simplifying Drug Package Leaflets


      Isabel Segura-Bedmar, Luis Núñez-Gómez, Paloma Martínez, Maribel Quiroz
               Computer Science Department, University Carlos III of Madrid
                      Avd. Universidad, 30, Leganés, Madrid, Spain
     isegura@inf.uc3m.es, lununezg@pa.uc3m.es, pmf@inf.uc3m.es




Abstract

Drug Package Leaflets provide information for patients on how to safely use medicines. The European Commission and recent studies stress that further efforts must be made to improve the readability and understandability of package leaflets in order to ensure the proper use of medicines and to increase patient safety. To the best of our knowledge, this is the first work that directly deals with the automatic simplification of drug package leaflets. Our approach to lexical simplification combines domain terminological resources, which provide a set of synonym candidates for a given target term, with the frequencies of these candidates in a large collection of documents, which are used to select the simplest synonym.

1   Introduction

Since 2001, according to a directive of the European Parliament (Directive 2001/83/EC) (EU, 2001), every drug product must be accompanied by a package leaflet before being placed on the market. This document provides information about a medicine, including its appearance, actions, side effects and drug interactions, contraindications, special warnings, etc. The directive also requires Drug Package Leaflets (DPLs) to provide clear and comprehensible information for patients, since misunderstanding them could be a potential source of drug-related problems, such as medication errors and adverse drug reactions. In 2009, the European Commission published a guideline (EC, 2009) with recommendations and advice for issuing package leaflets with accessible and understandable information for patients. However, recent studies (Pires et al., 2015; Piñero-López et al., 2016) show that the readability and understandability of these documents have not improved during the last seven years. Therefore, further efforts must be made to improve the understandability of package leaflets in order to ensure the proper use of medicines and to increase patient safety.
   One of the main reasons why understandability has not improved is that these documents still contain a considerable number of technical terms describing adverse drug reactions, diseases and other medical concepts. Posology (dosage quantity and prescription), contraindications and adverse drug reactions seem to be the most difficult sections to understand (March et al., 2010). To help solve this problem, we propose an automatic system to simplify drug package leaflets.
   Text simplification is a Natural Language Processing (NLP) task that aims to rewrite a text into an equivalent one that is less complex for readers. There are two main approaches to this task: lexical and syntactic simplification. Lexical simplification basically consists of replacing complex concepts with simpler synonyms, while syntactic simplification aims to reduce the grammatical complexity of a text while preserving its meaning.
   Text simplification techniques have been applied to texts from different domains, such as crisis management (Temnikova, 2012) and health information (Jonnalagadda et al., 2009; Kandula et al., 2010; Jonnalagadda and Gonzalez, 2011), and for different audiences, such as aphasic readers (Devlin, 1999) and language learners (Petersen and Ostendorf, 2007). Comprehensive surveys of the text simplification field can be found in Shardlow (2014) and Siddharthan (2014).
   To the best of our knowledge, this is the first work that directly deals with the automatic simplification of drug package leaflets. In particular,
we focus on the lexical simplification of the adverse drug reactions described in these documents. Moreover, our work is one of the few studies that address the simplification of texts written in Spanish. Our approach to lexical simplification combines terminological resources, which provide a set of synonym candidates for a given target term, with the frequencies of these candidates in a large collection of documents, which are used to select the most common synonym.
   The paper is organized as follows. Section 2 presents related work. Section 3 describes our approach. Experiments, results, and discussion are given in Section 4. Finally, Section 5 concludes the paper and proposes future work.

2   Related Work

The first works on text simplification appeared 20 years ago (Chandrasekar et al., 1996). Text simplification consists of transforming a text into an equivalent text that is easier to read, and probably easier to understand, for a target audience.
   There is a need to adapt content for some groups of people because information is not equally accessible to everyone. It is unlikely that professional editors will adapt texts for all literacy levels, and NLP techniques could help simplify texts by automating some tasks, thus helping content editors to generate adapted content. Moreover, text simplification is essential for several types of texts: news, government and administrative information, laws and rights, etc. As mentioned before, there are two subtasks of text simplification (Saggion et al., 2011): (1) syntactic simplification, which divides complex sentences into simpler sentences, and (2) lexical simplification, whose objective is to substitute complex vocabulary with common vocabulary (looking for synonyms that are simpler than the original word, considering the context in the sentence). In addition, a clarification step could be included to provide definitions and explanations for acronyms, abbreviations and unusual words. These tasks are not completely automatic; in some cases they have to be manually reviewed.
   First, we have to distinguish between readability and understandability, because these concepts capture different aspects of the complexity of a text. Readability is about the structure of sentences (it concerns syntax and consequently requires syntactic simplification approaches). On the other hand, understandability is about the difficulty of interpreting a word (Barbieri et al., 2005), and it requires lexical simplification approaches.
   Syntactic simplification consists of transforming long and complex sentences into simpler, independent sentences by eliminating coordination (of clauses, verbs, etc.), dropping subordinate constructions (relative clauses, gerundive and participle clauses), resolving anaphora and transforming passive into active voice. First, a parser is used to obtain a syntactic tree that represents the structure of the sentence (noun, prepositional and verb phrases and how they are related) (Dorr et al., 2003). Then, rule-based approaches are applied. The rules can be learned automatically from annotated corpora (syntactic trees of original sentences aligned with their simplified versions) (Zhu et al., 2010), or they can be handcrafted (Chandrasekar et al., 1996; Siddharthan, 2002). The rules include split, drop, copy and reorder operations over syntactic trees.
   Lexical simplification consists of replacing words (taking the context into account) and complex expressions with easier words or phrases. A common heuristic is that complex words have a low frequency. Moreover, lexical resources such as WordNet (Miller, 1995) are used to extract synonyms as candidates to replace a complex or difficult word. Combining a lexical resource with a probabilistic model has also been tried (De Belder et al., 2010); the probabilistic models are obtained from lexical simplifications previously produced by applying E2R (Easy-to-Read) guidelines, as in Simple Wikipedia. McCarthy and Navigli (2007) propose candidates to replace a word using its context. In the SemEval 2012 English Lexical Simplification challenge (Specia et al., 2012), with ten participating systems, the evaluation results showed that frequency-based proposals performed well compared to more sophisticated systems.
   Regarding synonym substitution in Spanish texts, the lack of semantic resources is a handicap. A recent work is the LexSiS system (Bott et al., 2012), which uses the Spanish OpenThesaurus to build a vector space model according to the distributional hypothesis, which states that different uses of a word tend to appear in different lexical contexts. A vector is
built from a window of nine words around each word sense in a corpus extracted from OpenThesaurus, and the vectors are compared using cosine similarity combined with word frequency and word length. This approach can be enhanced by including rule-based lexical simplification (Drndarevic et al., 2012), which defines patterns that avoid incorrect substitutions, for instance when replacing reporting verbs (confirm, suggest, explain, etc.), so that the resulting syntactic structures remain correct, as well as other editing transformations (of numerical expressions or periphrases). Following the same approach, the CASSA method (Baeza-Yates et al., 2015) extracts word occurrences from the Spanish Google Books Ngram corpus, which contains real web frequencies. This work also obtains word senses from OpenThesaurus.
   But before simplifying, we need to know the level of readability and understandability of a text, which requires complexity measures. For English, simple measures based on word frequency and phrase length are used, such as the Fog index (Gunning, 1986) and the Flesch-Kincaid measures (Kincaid et al., 1975). For Spanish texts, several indexes have been proposed to measure the structural complexity of a text (Anula, 2007): the number of verbal predicates in subordinate clauses, and the index of sentence recursion (a measure that counts the number of nested clauses in the text). To measure lexical complexity, two indexes are proposed: an index of low-frequency words (the number of content words¹ with low frequency divided by the total number of lexical words) and an index of lexical density (the number of distinct content words divided by the total number of discourse segments²). Finally, other indexes, such as the average sentence length and the average word length (in syllables), are also used, although they have been criticized. These indexes have to be validated by end users. Knowing the readability level of a document gives users the opportunity to choose the most suitable text from a collection of documents delivering the same information (Sbattella and Tedesco, 2012).
   With respect to Spanish corpora for extracting frequencies and word contexts, the CREA³ corpus, available online, is not a useful resource when domain-specific texts are required (for instance, biology or chemistry texts). Its latest version (June 2008) contains one hundred and sixty million forms (from journals, books and newspapers covering more than one hundred subjects). In 2018, the Royal Spanish Academy (RAE) will deliver CORPES XXI, a larger Spanish corpus with four hundred million forms.

¹ A content word is a word with meaning (nouns, verbs, adjectives and adverbs).
² Sentences or phrases.
³ http://corpus.rae.es/creanet.html

   Finally, there is specific work on simplifying numerical expressions. Bautista and Saggion (2014) propose a rule-based lexical component that simplifies numerical expressions in Spanish texts. This work makes news articles more accessible to certain readers by rewriting difficult numerical expressions in a simpler way.

3   The EasyLecto system

The EasyLecto system aims to simplify drug package leaflets, in particular by replacing the terms that describe adverse drug reactions with synonyms that are easier for patients to understand.
   Figure 1 illustrates the architecture of the EasyLecto system. Its first module automatically annotates adverse drug reactions in texts. This module uses a dictionary-based approach that combines terminological resources, such as MedDRA, the ATC system (a drug classification system developed by the World Health Organization) or CIMA (a database of medicines approved in Spain), with dictionaries gathered from health and medicine websites such as MedLinePlus⁴, vademecum.es⁵ or prospectos.net⁶. A detailed description of this named entity recognition (NER) module can be found in Segura-Bedmar et al. (2015).

Figure 1: The EasyLecto system architecture.

   Once adverse drug reactions are automatically identified in texts, a set of synonyms is proposed for each of them. MedDRA⁷ is a medical terminology dictionary about events associated with drugs. It is multilingual (11 languages), and its main goal is to provide a classification system for the efficient communication of adverse drug reaction data between countries. MedDRA is organized as a five-level hierarchy. The most specific level, "Lowest Level Terms" (LLTs), contains a total of 72,072 terms that express how information is communicated in practice. The main advantage of MedDRA is that its structured format makes it easy to obtain a list of possible adverse drug reactions and their synonyms. Thus, we decided to use MedDRA as a source of synonyms for adverse drug reactions. Moreover, for a given effect in MedDRA, we used its longest synonym as the definition of the effect.

⁴ https://www.nlm.nih.gov/medlineplus/spanish/
⁵ http://www.vademecum.es
⁶ https://www.prospectos.net
⁷ http://www.meddra.org/

   The next step is to select the appropriate synonym, that is, the simplest one. The more common a term is in a collection of texts, the more familiar it is likely to be to the reader (Elhadad, 2006). Thus, our system proposes the synonyms with the highest frequency. To know how common a word is, we gathered a large collection of texts, namely the MedLinePlus articles⁸, and indexed it in order to obtain the frequency of each drug effect.
   MedLinePlus is an online resource with health information for patients, which contains more than 1,000 articles about diseases and 6,000 articles about medicines. Its Spanish version is currently one of the most comprehensive and trusted Spanish-language health websites. We developed a web crawler to browse and download pages related to drugs and diseases from the MedLinePlus website. Each MedLinePlus article provides exhaustive information about a given medical concept and also proposes a list of related health topics, which can be considered synonyms of this concept. Moreover, the definition of a medical concept can be obtained by taking the first sentence of its related article. Finally, all downloaded articles, their definitions (the first sentence of each article) and their related health topics were converted into JSON objects in order to create an index (see Figure 2) using ElasticSearch⁹, an open source search engine.

Figure 2: An index was generated from the MedLinePlus articles using ElasticSearch.

   All told, for each drug effect, the EasyLecto system proposes a definition and a set of synonyms from MedDRA, as well as a definition and a set of synonyms from MedLinePlus. Then, the frequency of each synonym is calculated using the index built from MedLinePlus, and finally the synonym with the highest frequency is selected as the simplest synonym.
   Thanks to the horizontal scalability provided by ElasticSearch, it is possible to index large collections of documents, as is the case of MedLinePlus. The main advantage of ElasticSearch is its capacity to create distributed systems by specifying only the configuration of the hierarchy of nodes; ElasticSearch then manages itself to improve fault tolerance and load distribution. Another important advantage of ElasticSearch is that it does not require very high computing power or storage capacity to index large collections. In this study, ElasticSearch (version 2.2) was installed on an Ubuntu Server 14.04 machine with 8GB of RAM and 500GB of disk space.

⁸ https://www.nlm.nih.gov/medlineplus/spanish/
⁹ http://elasticsearch.org

   A demo of the EasyLecto system is available at http://jacky.uc3m.es/EasyLecto/. This tool allows users to load a document and highlights its adverse drug reactions in blue (see Figure 3). If the user selects any of these adverse drug reactions, the tool displays a popup window with the definitions and synonyms proposed by the system. Figure 4 shows the synonyms and definitions proposed for the effect 'dispepsia' (dyspepsia). While the most frequent MedDRA synonym was 'indigestión' (indigestion), the most common synonym from MedLinePlus was 'enfermedades del estómago' (stomach diseases).

Figure 3: A drug package leaflet annotated with the EasyLecto system. Adverse drug reactions are highlighted in blue.

Figure 4: Simplification (synonyms and definitions) for the effect 'dispepsia'.

4   Evaluation

The dataset used for the evaluation is the EasyDPL (easy drug package leaflets) corpus¹⁰, which contains 306 package leaflets annotated with 1,400 adverse drug reactions and their simplest synonyms. The corpus was manually annotated by three trained annotators. The quality and consistency of the corpus were evaluated by measuring inter-annotator agreement (IAA). IAA also reflects the difficulty of the task and provides an upper bound on the performance of automatic systems for the simplification of adverse drug reactions in drug package leaflets. In particular, we calculated Fleiss' kappa (Fleiss, 1971), an extension of Cohen's kappa (Cohen, 1960) that measures the degree of agreement among two or more annotators. The assessment showed a kappa of 0.709, which is considered substantial on the Landis and Koch scale (Landis and Koch, 1977).
   For each drug effect annotated in the EasyDPL corpus, the evaluation consisted of comparing the gold-standard synonym, that is, the synonym proposed by the human annotators, with the simplest synonym, that is, the synonym with the highest frequency in the index built from the MedLinePlus articles. Since we used two different resources, MedDRA and MedLinePlus, to obtain the set of synonym candidates, we evaluated the simplest synonym from each resource separately. For the synonym obtained from MedLinePlus, EasyLecto achieves an accuracy of 68.7%, while for the MedDRA synonym the accuracy is much lower (around 37.2%). This is mainly because MedDRA is a highly specific, standardized medical terminology whose terms are not familiar to most people. MedLinePlus, on the other hand, is a health information website for patients, which uses more readable language and a lay vocabulary.
   We conducted an error analysis in order to identify the main causes of false positives and false negatives in our system. In particular, we studied a random sample of 30 documents in detail. Table 1 presents some errors that our system makes on the EasyDPL corpus. Most errors are due to the absence of a simpler synonym for a term; some terms could only be explained by a short sentence or phrase (for example, akathisia or eosinophilia). Another cause of error was that some terms were replaced by their hypernyms in the gold-standard corpus (for example, allergic alveolitis was replaced by allergy), whereas the system does not exploit the hierarchical relationships between terms and therefore cannot propose more general terms as synonyms for a specific term. Some errors, such as the pairs dysphoria/hoarseness or diaphoresis/sweating, may be due to the lack of synonyms in the resources. An approach based on a word vector model able to compute the similarity between words from their contexts could reduce such errors.

Drug Effect                                 Gold-standard synonym                                               EasyLecto synonym
acatisia (akathisia)                        incapacidad de quedarse quieto (inability to sit still)             acatisia
bursitis                                    hinchazón alrededor de los músculos (swelling around the muscles)   bursitis
eosinofilia (eosinophilia)                  problemas en la sangre (blood problems)                             eosinofilia
cloasma (chloasma)                          manchas durante el embarazo (spots during pregnancy)                cloasma
miositis (myositis)                         inflamación en la piel (skin inflammation)                          miositis
alveolitis alérgica (allergic alveolitis)   alergia (allergy)                                                   alveolitis alérgica
diaforesis (diaphoresis)                    sudoración (sweating)                                               diaforesis
disforia (dysphoria)                        ronquera (hoarseness)                                               disforia

Table 1: Some errors of the EasyLecto system.

¹⁰ http://labda.inf.uc3m.es/doku.php?id=en:labda recursosPLN
¹¹ https://es.surveymonkey.com/r/8HMVJKV

   In addition to the quantitative evaluation, we also used SurveyMonkey to collect some quick user feedback on the EasyLecto system¹¹. We defined a survey with 10 closed-ended questions, in which users had to pick a single answer from a list of given options. We asked users about the usefulness and performance of EasyLecto, as well as about its usability, design and visual appeal. A total of 26 users completed the survey,
most of them being software engineers or PhD students in computer science. The analysis of the survey shows that most users have positive opinions about the EasyLecto system. Almost 97% of users think that the EasyLecto system helps to simplify drug package leaflets. Regarding the definitions proposed by the system, 75% of users believe that the definitions help them understand the text. Almost 30% of them would like to obtain three or more synonyms from the system. Around 81% of users think that EasyLecto has a friendly interface.

5   Conclusions and future work

Although drug package leaflets should be designed and written to ensure complete understanding of their contents, several factors can influence patient understanding of these documents. Low literacy is directly associated with limited understanding and misinterpretation of drug package leaflets (Davis et al., 2006b; Davis et al., 2006a). Older people are more likely to have lower literacy skills, as well as decreased memory and poorer reading comprehension (Kutner et al., 2006). Therefore, low literacy along with older age may lead to unintentional non-compliance or inappropriate use of drugs, with dangerous consequences for patients, such as therapeutic failure or adverse drug reactions. Several studies (March et al., 2010; Pires et al., 2015; Piñero-López et al., 2016) have shown that there is an urgent need to improve the quality of drug package leaflets, because they are usually too difficult for patients to understand, and this could be a potential source of drug-related problems, such as medication errors and adverse drug reactions. In particular, patients have problems understanding the sections describing dosages and adverse drug reactions.
   The EasyLecto system aims to simplify drug package leaflets, in particular by replacing terms describing adverse drug reactions with synonyms that are easier for patients to understand. The system uses a dictionary-based approach to automatically identify adverse drug reactions in drug package leaflets. MedDRA and MedLinePlus are used as sources of synonyms and definitions for these effects. Our main hypothesis is that a simple word is likely to be more common in a collection of texts than its more difficult synonyms. We built an index from a large collection of texts, the MedLinePlus articles; this index tells us how common a word is. EasyLecto was evaluated on a gold-standard corpus with 306 texts manually annotated by three trained experts. Experiments show an accuracy of 68.7% for the MedLinePlus synonym and 37.1% for the MedDRA synonym. Therefore, resources that have been specially written for patients are a better source of simpler synonyms than specialized terminological resources (such as MedDRA). On the other hand, the error analysis shows that some of the system answers might also be valid and simple synonyms, even though they differ from those proposed in the gold-standard corpus. In order to obtain a more realistic evaluation, we plan to extend the EasyDPL corpus by adding several simpler synonyms for each term.
   In addition to the quantitative evaluation, the subjective impressions of 26 users were collected with a simple questionnaire published in SurveyMonkey. In general, users have positive perceptions of the EasyLecto system. We are aware that our user-based evaluation has many shortcomings (e.g., the number of users is very small and they are not representative of the general public). Therefore, we plan to extend and improve the evaluation with a larger set of users that includes elderly users, people with disabilities and people with low literacy levels.
   In this work, we only focus on the simplification of adverse drug reactions; however, we plan to extend our approach to simplify not only other medical concepts (such as diseases, medical procedures, medical tests, etc.), but also complex words from open-domain texts. As future work, we also plan to integrate additional resources such as BabelNet (Navigli and Ponzetto, 2012) or the UMLS Metathesaurus (Lindberg et al., 1993). In addition to providing broader coverage of terms and more synonyms, these resources will allow us to develop a multilingual simplification system.
   To the best of our knowledge, while word vector models based on n-grams have already been used (Bott et al., 2012), word vector models trained using deep learning techniques have not yet been explored for the task of simplification. We also plan to study the use of word embeddings learned by Word2Vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014). One important advantage of these models is that they make it possible to compute the similarity between terms without the need for synonym dictionaries, which are generally
domain-dependent.

Acknowledgments

This work was supported by the eGovernAbility-Access project (TIN2014-52665-C2-2-R).

References

Alberto Anula. 2007. Tipos de textos, complejidad lingüística y facilitación de la lectura. In Actas del IV Congreso de la Asociación Asiática de Hispanistas.

Ricardo Baeza-Yates, Luz Rello, and Julia Dembowski. 2015. CASSA: A context-aware synonym simplification algorithm. In Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pages 1380–1385.

Thimoty Barbieri, Antonio Bianchi, Licia Sbattella, Ferdinando Carella, and Marco Ferra. 2005. Multiabile: A multimodal learning environment for the inclusion of impaired e-learners using tactile feedbacks, voice, gesturing, and text simplification. Assistive Technology: From Virtuality to Reality, 16(1):406–410.

Susana Bautista and Horacio Saggion. 2014. Can numerical expressions be simpler? Implementation and demonstration of a numerical simplification system for Spanish. In LREC, pages 956–962.

Stefan Bott, Luz Rello, Biljana Drndarevic, and Horacio Saggion. 2012. Can Spanish be simpler? LexSiS: Lexical simplification for Spanish. In Proceedings of COLING 2012, pages 357–374.

Raman Chandrasekar, Christine Doran, and Bangalore Srinivas. 1996. Motivations and methods for text simplification. In Proceedings of the 16th Conference on Computational Linguistics-Volume 2, pages 1041–1044. Association for Computational Linguistics.

Jacob Cohen. 1960. A coefficient of agreement for nominal scale. Educ Psychol Meas, 20:37–46.

Terry C Davis, Michael S Wolf, Pat F Bass, Mark Middlebrooks, Estela Kennen, David W Baker, Charles L Bennett, Ramon Durazo-Arvizu, Anna Bocchini, Stephanie Savory, et al. 2006a. Low literacy impairs comprehension of prescription drug warning labels. Journal of General Internal Medicine, 21(8):847–851.

Terry C Davis, Michael S Wolf, Pat F Bass, Jason A Thompson, Hugh H Tilson, Marolee Neuberger, and Ruth M Parker. 2006b. Literacy and misunderstanding prescription drug labels. Annals of Internal Medicine, 145(12):887–894.

Jan De Belder, Koen Deschacht, and Marie-Francine Moens. 2010. Lexical simplification. In Proceedings of ITEC2010: 1st International Conference on Interdisciplinary Research on Technology, Education and Communication.

Siobhan Lucy Devlin. 1999. Simplifying natural language for aphasic readers. Ph.D. thesis, University of Sunderland.

Bonnie Dorr, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL 03 on Text Summarization Workshop-Volume 5, pages 1–8. Association for Computational Linguistics.

Biljana Drndarevic, Sanja Štajner, and Horacio Saggion. 2012. Reporting simply: A lexical simplification strategy for enhancing text accessibility. In Proceedings of Easy-to-Read on the Web Symposium.

EC. 2009. Guideline on the readability of the labelling and package leaflet of medicinal products for human use.

Noémie Elhadad. 2006. Comprehending technical texts: predicting and defining unfamiliar terms. In AMIA.

Council EU. 2001. Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community code relating to medicinal products for human use. Official Journal L, 311(28):11.

Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378.

Robert Gunning. 1986. The technique of clear writing.

Siddhartha Jonnalagadda and Graciela Gonzalez. 2011. BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. arXiv preprint arXiv:1107.5744.

Siddhartha Jonnalagadda, Luis Tari, Jörg Hakenberg, Chitta Baral, and Graciela Gonzalez. 2009. Towards effective sentence simplification for automatic processing of biomedical text. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 177–180. Association for Computational Linguistics.

Sasikiran Kandula, Dorothy Curtis, and Qing Zeng-Treitler. 2010. A semantic and syntactic text simplification tool for health content. In AMIA Annu Symp Proc, volume 2010, pages 366–370.

J Peter Kincaid, Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom. 1975. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel. Technical report, DTIC Document.

Mark Kutner, Elizabeth Greenberg, and Justin Baer. 2006. A first look at the literacy of America's adults in the 21st century. NCES 2006-470. National Center for Education Statistics.

J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, pages 159–174.

Donald A Lindberg, Betsy L Humphreys, and Alexa T McCray. 1993. The Unified Medical Language System. Methods of Information in Medicine, 32(4):281–291.

Cerdá JC March, Rodríguez MA Prieto, Azarola A Ruiz, Lorda P Simón, Cantalejo I Barrio, and Alina Danet. 2010. [Quality improvement of health information included in drug information leaflets. Patient and health professional expectations]. Atención Primaria/Sociedad Española de Medicina de Familia y Comunitaria, 42(1):22–27.

Diana McCarthy and Roberto Navigli. 2007. SemEval-2007 task 10: English lexical substitution task. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 48–53. Association for Computational Linguistics.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.

Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, volume 14, pages 1532–1543.

Sarah E Petersen and Mari Ostendorf. 2007. Text simplification for language learners: a corpus analysis. In Proceedings of Workshop on Speech and Language Technology for Education, pages 69–72.

Ángeles María Piñero-López, Pilar Modamio, F. Cecilia Lastra, and L. Eduardo Mariño. 2016. Readability analysis of the package leaflets for biological medicines available on the internet between 2007 and 2013: An analytical longitudinal study. J Med Internet Res, 18(5):e100.

Carla Pires, Marina Vigário, and Afonso Cavaco. 2015. Readability of medicinal package leaflets: a systematic review. Revista de Saude Publica, 49:1–13.

Horacio Saggion, Elena Gómez Martínez, Esteban Etayo, Alberto Anula, and Lorena Bourg. 2011. Text simplification in Simplext: making text more accessible. Procesamiento del Lenguaje Natural, 47:341–342.

Licia Sbattella and Roberto Tedesco. 2012. Calculating text complexity during the authoring phase. In Proceedings of Easy-to-Read on the Web Symposium.

Isabel Segura-Bedmar, Paloma Martínez, Ricardo Revert, and Julián Moreno-Schneider. 2015. Exploring Spanish health social media for detecting drug effects. BMC Medical Informatics and Decision Making, 15(2):1.

Matthew Shardlow. 2014. A survey of automated text simplification. International Journal of Advanced Computer Science and Applications, 4(1).

Advaith Siddharthan. 2002. Resolving attachment and clause boundary ambiguities for simplifying relative clause constructs. In Proceedings of the Student Workshop, 40th Meeting of the Association for Computational Linguistics (ACL02), pages 60–65.

Advaith Siddharthan. 2014. A survey of research on text simplification. ITL-International Journal of Applied Linguistics, 165(2):259–298.

Lucia Specia, Sujay Kumar Jauhar, and Rada Mihalcea. 2012. SemEval-2012 task 1: English lexical simplification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 347–355. Association for Computational Linguistics.

Irina Temnikova. 2012. Text Complexity and Text Simplification in the Crisis Management domain. Ph.D. thesis, University of Wolverhampton.

Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1353–1361. Association for Computational Linguistics.
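The sketch below illustrates the frequency hypothesis stated in the conclusion: a simpler synonym is likely to be more frequent in a large collection of texts than its harder alternatives. It is not the EasyLecto implementation and rests on several assumptions. A plain token `Counter` stands in for the index the authors built over MedLinePlus. Candidate synonyms are passed in directly instead of being looked up in MedDRA or MedLinePlus. Scoring a multi-word candidate by its rarest token is a hypothetical heuristic. The function names (`build_frequency_index`, `term_frequency`, `simplest_synonym`) and the toy corpus are invented for illustration.

```python
import re
from collections import Counter

TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def build_frequency_index(corpus):
    """Count token frequencies over a collection of texts.

    Assumption: a simple in-memory Counter stands in for the
    corpus index described in the paper.
    """
    index = Counter()
    for text in corpus:
        index.update(tokenize(text))
    return index


def term_frequency(term, index):
    """Score a term by the frequency of its rarest token.

    Hypothetical heuristic for multi-word terms; unseen tokens count as 0.
    """
    return min((index[token] for token in tokenize(term)), default=0)


def simplest_synonym(term, synonyms, index):
    """Return the most frequent candidate among the term and its synonyms.

    The original term is listed first, so max() keeps it on ties.
    """
    candidates = [term, *synonyms]
    return max(candidates, key=lambda candidate: term_frequency(candidate, index))


if __name__ == "__main__":
    # Toy corpus and synonym list, invented for illustration only.
    corpus = [
        "Headache and nausea are common.",
        "Some patients report headache.",
        "Cephalalgia is rare.",
    ]
    index = build_frequency_index(corpus)
    print(simplest_synonym("cephalalgia", ["headache", "head pain"], index))  # prints: headache
```

`max` returns the first of several equally frequent candidates, and the original term is listed first. As a result, the term is replaced only when a synonym is strictly more frequent. This conservative choice avoids rewriting a leaflet when the corpus gives no evidence that the synonym is simpler.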