=Paper=
{{Paper
|id=Vol-1650/smbm16SeguraBedmar
|storemode=property
|title=Simplifying Drug Package Leaflets
|pdfUrl=https://ceur-ws.org/Vol-1650/smbm16SeguraBedmar.pdf
|volume=Vol-1650
|authors=Isabel Segura Bedmar,Luis Núñez-Gómez,Paloma Martínez Fernández,Maribel Quiroz
|dblpUrl=https://dblp.org/rec/conf/smbm/Segura-BedmarNF16
}}
==Simplifying Drug Package Leaflets==
Isabel Segura-Bedmar, Luis Núñez-Gómez, Paloma Martínez, Maribel Quiroz

Computer Science Department, University Carlos III of Madrid, Avd. Universidad, 30, Leganés, Madrid, Spain

isegura@inf.uc3m.es, lununezg@pa.uc3m.es, pmf@inf.uc3m.es

===Abstract===
Drug Package Leaflets provide information for patients on how to safely use medicines. The European Commission and recent studies stress that further efforts must be made to improve the readability and understandability of package leaflets in order to ensure the proper use of medicines and to increase patient safety. To the best of our knowledge, this is the first work that directly deals with the automatic simplification of drug package leaflets. Our approach to lexical simplification combines the use of domain terminological resources to give a set of synonym candidates for a given target term, and the use of their frequencies in a large collection of documents in order to select the simplest synonym.

===1 Introduction===
Since 2001, according to a directive of the European Parliament (Directive 2001/83/EC) (EU, 2001), every drug product must be accompanied by a package leaflet before being placed on the market. This document provides informative details about a medicine, including its appearance, actions, side effects and drug interactions, contraindications, special warnings, etc. This directive also required that Drug Package Leaflets (DPLs) be written so as to provide clear and comprehensible information for patients, since misunderstanding them could be a potential source of drug-related problems, such as medication errors and adverse drug reactions. In 2009, the European Commission published a guideline (EC, 2009) with recommendations and advice for issuing package leaflets with accessible and understandable information for patients. However, recent studies (Pires et al., 2015; Piñero-López et al., 2016) show that the readability and understandability of these documents have not been improved during the last seven years. Therefore, further efforts must be made to improve the understandability of package leaflets in order to ensure the proper use of medicines and to increase patient safety.

One of the main reasons why understandability has not improved is that these documents still contain a considerable number of technical terms describing adverse drug reactions, diseases and other medical concepts. Posology (dosage quantity and prescription), contraindications and adverse drug reactions seem to be the sections most difficult to understand (March et al., 2010). To help solve this problem, we propose an automatic system to simplify drug package leaflets.

Text simplification is a Natural Language Processing (NLP) task that aims to rewrite text into an equivalent with less complexity for readers. There are two main approaches to this task: lexical and syntactic simplification. Lexical simplification basically consists of replacing complex concepts with simpler synonyms, while syntactic simplification aims to reduce the grammatical complexity of a text while preserving its meaning.

Text simplification techniques have been applied to texts from different domains and for different audiences, such as crisis management (Temnikova, 2012), health information (Jonnalagadda et al., 2009; Kandula et al., 2010; Jonnalagadda and Gonzalez, 2011), aphasic readers (Devlin, 1999) and language learners (Petersen and Ostendorf, 2007). Comprehensive surveys of the text simplification field can be found in (Shardlow, 2014; Siddharthan, 2014).

To the best of our knowledge, this is the first work that directly deals with the automatic simplification of drug package leaflets.
In particular, we focus on the lexical simplification of adverse drug reactions that are described in these documents. Moreover, our work is one of the few studies that address the simplification of texts written in Spanish. Our approach to lexical simplification combines the use of terminological resources that provide a set of synonym candidates for a given target term, and the use of their frequencies in a large collection of documents in order to select the most common synonym.

The paper is organized as follows. Section 2 presents related work. Section 3 describes our approach. Experiments, results, and discussion are given in Section 4. Finally, the paper is concluded and future work is proposed in Section 5.

===2 Related Work===
The first works in text simplification appeared 20 years ago (Chandrasekar et al., 1996). Text simplification is based on transforming a text into an equivalent text that is easier to read and probably easier to understand by a target audience.

There is a need to adapt contents for some groups of people because information is not equally accessible to everyone. It is unlikely that professional editors will adapt text for all literacy levels, and NLP techniques could help simplify texts by automating some tasks. In this way, it is possible to help content editors generate adapted contents. On the other hand, text simplification is essential in several types of texts: news, government and administrative information, laws and rights, etc. As mentioned before, there are two subtasks of text simplification (Saggion et al., 2011): (1) syntactic simplification, which divides complex sentences into simpler sentences, and (2) lexical simplification, whose objective is to substitute complex vocabulary with common vocabulary (looking for synonyms that are simpler than the original word considering the context in the sentence). Moreover, a clarification step could be included to provide definitions and explanations for acronyms, abbreviations and unusual words. These tasks are not completely automatic; they have to be manually reviewed in some cases.

First, we have to distinguish between readability and understandability because these concepts capture different aspects of the complexity of the text. Readability is about the structure of sentences (it concerns syntax and consequently requires syntactic simplification approaches). Understandability, on the other hand, is about the difficulty of interpreting a word (Barbieri et al., 2005), and requires lexical simplification approaches.

Syntactic simplification consists of transforming complex and long sentences into simpler, independent sentences by eliminating coordination (of clauses, verbs, etc.), dropping subordination utterances (relative clauses, gerundive and participle utterances), resolving anaphora and transforming passive into active voice. First, a parser is used to obtain a dependency tree that represents the syntactic structure of the sentence (noun, prepositional and verbal phrases and how they are related to each other) (Dorr et al., 2003). Then, rule-based approaches are used for the simplification itself. Rules can be automatically learned from annotated corpora (syntactic trees of sentences where original sentences are related to their simplified sentences) (Zhu et al., 2010), or handcrafted (Chandrasekar et al., 1996; Siddharthan, 2002). The rules include split, drop, copy and reorder operations over syntactic trees.

Lexical simplification consists of replacing words (taking into account the context) and complex utterances with easier words or phrases. A common heuristic is that complex words have a low frequency. Moreover, lexical resources such as WordNet (Miller, 1995) are used to extract synonyms as candidates to replace a complex or difficult word. Combining a lexical resource and a probabilistic model is an approach that has been tried (De Belder et al., 2010). Probabilistic models are obtained from lexical simplifications that have previously been made by applying E2R guidelines, as in the Simple Wikipedia. McCarthy and Navigli (2007) introduce work to propose candidates to replace a word using contexts. In the SemEval 2012 English Lexical Simplification challenge (Specia et al., 2012), with ten participating systems, the evaluation results showed that proposals based on frequency give good results compared to other, more sophisticated systems.

Focusing on research devoted to synonym substitution in Spanish texts, the lack of semantic resources is a handicap. A recent work is described in (Bott et al., 2012): the LexSiS system uses the Spanish OpenThesaurus to build a vector space model according to the distributional hypothesis, which establishes that different uses of a word tend to appear in different lexical contexts. A vector is built in a window of nine words around each word sense in a corpus extracted from the OpenThesaurus, and vectors are compared using cosine similarity combined with word frequency and word length. This approach can be enhanced by including rule-based lexical simplification, see (Drndarevic et al., 2012), where some patterns that avoid incorrect substitutions are defined, for instance, to replace reporting verbs (confirm, suggest, explain, etc.) while leaving correct syntactic structures, as well as other editing transformations (numerical expressions or periphrasis). Following the same approach, the CASSA method is reported in (Baeza-Yates et al., 2015), where the Spanish corpus used to extract word occurrences is the Google Books Ngram corpus, which contains real web frequencies. This work also obtains word senses from OpenThesaurus.

But before simplifying, we have to know the level of readability and understandability of a text by using complexity measures. There are simple measures based on the frequency of words in texts as well as the length of phrases; the FOG index (Gunning, 1986) and the Flesch-Kincaid measures (Kincaid et al., 1975) are used in English. In Spanish texts, several indexes have been proposed to measure the structural complexity of a text (Anula, 2007): the number of verbal predicates in subordinate clauses, and the index of sentence recursion (a measure that counts the number of nested clauses in the text). To measure lexical complexity, two indexes are proposed: an index of low-frequency words (the number of content words with low frequency divided by the total number of lexical words) and an index of lexical density (the number of distinct content words divided by the total number of discourse segments). Here a content word is a word with meaning (nouns, verbs, adjectives and adverbs), and discourse segments are sentences or phrases. Finally, other indexes such as the average length of sentences and the average length of words (in syllables) are also used, although they are criticized. These indexes have to be validated by the end users. Knowing the readability level of a document, users have the opportunity to choose the most suitable text from a collection of documents delivering the same information (Sbattella and Tedesco, 2012).

With respect to Spanish corpora for the extraction of frequencies and word contexts, the CREA corpus (http://corpus.rae.es/creanet.html), available online, is not a useful resource when domain-specific texts are required (for instance, biology or chemical texts). The latest version, of June 2008, contains one hundred and sixty million documents (from journals, books and newspapers covering more than one hundred subjects). In 2018 the Royal Spanish Academy (RAE) will deliver CORPES XXI, a larger Spanish corpus with four hundred million forms.

Finally, there are specific works to simplify numerical expressions. Bautista and Saggion (2014) propose a rule-based lexical component that simplifies numerical expressions in Spanish texts. This work makes news articles more accessible to certain readers by rewriting difficult numerical expressions in a simpler way.

===3 The EasyLecto system===
The EasyLecto system aims to simplify drug package leaflets, in particular by replacing the terms describing adverse drug reactions with synonyms that are easier for patients to understand.

Figure 1 illustrates the EasyLecto system architecture. The first module of the EasyLecto system aims to automatically annotate adverse drug reactions in texts. This module uses a dictionary-based approach that combines terminological resources, such as MedDRA, the ATC system (a drug classification system developed by the World Health Organization) or CIMA (a database of medicines approved in Spain), and dictionaries gathered from websites about health and medicines such as MedLinePlus (https://www.nlm.nih.gov/medlineplus/spanish/), vademecum.es (http://www.vademecum.es) or prospectos.net (https://www.prospectos.net). The reader can find a detailed description of the NER module in (Segura-Bedmar et al., 2015).

Figure 1: The EasyLecto system architecture.

Once adverse drug reactions are automatically identified in texts, a set of synonyms is proposed for each one of them. MedDRA (http://www.meddra.org/) is a medical terminology dictionary about events associated with drugs. It is a multilingual dictionary (11 languages) and its main goal is to provide a classification system for efficient communication of adverse drug reaction data between countries. MedDRA is composed of a five-level hierarchy. The most specific level, "Lowest Level Terms" (LLTs), contains a total of 72,072 terms that express how information is communicated in practice. The main advantage of MedDRA is that its structured format makes it easy to obtain a list of possible adverse drug reactions and their synonyms. Thus, we decided to use MedDRA as a source of synonyms for adverse drug reactions. Moreover, for a given effect in MedDRA, we used its longest synonym as the definition of the effect.

The following step is to select the appropriate synonym, that is, the simplest synonym. The more common a term is in a collection of texts, the more familiar the term is likely to be to the reader (Elhadad, 2006). Thus, our system proposes the synonyms with the highest frequency. In order to know how common a word is, we gathered a large collection of texts, the MedLinePlus articles (https://www.nlm.nih.gov/medlineplus/spanish/), and indexed it in order to obtain the frequency of each drug effect.

MedLinePlus is an online resource with health information for patients, which contains more than 1,000 articles about diseases and 6,000 articles about medicines. The Spanish version is one of the most comprehensive and trusted Spanish-language health websites at the moment. We developed a web crawler to browse and download pages related to drugs and diseases from the MedLinePlus website. Each MedLinePlus article provides exhaustive information about a given medical concept, and also proposes a list of related health topics, which can be considered as synonyms of this concept. Moreover, an article related to a given medical concept can also be used to obtain the definition of this concept by taking its first sentence. Finally, all downloaded articles, the definitions (first sentence of each article) and their related health topics were translated into JSON objects in order to create an index (see Figure 2) using ElasticSearch (http://elasticsearch.org), an open source search engine.

Figure 2: An index was generated from the MedLinePlus articles using ElasticSearch.

All told, the EasyLecto system proposes a definition and a set of synonyms from MedDRA, as well as a definition and a set of synonyms from MedLinePlus, for each drug effect. Then, the frequency of each synonym is calculated using the index built from MedLinePlus, and finally the synonym with the highest frequency is selected as the simplest synonym.

Due to the horizontal scalability provided by ElasticSearch, it is possible to index large collections of documents, as is the case of MedLinePlus. The main advantage of ElasticSearch is its capacity to create distributed systems by specifying only the configuration of the hierarchy of nodes. ElasticSearch then manages itself to maintain fault tolerance and load distribution. Another important advantage of ElasticSearch is that it does not require very high computing power or storage capacity to index large collections. In this study, ElasticSearch (version 2.2) was installed on an Ubuntu Server 14.04 with 8GB of RAM and 500GB of disk space.

A demo of the EasyLecto system is available at: http://jacky.uc3m.es/EasyLecto/. This tool loads a document and highlights the adverse drug reactions (in blue) (see Figure 3). If the user selects any of these adverse drug reactions, the tool displays a popup window with information about the definitions and synonyms proposed by the system. Figure 4 shows the synonyms and definitions proposed for the effect 'dispepsia' (dyspepsia). While the most frequent MedDRA synonym was 'indigestión' (indigestion), the most common synonym from MedLinePlus was 'enfermedades del estómago' (stomach diseases).

Figure 3: A drug package leaflet annotated with the EasyLecto system. Adverse drug reactions are highlighted in blue.

Figure 4: Simplification (synonyms and definitions) for the effect 'dispepsia'.

===4 Evaluation===
The dataset used for the evaluation is the EasyDPL (easy drug package leaflets) corpus (http://labda.inf.uc3m.es/doku.php?id=en:labda recursosPLN), which contains 306 package leaflets annotated with 1,400 adverse drug reactions and their simplest synonyms. The corpus was manually annotated by three trained annotators. The quality and consistency of the corpus were evaluated by measuring inter-annotator agreement (IAA). IAA also determines the complexity of the task and provides an upper bound on the performance of automatic systems for the simplification of adverse drug reactions in drug package leaflets. In particular, Fleiss' kappa (Fleiss, 1971) was calculated, which is an extension of Cohen's kappa (Cohen, 1960) that measures the degree of consistency for two or more annotators. The assessment showed a kappa of 0.709, which is considered substantial on the Landis and Koch scale (Landis and Koch, 1977).

For each drug effect annotated in the EasyDPL corpus, the evaluation consisted in comparing the gold-standard synonym, that is, the synonym proposed by the human annotators, to the simplest synonym, that is, the synonym with the highest frequency in the index built from the MedLinePlus articles. Since we used two different resources, MedDRA and MedLinePlus, to obtain the set of synonym candidates, we evaluated the simplest synonym from each of the resources. For the synonym obtained from MedLinePlus, EasyLecto achieves an accuracy of 68.7%, while for the MedDRA synonym the accuracy is much lower (around 37.2%). This is mainly due to MedDRA being a highly specific standardized medical terminology, which implies that its terms are not familiar to most people. MedLinePlus, on the other hand, is a health information website for patients, which uses more readable language and a lay vocabulary.

We conducted an error analysis in order to obtain the main causes of false positives and false negatives in our system. In particular, we studied in detail a random sample of 30 documents. Table 1 presents some errors that our system makes on the EasyDPL corpus. Most errors are due to the absence of a simpler synonym for a term; some terms could only be explained by a short sentence or phrase (for example, terms such as akathisia or eosinophilia). Another cause of error was that some terms were replaced by their hypernyms in the gold-standard corpus (for example, allergic alveolitis was substituted by allergy), whereas the system failed because it does not exploit the hierarchical relationships between terms and is not able to propose more general terms as synonyms for a specific term. Some errors, such as dysphoria-hoarseness or diaphoresis-sweating, may occur due to the lack of synonyms in the resources. An approach based on a word vector model, able to compute the similarity between words based on their contexts, could reduce such errors.

In addition to the quantitative evaluation, we also used SurveyMonkey to collect some quick user feedback on the EasyLecto system (https://es.surveymonkey.com/r/8HMVJKV). We defined a survey with 10 closed-ended questions, in which users should pick just one answer from a list of given options. We asked users about the usefulness and the performance of EasyLecto, as well as about its usability, design and visual appeal. A total of 26 users completed the survey,
most of them being software engineers or PhD students in computer science. The analysis of the survey shows that most users have positive opinions about the EasyLecto system. Almost 97% of users think that the EasyLecto system helps to simplify drug package leaflets. Regarding the definitions proposed by the system, 75% of users believe that the definitions help to understand the text. Almost 30% of them would like to obtain three or more synonyms from the system. Around 81% of users think that EasyLecto has a friendly interface.

{| class="wikitable"
|+ Table 1: Some errors of the EasyLecto system.
! Drug effect !! Gold-standard synonym !! EasyLecto synonym
|-
| acatisia (akathisia) || incapacidad de quedarse quieto (inability to sit still) || acatisia
|-
| bursitis || hinchazón alrededor de los músculos (swelling around the muscles) || bursitis
|-
| eosinofilia (eosinophilia) || problemas en la sangre (blood problems) || eosinofilia
|-
| cloasma (chloasma) || manchas durante el embarazo (spots during pregnancy) || cloasma
|-
| miositis (myositis) || inflamación en la piel (skin inflammation) || miositis
|-
| alveolitis alérgica (allergic alveolitis) || alergía (allergy) || alveolitis alérgica
|-
| diaforesis (diaphoresis) || sudoración (sweating) || diaforesis
|-
| disforia (dysphoria) || ronquera (hoarseness) || disforia
|}

===5 Conclusions and future work===
Although drug package leaflets should be designed and written to ensure complete understanding of their contents, several factors can influence patient understanding of drug package leaflets. Low literacy is directly associated with limited understanding and misinterpretation of these documents (Davis et al., 2006b; Davis et al., 2006a). Older people are more likely to have lower literacy skills, as well as decreased memory and poorer reading comprehension (Kutner et al., 2006). Therefore, low literacy along with older age may lead to unintentional non-compliance or inappropriate use of drugs, leading to dangerous consequences for patients, such as therapeutic failure or adverse drug reactions. Several studies (March et al., 2010; Pires et al., 2015; Piñero-López et al., 2016) have shown that there is an urgent need to improve the quality of drug package leaflets because they are usually too difficult for patients to understand, and this could be a potential source of drug-related problems, such as medication errors and adverse drug reactions. In particular, patients have problems understanding the sections describing dosages and adverse drug reactions.

The EasyLecto system aims at the simplification of drug package leaflets, in particular, the replacement of terms describing adverse drug reactions with synonyms that are easier for patients to understand. The system uses a dictionary-based approach in order to automatically identify adverse drug reactions in drug package leaflets. MedDRA and MedLinePlus are used as sources of synonyms and definitions for these effects. Our main hypothesis is that a simple word will likely be more common in a collection of texts than its more difficult synonyms. We built an index from a large collection of texts, the MedLinePlus articles. This index provides us with information about how common a word is. EasyLecto was evaluated on a gold-standard corpus with 306 texts manually annotated by three trained experts. Experiments show an accuracy of 68.7% for the MedLinePlus synonym and 37.1% for the MedDRA synonym. Therefore, resources that have been specially written for patients are a better source of simpler synonyms than specialized terminological resources (such as MedDRA). On the other hand, the error analysis shows that some of the system answers might well be valid and simple synonyms, even though they are not the same as those proposed by the gold-standard corpus. In order to obtain a more realistic evaluation, we plan to extend the EasyDPL corpus by adding several simpler synonyms for each term.

In addition to the quantitative evaluation, the subjective impression of 26 users was documented by a simple questionnaire published in SurveyMonkey. In general, users have positive perceptions of the EasyLecto system. We are aware that our evaluation based on user experience has a lot of shortcomings (e.g., the number of users is very small and they are not representative of the general public). Therefore, we plan to extend and improve the evaluation with a large set of users that includes elderly users, people with disabilities and people with low literacy levels.

In this work, we focus only on the simplification of adverse drug reactions; however, we plan to extend our approach in order to simplify not only other medical concepts (such as diseases, medical procedures, medical tests, etc.), but also complex words from open-domain texts. As future work, we also plan to integrate additional resources such as BabelNet (Navigli and Ponzetto, 2012) or the UMLS Metathesaurus (Lindberg et al., 1993). In addition to providing broader coverage for terms and more synonyms, these resources will make it possible to develop a multilingual simplification system.

To the best of our knowledge, while word vector models based on n-grams have already been used (Bott et al., 2012), word vector models trained using deep learning techniques have not yet been explored for the task of simplification. We also plan to study the use of word embeddings learned by Word2Vec (Mikolov et al., 2013) or GloVe (Pennington et al., 2014). One important advantage of these models is that they make it possible to compute the similarity between terms without the need for synonym dictionaries, which are generally domain-dependent.

===Acknowledgments===
This work was supported by eGovernAbility-Access project (TIN2014-52665-C2-2-R).

===References===
* Alberto Anula. 2007. Tipos de textos, complejidad lingüística y facilitación de la lectura. In Actas del IV Congreso de la Asociación Asiática de Hispanistas.
* Ricardo Baeza-Yates, Luz Rello, and Julia Dembowski. 2015. CASSA: A context-aware synonym simplification algorithm. In Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pages 1380–1385.
* Thimoty Barbieri, Antonio Bianchi, Licia Sbattella, Ferdinando Carella, and Marco Ferra. 2005. Multiabile: A multimodal learning environment for the inclusion of impaired e-learners using tactile feedbacks, voice, gesturing, and text simplification. Assistive Technology: From Virtuality to Reality, 16(1):406–410.
* Susana Bautista and Horacio Saggion. 2014. Can numerical expressions be simpler? Implementation and demonstration of a numerical simplification system for Spanish. In LREC, pages 956–962.
* Stefan Bott, Luz Rello, Biljana Drndarevic, and Horacio Saggion. 2012. Can Spanish be simpler? LexSiS: Lexical simplification for Spanish. In Proceedings of COLING 2012, pages 357–374.
* Raman Chandrasekar, Christine Doran, and Bangalore Srinivas. 1996. Motivations and methods for text simplification. In Proceedings of the 16th conference on Computational linguistics-Volume 2, pages 1041–1044. Association for Computational Linguistics.
* Jacob Cohen. 1960. A coefficient of agreement for nominal scale. Educ Psychol Meas, 20:37–46.
* Terry C Davis, Michael S Wolf, Pat F Bass, Mark Middlebrooks, Estela Kennen, David W Baker, Charles L Bennett, Ramon Durazo-Arvizu, Anna Bocchini, Stephanie Savory, et al. 2006a. Low literacy impairs comprehension of prescription drug warning labels. Journal of general internal medicine, 21(8):847–851.
* Terry C Davis, Michael S Wolf, Pat F Bass, Jason A Thompson, Hugh H Tilson, Marolee Neuberger, and Ruth M Parker. 2006b. Literacy and misunderstanding prescription drug labels. Annals of Internal Medicine, 145(12):887–894.
* Jan De Belder, Koen Deschacht, and Marie-Francine Moens. 2010. Lexical simplification. In Proceedings of ITEC2010: 1st international conference on interdisciplinary research on technology, education and communication.
* Siobhan Lucy Devlin. 1999. Simplifying natural language for aphasic readers. Ph.D. thesis, University of Sunderland.
* Bonnie Dorr, David Zajic, and Richard Schwartz. 2003. Hedge trimmer: A parse-and-trim approach to headline generation. In Proceedings of the HLT-NAACL 03 on Text summarization workshop-Volume 5, pages 1–8. Association for Computational Linguistics.
* Biljana Drndarevic, Sanja Štajner, and Horacio Saggion. 2012. Reporting simply: A lexical simplification strategy for enhancing text accessibility. In Proceedings of Easy-to-Read on the Web Symposium.
* EC. 2009. Guideline on the readability of the labelling and package leaflet of medicinal products for human use.
* Noémie Elhadad. 2006. Comprehending technical texts: predicting and defining unfamiliar terms. In AMIA.
* Council EU. 2001. Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community code relating to medicinal products for human use. Official Journal L, 311(28):11.
* Joseph L Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5):378.
* Robert Gunning. 1986. The technique of clear writing.
* Siddhartha Jonnalagadda and Graciela Gonzalez. 2011. Biosimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction. arXiv preprint arXiv:1107.5744.
* Siddhartha Jonnalagadda, Luis Tari, Jörg Hakenberg, Chitta Baral, and Graciela Gonzalez. 2009. Towards effective sentence simplification for automatic processing of biomedical text. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 177–180. Association for Computational Linguistics.
* Sasikiran Kandula, Dorothy Curtis, and Qing Zeng-Treitler. 2010. A semantic and syntactic text simplification tool for health content. In AMIA Annu Symp Proc, volume 2010, pages 366–70.
* J Peter Kincaid, Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom. 1975. Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for navy enlisted personnel. Technical report, DTIC Document.
* Mark Kutner, Elizabeth Greenberg, and Justin Baer. 2006. A first look at the literacy of America's adults in the 21st century. NCES 2006-470. National Center for Education Statistics.
* J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, pages 159–174.
* Donald A Lindberg, Betsy L Humphreys, and Alexa T McCray. 1993. The unified medical language system. Methods of information in medicine, 32(4):281–291.
* Cerdá JC March, Rodríguez MA Prieto, Azarola A Ruiz, Lorda P Simón, Cantalejo I Barrio, and Alina Danet. 2010. [Quality improvement of health information included in drug information leaflets. Patient and health professional expectations]. Atención primaria/Sociedad Española de Medicina de Familia y Comunitaria, 42(1):22–27.
* Diana McCarthy and Roberto Navigli. 2007. SemEval-2007 task 10: English lexical substitution task. In Proceedings of the 4th International Workshop on Semantic Evaluations, pages 48–53. Association for Computational Linguistics.
* Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
* George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.
* Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.
* Jeffrey Pennington, Richard Socher, and Christopher D Manning. 2014. GloVe: Global vectors for word representation. In EMNLP, volume 14, pages 1532–1543.
* Sarah E Petersen and Mari Ostendorf. 2007. Text simplification for language learners: a corpus analysis. In Proceedings of Workshop on Speech and Language Technology for Education, pages 69–72.
* Ángeles María Piñero-López, Pilar Modamio, F. Cecilia Lastra, and L. Eduardo Mariño. 2016. Readability analysis of the package leaflets for biological medicines available on the internet between 2007 and 2013: An analytical longitudinal study. J Med Internet Res, 18(5):e100.
* Carla Pires, Marina Vigário, and Afonso Cavaco. 2015. Readability of medicinal package leaflets: a systematic review. Revista de saude publica, 49:1–13.
* Horacio Saggion, Elena Gómez Martínez, Esteban Etayo, Alberto Anula, and Lorena Bourg. 2011. Text simplification in Simplext. Making text more accessible. Procesamiento del lenguaje natural, 47:341–342.
* Licia Sbattella and Roberto Tedesco. 2012. Calculating text complexity during the authoring phase. In Proceedings of Easy-to-Read on the Web Symposium.
* Isabel Segura-Bedmar, Paloma Martínez, Ricardo Revert, and Julián Moreno-Schneider. 2015. Exploring Spanish health social media for detecting drug effects. BMC medical informatics and decision making, 15(2):1.
* Matthew Shardlow. 2014. A survey of automated text simplification. International Journal of Advanced Computer Science and Applications, 4(1).
* Advaith Siddharthan. 2002. Resolving attachment and clause boundary ambiguities for simplifying relative clause constructs. In Proceedings of the Student Workshop, 40th Meeting of the Association for Computational Linguistics (ACL02), pages 60–65.
* Advaith Siddharthan. 2014. A survey of research on text simplification. ITL-International Journal of Applied Linguistics, 165(2):259–298.
* Lucia Specia, Sujay Kumar Jauhar, and Rada Mihalcea. 2012. SemEval-2012 task 1: English lexical simplification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 347–355. Association for Computational Linguistics.
* Irina Temnikova. 2012. Text Complexity and Text Simplification in the Crisis Management domain. Ph.D. thesis, University of Wolverhampton.
* Zhemin Zhu, Delphine Bernhard, and Iryna Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the 23rd international conference on computational linguistics, pages 1353–1361. Association for Computational Linguistics.
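
The frequency-based selection and the accuracy measure described in Sections 3 and 4 can be illustrated with a minimal sketch. It assumes a plain in-memory list of documents standing in for the ElasticSearch index built from MedLinePlus; the function names (`term_frequency`, `simplest_synonym`, `accuracy`) and the toy Spanish sentences are hypothetical and are not taken from the EasyLecto implementation.

```python
import re
from typing import Dict, Iterable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def term_frequency(term: str, documents: Iterable[str]) -> int:
    """Count whole-phrase occurrences of `term` across all documents.

    Assumption: a simple regex count replaces the query against the real index.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(normalize(term)) + r"(?!\w)")
    return sum(len(pattern.findall(normalize(doc))) for doc in documents)


def simplest_synonym(target: str, candidates: List[str], documents: List[str]) -> str:
    """Return the most frequent candidate, or the target term if none occurs."""
    freqs = {c: term_frequency(c, documents) for c in candidates}
    # max() keeps the first candidate on ties; `default` covers an empty list.
    best = max(candidates, key=lambda c: freqs[c], default=target)
    return best if candidates and freqs[best] > 0 else target


def accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of gold-standard terms whose predicted synonym matches exactly."""
    hits = sum(1 for term, syn in gold.items() if predicted.get(term) == syn)
    return hits / len(gold) if gold else 0.0


# Toy corpus (invented sentences, for illustration only).
toy_docs = [
    "La indigestión es un malestar en la parte superior del abdomen.",
    "Los síntomas de la indigestión incluyen ardor y náuseas.",
    "La dispepsia puede aparecer después de comer.",
]
print(simplest_synonym("dispepsia", ["dispepsia", "indigestión"], toy_docs))
```

Exact string matching in `accuracy` mirrors the strictness noted in the error analysis: a valid synonym that differs from the single gold-standard answer still counts as an error.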