=Paper= {{Paper |id=Vol-2470/p31 |storemode=property |title=Automatic detection of contraindications of medicines in package leaflet |pdfUrl=https://ceur-ws.org/Vol-2470/p31.pdf |volume=Vol-2470 |authors=Jonas Žalinkevičius,Rita Butkienė |dblpUrl=https://dblp.org/rec/conf/ivus/ZalinkeviciusB19 }} ==Automatic detection of contraindications of medicines in package leaflet== https://ceur-ws.org/Vol-2470/p31.pdf
Automatic Detection of Contraindications of Medicines in Package Leaflet

Jonas Žalinkevičius
Faculty of Informatics
Kaunas University of Technology
Kaunas, Lithuania
jonas.zalinkevicius@hotmail.com

Rita Butkienė
Faculty of Informatics
Kaunas University of Technology
Kaunas, Lithuania
rita.butkiene@ktu.lt




    Abstract— Before prescribing medicines, physicians must take into consideration the patient's diseases and the medicines the patient already uses, in order to avoid possible complications. All information about possible contraindications is given in the medicine package leaflet. This paper presents a system that automatically detects contraindication mentions in the Lithuanian text of package leaflets by applying natural language parsing. The system can shorten the time physicians need to make prescription decisions. In the experiment, the created system successfully detected 56 per cent of contraindications.

    Keywords— medicine contraindications, drug–drug interactions, shallow parsing, morphological analysis, noun phrase detection

I. INTRODUCTION
    When a patient is diagnosed with a new disease, the physician also asks about the patient's allergies, previous health problems, chronic diseases, and the medications and food supplements the patient uses. After considering this information and evaluating possible contraindications of the prescribed medication, the physician assigns treatment and, if needed, changes previous prescriptions. Almost all information about contraindications can be found in the medicine package leaflet. According to Lithuania's medicines registration procedure [1], every package must contain a leaflet written in Lithuanian. The information in the leaflet must be divided into six sections [2], although the text within a section may be unstructured. So, to find possible contraindications, a physician must read the whole second section (Table I) or search for the information on the Internet. Health care information usually consists of unstructured data, which leads to inaccurate search results containing hundreds of links to irrelevant documents, and the user must read through these results to find the relevant information.
    Automatic information extraction tools can extract biomedical data, store it in a structured way, and thus mitigate the information search problem. However, automatic text analysis and information extraction from unstructured text in the medical domain is a challenging task [3]. The aim of this paper is to present a system that gives physicians a faster and more accurate way of finding contraindications by automatically detecting contraindication mentions in the medicine package leaflet.
    A system that automates the extraction of contraindications from leaflet text is described in Section III. Using this system, all leaflets of medicines registered in Lithuania were analyzed. The results of this analysis (the extracted contraindications) are used in a commercial medication information system that Lithuanian physicians use when prescribing medications. The evaluation of the obtained results is presented in Section IV.

II. RELATED WORK
    In Lithuania, every registered medicine must have a package leaflet in Lithuanian describing its therapeutic indications, possible contraindications, safety precautions, and usage. To make sure that the patient does not have a contraindicated condition, the physician should read through the whole leaflet before prescribing the medicine. Because analyzing leaflets is time-consuming, physicians tend to skip it and rely on their own knowledge and experience.
    Many systems have been developed for the analysis of, and information extraction from, biomedical text in English, but there is no solution for detecting contraindication mentions (i.e. contraindications with a disease or with a pharmacological group) in Lithuanian text. We have analyzed articles that describe similar problems in biomedical text analysis. For example, the tool Semantator [4] was created for converting biomedical text to linked data. It performs ontology-based information extraction using biomedical ontology terms hosted in BioPortal, with the ontology editor Protégé used for text preprocessing. The semantic annotation and inference platform SENTIENT-MD [3] builds a dependency graph as the first step of dependency parsing, which is one of the tasks of semantic annotation of medical knowledge in natural language text. Bundschus et al. [5] used probabilistic graphical models (Conditional Random Fields) to identify semantic relations.
    Although all these authors work on English texts, we found that their common rules and approaches can be applied to Lithuanian texts as well. To extract information from text, natural language preprocessing is needed: the text must be segmented, a morphological analysis performed, and then a syntactic parse tree or a dependency graph [6], [7] built. To detect semantic relations, existing ontologies or knowledge bases should be used.
 © 2019 for this paper by its authors. Use permitted under Creative
 Commons License Attribution 4.0 International (CC BY 4.0)

                                                                      110
III. SYSTEM DESCRIPTION
    In this section, a system for detecting contraindication mentions in medicine leaflet text written in Lithuanian is presented. The system implements a text analysis pipeline of four stages: extraction of the contraindication text block, morphological analysis, noun phrase detection, and annotation.
    Additionally, the phrases are checked against a database of noun phrases to be ignored. This database is filled manually and helps to obtain more precise results. The overall pipeline for the detection of contraindication mentions is shown in Fig. 1.
    Each stage of text analysis is discussed in more detail below.

A. Extraction of contraindication text blocks
    In Lithuania, a producer describing a medicine must follow a prescribed package leaflet template [2]. This template splits the leaflet into the six sections listed in Table I.

TABLE I. MEDICINE PACKAGE LEAFLET SECTIONS
No   Section
1    What X is and what it is used for
2    What you need to know before you <take> <use> X
3    How to <take> <use> X
4    Possible side effects
5    How to store X
6    Contents of the pack and other information

    The information the patient should be aware of before taking the medicine is given in section 2. An example of this section, with the contraindication phrases highlighted, is shown in Fig. 2. So, the first task of our system is to find this section and extract its text for further analysis.

B. Morphological analysis
    Morphological analysis forms the basis for extracting information about contraindications. In this stage, the given text is split into lexical units (e.g. sentences, lexemes), which are analyzed morphologically. For this task, a web service provided by the system “http://semantika.lt” [8] is used. The web service returns the morphological features of each given lexeme: part of speech, gender, number, and so on.

C. Noun phrase detection
    Phrases that express a specific contraindication are usually noun phrases, for example, heart attack, type one diabetes, pancreatitis, and so on. Therefore, we chose a phrase structure grammar method because, as suggested by Axel Holvoet in his monograph [9], it fits noun phrase detection better than dependency grammar. Phrase structure rules are used to split a natural language sentence into its constituent parts: lexical and phrasal categories [9], [10], [11]. For noun phrase detection in medicine leaflets, three phrase structure rules were specified (see Table II).

TABLE II. NOUN PHRASE STRUCTURE RULES
No   Rule
1    A lexeme is part of a noun phrase if it is a noun in the genitive case and follows another noun in the genitive case, an adjective, a numeral, or a participle.
2    A lexeme is part of a noun phrase if it is an attributive adjective in the same case, number, and gender as the base noun and follows a noun in the genitive case, an adjective, a numeral, or a participle.
3    A lexeme is part of a noun phrase if it is an attributive numeral in the same case, number, and gender as the base noun and follows a noun in the genitive case, an adjective, a numeral, or a participle.

    The algorithm implemented for noun phrase detection checks every lexeme in the sentence against the conditions of the rules in Table II. If the condition of at least one rule is satisfied, the lexeme is included in the noun phrase. The analysis workflow for the noun phrase Lėtinis reumatinis perikarditas (chronic rheumatic pericarditis) is shown in Table III.
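The rule check of Table II can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Lexeme` record, its feature labels (`"adj"`, `"nom"`, `"masc"`, and so on) and the helper names are hypothetical stand-ins for the semantika.lt output. Adjectives, numerals, participles and genitive nouns are buffered as candidates (Table III, steps 1 and 2); a noun that is not in the genitive, or any other part of speech, closes the phrase, and only the attributes that agree in case, number and gender with the noun they precede are kept (rules 2 and 3). The head noun is then replaced by its lemma, as described for the normalization step.

```python
from __future__ import annotations

from dataclasses import dataclass

# Parts of speech that may precede a base noun as candidates (Table II).
MODIFIERS = {"adj", "num", "part"}


@dataclass(frozen=True)
class Lexeme:
    """One analysed word (hypothetical stand-in for the semantika.lt output)."""
    form: str
    lemma: str
    pos: str          # "noun", "adj", "num", "part", or any other tag
    case: str = ""    # e.g. "nom", "gen", "ins"
    number: str = ""  # "sg" or "pl"
    gender: str = ""  # "masc" or "fem"


def agrees(a: Lexeme, b: Lexeme) -> bool:
    """Agreement test of rules 2 and 3: same case, number and gender."""
    return (a.case, a.number, a.gender) == (b.case, b.number, b.gender)


def close_phrase(buffer: list[Lexeme]) -> str | None:
    """Keep the buffered lexemes that satisfy a rule; lemmatise the head noun."""
    kept: list[Lexeme] = []
    for i, lex in enumerate(buffer):
        if lex.pos == "noun":
            kept.append(lex)  # genitive modifiers (rule 1) and the head noun
            continue
        # Rules 2 and 3: an attribute must agree with the next noun in the buffer.
        next_noun = next((x for x in buffer[i + 1:] if x.pos == "noun"), None)
        if next_noun is not None and agrees(lex, next_noun):
            kept.append(lex)
    if not kept or kept[-1].pos != "noun":
        return None
    *attributes, head = kept
    return " ".join([x.form for x in attributes] + [head.lemma])


def detect_noun_phrases(sentence: list[Lexeme]) -> list[str]:
    phrases: list[str] = []
    buffer: list[Lexeme] = []

    def flush() -> None:
        phrase = close_phrase(buffer)
        if phrase:
            phrases.append(phrase)
        buffer.clear()

    for lex in sentence:
        if lex.pos in MODIFIERS or (lex.pos == "noun" and lex.case == "gen"):
            buffer.append(lex)  # candidate (Table III, steps 1 and 2)
        elif lex.pos == "noun":
            buffer.append(lex)  # base noun completes the phrase (step 3)
            flush()
        else:
            flush()  # any other part of speech ends the pending phrase
    flush()
    return phrases


example = [
    Lexeme("Lėtinis", "lėtinis", "adj", "nom", "sg", "masc"),
    Lexeme("reumatinis", "reumatinis", "adj", "nom", "sg", "masc"),
    Lexeme("perikarditas", "perikarditas", "noun", "nom", "sg", "masc"),
]
print(detect_noun_phrases(example))  # ['Lėtinis reumatinis perikarditas']
```

Buffering candidates until a base noun arrives mirrors Table III, where the first two adjectives are only "good candidates" until the noun perikarditas confirms rule 2.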




Fig. 1. Contraindications lookup process activity
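The flow in Fig. 1 can be summarized as a chain of stage functions followed by the ignore-list check and the dictionary lookup described in subsection D. The Python sketch below is illustrative only: the stage functions are passed in as parameters, `DATABASES` and `IGNORED` are hypothetical, heavily abridged stand-ins for the ICD, ATC and active substance lists and for the manually filled ignore list, and exact matching on lower-cased phrases is an assumption.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical, heavily abridged stand-ins for the three reference lists.
# Keys are normalised (lower-case, lemmatised) phrases; values are illustrative entries.
DATABASES: dict[str, dict[str, str]] = {
    "disease (ICD)": {"lėtinis reumatinis perikarditas": "I09.2", "liga": "R69"},
    "pharmaceutical chemical group (ATC)": {},
    "active substance": {"ibuprofenas": "ibuprofen"},
}
# Hypothetical ignore list: frequent words such as "liga" (illness) or
# "ranka" (hand) that appear in the lists but are not contraindications.
IGNORED = {"liga", "ranka"}


def annotate(phrases: Iterable[str]) -> list[tuple[str, str, str]]:
    """Tag each phrase with every list it matches, skipping ignored phrases."""
    annotations = []
    for phrase in phrases:
        key = phrase.lower()
        if key in IGNORED:  # checked before any comparison, as in subsection D
            continue
        for domain, items in DATABASES.items():
            if key in items:
                annotations.append((phrase, domain, items[key]))
    return annotations


def run_pipeline(
    leaflet: str,
    extract_section: Callable[[str], str],
    analyse: Callable[[str], Iterable[list]],
    detect_phrases: Callable[[list], list[str]],
) -> list[tuple[str, str, str]]:
    """Fig. 1: text block -> morphology -> noun phrases -> annotation."""
    block = extract_section(leaflet)
    phrases: list[str] = []
    for sentence in analyse(block):
        phrases.extend(detect_phrases(sentence))
    return annotate(phrases)


# Trivial stubs: the block is the whole text, there is one "sentence",
# and each comma-separated chunk is treated as an already detected phrase.
result = run_pipeline(
    "Lėtinis reumatinis perikarditas, liga, ibuprofenas",
    extract_section=lambda text: text,
    analyse=lambda block: [block.split(", ")],
    detect_phrases=lambda sentence: sentence,
)
print(result)  # "liga" is skipped by the ignore list
```

Passing the stages as functions keeps the sketch independent of any particular morphology service; in the described system the second stage would call the semantika.lt web service.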



Fig. 2. Example of “What you need to know before use of X” section in the medicine package leaflet
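Finding the section shown in Fig. 2 can be sketched as follows. The sketch assumes, hypothetically, that each of the six template sections of Table I starts on its own line with its number and a dot (for example "2. What you need to know ..."); real leaflet texts may need looser heading matching, and numbered lists inside a section could be mistaken for headings. The sample leaflet is an invented English placeholder.

```python
import re

# Assumption: a section heading is a line starting with "1." to "6.".
HEADING = re.compile(r"^[ \t]*([1-6])\.[ \t]+\S", re.MULTILINE)


def extract_section(leaflet: str, number: int = 2) -> str:
    """Return the body of template section `number` (section 2 by default)."""
    headings = list(HEADING.finditer(leaflet))
    for i, match in enumerate(headings):
        if int(match.group(1)) != number:
            continue
        # The body starts on the line after the heading ...
        line_end = leaflet.find("\n", match.end())
        start = len(leaflet) if line_end == -1 else line_end + 1
        # ... and ends at the next heading with a higher number (or at the end).
        end = next(
            (h.start() for h in headings[i + 1:] if int(h.group(1)) > number),
            len(leaflet),
        )
        return leaflet[start:end].strip()
    return ""


# Invented placeholder leaflet (real leaflets are in Lithuanian).
leaflet = (
    "1. What X is and what it is used for\n"
    "X is used to treat pain.\n"
    "2. What you need to know before you take X\n"
    "Do not take X if you have chronic rheumatic pericarditis.\n"
    "3. How to take X\n"
    "Take one tablet daily.\n"
)
print(extract_section(leaflet))
```

Ending the block at the next heading with a higher number, rather than at any numbered line, makes the sketch slightly more tolerant of numbered sub-items inside section 2.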




TABLE III. EXAMPLE OF NOUN PHRASE DETECTION WORKFLOW
Step 1.
    Action: The first lexeme Lėtinis (chronic) is an adjective in the nominative case, singular, masculine gender.
    Rule satisfaction: No rule condition is fully satisfied, but according to rule No. 2 the lexeme is a good candidate for the noun phrase.
Step 2.
    Action: The second word reumatinis is an adjective in the nominative case, singular, masculine gender, and follows the adjective Lėtinis.
    Rule satisfaction: No rule condition is fully satisfied, but according to rule No. 2 the lexeme is a good candidate for the noun phrase.
Step 3.
    Action: The third word perikarditas is a noun in the nominative case, singular, masculine gender. It follows the adjectives lėtinis and reumatinis, which are in the same case, number, and gender.
    Rule satisfaction: The condition of rule No. 2 is satisfied: the noun is the base noun for the first two adjectives, which are its attributive adjectives. The analysis of the third lexeme completes the construction of the noun phrase.

    When the construction of a noun phrase is complete, the head noun of the phrase is changed to its canonical form (lemma). This is done because the item names registered in the International Classification of Diseases (ICD) [12], the Anatomical Therapeutic Chemical Classification System (ATC) [13], and the lists of active substances are in canonical form; normalization is therefore required to ensure a correct comparison of values in the next stage of analysis.

D. Annotation
    All noun phrases identified in the previous stage are checked for contraindications. If a contraindication is identified, the phrase is annotated. Three databases are used for annotation: ICD, ATC, and the lists of active substances. The algorithm compares the noun phrase with the names of the items in these databases. If the noun phrase matches a name in ICD, the phrase is tagged as a contraindication with a disease. If the phrase matches an ATC item name, it is tagged as a contraindication with a pharmaceutical chemical group, and if the phrase matches the name of an active substance, it is tagged as a contraindication with an active substance.
    It is worth mentioning that, before the comparison, all identified noun phrases are checked against the database of noun phrases to be ignored. The text of a medicine package leaflet contains many words (e.g. illness, hand, and so on) that are irrelevant (do not express a contraindication) but appear in the ICD, ATC, and active substance lists. The database of noun phrases to be ignored was filled manually with the help of a professional pharmacist.

IV. EXPERIMENT
    The aim of the experiment is to evaluate the created system and to check whether it achieves its target: giving physicians a faster and more accurate way of finding contraindications. In the experiment, contraindication mentions in the package leaflet text blocks were annotated manually and the results were compared with the system's results. The manual annotation was done by a professional pharmacist working at JSC Skaitos kompiuterių servisas.

A. Plan
    The experiment was organized as follows. Ten randomly selected leaflets from the medicines database were analyzed using the created system. The results of the analysis were automatically gathered into a table, an example of which is presented in Table IV. The first column gives the code of the item that the system found automatically in the leaflet text. The second column gives the database (ATC, ICD, or active substances) in which the item is registered. The third column was used to evaluate the correctness of the annotation.

TABLE IV. AUTOMATICALLY DETECTED CONTRAINDICATIONS RESULTS EVALUATION FOR SINGLE LEAFLET
Code     Domain    Is detection correct
J01CR    ATC       False
J05AE    ATC       True
I09.2    ICD       True

    The same randomly selected leaflets were also analyzed and annotated manually, and a table of the same structure was filled in with the manual annotation results. Manually found contraindications were not interpreted or changed to synonyms. For example, heart attack and myocardial infarction denote the same disease, but ICD contains only one name for it, myocardial infarction, and the created system is not able to recognize heart attack as a synonym of myocardial infarction.
    Additionally, the active substances mentioned in the leaflet were translated into Latin (nominative and genitive grammatical cases). This was done because the provided database of active substances has three versions of each name: Lithuanian, Latin in the nominative case, and Latin in the genitive case.

B. Results
    The results of the evaluation are presented in Table V. Precision, recall, and F-score were calculated for each analyzed leaflet. Additionally, the ratio between the numbers of automatically and manually detected links was calculated. This metric helps to assess how accurate the results are and is used in further calculations.
    The results showed that the developed system correctly detects 56% of relevant contraindications. The average number of links detected automatically is 1482.8, while the average number of manually detected links is 197.9. On average, the number of links detected automatically in one leaflet is four times higher than the number detected manually. On average, 72% of links to ICD, 90% of links to ATC, and 61% of links to the list of active substances are erroneous.
    Calculations show that the system achieves a precision of 0.25 (±0.23), a recall of 0.56 (±0.32), and an F-score of 0.31 (±0.19). To show where the system fails and why, Pearson correlation coefficients between various indicators were calculated (Table VI). Incorrectly detected links to ICD had the biggest impact on the F-score, with a coefficient of -0.89. Precision was so low because of the high ratio between automatically and manually detected links.
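The per-leaflet estimates in Table V follow from three counts. A minimal sketch, assuming the usual definitions (precision = correctly detected / automatically detected links, recall = correctly detected / manually detected links, F-score = their harmonic mean) and taking the ratio of link amounts as automatically / manually detected links; with these definitions the sketch reproduces the Table V entries for leaflet 13092 (0.18, 0.90, 0.30, 4.95).

```python
from __future__ import annotations


def evaluate(auto_detected: int, auto_correct: int, manual: int) -> dict[str, float]:
    """Per-leaflet estimates from the three link counts of Table V."""
    precision = auto_correct / auto_detected if auto_detected else 0.0
    recall = auto_correct / manual if manual else 0.0
    # F-score: harmonic mean of precision and recall (0 when both are 0).
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    ratio = auto_detected / manual if manual else float("inf")
    return {"precision": precision, "recall": recall, "f_score": f_score, "ratio": ratio}


# Leaflet 13092: 1906 automatic links, 346 of them correct, 385 manual links.
scores = evaluate(1906, 346, 385)
print({name: round(value, 2) for name, value in scores.items()})
# {'precision': 0.18, 'recall': 0.9, 'f_score': 0.3, 'ratio': 4.95}
```

Because the harmonic mean is dominated by the smaller of the two values, a leaflet with high recall but low precision, such as 13092, still ends up with a low F-score.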



TABLE V. EXPERIMENT RESULTS

ID       Auto.     Auto.      Man.      Precision  Recall  F-Score  Ratio of  Err. links  Err. links  Err. links
         detected  correctly  detected                            links     to ICD      to ATC      to active
         links     detected   links                               amounts                           substances
                   links

13092    1906      346        385       0.18       0.90    0.30     4.95      82%         100%        65%
13571    1899      367        444       0.19       0.83    0.31     4.28      81%         100%        58%
859      87        67         162       0.77       0.41    0.54     0.54      17%         100%        100%
1300     400       28         146       0.07       0.19    0.10     2.74      98%         100%        24%
10958    464       14         71        0.03       0.20    0.05     6.54      100%        25%         21%
1872     283       66         68        0.23       0.97    0.38     4.16      77%         100%        43%
5363     473       237        291       0.50       0.81    0.62     1.63      46%         88%         49%
13273    158       51         72        0.32       0.71    0.44     2.19      45%         100%        100%
10744    1199      150        175       0.13       0.29    0.18     6.85      87%         100%        100%
16551    1090      120        204       0.11       0.25    0.15     5.34      90%         87%         51%

Median   468.5     93.5       168.5     0.185      0.56    0.305    4.22      82%         100%        55%
Q1       312.25    54.75      90.5      0.115      0.26    0.158    2.328     54%         91%         45%
Q3       1171.75   215.25     269.25    0.298      0.825   0.425    5.243     89%         100%        91%
Avg      795.9     144.6      201.8     0.25       0.56    0.31     3.92      72%         90%         61%
Std dev  686.52    129.45     132.27    0.23       0.32    0.19     2.10      27%         23%         30%
Min      87        14         68        0.03       0.19    0.05     0.54      17%         25%         21%
Max      1906      367        444       0.77       0.97    0.62     6.85      100%        100%        100%
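The summary rows of Table V can be reproduced with Python's standard `statistics` module. The sketch below is an illustration, not the authors' tooling: quartiles from `statistics.quantiles(..., method="inclusive")` (linear interpolation between order statistics) and the sample standard deviation `statistics.stdev` match the Median, Q1, Q3, Avg and Std dev entries for the automatically detected links. It also includes a plain Pearson correlation, the coefficient used for Table VI; applied to the rounded precision and link-ratio columns of Table V, it gives about -0.81, close to the -0.8119 reported in Table VI.

```python
from __future__ import annotations

import math
import statistics

# Columns of Table V in table order (leaflets 13092 ... 16551).
auto_detected = [1906, 1899, 87, 400, 464, 283, 473, 158, 1199, 1090]
precision = [0.18, 0.19, 0.77, 0.07, 0.03, 0.23, 0.50, 0.32, 0.13, 0.11]
link_ratio = [4.95, 4.28, 0.54, 2.74, 6.54, 4.16, 1.63, 2.19, 6.85, 5.34]


def summary(column: list[float]) -> dict[str, float]:
    """Summary rows of Table V for one column."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "avg": statistics.mean(column),
        "std_dev": statistics.stdev(column),  # sample standard deviation
        "min": min(column),
        "max": max(column),
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print({k: round(v, 2) for k, v in summary(auto_detected).items()})
print(round(pearson(link_ratio, precision), 2))  # -0.81
```

The small gap to Table VI is consistent with the table's per-leaflet values being rounded to two decimals.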




TABLE VI. CORRELATION OF ESTIMATES AND INDICATORS
                                                               Estimates
Indicator                                                      Precision  Recall     F-Score
Incorrectly detected links to ICD list amount                  -0.9655    -0.3114    -0.8939
Incorrectly detected links to ATC list amount                   0.3292     0.4184     0.4382
Incorrectly detected links to active substances list amount     0.5229     0.1244     0.4523
Automatically and manually detected contraindications ratio    -0.8119    -0.2583    -0.7682

C. Conclusions of the experiment
    The experiment shows that the system automatically detected more than half (56%) of the relevant contraindication links. However, 75% of the links were erroneous, so the system lacks precision. The main reason is the high number of incorrect links to ICD (r = -0.9655): this indicator has the most negative impact on precision and F-score. This may be caused by commonly used phrases that are not contraindications but appear in the ICD list. For example, the word allergy alone does not imply a contraindication and must be ignored. Another reason for the low estimates is the number of detected contraindication phrases: the calculations show that the larger the difference between the numbers of automatically and manually detected contraindication phrases, the lower the precision and F-score. This is due to the high number of detected noun phrases that are irrelevant to contraindications, for example, pill, driving.
    Additionally, the low F-score (0.31) can be attributed to the low precision (0.25). To raise this indicator, the list of phrases to be ignored (common words and phrases) must be used. The most frequent reasons for the incorrect detection of contraindications are:
    • the context of the phrase in the sentence is not taken into account;
    • conjunctions are not taken into account, so two or more coordinated noun phrases (e.g. “…kidney and liver diseases…”) are not identified;
    • brackets that are used to specify a contraindication are not taken into account (“…liver tumor (malignant or benign)…”).
    To avoid errors caused by these reasons, users of the “https://gydytojams.vaistai.lt” information system will be able to mark a contraindication as erroneous and, if a pharmacist approves, it will be removed from the database.

V. CONCLUSIONS
    In this paper, a system that automatically detects contraindications and links them to the existing “Skaitos kompiuterių servisas” databases has been introduced. The system analyzes the text of medication leaflets, extracts noun phrases, and links them to the corresponding items in the ATC, ICD, and active substance lists. The presented system was used to extract contraindications from the leaflets of all medications registered in Lithuania. The extracted data was used in a pilot project extending the functionality of the system “https://gydytojams.vaistai.lt”. The additional function supports physicians in searching for possible contraindications relevant to the patient's medical records. Moreover, physicians can give feedback about the erroneous contraindications presented; in this way, they help to expand the list of phrases to be ignored and to eliminate incorrect contraindication links.
    The experiment shows that approximately 56% of contraindications are found, but only every fourth detected link is correct.
    Several changes to the algorithm remain for future work. First, the context must be identified before a noun phrase is looked up in the databases; this would reduce the number of incorrect links. Second, phrases that refer to the analyzed medication itself should be detected and ignored.

ACKNOWLEDGMENT
    Data for this system was provided by JSC Skaitos kompiuterių servisas.

REFERENCES
[1] VVKT prie LR SAM, "Įsakymas 2015 m. liepos 3 d. Nr.(1.72E)1A-755 Dėl paraiškų registruoti vaistinį preparatą, perregistruoti vaistinį preparatą, pakeisti registracijos pažymėjimo sąlygas, teisės į vaistinio preparato registraciją perleidimo, nereglamentiniam pakuotės ir (ar," 03 07 2016. [Online].
[2] European Medicines Agency, "European Medicines Agency," 02 2019. [Online].
[3] S. Sahay, E. Agichtein, B. Li, E. V. Garcia and A. Ram, "Semantic Annotation and Inference for Medical Knowledge Discovery," 2007. [Online].
[4] C. Tao, D. Song, D. Sharma and C. G. Chute, "Semantator: Semantic annotator for converting biomedical text to linked data," Journal of Biomedical Informatics, vol. 46, no. 5, pp. 882-893, Oct. 2013.
[5] M. Bundschus, M. Dejori, M. Stetter, V. Tresp and H.-P. Kriegel, "Extraction of semantic biomedical relations from text using conditional random fields," BMC Bioinformatics, vol. 9, pp. 1-14, 2008.
[6] Y. Zhang, H.-Y. Wu, J. Xu, J. Wang, S. Ergin, L. Li and H. Xu, "Leveraging syntactic and semantic graph kernels to extract pharmacokinetic drug drug interactions from biomedical literature," BMC Systems Biology, vol. 107, pp. 323-334, 26 Aug. 2016.
[7] R. Frank, Phrase Structure Composition and Syntactic Dependencies, vol. 38. Cambridge, Mass.: The MIT Press, 2002, pp. 2-27.
[8] R. Damaševičius, C. Napoli, T. Sidekerskienė and M. Woźniak, "IMF mode demixing in EMD for jitter analysis," Journal of Computational Science, vol. 22, pp. 240-252, 2017.
[9] Kaunas University of Technology and Vytautas Magnus University, "Lietuvių kalbos sintaksinės ir semantinės analizės informacinė sistema," [Online].
[10] A. Holvoet, Bendrosios sintaksės pagrindai. Vilnius: Vilniaus Universitetas, Asociacija „Academia Salensis“, 2009.
[11] D. Jurafsky and J. H. Martin, "Formal Grammars of English," in Speech and Language Processing (2nd Edition). USA: Prentice-Hall, Inc., 2009, pp. 396-408.
[12] D. Šveikauskienė, "Lietuvių kalbos sintaksinė analizė," Lietuvių kalba, vol. 7, 2013.
[13] M. Woźniak, D. Połap, R. K. Nowicki, C. Napoli, G. Pappalardo and E. Tramontana, "Novel approach toward medical signals classifier," in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, July 2015, pp. 1-7.
[14] Valstybinė ligonių kasa, "TLK-10-AM / ACHI / ACS elektroninis vadovas," [Online].
[15] Norwegian Institute of Public Health, "WHOCC - Structure and principles," [Online].



