
10.4225/08/5490FA2E01A90

Concept Identification and Normalisation for Adverse Drug Event Discovery in Medical Forums

Alejandro Metke-Jimenez

alejandro.metke@csiro.au

Sarvnaz Karimi

sarvnaz.karimi@csiro.au

CSIRO, Australia

Social media is becoming an increasingly important source of information to complement traditional pharmacovigilance methods. In order to identify signals of potential adverse drug reactions, it is necessary to first identify medical concepts and drugs in the text. We evaluate different concept extraction techniques on medical forums and, for the machine learning approaches, we encode complex annotations using a scheme that showed good results in other domains. Our study shows that the extended encoding scheme, although imperfect, still produces good results despite the complexities of social media. The comparison of techniques shows that the machine learning approach significantly outperforms the other approaches.

Text Mining, Information Extraction, Ontology-based Text Normalisation, Drug Safety, Adverse Drug Reaction Discovery

Adverse Drug Reactions (ADRs) are a major concern for public health. An ADR is an injury caused by a medication that is administered at the recommended dosage, for recommended symptoms. The traditional pharmacovigilance methods have shown limitations that have prompted the search for alternative sources that might help identify signals of potential ADRs.

One of these sources is social media. However, it is first necessary to identify concepts of interest, such as mentions of adverse effects, in text that is unstructured and noisy. This step is critical because errors can affect the subsequent stages of the signal detection process.

Background and related work

Although there is a large body of literature on generic information extraction from text such as news and social media, especially Twitter, there is limited work on the specific area of ADR detection. A comprehensive survey of text and data mining techniques used for ADR signal detection can be found in [1].

In this paper we are concerned with concept extraction, which can be divided into two steps: identifying spans of text that represent a concept of interest, referred to as concept identification, and mapping the spans to the corresponding concepts in a chosen ontology, referred to as concept normalisation.

The problem of medical concept extraction has been extensively studied by the clinical text mining community. Most techniques used to extract ADRs from social media use dictionary-based approaches. A review of these approaches and the most commonly used lexicons can be found in [2].

More recently, machine learning techniques have been applied to extract ADRs from social media. In [3] the authors implemented a CRF classifier to detect mentions of ADRs in a corpus of Twitter and DailyStrength posts and reported improvements over dictionary-based approaches.

Problem formulation

Our goal is to evaluate the concept extraction task specifically on medical forums. Beyond addressing the challenges that this type of data raises, such as misspellings and colloquial language, we aim to compare widely used techniques to determine how well they perform against each other.

Concept identification

Concept identification consists of identifying spans of text that represent medical concepts. This task can be framed as a binary classification problem and evaluated using precision, recall, and F-score. In the strict version of the evaluation, the spans are required to match exactly. In the relaxed version the spans only need to overlap to be considered a positive match.
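The strict and relaxed evaluation modes described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes each span is a single continuous `(start, end)` character range with an exclusive end, and the names `Span`, `overlaps` and `evaluate` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a span is one continuous (start, end) character range, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True when the two ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(predicted: List[Span], gold: List[Span], strict: bool = True):
    """Compute precision, recall and F-score in strict or relaxed mode."""
    if strict:
        # Strict mode: boundaries must match exactly.
        matched_pred = [p for p in predicted if p in gold]
        matched_gold = [g for g in gold if g in predicted]
    else:
        # Relaxed mode: any overlap counts as a positive match.
        matched_pred = [p for p in predicted if any(overlaps(p, g) for g in gold)]
        matched_gold = [g for g in gold if any(overlaps(g, p) for p in predicted)]

    precision = len(matched_pred) / len(predicted) if predicted else 0.0
    recall = len(matched_gold) / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


gold_spans = [(0, 5), (10, 20)]
system_spans = [(0, 5), (12, 18), (30, 35)]
strict_scores = evaluate(system_spans, gold_spans, strict=True)
relaxed_scores = evaluate(system_spans, gold_spans, strict=False)
```

In this toy example the span `(12, 18)` only counts as correct in relaxed mode, which is why the relaxed scores are higher than the strict ones.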

In order to consider the correct classification of negative examples we also evaluate the systems using accuracy. The set of negative examples is defined as all the spans that are created by all the systems under evaluation that are not part of the gold standard.
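One way to read this definition of accuracy is sketched below. It is an interpretation under stated assumptions, not the authors' implementation: spans are compared by exact match, a negative example counts as correctly classified for a system when that system did not produce it, and the function name `accuracy` and its parameters are hypothetical.

```python
def accuracy(system_spans, gold_spans, all_system_spans):
    """Accuracy of one system over gold spans plus pooled negative examples.

    Assumption: negatives are every span produced by any system under
    evaluation that is not in the gold standard.
    """
    gold = set(gold_spans)
    pooled = set().union(*all_system_spans)  # spans produced by any system
    negatives = pooled - gold
    predicted = set(system_spans)

    true_positives = len(predicted & gold)       # gold spans the system found
    true_negatives = len(negatives - predicted)  # negatives the system avoided
    total = len(gold) + len(negatives)
    return (true_positives + true_negatives) / total if total else 0.0


gold = [(0, 5), (10, 20)]
system_a = [(0, 5), (30, 35)]
system_b = [(10, 20), (30, 35), (40, 45)]
accuracy_a = accuracy(system_a, gold, [system_a, system_b])
accuracy_b = accuracy(system_b, gold, [system_a, system_b])
```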

Concept normalisation

The normalisation step takes the spans that were identified in the identification step and maps them to a concept in an ontology. ADR spans are mapped to the Clinical Finding hierarchy of SNOMED CT and drug spans to concepts in the Australian Medicines Terminology (AMT).

[Figure 1: example sentences from forum posts annotated with extended BIO tags (DB, DI, HB, HI): "Next time I'll try my luck with Paracetamol."; "The pill I took consisted of 50 MG Diclofenac and 200 MG Misoprostol."; "... it has left me feeling exausted, and depressed."]

Concept normalisation is often evaluated using a metric referred to as accuracy. To avoid confusion with the metric used in the first part of the task, we refer to this metric as effectiveness. It is defined in terms of n_TP, the number of spans that match the gold standard exactly; n_correct, the number of spans that were mapped to the correct concept in the corresponding ontology; and t_g, the total number of identified concepts or spans in the gold standard. The relaxed version only considers the spans that were correctly identified in the previous stage.
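One reading of the effectiveness metric that is consistent with the definitions above is given below. It assumes that n_correct is counted over spans that also match the gold standard exactly, that the strict version divides by t_g, and that the relaxed version divides by n_TP. The exact form used by the authors, including any percentage scaling, may differ. The `\text` command assumes the `amsmath` package.

```latex
\[
\text{effectiveness}_{\text{strict}} = \frac{n_{correct}}{t_g},
\qquad
\text{effectiveness}_{\text{relaxed}} = \frac{n_{correct}}{n_{TP}}
\]
```

Under this reading, a system that identifies few spans but normalises all of them correctly would score highly in the relaxed version while scoring poorly in the strict one.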

Dataset

In our experiments, we used an annotated corpus called the CSIRO Adverse Drug Event Corpus (CADEC). This corpus is a collection of medical posts sourced from the medical forum AskaPatient. A detailed description of the corpus can be found in [4]. To develop and evaluate a machine learning approach, we divided the data into training and testing sets using a 70/30 split.
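A 70/30 split of this kind could be produced as in the sketch below. The text does not say how the split was drawn, so the post-level granularity, the random shuffling, the fixed seed and the function name `split_posts` are all assumptions made for illustration.

```python
import random


def split_posts(post_ids, train_fraction=0.7, seed=0):
    """Shuffle post identifiers reproducibly and split them into train/test."""
    ids = sorted(post_ids)             # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # assumption: random, post-level split
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]


train_ids, test_ids = split_posts([f"post{i}" for i in range(10)])
```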

Methods

Most existing approaches to ADR mining in social media use dictionary-based techniques based on pattern matching rules or sliding windows. We implemented a sliding window approach using the Lucene search engine, without using stemming or removing stop words.
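The sliding window idea can be illustrated without a search engine. The sketch below is not the Lucene-based implementation described above: it uses an in-memory dictionary, lowercases tokens (an assumption), applies no stemming or stop-word removal, and uses made-up concept identifiers. The function name `sliding_window_matches` and the `max_window` parameter are hypothetical.

```python
def sliding_window_matches(tokens, lexicon, max_window=5):
    """Return (start, end, concept_id) for lexicon terms found in the tokens.

    At each start position the longest matching window is kept.
    """
    matches = []
    for start in range(len(tokens)):
        longest = min(max_window, len(tokens) - start)
        for size in range(longest, 0, -1):
            candidate = " ".join(tokens[start:start + size]).lower()
            if candidate in lexicon:
                matches.append((start, start + size, lexicon[candidate]))
                break  # keep only the longest match at this start position
    return matches


# Hypothetical lexicon; the identifiers are placeholders, not real codes.
lexicon = {
    "stomach pain": "ID_STOMACH_PAIN",
    "pain": "ID_PAIN",
    "headache": "ID_HEADACHE",
}
tokens = "it gave me a severe stomach pain and headache".split()
found = sliding_window_matches(tokens, lexicon)
```

Note that the nested term "pain" is also reported here, which illustrates one way a dictionary-based approach can reach good recall at the cost of precision.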

We also implemented a CRF classifier, similar to the one used in [3] but with fewer features, using the Stanford NER suite [5]. A CRF classifier takes as input different features that are derived from the text, such as the words that surround each token, letter n-grams and word shape features.
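The three kinds of features mentioned above can be sketched as plain functions. This is an illustration only and does not reproduce the feature templates of the Stanford NER suite; the function names, the window size, the n-gram length and the feature keys are assumptions.

```python
def word_shape(token):
    """Collapse a token into a compact shape, e.g. 'Diclofenac' -> 'Xx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            symbol = "X"
        elif ch.islower():
            symbol = "x"
        elif ch.isdigit():
            symbol = "d"
        else:
            symbol = ch
        if not shape or shape[-1] != symbol:  # merge repeated symbols
            shape.append(symbol)
    return "".join(shape)


def char_ngrams(token, n=3):
    """Letter n-grams over the token padded with boundary markers."""
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def token_features(tokens, i, window=2):
    """Features for token i: the word, its shape, its neighbours, its n-grams."""
    features = {"word": tokens[i].lower(), "shape": word_shape(tokens[i])}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        features[f"word[{offset:+d}]"] = neighbour
    for gram in char_ngrams(tokens[i].lower()):
        features[f"ngram={gram}"] = True
    return features


example = token_features(["50", "MG", "Diclofenac"], 2)
```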

One of the challenges of dealing with discontinuous spans is representing them in a format that is suitable as input to the classifier. Continuous spans are typically represented using the standard Begin, Inside, Outside (BIO) chunking representation. This format does not support the notion of discontinuous spans and several solutions have been proposed to overcome this limitation. The most successful approach in tasks such as CLEF has been to extend the BIO format with additional tags to represent the discontinuous spans.

With the extended BIO format, the following additional tags are introduced: D{B, I} and H{B, I}, that is, DB, DI, HB and HI. The first set of tags is used to represent discontinuous, non-overlapping spans. The second set of tags is used to represent discontinuous, overlapping spans that share one or more tokens (the H stands for Head, as in head word). Figure 1 shows an example of a complex span.

One limitation of this approach is that it is impossible to represent several discontinuous spans in the same sentence unambiguously. To determine how this might affect the performance of the CRF approach with the CADEC dataset, a round trip transformation was done on the gold standard annotations and the results are shown in Table 1. This is equivalent to having a perfect classifier.
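The extended encoding and the ambiguity it introduces can be illustrated with a small encoder. The tagging rules below are an assumed reading of the scheme, not the authors' code: tokens that belong to more than one span receive H tags, the remaining tokens of any discontinuous or overlapping span receive D tags, and simple continuous spans receive plain B and I tags. Annotations are given as lists of token indices, and all names are hypothetical.

```python
from collections import Counter


def fragments(indices):
    """Group sorted token indices into runs of consecutive indices."""
    runs, run = [], [indices[0]]
    for i in indices[1:]:
        if i == run[-1] + 1:
            run.append(i)
        else:
            runs.append(run)
            run = [i]
    runs.append(run)
    return runs


def encode_extended_bio(n_tokens, annotations):
    """Encode token-index annotations as B/I/O, DB/DI and HB/HI tags."""
    tags = ["O"] * n_tokens
    spans = [sorted(set(a)) for a in annotations]
    counts = Counter(i for span in spans for i in span)
    shared = {i for i, c in counts.items() if c > 1}  # tokens in several spans

    for span in spans:
        is_complex = len(fragments(span)) > 1 or any(i in shared for i in span)
        by_prefix = {}
        for i in span:
            prefix = "" if not is_complex else ("H" if i in shared else "D")
            by_prefix.setdefault(prefix, []).append(i)
        for prefix, idxs in by_prefix.items():
            for run in fragments(idxs):
                tags[run[0]] = prefix + "B"
                for i in run[1:]:
                    tags[i] = prefix + "I"
    return tags


# "felt dizzy and nauseous": spans "felt dizzy" and "felt ... nauseous".
overlapping = encode_extended_bio(4, [[0, 1], [0, 3]])

# Two different pairings of four fragments produce the same tag sequence.
pairing_one = encode_extended_bio(7, [[0, 2], [4, 6]])
pairing_two = encode_extended_bio(7, [[0, 6], [2, 4]])
```

Because `pairing_one` and `pairing_two` are identical tag sequences, a decoder cannot tell which fragments belong together, which is the kind of information a round trip transformation would lose.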

The CRF classifier only identifies relevant spans but does not map them to concepts. Two approaches were explored to achieve this mapping. The first one is based on the Vector Space Model (VSM) and was implemented using Lucene. The target ontology was indexed, with stemming and stop-word removal, by creating a document for each term and storing the corresponding concept id. Then, the text of each span was used to query the index, without requiring all the tokens to match. The top ranked concept was assigned to the span and if the query returned no results then the span was annotated as concept less.
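The VSM-based mapping can be sketched with a small in-memory index. This is a simplified stand-in rather than the Lucene implementation described above: it uses TF-IDF weights with cosine similarity, which is not Lucene's exact scoring function, omits stemming and stop-word removal for brevity, and uses placeholder concept identifiers. The class name `TermIndex` and its methods are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class TermIndex:
    """One 'document' per ontology term, each storing its concept id."""

    def __init__(self, terms):
        # terms: iterable of (term_text, concept_id) pairs
        self.docs = [(Counter(tokenize(text)), cid) for text, cid in terms]
        doc_freq = Counter(tok for counts, _ in self.docs for tok in counts)
        n_docs = len(self.docs)
        self.idf = {tok: math.log(1 + n_docs / df) for tok, df in doc_freq.items()}

    def _weights(self, counts):
        # Tokens unknown to the index are ignored.
        return {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}

    def normalise(self, span_text):
        """Return the top-ranked concept id, or None when nothing matches."""
        query = self._weights(Counter(tokenize(span_text)))
        best_id, best_score = None, 0.0
        for counts, concept_id in self.docs:
            doc = self._weights(counts)
            # Not all query tokens need to match; any shared token scores.
            dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
            if dot == 0.0:
                continue
            norm = math.sqrt(sum(w * w for w in query.values())) * math.sqrt(
                sum(w * w for w in doc.values())
            )
            score = dot / norm
            if score > best_score:
                best_id, best_score = concept_id, score
        return best_id


index = TermIndex([
    ("stomach pain", "ID_STOMACH_PAIN"),
    ("headache", "ID_HEADACHE"),
    ("back pain", "ID_BACK_PAIN"),
])
mapped = index.normalise("bad stomach pain")
unmapped = index.normalise("dizzy")
```

A return value of `None` corresponds to the case where the query returns no results and the span is annotated as concept less.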

The second approach uses Ontoserver, a terminology server developed at the Australian e-Health Research Centre that, given a free-text query, returns the most relevant SNOMED CT and AMT concepts. Ontoserver uses a purpose-tuned retrieval function based on a multi-prefix matching algorithm [6].

To determine whether the improvements obtained with any two different methods were statistically significant, we used McNemar's test.
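McNemar's test compares two systems on the same items by looking only at the items where they disagree. The sketch below uses the chi-square form with continuity correction; the text does not state which variant was used or what the paired items were, so both choices, along with the function name `mcnemar`, are assumptions.

```python
import math


def mcnemar(correct_a, correct_b):
    """McNemar's test on paired per-item correctness flags of two systems.

    Returns (statistic, p_value) using the chi-square approximation with
    continuity correction and one degree of freedom.
    """
    pairs = list(zip(correct_a, correct_b))
    only_a = sum(1 for a, b in pairs if a and not b)  # A right, B wrong
    only_b = sum(1 for a, b in pairs if b and not a)  # B right, A wrong
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


system_a_correct = [True] * 15 + [False] * 5
system_b_correct = [False] * 15 + [True] * 5
statistic, p_value = mcnemar(system_a_correct, system_b_correct)
```

With few discordant pairs the chi-square approximation is unreliable and an exact binomial version of the test is usually preferred.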

Results and discussion

The results of the concept identification task are shown in Table 2. The CRF implementation outperforms MetaMap and all the dictionary-based implementations in all of the metrics that were considered, in both strict and relaxed modes, as expected.

Identifying drugs usually involves less ambiguity than identifying ADRs and therefore better results were expected in this task. The results show that the CRF indeed performs better in this task than in the ADR identification task. Note also that most of the dictionary-based implementations achieve good recall but low precision; this is likely due to some of the constraints in the annotation guidelines, for example, drug classes are excluded. The CRF is capable of learning these constraints while the dictionary-based approaches are not.

Table 3 shows the results of the concept normalisation task. In this case the strict metric is more relevant, because some implementations can achieve a very high score in the relaxed version despite having a very poor overall performance. The results show that Ontoserver outperforms the other approaches when normalising ADRs. Overall, however, the results are quite poor. This highlights two important aspects of the task. First, it is inherently difficult to map colloquial language to ontologies that contain more formal terms. Second, because in this task the goal is to map the spans to SNOMED CT concepts, the quality of the results when using approaches that rely on other controlled vocabularies will depend on the quality of the mappings between those vocabularies and SNOMED CT.

It was also expected that the different methods would perform better when normalising drugs than when normalising ADRs. For most implementations this turned out to be true, except for the dictionary-based methods that are not based on AMT. These methods were unable to normalise any concepts because maps between the other controlled vocabularies and AMT do not currently exist.

Conclusions and future work

Pharmacovigilance should no longer rely only on manual reports of potential adverse drug effects. One viable alternative is actively detecting signals of adverse drug reactions in social media through text mining.

We conducted an empirical evaluation of different methods to automatically extract concepts from medical forums. We explored the implications of representing complex annotations in a format suitable for use with machine learning methods. Finally, we proposed and implemented two concept normalisation techniques that we used in conjunction with our machine learning implementation.

We showed that there is some ambiguity when using the extended BIO format to represent the complex annotations, but the impact on the overall performance is not substantial. The experimental results showed that the CRF implementation combined with Ontoserver outperformed all the other methods that were evaluated. Even though these results show that machine learning methods perform better than simple dictionary-based methods, they also highlight the complexities in mapping the spans of text to concepts in an underlying ontology or controlled vocabulary.

Regarding future work, existing concept normalisation implementations in social media do not make use of the context of the spans. We believe more advanced methods may benefit from having access not only to the text in the span but also to the surrounding tokens and previously identified concepts.

Acknowledgements

AskaPatient kindly provided the data used in this study for research purposes only. Ethics approval for this project was obtained from the CSIRO ethics committee, which classified the work as low risk (CSIRO Ecosciences #07613).

[1] Sarvnaz Karimi, Chen Wang, Alejandro Metke-Jimenez, Raj Gaire, and Cecile Paris. Text and data mining techniques in adverse drug reaction detection. ACM Computing Surveys, 47(4):56, 2015.

[2] Abeed Sarker, Rachel Ginn, Azadeh Nikfarjam, Karen O'Connor, Karen Smith, Swetha Jayaraman, Tejaswi Upadhaya, and Graciela Gonzalez. Utilizing social media data for pharmacovigilance: A review. Journal of Biomedical Informatics, 54:202-212, 2015.

[3] Azadeh Nikfarjam, Abeed Sarker, Karen O'Connor, Rachel Ginn, and Graciela Gonzalez. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. Journal of the American Medical Informatics Association, 2015.

[4] Sarvnaz Karimi, Alejandro Metke-Jimenez, Madonna Kemp, and Chen Wang. CADEC: A corpus of adverse drug event annotations. Journal of Biomedical Informatics, 55:73-81, 2015.

[5] Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating nonlocal information into information extraction systems by Gibbs sampling. In The 43rd Annual Meeting on Association for Computational Linguistics, pages 363-370, Ann Arbor, Michigan, 2005.

[6] Merlijn Sevenster, Rob van Ommering, and Yuechen Qian. Algorithmic and user study of an autocompletion algorithm on a large medical vocabulary. Journal of Biomedical Informatics, 45(1):107-119, 2012.