
A proposal for determining the evidence types of biomedical documents using a drug-drug interaction ontology and machine learning

Linh Hoang


Richard D. Boyce

Mathias Brochhausen

Joseph Utecht

Jodi Schneider

1. University of Illinois at Urbana-Champaign, 2. University of Pittsburgh, 3. University of Arkansas for the Medical Sciences, USA

D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA

2017

While drug-drug interactions (DDIs) are biological processes that result in a clinically meaningful change to the response of at least one co-administered drug, potential DDIs are information entities about the potential for DDIs based on data or data extrapolation (DIDEO Ontology 2014). Knowledge of potential DDIs is important for clinicians making safe treatment decisions. However, it is challenging for clinicians to keep abreast of new knowledge about DDIs, because a large amount of new DDI research is published every year in a variety of formats, including journal articles and drug labels (Schneider et al. 2015). Automatic extraction of DDI information from the narrative text, tables, and figures of biomedical documents mainly focuses on extracting DDI “fact” claims and still has limited accuracy (Demner-Fushman et al. 2018; Milošević et al. 2016; Segura-Bedmar et al. 2013). Machines should extract and structure knowledge with the goal of making it easier for humans to synthesize and evaluate the evidence that supports DDI claims.

Introduction

We propose to combine machine learning with a formal representation of the DDI domain of discourse to assist humans in both authoring and assessing evidence of DDIs. To date, there has been little focus on using automatic extraction to lessen this cognitive burden: the current practice for determining the evidence type of a DDI study is for experts to read the study manually. We are inspired by prior work on computer-supported prospective knowledge capture by a community of scientists (Clark, Ciccarese, and Goble 2014). More specifically, we use an ontology as the backbone of a machine learning system that helps users identify the evidence type of a DDI study based on its characteristics.

Methods

Reuse the DIDEO ontology’s evidence hierarchy

DIDEO (DIDEO Ontology 2018) is a foundational domain representation that allows tracing the evidence underlying potential DDI knowledge (Brochhausen et al. 2014). The ontology contains more than 40 evidence types of DDI studies (Utecht et al. 2017); an excerpt is shown in Figure 1. These types were created based on evidence items relevant to DDI research (Boyce et al. 2009). DIDEO specifies the necessary and sufficient conditions for each evidence type using terms either defined in DIDEO or imported from other ontologies.
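
To make the hierarchy concrete, the following minimal Python sketch represents an evidence-type tree as a child-to-parent map. The type names are hypothetical placeholders, not DIDEO’s actual class labels; the sketch only illustrates how each path from the root to a leaf becomes a sequence of classification decisions, one per level.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt of an evidence-type tree: child -> parent.
# These names are illustrative placeholders, NOT DIDEO class labels.
PARENT: Dict[str, Optional[str]] = {
    "evidence": None,  # root
    "clinical_study": "evidence",
    "in_vitro_study": "evidence",
    "randomized_study": "clinical_study",
    "non_randomized_study": "clinical_study",
}


def children(node: str) -> List[str]:
    """Direct subtypes of `node`, in definition order."""
    return [c for c, p in PARENT.items() if p == node]


def path_from_root(node: str) -> List[str]:
    """Types from the root down to `node`; each step is one level's decision."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return list(reversed(path))


def leaves() -> List[str]:
    """Most specific types, i.e. nodes without subtypes."""
    return [n for n in PARENT if not children(n)]
```

For example, `path_from_root("randomized_study")` yields `["evidence", "clinical_study", "randomized_study"]`: a top-level decision followed by one next-level decision.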

Build a hierarchical multiclass classifier

The implementation of the hierarchical multiclass classifier consists of two basic steps, described further below: (1) prepare data; (2) develop and evaluate the classifier.

Step 1: Prepare data

Data preparation includes three main steps: collecting, annotating, and preprocessing the data. We started from an existing dataset of 189 unique papers about DDIs, which had been partially annotated by an expert (RB) with evidence type labels during a previous study (Schneider et al. 2015). Because not all of the papers in the dataset had evidence type labels, we created an annotation guideline and had the expert further annotate the papers, resulting in a manual gold standard of evidence type labels. The developer of the system (LH) also observed the expert’s annotation process in order to identify relevant text that could be used to train classifiers. We automatically collected each study’s metadata, including its title and abstract, through the PubMed API. We also manually collected full-text PDFs of these papers and automatically converted them to plain text.
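
The text does not say which PubMed interface was used. Assuming it was NCBI’s E-utilities `efetch` endpoint returning XML, a minimal sketch of the metadata step might look like the following; the function names are hypothetical, and parsing is separated from the network call so it can be checked offline.

```python
from typing import Dict, List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: metadata is retrieved via NCBI E-utilities efetch.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids: List[str]) -> str:
    """Build an efetch URL that requests PubMed records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return EFETCH + "?" + urlencode(params)


def parse_pubmed_xml(xml_text: str) -> List[Dict[str, str]]:
    """Extract PMID, title, and abstract from a PubmedArticleSet document."""
    records = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        pmid = citation.findtext("PMID", default="")
        title_el = citation.find("Article/ArticleTitle")
        # itertext() keeps words inside inline markup such as <i>...</i>.
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Structured abstracts are split into several AbstractText elements.
        parts = ["".join(p.itertext()).strip()
                 for p in citation.findall("Article/Abstract/AbstractText")]
        records.append({"pmid": pmid, "title": title, "abstract": " ".join(parts)})
    return records
```

To fetch live records, one could pass `urllib.request.urlopen(efetch_url(pmids)).read().decode("utf-8")` to `parse_pubmed_xml`, keeping within NCBI’s published request-rate guidelines.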

Figure 1 – Part of DIDEO’s evidence type hierarchy

Step 2: Develop and evaluate the classifier

The features we extract to develop the classifiers are bigrams from the titles, abstracts, and Methods sections, as well as drug entities detected in the titles and abstracts by MetaMap (Aronson 2001). This draws on our observation during the annotation process that the Methods section is where the expert often found the information needed to determine a study’s DDI evidence type. All papers in the dataset are used to train and test the top-level sub-classifier. Subsets of the dataset from the top-level classification are then used to train and test the next-level sub-classifiers, and this process is repeated until every paper is given a final evidence type prediction. All sub-classifiers are trained and tested using 5-fold cross-validation and are evaluated using accuracy, precision, recall, and F1-score.
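
The feature extraction and top-down routing can be sketched in plain Python as follows. Several pieces are assumptions rather than details from the study: the Methods section is located with a simple heading heuristic, drug mentions are taken as already produced by MetaMap (running MetaMap is out of scope), multinomial naive Bayes stands in for whatever base learner is used, and each sub-classifier is trained on the papers whose gold label lies below its node.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def methods_section(full_text: str) -> str:
    """Heuristic (assumption): text between a 'Methods' line and a 'Results' line."""
    m = re.search(r"^\s*(?:materials and )?methods\s*$(.*?)^\s*results\s*$",
                  full_text, flags=re.IGNORECASE | re.MULTILINE | re.DOTALL)
    return m.group(1).strip() if m else ""


def bigrams(text: str, field: str) -> List[str]:
    """Lower-cased word bigrams, prefixed with the field they came from."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [f"{field}:{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def extract_features(paper: Dict) -> Counter:
    """Bigram counts from title, abstract, and Methods, plus drug mentions.

    paper["drugs"] is assumed to hold drug names already found by MetaMap.
    """
    feats: Counter = Counter()
    feats.update(bigrams(paper.get("title", ""), "title"))
    feats.update(bigrams(paper.get("abstract", ""), "abstract"))
    feats.update(bigrams(methods_section(paper.get("full_text", "")), "methods"))
    feats.update("drug:" + d.lower() for d in paper.get("drugs", []))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in base learner)."""

    def fit(self, X: List[Counter], y: List[str]) -> "NaiveBayes":
        self.priors = Counter(y)
        self.n = len(y)
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = {f for feats in X for f in feats}
        return self

    def predict(self, feats: Counter) -> str:
        def log_score(label: str) -> float:
            denom = sum(self.counts[label].values()) + len(self.vocab)
            s = math.log(self.priors[label] / self.n)
            for f, c in feats.items():
                if f in self.vocab:  # features unseen in training are skipped
                    s += c * math.log((self.counts[label][f] + 1) / denom)
            return s
        return max(self.priors, key=log_score)


class HierarchicalClassifier:
    """One sub-classifier per internal node; each picks the child to descend into."""

    def __init__(self, parent: Dict[str, Optional[str]]):
        self.parent = parent
        self.root = next(n for n, p in parent.items() if p is None)
        self.models: Dict[str, NaiveBayes] = {}

    def _path(self, label: str) -> List[str]:
        path: List[str] = []
        node: Optional[str] = label
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path[::-1]  # root first

    def fit(self, papers: List[Dict], labels: List[str]) -> "HierarchicalClassifier":
        per_node: Dict[str, tuple] = defaultdict(lambda: ([], []))
        for paper, label in zip(papers, labels):
            feats = extract_features(paper)
            path = self._path(label)
            for node, child in zip(path, path[1:]):  # one example per level
                per_node[node][0].append(feats)
                per_node[node][1].append(child)
        self.models = {n: NaiveBayes().fit(X, y) for n, (X, y) in per_node.items()}
        return self

    def predict(self, paper: Dict) -> str:
        feats, node = extract_features(paper), self.root
        while node in self.models:  # stop at a node with no sub-classifier
            node = self.models[node].predict(feats)
        return node
```

Prefixing each bigram with its field keeps, say, a phrase in the title distinct from the same phrase in the Methods section, which matters if Methods text is the stronger signal.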
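
The text specifies 5-fold cross-validation and four metrics but not how the metrics are averaged. The sketch below assumes macro-averaging across evidence types and pools out-of-fold predictions before scoring; both choices are assumptions, and `train_and_predict` is a hypothetical callback that could wrap any sub-classifier.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]) for i in range(k)]


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted `guess`, which was wrong
            fn[truth] += 1  # missed the true label
    prec, rec, f1 = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(prec) / len(labels),
        "recall": sum(rec) / len(labels),
        "f1": sum(f1) / len(labels),
    }


def cross_validate(X: Sequence, y: Sequence[str],
                   train_and_predict: Callable[[List, List[str], List], List[str]],
                   k: int = 5, seed: int = 0) -> Dict[str, float]:
    """Collect out-of-fold predictions from k folds, then score them once."""
    preds: List[str] = [""] * len(y)
    for train, test in k_fold_indices(len(y), k, seed):
        fold_preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                       [X[i] for i in test])
        for i, p in zip(test, fold_preds):
            preds[i] = p
    return classification_metrics(y, preds)
```

With only 189 papers, some evidence types may be rare; stratified folds (keeping class proportions similar in every fold) would be a reasonable alternative to the plain shuffled split used here.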

Conclusions and future work

We propose to combine machine learning and knowledge representation to facilitate the process of assessing evidence from studies of DDIs. Drawing on an existing ontology of evidence types, DIDEO, we are building a hierarchical multiclass classifier that categorizes a DDI study’s evidence type. The primary purpose of the new classifier is to make it much easier for a DDI domain expert to assess the total body of evidence for a potential DDI. The key insight is to build the evidence type classifier from an ensemble of classifiers that assess the lower level characteristics of a study based on the necessary and sufficient axioms from the ontology.
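
One way to read this insight is sketched below in Python, under simplifying assumptions: each hypothetical characteristic (for example, `randomized_allocation`) has its own classifier that outputs a probability, and an evidence type is assigned exactly when all of its defining characteristics are predicted present, mirroring the necessary-and-sufficient form of the axioms. The characteristic and type names are illustrative placeholders, not DIDEO terms, and real OWL axioms can be richer than a conjunction of flags.

```python
from typing import Dict, FrozenSet, List

# Hypothetical definitions: a type holds iff ALL listed characteristics hold.
# Placeholder names only; these are not DIDEO classes or axioms.
DEFINITIONS: Dict[str, FrozenSet[str]] = {
    "clinical_study": frozenset({"involves_humans"}),
    "randomized_study": frozenset({"involves_humans", "randomized_allocation"}),
    "in_vitro_study": frozenset({"uses_tissue_or_cell_preparation"}),
}


def predicted_characteristics(scores: Dict[str, float],
                              threshold: float = 0.5) -> FrozenSet[str]:
    """Turn per-characteristic classifier probabilities into a set of flags."""
    return frozenset(c for c, p in scores.items() if p >= threshold)


def infer_types(characteristics: FrozenSet[str]) -> List[str]:
    """All types whose definitions are satisfied, most specific first."""
    matched = [t for t, required in DEFINITIONS.items() if required <= characteristics]
    # Heuristic: a type with more defining conditions is treated as more specific.
    return sorted(matched, key=lambda t: (-len(DEFINITIONS[t]), t))
```

For instance, scores of 0.9 for `involves_humans` and 0.7 for `randomized_allocation` yield `["randomized_study", "clinical_study"]`, so the most specific satisfied type comes first while its supertypes remain visible to a reviewer.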

This is an ongoing project, and we plan to extend it to additional DIDEO evidence types. In the future, studies about DDIs could be run through this classification system, and its predictions are ultimately intended to assist evidence reviewers as they assess evidence items. More immediate goals are to validate the evidence type definitions in the ontology and to suggest additional (potentially finer-grained) evidence types.

Acknowledgements

This work was supported by National Institutes of Health grants R01LM011838, T15LM007059, and R01LM010817. Thanks to Nigel Bosch for discussions of machine learning approaches.

References

Aronson, A. R. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the Annual Symposium of the American Medical Informatics Association, 17-21. Bethesda, MD: AMIA.

Boyce, R. D., Collins, C., Horn, J., and Kalet, I. 2009. Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics 42(6): 979-89.

Brochhausen, M., Schneider, J., Malone, D., Empey, E. P., Hogan, W. R., and Boyce, R. D. 2014. Towards a foundational representation of potential drug-drug interaction knowledge. In Proceedings of the First International Workshop on Drug Interaction Knowledge Representation, 16-31. Aachen, Germany: CEUR-WS.

Clark, T., Ciccarese, P. N., and Goble, C. A. 2014. Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications. Journal of Biomedical Semantics 5(1): 28-61.

Demner-Fushman, D., Tonning, J. M., Fung, K. W., Do, P., Boyce, R. D., and Roberts, K. 2018. Adverse Reactions and Drug-Drug Interaction Extraction tracks at the Text Analysis Conference (TAC). In Proceedings of the Annual Symposium of the American Medical Informatics Association, 1673-1674. Bethesda, MD: AMIA.