A proposal for determining the evidence types of biomedical documents using a drug-drug interaction ontology and machine learning

Linh Hoang1, Richard D. Boyce2, Mathias Brochhausen3, Joseph Utecht3, Jodi Schneider1

1. University of Illinois at Urbana-Champaign, 2. University of Pittsburgh, 3. University of Arkansas for Medical Sciences

Copyright held by the authors. In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Introduction

Drug-drug interactions (DDIs) are biological processes that result in a clinically meaningful change to the response of at least one co-administered drug. Potential DDIs, by contrast, are information entities about the possibility of a DDI, based on data or data extrapolation (DIDEO Ontology 2014). Knowledge of potential DDIs is important for clinicians making safe treatment decisions. However, it is challenging for clinicians to keep abreast of new knowledge about DDIs, because a large amount of new DDI research is published every year in a variety of formats, including journal articles and drug labels (Schneider et al. 2015).

Automatic extraction of DDI information from the narrative text, tables, and figures of biomedical documents has mainly focused on extracting DDI “fact” claims and still has limited accuracy (Demner-Fushman et al. 2018; Milošević et al. 2016; Segura-Bedmar et al. 2013). Machines should extract and structure knowledge with the goal of making it easier for humans to synthesize and evaluate the evidence that supports DDI claims.

We propose to combine machine learning with a formal representation of the DDI domain of discourse to assist humans in both authoring and assessing evidence of DDIs. To date, there has been little focus on using automatic extraction to lessen this cognitive burden, and the current practice for determining the evidence type of a DDI study is for experts to read the study manually. We are inspired by prior work on computer-supported prospective knowledge capture by a community of scientists (Clark, Ciccarese, and Goble 2014). More specifically, we use an ontology as the backbone of a machine learning system that helps users identify the evidence type of a DDI study based on its characteristics.

Methods

Reuse the DIDEO ontology’s evidence hierarchy

DIDEO (DIDEO Ontology 2018) is a foundational domain representation that allows tracing the evidence underlying potential DDI knowledge (Brochhausen et al. 2014). The ontology contains more than 40 evidence types for DDI studies (Utecht et al. 2017); an excerpt is shown in Figure 1. These evidence types were created based on evidence items relevant to DDI research (Boyce et al. 2009). DIDEO specifies the necessary and sufficient conditions for each evidence type, using terms either defined in DIDEO or imported from other ontologies.

Build a hierarchical multiclass classifier

Implementing the hierarchical multiclass classifier consists of two basic steps, described further below: (1) prepare the data; (2) develop and evaluate the classifier.

Step 1: Prepare data

Data preparation involves three main steps: collecting, annotating, and preprocessing the data. We started from an existing dataset of 189 unique papers about DDIs, which had been partially annotated by an expert (RB) with evidence type labels during a previous study (Schneider et al. 2015). Because not all of the papers in the dataset had evidence type labels, we created an annotation guideline and had the expert further annotate these papers, resulting in a manual gold standard of evidence type labels. The developer of the system (LH) also observed the expert’s annotation process in order to identify relevant text that could be used to train the classifiers. We automatically collected the studies’ metadata, including titles and abstracts, through the PubMed API. We also manually collected the full-text PDFs of these papers and automatically converted them to plain text.

Figure 1 – Part of DIDEO’s evidence type hierarchy

Step 2: Develop and evaluate the classifier

The features we extract and use to develop the classifiers are bigrams taken from the titles, abstracts, and Methods sections, as well as drug entities detected by MetaMap (Aronson 2001) in the titles and abstracts. This choice draws on our observation during the annotation process that the Methods section is where the expert most often found the information needed to determine a study’s DDI evidence type.

All papers in the dataset are used to train and test the top-level sub-classifier. Subsets of the dataset passed down from the top-level sub-classifier are used to train and test the next-level sub-classifiers. This process is repeated until every paper has received its final evidence type prediction. All sub-classifiers are trained and tested using 5-fold cross-validation, and are then evaluated with several metrics: accuracy, precision, recall, and F1-score.

Figure 2 – Implementation of the hierarchical classifier (corresponding to the branch of evidence types in Figure 1)

Conclusions and future work

We propose to combine machine learning and knowledge representation to facilitate the process of assessing evidence from studies of DDIs. Drawing on an existing ontology of evidence types, DIDEO, we are building a hierarchical multiclass classifier that categorizes a DDI study’s evidence type. The primary purpose of the new classifier is to make it much easier for a DDI domain expert to assess the total body of evidence for a potential DDI. The key insight is to build the evidence type classifier from an ensemble of classifiers that assess the lower-level characteristics of a study, based on the necessary and sufficient axioms from the ontology.

This is an ongoing project, and we plan to expand it to additional DIDEO evidence types. In the future, studies about DDIs could be run through this classification system, and the prediction results should ultimately help evidence reviewers as they assess evidence items. More immediate goals are to validate the evidence type definitions in the ontology and to suggest additional (potentially finer-grained) evidence types.

Acknowledgements

Support from National Institutes of Health grants R01LM011838, T15LM007059, and R01LM010817. Thanks to Nigel Bosch for discussions of machine learning approaches.

References

Aronson, A. R. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the Annual Symposium of the American Medical Informatics Association, 17-21. Bethesda, MD: AMIA.

Boyce, R. D., Collins, C., Horn, J., and Kalet, I. 2009. Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics 42(6): 979-89.

Brochhausen, M., Schneider, J., Malone, D., Empey, E. P., Hogan, W. R., and Boyce, R. D. 2014. Towards a foundational representation of potential drug-drug interaction knowledge. In Proceedings of the First International Workshop on Drug Interaction Knowledge Representation, 16-31. Aachen, Germany: CEUR-WS.

Clark, T., Ciccarese, P. N., and Goble, C. A. 2014. Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications. Journal of Biomedical Semantics 5(1): 28-61.

Demner-Fushman, D., Tonning, J. M., Fung, K. W., Do, P., Boyce, R. D., and Roberts, K. 2018. Adverse Reactions and Drug-Drug Interaction Extraction tracks at the Text Analysis Conference (TAC). In Proceedings of the Annual Symposium of the American Medical Informatics Association, 1673-1674. Bethesda, MD: AMIA.

DIDEO - The Potential Drug-drug Interaction and Potential Drug-drug Interaction Evidence Ontology. http://www.ontobee.org/ontology/DIDEO

Milošević, N., Gupta, A., Chen, A., DeMarco, S. T., Le, J., Schneider, J., Ning, Y., Nenadić, G., and Boyce, R. D. 2017. Extraction of Drug-Drug Interactions from Drug Product Labeling Tables. In Proceedings of the 2017 Summit on Clinical Research Informatics, 436. Bethesda, MD: AMIA.

Schneider, J., Brochhausen, M., Rosko, S., Ciccarese, P., Hogan, W. R., Malone, D., Ning, Y., Clark, T., and Boyce, R. D. 2015. Formalizing Knowledge and Evidence about Potential Drug-drug Interactions. In Proceedings of the International Workshop on Biomedical Data Mining, Modeling, and Semantic Integration. Aachen, Germany: CEUR-WS.

Segura-Bedmar, I., Martínez, P., and Herrero-Zazo, M. 2013. SemEval-2013 Task 9: Extraction of drug-drug interactions from biomedical texts. In Proceedings of the Seventh International Workshop on Semantic Evaluation, 341-350. Stroudsburg, PA: ACL.

Utecht, J., Brochhausen, M., Judkins, J., Schneider, J., and Boyce, R. D. 2017. Formalizing Evidence Type Definitions for Drug-drug Interaction Studies to Improve Evidence Base Curation. In MEDINFO 2017: Precision Healthcare through Informatics, 960-964. Amsterdam: IOS Press.
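Illustrative sketch: the hierarchical scheme described in Step 2 (a top-level sub-classifier routes each paper to a branch, next-level sub-classifiers refine it, and every sub-classifier is evaluated with 5-fold cross-validation and precision/recall/F1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The tree `TREE` and its labels `A`, `A1`, `A2`, `B` are hypothetical placeholders, not actual DIDEO evidence types. The base learner is a stand-in multinomial naive Bayes (the proposal does not name one). Only word bigrams are used (no MetaMap drug entities). Each sub-classifier is trained on the papers whose gold label falls under its node, and folds are contiguous and unshuffled. All helper names (`bigrams`, `NaiveBayes`, `path_to`, `HierarchicalClassifier`, `kfold_indices`, `prf1`) and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Lowercased word bigrams, e.g. 'A b c' -> ['a b', 'b c']."""
    words = text.lower().split()
    return [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial naive Bayes over bigram counts (an assumed stand-in learner)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(bigrams(doc))
        vocab = {f for counts in self.feature_counts.values() for f in counts}
        self.v = len(vocab) + 1  # +1 reserves a slot for unseen bigrams
        return self

    def predict(self, doc):
        n = sum(self.class_counts.values())

        def log_score(label):
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(self.class_counts[label] / n)  # log prior
            for f in bigrams(doc):
                # Add-one (Laplace) smoothed log likelihood of each bigram
                score += math.log((counts[f] + 1) / (total + self.v))
            return score

        return max(self.class_counts, key=log_score)


def path_to(leaf, tree):
    """Node path from the root down to `leaf`, e.g. ['root', 'A', 'A1']."""
    parent = {child: node for node, kids in tree.items() for child in kids}
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


class HierarchicalClassifier:
    """One sub-classifier per internal node of a (hypothetical) type tree."""

    def __init__(self, tree, root="root"):
        self.tree, self.root, self.models = tree, root, {}

    def fit(self, docs, leaves):
        for node in self.tree:
            sub_docs, sub_labels = [], []
            for doc, leaf in zip(docs, leaves):
                path = path_to(leaf, self.tree)
                # Keep papers whose gold path passes through `node`;
                # the target is the child of `node` on that path.
                if node in path and path.index(node) + 1 < len(path):
                    sub_docs.append(doc)
                    sub_labels.append(path[path.index(node) + 1])
            if sub_docs:
                self.models[node] = NaiveBayes().fit(sub_docs, sub_labels)
        return self

    def predict(self, doc):
        node = self.root
        while node in self.models:  # descend until reaching a leaf
            node = self.models[node].predict(doc)
        return node


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    for i in range(k):
        test = list(range(i * n // k, (i + 1) * n // k))
        test_set = set(test)
        yield [j for j in range(n) if j not in test_set], test


def prf1(gold, pred, label):
    """Precision, recall and F1 for one label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up data with a hypothetical hierarchy (not DIDEO's).
    TREE = {"root": ["A", "B"], "A": ["A1", "A2"]}
    docs = ["alpha one x", "alpha two x", "beta y z"] * 5
    gold = ["A1", "A2", "B"] * 5
    preds = [None] * len(docs)
    for train, test in kfold_indices(len(docs), k=5):
        model = HierarchicalClassifier(TREE).fit(
            [docs[i] for i in train], [gold[i] for i in train])
        for i in test:
            preds[i] = model.predict(docs[i])
    print("accuracy:", sum(g == p for g, p in zip(gold, preds)) / len(gold))
    for label in ["A1", "A2", "B"]:
        print(label, prf1(gold, preds, label))
```

Because each paper is routed down the tree, an error made by the top-level sub-classifier cannot be corrected further down. This is one reason it is informative to evaluate each sub-classifier on its own, as Step 2 does, rather than only the final leaf predictions.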