A proposal for determining the evidence types of biomedical documents
      using a drug-drug interaction ontology and machine learning
      Linh Hoang1, Richard D. Boyce2, Mathias Brochhausen3, Joseph Utecht3, Jodi Schneider1
       1. University of Illinois at Urbana-Champaign, 2. University of Pittsburgh, 3. University of Arkansas for Medical Sciences


Introduction

While drug-drug interactions (DDIs) are biological processes that result in a clinically meaningful change to the response of at least one co-administered drug, potential DDIs are information entities about the potential of DDIs based on data or data extrapolation (DIDEO Ontology 2014). Knowledge of potential DDIs is important for clinicians in making safe medical treatment decisions. However, it is challenging for clinicians to keep abreast of new knowledge about DDIs because a large amount of new research about DDIs is published every year in a variety of formats, including journal articles and drug labels (Schneider et al. 2015).

Automatic extraction of DDI information from narrative text, tables, and figures of biomedical documents mainly focuses on extracting DDI “fact” claims and still has limited accuracy (Demner-Fushman et al. 2018; Milošević et al. 2017; Segura-Bedmar et al. 2013). Machines should extract and structure knowledge with the goal of making it easier for humans to synthesize and evaluate evidence that supports DDI claims.

We propose to combine machine learning with a formal representation of the DDI domain of discourse to assist humans in both authoring and assessing evidence of DDIs. To date, there has been little focus on using automatic extraction to lessen the cognitive burden, and the current practice for determining evidence type in a DDI study is for experts to read the study manually. We are inspired by prior work on computer-supported prospective knowledge capture by a community of scientists (Clark, Ciccarese, and Goble 2014). More specifically, we use an ontology as the backbone underlying a machine learning system that helps users identify the evidence type of a DDI study based on its characteristics.

Methods

Reuse the DIDEO ontology’s evidence hierarchy
DIDEO (DIDEO Ontology 2018) is a foundational domain representation that allows tracing the evidence underlying potential DDI knowledge (Brochhausen et al. 2014). The ontology contains more than 40 evidence types of DDI studies (Utecht et al. 2017); an excerpt is shown in Figure 1. These were created based on evidence items relevant to DDI research (Boyce et al. 2009). DIDEO specifies the necessary and sufficient conditions for each evidence type using terms either defined in DIDEO or imported from other ontologies.

Build a hierarchical multiclass classifier
The implementation of the hierarchical multiclass classifier consists of two basic steps, described further below: (1) Prepare data; (2) Develop and evaluate the classifier.

Step 1: Prepare data
The data preparation includes three main steps: collecting, annotating, and preprocessing data. We started with an existing dataset of 189 unique papers on DDIs, which had been partially annotated by an expert (RB) with evidence type labels during a previous study (Schneider et al. 2015). Not all of the papers in the dataset had evidence type labels. Therefore, we created an annotation guideline and had the expert further annotate these papers, resulting in a manual gold standard of evidence type labels. The developer of the system (LH) also observed the expert’s annotation process in order to identify relevant text that could be used for training classifiers. We automatically collected the studies’ metadata, including titles and abstracts, through the PubMed API. We also manually collected full-text PDFs of these papers and automatically converted them to plain text.
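
The metadata collection in Step 1 could be scripted along the following lines. This is a minimal sketch, not the authors' code: it assumes that the NCBI E-utilities efetch endpoint serves as the PubMed API and that records are requested as XML. The function names (build_efetch_url, parse_pubmed_xml, fetch_metadata) are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: NCBI E-utilities efetch is the "PubMed API" used for metadata.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Return an efetch URL requesting PubMed XML for the given PMIDs."""
    query = urllib.parse.urlencode(
        {"db": "pubmed", "id": ",".join(str(p) for p in pmids), "retmode": "xml"}
    )
    return f"{EFETCH_URL}?{query}"


def parse_pubmed_xml(xml_text):
    """Extract PMID, title, and abstract from a PubMed XML document."""
    records = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title_el = article.find(".//ArticleTitle")
        # itertext() keeps text nested inside inline markup such as <i>.
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Structured abstracts hold several AbstractText elements; join them.
        parts = ["".join(el.itertext()).strip() for el in article.iter("AbstractText")]
        abstract = " ".join(p for p in parts if p)
        records.append({"pmid": pmid, "title": title, "abstract": abstract})
    return records


def fetch_metadata(pmids):
    """Download and parse records for the given PMIDs (needs network access)."""
    with urllib.request.urlopen(build_efetch_url(pmids)) as response:
        return parse_pubmed_xml(response.read().decode("utf-8"))
```

Only fetch_metadata touches the network; parse_pubmed_xml can be run offline on saved XML, which keeps the parsing step easy to check separately from the download.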


Copyright held by the authors. In A. Martin, K. Hinkelmann, A. Gerber, D. Lenat, F. van Harmelen, P. Clark (Eds.), Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University, Palo Alto, California, USA, March 25-27, 2019.

Figure 1 – Part of DIDEO’s evidence type hierarchy
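
An evidence type hierarchy such as DIDEO's lends itself to top-down classification: one sub-classifier per internal node, each trained only on the papers that fall beneath that node. The sketch below illustrates that routing under several assumptions. The node labels are placeholders rather than DIDEO evidence types, the three training texts are invented, and a toy bigram-count scorer stands in for whatever learning algorithm is actually used.

```python
from collections import Counter, defaultdict

# Hypothetical hierarchy: the labels are placeholders, not DIDEO terms.
HIERARCHY = {
    "evidence": ["group A", "group B"],
    "group A": ["type A1", "type A2"],
}


def bigrams(text):
    """Return the word bigrams of a text as (word, word) tuples."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def leaves_under(node):
    """Return the set of leaf labels in the subtree rooted at node."""
    children = HIERARCHY.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(child) for child in children))


class BigramScorer:
    """Toy stand-in for a real learner: scores classes by shared bigrams."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(bigrams(text))
        return self

    def predict(self, text):
        grams = bigrams(text)
        scores = {
            label: sum(counter[g] for g in grams) / (sum(counter.values()) or 1)
            for label, counter in self.counts.items()
        }
        # Sorting first makes ties resolve alphabetically, i.e. deterministically.
        return max(sorted(scores), key=scores.get)


def train_hierarchy(papers):
    """Train one sub-classifier per internal node on the papers beneath it."""
    models = {}
    for node, children in HIERARCHY.items():
        texts, labels = [], []
        for text, leaf in papers:
            for child in children:
                if leaf in leaves_under(child):
                    texts.append(text)
                    labels.append(child)
        if texts:
            models[node] = BigramScorer().fit(texts, labels)
    return models


def predict_leaf(models, text, root="evidence"):
    """Walk from the root, letting each sub-classifier choose a child node."""
    node = root
    while node in models:
        node = models[node].predict(text)
    return node


# Invented toy papers for illustration only: (text, leaf label).
PAPERS = [
    ("randomized crossover trial in healthy volunteers", "type A1"),
    ("retrospective case report of one patient", "type A2"),
    ("in vitro inhibition assay with human microsomes", "group B"),
]
models = train_hierarchy(PAPERS)
print(predict_leaf(models, "a randomized crossover trial"))
```

Keeping the hierarchy as plain data means a new evidence type only requires a new entry in the mapping and retraining of the affected sub-classifiers. Replacing BigramScorer with a real learner leaves the routing logic unchanged.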
Step 2: Develop and evaluate the classifier
The features that we extract and use to develop classifiers are bigrams taken from the titles, abstracts, and Methods sections, as well as drug entities from the titles and abstracts as detected by MetaMap (Aronson 2001). This draws on our observation during the annotation process that the Methods section is where the expert often found the information needed to determine the DDI evidence type.

All papers in the dataset are used to train and test the top-level sub-classifier. Subsets of the dataset from the top-level classifier are used to train and test the next-level sub-classifiers. This process is repeated until all the papers are given their final evidence type predictions. All sub-classifiers are trained and tested using 5-fold cross-validation. The sub-classifiers are then evaluated using several metrics: accuracy, precision, recall, and F1-score.

Figure 2 – Implementation of the hierarchical classifier (corresponding to the branch of evidence types in Figure 1)

Conclusions and future work

We propose to combine machine learning and knowledge representation to facilitate the process of assessing evidence from studies of DDIs. Drawing on an existing ontology of evidence types, DIDEO, we are building a hierarchical multiclass classifier that categorizes a DDI study’s evidence type. The primary purpose of the new classifier is to make it much easier for a DDI domain expert to assess the total body of evidence for a potential DDI. The key insight is to build the evidence type classifier from an ensemble of classifiers that assess the lower-level characteristics of a study based on the necessary and sufficient axioms from the ontology.

This is an ongoing project, and we plan to expand it to additional DIDEO evidence types. In the future, studies about DDIs could be run through this classification system, and the predictions could assist evidence reviewers as they assess evidence items. More immediate goals are to validate the evidence type definitions in the ontology and to suggest additional (potentially finer-grained) evidence types.

Acknowledgements

Support from National Institutes of Health R01LM011838, T15LM007059, R01LM010817. Thanks to Nigel Bosch for discussions of machine learning approaches.

References

Aronson, A. R. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In Proceedings of the Annual Symposium of the American Medical Informatics Association, 17-21. Bethesda, MD: AMIA.

Boyce, R. D., Collins, C., Horn, J., and Kalet, I. 2009. Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics 42(6): 979-89.

Brochhausen, M., Schneider, J., Malone, D., Empey, E. P., Hogan, W. R., and Boyce, R. D. 2014. Towards a foundational representation of potential drug-drug interaction knowledge. In Proceedings of First International Workshop on Drug Interaction Knowledge Representation, 16-31. Aachen, Germany: CEUR-WS.

Clark, T., Ciccarese, P. N., and Goble, C. A. 2014. Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications. Journal of Biomedical Semantics 5(1): 28-61.

Demner-Fushman, D., Tonning, J. M., Fung, K. W., Do, P., Boyce, R. D., and Roberts, K. 2018. Adverse Reactions and Drug-Drug Interaction Extraction tracks at the Text Analysis Conference (TAC). In Proceedings of the Annual Symposium of the American Medical Informatics Association, 1673-1674. Bethesda, MD: AMIA.

DIDEO - The Potential Drug-drug Interaction and Potential Drug-drug Interaction Evidence Ontology. http://www.ontobee.org/ontology/DIDEO

Milošević, N., Gupta, A., Chen, A., DeMarco, S. T., Le, J., Schneider, J., Ning, Y., Nenadić, G., and Boyce, R. D. 2017. Extraction of Drug-Drug Interactions from Drug Product Labeling Tables. In Proceedings of the 2017 Summit on Clinical Research Informatics, 436. Bethesda, MD: AMIA.

Segura-Bedmar, I., Martínez, P., and Zazo, M. H. 2013. Semeval-2013 task 9: Extraction of drug-drug interactions from biomedical texts. In Proceedings of the Seventh International Workshop on Semantic Evaluation, 341-350. Stroudsburg, PA: ACL.

Schneider, J., Brochhausen, M., Rosko, S., Ciccarese, P., Hogan, W. R., Malone, D., Ning, Y., Clark, T., and Boyce, R. D. 2015. Formalizing Knowledge and Evidence about Potential Drug-drug Interactions. In Proceedings of International Workshop on Biomedical Data Mining, Modeling, and Semantic Integration. Aachen, Germany: CEUR-WS.

Utecht, J., Brochhausen, M., Judkins, J., Schneider, J., and Boyce, R. D. 2017. Formalizing Evidence Type Definitions for Drug-drug Interaction Studies to Improve Evidence Base Curation. In MEDINFO 2017: Precision Healthcare through Informatics, 960-964. Amsterdam: IOS Press.