<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A proposal for determining the evidence types of biomedical documents using a drug-drug interaction ontology and machine learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Linh Hoang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard D. Boyce</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Brochhausen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseph Utecht</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jodi Schneider</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>1. University of Illinois at Urbana-Champaign, 2. University of Pittsburgh, 3. University of Arkansas for Medical Sciences</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>D. Lenat</institution>
          ,
          <addr-line>F. van Harmelen, P. Clark (Eds.)</addr-line>
          ,
          <institution>Proceedings of the AAAI 2019 Spring Symposium on Combining Machine Learning with Knowledge Engineering (AAAI-MAKE 2019). Stanford University</institution>
          ,
          <addr-line>Palo Alto, California</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Drug-drug interactions (DDIs) are biological processes that result in a clinically meaningful change to the response of at least one co-administered drug, whereas potential DDIs are information entities about the potential for DDIs, based on data or data extrapolation (DIDEO Ontology 2014). Knowledge of potential DDIs is important for clinicians in making safe medical treatment decisions. However, it is challenging for clinicians to keep abreast of new knowledge about DDIs because a large amount of new DDI research is published every year in a variety of formats, including journal articles and drug labels (Schneider et al. 2015). Automatic extraction of DDI information from the narrative text, tables, and figures of biomedical documents mainly focuses on extracting DDI “fact” claims and still has limited accuracy (Demner-Fushman et al. 2018; Milošević et al. 2016; Segura-Bedmar et al. 2013). Machines should extract and structure knowledge with the goal of making it easier for humans to synthesize and evaluate the evidence that supports DDI claims.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        We propose to combine machine learning with a formal
representation of the DDI domain of discourse to assist
humans in both authoring and assessing evidence of DDIs.
To date, there has been little focus on using automatic
extraction to lessen the cognitive burden on evidence reviewers:
the current practice for determining the evidence type of a DDI
study is for experts to read the study manually. We are inspired by
prior work on computer-supported prospective knowledge
capture by a community of scientists
        <xref ref-type="bibr" rid="ref4">(Clark, Ciccarese, and
Goble 2014)</xref>
        . More specifically, we use an ontology as the
backbone underlying a machine learning system that helps
users identify the evidence type of a DDI study based on
its characteristics.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <sec id="sec-2-1">
        <title>Reuse the DIDEO ontology’s evidence hierarchy</title>
        <p>
          DIDEO (DIDEO Ontology 2018) is a foundational domain
representation that allows tracing the evidence underlying
potential DDI knowledge
          <xref ref-type="bibr" rid="ref3">(Brochhausen et al. 2014)</xref>
          . The
ontology contains more than 40 evidence types of DDI
studies (Utecht et al. 2017); an excerpt is shown in Figure
1. These were created based on evidence items relevant to
DDI research
          <xref ref-type="bibr" rid="ref2">(Boyce et al. 2009)</xref>
          . DIDEO specifies the
necessary and sufficient conditions for each evidence type
using terms either defined in DIDEO or imported from
other ontologies.
        </p>
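        <p>To illustrate what necessary and sufficient conditions mean for classification, the following minimal Python sketch treats each evidence type as a set of required study characteristics: a study is assigned a type only if it exhibits every required characteristic, and exhibiting all of them is enough. The type names and characteristic labels are hypothetical placeholders chosen for illustration, not terms taken from DIDEO, and the real axioms are expressed in the ontology’s own logic rather than as Python sets.</p>
        <preformat preformat-type="code">
```python
# Hypothetical evidence types and characteristics, for illustration only.
# They are NOT copied from DIDEO, whose definitions are ontology axioms.
EVIDENCE_TYPE_DEFINITIONS = {
    "randomized clinical DDI study": frozenset(
        {"in_vivo", "human_participants", "randomized"}),
    "non-randomized clinical DDI study": frozenset(
        {"in_vivo", "human_participants", "non_randomized"}),
    "in vitro metabolism study": frozenset(
        {"in_vitro", "metabolism_assay"}),
}


def matching_evidence_types(characteristics):
    """Return the evidence types whose required characteristics are all present.

    Necessary: a type is rejected if any required characteristic is missing.
    Sufficient: a type is accepted as soon as all of them are observed.
    """
    observed = frozenset(characteristics)
    return sorted(name for name, required in EVIDENCE_TYPE_DEFINITIONS.items()
                  if required.issubset(observed))


study = {"in_vivo", "human_participants", "randomized", "crossover_design"}
print(matching_evidence_types(study))
```
        </preformat>
        <p>In this framing, each characteristic could be predicted by its own classifier, and the set-membership test then plays the role of the ontology’s definition.</p>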
      </sec>
      <sec id="sec-2-2">
        <title>Build a hierarchical multiclass classifier</title>
        <p>The implementation of the hierarchical multiclass classifier
consists of two basic steps, described further below:
(1) prepare data; (2) develop and evaluate the classifier.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Step 1: Prepare data</title>
        <p>Data preparation includes three main steps: collecting,
annotating, and preprocessing data. We started from an
existing dataset of 189 unique papers on DDIs, which an
expert (RB) had partially annotated with evidence type
labels during a previous study (Schneider et al. 2015). Not
all of the papers in the dataset had evidence type labels.
Therefore, we created an annotation guideline and had the
expert further annotate these papers, resulting in a manual
gold standard of evidence type labels. The developer of the
system (LH) also observed the expert’s annotation process in
order to identify relevant text that could be used for
training classifiers. We automatically collected the studies’
metadata, including titles and abstracts, through the PubMed
API. We also manually collected full-text PDFs of these
papers and automatically converted them to plain text.</p>
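        <p>As a rough illustration of the metadata collection step, the sketch below builds a request URL and parses titles and abstracts from the returned XML. It assumes the NCBI E-utilities efetch endpoint and PubMed’s XML element names (PubmedArticle, PMID, ArticleTitle, AbstractText); the text above does not state which PubMed interface was used, and the function names are illustrative.</p>
        <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch (not named in the text above).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build a URL that requests PubMed records as XML for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(pmid) for pmid in pmids),
        "retmode": "xml",
    }
    return EFETCH_URL + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract PMID, title, and abstract text from an efetch XML response."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        title = article.find(".//ArticleTitle")
        records.append({
            "pmid": article.findtext(".//PMID"),
            # itertext() keeps text nested inside inline markup such as italics.
            "title": "".join(title.itertext()) if title is not None else "",
            # Structured abstracts have several AbstractText parts; join them.
            "abstract": " ".join("".join(part.itertext())
                                 for part in article.iter("AbstractText")),
        })
    return records
```
        </preformat>
        <p>The response for a URL produced by build_efetch_url could be read with urllib.request.urlopen and passed to parse_records; the network call is left out so that the sketch runs offline.</p>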
        <p>Figure 1 – Part of DIDEO’s evidence type hierarchy</p>
      </sec>
      <sec id="sec-2-4">
        <title>Step 2: Develop and evaluate the classifier</title>
        <p>
          The features that we extract to develop the classifiers are
bigrams taken from the titles, abstracts, and
Methods sections, as well as drug entities from the titles
and abstracts as detected by MetaMap
          <xref ref-type="bibr" rid="ref1">(Aronson 2001)</xref>
          .
This draws on our observation during the annotation
process that the Methods section is where the expert often
found the information needed to determine the DDI evidence type.
All papers in the dataset are used to train and test the
top-level sub-classifier. Subsets of the dataset from the top-level
sub-classifier are used to train and test the next-level
sub-classifiers. This process is repeated until all the papers are
given their final evidence type predictions. All
sub-classifiers are trained and tested using 5-fold
cross-validation. The sub-classifiers are then evaluated using
several metrics: accuracy, precision,
recall, and F1-score.
        </p>
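        <p>The top-down procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system’s implementation: the two-level label paths and toy texts are invented, BigramVoteClassifier is a hypothetical stand-in for a real learner, and only bigram features are used (the drug entities detected by MetaMap are omitted).</p>
        <preformat preformat-type="code">
```python
from collections import Counter, defaultdict


def bigrams(text):
    """Return adjacent lowercase word pairs, the only feature used here."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class BigramVoteClassifier:
    """Hypothetical stand-in for a real learner: sums per-label bigram counts."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(bigrams(text))
        return self

    def predict(self, text):
        grams = bigrams(text)
        scores = {label: sum(seen[g] for g in grams)
                  for label, seen in self.counts.items()}
        # Sorting first makes ties resolve to the alphabetically first label.
        return max(sorted(scores), key=scores.get)


def train_hierarchy(papers):
    """Train one sub-classifier per internal node of the label hierarchy.

    papers is a list of (text, path) pairs; path is a tuple of labels from
    the top level down, and the empty tuple () denotes the top level.
    """
    by_node = defaultdict(lambda: ([], []))
    for text, path in papers:
        for depth, label in enumerate(path):
            texts, labels = by_node[path[:depth]]
            texts.append(text)
            labels.append(label)
    return {node: BigramVoteClassifier().fit(texts, labels)
            for node, (texts, labels) in by_node.items()}


def predict_path(models, text):
    """Route a paper from the top level down until no sub-classifier remains."""
    path = ()
    while path in models:
        path += (models[path].predict(text),)
    return path


def k_fold(items, k=5):
    """Yield (train, test) splits; every item is tested exactly once."""
    for i in range(k):
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, items[i::k]


def precision_recall_f1(gold, predicted, label):
    """Compute precision, recall, and F1-score for a single label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return precision, recall, (2 * precision * recall / total if total else 0.0)


# Invented toy data; the label paths are placeholders, not DIDEO evidence types.
TOY_PAPERS = [
    ("randomized crossover trial in healthy volunteers", ("clinical", "randomized")),
    ("randomized crossover trial of drug exposure", ("clinical", "randomized")),
    ("retrospective cohort study of patients", ("clinical", "observational")),
    ("retrospective cohort study of hospital records", ("clinical", "observational")),
    ("human liver microsomes inhibition assay", ("in_vitro",)),
    ("human liver microsomes metabolism assay", ("in_vitro",)),
    ("randomized placebo controlled trial in volunteers", ("clinical", "randomized")),
    ("retrospective cohort analysis of claims data", ("clinical", "observational")),
    ("recombinant enzyme inhibition assay", ("in_vitro",)),
    ("randomized crossover study in healthy adults", ("clinical", "randomized")),
]

models = train_hierarchy(TOY_PAPERS)
print(predict_path(models, "a randomized crossover trial with ketoconazole"))

# 5-fold cross-validation of the top-level decision only, for brevity.
gold, predicted = [], []
for train, test in k_fold(TOY_PAPERS, k=5):
    fold_models = train_hierarchy(train)
    for text, path in test:
        gold.append(path[0])
        predicted.append(predict_path(fold_models, text)[0])
print(precision_recall_f1(gold, predicted, "clinical"))
```
        </preformat>
        <p>Keying each sub-classifier by the label path that leads to it means that a lower-level sub-classifier only ever sees the papers routed to its parent, which mirrors the subset-based training described above.</p>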
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions and future work</title>
      <p>We propose to combine machine learning and knowledge
representation to facilitate the process of assessing
evidence from studies of DDIs. Drawing on an existing
ontology of evidence types, DIDEO, we are building a
hierarchical multiclass classifier that categorizes a DDI study’s
evidence type. The primary purpose of the new classifier is
to make it much easier for a DDI domain expert to assess
the total body of evidence for a potential DDI. The key
insight is to build the evidence type classifier from an
ensemble of classifiers that assess the lower-level
characteristics of a study, based on the necessary and sufficient
conditions axiomatized in the ontology.</p>
      <p>This is an ongoing project, and we plan to expand it to
additional DIDEO evidence types. In the future, studies about
DDIs could be run through this classification system, and
the predictions could assist evidence reviewers as they
assess evidence items. More immediate goals are to validate
the evidence type definitions in the ontology and to suggest
additional (potentially finer-grained) evidence types.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was supported by National Institutes of Health grants
R01LM011838, T15LM007059, and R01LM010817. We thank Nigel Bosch
for discussions of machine learning approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Aronson</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program</article-title>
          .
          <source>In Proceedings of the Annual Symposium of the American Medical Informatics Association</source>
          ,
          <fpage>17</fpage>
          -
          <lpage>21</lpage>
          . Bethesda, MD: AMIA.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Boyce</surname>
            ,
            <given-names>R.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalet</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>42</volume>
          (
          <issue>6</issue>
          ):
          <fpage>979</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Brochhausen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malone</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Empey</surname>
            ,
            <given-names>E. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>W. R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Boyce</surname>
            ,
            <given-names>R.D.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Towards a foundational representation of potential drug-drug interaction knowledge</article-title>
          .
          <source>In Proceedings of First International Workshop on Drug Interaction Knowledge Representation</source>
          ,
          <fpage>16</fpage>
          -
          <lpage>31</lpage>
          . Aachen, Germany: CEUR-WS.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciccarese</surname>
            ,
            <given-names>P. N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C. A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Micropublications: a Semantic Model for Claims, Evidence, Arguments and Annotations in Biomedical Communications</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>28</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tonning</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fung</surname>
            ,
            <given-names>K. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Do</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boyce</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Adverse Reactions and Drug-Drug Interaction Extraction tracks at the Text Analysis Conference (TAC)</article-title>
          .
          <source>In Proceedings of the Annual Symposium of the American Medical Informatics Association</source>
          ,
          <fpage>1673</fpage>
          -
          <lpage>1674</lpage>
          . Bethesda, MD: AMIA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>