<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Causality driven data integration for adverse drug reaction discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dr Chen Wang</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarvnaz Karimi</string-name>
        </contrib>
        <aff>CSIRO Computational Informatics</aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>44</fpage>
      <lpage>45</lpage>
      <abstract>
        <p>We describe an ongoing effort at CSIRO to partially automate causality discovery in the Adverse Drug Reaction (ADR) detection process. The proposed method integrates data from multiple sources based on rules that indicate causality.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Dr Chen Wang is a Senior Research Scientist at CSIRO
Computational Informatics. He received his PhD from
Nanjing University. His research interests are primarily in
distributed, parallel and trustworthy systems. His current
work focuses on data analytics systems for adverse drug
reaction discovery. His recent work includes accountable
distributed systems and cloud computing. He is also
an Honorary Associate of the School of Information
Technologies at the University of Sydney. Dr Wang also
has industrial experience: he developed a high-throughput
event delivery system and a medical image archive system,
which are used by many hospitals and medical centres
in the USA.
Both types of ADR discovery ultimately lead to establishing the causal relationship between a drug and unexpected
adverse reactions. ADR discovery often starts with data-mining techniques that detect disproportionality in the
reports about a drug and an adverse reaction in comparison to other pairs of drugs and adverse reactions. These
potential ADRs are then examined in medication safety review and assessment meetings. The main task of these
meetings is to establish the causality between a drug and its adverse reactions. In current practice this is largely
a manual process and often generates wide variability in assessment [1, 2, 3]. Even though the shortcomings of the
current process were recognised in the 1970s [1], the practice of establishing causality between a drug and an ADR
has improved little since. We note that the current process, endorsed by WHO, is still largely based on
Naranjo’s questionnaire [<xref ref-type="bibr" rid="ref1">1</xref>], designed in 1981.
As ADR-related data become increasingly accessible in electronic format, and as processing power and techniques for
dealing with big data improve, it is now possible to introduce carefully designed algorithms that assist the causality
reasoning process and thereby automate some of the manual steps in this assessment to reduce variability.
To achieve this, there are two major requirements. First, it is essential to
understand and capture the reasoning process in the existing practice. A good reasoning process tends to minimise
the variability and inconsistency in assessment, as shown in [1]. Second, integrating data from various sources is
essential for reaching correct conclusions in the reasoning process; for example, additional data about the background
of the patients in ADR reports may help to identify the causes of an ADR. This is of course only possible if
multiple health agencies collaborate to make such data accessible. Below, we propose a causality detection method to
address these requirements.
      </p>
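      <p>The disproportionality screening mentioned above can be illustrated with a short sketch. The text does not say which measure is used; the example below assumes the proportional reporting ratio (PRR), one commonly used measure, and the report counts are invented for illustration only.</p>
      <preformat><![CDATA[
```python
def proportional_reporting_ratio(a, b, c, d):
    """Proportional reporting ratio (PRR) from a 2x2 table of report counts.

    a: reports mentioning drug D and reaction R
    b: reports mentioning drug D and any other reaction
    c: reports mentioning other drugs and reaction R
    d: reports mentioning other drugs and any other reaction
    """
    if a + b == 0 or c + d == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    rate_with_drug = a / (a + b)      # share of D's reports that mention R
    rate_without_drug = c / (c + d)   # share of all other reports that mention R
    return rate_with_drug / rate_without_drug


# Hypothetical counts, not taken from any real reporting database.
prr = proportional_reporting_ratio(a=20, b=180, c=100, d=9700)
print(round(prr, 2))
```
]]></preformat>
      <p>A ratio well above 1 only flags the pair (D, R) as a candidate; it says nothing about causality, which is why the review step described above is still needed.</p>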
    </sec>
    <sec id="sec-2">
      <title>DESCRIPTION</title>
      <p>Previous work on establishing causality between a drug and its unexpected adverse reactions used a well-designed
questionnaire to guide the assessment process [1]. The answers to these questions were assigned different scores,
and the total score of each rater determined how certain that rater was that a drug D causes a reaction R. A
consensus among raters served as an indicator of the causality of D and R.</p>
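      <p>As a rough illustration of this scoring scheme, the sketch below sums each rater's per-question scores, maps the total to a certainty category, and reports the most common category among raters. The category cut-offs follow the commonly cited Naranjo scale (9 or more: definite; 5 to 8: probable; 1 to 4: possible; 0 or less: doubtful). The per-question scores and the use of a simple majority share as the "consensus" are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def certainty_category(total_score):
    """Map a rater's total questionnaire score to a certainty category."""
    if total_score >= 9:
        return "definite"
    if total_score >= 5:
        return "probable"
    if total_score >= 1:
        return "possible"
    return "doubtful"


def rater_consensus(score_sheets):
    """Return the most common category and the share of raters who chose it.

    score_sheets holds one list of per-question scores per rater.
    Using the majority share as "consensus" is an assumption of this sketch.
    """
    categories = [certainty_category(sum(sheet)) for sheet in score_sheets]
    category, votes = Counter(categories).most_common(1)[0]
    return category, votes / len(categories)


# Hypothetical answers from three raters assessing the same (D, R) pair.
sheets = [
    [2, 1, 1, 2, 0, 0, 0, 0, 0, 0],
    [2, 2, 1, 2, -1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 0],
]
category, share = rater_consensus(sheets)
print(category, round(share, 2))
```
]]></preformat>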
      <p>Our proposed method contains two steps: (1) design rules that capture the causality reasoning process, using
domain expertise and the currently known ADRs of each drug or active ingredient; and (2) process different
data sources based on these rules to establish whether a given drug D causes a specified adverse reaction R.</p>
      <p>A starting point for rule identification is to use the existing questionnaires and to formalise the reasoning process
followed by the review and assessment teams within the regulatory agency. For instance, consider a specific drug D, its
possible adverse reaction R and a given dataset S (e.g., electronic health records and clinical notes). The following
rules could be considered for causality discovery:</p>
      <p>These rules capture common reasoning used in identifying whether D causes R. The set of rules is extensible. With these rules defined, the next step is to process data
based on them to discover causality between a drug and a given adverse reaction. To achieve this, we first build a data model that contains the data fields
required by the rules. For example, to support the rules above, we need information about actions taken on a drug by a consumer, or instructions given by a medical professional
to the patient, such as discontinuing its use or changing its dose, as well as additional information about other factors that may cause the adverse reaction. After the rule
list is completed, a table is constructed to capture the data model. See Table 1 for an example. The headers of Column 2 to Column 6 show a sample data model. Afterwards,
we process each data source using information extraction techniques, assisted by medical ontologies and drug knowledge repositories, to populate the table. The last
column “D causes R” in Table 1 represents the decisions and is partially populated from existing knowledge.</p>
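      <p>One possible way to represent a row of such a table is sketched below. The field names are assumptions derived from the column headings that survive for Table 1 (D discontinued, D readministered, dose change, other factors, D causes R); the value encodings and the populate helper are hypothetical and stand in for the information extraction step.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CausalityRecord:
    """One row of the data model; None means the source did not say."""
    record_id: int
    d_discontinued: Optional[bool] = None
    d_readministered: Optional[bool] = None
    dose_change: Optional[str] = None      # e.g. "increased" or "decreased" (assumed encoding)
    other_factors: Optional[bool] = None   # other plausible causes of R present
    d_causes_r: Optional[str] = None       # "Yes" or "No" when already known


def populate(record, extracted):
    """Copy extracted field values onto a record, rejecting unknown fields."""
    for name, value in extracted.items():
        if not hasattr(record, name):
            raise KeyError(f"field not in data model: {name}")
        setattr(record, name, value)
    return record


# Hypothetical output of an extraction step run over one clinical note.
row = populate(CausalityRecord(record_id=1),
               {"d_discontinued": True, "other_factors": False})
print(row)
```
]]></preformat>
      <p>Keeping unknown values as None rather than False lets a source that covers only some of the fields still contribute a row.</p>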
    </sec>
    <sec id="sec-3">
      <title>D DISCONTINUED</title>
    </sec>
    <sec id="sec-4">
      <title>D READMINISTERED</title>
    </sec>
    <sec id="sec-5">
      <title>DOSE CHANGE</title>
    </sec>
    <sec id="sec-6">
      <title>OTHER FACTORS</title>
      <p>
Based on the table, we use a decision tree to classify the data. Human-annotated data are used as the training set. A sample decision tree classifier is shown in Figure 1. Note
that this tree only partially covers Table 1. The final decision on causality (Yes, No, or Unknown) is based on a threshold applied to the probabilities generated by the decision tree. The
decision tree evolves as the number of confirmed causality pairs increases. As the data model is independent of the underlying data sources, our method can deal
with multiple data sources as long as they contain at least some of the information needed by the data model.</p>
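      <p>The thresholding step can be sketched as follows. The tree here is written by hand purely for illustration, whereas the method described above learns it from human-annotated data; it is not the tree of Figure 1. The leaf probabilities, the two thresholds and the field names are all hypothetical.</p>
      <preformat><![CDATA[
```python
def toy_tree_probability(row):
    """Return an illustrative P(D causes R) from a hand-written tree.

    row is a dict with boolean keys "other_factors", "d_discontinued"
    and "d_readministered". The leaf values are made up.
    """
    if row["other_factors"]:
        return 0.2
    if row["d_discontinued"]:
        return 0.8 if row["d_readministered"] else 0.6
    return 0.4


def decide(probability, yes_at=0.7, no_at=0.3):
    """Turn a probability into Yes, No or Unknown using two thresholds."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= yes_at:
        return "Yes"
    if probability <= no_at:
        return "No"
    return "Unknown"


# Hypothetical rows of the populated table.
rows = [
    {"other_factors": False, "d_discontinued": True, "d_readministered": True},
    {"other_factors": True, "d_discontinued": True, "d_readministered": False},
    {"other_factors": False, "d_discontinued": False, "d_readministered": False},
]
decisions = [decide(toy_tree_probability(r)) for r in rows]
print(decisions)
```
]]></preformat>
      <p>Using two thresholds rather than one leaves a band of probabilities that map to Unknown, which matches the three possible outcomes named above.</p>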
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>Causality discovery is essential for detecting potential adverse drug reactions. The main implementation challenges are extracting high-quality causality information from a
variety of data and dealing with the different levels of credibility of information from different data sources.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Naranjo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Busto</surname>
          </string-name>
          , E. Sellers, P. Sandor,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Janecek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Domecq</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Greenblatt</surname>
          </string-name>
          .
          <article-title>A method for estimating the probability of adverse drug reactions</article-title>
          .
          <source>Clinical Pharmacology and Therapeutics</source>
          ,
          <volume>30</volume>
          :
          <fpage>239</fpage>
          -
          <lpage>245</lpage>
          ,
          <year>1981</year>
          . 2.
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Naidu</surname>
          </string-name>
          .
          <article-title>Causality assessment: A brief insight into practices in pharmaceutical industry</article-title>
          .
          <source>Perspect Clin Res</source>
          <year>2013</year>
          ;
          <volume>4</volume>
          :
          <fpage>233</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borlak</surname>
          </string-name>
          .
          <article-title>Correlation versus causation? Pharmacovigilance of the analgesic flupirtine exemplifies the need for refined spontaneous ADR reporting</article-title>
          .
          <source>PLoS One</source>
          .
          <year>2011</year>
          ;
          <volume>6</volume>
          (
          <issue>10</issue>
          ):
          <fpage>e25221</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>