<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Murff HJ, FitzHenry F, Matheny ME, Gentry N, Kotter KL, Crimin K, et al. Automated
identification of postoperative complications within an electronic medical record using natural language
processing. JAMA.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Biomedical Informatics Investigator</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter L. ELKIN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah MULLIN</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sylvester SAKILAY</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, SUNY</institution>
          ,
          <addr-line>New York</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1998</year>
      </pub-date>
      <volume>306</volume>
      <issue>8</issue>
      <fpage>2</fpage>
      <lpage>3</lpage>
      <abstract>
        <p>The BMI Investigator is a computer-human interface built in .NET that allows simultaneous querying of structured data, such as demographics, administrative codes, medications (coded in RxNorm) and laboratory test results (coded in LOINC), together with formerly unstructured data in clinical notes (coded in SNOMED CT). The ontology terms identified using SNOMED CT are all coded as positive, negative or uncertain assertions. Where applicable, they are then built into compositional expressions and stored in both a graph database and a triple store. The SNOMED CT codes are stored in a NoSQL database, Berkeley DB, and the structured data is stored in SQL using the OMOP / OHDSI format. The BMI Investigator also lets users develop models for cohort selection (data-driven recruitment to clinical trials) and automated retrospective research using genomic criteria, and we are currently adding image feature data to the system. We performed a usability experiment in which users identified usability flaws that were then used to improve the software. Overall, the BMI Investigator was felt to be usable by subject matter experts. Next steps for the software are to integrate genomic criteria and image features into the query engine.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical Research Informatics</kwd>
        <kwd>Ontology</kwd>
        <kwd>Recruitment to clinical trials</kwd>
        <kwd>Automated retrospective research</kwd>
        <kwd>clinical genomic trial recruitment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction</p>
      <p>Semantic interoperability is a long-held goal of the field of biomedical informatics. (1, 2)
It requires formal representation of the knowledge in the clinical record. (3) We
describe our effort to use and validate a semantically interoperable interface and system
to automate retrospective research and to enhance our ability to author clinical prediction
rules and to perform data-driven recruitment to clinical trials. (4, 5)</p>
      <p>Many authors have written about semantic interoperability, and ISO TC 215 TS
17117 describes the value and composition of nomenclatures and terminologies that
enable semantic interoperability. (6, 7) The Springer series book Terminology and
Terminological Systems guides the reader through the principles of semantic interoperability
and the nomenclatures and tools available to help achieve that goal. (8, 9)</p>
      <p>Today we have standards such as SNOMED CT, which represents general medicine
in a description-logic-based terminology. (10, 11, 12, 13) RxNorm and the ATC
remain the standard drug terminologies in the US and Europe, respectively. Elkin
and Brown published a drug semantics derived from the US Physicians’ Desk Reference (online
as DailyMed), which provides in codified form the indications, contraindications
and adverse reactions for all drugs and can be used for clinical decision support. (14)</p>
      <p>LOINC is an open-source terminology that began as a code set for laboratory test
results. By applying these standards to our primary data, we have developed an
application that can query across clinical, genomic and image data and enable fully
automated retrospective research. (15)</p>
      <p>2. Methods</p>
      <p>The structured data for the BMI Investigator is stored in OMOP / OHDSI format, alongside a Berkeley
DB NoSQL database. The medications are all coded with RxNorm and the labs are
coded with LOINC. The Berkeley DB database holds SNOMED CT codes that are parsed
out of the patients’ clinical notes. The data is stored by patient, document, section,
subsection, problem, paragraph, sentence, compositional expression, then named entity
and polarity. We explicitly code the polarity of each entity as a positive, negative or uncertain
assertion using the HTP-NLP system. (16, 17) These are then formed into
compositional expressions where possible, and this data is stored in a triple store.</p>
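      <p>The storage hierarchy and polarity coding described above can be illustrated with a minimal Python sketch. It is not the HTP-NLP output format or the actual schema of the system: the class and field names, the predicate names and the concept code are all hypothetical, chosen only to show how one coded assertion might be flattened into subject-predicate-object triples.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """The three assertion polarities named in the text."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class Assertion:
    """One coded entity found in a clinical note (all field names are hypothetical)."""
    patient_id: str
    document_id: str
    section: str
    sentence_index: int
    concept_code: str  # placeholder value, not a real SNOMED CT identifier
    polarity: Polarity


def to_triples(a: Assertion) -> list[tuple[str, str, str]]:
    """Flatten one assertion into subject-predicate-object triples."""
    node = f"{a.document_id}#s{a.sentence_index}:{a.concept_code}"
    return [
        (a.patient_id, "hasDocument", a.document_id),
        (a.document_id, "hasAssertion", node),
        (node, "inSection", a.section),
        (node, "hasConcept", a.concept_code),
        (node, "hasPolarity", a.polarity.value),
    ]


# Invented example: a negated finding in the third sentence of one note.
example = Assertion(
    patient_id="patient-1",
    document_id="doc-7",
    section="History of Present Illness",
    sentence_index=3,
    concept_code="CONCEPT-A",
    polarity=Polarity.NEGATIVE,
)
triples = to_triples(example)
```
]]></preformat>
      <p>Keeping polarity as an explicit property of each assertion, rather than dropping negated mentions, is what would let a query distinguish positive, negative and uncertain cases later.</p>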
      <p>The BMI Investigator application was written in .NET and was created using the
user-centered design development method. (18, 19) We tested the system on a population of
212,343 patients in our outpatient practices at the UBMD practice plans. The data for
this trial covered 2010 to 2015. The IRB judged the data used in the system to
be IRB Exempt #587570.</p>
      <p>Inclusion Criteria: All patients 18 years old or older</p>
      <p>In our development process, we had Clinical Informatics Fellows and Biomedical
Informatics Master’s and PhD students use the system. We observed them using the
system and asked them to describe their experience using the think-aloud method. We
paid particular attention to the understandability of the screen cues and the results.</p>
      <p>The system allows users to use Boolean logic and parentheses to construct their
queries. It also allows subqueries, so that one can define a population and then ask
questions of that population. Users never see a code and do not have to know
anything about the information model or the ontologies in use to use the system. When
the input string, or part of it, has no mapping to our ontologies, it is
run as a keyword search. The system allows one to save intermediate queries, reuse
them, add to them and import them for reuse. Once created, these models can be run in
batch mode.</p>
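      <p>The Boolean and subquery behavior described above can be sketched as set operations over patient identifiers. This is an illustrative Python sketch only, not the query engine of the BMI Investigator: the index contents, the nested-tuple query form and the function name <monospace>evaluate</monospace> are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
# Toy index: search term -> patient ids with a positive assertion (invented data).
INDEX = {
    "rosacea": {1, 2, 3, 4},
    "obstructive sleep apnea": {2, 4, 5},
    "anxiety": {1, 5, 6},
}
ALL_PATIENTS = {1, 2, 3, 4, 5, 6, 7}


def evaluate(query, population=None):
    """Evaluate a nested query against a population of patient ids.

    A query is either a term (str) or a tuple such as ("AND", q1, q2),
    ("OR", q1, q2) or ("NOT", q). Nesting plays the role of parentheses.
    """
    population = ALL_PATIENTS if population is None else population
    if isinstance(query, str):
        return INDEX.get(query, set()) & population
    op, *args = query
    if op == "AND":
        result = set(population)
        for arg in args:
            result &= evaluate(arg, population)
        return result
    if op == "OR":
        result = set()
        for arg in args:
            result |= evaluate(arg, population)
        return result
    if op == "NOT":
        return population - evaluate(args[0], population)
    raise ValueError(f"unknown operator: {op}")


# Subquery: define a population first, then ask a question of that population.
cohort = evaluate(("AND", "rosacea", ("NOT", "anxiety")))
with_osa = evaluate("obstructive sleep apnea", population=cohort)
```
]]></preformat>
      <p>Passing the result of one query as the population of the next is one simple way to model saved intermediate queries that are later reused or extended.</p>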
      <p>Genomic data is presented as gene abnormalities that are used in clinical medicine.
Polymorphisms that have been identified are stored in a separate set of tables and
are also used to match our patients who are included in the precision oncology
project. (20) We add image features, which are stored as matrices and vectors extracted
from images using image data analysis tools developed at UB. These act as separate
Boolean-connected search criteria. Datasets can then be exported in CSV format for
further analysis and reporting.</p>
      <p>We report the results of the usability study, in which 8 participants used the system under
supervision, working through the same scenarios. (21, 22, 23) Each participant
was asked to identify the relative risk of obstructive sleep apnea, comparing patients
who have rosacea with those who do not. This is a complex task that
requires four queries to accomplish. Each student was asked to set up the problem as two
ratios that could then be compared using a Pearson chi-square test. The students were
asked: How easy was the software to use? How easy was the software to learn? Could
you design a more intuitive interface?</p>
      <p>3. Results</p>
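      <p>As a worked illustration of the usability task described in the Methods, the following Python sketch combines four cohort counts into two ratios, their relative risk and a Pearson chi-square statistic. The counts are invented for the example and are not results from this study; the mapping of the four queries onto these four counts is also an assumption.</p>
      <preformat><![CDATA[
```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Four query results (invented counts, for illustration only).
rosacea_total = 100          # patients with rosacea
rosacea_with_osa = 30        # ...of whom have obstructive sleep apnea
no_rosacea_total = 200       # patients without rosacea
no_rosacea_with_osa = 20     # ...of whom have obstructive sleep apnea

a = rosacea_with_osa
b = rosacea_total - rosacea_with_osa
c = no_rosacea_with_osa
d = no_rosacea_total - no_rosacea_with_osa

rr = relative_risk(a, b, c, d)         # ratio of 30/100 to 20/200
chi2 = pearson_chi_square(a, b, c, d)  # compared against 1 degree of freedom
```
]]></preformat>
      <p>With these invented counts the two ratios are 0.30 and 0.10. A statistic above 3.84 would exceed the conventional 0.05 critical value for one degree of freedom.</p>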
      <p>The system has a simple interface in which researchers enter what they want to query,
and the results are almost always returned in less than a minute. Users enter into a simple
search line what they are interested in looking for in their query. They specify which
ontology they want to use, if any. They specify whether they are looking for positive, negative,
uncertain or not-mentioned cases. They specify whether they want the ontology terms
exploded (the reflexive transitive closure on subsumption) or not. They specify whether they
want to limit the search to certain sections of the clinical note. The user specifies
whether they are looking for a value or range of values, and its units. Then they
specify any time constraints on the query (for example, to recruit patients over one
time period who meet the inclusion / exclusion criteria and then follow some outcome
at some time in the future).</p>
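      <p>The "exploded" option mentioned above, the reflexive transitive closure on subsumption, can be sketched in Python as a traversal of is-a relationships. The hierarchy below is a small invented fragment, not SNOMED CT content, and the function name <monospace>explode</monospace> is hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical is-a edges as (child, parent) pairs.
IS_A = [
    ("generalized anxiety disorder", "anxiety disorder"),
    ("panic disorder", "anxiety disorder"),
    ("anxiety disorder", "mental disorder"),
]


def explode(term, is_a_edges):
    """Return the term itself plus every term it subsumes, directly or indirectly."""
    children = defaultdict(set)
    for child, parent in is_a_edges:
        children[parent].add(child)

    seen = {term}    # reflexive: the query term is always included
    stack = [term]
    while stack:     # transitive: keep following child links until none remain
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


expanded = explode("anxiety disorder", IS_A)
```
]]></preformat>
      <p>A query on the exploded term would then match a patient coded with any member of the returned set, while the unexploded query would match only the term itself.</p>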
      <p>Results come back quickly. In this case we are looking at patients in the practice who have
anxiety, and we can see that 32,798 patients in our dataset report anxiety
(see figure xx). The display shows that about twice as many women as men report anxiety
(see figure 3). You can also see the age distribution of our anxious patients (See</p>
      <p>Acknowledgements: This work has been supported in part by grants from NIH NLM
T15LM012595, and NCATS UL1TR001412. This study was funded in part by the NCI
and the Department of Veterans Affairs through the BD-STEP program, and through a
grant from the VA’s MAVERIC research group.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>