
Murff HJ, FitzHenry F, Matheny ME, Gentry N, Kotter KL, Crimin K, et al. Automated identification of postoperative complications within an electronic medical record using natural language processing. JAMA.

Biomedical Informatics Investigator

Peter L. ELKIN

Sarah MULLIN

Sylvester SAKILAY

Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, SUNY, New York, USA

1998


The BMI Investigator is a computer-human interface built in .NET that allows simultaneous querying of structured data, such as demographics, administrative codes, medications (coded in RxNorm), and laboratory test results (coded in LOINC), together with formerly unstructured data from clinical notes (coded in SNOMED CT). The ontology terms identified using SNOMED CT are each coded as a positive, negative, or uncertain assertion. Where applicable, they are then built into compositional expressions and stored in both a graph database and a triple store. The SNOMED CT codes are stored in a NoSQL database, Berkeley DB, and the structured data is stored in SQL using the OMOP / OHDSI format. The BMI Investigator also lets users develop models for cohort selection (data-driven recruitment to clinical trials) and automated retrospective research using genomic criteria, and we are currently adding image feature data to the system. We performed a usability experiment in which users identified usability flaws that were then used to improve the software. Overall, subject matter experts judged the BMI Investigator to be usable. Next steps for the software are to integrate genomic criteria and image features into the query engine.

Clinical Research Informatics; Ontology; Recruitment to clinical trials; Automated retrospective research; clinical genomic trial recruitment

1. Introduction

Semantic interoperability is a long-held goal of the field of biomedical informatics. (1) (2) This requires formal representation of the knowledge in the clinical record. (3) We describe our effort to use and validate a semantically interoperable interface and system to automate retrospective research, to enhance our ability to author clinical prediction rules, and to improve our ability to perform data-driven recruitment to clinical trials. (4) (5)

Many authors have written about semantic interoperability, and ISO TC 215 TS 17117 describes the value and composition of nomenclatures and terminologies that enable it. (6, 7) The Springer series book Terminology and Terminological Systems guides one through the principles of semantic interoperability and the nomenclatures and tools available to help achieve that goal. (8, 9)

Today we have standards such as SNOMED CT, which represents general medicine in a description-logic-based terminology. (10) (11) (12) (13) RxNorm and the ATC remain the standard drug terminologies in the US and Europe, respectively. Elkin and Brown published a drug semantics from the US physicians’ desk reference (online as the Daily Med), which provides in codified form the indications, contraindications, and adverse reactions for all drugs and can be used for clinical decision support. (14)

LOINC is an open source terminology which began as a code set for laboratory test results. By applying these standards to our primary data, we have developed an application which can query across clinical, genomic, and image data and enable fully automated retrospective research. (15)

2. Methods

The data for the BMI Investigator is stored in OMOP / OHDSI format together with a Berkeley DB NoSQL database. The medications are all coded with RxNorm and the labs are coded with LOINC. The Berkeley DB database holds SNOMED CT codes that are parsed out of the patients’ clinical notes. The data is stored by patient, document, section, subsection, problem, paragraph, sentence, compositional expression, then named entity and polarity. We explicitly code the polarity of each entity as a positive, negative, or uncertain assertion using the HTP-NLP system. (16, 17) These are then formed into compositional expressions where possible, and this data is stored in a triple store.
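As an illustration of the storage hierarchy and polarity coding described above, the following is a minimal sketch in Python. It is not the BMI Investigator's actual schema: the class and field names (`CodedEntity`, `Polarity`, `index_by_concept`) and the concept code values are hypothetical, and only a subset of the hierarchy levels is shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    """The three assertion types described in the text."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class CodedEntity:
    # Location of the mention; a subset of the hierarchy named in the text
    # (subsection, problem, paragraph and compositional expression omitted).
    patient_id: str
    document_id: str
    section: str
    sentence_index: int
    # Hypothetical placeholder; a real SNOMED CT identifier would go here.
    concept_code: str
    polarity: Polarity


def index_by_concept(entities):
    """Map (concept_code, polarity) to the set of patient ids with that assertion."""
    index = defaultdict(set)
    for entity in entities:
        index[(entity.concept_code, entity.polarity)].add(entity.patient_id)
    return index


# Illustrative records only; these are not data from the study.
records = [
    CodedEntity("p1", "d1", "History", 0, "C-ANXIETY", Polarity.POSITIVE),
    CodedEntity("p2", "d7", "History", 3, "C-ANXIETY", Polarity.NEGATIVE),
    CodedEntity("p3", "d9", "Assessment", 1, "C-ANXIETY", Polarity.POSITIVE),
]
by_concept = index_by_concept(records)
positive_anxiety = by_concept[("C-ANXIETY", Polarity.POSITIVE)]
```

Keeping polarity as part of the lookup key means a query for patients with a positive mention never returns patients whose notes only deny the finding.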

The BMI Investigator application was written in .NET and was created using the user-centered design development method. (18, 19) We tested the system on a population of 212,343 patients in our outpatient practices at the UBMD practice plans. The data for this trial was from 2010 to 2015. The data used in the system was judged by the IRB to be IRB Exempt #587570.

Inclusion Criteria: All patients 18 years old or older

In our development process, we had Clinical Informatics Fellows and Biomedical Informatics Masters’ and PhD students use the system. We observed them using the system and asked them to describe their experience using the think-aloud method. We paid particular attention to the understandability of the screen cues and the results.

The system allows users to use Boolean logic and parentheses to construct their queries. It also allows subqueries so that one can define a population and then ask questions of the population. The users do not ever see a code and do not have to know anything about the information model or the ontologies in use to use the system. When the input string or parts of the input string have no map to our ontologies they are searched as a keyword search. The system allows one to save intermediate queries, reuse them, add to them and import them for reuse. Once created these models can be run in a batch mode.
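The combination of Boolean logic, parentheses, and subqueries described above can be sketched as set operations over patient identifiers. This is an illustrative sketch only, not the system's query engine: the function `evaluate`, the nested-tuple expression format, and the toy index are all assumptions made for the example.

```python
def evaluate(expr, index, universe):
    """Evaluate a nested Boolean expression to a set of patient ids.

    expr is either a term (str) or a tuple:
    ("AND", left, right), ("OR", left, right) or ("NOT", operand).
    universe is the population being queried; passing the result of an
    earlier query as universe models a subquery over that population.
    """
    if isinstance(expr, str):
        return index.get(expr, set()) & universe
    op = expr[0]
    if op == "NOT":
        return universe - evaluate(expr[1], index, universe)
    left = evaluate(expr[1], index, universe)
    right = evaluate(expr[2], index, universe)
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError(f"unknown operator: {op}")


# Toy index from search term to patient ids (illustrative, not study data).
index = {
    "adult": {1, 2, 4, 5},
    "rosacea": {1, 2, 3},
    "sleep apnea": {2, 3, 4},
}
all_patients = {1, 2, 3, 4, 5}

# Define a population first, then ask a question of that population.
adults = evaluate("adult", index, all_patients)
rosacea_without_apnea = evaluate(
    ("AND", "rosacea", ("NOT", "sleep apnea")), index, adults
)
```

Nesting tuples plays the role of parentheses, and restricting `universe` to a saved result is one simple way to reuse an intermediate query.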

Genomic data is presented as gene abnormalities that are used in clinical medicine. Polymorphisms that have been identified are stored in a separate set of tables and are also used to match to our patients who are included in the precision oncology project. (20) We add image features, which are stored as matrices and vectors extracted from images using image data analysis tools developed at UB. These act as separate Boolean-connected search criteria. Datasets can then be exported in CSV format for further analysis and reporting.

We report the results of the usability study, in which 8 participants used the system under supervision, going through the same scenarios. (21) (22, 23) Each participant was asked to identify the relative risk of Obstructive Sleep Apnea, comparing patients who have Rosacea with those who do not. This is a complex task that requires four queries to accomplish. Each student was asked to set up the problem as two ratios that could then be compared using a Pearson Chi-Square test. The students were asked: How easy was the software to use? How easy was the software to learn? Could you design a more intuitive interface?

3. Results:
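For reference, the analysis that the usability task asks for (two ratios compared with a Pearson Chi-Square test) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the counts are invented for the example and are not results from the study, and the p-value uses the closed form for one degree of freedom without a continuity correction.

```python
import math


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the outcome rate in the exposed group to that in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the 2x2 table [[a, b], [c, d]].

    a, b: exposed patients with / without the outcome
    c, d: unexposed patients with / without the outcome
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: 30 of 100 patients with rosacea have sleep apnea,
# 10 of 100 patients without rosacea have sleep apnea.
rr = relative_risk(30, 100, 10, 100)
chi2, p = pearson_chi_square_2x2(30, 70, 10, 90)
```

The four inputs correspond to the kind of counts that four separate queries could supply: cases and totals for patients with the exposure, and cases and totals for patients without it.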

The system has a simple interface in which researchers enter what they want to query, and the results are almost always returned in less than a minute. Users enter what they are interested in looking for into a simple search line. They specify which ontology, if any, they want to use. They specify whether they are looking for positive, negative, uncertain, or not mentioned cases. They specify whether they want the ontology terms exploded (the reflexive transitive closure on subsumption) or not. They specify whether they want to limit the search to certain sections of the clinical note. The user specifies whether there is a value, or a range of values and units, that they are looking for. Then they specify any time constraints on the query (perhaps you want to recruit patients over one time period who meet the inclusion / exclusion criteria and then follow some outcome sometime in the future).
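The "exploded" option mentioned above, the reflexive transitive closure on subsumption, can be illustrated with a short traversal of an is-a hierarchy. This is a sketch only: the function name `explode` and the toy hierarchy are hypothetical and do not reproduce SNOMED CT content or the system's implementation.

```python
def explode(concept, children):
    """Return the concept itself plus every concept it subsumes.

    children maps a concept to its direct subtypes. Including the starting
    concept makes the closure reflexive; following subtype links to any
    depth makes it transitive.
    """
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:  # guards against revisiting shared subtypes
                seen.add(child)
                stack.append(child)
    return seen


# Toy hierarchy for illustration; not actual SNOMED CT relationships.
children = {
    "anxiety disorder": ["generalized anxiety disorder", "panic disorder"],
    "panic disorder": ["panic disorder with agoraphobia"],
}
exploded = explode("anxiety disorder", children)
```

With explosion turned on, a search for the parent term would also match notes coded with any of its subtypes; with it turned off, only the exact term would match.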

Results come back quickly. In this example we are looking at patients in the practice who have anxiety, and we can see that there are 32,798 patients reporting anxiety in our dataset (see figure xx). We display that about twice as many women report anxiety as men (See figure 3). You can also see the age distribution of our anxious patients (See

Acknowledgements: This work has been supported in part by grants from NIH NLM T15LM012595, and NCATS UL1TR001412. This study was funded in part by the NCI and the Department of Veterans Affairs through the BD-STEP program, and through a grant from the VA’s MAVERIC research group.