Early steps of an Ontology for Magnetic Resonance Imaging: MRIO

Lucas M. Serra 1, Michael G. Dwyer 2, William D. Duncan 1, Alexander D. Diehl 1

1 Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, USA
2 Buffalo Neuroimaging Analysis Center, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, USA

Abstract

The Magnetic Resonance Imaging Ontology (MRIO) is an application ontology that represents numerous entities in the domain of magnetic resonance imaging (MRI), including MRI analysis and MRI sequences. Data from clinical trial MRI protocols were used to create the axioms of these MRI sequences. We have also created means for automatically loading MRI headers as new ontology instances and demonstrate the ability to query data in MRIO. The current work represents the beginnings of a full-fledged imaging ontology and automated analysis pipeline, which we plan to further develop. Future iterations of the project will include a streamlined user interface for querying and improved capability in classifying image types.

Keywords: MRI ontology; imaging informatics; MRIO.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

The Fundamentals of Magnetic Resonance Imaging

Magnetic resonance imaging (MRI) is a mainstay of modern medicine that has rapidly integrated itself into a myriad of diagnostic algorithms and has proven itself a valuable component of healthcare due to its versatility and accuracy. However, these features come at the cost of price and complexity. MRI is a nuanced technology, and representing its components in an ontology requires an understanding of the fundamental principles of magnetic resonance. MRI is based upon the same physical principles that underlie nuclear magnetic resonance and is predicated on the notion of “spin”. Spin gives particles, like protons, their angular momentum and a magnetic moment (1). Protons therefore have magnetic fields which align with applied external magnetic fields. By interrogating these proton spins with radiofrequency pulses and recording the responses, MRI is able to infer many different properties of the underlying tissue.

The Anatomy of an MRI Machine

Modern MRI machines are composed of a primary superconducting magnet that supplies the main magnetic field, a gradient coil to alter the primary magnet’s field and encode spatial information, and a set of radiofrequency (RF) coils to create pulses and receive signals. If we consider an analogy where the protons are the needle of a compass, the RF coil’s function is somewhat similar to nudging the needle with a finger and timing how long it takes for the needle to re-right itself. As the protons re-align themselves with the applied magnetic field, they release energy. Protons can release energy to their surroundings, which is referred to as spin-lattice relaxation or T1 relaxation. Alternatively, protons can become out of phase with each other. This is called spin-spin relaxation or T2 relaxation. Which of these effects dominates an image determines whether we designate it a “T1 image” or a “T2 image”. The aforementioned effects alter the net magnetic vector within the machine, which is captured as electrical impulses by the RF coil. In addition to these “classical” image contrasts, the field of MRI physics has discovered many other sources of tissue contrast that can be elucidated by variations in the standard pulse sequence regime. Together, these various contrasts enable fine discrimination of tissue composition that is not possible with other imaging modalities, which has cemented MRI as the premier imaging option for pathologies affecting soft tissues.

Expanding Use and Standards

MRI is an often-used component in a physician’s toolkit, especially in the US, which boasts the second highest number of MRI machines per capita globally (2). MRI has broad clinical and research applications ranging from traumatic brain injuries to osteoarthritis to malignancy. Within the past two decades, the use of imaging across healthcare has risen dramatically, and has been partly fueled by physicians who purchase MRI machines for their practices and consequently order more scans (3, 4). The growing use of imaging data has necessitated improvements in imaging standards and protocols. Healthcare professionals and researchers working within the field of imaging wisely adopted a standard file format for medical images decades ago. Digital Imaging and Communications in Medicine (DICOM) is used worldwide to store and transmit medical images (5). In order to further augment the standardization and interoperability introduced by DICOM, centers involved in clinical trials often adopt detailed protocols, which state specific image parameters and tolerances for use during data collection. These data exist as numbers and text in the metadata fields of a DICOM header.

Problems Facing the Field

With increased use and widening adoption come ever-growing volumes of data that must be catalogued, managed, and analyzed. Despite the progress made in standardizing medical images, there exist numerous challenges in the management of imaging data, which the use of an ontology helps to mitigate. The metadata fields of a DICOM file frequently represent non-explicit knowledge using ambiguous language. For instance, one of the fields in the DICOM header is labeled ‘PulseTime’. The preceding fields deal with cardiac aspects of the scan, such as ‘CardiacRepetitionTime’ and ‘ImagesPerCardiacCycle’, which may lead one to believe that ‘PulseTime’ relates to the pulse or heart rate of the patient. This is complicated by later fields that reference RF pulses but instead do so using language like ‘PulseSequence’. This makes it challenging for a user who is unfamiliar with the domain to use the data. Fully understanding the intended meaning of the data fields involves deep knowledge of the latest version of the DICOM specifications, use of a third-party website, or consultation with a domain expert.

Among the most important issues is a lack of consensus about the exact parameters that make up a specific image type, which is partly confounded by inter-machine and inter-operator variability. The Alzheimer's Disease Neuroimaging Initiative (ADNI) maintains highly detailed MRI scanner protocols for use in its clinical trials and illustrates this variability well (6). The ADNI 3 protocols define the MRI acquisition parameters for capturing a sagittal 3D fluid-attenuated inversion recovery image of a human brain in several different machines from different vendors. In a General Electric 25 MRI machine, the echo time (TE) is 119.0 ms, the repetition time (TR) is 4800.0 ms, and the inversion time (TI) is 1451 ms, while in a Siemens Magnetom Verio machine the parameters are 442 ms, 4800 ms, and 1650 ms for TE, TR, and TI respectively. Although both machines are 3 tesla MRI machines and attempting to capture the same image, their TE parameters are quite different. Moreover, even small changes in these parameters can result in radically different images and associated image types.

Broadly speaking, we currently do not have effective methods for transitioning from these elementary imaging parameters to higher semantic levels. If we borrow an analogy from biology, these imaging parameters are similar to the nucleotides of DNA where different sequences can code for the same codons and proteins. As of yet, we lack an elegant way to determine these proteins or their functions from their constituents. These factors can result in problems with interoperability when combining large sets of MRI images and the requirement to write complex and cumbersome queries to create retrospective cohorts.

Imaging Ontologies

The current work is not the first ontology in the domain of MRI images, and a handful of past studies have created MRI-related ontologies. NeuroLOG or OntoNeuroLOG is a French multi-level ontology created to integrate neurological resources from multiple academic centers and uses DOLCE as its upper level ontology (7). NeuroLOG covers a wide array of brain-centric investigation-related entities including MRI (8). A more recent MRI ontology covered MRI simulations and modeled the fundamental processes of the RF pulses that form sequences (9). Lastly, the DICOM controlled terminology is available on BioPortal and consists of every term used in the DICOM file format along with their definitions (10).

These works suffer from limitations in accessibility and usability. NeuroLOG is inaccessible through the paper’s provided links, and what is viewable through snapshots of the ontology shows missing textual and logical definitions for represented entities. NeuroLOG also uses DOLCE, which restricts its interoperability with the multitude of existing OBO Foundry ontologies that are grounded in the Basic Formal Ontology (BFO). Interoperability with OBO Foundry ontologies is an important feature that promotes reuse and prevents the creation of isolated “data siloes”. The ontology covering MRI simulations and sequences was not published in any form. The DICOM controlled terminology, although published alongside ontologies on BioPortal, has a completely flat structure, and some of its definitions are not crafted in the style preferred by the OBO Foundry. Additionally, none of these ontologies seems to cover the higher levels of abstraction that we desire in our ontology.

In the current work, we have developed the MRI ontology (MRIO) to represent MRI analyses, sequences, images, and machines using metadata from DICOM files to create axioms. We have also created methods for extracting this information from DICOM headers and automatically creating new ontology instances.

Methods

Ontology Construction

MRIO was created with the latest version of Protégé (5.5.0) (11). The HermiT (1.4.3.456) reasoner plugin was used for inference (12). Our ontology was built with certain principles in mind, such as resource identifiers, textual definitions, and openness, all of which are outlined by the Open Biological and Biomedical Ontology (OBO) Foundry (13). Following these principles, MRIO uses BFO as its upper level ontology and reuses existing ontologies like the Ontology for Biomedical Investigations (OBI) and the Information Artifact Ontology (IAO) (14-16). MRIO adds 70 new terms, most with well-constructed textual and logical definitions, to represent multiple aspects of MRI images. Around two dozen terms were reused from OBI and IAO as upper level terms or in relations.

Our ontology was constructed in both a top-down and a bottom-up approach. The entities we deemed most important in representing MRIs in an ontology are: the MRI image objective and the MRI sequences, followed by the MRI machine, the patient/evaluant, the MRI assay, and the MRI image itself. We consulted with domain experts in order to create the MRI analysis hierarchy. The most salient metadata on DICOM image files are “parameter specifications” or “acquisition parameters”, which describe RF pulse sequences. These parameters, implemented as data properties, were used in creating the axioms and computer-readable definitions of sequences. A GitHub repository containing the latest version of the ontology is available at: https://github.com/LucasSerra1/MRIO.git

Data Extraction

The scripts used in parsing MRI headers and MRI protocol files were written using the Python programming language. In essence, the scripts extract information from the DICOM headers and transform the information into instances of MRIO classes and relations. DICOM header data fields are first transformed into a spreadsheet. These fields are mapped to MRIO data properties. Numeric values are then read and associated with these data properties. The RDFLib (4.2.2) Python library was used to facilitate this transformation and automatically add graph nodes and new instances to our ontology from these mapped classes.

To create the axioms that underlie the sequence types (Figure 2), a separate script was created that extracts parameter specifications from JSON files representing years of MRI study protocols used in clinical trials conducted at the Buffalo Neuroimaging Analysis Center (BNAC). As no exact definitions for consensus sequence parameters exist in the DICOM specifications or in published literature, simple ranges were used to define sequence parameters and provide a survey of the data available. Minimum and maximum values were extracted across hundreds of entries to create the ranges that constitute the axioms of our sequence classes.

Our final output of the data extraction process was an OWL file containing 4 instances (representing a single DICOM header), 70 MRIO-specific classes, and 8 new data properties. The original data consisted of 300 text files containing 1000 entries for MRI protocols (17). This was distilled into 5 MRI pulse sequence classes in the final ontology. After modification with RDFLib, the ontology was loaded as a triplestore into the free version of GraphDB (8.9). Using SPARQL, we queried the data looking for images by their parameters (18, 19).

Results

The ensemble of these moving pieces is a pipeline that automatically loads DICOM headers and inserts them into a queryable MRI ontology created from a combination of domain expertise and parameter data extracted from clinical trial protocols. Figure 3 provides an overview of the gross structure of the ontology. As MRIO is built upon the foundations of OBI, it takes a similar approach in establishing relationships between the overall imaging process and the participants. Terms derived from OBI are in ovals while MRIO terms are in boxes. More specifically, ‘magnetic resonance imaging pulse sequence’ is defined as a type of ‘processed material’ and stands in a ‘part of’ relation to the ‘magnetic resonance imaging radiofrequency coil’, which is an OBI ‘measurement device’. As shown in Figure 3, both the MRI machine and a ‘material entity’ with the ‘magnetic resonance imaging evaluant role’ are the specified inputs of a ‘magnetic resonance imaging assay’. This ‘magnetic resonance imaging assay’ term resides under the ‘planned process’ class and has ‘magnetic resonance imaging datum’ as the specified output. This datum undergoes a ‘magnetic resonance imaging data transformation’, which in the real world partly takes the form of a Fourier transformation and results in the final ‘magnetic resonance imaging image’. The image is tied back to the sequence used and the subject of the scan using ‘is about’ relations.

Figure 1 depicts the structure of the MRI pulse sequences. Several new data properties were needed to fully represent these sequence parameters: ‘has TR’, ‘has TE’, ‘has inversion time’, ‘has flip angle’, and ‘has echo train length’. These entities were derived from BNAC MRI protocol specifications and represent settings configured on an MRI machine for the creation of an MRI image.

Figure 2 illustrates the type of query one is able to use with MRIO. With SPARQL, an investigator is able to home in on well-crafted cohorts via sequence parameters, as in this example, or via a number of other axes.

Fig. 1. Sample MRI pulse sequence class

Fig. 2. Example SPARQL query

Fig. 3. Gross and relational structure of MRIO

Discussion

Our work contributes to imaging informatics in a number of ways. The automatic creation of ontology instances mitigates the laborious task of data entry. Our system also enables precise selection of cohorts from datasets of DICOMs and facilitates discovery of potential subgroups within imaging data. MRIO provides a structured semantic representation of many of the metadata fields found in the DICOM format. To this end, MRIO improves the interpretability of data field definitions without the need for external resources and elucidates some of the implicit knowledge found within this domain. MRIO’s adherence to OBO Foundry principles also enhances interoperability with other similarly structured ontologies.

Despite these benefits, MRIO and its supporting systems are currently limited in some respects. At present, MRIO can only process single DICOM headers, which must be loaded as text files. Furthermore, once new MRI instances are loaded, the HermiT reasoning engine in Protégé takes minutes to sort individuals and infer relations. This occurs with only a handful of DICOMs loaded. We are investigating methods to speed up the reasoning so we can scale the ontology appropriately. Our ontology also only captures a small selection of the vast number of data fields found within the DICOM file standard. We would also like to more fully develop the definitions of our classes. As a final limitation, our system requires that users understand SPARQL to write their queries and extract information from the data loaded in triplestores, although our long-term plans include creating a web interface to simplify querying.

MRIO represents the beginnings of a full-fledged imaging ontology and automated analysis pipeline. There are many possibilities for future work and expanding the functionality of MRIO. With thousands of MRIs loaded from disparate data sets and institutions, it would be possible to better grasp the exact elements that make a “T1 image”. This could occur through community consensus, or MRIO could provide high-quality data for machine learning or statistical treatments of this question. In later versions of our work, the query system could be improved with a natural language processing-based query system and a more streamlined user interface that would obviate the need for users to know SPARQL.

Conclusion

MRIO is the only MRI ontology under active development. At present, MRIO enjoys a number of useful features, and these initial steps provide a proof-of-concept for a much larger analytic platform with numerous uses.

ACKNOWLEDGMENT

AD was supported by 5UL1TR001412 (NCATS). MD has received consultant fees from Claret Medical and EMD Serono, and research grant support from Novartis and Celgene.

REFERENCES

1. Plewes DB, Kucharczyk W. Physics of MRI: A primer. Journal of Magnetic Resonance Imaging. 2012;35(5):1038-54.
2. Health expenditure indicators [Internet]. 2014. Available from: https://www.oecd-ilibrary.org/content/data/data-00349-en.
3. Agarwal R, Bergey M, Sonnad S, Butowsky H, Bhargavan M, Bleshman MH. Inpatient CT and MRI utilization: trends in the academic hospital setting. Journal of the American College of Radiology: JACR. 2010;7(12):949-55.
4. Baker LC. Acquisition of MRI equipment by doctors drives up imaging use and spending. Health Affairs (Project Hope). 2010;29(12):2252-9.
5. Mildenberger P, Eichelberg M, Martin E. Introduction to the DICOM standard. European Radiology. 2002;12(4):920-7.
6. Petersen RC, Aisen PS, Beckett LA, Donohue MC, Gamst AC, Harvey DJ, et al. Alzheimer's Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010;74(3):201-9.
7. Masolo C, Borgo S, Gangemi A, Guarino N, Oltramari A. WonderWeb deliverable D18 ontology library. 2003.
8. Michel F, Gaignard A, Ahmad F, Barillot C, Batrancourt B, Dojat M, et al. Grid-wide neuroimaging data federation in the context of the NeuroLOG project. Studies in Health Technology and Informatics. 2010;159:112-23.
9. Lasbleiz J, Saint-Jalmes H, Duvauferrier R, Burgun A. Creating a magnetic resonance imaging ontology. Studies in Health Technology and Informatics. 2011;169:784-8.
10. Salvadores M, Alexander PR, Musen MA, Noy NF. BioPortal as a Dataset of Linked Biomedical Ontologies and Terminologies in RDF. Semantic Web. 2013;4(3):277-84.
11. Noy NF, Crubezy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, et al. Protégé-2000: an open-source ontology-development and knowledge-acquisition environment. AMIA Annual Symposium proceedings AMIA Symposium. 2003;2003:953-.
12. Glimm B, Horrocks I, Motik B, Stoilos G, Wang Z. HermiT: An OWL 2 Reasoner. Journal of Automated Reasoning. 2014;53(3):245-69.
13. Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology. 2007;25(11):1251-5.
14. Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, et al. The Ontology for Biomedical Investigations. PLOS ONE. 2016;11(4):e0154556.
15. Smith B, Ceusters W. Aboutness: Towards Foundations for the Information Artifact Ontology. 2015.
16. Arp R, Smith B, Spear AD. Building Ontologies with Basic Formal Ontology: The MIT Press; 2015. 248 p.
17. Antoniou G, van Harmelen F. Web Ontology Language: OWL. In: Staab S, Studer R, editors. Handbook on Ontologies. Berlin, Heidelberg: Springer Berlin Heidelberg; 2004. p. 67-92.
18. Gueting RH. GraphDB: A Data Model and Query Language for Graphs in Databases. Informatik-Bericht 155. 1994.
19. Prud’hommeaux E, Seaborne A. SPARQL Query Language for RDF. W3C Recommendation. January 2008.
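
Illustrative sketch of the data extraction steps

The Data Extraction section describes two steps: mapping DICOM header fields to MRIO data properties, and deriving minimum/maximum parameter ranges from protocol entries to define sequence classes. The Python sketch below shows one way these steps could look, using only the standard library. It is not the authors' code. The "Field: value" header layout, the field-to-property mapping, the function names, and the sample header values are assumptions made for illustration. The two protocol entries reuse the ADNI 3 FLAIR values quoted in the Introduction, and the property names are those listed in the Results.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from DICOM header keywords to the MRIO data properties
# named in the paper; the pairing itself is an illustrative guess.
FIELD_TO_PROPERTY = {
    "RepetitionTime": "has TR",
    "EchoTime": "has TE",
    "InversionTime": "has inversion time",
    "FlipAngle": "has flip angle",
    "EchoTrainLength": "has echo train length",
}

Params = Dict[str, float]
Ranges = Dict[str, Dict[str, Tuple[float, float]]]


def parse_header(text: str) -> Params:
    """Read 'Field: value' lines (assumed layout) and keep mapped numeric fields."""
    values: Params = {}
    for line in text.splitlines():
        name, sep, raw = line.partition(":")
        prop = FIELD_TO_PROPERTY.get(name.strip())
        if not sep or prop is None:
            continue  # not a key/value line, or a field we do not map
        try:
            values[prop] = float(raw.strip())
        except ValueError:
            continue  # skip non-numeric values
    return values


def build_ranges(entries: Iterable[Tuple[str, Params]]) -> Ranges:
    """Collect the (min, max) of every parameter for each sequence label."""
    ranges: Ranges = {}
    for label, params in entries:
        seq = ranges.setdefault(label, {})
        for prop, value in params.items():
            lo, hi = seq.get(prop, (value, value))
            seq[prop] = (min(lo, value), max(hi, value))
    return ranges


def matching_sequences(params: Params, ranges: Ranges) -> List[str]:
    """Return labels whose every range contains the header's value."""
    return sorted(
        label
        for label, seq in ranges.items()
        if all(p in params and lo <= params[p] <= hi for p, (lo, hi) in seq.items())
    )


# Protocol entries using the ADNI 3 FLAIR values quoted in the text (ms).
PROTOCOL_ENTRIES = [
    ("sagittal 3D FLAIR", {"has TE": 119.0, "has TR": 4800.0, "has inversion time": 1451.0}),
    ("sagittal 3D FLAIR", {"has TE": 442.0, "has TR": 4800.0, "has inversion time": 1650.0}),
]

if __name__ == "__main__":
    ranges = build_ranges(PROTOCOL_ENTRIES)
    # Hypothetical header text; values are made up for the example.
    header = "RepetitionTime: 4800\nEchoTime: 200\nInversionTime: 1500\nPatientName: X"
    print(matching_sequences(parse_header(header), ranges))
```

In an RDF-based pipeline such as the one described, the parsed property values would be written as triples on a new instance and the ranges expressed as class axioms, leaving classification to the reasoner; the plain dictionaries here only make the range logic visible.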