=Paper=
{{Paper
|id=Vol-3055/paper3
|storemode=property
|title=Ontology-based Semantic Model for Health Data Interpretation
|pdfUrl=https://ceur-ws.org/Vol-3055/paper3.pdf
|volume=Vol-3055
|authors=Christoniki Maga-Nteve,Nikos Tsolakis,Georgios Meditskos,Anastasios Karakostas,Stefanos Vrochidis,Ioannis Kompatsiaris
|dblpUrl=https://dblp.org/rec/conf/semweb/Maga-NteveTMKVK21
}}
==Ontology-based Semantic Model for Health Data Interpretation==
Ontology based semantic model for health data interpretation Christoniki Maga-Nteve1, Nikos Tsolakis1, Georgios Meditskos1,2, Anastasios Karakostas1, Stefanos Vrochidis1 and Ioannis Kompatsiaris1 1Information Technologies Institute, Centre of Research & Technology, Greece {chmaga, tsolakin, akarakos, vrochidis, ikom}@iti.gr 2School of Informatics, Aristotle University of Thessaloniki, 54124, Greece {gmed- itsk@csd.auth.gr} Abstract. New opportunities for improved personalized healthcare have emerged due to the re- cent advances in the development of modern methods which reinforce personalized early risk prediction, prevention and intervention. Using semantic techniques for data integration has be- come pivotal as it can deliver different ways to represent data, automating the process of data integration, and providing the ability to query semantically. In this paper, we propose a new semantic data model in which health information derived from Parkinson’s, Multiple Sclerosis, and Stroke (PMSS) patients is systematically analyzed to generate and improve knowledge that will be transferred to patient care in order to design and develop innovative health risk prediction and intervention tools. Furthermore, this project focuses on providing new opportunities for improved personalized healthcare and prevention that have been created by new designs and developments of innovative health risk prediction and intervention tools. A core ontology is currently being designed within the ALAMEDA project to deal with the semantic interoperability across heterogeneous datasets along with a semantic framework to concrete the generated heterogeneous data through a shared ontology. The ontology model development and the requirement elicitation will be done based on the components’ capabilities and use case requirements. The heterogeneous and dynamic data will be subjected to annotation through the development of semantic models for data sharing and usage apart from being interpretable. Keywords: Semantic interpretation, Ontology, Health-care data, Data interoperability 1 Introduction Health-care ontologies are pivotal for knowledge representation and data integration as the health data have become very complex and there is an intense need to link disorders and applicable medication along with specific individual patient attributes so as to ex- tract meaningful results. The use of ontologies facilitates easier processing of large da- tasets, while providing more effective solutions which support the way we manage health and wellbeing and the indispensable integration of knowledge and data [1]. In the healthcare domain, ontologies organize the knowledge as relations and instances to encode health records, lab results and diagnoses of patients while a specific data struc- ture is required to generate the appropriate information and solution. They can also add context to the patient’s data and provide a common framework for sharing and reuse of meaningful clinical outcomes. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Semantic models are able to describe the health-care concepts and the relationships between them by improving their effectiveness and efficiency. They have profited healthcare communities with methods based on multiple ontologies, assuring the data quality in a heterogeneous environment and organizing them so that it can be inter- preted by computers without human intervention. Using semantic techniques for data integration has gained a lot of ground as they provide automated and multiple concepts to represent and process the data, and allow for the semantically query [2],[1]. With the goal of speeding up the modelling development process, a variety of possible knowledge resources can be reused. This approach has given different benefits to the developers, however, the existing methods and tools are not enough to guarantee a suc- cessful model. So, all these resources need to be evaluated in regard to their context- oriented usability and to adopt the requirements derived from the ontology needs. In this paper, we present an Ontology based semantic model for health data interpre- tation, where the proposed model is able to harmonize information from multiple sources to provide context awareness. The use of the ontologies and the semantic web technologies will allow us to provide a conceptual model, supporting interoperability and flexibility. The ontologies will also address the semantic interoperability issues, management and integration of information models, as it will define the concepts and their relationships within the ALAMEDA’s domain. The project’s innovations will uti- lize new machine learning models, built upon lifestyle retrospective data as well as new streams of patient data that involve the monitoring of everyday activities, such as sleep behavior. The success of such applications will provide clinicians with the opportunity to modify interventions based on personalized data recordings. The main goals of this study are to implement the semantic models development for the annotation of the dif- ferent modalities and to provide innovative solutions for context-aware data aggrega- tion. For the purpose of this work, a health-care ontology is being designed to deal with health-care information and IoT devices and services. The four steps based on Bravo, Reyes, Ortiz [3] that we follow to identify the mod- elling requirements are: a) the ontology requirements specification [4],where the main purpose is to define the requirements that the ontology should cover, b) the ontology design and c) construction, which are defined as an ordered series of phases that specify the procedures used in the engineering of an ontology or ontology system [3] and d) the ontology evaluation which is ensured in view of quality and correctness perspec- tives[5].The remainder of this paper is organized as follows. Section 2 summarizes the state-of-the-art on methodological reuse of existing technologies. Section 3 describes in detail how we draw out the requirements and the guidelines we follow based on the ALAMEDA needs. Sections 4 position our conceptual model and finally Section 5 con- cludes and presents some of our future work. 2 Related Work The development of semantic web technologies provides a number of ontology-based approaches in different domains. Within the Semantic Web community, it is strongly encouraged to reuse existing ontology models. The main domains that can be covered are: Sensor Data, Context, Activity Recognition, Event and Healthcare ontologies. Thus, we can build on existing resources and expand them, based on the specific needs of the ALAMEDA project, where is necessary. There are several state-of-the-art ontol- ogies that can be utilized for modelling ALAMEDA’s domains. In particular, some of the most commonly used ontologies, fused with healthcare interoperability standards, are the Fast Healthcare Interoperability Resources-HL7 (FHIR-HL7) [6] and the Sys- tematized Nomenclature of Medicine—Clinical Terms (SNOMED CT) [7]. The FHIR- HL7 describes the Resource Description Framework (RDF) representation of FHIR re- sources, while the SNOMED CT is a comprehensive medical terminology used for standardizing the storage, retrieval, and exchange of electronic health data and for the representation of medical concepts respectively. Additionally, the International Classi- fication of Diseases and Related Health Problems (ICD-10) [8] ontology aims to create a knowledge base for use in the ICD coding system is it also frequently used. Other related ontologies are the PDON, a Parkinson’s disease ontology for representation and modeling of the disease knowledge domain [9] and MSO, a multiple sclerosis ontology [10] integrated with Basic Formal Ontology [11]. While, much progress has been made in developing semantic models for healthcare interpretation, there is great potential for developing models about the three diseases that ALAMEDA addresses. The ALAMEDA ontology is concerned with creating a model that responds to the needs of patients with Stroke, MS and PD, providing semantic interoperability with respect to the personalized use cases of the project. The Semantic Sensor Network ontology (SSN) [12], is used for the representation of sensors and sensor-like devices. The base of SSN is the Stimulus-Sensor-Observa- tion pattern, a cornerstone for heavy-weight ontologies for the Semantic Sensor Web applications. The newest version of SSN, the Sensor, Observation, Sample, and Actu- ator (SOSA) allows the representation of sensors as a light-weight model [13]. The Semantic Smart Home [14] ontology captures knowledge relevant to activities of daily living, location, timing and people. The Event Model F is a formal model of events designed to facilitate interoperability in distributed event-based systems [15] and the Event Ontology [16] deals with the notion of reified events. The Dem@care Project contributes to the timely diagnosis, assessment, maintenance and promotion of self- independence of people with dementia [17]. The aforementioned ontologies can cover a subset of the domains involved in Healthcare systems and applications. Our proposed ontology seeks to respond to every aspect by reusing resources, comprised of modules for representing every need and can be easily adjustable and reusable. 3 Modelling Requirements Ontologies can be defined as an explicit specification of a conceptualization [18], where a logical formulation of complex problems is provided. For a solid design of an ontol- ogy, it is essential to define a variety of stages. Some of them are the reuse of existing ontologies, the class definition and the relations between them. Initially, the domain and the scope of the ontology have to be determined based on their intended uses. The existing ontologies can be integrated into the ontology framework so that the new on- tology will be developed from current dictionaries. The next step is to define the classes which the ontology will consist of and to allocate them in a hierarchical mode. Also relevant properties that characterize the relationships between the classes have to be introduced along with their individual instances. This study will provide the annotation layer to endorse ALAMEDA with situational awareness, extracting and harnessing deep intelligence through the aggregation of het- erogeneous information and knowledge. It will also allow the comprehensive represen- tation and the harmonization between the IoT devises, while it will offer innovative solutions development for context-aware data aggregation, enabling the semantic fed- eration of diverse infrastructures and services for capturing information. In this context, we are developing a health-care and patient-oriented ontology to per- sonalize the medical and social knowledge available. The ontology will provide knowledge structures that are maintainable whilst they will be used to support clinicians in multiple tasks. The objective is to enable sensors and data coming from multiple distributed information sources to be semantically accessible and discoverable, foster- ing the development of data processing applications that effectively utilize and combine multiple data sources and devices to deliver innovative services. 4 ALAMEDA’s Conceptual Model In ALAMEDA project, we will design a consensual and conceptual model able: a) to represent information that is made available via the questionnaires and the monitoring modules, b) to ensure the semantic interoperability of the information exchanged be- tween the individual ALAMEDA systems components, and c) to achieve the semantic annotation of the generated data and to further extend it with domain knowledge perti- nent to the ALAMEDA use cases. It is crucial for the ALAMEDA model to represent information related to the domains of the project such as physiological and cognitive assessment, clinical data, reported difficulties etc. Within the framework of the ontol- ogy design, we reused the Even Model F ontology and the Event Ontology, so as to construct the ALAMEDA Event module. The Event Ontology models the events, the environment and the changes that may happen and answers a series of critical questions about the actions, the places, the time and the person of an event. During the Sensors Ontology construction, we reused the SSN Ontology, which as it has been already men- tioned is a fundamental ontology in the sensor representation domain. 4.1 Methodology Overview There are several methodologies for ontology engineering to formalise and design an ontology. In this study, in order to develop the ALAMEDA’s Model, we used the NeOn Methodology, which is based on a set of 9 scenarios and is well-documented and highly adaptable. While building the ALAMEDA Ontology, different phases have been deter- mined. The first phase refers to the ontology requirements and the retrieval of the on- tology requirements specification document. The role of domain experts is very im- portant at this stage, since they define the use cases and propose optimal matching to ontology requirements as the model is ongoing. The second phase is related to the de- velopment of ontology at a primary level, where it will be defined which existing on- tologies will be used, along with the information input. The third phase contains the implementation and enrichment of the ontology, using the OWL2 [19] language for knowledge representation, which provides Properties, Classes and Individuals. 4.2 Ontology Requirements Specification Document The identification of the purpose of ALAMEDA Ontology, the scenarios that must be defined, the intended uses and end-users are the main elements of the specification of requirements. As proposed in the NeOn Methodology, the methodological guidelines were created, based on the state-of-the-art ontology development techniques and repre- sented in the Ontology Requirements Specification Document (ORSD). The require- ments are defined with respect to the Competency Questions (CQs), which are groups of questions that play a significant role in the ontology implementation, as they specify what knowledge has to be entailed in. A critical attribute of the CQs is that they define the functional requirements of the ontology. The ORSD is the output of the ontology requirements specification and provides information regarding the a) purpose which is the main general goal and function that the ontology should fulfill, b) scope which re- fers to the coverage and the degree of details that the ontology should have, c) imple- mentation language which is the formal language that will be used for the ontology design, d) intended end-users and uses and e) the non-functional and functional require- ments. In ALAMEDA project, there are clinical aims of the use cases that are critical components for the ontology framework. Parkinson disease is a common neurodegen- erative disorder and its meaningful worsening of global status or of individual motor is a specific use case with inclusion criteria such as advanced Parkinson, age and more. Relapse risk prediction in Multiple Sclerosis in young to middle age person is a second important use case. The Stroke use case refers to patients who suffered from Stroke the last month and is monitored for neuro-rehabilitation. The proposed Ontology should be constructed with respect to these use cases so as to provide a shared vocabulary for the communication and exchange of information among the different system components, to represent, store and retrieve patients’ profile data, sensors etc and to represent and query data made available by other analysis components of the ALAMEDA system. 4.3 ALAMEDA Ontology The ORSD is a key factor for modelling the classes, properties and instances of the ontology. Moreover, we used further input and feedback made available during the stage of the ontology design by the clinical and technical partners involved in the rele- vant project. Another element for the current version of the ontology was the standards and best practices in the Semantic Web community which are available. The imple- mentation of the ontology was done in Protégé [20], a tool for modelling ontologies that provides us with the ability to construct the appropriate modules. ALAMEDA Ontology Modules The ALAMEDA Ontology model consists of six modules and a main ontology, which acts as the parent of all the hierarchical relations. The Model Ontology represents all the modules attached to ALAMEDA, where Home, Person, Lab, Event, Time, Sensors modules are some of the main classes. In Figure 1, a high-level figure of the model can be seen. Fig. 1. A high-level figure of the ALAMEDA model Home Ontology formalizes information relevant to behavioural interpretation and re- ported difficulties in the home environment. An example is the class ReportedDifficul- ties which is used to describe information about the problems (e.g. difficulty in exer- cising or bad mood etc) that a patient may face in the home environment. Lab Ontology formalizes the types of information relevant to the tests, assessments, patient’s clinical & experimental records in the lab environment. The class Domain is used to provide information relevant to the specified domains, in order to describe the different types of clinical tests and their results. The class MeasuredData indicates the data that is essential to be shared during a task, while the class Task represents the possible types of tasks involved in the ALAMEDA. Finally, the class ClinicalAssessment is utilized so as to define the clinical characteristics that are collected during the clinical and med- ical phases taking place in the lab environment. Person Ontology: refers to patient’s, clinician’s and caregiver’s socialdemographic data. It consists of 5 classes that display person, disease, gender, educational level and language. Event Ontology provides in- formation relevant to the entities and the activities that take place in order to fulfill ALAMEDA purposes. The class Entity describes all the physical entities and consists of 2 subclasses Person and Place. The class Activity represents any activity the patient may be involved, while the class MeasurementPattern enacts the domain of the meas- urements that take place. Event Model F played a crucial role in the development of this module by using the participation pattern and making clear the roles and events in our model, while the Event Ontology describes the time, the agents etc of the ontology. Sensors Ontology describes information concerning the type and properties of the sen- sors used which are divided into two major groups: a) FixedSensor and b) Wearable- Sensor. The class FixedSensor refers to sleep or location monitoring and the class WearableSensor refers to sensors like smartwatches or belt sensors that will be used. During the Sensors Ontology construction, we reused the SSN Ontology. The concepts introduced by SSN are very important in healthcare sensing environment. Main con- cepts that being reused are the class Procedure, which provides a way to specify obser- vation and has an input and an output. Those input and output information is repre- sented in the classes Input and Output. Other critical components of the SSN Ontology that being reused are the Observation, the Platform. Time Ontology presents the time, duration and information about the tasks of the ALAMEDA. It consists of classes like DateTimeDescription, DayOfWeek etc that provide specific information about date, time, day, duration and their values. Table 1 presents some of the most important ob- ject/data properties of our model, their definition, the module/class they exist in and one of their relations. Table 1. ALAMEDA Modules Classes, Object/Data Properties and their use. Module Class/Subclass Class Name Definition Type Lab EDSS MS isFor Indicates the disease that Object a test is used for Property Event Activity Person hasAgent Indicates which person Object is the performer of an Property activity Patient ReportedDifficulties Patient forPatient Indicates which patient Object is responsible for each Property self-assessment Lab MocaTest Type Indicates the type of the DataProperty test Home ADLSummary Date Indicates the date of the DataProperty ADL activity In Figure 2, an example of upper-level vocabulary for modelling a clinical test can be observed [21]. The class CVLT-II represents a measure of verbal learning and memory that is in Domain III – Mental and Cognitive Ability of the ALAMEDA Clinical Ethics. It is essentially a test for patients that suffer from MS. It relates the class MS, which is a subclass of the class Disease, via the object property isFor. In this way, we represent the tests that take place in the ALAMEDA Project and their relationship with a specific disease. The class CVLT-II has some data properties that provide information about the type of the test and the score that the patient has in each respective test. In this example, data property score is an integer, and provides information about the score and data property type:PM, which is an integer, provides information about the type of the test as dictated by the experts. Fig. 2. An upper-level vocabulary for modelling a clinical test 4.4 Οntology Evaluation In this section, we present the evaluation of the ontology, considering the quality, the structure and the consistency. The metrics of the current version of the ALAMEDA ontology, as provided by the ontology metrics view in Protégé can be seen in Table 2. The number of the classes, axioms, object properties, data properties etc, are considered base metrics and provide information regarding the quantity of the ontology compo- nents. There are 253 Classes, 67 Object properties and 42 Data properties. Table 2. Base metrics. Base Metrics Value Class count 253 Object property count 67 Data property count 42 SubClassOf axioms count 268 Disjoint classes axioms count 2 Inverse object properties axioms count 1 As the model is ongoing, its evaluation will be done by using some of the most common tools and methodologies. OntoClean is a methodology which validates the taxonomic relationships from the ontological adequacy standpoint [22]. It provides characteriza- tion of the basic elements of the ontology by using ontological notion. Furthermore, one of the most important tools for evaluating the consistency of the model is OOPS! (OntOlogy Pitfall Scanner), a tool that detects pitfalls and their consequences in the quality of the ontology and provides modifications and improvements of the pitfalls [23]. This system provides pitfalls of different significance and categorize them in crit- ical, important and minor pitfalls. Critical are the ones that is essential to be corrected, important are the ones that is not crucial, but are important to be corrected and minor pitfalls are the ones that are not crucial by any means, but their correction will provide quality to the ontology. The structure of the ontology is evaluated by using OntoMet- rics, an online framework that provides information about the base and schema metrics of a semantic model [24]. Base metrics are the simple metrics, like the counting of classes, objects, etc, while schema metrics evaluate the design of the ontology. Some of the most common metrics that are used for evaluating the ontology using OntoMet- rics are attribute richness, inheritance richness, relationship richness, axiom/class ration and class/relation ratio. Further steps, such as ontology population will enrich our model and will allow us to design and develop innovative health risk prediction and intervention tools. Classification tasks such as diagnosis prediction with taxonomical knowledge found in the ongoing ontology will be combined in order to support human- understandable explanations of the analysis. 5 Conclusion One of the most challenging problems in healthcare systems is the interoperability be- tween heterogeneous data, where a medium to share knowledge and exchange infor- mation both across people and services is essential. The Semantic Web services provide interoperability standards and vocabularies that can facilitate access to the necessary data in a secure and safe manner. This paper describes a healthcare ontology-based model which interoperates be- tween the systems and it is able to facilitate knowledge sharing in a complex environ- ment. It is also able to manage and integrate patient specific data at home and lab envi- ronment with knowledge specific for this kind of patient. In addition, it presents the significance of this ontology, and provides users with the opportunity to gain insight and knowledge from their data and the criteria expected to be available through it. Currently, the ontology development is ongoing, the requirement elicitation will be done based on components capabilities and use case requirements, while the heteroge- neous and dynamic data will be subjected to annotation through the development of semantic models for data sharing and usage, besides being interpretable. An imminent challenge that needs to be addressed is to finalize the ORSD and the most appropriate semantic model for the acquired data that fits our case. In the future, we will include more restrictions and different types of properties on the interactions between interven- tion classes so as to implement a more efficient and accurate version of the ontology. Acknowledgements. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No GA101017558. References 1. Hammad R, Barhoush M, Abed-Alguni BH. A Semantic-Based Approach for Man- aging Healthcare Big Data: A Survey. J Healthc Eng. 2020. 2. H. Zhang, Q. Li, G. Yi et al., “An ontology-guided semantic data integration frame- work to support integrative data analysis of cancer survival,” BMC Medical Infor- matics and Decision Making, vol. 18, no. 2, p. 41, 2018. 3. Bravo, Maricela, Hoyos Reyes, Luis Fernando, & Reyes Ortiz, José A. Methodol- ogy for ontology design and construction (2019). 4. Suárez-Figueroa M.C., Gómez-Pérez A., Villazón-Terrazas B. How to Write & Use the Ontology Requirements Specification Document. Meersman R., Dillon T., Her- rero P. On the Move to Meaningful Internet Systems, Lecture Notes in Computer Science, vol 5871. Springer, (2009). 5. Hlomani, H. and D. Stacey. “Approaches, methods, metrics, measures, and subjec- tivity in ontology evaluation : A survey.” (2014) 6. FHIR Homepage, http://hl7.org/fhir/ 7. SNOMED-CT, https://bioportal.bioontology.org/ontologies/SNOMEDCT 8. ICD10, https://bioportal.bioontology.org/ontologies/ICD10 9. E. Younesi, A. Malhotra, M. Gündel, P. Scordis, A. Kodamullil, M. Page, B. Mül- ler, S. Springstubbe, U. Wüllner, D. Scheller, M. Hofmann-Apitius.PDON: Parkin- son's disease ontology for representation and modeling of the Parkinson's disease knowledge domain (2015). 10. https://bioportal.bioontology.org/ontologies/MSO 11. Basic Formal Ontology, https://basic-formal-ontology.org/ 12. Compton M. , Barnaghi P., Bermudez L., García Castro R., Corcho O., Cox S., Graybeal J., Hauswirth M., Henson C., Herzog A., Huang V.,Janowicz K., Kelsey D., Phuoc D., Lefort L., Leggieri M., Neuhaus H., Nikolov A., Page K., Taylor K. (2012). The SSN Ontology of the W3C Semantic Sensor Network Incubator Group. Web Semantics: Science, Services and Agents on the World Wide Web. 13. K. Janowicz, M. Compton, “The Stimulus-Sensor-Observation Ontology Design- Pattern and its Integration into the Semantic Sensor Network Ontology”, Interna- tional Workshop on Semantic Sensor Networks, vol. 668, CEUR-WS, 2010. 14. L. Chen, D. C. Nugent, H. Wang, “A Knowledge-Driven Approach to Activity Recognition in Smart Homes”, IEEE Trans. Knowl. Data Eng. 24(6): 961-974, 2012. 15. A. Scherp, T. Franz, C. Saathoff, and S. Staab, “A core ontology on events for rep- resenting occurrences in the real world”, In Multimedia Tools and Applications, 58(2):293–331, 2012. 16. http://motools.sourceforge.net/event/event.html. 17. http://demcare.eu/ 18. Grûber, T.: A translation approach to portable ontology specification, Knowledge Acquisition 5(2) (1993) 199- 220 19. http://www.w3.org/TR/owl2-overview/ 20. https://protege.stanford.edu/ 21. Falco, R., Gangemi, A., Peroni, S., Shotton, D., & Vitali, F.,2014. Modelling OWL ontologies with Graffoo. European Semantic Web Conference (320-325). 22. N.Guarino, C. Welty. “An Overview of OntoClean”. (2004) 23. http://oops.linkeddata.es/ 24. https://ontometrics.informatik.uni-rostock.de/ontologymetrics/