=Paper= {{Paper |id=Vol-2849/paper-28 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2849/paper-28.pdf |volume=Vol-2849 |dblpUrl=https://dblp.org/rec/conf/swat4ls/SloepKSF19 }} ==None== https://ceur-ws.org/Vol-2849/paper-28.pdf
A modular approach to knowledge graphs and FAIR data in healthcare
Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




Matthijs Sloep [0000-0003-3602-1885], Petros Kalendralis, Johan van Soest, Rianne Fijten [0000-0002-1964-6317]
Department of Radiation Oncology (MAASTRO), GROW school for Oncology and Developmental Biology, Maastricht University Medical Centre+, Maastricht, The Netherlands
matthijs.sloep@maastro.nl




                                                                                                                                                       Keywords: FAIR data, Linked Data, Modular, knowledge graphs


                                                                                                                                                 1     Introduction

In healthcare, and more specifically in cancer treatment, data sharing is essential yet
difficult. One in five people diagnosed with cancer has a rare type of cancer, which means
considerable time is needed to collect sufficient data for research. Combining data
from multiple centres is therefore vital. Unfortunately, linking this data is not
straightforward. Healthcare centres store their data in various ways, owing to, for
instance, differences in treatment protocols and clinical systems. This means that different
variables and annotations are used. Consequently, before we can solve any medical
problems, we first need to solve this data integration challenge.

The FAIR (Findable, Accessible, Interoperable and Reusable) principles [1] facilitate
data-driven research and integration. Here we describe a semantic solution for
applying the FAIR principles in a clinical use case, and we address the challenges we
faced and the practical solutions we implemented. Our use case is
proTRAIT, a nationwide proton beam therapy (PBT) data infrastructure which
combines data from the various proton centres in the Netherlands into a centralized
database.

Our approach to the FAIRification process is akin to the steps described by the GO
FAIR initiative: collect the data, analyse the data, define a semantic model for the data,
link the data, define the metadata, and finally deploy a resource for the data. We
started by analysing what data is collected at the treatment centres and is relevant
for medical research. Radiotherapists from the Dutch national proton therapy platform
listed these specific clinical and treatment items. We then manually annotated all the
different items and created a knowledge graph to interconnect them. The list items
were defined with existing and new ontology classes using the Radiation Oncology
Ontology [2]. The non-FAIR relational data from the various centres will be converted
into triples using the knowledge graphs. Essentially, this process creates findable,
accessible and linked semantic data. Metadata is collected and described in the dataset,
and the use of ontologies and unique, persistent identifiers ensures
interoperability. Finally, reusability will be ensured by hosting the data on a
clinical data management tool.
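
The conversion of relational records into triples can be illustrated with a minimal sketch. It is not the proTRAIT pipeline: the namespaces, the column names (`patient_id`, `tumour_site`, `prescribed_dose_gy`) and the predicate IRIs are hypothetical placeholders, and a real mapping would use terms from the Radiation Oncology Ontology rather than `example.org` identifiers. The sketch only shows the general idea of giving each record a persistent identifier and emitting one triple per annotated column.

```python
from urllib.parse import quote

# Hypothetical namespaces, used for illustration only.
DATA = "http://example.org/data/"
VOCAB = "http://example.org/vocab/"

# Hypothetical mapping from relational column names to predicate IRIs.
COLUMN_TO_PREDICATE = {
    "tumour_site": VOCAB + "hasTumourSite",
    "prescribed_dose_gy": VOCAB + "hasPrescribedDoseGy",
}


def row_to_triples(row, id_column="patient_id"):
    """Turn one relational row (a dict) into (subject, predicate, object) triples."""
    # The subject IRI is built from the record identifier.
    subject = DATA + "patient/" + quote(str(row[id_column]), safe="")
    triples = []
    for column, predicate in COLUMN_TO_PREDICATE.items():
        value = row.get(column)
        if value is None:
            continue  # skip missing values instead of emitting empty literals
        triples.append((subject, predicate, str(value)))
    return triples


def to_ntriples(triples):
    """Serialise triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        escaped = (
            o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


row = {"patient_id": "P001", "tumour_site": "lung", "prescribed_dose_gy": 60}
triples = row_to_triples(row)
print(to_ntriples(triples))
```

All values are written as plain string literals here to keep the sketch short; a production mapping would normally type numeric values and link coded values to ontology classes.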
Our approach means that a lot of work goes into creating and maintaining the
knowledge graphs. Each cancer type requires a separate, manually created knowledge
graph, and each list and graph consists of hundreds of items. To facilitate this process,
we made full use of the considerable overlap between the respective lists: by creating
separate Turtle files for small subsets of items, we leveraged the modular characteristics
of knowledge graphs. We designed these files so that they can be easily
reused. Apart from the technical composition of the code, another challenge was to
arrange these items in subsets that are both logical and practical, so that
selecting the necessary subsets is as easy as possible. For each cancer type
we selected the relevant Turtle files, created more where necessary, and compiled a tailored,
comprehensive graph. Another advantage of this modular approach is that we can
easily adapt our graphs to changes in the clinic. For instance, when new
items prove relevant to a research question, we can easily add more variables.
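
The modular composition described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' tooling: the module names, their Turtle contents and the per-cancer-type selections are all hypothetical, and the modules are held as in-memory strings where a real setup would read them from separate `.ttl` files. The sketch also assumes that every module binds a given prefix to the same namespace, which is what makes it safe to emit each `@prefix` line only once.

```python
PREFIX = "@prefix ex: <http://example.org/vocab/> .\n"

# Hypothetical Turtle modules, each covering a small subset of items.
MODULES = {
    "patient": PREFIX + "ex:Patient ex:hasItem ex:Age .\n",
    "treatment": PREFIX + "ex:Treatment ex:hasItem ex:PrescribedDose .\n",
    "lung_function": PREFIX + "ex:Patient ex:hasItem ex:LungFunction .\n",
    "swallowing": PREFIX + "ex:Patient ex:hasItem ex:Swallowing .\n",
}

# Hypothetical selection of modules for each cancer type.
CANCER_TYPE_MODULES = {
    "lung": ["patient", "treatment", "lung_function"],
    "head_neck": ["patient", "treatment", "swallowing"],
}


def compose_graph(cancer_type):
    """Concatenate the selected modules, emitting each @prefix line once."""
    prefixes, body = [], []
    for name in CANCER_TYPE_MODULES[cancer_type]:
        for line in MODULES[name].splitlines():
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("@prefix"):
                if stripped not in prefixes:
                    prefixes.append(stripped)
            else:
                body.append(stripped)
    return "\n".join(prefixes + [""] + body) + "\n"


def shared_modules(type_a, type_b):
    """List the modules reused by both cancer types, in sorted order."""
    return sorted(set(CANCER_TYPE_MODULES[type_a]) & set(CANCER_TYPE_MODULES[type_b]))


print(compose_graph("lung"))
print(shared_modules("lung", "head_neck"))
```

Adding a variable for a new research question then amounts to writing one more small module and adding its name to the relevant selection, leaving the shared modules untouched.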

       References
        1. Wilkinson, MD, Dumontier, M, Aalbersberg, IJJ, Appleton, G, Axton, M, Baak, A,
           Blomberg, N, Boiten, JW, da Silva Santos, LB, Bourne, PE, Bouwman, J, Brookes, AJ,
           Clark, T, Crosas, M, Dillo, I, Dumon, O, Edmunds, S, Evelo, CT, Finkers, R,
           Gonzalez-Beltran, A, Gray, AJG, Groth, P, Goble, C, Grethe, JS, Heringa, J, 't Hoen,
           PAC, Hooft, R, Kuhn, T, Kok, R, Kok, J, Lusher, SJ, Martone, ME, Mons, A, Packer,
           AL, Persson, B, Rocca-Serra, P, Roos, M, van Schaik, R, Sansone, SA, Schultes, E,
           Sengstag, T, Slater, T, Strawn, G, Swertz, MA, Thompson, M, Van Der Lei, J, Van
           Mulligen, E, Velterop, J, Waagmeester, A, Wittenburg, P, Wolstencroft, K, Zhao, J &
           Mons, B (2016) The FAIR Guiding Principles for scientific data management and
           stewardship. Sci Data, 3, Article 160018, 10.1038/sdata.2016.18

        2. Traverso A, van Soest J, Wee L, Dekker A. (2018) The Radiation Oncology Ontology
           (ROO): publishing linked data in radiation oncology using semantic web and ontology
           techniques. Med Phys. 10.1002/mp.12879.