<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Building a Semantic Repository for Outpatient Sheets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maria Nisheva</string-name>
          <email>marian@fmi.uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hristo Georgiev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Pavlov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski</institution>
          ,
          <addr-line>5 James Bourchier Blvd., 1164 Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Mathematics and Informatics, Bulgarian Academy of Sciences</institution>
        </aff>
      </contrib-group>
      <fpage>30</fpage>
      <lpage>37</lpage>
      <abstract>
        <p>The paper analyzes some issues in the area of semantic interoperability of health information systems and, in particular, the issues related to providing semantic interoperability of health data. The design principles and some implementation details of a semantic repository for outpatient sheets are discussed in the context of the suggested ideas.</p>
      </abstract>
      <kwd-group>
        <kwd>eHealth</kwd>
        <kwd>semantic data models</kwd>
        <kwd>semantic interoperability of health data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The rapid development and use of various health information systems and networks raises the issue of semantic interoperability between heterogeneous health informatics applications. Semantic interoperability may be characterized as the capability of different software systems to share information and to have that information properly interpreted by the receiving system in the same sense as intended by the creators or maintainers of the transmitting system. It involves:</p>
      <list list-type="bullet">
        <list-item>
          <p>processing the shared information in all systems so that it is consistent with its intended meaning;</p>
        </list-item>
        <list-item>
          <p>encoding queries and presenting information so that they conform to the intended meaning regardless of the source of information.</p>
        </list-item>
      </list>
      <p>Standardization and the use of semantic technologies are regarded as the most effective instruments for providing and maintaining interoperability in information systems, especially in healthcare information systems. Semantic technologies, or Semantic Web technologies, such as Linked Data, the Resource Description Framework (RDF/RDFS), SPARQL and different kinds of domain ontologies, are increasingly used in the health informatics community to meet knowledge integration and semantic interoperability needs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In particular, semantic repositories can be used to achieve various goals, such as:</p>
      <list list-type="bullet">
        <list-item>
          <p>the ability to manage large volumes of heterogeneous data;</p>
        </list-item>
        <list-item>
          <p>significant analytical power based on the ability to interlink long chains of evidence;</p>
        </list-item>
        <list-item>
          <p>effective data interoperability.</p>
        </list-item>
      </list>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], “a repository (also referred to as a data repository or
digital data repository) is a searchable and queryable interfacing entity that is
able to store, manage, maintain, and curate data/digital objects”. A repository is
a managed location where digital data objects are registered, permanently stored,
made accessible and retrievable, and curated [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Repositories preserve, manage,
and provide access to many types of digital resources in a variety of
formats. Resources in online data repositories are curated to support their search,
discovery, and reuse.
      </p>
      <p>This paper presents some results achieved in the development of a
pilot version of a semantic repository for clinical data received in the formats used by
the information system of the Bulgarian National Health Insurance Fund (NHIF).</p>
    </sec>
    <sec id="sec-1a">
      <title>Key features of a semantic repository for clinical data</title>
      <p>Semantic repositories are software systems similar to database management
systems (DBMSs). They allow the storage, retrieval and management of structured
or semi-structured data. The main differences between semantic repositories and
traditional DBMSs can be summarized as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>the data in semantic repositories are represented in RDF, and ontologies are
used as data schemas that allow for automatic analysis and formal reasoning;</p>
        </list-item>
        <list-item>
          <p>the physical data model is flexible and schema-independent, which simplifies the
integration of additional knowledge and schemas, in particular the integration of
relevant subject ontologies.</p>
        </list-item>
      </list>
      <p>The minimum functionality of a software system that plays the role of a
semantic repository of clinical data should include:</p>
      <list list-type="bullet">
        <list-item>
          <p>reading XML documents, validating them and converting them into RDF documents
in different serialization formats;</p>
        </list-item>
        <list-item>
          <p>loading RDF documents into the semantic repository;</p>
        </list-item>
        <list-item>
          <p>retrieving particular documents specified by flexible queries;</p>
        </list-item>
        <list-item>
          <p>providing an interface for defining and executing SPARQL queries against the
repository;</p>
        </list-item>
        <list-item>
          <p>adding a single document or a set of documents to the semantic repository.
Document addition should be reliable over time: an unplanned interruption of
system operation should not cause loss of information;</p>
        </list-item>
        <list-item>
          <p>editing existing documents and loading their modified versions into the repository;</p>
        </list-item>
        <list-item>
          <p>retrieving the entire document set so that it can be easily integrated with different
external systems using various RDF formats.</p>
        </list-item>
      </list>
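      <p>The durability requirement for document addition can be illustrated, for file-based storage such as the system's Data Store folder, with the common pattern of writing to a temporary file and then renaming it. The following Python sketch is an assumption about how this could be done, not the authors' implementation; the function name atomic_write_text and the file names are hypothetical.</p>

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, content: str) -> None:
    """Write `content` to `target` so that an interruption leaves either the
    old file or the complete new file on disk, never a partial one."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temporary file is created in the target directory so that the final
    # rename stays on the same file system.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, target)  # atomic rename on POSIX file systems
    except BaseException:
        # On any failure, remove the partial temporary file; target is untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


# Example: writing the same (hypothetical) document twice keeps only the last version.
store = Path(tempfile.mkdtemp()) / "data_store"
atomic_write_text(store / "doctor_0001.nt", "first version\n")
atomic_write_text(store / "doctor_0001.nt", "second version\n")
print((store / "doctor_0001.nt").read_text(encoding="utf-8"), end="")
```

      <p>Because os.replace swaps in the new file with a single rename, a reader or a restarted process sees either the previous version of the document or the complete new one. Adding a set of documents can call the function once per file; making a whole batch atomic would need an additional mechanism, for example a manifest file that is written last.</p>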
      <p>In addition to meeting these general requirements, a semantic repository of
clinical data oriented to real-world practical applications should also provide:</p>
      <list list-type="bullet">
        <list-item>
          <p>an interface to the corresponding core information system that preserves the
ability to handle documents in real time;</p>
        </list-item>
        <list-item>
          <p>ways to connect the system to other (external) systems, in particular to
graph or relational databases;</p>
        </list-item>
        <list-item>
          <p>additional systems that provide means of visualizing the documents used
by the core system.</p>
        </list-item>
      </list>
      <p>To fully exploit the capabilities of a semantic repository, and in particular
a semantic repository for electronic patient records, it should include multiple
domain ontologies of various types. First, appropriate subject ontologies
describing socially significant diseases, their diagnosis, the planning of appropriate
therapy, etc. should be selected or, in some cases, developed especially for
this purpose. The inclusion of such ontologies enables the development of
semantically interoperable information systems, decision support systems, data
mining systems, and other types of intelligent software systems in the healthcare
domain.</p>
    </sec>
    <sec id="sec-2">
      <title>System design</title>
      <p>As part of the work on the more general task of defining and analyzing key
requirements for clinical data processing and data exchange systems, a project
has been carried out and a pilot version of a semantic repository for outpatient
sheets has been implemented that covers all of the functionalities mentioned above.
The project addresses the following specific tasks:</p>
      <list list-type="bullet">
        <list-item>
          <p>providing a convenient application interface that enables working
with documents in real time;</p>
        </list-item>
        <list-item>
          <p>determining a technology for connecting the system with other (external) systems,
such as graph or relational databases;</p>
        </list-item>
        <list-item>
          <p>developing an additional system that provides means for visualizing the
documents the main system works with.</p>
        </list-item>
      </list>
      <p>The structure of the repository management system is shown in Fig. 1. It
consists of the following packages:</p>
      <list list-type="bullet">
        <list-item>
          <p>The Commands package is the main entry point for working with the system. It
implements the commands that the system can execute from the command line.</p>
        </list-item>
        <list-item>
          <p>The Converter package implements a library for working with files in
XML, XSD and RDF formats, defining methods for transformations between XML
files and sets of RDF triples. The library also supports operations that retrieve
subsets of triples of an RDF graph. The structures in this package are used by both
the Commands and the Server packages.</p>
        </list-item>
        <list-item>
          <p>The Server package contains two main parts: an implementation of a REST API
server, which defines the operations that can be performed on the content of the
semantic repository, and a number of static files containing the Swagger documentation
of the API server. Together they provide a convenient user interface for manipulating
the stored resources.</p>
        </list-item>
        <list-item>
          <p>The Settings package contains a description of the system settings.</p>
        </list-item>
        <list-item>
          <p>The Data Store object is a folder containing the data files stored by the system.</p>
        </list-item>
      </list>
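      <p>The Commands package is described only as the command-line entry point of the system. A minimal sketch of such an entry point, built with Python's standard argparse module, is shown below; the program name repo, the subcommands convert, load and query, and their arguments are hypothetical, and the handlers only describe the requested action instead of calling into the Converter package.</p>

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="repo", description="Semantic repository for outpatient sheets (sketch)"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    convert = sub.add_parser("convert", help="convert an XML outpatient sheet to RDF")
    convert.add_argument("xml_file")
    convert.add_argument("--format", choices=["nt", "n3", "xml"], default="nt",
                         help="RDF serialization format")

    load = sub.add_parser("load", help="load RDF files into the data store")
    load.add_argument("rdf_files", nargs="+")

    query = sub.add_parser("query", help="run a SPARQL query file against the repository")
    query.add_argument("sparql_file")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Dispatch table: placeholder handlers that only describe the action.
    handlers = {
        "convert": lambda a: f"convert {a.xml_file} -> {a.format}",
        "load": lambda a: f"load {len(a.rdf_files)} file(s)",
        "query": lambda a: f"query {a.sparql_file}",
    }
    return handlers[args.command](args)


print(main(["convert", "sheet.xml", "--format", "n3"]))
```

      <p>A dispatch table keeps each command's handling separate from argument parsing, so the same handlers could also be reused by the REST API server.</p>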
      <p>To provide the functionalities of the system, the following transformation
workflows are implemented dynamically:</p>
      <list list-type="bullet">
        <list-item>
          <p>“XML” → “python dictionary”: converts XML messages into an intermediate
representation in the form of Python dictionaries;</p>
        </list-item>
        <list-item>
          <p>“python dictionary” → “RDF graph”: converts Python dictionaries into RDF graphs;</p>
        </list-item>
        <list-item>
          <p>“RDF graph” → “RDF subgraph”: constructs an RDF subgraph that satisfies a set of
criteria.</p>
        </list-item>
      </list>
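      <p>The “RDF graph” → “RDF subgraph” step can be illustrated in plain Python, with a graph represented as a set of (subject, predicate, object) string triples instead of an RDF library's graph object. This is a hedged sketch: the criterion used here (triples reachable from a start node, optionally filtered by a predicate function) is an assumption about what “a set of criteria” might look like, and the name extract_subgraph is hypothetical.</p>

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def extract_subgraph(
    graph: Iterable[Triple],
    start: str,
    keep: Optional[Callable[[Triple], bool]] = None,
) -> Set[Triple]:
    """Return the triples reachable from `start` by following object links,
    keeping only the triples accepted by the optional `keep` criterion."""
    # Index triples by subject so that each node's outgoing triples are cheap to find.
    by_subject: Dict[str, List[Triple]] = {}
    for triple in graph:
        by_subject.setdefault(triple[0], []).append(triple)

    result: Set[Triple] = set()
    visited = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            if keep is not None and not keep(triple):
                continue
            result.add(triple)
            obj = triple[2]
            # Only objects that also occur as subjects are nodes to expand;
            # `visited` prevents infinite loops on cyclic graphs.
            if obj in by_subject and obj not in visited:
                visited.add(obj)
                queue.append(obj)
    return result
```

      <p>For example, starting from the node of one outpatient sheet returns that sheet together with its diagnosis nodes and their codes, while a keep function on the predicate restricts the result to selected properties.</p>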
      <p>The auxiliary intermediate Python dictionaries have the following internal
representation:</p>
      <preformat>Dict(
  tag: element tag,
  value: element value,
  children: list of child elements,
  attributes: list of attributes
)</preformat>
      <p>This representation makes it possible to implement the main functionalities of
the system as consecutive steps that pass through the chosen intermediate
Python format.</p>
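      <p>A minimal sketch of the first step, the “XML” → “python dictionary” conversion into the four-field representation above, can be written with Python's standard xml.etree.ElementTree module. The function names element_to_dict and dict_to_element are hypothetical stand-ins for the roles of the XmlParser methods parse and build_node, the element names in the example are illustrative, and XSD validation is not covered.</p>

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(element: ET.Element) -> Dict[str, Any]:
    """Convert an XML element into the intermediate dictionary
    {tag, value, children, attributes}. Surrounding whitespace of the
    element text is stripped and tail text is ignored in this sketch."""
    text = (element.text or "").strip()
    return {
        "tag": element.tag,
        # Element text, or None when the element has no non-blank text.
        "value": text or None,
        # Child elements are converted recursively, preserving document order.
        "children": [element_to_dict(child) for child in element],
        # Attributes as a list of (name, value) pairs.
        "attributes": list(element.attrib.items()),
    }


def dict_to_element(data: Dict[str, Any]) -> ET.Element:
    """Rebuild an XML element from the intermediate dictionary."""
    element = ET.Element(data["tag"], dict(data["attributes"]))
    if data["value"] is not None:
        element.text = data["value"]
    for child in data["children"]:
        element.append(dict_to_element(child))
    return element


# Example: a tiny, hypothetical outpatient-sheet fragment built in code.
sheet = ET.Element("ambList", {"id": "A1"})
ET.SubElement(sheet, "MKB").text = "E03.9"
as_dict = element_to_dict(sheet)
print(as_dict["children"][0])
# {'tag': 'MKB', 'value': 'E03.9', 'children': [], 'attributes': []}
print(ET.tostring(dict_to_element(as_dict)) == ET.tostring(sheet))
# True
```

      <p>Keeping attributes as an ordered list of pairs, as in the representation above, lets the reverse conversion reproduce the original attribute order.</p>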
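      <p>For example, document transformation in the “XML” → “python dictionary”
direction is performed by the XmlParser class, which has the following methods:</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td><monospace>__init__(self, schema_path)</monospace></td>
              <td>Constructor</td>
            </tr>
            <tr>
              <td><monospace>parse(self, element)</monospace></td>
              <td>Reads an XML element into a dictionary structure</td>
            </tr>
            <tr>
              <td><monospace>build_node(self, data)</monospace></td>
              <td>Builds an XML element from a dictionary structure</td>
            </tr>
            <tr>
              <td><monospace>build(self, data)</monospace></td>
              <td>Builds an XML string from a dictionary structure</td>
            </tr>
            <tr>
              <td><monospace>validate(self, content)</monospace></td>
              <td>Validates an XML string</td>
            </tr>
            <tr>
              <td><monospace>validate_file(self, path)</monospace></td>
              <td>Validates the content of an XML file</td>
            </tr>
            <tr>
              <td><monospace>convert(self, value)</monospace></td>
              <td>Converts an XML string to a dictionary structure</td>
            </tr>
            <tr>
              <td><monospace>convert_file(self, path)</monospace></td>
              <td>Converts an XML file to a dictionary structure</td>
            </tr>
            <tr>
              <td><monospace>get_value(data, path)</monospace></td>
              <td>Gets the value of a dictionary element</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>For the transformation of a document in the “python dictionary” → “RDF graph”
direction, the RdfBuilder class is used, with the following methods:</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Method</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td><monospace>__init__(self, namespace, identifiers)</monospace></td>
              <td>Constructor</td>
            </tr>
            <tr>
              <td><monospace>get_child_value(name, children)</monospace></td>
              <td>Gets the value of a list element given by its name</td>
            </tr>
            <tr>
              <td><monospace>parse_uri(self, uri)</monospace></td>
              <td>Converts a URI to a node name and an identifier</td>
            </tr>
            <tr>
              <td><monospace>get_children_signature(item)</monospace></td>
              <td>Gets the cryptographic signature of the adjacent elements of the node</td>
            </tr>
            <tr>
              <td><monospace>get_node(self, item, parent)</monospace></td>
              <td>Gets the URIRef of a graph node from its own dictionary and the parent structures of the dictionary</td>
            </tr>
            <tr>
              <td><monospace>parse(self, item, parent, graph)</monospace></td>
              <td>Converts the elements at a given level of the graph and adds URIRefs to the graph</td>
            </tr>
            <tr>
              <td><monospace>parse(self, data, parent, graph)</monospace></td>
              <td>Converts a dictionary structure and loads it into an RDF graph</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>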
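      <p>The “python dictionary” → “RDF graph” step can also be sketched in plain Python, with a graph modelled as a set of (subject, predicate, object) strings instead of an RDF library's graph. Everything here is an assumption made for illustration: the namespace, the rule that leaf elements become literal-valued properties, and the use of SHA-256 over a node's children as a stand-in for the signature that get_children_signature computes. The function names are hypothetical.</p>

```python
import hashlib
import json
from typing import Any, Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace; the system's real namespace is passed to the
# RdfBuilder constructor and is not given in the text.
NAMESPACE = "http://example.org/amblist/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def children_signature(item: Dict[str, Any]) -> str:
    """SHA-256 digest of a node's children, serialized as canonical JSON."""
    payload = json.dumps(item["children"], sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def node_iri(item: Dict[str, Any], parent_iri: Optional[str]) -> str:
    """Deterministic node IRI derived from the parent IRI, the tag, the
    attributes and the children signature."""
    seed = "|".join([
        parent_iri or "",
        item["tag"],
        json.dumps(item["attributes"], ensure_ascii=False),
        children_signature(item),
    ])
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]
    return f"{NAMESPACE}{item['tag']}/{digest}"


def is_leaf(item: Dict[str, Any]) -> bool:
    return not item["children"] and not item["attributes"]


def dict_to_triples(
    item: Dict[str, Any],
    parent_iri: Optional[str] = None,
    graph: Optional[Set[Triple]] = None,
) -> Set[Triple]:
    """Walk the intermediate dictionary and collect (s, p, o) triples.
    Text of non-leaf elements is ignored in this sketch."""
    if graph is None:
        graph = set()
    subject = node_iri(item, parent_iri)
    graph.add((subject, RDF_TYPE, NAMESPACE + item["tag"]))
    for name, value in item["attributes"]:
        graph.add((subject, NAMESPACE + name, value))
    for child in item["children"]:
        predicate = NAMESPACE + child["tag"]
        if is_leaf(child):
            # Leaf elements become literal-valued properties of their parent.
            graph.add((subject, predicate, child["value"] or ""))
        else:
            # Structured elements become nodes of their own, linked to the parent.
            graph.add((subject, predicate, node_iri(child, subject)))
            dict_to_triples(child, subject, graph)
    return graph
```

      <p>Deriving node identifiers from content makes repeated loading of the same document idempotent. A consequence of this choice is that two sibling elements with identical tags, attributes and children map to the same node, which may or may not be desired for repeated entries in an outpatient sheet.</p>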
      <p>Thus, the transformation in the “XML” → “RDF graph” direction is
performed in two steps, using the XmlParser and RdfBuilder classes, respectively.</p>
    </sec>
    <sec id="sec-3">
      <title>Data flow and document representation</title>
      <p>
        The input data of the system are documents: anonymized
monthly reports submitted to the NHIF by doctors (general practitioners and specialists)
on their outpatient activities. They are automatically transformed into the NHIF XML
format for an outpatient sheet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and into equivalent RDF graphs in
various RDF serialization formats [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: N-Triples, N3 and RDF/XML. In this way, the
repository, and in particular the outpatient sheet data it contains, can be used
for creating different types of semantically compatible healthcare information
systems.
      </p>
      <p>The overall data flow in the system is shown in Fig. 2.</p>
      <p>The generated RDF graphs are stored as files, each of which describes an individual
doctor as a “central point” together with the multiple reports and outpatient sheets
associated with this doctor (Fig. 3).</p>
      <p>
        The system supports defining and executing the main types of SPARQL queries [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
(SELECT, CONSTRUCT, ASK, DESCRIBE) against the contents of the repository.
Fig. 4 shows the results of executing the following example
CONSTRUCT query:
      </p>
      <preformat>CONSTRUCT {
  ?amblist &lt;http://amblist.com/hasDiagnose&gt; ?diag
}
WHERE {
  ?amblist &lt;http://amblist.com/AmbListMainDiag&gt; ?diag .
  ?diag &lt;http://amblist.com/MKB&gt; ?code .
  FILTER regex(?code, "^E03.*", "i")
}</preformat>
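      <p>To make the meaning of this query concrete, the following plain-Python sketch evaluates the same two triple patterns and the FILTER over an in-memory set of triples. It is illustrative only: in the system such queries are executed by a SPARQL engine, and the function name construct_e03 and the sample node names are hypothetical.</p>

```python
import re
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
NS = "http://amblist.com/"  # namespace used in the query


def construct_e03(graph: Set[Triple]) -> Set[Triple]:
    """Emulate the CONSTRUCT query: for every ?amblist AmbListMainDiag ?diag
    whose ?diag MKB ?code matches ^E03 (case-insensitive), emit
    ?amblist hasDiagnose ?diag."""
    pattern = re.compile(r"^E03.*", re.IGNORECASE)
    # Second triple pattern: map each diagnosis node to its MKB codes.
    codes: Dict[str, List[str]] = {}
    for s, p, o in graph:
        if p == NS + "MKB":
            codes.setdefault(s, []).append(o)
    result: Set[Triple] = set()
    for amblist, p, diag in graph:
        if p != NS + "AmbListMainDiag":
            continue
        # FILTER regex(?code, "^E03.*", "i"): keep if any code of ?diag matches.
        if any(pattern.search(code) for code in codes.get(diag, [])):
            result.add((amblist, NS + "hasDiagnose", diag))
    return result
```
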
      <p>The CONSTRUCT query above is intended to build the set of RDF triples that
links the outpatient sheets to a primary or secondary diagnosis whose code begins
with E03.</p>
      <p>This paper has discussed the concept and the design of a semantic repository for
patients' clinical data obtained from periodically received documents, namely the monthly
reports of doctors who provide outpatient care. The presented pilot version of the
semantic repository is ready for use. It can be integrated into various types of
intelligent software systems, such as:</p>
      <list list-type="bullet">
        <list-item>
          <p>information systems in the field of healthcare;</p>
        </list-item>
        <list-item>
          <p>software systems for data analysis and knowledge discovery in medical
research data;</p>
        </list-item>
        <list-item>
          <p>healthcare decision support systems.</p>
        </list-item>
      </list>
      <p>
        One of the next goals of our study is the development of an intelligent
decision support system. It is intended to assist physicians in diagnosing patients
who have or are prone to type 2 diabetes and in recommending treatment and diet plans
for them. Diabetes is a socially significant disease that affects a high percentage of the world’s
population (8.5% of the adult population according to World Health Organization
2016 data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). An analysis of freely available medical
ontologies indicates that the combination of DDO and DMTO [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is among the most appropriate for the purposes of this study. DDO
(<ext-link ext-link-type="uri" xlink:href="https://bioportal.bioontology.org/ontologies/DDO">https://bioportal.bioontology.org/ontologies/DDO</ext-link>)
is an ontology for the diagnosis of diabetes that includes knowledge about symptoms,
lab tests, drugs, complications, etc. DMTO
(<ext-link ext-link-type="uri" xlink:href="https://bioportal.bioontology.org/ontologies/DMTO">https://bioportal.bioontology.org/ontologies/DMTO</ext-link>)
is an ontology for creating customized treatment plans for type 2 diabetic patients.
DMTO extends the DDO ontology by adding treatment classes and axioms to the existing
diagnosis part. To this end, we plan to develop a software module that reads anonymized
patient data from the repository and, on this basis, automatically creates instances of
the appropriate DMTO classes.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>The research presented in this paper is supported by the National Scientific
Program “eHealth” in Bulgaria.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Merrill</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corlosquet</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciccarese</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Semantic Web Repositories for Genomics Data Using the eXframe Platform</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>5</volume>
          (
          <issue>Suppl 1</issue>
          ), S3 (
          <year>2014</year>
          ). https://doi.org/10.1186/2041-1480-5-S1-S3.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Austin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bloom</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dallmeier-Tiessen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          et al.:
          <article-title>Key components of data publishing: using current best practices to develop a reference model for data publishing</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          <volume>8</volume>
          ,
          <fpage>77</fpage>
          -
          <lpage>92</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1007/s00799-016-0178-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berg-Cross</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wittenburg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>RDA Data Foundation and Terminology - DFT: Results RFC (Version 1.5)</article-title>
          . RDA DFT Working Group,
          <year>2015</year>
          , https://www.rd-alliance.org/sites/default/files/DFT%20Core%20Terms-and%20model-v1-6.pdf (visited on May 20, 2020).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          National Health Insurance Fund (NHIF):
          <article-title>Primary medical documents</article-title>
          (in Bulgarian). https://www.nhif.bg/get_file?uuid=3f8425ef-5a98-469eac92-def5871cac37 (visited on May 20, 2020).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <article-title>RDF Serialization Formats</article-title>
          . https://help.poolparty.biz/pp6/developer-guide/general-information-on-the-poolparty-api/rdf-serialization-formats (visited on May 20, 2020).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation 21 March</source>
          <year>2013</year>
          . W3C, https://www.w3.org/TR/sparql11-query/ (visited on May 20, 2020).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. World Health Organization:
          <source>Global Report on Diabetes</source>
          . WHO Library Cataloguing-in-Publication Data (
          <year>2016</year>
          ), ISBN 978 92 4 156525.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>El-Sappagh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwak</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kwak</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>DMTO: a realistic ontology for standard diabetes mellitus treatment</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>9</volume>
          (
          <year>2018</year>
          ), https://doi.org/10.1186/s13326-018-0176-y.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. DMTO ontology. https://bioportal.bioontology.org/ontologies/DMTO</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>