<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Building a Semantic Repository for Outpatient Sheets</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Maria</forename><surname>Nisheva</surname></persName>
							<email>marian@fmi.uni-sofia.bg</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University St. Kliment Ohridski</orgName>
								<address>
									<addrLine>5 James Bourchier Blvd</addrLine>
									<postCode>1164</postCode>
									<settlement>Sofia</settlement>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Institute of Mathematics and Informatics</orgName>
								<orgName type="institution">Bulgarian Academy of Sciences</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hristo</forename><surname>Georgiev</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University St. Kliment Ohridski</orgName>
								<address>
									<addrLine>5 James Bourchier Blvd</addrLine>
									<postCode>1164</postCode>
									<settlement>Sofia</settlement>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pavel</forename><surname>Pavlov</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Informatics</orgName>
								<orgName type="institution">Sofia University St. Kliment Ohridski</orgName>
								<address>
									<addrLine>5 James Bourchier Blvd</addrLine>
									<postCode>1164</postCode>
									<settlement>Sofia</settlement>
									<country key="BG">Bulgaria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Building a Semantic Repository for Outpatient Sheets</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">84EAD0B4535BEC5A8AD3AF7794EB3D54</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>eHealth</term>
					<term>semantic data models</term>
					<term>semantic interoperability of health data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper analyzes issues in the semantic interoperability of health information systems, in particular those related to providing semantic interoperability of health data. The design principles and some implementation details of a semantic repository for outpatient sheets are discussed in the context of the suggested ideas.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The rapid development and use of various health information systems and networks raises the issue of semantic interoperability between heterogeneous health informatics applications. Semantic interoperability may be characterized as the capability of different software systems to share information and to have that information interpreted by the receiving system in the sense intended by the creators or maintainers of the transmitting system. It involves:</p><list rend="bulleted"><item>the processing of the shared information in all systems so that it is consistent with the intended meaning of this information;</item><item>the encoding of queries and presentation of information so that it conforms to the intended meaning regardless of the source of information.</item></list><p>Standardization and the use of semantic technologies are regarded as the most effective instruments for providing and maintaining interoperability in information systems, especially in healthcare information systems.</p><p>Semantic technologies (Semantic Web technologies) such as Linked Data, the Resource Description Framework (RDF/RDFS), SPARQL and various kinds of domain ontologies are increasingly being used in the health informatics community to respond to knowledge integration and semantic interoperability needs <ref type="bibr" target="#b0">[1]</ref>. In particular, semantic repositories can be used to achieve various goals, such as the ability to manage large volumes of heterogeneous data, significant analytical power based on the ability to interlink long chains of evidence, and effective data interoperability.</p><p>According to <ref type="bibr" target="#b1">[2]</ref>, "a repository (also referred to as a data repository or digital data repository) is a searchable and queryable interfacing entity that is able to store, manage, maintain, and curate data/digital objects". 
A repository is a managed location where digital data objects are registered, permanently stored, made accessible and retrievable, and curated <ref type="bibr" target="#b2">[3]</ref>. Repositories preserve, manage, and provide access to many types of digital resources, available in a variety of formats. Resources in online data repositories are curated to support their search, discovery, and reuse.</p><p>This paper presents some results achieved in the development of a pilot version of a semantic repository for clinical data received in the formats used by the information system of the Bulgarian National Health Insurance Fund (NHIF).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Key features of a semantic repository for clinical data</head><p>Semantic repositories are software systems similar to database management systems (DBMSs). They allow the storage, retrieval and management of structured or semi-structured data. The main differences between semantic repositories and traditional DBMSs can be summarized as follows:</p><list rend="bulleted"><item>the data in semantic repositories are represented in RDF, and ontologies are used as data schemas that allow for automatic analysis and formal reasoning;</item><item>the physical data model is flexible and schema-independent, which simplifies the integration of additional knowledge and schemas, in particular the integration of relevant subject ontologies.</item></list><p>The minimum functionality of a software system that plays the role of a semantic repository of clinical data should include:</p><list rend="bulleted"><item>reading XML documents, validating them and presenting them as RDF documents in different formats;</item><item>loading RDF documents into the semantic repository;</item><item>retrieving particular documents, specified by flexible queries;</item><item>providing an interface that allows defining and executing SPARQL queries against the system;</item><item>adding a single document or a set of documents to the semantic repository. Document addition should be durable over time, with the assurance that an unplanned interruption of system operation does not cause loss of information;</item><item>editing existing documents and loading their modified versions into the repository;</item><item>retrieving the entire document set so that it can be easily integrated with different external systems using various RDF formats.</item></list><p>In addition to meeting these general requirements, a semantic repository of clinical data oriented to real-world practical applications should also provide for:</p><list rend="bulleted"><item>maintaining an interface to the corresponding core information system while retaining the ability to handle documents in real time;</item><item>identifying ways to connect the system to other (external) systems, in particular to appropriate graph or relational databases;</item><item>developing additional systems that provide means of visualizing the documents used by the core system.</item></list><p>To fully exploit the capabilities of a semantic repository, and in particular of a semantic repository for electronic patient records, it should include multiple domain ontologies of various types. First, appropriate subject ontologies describing socially significant diseases, their diagnosis, the planning of appropriate therapy, etc. should be selected or, in some cases, developed especially for this purpose. The inclusion of such ontologies enables the development of semantically interoperable information systems, decision support systems, data mining systems, and other types of intelligent software systems in the healthcare domain.</p></div>
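<div xmlns="http://www.tei-c.org/ns/1.0"><p>The durability requirement for document addition listed above can be illustrated with a minimal sketch. It assumes that the repository keeps its documents as files in a folder and uses a write-to-temporary-file-then-rename pattern; the function name add_document and its parameters are hypothetical and are not taken from the pilot system.</p>
<p><code lang="python"><![CDATA[
```python
import os
import tempfile
from pathlib import Path


def add_document(store_dir, name, content):
    """Write `content` (bytes) to store_dir/name without leaving partial files.

    Hypothetical helper: the data goes to a temporary file in the same
    folder, is flushed to disk, and only then renamed over the target,
    so an interruption leaves either the old version or the new one.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    target = store / name
    fd, tmp_name = tempfile.mkstemp(dir=str(store), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to stable storage
        os.replace(tmp_name, target)  # the rename is atomic on POSIX systems
    except BaseException:
        # Remove the partial temporary file; the target is left untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```
]]></code></p>
<p>Because the temporary file lives in the same folder as the target, the final rename stays on one file system, which is what allows it to be a single step rather than a copy.</p></div>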
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">System design</head><p>As part of the work on the more general task of defining and analyzing key requirements for clinical data processing and data exchange systems, a project has been developed and a pilot version of a semantic repository for outpatient sheets has been implemented, which covers all the functionality described above. The project is oriented towards solving the following specific tasks:</p><list rend="bulleted"><item>providing a convenient application interface that enables working with documents in real time;</item><item>determining a technology for connecting the system to other (external) systems, namely graph or relational databases;</item><item>developing an additional system that provides means for visualizing the documents with which the main system works.</item></list><p>The structure of the repository management system is shown in Fig. <ref type="figure" target="#fig_0">1</ref>. It consists of a number of packages:</p><list rend="bulleted"><item>The Commands package is the main entry point for working with the system. It implements the commands that the system can execute from the command line.</item><item>The Converter package implements a library for working with files in XML, XSD and RDF formats, defining methods for transformations between XML files and sets of RDF triples. The library also supports operations that retrieve subsets of the triples of an RDF graph. The structures in this package are used by both the Commands and the Server packages.</item><item>The Server package contains two main parts: an implementation of a REST API server, which defines the operations that can be performed on the content of the semantic repository, and a number of static files, which make up the Swagger documentation of the API server. In this way, a convenient user interface for manipulating the stored resources is supported.</item><item>The Settings package contains a description of the system settings.</item><item>The Data Store object is a folder with the data files stored by the system.</item></list><p>To provide the functionality of the system, the following workflows are implemented dynamically:</p><list rend="bulleted"><item>"XML" → "Python dictionary": converts XML messages to a serialized Python format in the form of dictionaries;</item><item>"Python dictionary" → "RDF graph": converts Python dictionaries to RDF graphs;</item><item>"RDF graph" → "RDF subgraph": constructs an RDF subgraph according to a set of criteria.</item></list><p>They are designed to implement the main functionality of the system as consecutive steps that pass through the chosen intermediate serialized Python format.</p><p>For example, for document transformation in the "XML" → "Python dictionary" direction, the XmlParser class is used. Besides its constructor, it has methods that perform the following operations:</p><list rend="bulleted"><item>reads an XML element into a dictionary structure;</item><item>builds an XML element as a dictionary structure;</item><item>builds an XML string from a dictionary structure;</item><item>validates an XML string;</item><item>validates the content of an XML file;</item><item>converts an XML string to a dictionary structure;</item><item>converts an XML file to a dictionary structure;</item><item>gets the value of a dictionary element.</item></list><p>For the transformation of a document in the "Python dictionary" → "RDF graph" direction, the RdfBuilder class is used, with the following methods:</p><table xml:id="formula_0"><row role="label"><cell>Method</cell><cell>Description</cell></row><row><cell>__init__(self, namespace, identifiers)</cell><cell>Constructor</cell></row><row><cell>get_child_value(name, children)</cell><cell>Gets the value of a list element given by its name</cell></row><row><cell>parse_uri(self, uri)</cell><cell>Converts a URI to a node name and identifier</cell></row><row><cell>get_children_signature(item)</cell><cell>Gets the cryptographic signature of the adjacent elements of the node</cell></row><row><cell>get_node(self, item, parent)</cell><cell>Gets the URIRef of a graph node from its own dictionary and the parent structures of the dictionary</cell></row><row><cell>parse(self, item, parent, graph)</cell><cell>Converts the elements at a given level in a graph and adds URIRefs to the graph</cell></row><row><cell>parse(self, data, parent, graph)</cell><cell>Converts a dictionary structure and loads it into an RDF graph</cell></row></table><p>Thus, the transformation in the "XML" → "RDF graph" direction is performed in two steps, using the XmlParser and RdfBuilder classes, respectively.</p></div>
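<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two-step transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the code of the XmlParser or RdfBuilder classes: it uses only the Python standard library, represents the RDF graph as a plain list of triples, and the sample element names, the http://example.org/ namespace, the hash-based node identifiers and all function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import hashlib
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks a triple object as a literal value rather than a node URI."""
    value: str


def xml_to_dict(element):
    """Step 1: convert an XML element into a nested dictionary."""
    return {
        "tag": element.tag,
        "attrs": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [xml_to_dict(child) for child in element],
    }


def node_uri(namespace, item):
    """Assumed naming scheme: a node URI derived from a hash of its content.

    Identical subtrees therefore map to the same node.
    """
    digest = hashlib.sha1(repr(item).encode("utf-8")).hexdigest()[:12]
    return f"{namespace}{item['tag']}/{digest}"


def dict_to_triples(item, namespace, parent=None, triples=None):
    """Step 2: flatten the dictionary into (subject, predicate, object) triples."""
    if triples is None:
        triples = []
    subject = node_uri(namespace, item)
    if parent is not None:
        triples.append((parent, f"{namespace}{item['tag']}", subject))
    for name, value in item["attrs"].items():
        triples.append((subject, f"{namespace}{name}", Literal(value)))
    if item["text"]:
        triples.append((subject, f"{namespace}value", Literal(item["text"])))
    for child in item["children"]:
        dict_to_triples(child, namespace, subject, triples)
    return triples


def to_ntriples(triples):
    """Serialize the triples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = (o.value.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Hypothetical input; the real NHIF schema is not reproduced here.
xml_text = '<AmbList no="17"><MainDiag><MKB>E03.9</MKB></MainDiag></AmbList>'
doc = xml_to_dict(ET.fromstring(xml_text))
triples = dict_to_triples(doc, "http://example.org/")
print(to_ntriples(triples))
```
]]></code></p>
<p>Keeping the dictionary as an intermediate format means that each step can be tested on its own, and other input formats only need a converter to the same dictionary shape.</p></div>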
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Data flow and document representation</head><p>The input data with which the system works are documents: anonymized monthly reports that doctors (general practitioners and specialists) submit to the NHIF on their outpatient activities. They are automatically transformed into the NHIF XML format for an outpatient sheet <ref type="bibr" target="#b3">[4]</ref> and into equivalent RDF graphs in various RDF serialization formats <ref type="bibr" target="#b4">[5]</ref>: N-Triples, N3 and RDF/XML. In this way the repository, and in particular the outpatient sheet data in it, can be used to create different types of semantically compatible healthcare information systems.</p><p>The overall data flow in the system is shown in Fig. <ref type="figure" target="#fig_1">2</ref>. The generated RDF graphs are stored as files, each of which describes an individual doctor as a "central point" and presents the multiple reports and outpatient sheets associated with this doctor (Fig. <ref type="figure" target="#fig_2">3</ref>). The repository supports defining and executing the main types of SPARQL queries <ref type="bibr" target="#b5">[6]</ref> (SELECT, CONSTRUCT, ASK, DESCRIBE) against its contents. Fig. <ref type="figure" target="#fig_4">4</ref> shows the result of executing an example CONSTRUCT query. This query constructs the set of available RDF triples covering the outpatient sheets that contain a primary or secondary diagnosis with a code beginning with E03.</p></div>
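<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of a CONSTRUCT query of the kind described above can be approximated in plain Python. The sketch assumes that triples are held as (subject, predicate, object) string tuples; the three predicate URIs are those that appear in the paper's query, while the function name, the sample data and the choice of which triples to return are assumptions.</p>
<p><code lang="python"><![CDATA[
```python
import re

MAIN_DIAG = "http://amblist.com/AmbListMainDiag"
DIAG = "http://amblist.com/AmbListDiag"
MKB = "http://amblist.com/MKB"


def sheets_with_code(triples, pattern=r"^E03.*"):
    """Return the subgraph for sheets with a matching main or secondary diagnosis."""
    code_re = re.compile(pattern, re.IGNORECASE)  # mirrors the "i" regex flag
    # Diagnosis nodes whose MKB code matches the pattern.
    matching_diags = {s for s, p, o in triples if p == MKB and code_re.search(o)}
    # Sheets linked to such a node by either diagnosis predicate (the UNION).
    sheets = {
        s for s, p, o in triples
        if p in (MAIN_DIAG, DIAG) and o in matching_diags
    }
    keep = sheets | matching_diags
    # Assumed template: every triple about a matching sheet or diagnosis node.
    return [t for t in triples if t[0] in keep]


# Hypothetical sample graph with three outpatient sheets.
graph = [
    ("ex:sheet1", MAIN_DIAG, "ex:d1"),
    ("ex:d1", MKB, "E03.9"),
    ("ex:sheet2", MAIN_DIAG, "ex:d2"),
    ("ex:d2", MKB, "I10"),
    ("ex:sheet2", DIAG, "ex:d3"),
    ("ex:d3", MKB, "e03.1"),
    ("ex:sheet3", MAIN_DIAG, "ex:d4"),
    ("ex:d4", MKB, "J45"),
]
subgraph = sheets_with_code(graph)
```
]]></code></p>
<p>In this sample the first two sheets are selected, one through its main diagnosis and one through a secondary diagnosis, and the third sheet is left out.</p></div>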
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion</head><p>This paper discusses the concept and design of a semantic repository for clinical patient data obtained from periodically received documents, namely the monthly reports of doctors who provide outpatient care. The presented pilot version of the semantic repository is ready for use. It can be integrated into various types of intelligent software systems, such as:</p><list rend="bulleted"><item>information systems in the field of healthcare;</item><item>software systems for data analysis and knowledge discovery in medical research data;</item><item>healthcare decision support systems.</item></list><p>One of the next goals of our study is the development of an intelligent decision support system. It is intended to assist physicians in diagnosing and in recommending treatment and diet plans for patients with or prone to type 2 diabetes, a socially significant disease that affects a high percentage of the world's population (8.5% of the adult population according to World Health Organization 2016 data <ref type="bibr" target="#b6">[7]</ref>). An analysis of freely available medical ontologies indicates that the combination of DDO and DMTO <ref type="bibr" target="#b7">[8]</ref> is among the most appropriate for the purposes of this study. DDO<ref type="foot" target="#foot_0">1</ref> is an ontology for the diagnosis of diabetes that includes knowledge about symptoms, lab tests, drugs, complications, etc. DMTO<ref type="foot" target="#foot_1">2</ref> is an ontology for creating customized treatment plans for type 2 diabetic patients. DMTO extends the DDO ontology by adding treatment classes and axioms to the existing diagnosis part. For this purpose, it is planned to develop a software module that reads anonymized patient data from the repository and automatically creates from them instances of the appropriate DMTO classes.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Structure of the software implementation of the repository.</figDesc><graphic coords="4,94.06,187.88,268.36,175.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2.</head><label>2</label><figDesc>Fig. 2. Data flow in the system.</figDesc><graphic coords="6,83.35,56.69,289.72,92.38" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 3.</head><label>3</label><figDesc>Fig. 3. Doctor-centered structure of the RDF graphs generated by the system.</figDesc><graphic coords="6,193.24,249.29,69.88,131.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>://amblist.com/AmbListMainDiag&gt; ?diag . ?diag &lt;http://amblist.com/MKB&gt; ?code . FILTER regex(?code, "^E03.*", "i") } UNION { ?amblist &lt;http://amblist.com/AmbListDiag&gt; ?diag . ?diag &lt;http://amblist.com/MKB&gt; ?code . FILTER regex(?code, "^E03.*", "i") } }</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4.</head><label>4</label><figDesc>Fig. 4. Result of executing a CONSTRUCT query.</figDesc><graphic coords="7,63.55,179.87,352.12,108.04" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://bioportal.bioontology.org/ontologies/DDO</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://bioportal.bioontology.org/ontologies/DMTO</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>The research presented in this paper is supported by the National Scientific Program "eHealth" in Bulgaria.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Semantic Web Repositories for Genomics Data Using the eXframe Platform</title>
		<author>
			<persName><forename type="first">E</forename><surname>Merrill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Corlosquet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ciccarese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Das</surname></persName>
		</author>
		<idno type="DOI">10.1186/2041-1480-5-S1-S3</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>Suppl</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Key components of data publishing: using current best practices to develop a reference model for data publishing</title>
		<author>
			<persName><forename type="first">C</forename><surname>Austin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bloom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dallmeier-Tiessen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00799-016-0178-2</idno>
		<ptr target="https://doi.org/10.1007/s00799-016-0178-2" />
	</analytic>
	<monogr>
		<title level="j">International Journal on Digital Libraries</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="77" to="92" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Berg-Cross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ritz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wittenburg</surname></persName>
		</author>
		<ptr target="https://www.rd-alliance.org/sites/default/files/DFT%20Core%20Terms-and%20model-v1-6.pdf" />
		<title level="m">RDA Data Foundation and Terminology - DFT: Results RFC (Version 1.5)</title>
				<imprint>
			<date type="published" when="2015">2015 (accessed May 20, 2020)</date>
		</imprint>
	</monogr>
	<note>RDA DFT Working Group</note>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<ptr target="https://www.nhif.bg/get_file?uuid=3f8425ef-5a98-469e-ac92-def5871cac37" />
		<title level="m">Първични медицински документи [Primary Medical Documents]</title>
				<imprint>
			<date type="published" when="2020-05-20">May 20, 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://help.poolparty.biz/pp6/developer-guide/general-information-on-the-poolparty-api/rdf-serialization-formats" />
		<title level="m">RDF Serialization Formats</title>
				<imprint>
			<date type="published" when="2020-05-20">May 20, 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">SPARQL 1.1 Query Language, W3C Recommendation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Harris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seaborne</surname></persName>
		</author>
		<ptr target="https://www.w3.org/TR/sparql11-query/" />
		<imprint>
			<date type="published" when="2013-03-21">21 March 2013 (accessed May 20, 2020)</date>
			<pubPlace>W3C</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">World Health Organization: Global Report on Diabetes</title>
	</analytic>
	<monogr>
		<title level="m">WHO Library Cataloguing-in-Publication Data</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">DMTO: a realistic ontology for standard diabetes mellitus treatment</title>
		<author>
			<persName><forename type="first">S</forename><surname>El-Sappagh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kwak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kwak</surname></persName>
		</author>
		<idno type="DOI">10.1186/s13326-018-0176-y</idno>
		<ptr target="https://doi.org/10.1186/s13326-018-0176-y" />
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Semantics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
