<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Rita</forename><forename type="middle">T</forename><surname>Sousa</surname></persName>
							<email>rita.sousa@uni-mannheim.de</email>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">Universität Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Heiko</forename><surname>Paulheim</surname></persName>
							<email>heiko.paulheim@uni-mannheim.de</email>
							<affiliation key="aff0">
								<orgName type="department">Data and Web Science Group</orgName>
								<orgName type="institution">Universität Mannheim</orgName>
								<address>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Integrating Heterogeneous Gene Expression Data through Knowledge Graphs for Improving Diabetes Prediction</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">58FA3B85C2BAFF6B6D21E4C3AAAAF88C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Diabetes Prediction</term>
					<term>Expression data</term>
					<term>Knowledge Graph</term>
					<term>Ontology</term>
					<term>Knowledge Graph Embedding</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Diabetes is a worldwide health issue affecting millions of people. Machine learning methods have shown promising results in improving diabetes prediction, particularly through the analysis of diverse data types, including gene expression data. While gene expression data can provide valuable insights, two challenges arise: expression datasets usually have small sample sizes, and datasets that measure the expression of different genes cannot be easily combined.</p><p>This work proposes a novel approach to address these challenges by integrating multiple gene expression datasets and domain-specific knowledge using knowledge graphs (KGs), a promising tool for biomedical data integration. KG embedding methods are then employed to generate vector representations of patients, which serve as inputs for a classifier. Experiments demonstrate the efficacy of our approach, showing improvements in diabetes prediction when integrating multiple gene expression datasets and domain-specific knowledge about protein functions and interactions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Diabetes is a chronic health condition resulting from insufficient insulin production by the pancreas or the body's inability to effectively use the insulin it produces <ref type="bibr" target="#b0">[1]</ref>. This disease has emerged as a worldwide health issue, affecting millions of people. According to the World Health Organization, diabetes was the direct cause of 1.5 million deaths in 2019, 48% of which occurred before the age of 70. Moreover, this chronic disease is associated with several comorbidities, such as blindness, kidney failure, heart attacks, strokes, and lower limb amputation.</p><p>Due to the multifactorial nature of diabetes, predicting and detecting this complex disease remains a significant challenge. Over the last decades, several approaches have shown encouraging results using machine learning methods to identify patterns and potential risk factors linked to diabetes, enabling not only early detection but also tailored interventions <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>. These approaches draw on several types of data, including electronic health records <ref type="bibr" target="#b5">[6]</ref>, imaging data <ref type="bibr" target="#b6">[7]</ref>, and demographic data <ref type="bibr" target="#b7">[8]</ref>. 
Omics data, such as gene expression datasets, have also received attention, since genomics, epigenomics, and transcriptomics can help elucidate the critical pathways and regulatory mechanisms in diabetes <ref type="bibr" target="#b8">[9]</ref>.</p><p>Gene expression datasets are readily accessible in public databases, and gene expression analysis is a powerful tool for pinpointing genes associated with diseases, including diabetes. However, using this type of data for prediction raises a significant issue. On the one hand, gene expression datasets usually contain a relatively small number of samples. On the other hand, supervised machine learning methods are data-driven and rely on large amounts of labeled data for effective training. One alternative is to combine multiple expression datasets to enlarge the pool of training samples. However, this raises the challenge of integrating information from multiple expression datasets, as each dataset may measure the expression of a different set of genes. Variations in experimental platforms and study designs further complicate integration. Knowledge graphs (KGs) offer a promising solution: they can represent knowledge about concepts and their relationships in a fully machine-readable format <ref type="bibr" target="#b9">[10]</ref>. Moreover, several publicly available biomedical ontologies can be used to enrich KGs <ref type="bibr" target="#b10">[11]</ref>, enabling the representation of domain-specific knowledge. 
In fact, over the past few years, biomedical ontologies and KGs have emerged as a tool for biomedical data integration and have been adopted in many machine learning applications, with KG embedding approaches <ref type="bibr" target="#b11">[12]</ref> becoming increasingly popular <ref type="bibr" target="#b12">[13]</ref>.</p><p>This work tackles the challenge of integrating heterogeneous gene expression datasets in biomedical applications, focusing on diabetes prediction. We propose a novel approach that builds a KG incorporating both gene expression data and domain-specific knowledge and then employs KG embedding methods to generate vector representations of patients. These patient representations serve as input to a classifier that predicts whether a patient has diabetes. We evaluated the impact of integrating multiple gene expression datasets and found that incorporating additional expression datasets and domain-specific knowledge improves diabetes prediction, supporting the efficacy of our approach. This work was developed in the context of the KI-DiabetesDetektion project, funded by the German Federal Ministry of Education and Research, which aims to integrate biomedical data from various sources and apply machine learning methods to improve the early-stage detection of diabetes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Several works have used gene expression data to predict diabetes, employing diverse methodologies and datasets. Li et al. <ref type="bibr" target="#b13">[14]</ref> use a support vector machine classifier for the diagnosis of diabetes. Although multiple datasets were extracted from the Gene Expression Omnibus database, the model was trained on only one dataset, with three additional datasets used for validation. Feature selection consisted of identifying ten genes common to all datasets. Mansoori et al. <ref type="bibr" target="#b14">[15]</ref> and Kazerouni et al. <ref type="bibr" target="#b15">[16]</ref> focus on long non-coding RNAs potentially associated with type 2 diabetes. Both studies used data collected from 100 diabetic and 100 non-diabetic individuals to train the classifiers. Mansoori et al. <ref type="bibr" target="#b14">[15]</ref> employed logistic regression, whereas Kazerouni et al. <ref type="bibr" target="#b15">[16]</ref> compared four classifiers (k-nearest neighbors, support vector machine, logistic regression, and artificial neural networks) to predict type 2 diabetes using the expression values of specific long non-coding RNAs as input. Both studies suggest that a larger number of samples would likely improve the performance of the classifiers. Other approaches explore expression data for diabetes prediction without employing machine learning methods <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b8">9]</ref>.</p><p>In the biomedical domain, the exploration of KGs has become increasingly prominent, with KG embedding methods emerging as particularly promising for capturing KG-based information <ref type="bibr" target="#b18">[19]</ref>. 
These methods map the entities and relations of a KG into a low-dimensional vector space while preserving the graph structure and, in some cases, semantic information. Various types of KG embedding methods have been proposed. Translational models, such as TransE <ref type="bibr" target="#b19">[20]</ref>, employ distance-based scoring functions to capture relationships between entities. Semantic matching approaches, such as DistMult <ref type="bibr" target="#b20">[21]</ref>, instead use similarity-based scoring functions to capture the latent semantics of entities and relations in their vector space representations. Walk-based methods, such as RDF2Vec <ref type="bibr" target="#b21">[22]</ref>, employ random walks to generate entity sequences that are fed to a neural language model, which learns latent entity representations. Walk-based approaches differ in their random walk strategies and in whether they consider edge direction and type. Biomedical KGs are characterized by rich hierarchical relations, which can be captured relatively easily by walks; walk-based approaches are therefore particularly well suited to them.</p></div>
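<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the difference between the two scoring families concrete, the following minimal Python sketch computes a TransE-style score (the negative L2 distance between h + r and t) and a DistMult-style score (the trilinear product of h, r and t) for single triples. The entity and relation names and the three-dimensional vectors are made-up toy values; real models learn these vectors from the KG.</p><p>
```python
import math

def transe_score(h, r, t):
    """TransE-style score: -||h + r - t||_2 (higher means more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """DistMult-style score: sum_i h_i * r_i * t_i (higher means more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

# Toy embeddings (illustrative values, not learned from any KG).
patient = [0.2, 0.1, 0.0]
has_expression = [0.3, -0.1, 0.5]
gene_a = [0.5, 0.0, 0.5]   # equals patient + has_expression
gene_b = [-0.4, 0.9, 0.1]

print(transe_score(patient, has_expression, gene_a))
print(transe_score(patient, has_expression, gene_b))
print(distmult_score(patient, has_expression, gene_a))
# DistMult is symmetric in h and t, so it cannot tell (h, r, t) from (t, r, h).
print(distmult_score(gene_a, has_expression, patient))
```
</p><p>The last line illustrates a known property of DistMult: because the product is commutative, the score ignores the direction of the relation, whereas the TransE translation h + r ≈ t is direction-sensitive.</p></div>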
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>As discussed above, gene expression datasets typically contain only a few instances, and different datasets record the expression of different genes. Thus, when training prediction models, one can either (1) use only one dataset, thereby having little training data, or (2) try to combine multiple datasets. In the latter case, the datasets are typically "incompatible" in the sense that they have different feature sets, i.e., a naive combination would lead to a larger dataset with many NULL values.</p><p>To overcome these challenges, we propose a methodology to integrate multiple expression datasets into a biomedical KG and then use it for diabetes prediction. Figure <ref type="figure" target="#fig_0">1</ref> shows an overview of this methodology. The first step builds a KG that integrates not only expression data from different datasets but also domain knowledge on protein functions and protein interactions. Then, we generate a vector representation for each patient described in the KG. Finally, these vectors are given as input to a classifier. The source code for our methodology is available on GitHub (https://github.com/ritatsousa/expressionKG).</p></div>
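<div xmlns="http://www.tei-c.org/ns/1.0"><p>The NULL problem of a naive combination can be seen on a toy example. The sketch below takes the union of the gene columns of two hypothetical studies (sample IDs, gene names and values are invented) and measures how much of the merged table is missing; it also shows zero-filling, the strategy used by the second baseline in Section 4.2.</p><p>
```python
def naive_merge(*datasets):
    """Union of all gene columns; a gene a study did not measure becomes None (NULL)."""
    all_genes = sorted({g for ds in datasets for row in ds.values() for g in row})
    merged = {}
    for ds in datasets:
        for sample_id, row in ds.items():
            merged[sample_id] = [row.get(g) for g in all_genes]
    return all_genes, merged

def missing_fraction(merged):
    """Share of cells in the merged table that are missing."""
    cells = [v for row in merged.values() for v in row]
    return sum(v is None for v in cells) / len(cells)

def zero_fill(merged):
    """Replace missing values with 0, as the second baseline does."""
    return {s: [0.0 if v is None else v for v in row] for s, row in merged.items()}

# Two toy studies that share only GENE2 (hypothetical sample IDs, genes and values).
study_a = {"A1": {"GENE1": 5.1, "GENE2": 7.3}, "A2": {"GENE1": 4.8, "GENE2": 6.9}}
study_b = {"B1": {"GENE2": 8.0, "GENE3": 2.2}, "B2": {"GENE2": 7.7, "GENE3": 1.9}}

genes, merged = naive_merge(study_a, study_b)
print(genes)                     # ['GENE1', 'GENE2', 'GENE3']
print(merged["A1"])              # [5.1, 7.3, None]
print(missing_fraction(merged))  # 0.3333333333333333
print(zero_fill(merged)["B1"])   # [0.0, 8.0, 2.2]
```
</p><p>Zero-filling removes the NULLs but adds no information: a 0 for GENE1 in study B only encodes that the gene was not measured. With the datasets in Table 1, where GSE30208 shares no genes with the other two, every GSE30208 sample would be missing all genes measured only by the other studies.</p></div>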
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Expression Data</head><p>Several recent studies have explored gene expression in diabetic and non-diabetic individuals, and their data can be accessed in publicly available databases. The Gene Expression Omnibus (GEO) <ref type="bibr" target="#b22">[23]</ref> is a public database maintained by the National Center for Biotechnology Information that archives high-throughput gene expression and other genomics datasets. Each GEO dataset is a curated collection of biologically comparable GEO samples whose measurements are assumed to be calculated in an equivalent manner. Each dataset is associated with a file containing the raw gene expression data generated by microarrays. In addition to the raw data, processed files containing normalized or transformed expression values may be provided. Processed data are structured in a tabular format, with each row corresponding to a sample, each column to a gene, and each cell containing the expression value of that gene for that sample.</p></div>
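<div xmlns="http://www.tei-c.org/ns/1.0"><p>Reading such a processed table into a per-sample structure takes only the standard library. The sketch below assumes a tab-separated text whose first row holds gene identifiers and whose first column holds sample identifiers, as in the layout described above; the genes_as_rows option, which transposes a table stored with genes as rows, and all identifiers in the example are hypothetical. Empty cells are skipped rather than filled with an invented value.</p><p>
```python
import csv
import io

def read_expression_table(text, genes_as_rows=False):
    """Return {sample_id: {gene_id: value}} from a tab-separated table.

    Assumed layout: first row = header, first column = row IDs.
    With genes_as_rows=True the table is read as genes x samples and transposed.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    table = {}
    for row in body:
        row_id, values = row[0], row[1:]
        for col_id, raw in zip(header[1:], values):
            if raw.strip() == "":
                continue  # missing measurement: leave it out
            sample, gene = (col_id, row_id) if genes_as_rows else (row_id, col_id)
            table.setdefault(sample, {})[gene] = float(raw)
    return table

example = "sample\tGENE1\tGENE2\nS1\t5.1\t7.3\nS2\t4.8\t\n"
print(read_expression_table(example))
# {'S1': {'GENE1': 5.1, 'GENE2': 7.3}, 'S2': {'GENE1': 4.8}}
```
</p></div>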
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Building the Knowledge Graph</head><p>The KG is built by integrating two types of data sources: expression data and domain-specific knowledge. Figure <ref type="figure" target="#fig_1">2</ref> illustrates how the two data sources are integrated into a KG.</p><p>Since our approach relies on KG embeddings to generate patient representations, and most embedding approaches cannot handle numeric literals <ref type="bibr" target="#b23">[24]</ref>, we adopt two different strategies to include the expression data in the KG:</p><p>• The first strategy represents patient gene expression values in the KG using blank nodes and binning. Following the technique proposed in <ref type="bibr" target="#b23">[24]</ref>, we create bins from the set of expression values of each gene within a given dataset, with the number of bins defined as a percentage of the number of unique values. A blank node is then generated to represent the expression value of a specific gene for a given patient: the patient is connected to the blank node, which in turn is linked to the gene and to the bin containing the expression value. Consider the following simplified example in RDF:</p><p>(:patientID, rdf:type, :Patient) (:geneID, rdf:type, :Gene) (:patientID, :hasExpression, _:x) (_:x, :isExpressionOfGene, :geneID) (_:x, :hasValue, :binID)</p><p>where _:x denotes a blank node.</p><p>• The second strategy links patients and genes based on expression values. 
A link between a patient and a gene is created when the patient's expression value for that gene is higher than the average expression value of that gene within the dataset.</p><p>The domain-specific knowledge includes the Gene Ontology (GO) <ref type="bibr" target="#b24">[25]</ref>, GO annotation data <ref type="bibr" target="#b25">[26]</ref>, and protein-protein interaction (PPI) data <ref type="bibr" target="#b26">[27]</ref>. The GO defines a hierarchy of classes describing protein functions, which can be represented as a graph whose nodes are GO classes and whose edges are the relationships between them. The GO covers three distinct domains: the biological processes a protein is involved in, the molecular functions a protein performs, and the cellular components where a protein is located. These three domains are represented by separate root classes, since they do not share any common ancestor. GO annotation data assign functions (GO classes) to proteins, and these assignments are represented as links in the graph. Finally, the PPI data is extracted from STRING <ref type="bibr" target="#b26">[27]</ref>, one of the largest available PPI databases, which integrates physical interactions and functional associations between proteins collected from several sources.</p><p>To bridge the two types of data sources, each gene in the expression data is mapped to a protein in the domain-specific KG. Online ID mapping tools, such as the UniProt ID Mapping tool<ref type="foot" target="#foot_0">1</ref>, are used to convert gene identifiers into protein identifiers.</p></div>
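<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two encodings of expression values can be sketched as plain Python functions that emit (subject, predicate, object) tuples. The predicates :hasExpression, :isExpressionOfGene and :hasValue come from the RDF example above; the link predicate :overExpresses, the bin naming scheme, equal-width binning, and the rule n_bins = ceil(pct * number of unique values) are hypothetical choices for illustration, and the implementation in the authors' repository may differ.</p><p>
```python
import math
from statistics import mean

def binning_triples(dataset, pct=0.5):
    """Strategy 1 (sketch): patient -> blank node -> gene and value bin.

    dataset maps patient IDs to {gene ID: expression value}. Bins are
    equal-width per gene; n_bins = ceil(pct * number of unique values)
    is an assumed reading of the binning rule, not the authors' exact code.
    """
    triples, counter = [], 0
    genes = sorted({g for row in dataset.values() for g in row})
    for patient in dataset:
        triples.append((f":{patient}", "rdf:type", ":Patient"))
    for gene in genes:
        triples.append((f":{gene}", "rdf:type", ":Gene"))
        values = [row[gene] for row in dataset.values() if gene in row]
        low, high = min(values), max(values)
        n_bins = max(1, math.ceil(pct * len(set(values))))
        width = (high - low) / n_bins or 1.0  # all values equal: single bin
        for patient, row in dataset.items():
            if gene not in row:
                continue  # gene not measured for this patient: no triples
            b = min(int((row[gene] - low) / width), n_bins - 1)  # clamp the maximum
            node = f"_:x{counter}"
            counter += 1
            triples += [
                (f":{patient}", ":hasExpression", node),
                (node, ":isExpressionOfGene", f":{gene}"),
                (node, ":hasValue", f":{gene}_bin{b}"),
            ]
    return triples

def linking_triples(dataset, predicate=":overExpresses"):
    """Strategy 2 (sketch): link a patient to a gene if its value exceeds the gene's mean."""
    triples = []
    genes = sorted({g for row in dataset.values() for g in row})
    for gene in genes:
        avg = mean(row[gene] for row in dataset.values() if gene in row)
        for patient, row in dataset.items():
            if gene in row and row[gene] > avg:
                triples.append((f":{patient}", predicate, f":{gene}"))
    return triples

# Hypothetical toy dataset: P3 has no measurement for GENE2.
data = {
    "P1": {"GENE1": 1.0, "GENE2": 4.0},
    "P2": {"GENE1": 3.0, "GENE2": 2.0},
    "P3": {"GENE1": 2.0},
}
for triple in binning_triples(data)[:8]:
    print(triple)
print(linking_triples(data))
# [(':P2', ':overExpresses', ':GENE1'), (':P1', ':overExpresses', ':GENE2')]
```
</p><p>In this sketch the bins are gene-specific (each gene gets its own bin nodes), matching the per-gene binning described above, and the average in the linking strategy is computed only over the patients in which the gene was measured.</p></div>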
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Learning Patient Representations</head><p>We generate patient representations by leveraging the information from multiple gene expression datasets and domain knowledge. As a preliminary step, the KG is converted into a directed, labeled RDF graph following the W3C's OWL to RDF Graph Mapping guidelines <ref type="foot" target="#foot_1">2</ref>. Next, our methodology employs RDF2Vec, a KG embedding method, to generate low-dimensional vector representations. RDF2Vec <ref type="bibr" target="#b21">[22]</ref> is a walk-based embedding method that generates random walks over a graph, taking into account both edge direction and edge type, which makes it particularly suited to KGs. The word2vec language model is then trained on these random walks to produce the embeddings.</p><p>Two distinct approaches are used to represent patients: the first generates RDF2Vec embeddings directly for the patients in the KG, while the second generates RDF2Vec embeddings for the genes in the gene expression datasets and represents each patient as the average of the gene embeddings, weighted by the patient's expression values for those genes.</p></div>
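<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two ingredients of this step can be sketched without an embedding library: RDF2Vec-style random walks that follow edge direction and record edge labels (the token sequences a word2vec model, for example gensim's Word2Vec class, would be trained on), and the second patient representation, a weighted average of gene vectors with the patient's expression values as weights. The graph, the predicate :mapsTo, the walk settings and the two-dimensional gene vectors are hypothetical; real gene vectors would come from training word2vec on the walks.</p><p>
```python
import random

def random_walks(edges, start, n_walks=2, depth=4, seed=0):
    """RDF2Vec-style walks: follow outgoing edges only and keep predicate labels.

    edges maps a subject to a list of (predicate, object) pairs. Each walk is
    a token list entity, predicate, entity, ...
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            out = edges.get(node, [])
            if not out:
                break  # dead end: stop this walk early
            pred, node = rng.choice(out)
            walk += [pred, node]
        walks.append(walk)
    return walks

def weighted_patient_vector(expression, gene_vectors):
    """Average of gene vectors weighted by the patient's expression values.

    Genes without a vector are skipped; weights are assumed non-negative and
    at least one gene is assumed to have a vector.
    """
    pairs = [(w, gene_vectors[g]) for g, w in expression.items() if g in gene_vectors]
    total = sum(w for w, _ in pairs)
    dim = len(pairs[0][1])
    return [sum(w * v[i] for w, v in pairs) / total for i in range(dim)]

# Hypothetical toy graph and gene vectors.
edges = {
    ":P1": [(":overExpresses", ":GENE1")],
    ":GENE1": [(":mapsTo", ":PROT1")],
    ":PROT1": [(":annotatedWith", ":GO_term_1"), (":interactsWith", ":PROT2")],
}
for walk in random_walks(edges, ":P1"):
    print(" ".join(walk))

vectors = {"GENE1": [1.0, 0.0], "GENE2": [0.0, 1.0]}
print(weighted_patient_vector({"GENE1": 3.0, "GENE2": 1.0, "GENE9": 5.0}, vectors))
# [0.75, 0.25]
```
</p><p>Weighting by expression values means that highly expressed genes pull the patient vector towards their own embeddings, while genes that the patient's dataset did not measure simply do not contribute.</p></div>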
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Predicting Diabetes</head><p>Diabetes prediction is formulated as a binary classification task: each patient is classified as having diabetes or not. In the final step, the patient representations are therefore used to train a decision tree <ref type="bibr" target="#b27">[28]</ref> classifier.</p></div>
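<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because the patient representations are plain numeric vectors, any standard classifier can consume them. For illustration only, here is a small stdlib-only decision tree (Gini impurity, axis-aligned thresholds, a depth limit) standing in for a library implementation such as scikit-learn's DecisionTreeClassifier; the depth limit and the toy two-dimensional vectors are hypothetical and are not the settings used in the paper.</p><p>
```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(X, y, f, t):
    """Split rows into (x[f] not greater than t) and (x[f] greater than t)."""
    left = [(xi, yi) for xi, yi in zip(X, y) if not xi[f] > t]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return left, right

def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left, right = partition(X, y, f, t)
            if not left or not right:
                continue
            score = (len(left) * gini([b for _, b in left])
                     + len(right) * gini([b for _, b in right])) / len(y)
            if best_score > score:  # keep only strict improvements
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=3):
    """Leaf = majority label; internal node = (feature, threshold, left, right)."""
    split = best_split(X, y) if depth > 0 else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left, right = partition(X, y, f, t)
    return (f, t,
            build_tree([a for a, _ in left], [b for _, b in left], depth - 1),
            build_tree([a for a, _ in right], [b for _, b in right], depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf label is reached (labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = right if x[f] > t else left
    return tree

# Toy 2-D "patient vectors" with labels 1 = T1D, 0 = non-T1D (made-up values).
X = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.2], [0.9, 0.1]]
y = [0, 0, 1, 1]
tree = build_tree(X, y)
print(tree)                        # (0, 0.2, 0, 1)
print(predict(tree, [0.15, 0.5]))  # 0
print(predict(tree, [0.7, 0.3]))   # 1
```
</p><p>A decision tree splits on one embedding dimension at a time, which keeps the classifier simple but means the learned rules refer to latent dimensions rather than to individual genes.</p></div>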
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Data</head><p>Three diabetes-related GEO datasets (GSE15932, GSE30208, and GSE55098) are considered in this work (Table <ref type="table" target="#tab_0">1</ref>). These datasets comprise samples from two groups: patients diagnosed with type 1 diabetes (T1D) and control subjects (non-T1D). The data from the three datasets are integrated into a single KG, described in Table <ref type="table" target="#tab_1">2</ref>.</p></div>
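<div xmlns="http://www.tei-c.org/ns/1.0"><p>The shared-gene columns of Table 1 can be read as pairwise intersections of the gene sets measured in each dataset. The sketch below computes such a matrix for made-up gene sets; computed this way, the diagonal is each dataset's own gene count, which is how we read the diagonal entries of Table 1 (an interpretation, not something the table states). A zero off-diagonal entry, as for GSE30208, means two datasets have no feature in common and cannot be aligned column by column.</p><p>
```python
from itertools import combinations_with_replacement

def shared_gene_matrix(gene_sets):
    """Pairwise number of shared genes; the diagonal is each dataset's own gene count."""
    names = list(gene_sets)
    matrix = {a: {} for a in names}
    for a, b in combinations_with_replacement(names, 2):
        n = len(gene_sets[a].intersection(gene_sets[b]))
        matrix[a][b] = matrix[b][a] = n
    return matrix

# Hypothetical gene sets (not the real GEO gene lists).
gene_sets = {
    "D1": {"G1", "G2", "G3"},
    "D2": {"G4", "G5"},
    "D3": {"G2", "G3", "G4"},
}
m = shared_gene_matrix(gene_sets)
for name in gene_sets:
    print(name, [m[name][other] for other in gene_sets])
# D1 [3, 0, 2]
# D2 [0, 2, 1]
# D3 [2, 1, 3]
```
</p></div>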
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results and Discussion</head><p>To assess the efficacy of the proposed methodology, we analysed the diabetes prediction performance on the GSE30208 dataset when enriching the training data with information from the GSE15932 and GSE55098 datasets. Since our approach integrates data from multiple expression datasets into a KG, we compare it against two baselines that use the patients' expression values directly as input to the classifier. The first baseline uses only data from GSE30208 to train the classifier. The second baseline is a more simplistic way of adding information from other datasets: it takes the union of all genes measured across datasets and sets the value to 0 when a patient has no expression value for a gene.</p><p>For a robust evaluation, we employed stratified five-fold cross-validation, dividing the GSE30208 dataset into five folds. The same five folds were used in all experiments, and the reported results are averages over these five folds. Figure <ref type="figure" target="#fig_2">3</ref> illustrates the cross-validation strategy. Table <ref type="table">3</ref> shows the accuracy, precision, recall, f-measure, weighted average f-measure, and area under the ROC curve for the baselines and the proposed methodology. The results of the second baseline indicate that naively adding information from other datasets does not improve performance; in fact, it appears to introduce noise. This outcome is not unexpected: the information from the different datasets is not actually integrated, since genes not measured in a dataset are simply filled with zeros. In contrast, when the information from the other datasets is integrated through a KG, training on multiple datasets improves performance on all metrics except precision. 
This supports our hypothesis that adding other expression datasets can improve the performance of machine learning models.</p><p>However, performance varies between the alternatives of our approach. To integrate expression data into the KG, we compare blank nodes with binning against a linking method that connects patients and genes based on expression values. To generate patient representations, we employ two strategies: learning embeddings for patients directly in the KG, or learning embeddings for genes and representing patients as the weighted average of gene embeddings. The latter strategy is independent of how the expression data is represented in the KG, so Table <ref type="table">3</ref> presents only three alternatives.</p><p>Table <ref type="table">3</ref>: Average diabetes prediction performance on the GSE30208 dataset for the baselines and the proposed methodology. Acc stands for accuracy, Pr for precision, Re for recall, F1 for f-measure, WAF for weighted average f-measure, and AUC for area under the ROC curve. For each metric, the best value is in bold.</p></div>
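<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation protocol can be sketched as follows: a stratified five-fold split of the target dataset GSE30208, with the samples of the other datasets added only to the training side, plus the six reported metrics. Fold assignment by shuffled round-robin within each class, the seed, the helper names, and computing AUC from classifier scores with the rank-based (Mann-Whitney) formulation are our assumptions; the paper does not specify these details. The class sizes in the example are those of GSE30208 in Table 1.</p><p>
```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=42):
    """Split sample indices into k folds that roughly preserve the class ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # continue round-robin across classes to balance fold sizes
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds

def train_test_split_with_extra(target, extra, folds, test_fold):
    """Test on one fold of the target dataset; train on the other folds plus all extra samples."""
    test_idx = set(folds[test_fold])
    train = [s for i, s in enumerate(target) if i not in test_idx] + list(extra)
    test = [target[i] for i in sorted(test_idx)]
    return train, test

def per_class_prf(y_true, y_pred, cls):
    """Precision, recall and F1 treating cls as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    return pr, re, f1

def evaluate(y_true, y_pred, scores):
    """Acc, Pr, Re, F1 (positive class 1), WAF and AUC; both classes must be present."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    pr, re, f1 = per_class_prf(y_true, y_pred, 1)
    # WAF: per-class F1 weighted by the number of true samples of each class.
    waf = sum(y_true.count(c) * per_class_prf(y_true, y_pred, c)[2] for c in set(y_true)) / len(y_true)
    # AUC: chance that a random positive outscores a random negative (ties count 0.5).
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg) / (len(pos) * len(neg))
    return {"Acc": acc, "Pr": pr, "Re": re, "F1": f1, "WAF": waf, "AUC": auc}

# GSE30208 class sizes from Table 1: 37 T1D (label 1) and 26 non-T1D (label 0).
labels = [1] * 37 + [0] * 26
folds = stratified_folds(labels)
print([len(f) for f in folds])  # [13, 13, 13, 12, 12]
# 22 + 16 extra samples from GSE15932 and GSE55098 join every training split.
train, test = train_test_split_with_extra(list(range(63)), ["extra"] * 38, folds, 0)
print(len(train), len(test))    # 88 13
print(evaluate([1, 1, 0, 0], [1, 0, 0, 0], [0.9, 0.4, 0.3, 0.1]))
```
</p><p>Averaging the per-fold dictionaries metric by metric then gives numbers in the format of Table 3.</p></div>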
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Comparing the results in Table <ref type="table">3</ref>, representing patients as the weighted average of gene embeddings emerges as particularly promising, as it consistently outperforms the other alternatives. Linking patients and genes based on expression values is the second-best strategy and still improves performance on several metrics compared to the baselines. The binning approach achieves the worst results, performing worse than the baseline on many metrics. This may be attributed to an inherent limitation of our walk-based embedding method: genes and their expression values lie on separate paths, because a directed walk that reaches a blank node continues either to the gene or to the bin, never to both. To investigate the impact of domain-specific knowledge on integrating data from different datasets, we also evaluated diabetes prediction using a KG that contains only gene expression data. Figure <ref type="figure" target="#fig_4">4</ref> illustrates the performance differences between a KG with domain knowledge alongside expression data and a KG with expression data alone. For both KG construction strategies, performance decreases when the domain knowledge is removed. This indicates that knowledge about protein functions and interactions can play an important role in integrating data from datasets that measure the expression of different genes.</p></div>
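<div xmlns="http://www.tei-c.org/ns/1.0"><p>The explanation offered for the binning results can be checked on a toy example. The sketch enumerates every directed walk from a patient in a hypothetical binning-style graph (node and predicate names, including :mapsTo, are illustrative) and counts the walks that contain the gene, its value bin, or both. Because a walk must choose a single outgoing edge at the blank node, a word2vec model trained on such walks never sees the gene and its value bin within the same walk.</p><p>
```python
def all_walks(edges, start, depth):
    """Enumerate every directed walk of up to depth hops, keeping entity tokens only."""
    finished, frontier = [], [[start]]
    for _ in range(depth):
        extended = []
        for walk in frontier:
            out = edges.get(walk[-1], [])
            if not out:
                finished.append(walk)  # dead end: the walk stops here
            extended += [walk + [obj] for _, obj in out]
        frontier = extended
    return finished + frontier

def walks_containing(walks, *tokens):
    """Number of walks that contain all of the given tokens."""
    return sum(1 for w in walks if all(t in w for t in tokens))

# Hypothetical binning-style graph for one patient and one gene.
binning = {
    ":P1": [(":hasExpression", "_:x0")],
    "_:x0": [(":isExpressionOfGene", ":GENE1"), (":hasValue", ":GENE1_bin2")],
    ":GENE1": [(":mapsTo", ":PROT1")],
}
walks = all_walks(binning, ":P1", depth=4)
for w in walks:
    print(" -> ".join(w))
print(walks_containing(walks, ":GENE1"))                 # 1
print(walks_containing(walks, ":GENE1_bin2"))            # 1
print(walks_containing(walks, ":GENE1", ":GENE1_bin2"))  # 0
```
</p><p>Walk strategies that also traverse edges backwards could behave differently, since they can return from the gene to the blank node and then reach the bin.</p></div>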
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>Several diabetes prediction approaches rely on the analysis of expression data, which provides a detailed molecular profile of gene activity and regulation and can therefore uncover relationships between specific genes and the development of diabetes. However, using expression data in machine learning presents its own challenges. Existing diabetes-related expression datasets have very few samples, which limits data-driven methods such as machine learning algorithms. Integrating multiple expression datasets can address the issue of limited samples and, at the same time, offer a more comprehensive perspective on the complex factors influencing diabetes.</p><p>We have developed an approach that enables a comprehensive representation of gene expression data from different datasets within a KG. Through semantic links and domain-specific knowledge, KGs can create a unified knowledge space that connects datasets from distinct studies. In this work, we explored different strategies to include expression data in the KG and different strategies to represent patients using KG embedding methods. Our experiments showed that integrating gene expression data from multiple datasets in a KG can improve diabetes prediction performance.</p><p>The proposed approach is versatile and can be extended to the prediction of other diseases. 
In addition, since graph neural networks have recently gained substantial traction, in future work we aim to investigate how these architectures, which are explicitly designed for graph-structured data, can be used instead of the conventional pipeline of generating embeddings and feeding them to classical machine learning methods such as decision trees.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the proposed methodology with the main steps: building the KG, learning patient representations and predicting diabetes.</figDesc><graphic coords="4,89.29,84.19,416.68,137.16" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Schema of the two types of data sources and how they are integrated into the KG.</figDesc><graphic coords="5,110.13,84.19,375.02,225.94" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Experimental strategy used to split the GSE30208 dataset and enrich it with data from the GSE15932 and GSE55098 datasets.</figDesc><graphic coords="7,162.21,328.64,270.86,123.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) Using binning approach (b) Using patient-gene links</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Performance comparison between KGs with and without domain knowledge, built with two approaches: (a) binning and (b) patient-gene links. Acc stands for accuracy, Pr for precision, Re for recall, F1 for f-measure, WAF for weighted average f-measure, and AUC for area under the ROC curve.</figDesc><graphic coords="9,89.29,84.19,206.26,154.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Number of samples, number of shared genes across different datasets, and references for each GEO dataset.</figDesc><table><row><cell>Dataset</cell><cell cols="3">Number of samples</cell><cell cols="3">Number of shared genes</cell><cell>Refs.</cell></row><row><cell></cell><cell>Total</cell><cell>T1D</cell><cell>non-T1D</cell><cell>GSE30208</cell><cell>GSE15932</cell><cell>GSE55098</cell><cell></cell></row><row><cell>GSE30208</cell><cell>63</cell><cell>37</cell><cell>26</cell><cell>368</cell><cell>0</cell><cell>0</cell><cell>[29, 30]</cell></row><row><cell>GSE15932</cell><cell>22</cell><cell>12</cell><cell>10</cell><cell>0</cell><cell>764</cell><cell>337</cell><cell>[31]</cell></row><row><cell>GSE55098</cell><cell>16</cell><cell>8</cell><cell>8</cell><cell>0</cell><cell>337</cell><cell>764</cell><cell>[32, 33]</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Number of triples, types of relations, GO classes and proteins in the KG.</figDesc><table><row><cell></cell><cell>Number</cell></row><row><cell>Triples</cell><cell>2433477</cell></row><row><cell>Types of relations</cell><cell>56</cell></row><row><cell>GO classes</cell><cell>51375</cell></row><row><cell>Proteins</cell><cell>19169</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.uniprot.org/id-mapping</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://www.w3.org/TR/owl2-mapping-to-rdf/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The work presented in this paper has been partly funded by the German Federal Ministry of Education and Research under grant number 13GW0661C (KI-DiabetesDetektion).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Care in diabetes-2022</title>
		<author>
			<persName><forename type="first">D</forename><surname>Care</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Diabetes care</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page">S17</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A review on current advances in machine learning based diabetes prediction</title>
		<author>
			<persName><forename type="first">V</forename><surname>Jaiswal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Negi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Primary Care Diabetes</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="435" to="443" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Diabetes prediction using different machine learning approaches</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sonar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jayamalini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 3rd International Conference on Computing Methodologies and Communication (ICCMC)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<biblScope unit="page" from="367" to="371" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Diabetes prediction using machine learning algorithms</title>
		<author>
			<persName><forename type="first">A</forename><surname>Mujumdar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vaidehi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">165</biblScope>
			<biblScope unit="page" from="292" to="299" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Diabetes prediction using ensembling of different machine learning classifiers</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Hasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hossain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hasan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="76516" to="76531" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Personalized diabetes management using electronic medical records</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bertsimas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kallus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Weinstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">D</forename><surname>Zhuo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Diabetes Care</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="210" to="217" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Prediction of type II diabetes onset with computed tomography and electronic medical records</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">S</forename><surname>Wells</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Spann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Terry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Carr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Landman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multimodal Learning for Clinical Decision Support and Clinical Image-Based Procedures: 10th International Workshop, ML-CDS 2020, and 9th International Workshop, CLIP 2020, Held in Conjunction with MICCAI 2020</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="13" to="23" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning temporal state of diabetes patients via combining behavioral and demographic data</title>
		<author>
			<persName><forename type="first">H</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Vu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Turaga</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</title>
		<meeting>the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="2081" to="2089" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Uncovering the gene regulatory network of type 2 diabetes through multi-omic data integration</title>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Translational Medicine</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page">604</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Knowledge graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blomqvist</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cochez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>de Melo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gutierrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kirrane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Labra Gayo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Navigli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neumaier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="1" to="37" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Biomedical ontologies: a functional perspective</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Rubin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">H</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">F</forename><surname>Noy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Briefings in Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="75" to="90" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding: A survey of approaches and applications</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="2724" to="2743" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Semantic similarity and machine learning with ontologies</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kulmanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Z</forename><surname>Smaili</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hoehndorf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Briefings in Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page">199</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Identification of type 2 diabetes based on a ten-gene biomarker prediction model constructed using a support vector machine algorithm</title>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>BioMed Research International</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Downregulation of long non-coding RNAs LINC00523 and LINC00994 in type 2 diabetes in an Iranian cohort</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Mansoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ghaedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sadatamini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vahabpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rahimipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shanaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saeidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kazerouni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Biology Reports</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="1227" to="1233" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Type 2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches</title>
		<author>
			<persName><forename type="first">F</forename><surname>Kazerouni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bayani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Asadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saeidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parvizi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mansoori</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Long non-coding RNA LY86-AS1 and HCG27_201 expression in type 2 diabetes mellitus</title>
		<author>
			<persName><forename type="first">L</forename><surname>Saeidi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ghaedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sadatamini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vahabpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rahimipour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shanaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mansoori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Kazerouni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Molecular Biology Reports</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="2601" to="2608" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Gene expression profiling of type 2 diabetes mellitus by bioinformatics analysis</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational and Mathematical Methods in Medicine</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Benchmark and best practices for biomedical knowledge graph embeddings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Balažević</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Allen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Brandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Taylor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the conference. Association for Computational Linguistics. Meeting</title>
		<meeting>the conference. Association for Computational Linguistics. Meeting</meeting>
		<imprint>
			<publisher>NIH Public Access</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page">167</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Translating embeddings for modeling multi-relational data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garcia-Durán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Yakhnenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NIPS 2013</title>
		<meeting>NIPS 2013<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2787" to="2795" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Embedding entities and relations for learning and inference in knowledge bases</title>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">RDF2Vec: RDF graph embeddings for data mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ristoski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th International Semantic Web Conference</title>
		<meeting>the 15th International Semantic Web Conference<address><addrLine>Cham, Switzerland</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="498" to="514" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">NCBI GEO: archive for gene expression and epigenomics data sets: 23-year update</title>
		<author>
			<persName><forename type="first">E</forename><surname>Clough</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Barrett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Wilhite</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ledoux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Evangelista</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">F</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tomashevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Marshall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">H</forename><surname>Phillippy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Sherman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="D138" to="D144" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Universal preprocessing operators for embedding knowledge graphs with literals</title>
		<author>
			<persName><forename type="first">P</forename><surname>Preisner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">The Gene Ontology resource: enriching a GOld mine</title>
		<author>
			<persName><surname>The Gene Ontology Consortium</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="D325" to="D334" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">The GOA database: gene ontology annotation updates for 2015</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Huntley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sawford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mutowo-Meullenet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shypitsyna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bonilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>O'Donovan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">43</biblScope>
			<biblScope unit="page" from="D1057" to="D1063" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets</title>
		<author>
			<persName><forename type="first">D</forename><surname>Szklarczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">L</forename><surname>Gable</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Nastou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lyon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kirsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">T</forename><surname>Doncheva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Legeay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="D605" to="D612" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Induction of decision trees</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Quinlan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="81" to="106" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<ptr target="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30208" />
		<title level="m">Series GSE30208</title>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Innate immune activity is detected prior to seroconversion in children with HLA-conferred type 1 diabetes susceptibility</title>
		<author>
			<persName><forename type="first">H</forename><surname>Kallionpää</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">L</forename><surname>Elo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Laajala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mykkänen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ricano-Ponce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vaarma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">D</forename><surname>Laajala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hyöty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ilonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Veijola</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Diabetes</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="2402" to="2414" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<ptr target="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15932" />
		<title level="m">Series GSE15932</title>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<ptr target="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE55098" />
		<title level="m">Series GSE55098</title>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Decreased miR-146 expression in peripheral blood mononuclear cells is correlated with ongoing islet autoimmunity in type 1 diabetes patients</title>
		<author>
			<persName><forename type="first">M</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Ning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Diabetes</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="158" to="165" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
