<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>COVIDGraph: Connecting biomedical COVID-19 resources and computational biology models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lea Gütebier</string-name>
          <email>lea.guetebier@stud.uni-greifswald.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Bleimehl</string-name>
          <email>tim.bleimehl@helmholtz-muenchen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ron Henkel</string-name>
          <email>ron.henkel@uni-greifswald.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Müller</string-name>
          <email>sebastian.mueller@yworks.com</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Jarasch</string-name>
          <email>jarasch@dzd-ev.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jamie Munro</string-name>
          <email>jamie@munro.consulting</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Preusse</string-name>
          <email>martin@kaiser-preusse.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dagmar Waltemath</string-name>
          <email>dagmar.waltemath@uni-greifswald.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>German Center for Diabetes Research</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>HealthEcco Team, Kaiser &amp; Preusse</institution>
          ,
          <addr-line>Freiburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Munro Consulting</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University Medicine Greifswald</institution>
          ,
          <addr-line>Greifswald</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>yWorks</institution>
          ,
          <addr-line>Tübingen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The COVID-19 pandemic has changed life across the globe. In January 2020, little was known about SARS-CoV-2, but the rapidly increasing number of infections and the uncontrolled spread demanded fast medical action. Within a year, over 4 million publications relating to COVID-19 appeared in the scientific literature. Additionally, patents have been registered, ontologies have been extended, simulation studies for the prediction of disease spread and underlying bioinformatics mechanisms have been built, and health studies have been designed. To support the exploration of COVID-19 data, the CovidGraph project was initiated as a non-profit, collaborative and open project driven by researchers, software developers, data scientists and medical professionals. In this article we outline the history, goals and scope of CovidGraph. Using the example of computational biology models, we show how additional resources can be integrated with the knowledge graph to extend the scope of CovidGraph, for example, to systems biology data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021
for the volume as a collection by its editors. This volume and its papers are published
under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and
Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021,
Copenhagen, Denmark) on CEUR-WS.org.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>CovidGraph is a research and communication platform that
encompasses publications, case statistics, genes and functions, molecular
data and more. It is developed and maintained by HealthECCO, a
non-profit collaboration of researchers, software developers, data
scientists and medical professionals (https://healthecco.org/). Our
aim is to help researchers quickly and efficiently find their way
through COVID-19 datasets using tools that implement artificial
intelligence methods, advanced visualisation techniques, and
intuitive user interfaces. Through CovidGraph, users can explore papers,
patents, treatments and medications covering the family of
coronaviruses. In addition to literature data, we connect information on
biological entities, namely genes, proteins and their functions,
thereby spanning a network of unparalleled size and knowledge. The latest
addition to CovidGraph is a collection of systems biology models (Fig. 1).</p>
      <p>
        In recent years, NoSQL approaches such as key-value stores,
BigTable, document databases, triple stores, or graph databases
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], together with semantic web applications, have become more
popular within the life sciences. Graph databases offer a storage
concept based on nodes, (directed) edges, properties and labels. Nodes
can be labelled and are connected by edges, and both nodes and edges can
contain properties. Graph databases also allow easy horizontal scaling and fast
graph traversal. Finally, graph databases are schema-optional,
a feature that is much appreciated when storing heterogeneous,
highly connected, cross-domain data items from different sources.
The HealthECCO project integrates such heterogeneous resources
and compiles a knowledge base targeted at COVID-19 data
(https://healthecco.org/covidgraph/), and potentially other diseases in
future versions. The underlying graph database is Neo4j [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
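      <p>As a minimal illustration of the storage concept described above, the following Python sketch models a labelled property graph in memory. It is not Neo4j code; the class names, the relationship types PAPER_HAS_AUTHOR and MENTIONS, and all property values are hypothetical and chosen for illustration only.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with a set of labels and free-form properties."""
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed edge that may carry its own properties."""
    source: int
    target: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory labelled property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}  # node_id -> list of outgoing Edge objects

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, frozenset(labels), properties)
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, target, rel_type, **properties):
        self.outgoing[source].append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Follow outgoing edges, optionally filtered by relationship type."""
        return [
            self.nodes[edge.target]
            for edge in self.outgoing[node_id]
            if rel_type is None or edge.rel_type == rel_type
        ]


# Hypothetical example data: nodes of different kinds need no shared schema.
graph = PropertyGraph()
graph.add_node(1, ["Paper"], title="Example paper")
graph.add_node(2, ["Author"], name="Example author")
graph.add_node(3, ["Gene", "BioMedical"], symbol="ACE2")
graph.add_edge(1, 2, "PAPER_HAS_AUTHOR")
graph.add_edge(1, 3, "MENTIONS", evidence="illustrative")

authors = graph.neighbours(1, rel_type="PAPER_HAS_AUTHOR")
print([author.properties["name"] for author in authors])
```
      </preformat>
      <p>Because every node carries its own property dictionary, a Gene node and a Paper node can coexist without a shared schema, which corresponds to the schema-optional behaviour mentioned above.</p>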
    </sec>
    <sec id="sec-3">
      <title>DATA RESOURCES</title>
      <p>
        Previous versions of the CovidGraph already integrated data from
five categories (Fig. 2 (A)): Patents, Papers, BioMedical
(ontologies and controlled vocabularies), Clinical Trials, and Statistical &amp;
Geographic. Categories are cross-linked by relationships. For
example, items from the "Papers" category are linked to items from
the "Patents" category. One paper source is the COVID-19 Open
Research Dataset (CORD-19), a collection of research papers relating
to COVID-19 (and coronaviruses) [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. It is the main data source
for information about papers in the CovidGraph and contains
publications from PubMed, medRxiv and bioRxiv. Papers and related
information are stored and linked in multiple nodes in the
CovidGraph. Each paper node has author nodes connected to affiliation
nodes that, in turn, are linked to location nodes. Papers can be linked
to COVID-19 patents. The Lens (https://about.lens.org/covid-19/)
provides datasets of patent documents and literature concerning
human coronaviruses and COVID-19. The CovidGraph furthermore
contains information about clinical COVID-19 studies from the
ClinicalTrials.gov registry. Studies are represented as clinical trial
nodes, which are linked to multiple other nodes representing more
detailed information about each study. Also included in the
CovidGraph are case statistics and case data from Johns Hopkins
University [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and population estimates from the United Nations World
Population Prospects (https://population.un.org/wpp/). The corresponding
node types include city, country, province, daily report and age group.
Biomedical data encodes information about genes, proteins, pathways and
different diseases associated with COVID-19. The data comprises
information from various biological and biomedical resources and
is connected to Gene Ontology terms. The Gene Ontology is a
resource for the computational representation of the function of genes
and gene products [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Information about genes from the NCBI
Gene Database [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is stored in Gene nodes, which are connected
to other nodes describing the underlying biology. These
connected nodes include gene symbols according to the Ensembl
genome browser, a genome database [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The gene symbols are
mapped to synonyms. Since genes are expressed in various tissues,
the gene nodes are linked to GTEx tissue nodes containing gene
expression data from the GTEx Portal [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For genes that are part
of a pathway, a relationship exists between the corresponding
gene node and pathway node. The data included in the
CovidGraph describes which genes are members of a pathway according
to the Reactome pathway knowledgebase, a database of
molecular information about biological pathways [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. During transcription
and translation in humans, genes code
for transcripts, which in turn code for proteins. In the CovidGraph
these processes are described by relationships between gene nodes,
transcript nodes and protein nodes. The data for the transcript
nodes is taken from the NCBI Reference Sequence Database [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ];
the Universal Protein Resource (UniProt) provides a resource of
protein sequences and annotation data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Proteins associated with
annotation data from the Gene Ontology are linked to GO term
nodes. The last node type connected to gene nodes is the disease
node. Disease nodes are in turn associated with anatomy nodes. The
corresponding data is provided by Hetionet, an integrative network
of biomedical data including connections between diseases and
anatomies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Knowledge is primarily centred around the domain of
coronaviruses but is steadily extended to other connected diseases as part
of the HealthECCO project. The latest addition to CovidGraph is a
resource of computational biology models. We will introduce the
systems biology node in detail in Section 4.</p>
    </sec>
    <sec id="sec-4">
      <title>COVIDGRAPH FRAMEWORK</title>
      <p>
        The CovidGraph infrastructure is built as a labelled property graph
based on the Neo4j Enterprise edition v4.2. Textual information,
such as publications, clinical studies or ontology term descriptions,
is processed and enriched by a pipeline based on natural
language processing and named entity recognition (BioBERT [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]).
At the time of writing, the graph contains 36 million nodes and 59 million
relationships and is still growing, as the modular software framework
encourages the addition and integration of new data sources. On the server side,
CovidGraph relies on Docker containers. To integrate a new data
source, it needs to be wrapped in a container and it needs to
provide information such as connection data and mapping information
(https://github.com/covidgraph/data_template). An ETL process
(https://git.connect.dzd-ev.de/dzdtools/motherlode) subsequently
extracts the data from the new source, transforms it in
accordance with the provided mapping information, and loads it
into the main CovidGraph.
      </p>
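      <p>The extract, transform and load steps can be sketched as follows. This is a simplified, in-memory illustration and not the actual motherlode pipeline; the function names, the shape of the mapping dictionary and the sample records are assumptions made for this example.</p>
      <preformat>
```python
def extract(source_rows):
    """Extract: yield raw records from the wrapped data source."""
    for row in source_rows:
        yield dict(row)


def transform(record, mapping):
    """Transform: rename source fields to graph properties using the mapping."""
    return {
        "label": mapping["label"],
        "properties": {
            target: record[source]
            for source, target in mapping["fields"].items()
            if source in record
        },
    }


def load(graph, node):
    """Load: append the node to the target graph (here a plain dictionary)."""
    graph.setdefault(node["label"], []).append(node["properties"])


def run_etl(source_rows, mapping, graph):
    """Run the three steps in sequence for every record of one data source."""
    for record in extract(source_rows):
        load(graph, transform(record, mapping))
    return graph


# Hypothetical source rows and mapping information (assumed format).
rows = [
    {"pmid": "32289100", "name": "Example publication"},
    {"pmid": "00000000", "name": "Another publication", "unused": "x"},
]
mapping = {
    "label": "Publication",
    "fields": {"pmid": "pubmed_id", "name": "title"},
}

graph = run_etl(rows, mapping, {})
print(graph["Publication"][0])
```
      </preformat>
      <p>Keeping the mapping separate from the code mirrors the idea that each new source only has to supply connection and mapping information, while the ETL steps themselves stay unchanged.</p>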
    </sec>
    <sec id="sec-5">
      <title>INTEGRATION OF SIMULATION STUDIES</title>
      <p>
        Via the aforementioned ETL-process, we connected the
CovidGraph and the Management System for Models and Simulations
(MaSyMoS, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). MaSyMoS is a Neo4j graph database for storing
and retrieving data items describing biomedical simulation studies.
The data is extracted from repositories for computational biology
models (BioModels [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Physiome Model Repository 2 [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ])
and integrated in a single graph (Fig. 2 (B)). We consider a
computational biology model to be a mathematical model written in a
formal, machine-readable language, such that it can be systematically
parsed and employed by simulation and analysis software without
further human translation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. A biomedical simulation study is
considered to be any calculation performed on a model that describes the
evolution of the represented biological system, for instance, over
spatial and/or temporal dimensions [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. MaSyMoS links simulation
studies, their results and corresponding models. Curated
simulation studies are furthermore annotated with meta-data, primarily
reference publications and ontological terms from bio-ontologies
[
        <xref ref-type="bibr" rid="ref11 ref4 ref5 ref6">4–6, 11</xref>
        ]. MaSyMoS provides access to over 1000 manually
curated simulation studies originally published in BioModels. This set
contains highly curated studies targeting the COVID-19 disease and
its spread (https://www.ebi.ac.uk/biomodels/covid-19). The
resulting knowledge graph offers domain-specific retrieval and similarity
measures, and it enables efficient access and reuse. As all models
have been shown to reproduce the published results, they are a
valuable resource for biomedical investigations.
      </p>
      <p>The integration of MaSyMoS data with CovidGraph was
twofold. First, we matched papers (publications) from both domains.
Then we connected biomedical ontology terms from both resources,
thereby linking disease knowledge and biomedical simulation
studies. The Paper data set (cf. Fig. 2 (A)) in CovidGraph is represented
by different nodes (e.g., the abstract, authors, paper ID). In MaSyMoS
a paper is represented by a single publication node containing the
same set of information about a publication.
Consequently, we mapped the corresponding IDs (PubMed ID and DOI)
from CovidGraph paper ID nodes and MaSyMoS publication nodes,
thus connecting relevant publications from both data sets. This
mapping resulted in 19 connections. This result is within our expected
range, as the underlying publication corpora cover different areas of
interest (e.g. cell cycle, MAPK and apoptosis for simulation models;
clinical trials, respiratory studies and diseases for CovidGraph).
The BioMedical data set in the CovidGraph represents different
ontologies with relevance for COVID-19 research. These ontologies
potentially connect to and overlap with the ontological terms used
to annotate simulation studies in MaSyMoS (cf. Fig. 2 (B)). Our
analyses showed that most overlap can be observed in gene
information, chemical entities, proteins and diseases. Consequently, we
mapped ontological terms in MaSyMoS and CovidGraph for Gene
Ontology (1810 connections), ChEBI (1211 connections), UniProt
(911 connections) and Disease Ontology (72 connections) by their
IDs (cf. Fig. 1). For Gene Ontology, ChEBI and Disease
Ontology, more than 94% of the terms stored in MaSyMoS were connected
to terms in the CovidGraph. The UniProt coverage reached 41%.</p>
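      <p>The ID-based matching and the notion of coverage used above can be illustrated with a short sketch. It assumes that identifiers from both resources are available as plain strings; the normalisation rules, the function names and the example identifiers are hypothetical, and the actual integration operates on the Neo4j graphs rather than on Python sets.</p>
      <preformat>
```python
def normalise(identifier):
    """Trim, lower-case and drop common resolver prefixes (assumed rules)."""
    value = identifier.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:", "pmid:"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    return value


def match_by_id(masymos_ids, covidgraph_ids):
    """Return the shared identifiers and the share of MaSyMoS IDs matched."""
    left = {normalise(i) for i in masymos_ids}
    right = {normalise(i) for i in covidgraph_ids}
    shared = left.intersection(right)
    coverage = len(shared) / len(left) if left else 0.0
    return shared, coverage


# Illustrative identifier lists; real data would be read from the two graphs.
masymos_terms = ["GO:0006915", "GO:0007049", "CHEBI:15377", "DOID:0080600"]
covidgraph_terms = ["go:0006915", "GO:0007049 ", "DOID:0080600", "GO:0008150"]

shared, coverage = match_by_id(masymos_terms, covidgraph_terms)
print(sorted(shared), f"{coverage:.0%}")
```
      </preformat>
      <p>Coverage is computed here relative to the MaSyMoS side, matching the way the percentages above are stated (terms stored in MaSyMoS that were connected to terms in the CovidGraph).</p>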
      <p>
        Example: COVID-19 spread in Wuhan city. The simulation study
by Roda et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] investigates the COVID-19 spread in Wuhan
city at the beginning of 2020. Figure 3 shows a Neo4j excerpt of the
model in MaSyMoS and the association with disease information in
the CovidGraph. The association is built from a matching reference
publication and a matching ontology entry from the Disease
Ontology. More specifically, the model (in the middle, dark
green) is linked to several resources (pink). For example, one annotation
refers to an ontology term from the Disease Ontology and is
associated with the corresponding entry in the CovidGraph (on the right,
brown). Another example is the reference publication, which links
to the corresponding publication in the CovidGraph (on the right,
blue). We consider this example a first step towards bridging the
gap between medical research and systems biology.
      </p>
    </sec>
    <sec id="sec-6">
      <title>TAKEAWAYS &amp; FUTURE WORK</title>
      <p>The CovidGraph project integrates COVID-related data from
heterogeneous data sources, mainly from the medical and health domains,
into a single knowledge graph. We demonstrate that even for fairly
distinct scientific domains such as computational biology modelling
and clinical research, it is possible to link knowledge graphs and
thereby quickly provide new data sources. The presented version of
CovidGraph provides a tool set and a single access point to
previously disconnected data sources. Biomedical and clinical scientists
can explore a rich set of data items that are not connected in any
other resource. CovidGraph is only one example of the rapid
integration of knowledge. The HealthECCO infrastructure offers solutions
for the integration and exploration of data on other diseases, building on the
same integration workflow showcased in this paper.
</p>
      <p>[Figure 3: Neo4j excerpt of the model "Roda2020 - SIR model of COVID-…" in MaSyMoS and its links to the CovidGraph. Recoverable node labels: beta, rho, mu, Suscept…, Infected, Confirm…, Recover…, COVID-19, paper ID 32289100, and the paper "Why is it difficult to accurately predict the COVID-19 epidemi…". Recoverable relationship types: MASYMOS_BELONGS_TO, MASYMOS_HAS_ANNOTATION, MASYMOS_HAS_REACTION, MASYMOS_IS_PRODUCT, MASYMOS_CONTAINS_SPECI…, MASYMOS_DOID_DESCRIBES_…, MASYMOS_RESOURCE_DESCRIBES_PAPERID, PAPER_HAS_PAPERID.]</p>
      <p>The CovidGraph team hopes to motivate other data providers to
link up with our resource, and we would also like to discuss the applicability
of our graph database infrastructure to existing data silos.</p>
    </sec>
    <sec id="sec-7">
      <title>ACKNOWLEDGMENTS</title>
      <p>The work presented here is the result of work by the HealthECCO team
(https://healthecco.org/team/). The COVID-19 collection in BioModels
was built with the help of EOSC COVID-19 Fast Track funding.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Renzo</given-names>
            <surname>Angles</surname>
          </string-name>
          and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Survey of graph database models</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>40</volume>
          ,
          <issue>1</issue>
          (
          <year>2008</year>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Garth R</given-names>
            <surname>Brown</surname>
          </string-name>
          , Vichet Hem,
          <string-name>
            <given-names>Kenneth S</given-names>
            <surname>Katz</surname>
          </string-name>
          , Michael Ovetsky, Craig Wallin, Olga Ermolaeva, Igor Tolstoy, Tatiana Tatusova,
          <string-name>
            <given-names>Kim D</given-names>
            <surname>Pruitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Donna R</given-names>
            <surname>Maglott</surname>
          </string-name>
          , et al.
          <year>2015</year>
          .
          <article-title>Gene: a gene-centered information resource at NCBI</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>43</volume>
          ,
          <issue>D1</issue>
          (
          <year>2015</year>
          ),
          <fpage>D36</fpage>
          -
          <lpage>D42</lpage>
          . https://doi.org/10.1093/nar/gku1055
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Kathi</given-names>
            <surname>Canese</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Weis</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>PubMed: the bibliographic database</article-title>
          .
          <source>The NCBI Handbook</source>
          <volume>2</volume>
          (
          <year>2013</year>
          ),
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>The Gene Ontology Consortium</collab>
          .
          <year>2021</year>
          .
          <article-title>The Gene Ontology resource: enriching a GOld mine</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>49</volume>
          ,
          <issue>D1</issue>
          (
          <year>2021</year>
          ),
          <fpage>D325</fpage>
          -
          <lpage>D334</lpage>
          . https://doi.org/10.1093/nar/gkaa1113
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>UniProt Consortium</collab>
          .
          <year>2019</year>
          .
          <article-title>UniProt: a worldwide hub of protein knowledge</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>47</volume>
          ,
          <issue>D1</issue>
          (
          <year>2019</year>
          ),
          <fpage>D506</fpage>
          -
          <lpage>D515</lpage>
          . https://doi.org/10.1093/nar/gky1049
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Paula</given-names>
            <surname>de Matos</surname>
          </string-name>
          , Adriano Dekker, Marcus Ennis, Janna Hastings,
          <string-name>
            <given-names>Kenneth</given-names>
            <surname>Haug</surname>
          </string-name>
          , Steve Turner, and
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Steinbeck</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>ChEBI: a chemistry ontology and database</article-title>
          .
          <source>Journal of Cheminformatics</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2010</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Ensheng</given-names>
            <surname>Dong</surname>
          </string-name>
          , Hongru Du, and
          <string-name>
            <given-names>Lauren</given-names>
            <surname>Gardner</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>An interactive web-based dashboard to track COVID-19 in real time</article-title>
          .
          <source>The Lancet Infectious Diseases</source>
          <volume>20</volume>
          ,
          <issue>5</issue>
          (
          <year>2020</year>
          ),
          <fpage>533</fpage>
          -
          <lpage>534</lpage>
          . https://doi.org/10.1016/S1473-3099(20)30120-1
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ron</given-names>
            <surname>Henkel</surname>
          </string-name>
          , Olaf Wolkenhauer, and
          <string-name>
            <given-names>Dagmar</given-names>
            <surname>Waltemath</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Combining computational models, semantic annotations and simulation experiments in a graph database</article-title>
          .
          <source>Database</source>
          <volume>2015</volume>
          (
          <year>2015</year>
          ),
          <elocation-id>bau130</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Daniel Scott</given-names>
            <surname>Himmelstein</surname>
          </string-name>
          , Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green,
          <string-name>
            <given-names>Pouya</given-names>
            <surname>Khankhanian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Sergio E</given-names>
            <surname>Baranzini</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Systematic integration of biomedical knowledge prioritizes drugs for repurposing</article-title>
          .
          <source>eLife</source>
          <volume>6</volume>
          (Sept.
          <year>2017</year>
          ),
          <elocation-id>e26726</elocation-id>
          . https://doi.org/10.7554/elife.26726
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Hubbard</surname>
          </string-name>
          , Daniel Barker, Ewan Birney, Graham Cameron,
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tony</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Cuff</surname>
          </string-name>
          , Val Curwen,
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Down</surname>
          </string-name>
          , et al.
          <year>2002</year>
          .
          <article-title>The Ensembl genome database project</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>30</volume>
          ,
          <issue>1</issue>
          (
          <year>2002</year>
          ),
          <fpage>38</fpage>
          -
          <lpage>41</lpage>
          . https://doi.org/10.1093/nar/30.1.38
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Bijay</given-names>
            <surname>Jassal</surname>
          </string-name>
          , Lisa Matthews, Guilherme Viteri, Chuqiao Gong, Pascual Lorente, Antonio Fabregat, Konstantinos Sidiropoulos, Justin Cook, Marc Gillespie,
          <string-name>
            <given-names>Robin</given-names>
            <surname>Haw</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>The Reactome pathway knowledgebase</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>48</volume>
          ,
          <issue>D1</issue>
          (
          <year>2020</year>
          ),
          <fpage>D498</fpage>
          -
          <lpage>D503</lpage>
          . https://doi.org/10.1093/nar/gkz1031
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Le Novère</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Finney</surname>
          </string-name>
          , Michael Hucka, Upinder S Bhalla, Fabien Campagne, Julio Collado-Vides, Edmund J Crampin, Matt Halstead, Edda Klipp,
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Mendes</surname>
          </string-name>
          , et al.
          <year>2005</year>
          .
          <article-title>Minimum information requested in the annotation of biochemical models (MIRIAM)</article-title>
          .
          <source>Nature Biotechnology</source>
          <volume>23</volume>
          ,
          <issue>12</issue>
          (
          <year>2005</year>
          ),
          <fpage>1509</fpage>
          -
          <lpage>1515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Jinhyuk</given-names>
            <surname>Lee</surname>
          </string-name>
          , Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and
          <string-name>
            <given-names>Jaewoo</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          .
          <source>Bioinformatics</source>
          <volume>36</volume>
          ,
          <issue>4</issue>
          (
          <year>2020</year>
          ),
          <fpage>1234</fpage>
          -
          <lpage>1240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>John</given-names>
            <surname>Lonsdale</surname>
          </string-name>
          , Jeffrey Thomas,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Salvatore</surname>
          </string-name>
          , Rebecca Phillips, Edmund Lo, Saboor Shad, Richard Hasz, Gary Walters, Fernando Garcia,
          <string-name>
            <given-names>Nancy</given-names>
            <surname>Young</surname>
          </string-name>
          , et al.
          <year>2013</year>
          .
          <article-title>The genotype-tissue expression (GTEx) project</article-title>
          .
          <source>Nature genetics</source>
          <volume>45</volume>
          ,
          <issue>6</issue>
          (
          <year>2013</year>
          ),
          <fpage>580</fpage>
          -
          <lpage>585</lpage>
          . https://doi.org/10.1038/ng.2653
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Rahuman S</given-names>
            <surname>Malik-Sheriff</surname>
          </string-name>
          , Mihai Glont, Tung VN Nguyen, Krishna Tiwari, Matthew G Roberts, Ashley Xavier, Manh T Vu, Jinghao Men, Matthieu Maire,
          <string-name>
            <given-names>Sarubini</given-names>
            <surname>Kananathan</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>BioModels-15 years of sharing computational models in life science</article-title>
          .
          <source>Nucleic acids research</source>
          <volume>48</volume>,
          <issue>D1</issue>
          (
          <year>2020</year>
          ),
          <fpage>D407</fpage>
          -
          <lpage>D415</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <collab>United Nations</collab>
          .
          <year>2019</year>
          .
          <article-title>World population prospects 2019: highlights</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Kim D</given-names>
            <surname>Pruitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tatiana</given-names>
            <surname>Tatusova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Donna R</given-names>
            <surname>Maglott</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</article-title>
          .
          <source>Nucleic acids research</source>
          <volume>35</volume>,
          <issue>suppl_1</issue>
          (
          <year>2007</year>
          ),
          <fpage>D61</fpage>
          -
          <lpage>D65</lpage>
          . https://doi.org/10.1093/nar/gki025
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Robinson</surname>
          </string-name>
          , Jim Webber, and
          <string-name>
            <given-names>Emil</given-names>
            <surname>Eifrem</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <source>Graph Databases</source>.
          <publisher-name>O'Reilly Media</publisher-name>
          , CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Weston C</given-names>
            <surname>Roda</surname>
          </string-name>
          , Marie B Varughese, Donglin Han, and
          <string-name>
            <given-names>Michael Y</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Why is it difficult to accurately predict the COVID-19 epidemic?</article-title>
          <source>Infectious Disease Modelling</source>
          <volume>5</volume>
          (
          <year>2020</year>
          ),
          <fpage>271</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Falk</given-names>
            <surname>Schreiber</surname>
          </string-name>
          , Björn Sommer, Tobias Czauderna, Martin Golebiewski, Thomas E Gorochowski, Michael Hucka, Sarah M Keating,
          <string-name>
            <given-names>Matthias</given-names>
            <surname>König</surname>
          </string-name>
          , Chris Myers,
          <string-name>
            <given-names>David</given-names>
            <surname>Nickerson</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>Specifications of standards in systems and synthetic biology: status and developments in 2020</article-title>
          .
          <source>Journal of integrative bioinformatics</source>
          <volume>17</volume>,
          <issue>2-3</issue>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Lynn Marie</given-names>
            <surname>Schriml</surname>
          </string-name>
          , Cesar Arze, Suvarna Nadendla,
          <string-name>
            <given-names>Yu-Wei Wayne</given-names>
            <surname>Chang</surname>
          </string-name>
          , Mark Mazaitis, Victor Felix, Gang Feng, and Warren Alden Kibbe.
          <year>2012</year>
          .
          <article-title>Disease Ontology: a backbone for disease semantic integration</article-title>
          .
          <source>Nucleic acids research</source>
          <volume>40</volume>,
          <issue>D1</issue>
          (
          <year>2012</year>
          ),
          <fpage>D940</fpage>
          -
          <lpage>D946</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <collab>The GTEx Portal</collab>
          .
          <year>2020</year>
          .
          <article-title>GTEx Portal Documentation</article-title>
          . https://gtexportal.org/home/documentationPage. Online, accessed 12 October 2020.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Dagmar</given-names>
            <surname>Waltemath</surname>
          </string-name>
          , Richard Adams, Daniel A Beard, Frank T Bergmann, Upinder S Bhalla, Randall Britten, Vijayalakshmi Chelliah, Michael T Cooling, Jonathan Cooper,
          <string-name>
            <given-names>Edmund J</given-names>
            <surname>Crampin</surname>
          </string-name>
          , et al.
          <year>2011</year>
          .
          <article-title>Minimum information about a simulation experiment (MIASE)</article-title>
          .
          <source>PLoS computational biology</source>
          <volume>7</volume>
          ,
          <issue>4</issue>
          (
          <year>2011</year>
          ),
          <elocation-id>e1001122</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Lucy Lu</given-names>
            <surname>Wang</surname>
          </string-name>
          , Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Darrin Eide, Kathryn Funk, Rodney Kinney, Ziyang Liu,
          <string-name>
            <given-names>William</given-names>
            <surname>Merrill</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>CORD-19: The COVID-19 open research dataset</article-title>
          .
          <source>arXiv</source>
          arXiv:2004.10706v2 (
          <year>2020</year>
          )
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Tommy</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Catherine M</given-names>
            <surname>Lloyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David P</given-names>
            <surname>Nickerson</surname>
          </string-name>
          , Michael T Cooling, Andrew K Miller, Alan Garny, Jonna R Terkildsen, James Lawson,
          <string-name>
            <given-names>Randall D</given-names>
            <surname>Britten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter J</given-names>
            <surname>Hunter</surname>
          </string-name>
          , et al.
          <year>2011</year>
          .
          <article-title>The physiome model repository 2</article-title>
          .
          <source>Bioinformatics</source>
          <volume>27</volume>
          ,
          <issue>5</issue>
          (
          <year>2011</year>
          ),
          <fpage>743</fpage>
          -
          <lpage>744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Deborah A</given-names>
            <surname>Zarin</surname>
          </string-name>
          , Tony Tse,
          <string-name>
            <given-names>Rebecca J</given-names>
            <surname>Williams</surname>
          </string-name>
          , Robert M Califf, and Nicholas C Ide.
          <year>2011</year>
          .
          <article-title>The ClinicalTrials.gov results database - update and key issues</article-title>
          .
          <source>New England Journal of Medicine</source>
          <volume>364</volume>
          ,
          <issue>9</issue>
          (
          <year>2011</year>
          ),
          <fpage>852</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>