=Paper= {{Paper |id=Vol-3073/paper17 |storemode=property |title=A Community Effort for COVID-19 Ontology Harmonization |pdfUrl=https://ceur-ws.org/Vol-3073/paper17.pdf |volume=Vol-3073 |authors=Asiyah Yu Lin,Yuki Yamagata,William D. Duncan,Leigh C. Carmody,Tatsuya Kushida,Hiroshi Masuya,John Beverley,Biswanath Dutta,Michael DeBellis,Zoë May Pendlington,Paola Roncaglia,Yongqun He |dblpUrl=https://dblp.org/rec/conf/icbo/LinYDCKMBDDPRH21 }} ==A Community Effort for COVID-19 Ontology Harmonization== https://ceur-ws.org/Vol-3073/paper17.pdf
A Community Effort for COVID-19 Ontology Harmonization
Asiyah Yu Lin1, Yuki Yamagata2, William D. Duncan3, Leigh C. Carmody4, Tatsuya Kushida2,
Hiroshi Masuya2, John Beverley5, Biswanath Dutta6, Michael DeBellis7, Zoë May
Pendlington8, Paola Roncaglia8, Yongqun He9

1 National Human Genome Research Institute, NIH, Bethesda, MD, USA
2 RIKEN, Japan
3 Lawrence Berkeley National Laboratory, Berkeley, CA, USA
4 The Jackson Laboratory, Bar Harbor, ME, USA
5 Northwestern University, Evanston, IL, USA
6 Indian Statistical Institute Bangalore Centre, India
7 Individual Consultant and Researcher, San Francisco, CA, USA
8 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
9 University of Michigan Medical School, Ann Arbor, MI, USA


Abstract
Ontologies have become critical for supporting data and knowledge representation,
standardization, integration, and analysis. The SARS-CoV-2 pandemic led to the rapid
proliferation of COVID-19 data, as well as the development of many COVID-19 ontologies.
In the interest of supporting data interoperability, we initiated a community-based effort to
harmonize COVID-19 ontologies. Our effort involves collaborative discussion among the
developers of seven COVID-19 related ontologies, and the merging of four ontologies. This
effort demonstrates the feasibility of harmonizing these ontologies in an interoperable
framework to support integrative representation and analysis of COVID-19 related data and
knowledge.

Keywords
Knowledge integration, COVID-19, SARS-CoV-2, ontology, harmonization

1. Introduction

     Despite the development and distribution of effective COVID-19 vaccines, the COVID-19 pandemic
remains a challenge to overcome. The sheer volume of data collected by researchers, the speed at which
it is generated, the range of its sources, its varying quality and accuracy, and the need to assess its
usefulness result in complex, multidimensional datasets [1], often annotated with specific terminologies
and coding systems by researchers in distinct disciplines. The value of cross-discipline meta-data analysis
is obvious, and evident in the present pandemic. However, with the extensive COVID-19 research, we face
a major challenge of data silos, which significantly undermine interoperability, meta-data analysis,
reproducibility, pattern identification, and discovery and reusability across disciplines [2].
     Ontologies - interoperable, logically well-defined, controlled vocabularies representing common
entities and relations across disciplines - are a well-known solution to data silo problems. Ontologies are
widely used in bioinformatics and biomedical data standardization, supporting data integration, sharing,
reproducibility, and automated reasoning. To meet different needs for COVID-19 studies, different
groups of ontology developers have worked separately since the start of the pandemic, resulting in the


International Conference on Biomedical Ontologies 2021, September 16–18, 2021, Bozen-Bolzano, Italy
EMAIL: asiyah.lin@nih.gov (A. 1); yuki.yamagata@riken.jp (A. 2); yongqunh@med.umich.edu (A. 9)
ORCID: 0000-0003-2620-0345 (A. 1); 0000-0002-9673-1283 (A. 2); 0000-0001-9189-9661 (A. 9)
                               © 2021 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                               CEUR Workshop Proceedings (CEUR-WS.org)
development of several COVID-19 ontologies. A lack of coordination among these groups would risk
the proliferation of COVID-19 ontologies using distinct, potentially non-interoperable, vocabularies.
    The Workshop on COVID-19 Ontologies (WCO-2020), held on Oct. 23 and Oct. 30, 2020, brought
together developers from international groups to report their efforts on building COVID-19 related
ontologies. To harmonize heterogeneous knowledge and data for better COVID-19 studies, the workshop
attendees formed a COVID-19 Ontology Harmonization Working Group (WG) and discussed ways
to harmonize these related ontologies. This paper reports the current results of our harmonization effort.

2. Scope and Methods

   In this study, the following seven COVID-19 related ontologies were covered in the ontology
harmonization process by the COVID-19 Ontology Harmonization Working Group:
   1. Virus Infectious Disease Ontology (VIDO) [3]
   2. Ontology of Coronavirus Infectious Disease (CIDO) [4]
   3. COVID-19 Infectious Disease Ontology (IDO-COVID-19) [5]
   4. Controlled Vocabulary for COVID-19 (COVoc)
   5. Homeostasis imbalance process ontology (HOIP) [6]
   6. Medical Action Ontology (MAxO)
   7. Ontology for collection and analysis of COviD-19 data (CODO) [7]

   Each of the above ontologies has its own scope and purpose. Three of them, the Virus Infectious
Disease Ontology (VIDO), the Coronavirus Infectious Disease Ontology (CIDO), and the COVID-19
Infectious Disease Ontology (IDO-COVID-19), all extend the Infectious Disease Ontology (IDO) [5].
   The mission statement of the COVID-19 Ontology Harmonization WG is to harmonize different
COVID-19 related ontologies to support COVID-19 related data and knowledge interoperability. To
achieve the mission, WG members held regular virtual Zoom meetings and communicated through
emails. We identified overlapping domains or subdomains from different ontology groups and built
consensus on ontology terms needed to characterize specific COVID-19 related entities.

2.1 VIDO

    VIDO (https://bioportal.bioontology.org/ontologies/VIDO) is an extension of the IDO designed to
bridge IDO - which is composed of terms common to any scientific investigation of infectious disease
- to virus-specific ontologies. As such, VIDO follows OBO Foundry guidelines closely. VIDO is
composed of terms common to any investigation of viral infectious diseases, including virus
classification, virus infection epidemiology, pathogenesis, and treatment. For example, VIDO defines
terms such as virus, prion, viricide, virus infection incidence, and so on.

2.2 CIDO
   By extending IDO and other OBO ontologies including the Ontology for Biomedical Investigations
(OBI), CIDO (https://github.com/cido-ontology/cido) is developed to cover coronavirus infectious
diseases including their etiology, transmission, epidemiology, host-coronavirus interaction,
pathogenesis, diagnosis, prevention, and treatment. CIDO covers SARS-CoV, SARS-CoV-2,
MERS-CoV, and other coronavirus strains that cause the common cold in humans.

2.3 COVID-19-IDO

   COVID-19-IDO (https://bioportal.bioontology.org/ontologies/IDO-COVID-19), also referred to as
IDO-COVID-19, was created by the developers of VIDO and is a direct extension of VIDO. As such, it
provides terms covering the epidemiology, classification, pathogenesis, and treatment of infection by the
SARS-CoV-2 virus strain and the associated COVID-19 disease.
2.4 COVoc

    Controlled Vocabulary for COVID-19 (COVoc) (https://github.com/EBISPOT/covoc) is an
application ontology created in collaboration between the European Bioinformatics Institute (EMBL-
EBI) and the Swiss Institute of Bioinformatics (SIB) in March 2020. Its primary use case is to enable
seamless annotation of biomedical literature to core databases and ELIXIR tools (ELIXIR is a
European-wide intergovernmental organization for life sciences). The ontology covers 9 axes related to
the COVID-19 pandemic (biomedical vocabulary, cell lines, chemical entities, clinical trials, conceptual
entities, diseases and syndromes, geographic locations, organisms, and proteins and genomes). COVoc
utilizes existing OBO ontologies where possible to augment connections to other useful resources such
as the COVID-19 Data Portal (https://www.covid19dataportal.org/).

2.5 CODO
    Ontology for Collection and Analysis of COviD-19 Data (CODO) (https://w3id.org/codo,
https://github.com/biswanathdutta/CODO) is a formal ontology for the collection and analysis of
COVID-19 data [7]. Its goal is to support collecting data about the pandemic so that researchers can
answer questions, for example about infection paths, based on information about relations between
patients, clusters, geography, time, comorbidities, etc. The current CODO 1.3 primarily provides the
terms and relations for representing COVID-19 data and information, such as epidemiology, clinical
findings, etiology, diagnosis, treatment facility, and comorbidity, as well as statistical data on disease
spread and casualties by space and time, and resource requirements. The ontology can be used by
various stakeholders, such as doctors, hospitals, policy-makers, government agencies, and application
developers, for purposes including application development (e.g., search, question-answering, and
risk detection systems), document annotation, and knowledge graph construction. The ontology was
designed by analysing disparate COVID-19 data sources such as datasets, literature, services,
government-published COVID-19 guidelines, and WHO literature.

2.6 HOIP
    Homeostasis imbalance process ontology (HOIP)
(https://bioportal.bioontology.org/ontologies/HOIP) focuses on homeostatic imbalances between virus
action and innate defense processes and covers the causal relationship of organelle/cellular/organ
processes from early stage to clinical manifestation in COVID-19. The design patterns between CIDO
and HOIP have now been aligned after shared discussion and communication.

2.7 MAxO

    Medical Action Ontology (MAxO), launched in the spring of 2020, is a broad ontology that provides
a structured vocabulary for medical procedures, interventions, therapies, treatments, and clinical
recommendations. MAxO was designed to provide a thorough resource for annotating medical actions
to diseases, particularly rare diseases. Given the broad nature of MAxO and the timing of its
development, much of the hierarchy was added with a keen awareness of the diagnostics and treatment
of SARS-CoV-2. While there are no COVID-19-specific terms, terms like ‘ventilation with proning’
(MAXO:0000619) and ‘clinical RNA detection testing’ (MAXO:0000592) were added to annotate
COVID-19 clinical data sets. To capture the relationship between treatments and diseases, a new tool,
the Phenotypic Observation Explication Tool (POET), was developed to establish relationships between
MAxO, Human Phenotype Ontology (HPO), and Mondo Disease Ontology (Mondo) terms. This tool
will allow researchers to actively participate in annotating data sets for COVID-19 or for other diseases
within their expertise. MAxO annotations and the POET tool will be available on the HPO website
(hpo.jax.org) by 2022.
3. Ontology Overlapping and Term Reuse

    Harmonization began by identifying the scope and development methods of the ontologies covered
in this work. We found that, instead of reinventing the wheel, each ontology has imported and reused
many terms from other ontologies where possible (Table 1). The most widely reused ontologies (each
reused in six of the seven ontologies) are OBI, UBERON, CL, GO Biological process, ChEBI, PRO,
and RO. The next most widely reused ontologies (each reused in five of the seven ontologies) are BFO,
NCBI taxon, the Symptom Ontology, and the Vaccine Ontology. Many of these reused ontologies are
Open Biomedical and Biological Ontologies (OBO) Foundry [8] ontologies.

Table 1. Ontology term reuse by COVID-19 related ontologies

Ontology          Domain               VIDO  CIDO  COVID-19-IDO  HoIP  CODO  MAxO  COVoc
BFO               Upper ontology       Yes   Yes   Yes           Yes   Yes   -     -
IAO               Information content  -     Yes   Yes           Yes   -     -     Yes
OBI               Data item            -     Yes   Yes           Yes   Yes   Yes   Yes
NCBI taxon        Taxonomy             Yes   Yes   Yes           Yes   -     -     Yes
UBERON            Anatomical structure Yes   Yes   Yes           Yes   -     Yes   Yes
CL                Cell                 Yes   Yes   Yes           Yes   -     Yes   Yes
GO                Biological process   Yes   Yes   Yes           Yes   -     Yes   Yes
PATO              Phenotype            -     Yes   -             Yes   -     Yes   -
HPO               Phenotype            -     Yes   -             Yes   -     Yes   Yes
ChEBI             Chemical compound    Yes   Yes   Yes           Yes   -     Yes   Yes
PRO               Protein              Yes   Yes   Yes           Yes   -     Yes   Yes
HGNC              Gene                 -     Yes   -             -     -     -     -
OGG               Gene                 -     -     -             Yes   -     -     -
DO                Disease              -     Yes   -             Yes   Yes   -     -
MONDO             Disease              -     -     -             -     -     Yes   Yes
SNOMED CT         Disease              -     -     -             -     Yes   -     -
NDF-RT            Disease/Finding      -     Yes   -             -     -     -     -
Symptom           Symptom              Yes   Yes   Yes           Yes   Yes   -     -
Vaccine Ontology  Vaccine              Yes   Yes   Yes           Yes   Yes   -     -
RO                Relational ontology  Yes   Yes   Yes           Yes   -     Yes   Yes
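
The reuse counts quoted in Section 3 can be recomputed from Table 1. The following is a minimal sketch, assuming the table is transcribed by hand into a Python dictionary; the names REUSE and group_by_reuse_count are illustrative and not part of any of the ontologies or their tooling.

```python
# Table 1 transcribed by hand: reused ontology -> COVID-19 ontologies marked "Yes".
# The dictionary layout and all names here are illustrative assumptions.
CORE_SIX = {"VIDO", "CIDO", "COVID-19-IDO", "HoIP", "MAxO", "COVoc"}
CORE_FIVE = {"VIDO", "CIDO", "COVID-19-IDO", "HoIP", "CODO"}

REUSE = {
    "BFO": set(CORE_FIVE),
    "IAO": {"CIDO", "COVID-19-IDO", "HoIP", "COVoc"},
    "OBI": {"CIDO", "COVID-19-IDO", "HoIP", "CODO", "MAxO", "COVoc"},
    "NCBI taxon": {"VIDO", "CIDO", "COVID-19-IDO", "HoIP", "COVoc"},
    "UBERON": set(CORE_SIX),
    "CL": set(CORE_SIX),
    "GO": set(CORE_SIX),
    "PATO": {"CIDO", "HoIP", "MAxO"},
    "HPO": {"CIDO", "HoIP", "MAxO", "COVoc"},
    "ChEBI": set(CORE_SIX),
    "PRO": set(CORE_SIX),
    "HGNC": {"CIDO"},
    "OGG": {"HoIP"},
    "DO": {"CIDO", "HoIP", "CODO"},
    "MONDO": {"MAxO", "COVoc"},
    "SNOMED CT": {"CODO"},
    "NDF-RT": {"CIDO"},
    "Symptom": set(CORE_FIVE),
    "Vaccine Ontology": set(CORE_FIVE),
    "RO": set(CORE_SIX),
}


def group_by_reuse_count(reuse):
    """Group reused ontologies by how many COVID-19 ontologies import them."""
    groups = {}
    for name, users in reuse.items():
        groups.setdefault(len(users), []).append(name)
    return groups


if __name__ == "__main__":
    groups = group_by_reuse_count(REUSE)
    # Print the groups from most to least widely reused.
    for count in sorted(groups, reverse=True):
        print(count, ", ".join(groups[count]))
```

Grouping by count rather than sorting individual rows keeps ties together, which is how the text reports them (one group reused six times, one group reused five times).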
4. Ontology Alignment and Harmonization

    Given that most of the seven ontologies follow the OBO Foundry ontology development principles,
such as reusing terms defined in OBO Foundry ontologies, our harmonization exercise found that these
ontologies can be aligned under the Basic Formal Ontology (BFO) upper-level ontology (Figure 1).
Figure 1 shows how VIDO, CIDO, IDO-COVID-19, MAxO, and HoIP can fit into BFO’s
structure.




Figure 1: Hierarchical representation of selected terms from different ontologies that are harmonized
under the BFO upper-level ontology. Red marks the ontologies that are the focus of this
harmonization study. Terms from many other ontologies, such as BFO, NCBITaxon, and VO, are also
used by our ontologies.

    The relationship between CIDO and IDO-COVID-19 provides an example of precisely the sort of
distinct but overlapping ontology development efforts our working group was designed to address.
Through this alignment exercise, and having observed that the scope of CIDO appears broad enough to
include IDO-COVID-19, our working group has decided to incorporate the latter ontology into CIDO.
Incorporating terms from IDO-COVID-19 into CIDO will, moreover, strengthen the logical relationship
between CIDO and VIDO, given how closely related VIDO and IDO-COVID-19 are.
    The HoIP developers are working on mapping and aligning with all GO process terms. Concerning
harmonization, the HoIP developers have started to compare their processual entities to those in CIDO.
For example, although the labels of 'SARS-CoV-2 entry to cell' (CIDO:0000088) and 'viral entry into host
cell [COVID-19]' (HoIP:0037063) are different, the HoIP entity is described using an object property
restriction ('has agent' some SARS-CoV2), so it can be mapped to the corresponding CIDO term. COVoc
is an application ontology, so its developers rely on the CIDO developers to create new terms, and COVoc
imports and reuses CIDO for its application purposes. At the time of writing, the CODO developers have
started to align the current build to BFO as its upper ontology, which increases the future possibilities of
better alignment.
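
The CIDO/HoIP example above can be sketched in code. This is only an illustration under stated assumptions, not the working group's mapping procedure: the ProcessTerm record, the generic_process field, the agent recorded for the CIDO term (read off its label), and the AGENT_SYNONYMS table are all hypothetical. Only the two identifiers, the two labels, and the 'has agent' restriction come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessTerm:
    curie: str            # term identifier as cited in the text
    label: str            # human-readable label
    generic_process: str  # ASSUMPTION: generic process that the term specializes
    agent: Optional[str]  # filler of a 'has agent' some ... restriction, or agent implied by the label


# ASSUMPTION: hypothetical table of spelling variants for the same agent.
AGENT_SYNONYMS = {"sars-cov2": "SARS-CoV-2", "sars-cov-2": "SARS-CoV-2"}


def normalize_agent(agent):
    """Map spelling variants of an agent name to one canonical form."""
    if agent is None:
        return None
    cleaned = agent.strip()
    return AGENT_SYNONYMS.get(cleaned.lower(), cleaned)


def is_candidate_match(a, b):
    """Propose a mapping when both terms name the same process and the same agent."""
    agent_a, agent_b = normalize_agent(a.agent), normalize_agent(b.agent)
    return (
        a.generic_process == b.generic_process
        and agent_a is not None
        and agent_a == agent_b
    )


# Labels and identifiers are from the text; the other field values are assumptions.
cido_term = ProcessTerm(
    "CIDO:0000088", "SARS-CoV-2 entry to cell",
    "viral entry into host cell", "SARS-CoV-2",
)
hoip_term = ProcessTerm(
    "HoIP:0037063", "viral entry into host cell [COVID-19]",
    "viral entry into host cell", "SARS-CoV2",
)

if __name__ == "__main__":
    print(is_candidate_match(cido_term, hoip_term))
```

The point of the sketch is that the comparison ignores the labels entirely: the match is proposed from the logical description (process plus agent), and a curator would still need to confirm it.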


5. Discussions

    While an ontology creates a common language and reduces the work of mapping, the emergence of
multiple ontologies may itself create new silos. Given the many COVID-19 related ontologies that have
been reported, our COVID-19 Ontology Harmonization WG provided a timely effort to
collaboratively identify the overlaps between different ontologies and work towards the harmonization
of seven ontologies. Currently, the seven ontologies have very different perspectives due to their use cases.
Entities within these seven ontologies are defined heterogeneously and described in various ways at
various granularities. Alignment should cover not only shared URIs but also the meaning (semantics) of
the entities. It is therefore necessary to carefully compare entities across ontologies, including their
definitions, superclasses, logical restrictions, and related entities. Towards the formal alignment of these
ontologies, we plan to clarify and make explicit relationships, such as equivalent class, among the
ontologies.
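
The facet-by-facet comparison described above might look like the following minimal sketch. Everything here is hypothetical: the facet names, the placeholder entities (EX_A and EX_B are not real ontology prefixes), and the naive exact-equality test, which a real curation workflow would replace with expert review or reasoner-based checks.

```python
# Facets named in the text: definition, superclass, logical restrictions, related entities.
FACETS = ("definition", "superclass", "restrictions", "related")


def compare_entities(a, b):
    """Report, per facet, whether both records state it and state it identically."""
    return {
        facet: a.get(facet) is not None and a.get(facet) == b.get(facet)
        for facet in FACETS
    }


def suggest_relation(report):
    """Turn a facet report into a cautious suggestion for a human curator."""
    if all(report.values()):
        return "candidate equivalent class"
    if any(report.values()):
        return "needs curator review"
    return "no mapping proposed"


# Placeholder entities; identifiers and content are invented for illustration only.
entity_a = {
    "id": "EX_A:0000001",
    "definition": "A process in which a virus enters a host cell.",
    "superclass": "viral process",
    "restrictions": frozenset({("has agent", "virus X")}),
    "related": frozenset({"host cell"}),
}
entity_b = {
    "id": "EX_B:0000001",
    "definition": "Entry of a virus into a cell of its host.",
    "superclass": "viral process",
    "restrictions": frozenset({("has agent", "virus X")}),
}

if __name__ == "__main__":
    report = compare_entities(entity_a, entity_b)
    print(report, suggest_relation(report))
```

Here the two placeholder records agree on superclass and restrictions but word their definitions differently, so the sketch flags the pair for review instead of asserting equivalence.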
    Members of the COVID-19 Ontology Harmonization WG made substantial efforts to characterize
SARS-CoV-2 and COVID-19 data in a collaborative, computationally tractable, responsible manner.
These ontologies are also being used in different use case studies, supporting productive and
interoperable COVID-19 research.
    Our working group has also recognized many future challenges, such as funding, resource and time
commitments, and demanding infrastructure development. We are pleased to find that the willingness
to join the harmonization work is high, and more interested parties are joining the effort. We aim to
continue this collaborative effort to further support active COVID-19 research, leading to enhanced
public health.


6. Acknowledgements

    We acknowledge the organizers and attendees of the 2020 Workshop on COVID-19 Ontologies
(WCO-2020), which initiated our get-together and collaboration on the ontology harmonization effort.
    The Office of Data Science Strategy, NIH, provided funding for AYL as a Data and Technology
Advancement (DATA) National Service Scholar. JB was supported by NIH / NLM T15 Biomedical
Informatics and Data Science Research Training Programs (5T15LM012495–03) during development
of VIDO and IDO-COVID-19. CODO work has been supported by the Indian Statistical Institute through
an internal project grant. PR’s work on COVoc has been made possible in part by a grant from the Chan
Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. YY was
supported by the RIKEN Open Life Science Platform Project during knowledge systematization of
COVID-19 infectious processes and development of the HoIP ontology.


7. References

[1] Y. He et al. CIDO, a community-based ontology for coronavirus disease knowledge and data
    integration, sharing, and analysis. Scientific data. 2020. (7):181.
[2] R. Arp, B. Smith, A. Spear. Building Ontologies with Basic Formal Ontology. Cambridge, MA:
    MIT Press; 2015.
[3] J. Beverley, S. Babcock, G. Carvalho, L. G. Cowell, S. Duesing, R. Hurley, B. Smith (2020).
    Coordinating Coronavirus Research: The COVID-19 Infectious Disease Ontology. OSF Preprint.
    https://osf.io/5bx8c/
[4] Y. Liu, J. Hur, W. K. B. Chan, Z. Wang, J. Xie, D. Sun, S. Handelman, J. Sexton, H. Yu, Y. He.
    Ontological modeling and analysis of experimentally or clinically verified drugs against
    coronavirus infection. Sci Data. 2021 Jan 13;8(1):16. doi: 10.1038/s41597-021-00799-w.
[5] S. Babcock, J. Beverley, L. G. Cowell, B. Smith. The Infectious Disease Ontology in the Age of
    COVID-19. 2021, June 10. https://doi.org/10.31219/osf.io/az6u5
[6] Y. Yamagata et al. Ontology development for building a knowledge base in the life science and
    structuring knowledge for elucidating the COVID-19 mechanism. The 31th Annual Conference of
    the Japanese Society for Artificial Intelligence, 2021.
[7] B. Dutta, M. DeBellis (2020). CODO: an ontology for collection and analysis of COVID-19 data.
    In Proc. of 12th Int. Conf. on Knowledge Engineering and Ontology Development (KEOD),
    Lisboa, Portugal, 2-4 November 2020, vol. 2, pp. 76-85 (DOI:
    https://doi.org/10.5220/0010112500760085).
[8] The Open Biomedical Ontologies Foundry. http://obofoundry.org/. Accessed 10 June 2021.