<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Genome Variation Ontology for annotation of complex structural variations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shuichi Kawashima</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takatomo Fujisawa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Toshiaki Katayama</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DDBJ Center, National Institute of Genetics</institution>
          ,
          <addr-line>Yata 1111, Mishima, Shizuoka 411-8540</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Database Center for Life Science (DBCLS)</institution>
          ,
          <addr-line>178-4-4 Wakashiba, Kashiwa-shi, Chiba 277-0871</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Genome Variation Ontology (GVO) is an ontology for the systematic description of genomic variations, including complex structural variations (SVs). As methods for discovering and genotyping genomic variations advance, new types of complex SVs are expected to be discovered. We have developed GVO to annotate all types of genomic variations, including complex SVs. Terms on genomic variations from dbSNP, dbVar, gnomAD, SO, VariO, and HGVS were collected and manually clustered to generate 47 concepts. GVO is available at http://genome-variation.org/resource/gvo.</p>
        <p>In recent years, large-scale sequencing projects, such as the 100,000 Genomes Project by Genomics England, have been carried out. Based on the resulting massive amounts of individual genome sequences, a number of genomic structural variations (SVs) have been reported. For example, gnomAD-SV [1] contains a large number of novel SVs discovered by applying an improved multi-algorithm ensemble method to high-coverage WGS data. In addition, the widespread use of long-read sequencing technologies is expected to generate an ever-increasing amount of information on SVs in the near future. Two existing ontologies can be used to annotate genomic variations: the Sequence Ontology (SO) [2] and the Variation Ontology (VariO) [3]. SO is an ontology for annotating sequence features, while VariO is an ontology for describing the effects, consequences and mechanisms of mutations in DNA, RNA and proteins. Both include terms for canonical SVs, such as deletion and translocation. However, besides canonical SVs, gnomAD-SV, for example, also contains complex SVs classified into 11 subtypes, some of which are not available in the existing ontologies.</p>
      </abstract>
      <kwd-group>
        <kwd>ontology</kwd>
        <kwd>genome variation</kwd>
        <kwd>structural variation</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        We have developed the TogoVar database, which integrates allele frequencies from Japanese
populations and provides annotations for variant interpretation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. One notable feature
of TogoVar is that all of its data are described in RDF. TogoVar has so far targeted single nucleotide
variants (SNVs) and some canonical SVs, and we plan to extend its scope to complex SVs.
Storing complex SVs in TogoVar requires an ontology that can be used to annotate them.
To meet this need, we have developed the Genome Variation Ontology (GVO),
an ontology for describing all types of genomic variation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. GVO: Genome Variation Ontology</title>
      <p>We collected terms on genomic variation from gnomAD, dbSNP, dbVar, SO, VariO and HGVS
and manually clustered equivalent concepts among them. As a result, we obtained
47 ontology classes corresponding to genomic variation types. We then organized
these classes hierarchically under gvo:Variation to construct GVO. The classes include
several complex SVs, such as paired-duplication inversion and paired-deletion inversion, which
are not available in the existing ontologies. GVO is available on its website
(http://genome-variation.org/resource/gvo) and on BioPortal (https://bioportal.bioontology.org/ontologies/GVO).
Figure 1 shows an example of the GVO classes.</p>
      <p>We plan to use GVO together with the FALDO ontology to convert genomic variations distributed in
VCF format into RDF. GVO therefore also introduces some of the properties needed for this conversion.</p>
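      <p>As a rough illustration of the planned VCF-to-RDF conversion, the following Python sketch turns one structural-variant line of a VCF file into Turtle that types the variant with a GVO class and locates it with FALDO. It is a sketch under stated assumptions, not the conversion used for TogoVar: the gvo:Deletion class name, the "#" form of the GVO namespace, the example.org IRIs, and the helper names (SvRecord, parse_vcf_line, to_turtle) are hypothetical, and only gvo:Variation is taken from the text above. Coordinates are copied as-is from POS and INFO/END without adjusting for VCF coordinate conventions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass

GVO = "http://genome-variation.org/resource/gvo#"   # assumed namespace form
FALDO = "http://biohackathon.org/resource/faldo#"
BASE = "http://example.org/"                         # placeholder base IRI

# Hypothetical SVTYPE-to-class mapping; only gvo:Variation is named in the text.
SVTYPE_TO_CLASS = {"DEL": "gvo:Deletion"}


@dataclass
class SvRecord:
    chrom: str
    pos: int
    end: int
    svtype: str


def parse_vcf_line(line: str) -> SvRecord:
    """Parse one tab-separated VCF data line carrying SVTYPE and END in INFO."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    return SvRecord(chrom, pos, int(info_map["END"]), info_map["SVTYPE"])


def position(rec: SvRecord, coordinate: int) -> str:
    """Render a FALDO exact position on the record's reference sequence."""
    return (f"[ a faldo:ExactPosition ; faldo:position {coordinate} ; "
            f"faldo:reference <{BASE}sequence/{rec.chrom}> ]")


def to_turtle(rec: SvRecord) -> str:
    """Serialize the record as Turtle; unmapped types fall back to gvo:Variation."""
    cls = SVTYPE_TO_CLASS.get(rec.svtype, "gvo:Variation")
    subject = f"<{BASE}variation/{rec.chrom}-{rec.pos}-{rec.svtype}>"
    return "\n".join([
        f"@prefix gvo: <{GVO}> .",
        f"@prefix faldo: <{FALDO}> .",
        "",
        f"{subject} a {cls} ;",
        "    faldo:location [",
        "        a faldo:Region ;",
        f"        faldo:begin {position(rec, rec.pos)} ;",
        f"        faldo:end {position(rec, rec.end)}",
        "    ] .",
    ])


if __name__ == "__main__":
    line = "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000"
    print(to_turtle(parse_vcf_line(line)))
```
]]></preformat>
      <p>Falling back to the root class gvo:Variation for unmapped SVTYPE values keeps every record typed, so a complex SV can be stored first and assigned a more specific class once the mapping is extended.</p>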
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Collins</surname>
          </string-name>
          , et al.,
          <article-title>A structural variation reference for medical and population genetics</article-title>
          ,
          <source>Nature</source>
          <volume>581</volume>
          (
          <year>2020</year>
          )
          <fpage>444</fpage>
          -
          <lpage>451</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1038/s41586-020-2287-8</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Eilbeck</surname>
          </string-name>
          , et al.,
          <article-title>The sequence ontology: a tool for the unification of genome annotations</article-title>
          ,
          <source>Genome Biology</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <elocation-id>R44</elocation-id>
          .
          doi:
          <pub-id pub-id-type="doi">10.1186/gb-2005-6-5-r44</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vihinen</surname>
          </string-name>
          ,
          <article-title>Variation ontology for annotation of variation effects and mechanisms</article-title>
          ,
          <source>Genome Research</source>
          <volume>24</volume>
          (
          <year>2013</year>
          )
          <fpage>356</fpage>
          -
          <lpage>364</lpage>
          .
          doi:
          <pub-id pub-id-type="doi">10.1101/gr.157495.113</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mitsuhashi</surname>
          </string-name>
          , et al.,
          <article-title>TogoVar: A comprehensive Japanese genetic variation database</article-title>
          ,
          <source>Human Genome Variation</source>
          <volume>9</volume>
          (
          <year>2022</year>
          )
          <elocation-id>44</elocation-id>
          .
          doi:
          <pub-id pub-id-type="doi">10.1038/s41439-022-00222-9</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>