<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>FAIR for automatic federated omics analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daphne Wijnbergen</string-name>
          <email>d.wijnbergen@lumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgios Malamas</string-name>
          <email>yiorgos.malamas@student.uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Roos</string-name>
          <email>m.roos@lumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eleni Mina</string-name>
          <email>e.mina@lumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Human Genetics, Leiden University Medical Center</institution>
          ,
          <addr-line>Einthovenweg 20, 2333 ZC Leiden</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SWAT4HCLS: 15th International SWAT4HCLS Conference</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we create a workflow to apply federated gene expression meta-analysis in the Virtual Platform of the EJP RD. Based on this workflow, we identify which metadata is needed to make the data machine actionable. We then present a metadata schema that is based on the EJP RD metadata schema and consists of scientific, biological and file metadata.</p>
      </abstract>
      <kwd-group>
        <kwd>FAIR</kwd>
        <kwd>Federated analysis</kwd>
        <kwd>metadata</kwd>
        <kwd>machine actionability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In the analysis of biomedical data, a large amount of time and effort is spent on finding datasets,
mapping identifiers, and data munging. The FAIR principles (Findable, Accessible, Interoperable, Reusable) can help mitigate these issues
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Provided that machine-actionable metadata is available, FAIR allows machines to increasingly perform
actions on data without human intervention.
      </p>
      <p>Another factor that hinders data analysis in biomedical research is privacy.
Human data, such as genomic data, is privacy-sensitive and cannot be fully anonymized.
Consequently, data often cannot be accessed or analyzed from outside the institute where
it was generated. Multiple efforts are ongoing to create infrastructures that enable federated
analysis of data. In this paradigm, an analysis method is sent from one institute to the data
of another and, if approved, executed there. The results are then sent back to the first institute. This
ensures that the analysis can be performed while privacy is preserved. One such effort is the
development of the “Virtual Platform” (VP) network of FAIR resources by the European Joint
Programme on Rare Diseases (EJP RD). Currently, various resources relevant for rare diseases
are being FAIRified and connected within the VP. One goal of the EJP RD is to enable automated,
federated analysis over the resources in the VP.</p>
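      <p>The federated paradigm described above can be illustrated with a minimal sketch. It is not the VP implementation: the Institute class, the approval set, and the federated_run helper are hypothetical names introduced here, and a real deployment would add authentication, review of the submitted code, and disclosure control on the returned results.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Institute:
    """Hypothetical data holder; `records` are only read inside `run`."""
    name: str
    records: Sequence[float]
    approved: frozenset[str]  # names of analyses this institute allows

    def run(self, analysis_name: str,
            analysis: Callable[[Sequence[float]], float]) -> float:
        # The institute decides locally whether the analysis may run.
        if analysis_name not in self.approved:
            raise PermissionError(
                f"{self.name} has not approved {analysis_name!r}")
        # Only the result of the analysis is returned, not the records.
        return analysis(self.records)


def federated_run(institutes, analysis_name, analysis):
    """Send one analysis to every institute and collect what comes back."""
    results = {}
    for institute in institutes:
        try:
            results[institute.name] = institute.run(analysis_name, analysis)
        except PermissionError:
            results[institute.name] = None  # the request was declined
    return results


# Illustrative data only: two sites, one of which approves the analysis.
sites = [
    Institute("site_a", [2.0, 4.0, 6.0], frozenset({"mean"})),
    Institute("site_b", [1.0, 3.0], frozenset()),
]
outcome = federated_run(sites, "mean", mean)
print(outcome)
```
]]></preformat>
      <p>In this sketch the requesting side only ever sees the returned aggregate, or no result at all when the request is declined.</p>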
      <p>In our project, we created a workflow to apply federated analysis to omics data for rare
diseases. To achieve this, we identified the metadata that machines need in order to perform this analysis
in an automated way.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methods</title>
      <p>We implemented a workflow for gene expression analysis in Inclusion Body Myositis to serve
as the basis of our use case. This workflow consists of four main steps: (1) identifying
transcriptomics datasets of interest; (2) applying differential gene expression analysis to these datasets; (3)
mapping identifiers between datasets for data integration; and (4) applying meta-analysis (analysis
of multiple analysis results) to determine which genes are differentially expressed in multiple
studies of Inclusion Body Myositis.</p>
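      <p>Steps (3) and (4) can be sketched as follows. The text does not state which meta-analysis method is used, so this example assumes Fisher's method for combining independent p-values; the function names, identifiers, and p-values are hypothetical, and no multiple-testing correction is applied.</p>
      <preformat><![CDATA[
```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values (each in (0, 1]) with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    # Fisher's statistic X = -2 * sum(ln p) follows a chi-squared
    # distribution with 2k degrees of freedom; half_x is X / 2.
    half_x = -sum(math.log(p) for p in p_values)
    # Closed-form survival function for even degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def map_identifiers(results, id_map):
    """Re-key results from dataset-local IDs to shared gene IDs.

    Local IDs without a mapping are dropped; if several local IDs map to
    the same gene, the last one seen wins (a simplification).
    """
    return {id_map[local]: p for local, p in results.items()
            if local in id_map}


def meta_analysis(per_dataset, alpha=0.05):
    """Return genes whose combined p-value is below alpha."""
    if not per_dataset:
        return {}
    # Only genes measured in every dataset are combined.
    shared = set.intersection(*(set(r) for r in per_dataset))
    combined = {g: fisher_combined_p([r[g] for r in per_dataset])
                for g in shared}
    return {g: p for g, p in combined.items() if p < alpha}


# Illustrative per-dataset differential expression p-values.
dataset_1 = map_identifiers({"probe_1": 0.01, "probe_2": 0.40},
                            {"probe_1": "GENE_A", "probe_2": "GENE_B"})
dataset_2 = map_identifiers({"tx_9": 0.02, "tx_7": 0.70},
                            {"tx_9": "GENE_A", "tx_7": "GENE_B"})
significant = meta_analysis([dataset_1, dataset_2])
print(sorted(significant))
```
]]></preformat>
      <p>Mapping to shared identifiers before combining is what allows results from datasets with different identifier schemes to be compared gene by gene.</p>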
      <p>
        We identified which metadata is necessary to make the data machine-actionable for the purpose
of this use case. The VP metadata schema and an extension of the Data Catalog Vocabulary
(DCAT) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] were analyzed and extended with metadata elements needed for our use case.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Results</title>
      <p>We defined a metadata schema that extends the EJP RD and DCAT metadata schemas. This
schema contains metadata in three categories: (1) scientific metadata, such as the measurement
type, measurement device, and study design, which help find datasets that measure variables
of interest, e.g. measurements of gene expression in case vs. control; (2) biological metadata, such
as the disease, species, and tissue, which are needed to select datasets that are biologically relevant
to the research question, e.g. selecting datasets for Inclusion Body Myositis; and (3) metadata about
the data file itself, such as the download URL, the format, the media type, and a domain-specific
file specification, which are needed for the machine to understand how to use the data.</p>
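      <p>To illustrate how the three categories could be used together, the sketch below models one metadata record and a selection step. The field names, values, and URLs are hypothetical and are not the property names of the actual schema, where such values would more likely be ontology terms than free text.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetMetadata:
    # (1) Scientific metadata: what was measured and how.
    measurement_type: str
    measurement_device: str
    study_design: str
    # (2) Biological metadata: what the samples represent.
    disease: str
    species: str
    tissue: str
    # (3) File metadata: how a machine can retrieve and read the data.
    download_url: str
    file_format: str
    media_type: str
    file_specification: str


def select_datasets(catalog, *, measurement_type, study_design,
                    disease, media_types):
    """Keep datasets that match the question and can be parsed."""
    return [
        d for d in catalog
        if d.measurement_type == measurement_type
        and d.study_design == study_design
        and d.disease == disease
        and d.media_type in media_types
    ]


# Illustrative catalog entries; the example.org URLs are placeholders.
catalog = [
    DatasetMetadata("gene expression", "microarray", "case-control",
                    "Inclusion Body Myositis", "Homo sapiens",
                    "skeletal muscle",
                    "https://example.org/data/ibm_expression.tsv",
                    "TSV", "text/tab-separated-values",
                    "gene-by-sample expression matrix"),
    DatasetMetadata("gene expression", "microarray", "case-control",
                    "another rare disease", "Homo sapiens",
                    "skeletal muscle",
                    "https://example.org/data/other_expression.tsv",
                    "TSV", "text/tab-separated-values",
                    "gene-by-sample expression matrix"),
]

selected = select_datasets(
    catalog,
    measurement_type="gene expression",
    study_design="case-control",
    disease="Inclusion Body Myositis",
    media_types={"text/tab-separated-values"},
)
print([d.download_url for d in selected])
```
]]></preformat>
      <p>Scientific and biological fields drive the selection, while the file fields tell the machine where the data is and how to read it.</p>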
    </sec>
    <sec id="sec-5">
      <title>4. Discussion</title>
      <p>In this work, we created a workflow for detecting differential gene expression across multiple
transcriptomics datasets, together with a metadata schema that makes these datasets
machine-actionable. Our work enables a machine to automatically run this workflow in a federated
manner on (privacy-sensitive) omics datasets for various rare diseases in the VP.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We would like to thank Luiz Bonino, Mark Wilkinson, Andra Waagmeester, Henriette Harmse,
Sunil Rodger, Alberto Camara Ballesteros, Wolmar Akerstrom, Eric Prud’Hommeaux and
Alexandra Tataru for helpful discussions. This initiative received funding from the EU Horizon
2020 programme, grant agreements 825575 (EJP RD) and 824087 (EOSC-Life), and from ELIXIR.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          et al.,
          <article-title>Comment: The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1038/sdata.2016.18</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Albertoni</surname>
          </string-name>
          et al.,
          <source>Data catalog vocabulary (DCAT) - version 2</source>
          ,
          <year>2020</year>
          . URL: https://www.w3.org/TR/vocab-dcat-2/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>