<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>HCLS</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1546-1696</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Representing bioinformatics Nextflow workflows in RO-Crate: challenges and opportunities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George Marchment</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marie Schmit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Clémence Sebe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frédéric Lemoine</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hervé Ménager</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarah Cohen-Boulakia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institut Pasteur, Université Paris Cité</institution>
          ,
          <addr-line>Bioinformatics and Biostatistics Hub, Paris, France, 28, rue du Dr Roux, Paris 75015</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université Paris-Saclay, CNRS, Laboratoire Interdisciplinaire des Sciences du Numérique</institution>
          ,
          <addr-line>Orsay 91405</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>15</volume>
      <issue>4</issue>
      <fpage>26</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>This research was conducted as part of the ShareFAIR project, which is dedicated to assisting bioinformaticians in creating, comparing, and exchanging robust analysis workflows for multiscale datasets related to neuro-vascular pathologies. ShareFAIR partners use workflows from diverse types of systems. One important challenge lies in representing workflows uniformly and simply, so that partners can understand, share and reuse them. The aim of this research is to evaluate how well the currently available standards, especially RO-Crate, can represent workflows.</p>
      </abstract>
      <kwd-group>
        <kwd>Workflows</kwd>
        <kwd>RO-Crate</kwd>
        <kwd>Nextflow</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Body</title>
      <p>This research was conducted as part of the ShareFAIR project (https://projet.liris.cnrs.fr/sharefair/),
a collaborative initiative involving nine French research partners. ShareFAIR is dedicated to
assisting bioinformaticians in creating, comparing, and exchanging robust analysis workflows for
multi-scale datasets related to neuro-vascular pathologies, encompassing genomic, neuro-vascular
imaging, and clinical data.</p>
      <p>ShareFAIR partners use workflows from diverse types of systems (e.g., Snakemake [3],
Nextflow [2], Galaxy [5]). One important challenge lies in representing workflows uniformly and
simply, so that partners can understand, share and reuse them.</p>
      <p>The work presented here focuses on workflows from the Nextflow workflow system, where
two distinct types of workflows can already be designed, using the domain-specific languages
DSL1 and DSL2. Available Nextflow workflows are equally distributed between DSL1
and DSL2.</p>
      <p>Our aim is to evaluate how well the currently available standards, especially RO-Crate [4], can
describe Nextflow workflows, in both DSL1 and DSL2, at various levels of granularity.</p>
      <p>To do so, we collected over 1,500 Nextflow workflows using a dedicated crawler
designed to extract public Nextflow workflows from GitHub repositories. Figure 1
shows the number of workflows found on GitHub by creation date, as extracted by our crawler. We then
parsed and analysed our workflow collection to extract several pieces of information: each
workflow’s metadata, its subworkflows, and its constituent processes alongside their
inputs and outputs. Together, these form a comprehensive dataset.</p>
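      <p>As a rough illustration of the kind of extraction described above, the following Python sketch lists the processes declared in a Nextflow script. It is not the parser used in this work: the helper name list_process_names, the regular expression and the example script are all assumptions made for illustration, and the sketch recovers process names only, not inputs, outputs or subworkflows.</p>
      <preformat>
```python
import re

# Hypothetical pattern (an assumption, not the authors' parser):
# matches declarations of the form "process NAME {" at the start of a line.
PROCESS_DECL = re.compile(r"^\s*process\s+(\w+)\s*\{", re.MULTILINE)


def list_process_names(nextflow_source: str) -> list:
    """Return the names of the processes declared in a Nextflow script."""
    return PROCESS_DECL.findall(nextflow_source)


# Toy Nextflow script, invented for this example.
EXAMPLE = """
process FASTQC {
    input:
    path reads

    output:
    path "*.html"

    script:
    "fastqc ${reads}"
}

process MULTIQC {
    input:
    path reports

    script:
    "multiqc ."
}
"""

if __name__ == "__main__":
    print(list_process_names(EXAMPLE))
```
      </preformat>
      <p>A pattern match of this kind ignores comments, strings and nested structure, so a robust analysis of heterogeneous workflows would likely need a real parser.</p>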
      <p>Parsing and analysing this dataset revealed how heterogeneous the
implementations of Nextflow workflows are. Annotating and describing them in a homogeneous
way would greatly facilitate their sharing, comparison and interrogation. RO-Crate emerges as a
strong contender for this undertaking.</p>
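      <p>To make the idea of a homogeneous description concrete, the sketch below builds a minimal RO-Crate metadata document for a single Nextflow workflow, at the coarsest level of granularity (the workflow file as a whole). It is an illustration under stated assumptions, not the representation proposed in this work: the function name workflow_crate_metadata, the file name main.nf and the workflow name are hypothetical, the entity types follow our reading of the RO-Crate 1.1 specification and the Workflow RO-Crate profile, and properties that a complete crate would normally carry (such as licence and publication date) are omitted.</p>
      <preformat>
```python
import json

# Identifier used for Nextflow as a ComputerLanguage entity
# (taken from the Workflow RO-Crate profile, to the best of our knowledge).
NEXTFLOW_ID = "https://w3id.org/workflowhub/workflow-ro-crate#nextflow"


def workflow_crate_metadata(workflow_path: str, name: str) -> dict:
    """Build a minimal, incomplete RO-Crate description of one workflow file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the crate itself, with the workflow as main entity.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {   # The workflow file.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": NEXTFLOW_ID},
            },
            {   # The workflow language.
                "@id": NEXTFLOW_ID,
                "@type": "ComputerLanguage",
                "name": "Nextflow",
                "url": {"@id": "https://www.nextflow.io/"},
            },
        ],
    }


if __name__ == "__main__":
    # "main.nf" and the workflow name are placeholders for this example.
    crate = workflow_crate_metadata("main.nf", "Example workflow")
    print(json.dumps(crate, indent=2))
```
      </preformat>
      <p>Finer levels of granularity, such as individual processes with their inputs and outputs, would require additional entities in the graph, which this sketch does not attempt.</p>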
      <p>This work received support from the National Research Agency under the France 2030
program, with reference to ANR-22-PESN-0007.</p>
    </sec>
    <sec id="sec-2">
      <title>2. References</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>