<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Representing bioinformatics Nextflow workflows in RO-Crate: challenges and opportunities</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">George</forename><surname>Marchment</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire Interdisciplinaire des Sciences du Numérique</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>91405</postCode>
									<settlement>Orsay</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marie</forename><surname>Schmit</surname></persName>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Institut Pasteur</orgName>
								<orgName type="department" key="dep2">Bioinformatics and Biostatistics Hub</orgName>
								<orgName type="institution">Université Paris Cité</orgName>
								<address>
									<addrLine>28, rue du Dr Roux</addrLine>
									<postCode>75015</postCode>
									<settlement>Paris</settlement>
									<country>France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Clémence</forename><surname>Sebe</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire Interdisciplinaire des Sciences du Numérique</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>91405</postCode>
									<settlement>Orsay</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Frédéric</forename><surname>Lemoine</surname></persName>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Institut Pasteur</orgName>
								<orgName type="department" key="dep2">Bioinformatics and Biostatistics Hub</orgName>
								<orgName type="institution">Université Paris Cité</orgName>
								<address>
									<addrLine>28, rue du Dr Roux</addrLine>
									<postCode>75015</postCode>
									<settlement>Paris</settlement>
									<country>France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hervé</forename><surname>Ménager</surname></persName>
							<affiliation key="aff1">
								<orgName type="department" key="dep1">Institut Pasteur</orgName>
								<orgName type="department" key="dep2">Bioinformatics and Biostatistics Hub</orgName>
								<orgName type="institution">Université Paris Cité</orgName>
								<address>
									<addrLine>28, rue du Dr Roux</addrLine>
									<postCode>75015</postCode>
									<settlement>Paris</settlement>
									<country>France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sarah</forename><surname>Cohen-Boulakia</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire Interdisciplinaire des Sciences du Numérique</orgName>
								<orgName type="institution" key="instit1">Université Paris-Saclay</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<address>
									<postCode>91405</postCode>
									<settlement>Orsay</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Representing bioinformatics Nextflow workflows in RO-Crate: challenges and opportunities</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">A09E6CBF81ECB2B2195F330D080481CA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Workflows</term>
					<term>RO-Crate</term>
					<term>Nextflow</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This research was conducted as part of the ShareFAIR project, which is dedicated to assisting bioinformaticians in creating, comparing, and exchanging robust analysis workflows for multiscale datasets related to neuro-vascular pathologies. ShareFAIR partners use workflows from diverse types of systems. One important challenge is to represent workflows uniformly and simply so that partners can understand, share and reuse them. The aim of this research is to evaluate how well the currently available standards, especially RO-Crate, support the representation of workflows.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This research was conducted as part of the ShareFAIR project (https://projet.liris.cnrs.fr/sharefair/), a collaborative initiative involving nine French research partners. ShareFAIR is dedicated to assisting bioinformaticians in creating, comparing, and exchanging robust analysis workflows for multi-scale datasets related to neuro-vascular pathologies, encompassing genomic, neuro-vascular imaging, and clinical data.</p><p>ShareFAIR partners use workflows from diverse types of systems (e.g., Snakemake <ref type="bibr" target="#b2">[3]</ref>, Nextflow <ref type="bibr" target="#b1">[2]</ref>, Galaxy [5]). One important challenge is to represent workflows uniformly and simply so that partners can understand, share and reuse them.</p><p>The work presented here focuses on the Nextflow workflow system, in which workflows can already be designed in two distinct versions of its domain-specific language, DSL1 and DSL2. Available Nextflow workflows are equally distributed between DSL1 and DSL2.</p><p>Our aim is to evaluate how well the currently available standards, especially RO-Crate <ref type="bibr" target="#b3">[4]</ref>, can describe Nextflow workflows, in both DSL1 and DSL2, at various levels of granularity.</p><p>To do so, we collected over 1,500 Nextflow workflows using a dedicated crawler designed to extract public Nextflow workflows from GitHub repositories. Figure <ref type="figure" target="#fig_0">1</ref> shows the number of workflows found on GitHub by creation date, as extracted by our crawler. We then parsed and analysed our workflow collection to extract several pieces of information. 
These include each workflow's metadata, its subworkflows and its constituent processes, together with their inputs and outputs, thus forming a comprehensive dataset.</p><p>Parsing and analysing this dataset revealed how heterogeneous the implementations of Nextflow workflows are. Annotating and describing them in a homogeneous way would greatly facilitate their sharing, comparison and interrogation. RO-Crate emerges as a strong contender for this undertaking. RO-Crate is a standard for aggregating and describing research data along with associated metadata. Among other things, it can describe workflows and scripts. However, the framework provided by RO-Crate may not be fully suited to describing workflows at a high level of detail (e.g., down to the data flow structure or detailed process descriptions). To that end, it may be adapted to reach a finer level of granularity.</p><p>In this study, we investigate the possibilities offered by RO-Crate for describing Nextflow workflows and present solutions to enhance it so that it captures a more advanced level of workflow information.</p><p>Ultimately, we aim to extend this work to Snakemake and Galaxy workflows, enabling cross-platform comparisons between workflows. Additionally, we intend to investigate how other solutions, such as the Common Workflow Language <ref type="bibr" target="#b0">[1]</ref>, could contribute to improving workflow descriptions. This work received support from the National Research Agency under the France 2030 program, with reference to ANR-22-PESN-0007.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: Evolution of the monthly and cumulative number of Nextflow workflows available on GitHub since 2017.</figDesc><graphic coords="2,176.75,72.00,241.50,151.50" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Methods included: standardizing computational reuse and portability with the Common Workflow Language</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Crusoe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abeln</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iosup</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Amstutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chilton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tijanic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ménager</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soiland-Reyes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gavrilovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Goble</surname></persName>
		</author>
		<author>
			<orgName>The CWL Community</orgName>
		</author>
		<idno type="DOI">10.1145/3486897</idno>
		<idno type="ISSN">0001-0782</idno>
		<ptr target="https://dl.acm.org/doi/10.1145/3486897" />
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">65</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="54" to="63" />
			<date type="published" when="2022-05">May 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Nextflow enables reproducible computational workflows</title>
		<author>
			<persName><forename type="first">P</forename><surname>Di Tommaso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chatzou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">W</forename><surname>Floden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">P</forename><surname>Barja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Palumbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Notredame</surname></persName>
		</author>
		<ptr target="https://www.nature.com/articles/nbt.3820" />
	</analytic>
	<monogr>
		<title level="j">Nature Biotechnology</title>
		<idno type="ISSN">1546-1696</idno>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="316" to="319" />
			<date type="published" when="2017-04">Apr. 2017</date>
			<publisher>Nature Publishing Group</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Snakemake-a scalable bioinformatics workflow engine</title>
		<author>
			<persName><forename type="first">J</forename><surname>Köster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rahmann</surname></persName>
		</author>
		<idno type="DOI">10.1093/bioinformatics/bts480</idno>
		<ptr target="https://doi.org/10.1093/bioinformatics/bts480" />
	</analytic>
	<monogr>
		<title level="j">Bioinformatics</title>
		<idno type="ISSN">1367-4803</idno>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">19</biblScope>
			<biblScope unit="page" from="2520" to="2522" />
			<date type="published" when="2012-10">Oct. 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Packaging research artefacts with RO-Crate</title>
		<author>
			<persName><forename type="first">S</forename><surname>Soiland-Reyes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sefton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Crosas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">J</forename><surname>Castro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Coppens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garijo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Grüning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>La Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Leo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ó Carragáin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Portier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Trisovic</surname></persName>
		</author>
		<author>
			<orgName>The RO-Crate Community</orgName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Groth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Goble</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Data Science</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="97" to="138" />
			<date type="published" when="2022-01">Jan. 2022</date>
			<publisher>IOS Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update</title>
		<author>
			<orgName>The Galaxy Community</orgName>
		</author>
		<idno type="DOI">10.1093/nar/gkac247</idno>
		<ptr target="https://doi.org/10.1093/nar/gkac247" />
	</analytic>
	<monogr>
		<title level="j">Nucleic Acids Research</title>
		<idno type="ISSN">0305-1048</idno>
		<imprint>
			<biblScope unit="issue">W1</biblScope>
			<biblScope unit="page" from="W345" to="W351" />
			<date type="published" when="2022-07">July 2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
