<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Christopher</forename><forename type="middle">S</forename><surname>Daley</surname></persName>
							<email>csdaley@lbl.gov</email>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Devarshi</forename><surname>Ghoshal</surname></persName>
							<email>dghoshal@lbl.gov</email>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Glenn</forename><forename type="middle">K</forename><surname>Lockwood</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sudip</forename><surname>Dosanjh</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lavanya</forename><surname>Ramakrishnan</surname></persName>
							<email>lramakrishnan@lbl.gov</email>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nicholas</forename><forename type="middle">J</forename><surname>Wright</surname></persName>
<email>njwright@lbl.gov</email>
							<affiliation key="aff0">
								<orgName type="institution">Lawrence Berkeley National Laboratory</orgName>
								<address>
									<addrLine>1 Cyclotron Rd</addrLine>
									<settlement>Berkeley</settlement>
									<region>CA</region>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">51F7017EAC557F5ED2341F8F3FFF3D67</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Burst Buffer</term>
					<term>DataWarp</term>
					<term>Workflow</term>
					<term>HPC</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Scientific discoveries are increasingly dependent upon the analysis of large volumes of data from observations and simulations of complex phenomena. Scientists compose these complex analyses as workflows and execute them on large-scale HPC systems. These workflow structures contrast with the monolithic single simulations that have often been the primary use case on HPC systems. Simultaneously, new storage paradigms such as Burst Buffers are becoming available on HPC platforms. To maximize the performance of data analysis workflows today, it is critical to determine the characteristics of the workflows: a deeper understanding of the workflows helps us identify opportunities to leverage the capabilities of the Burst Buffer.</p><p>In this paper, we analyze the performance characteristics of the Burst Buffer and two representative scientific workflows. We measure the performance of these workflows using the Burst Buffer, allowing us to make recommendations for the optimal use of Burst Buffers by future workflows.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>The science drivers for high-performance computing (HPC) are broadening with the proliferation of high-resolution observational instruments and the emergence of completely new data-intensive scientific domains. Scientific workflows that chain together processing steps and data are becoming critical to managing these workloads on HPC systems. Thus, while providers of supercomputing resources must continue to support the extreme bandwidth requirements of traditional supercomputing applications, centers must now also deploy resources capable of supporting the requirements of these emerging data-intensive workflows. In sharp contrast to the highly coherent, sequential, large-transaction reads and writes that are characteristic of traditional HPC checkpoint-restart workloads <ref type="bibr" target="#b8">[11]</ref>, data-intensive workflows have been shown to often utilize non-sequential, metadata-intensive, and small-transaction reads and writes <ref type="bibr" target="#b10">[13,</ref><ref type="bibr" target="#b20">23]</ref>. Parallel file systems in today's supercomputers have been optimized for more traditional HPC workloads <ref type="bibr" target="#b9">[12]</ref>. The rapidly growing I/O demands of data-intensive workflows place new performance and optimization requirements on future HPC I/O subsystems <ref type="bibr" target="#b10">[13]</ref>. It is therefore essential to develop methods to quantitatively characterize the I/O needs of data-intensive workflows to ensure that resources with the correct balance of performance characteristics can be deployed.</p><p>The emergence of data-intensive workflows has coincided with the integration of flash devices into the HPC I/O subsystem as a "Burst Buffer", a performance-optimized storage tier that resides between compute nodes and the high-capacity parallel file system (PFS). 
The Burst Buffer was originally conceived to serve the massive bandwidth requirements of checkpoint-restart workloads for extreme-scale simulation <ref type="bibr" target="#b16">[19]</ref>. The tier buffers bursts of I/O traffic so that the PFS can service a lower-bandwidth load spread over a longer time period. However, the flash-based storage media underlying Burst Buffers are also substantially faster than spinning disk for the non-sequential and small-transaction I/O workloads of data-intensive workflows. This motivates using the media for use cases beyond buffering of I/O requests, such as providing a temporary scratch space, coupling workflow stages, and in-transit processing <ref type="bibr" target="#b2">[4]</ref>.</p><p>Today's commercially available Burst Buffer solutions <ref type="bibr" target="#b14">[17]</ref> expose their flash through the POSIX API, which enables workflows to easily leverage the technology's capabilities. We need to understand and optimize the use of Burst Buffers to serve the needs of data-intensive workflows. Thus, it is essential to understand workflows' specific I/O requirements in the context of both flash-based storage media and the I/O stack through which applications utilize the Burst Buffer.</p><p>In this paper, we characterize two of the production data analytics workflows used at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, and we present an analysis of their performance on the production Burst Buffer resource deployed as a part of NERSC's Cori system. The paper is organized as follows. Section 2 presents the background for the paper: related work and the details of the NERSC Burst Buffer Architecture. Section 3 details our approach to scalable I/O characterization for both workflows and Section 4 presents a detailed analysis of the I/O requirements of these workflows. We discuss efficient use of Burst Buffers in Section 5 and provide conclusions in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Related Work</head><p>Workflows. Scientific workflows have been shown to process large amounts of data with varied I/O characteristics <ref type="bibr" target="#b13">[16,</ref><ref type="bibr" target="#b18">21,</ref><ref type="bibr" target="#b6">9,</ref><ref type="bibr">7]</ref>. Deelman et al. <ref type="bibr" target="#b11">[14]</ref> highlight several challenges in data management for data-intensive scientific workflows. Several strategies have been proposed to optimize data management for scientific workflows in HPC environments <ref type="bibr" target="#b25">[28,</ref><ref type="bibr" target="#b17">20,</ref><ref type="bibr" target="#b5">8]</ref>. 
However, Burst Buffers add another layer to the storage hierarchy, adding to the data management challenges for scientific workflows. Hence, it is important to characterize scientific workflows so that Burst Buffers can be used optimally based on their I/O characteristics. In this paper, we evaluate and characterize multiple workflows with different I/O profiles to understand the optimal use of Burst Buffers.</p><p>Burst Buffers. Burst Buffers have been used in several ways to mitigate the I/O bottlenecks of data-intensive workloads <ref type="bibr" target="#b16">[19,</ref><ref type="bibr" target="#b3">6,</ref><ref type="bibr" target="#b19">22,</ref><ref type="bibr" target="#b22">25]</ref>. Most studies surrounding the design and use of Burst Buffers have so far focused on the I/O characteristics of individual applications <ref type="bibr" target="#b23">[26]</ref> or small components within workflows <ref type="bibr" target="#b20">[23]</ref>. However, research into optimizing scientific workflows with diverse I/O and storage requirements for Burst Buffers is still in its infancy, and only a limited body of work presently exists <ref type="bibr" target="#b10">[13,</ref><ref type="bibr">5]</ref>. Beyond single applications and workflows, researchers are investigating I/O-aware scheduling on systems with a Burst Buffer. Herbein et al. <ref type="bibr" target="#b15">[18]</ref> demonstrate that system utilization can be improved by using application drain bandwidth between the Burst Buffer and PFS as a scheduling constraint. Thapaliya et al. <ref type="bibr" target="#b21">[24]</ref> show how different Burst Buffer allocation policies and the order of servicing I/O requests affect total application throughput on a system with a shared Burst Buffer.</p><p>DataWarp. DataWarp is Cray's implementation of a Burst Buffer, and few guidelines exist for how to use it optimally for scientific workflows. Bhimji et al. show performance results for a collection of applications selected as part of NERSC's Early User Program <ref type="bibr" target="#b7">[10]</ref>. Those results focus on application I/O bandwidth on DataWarp and the PFS. The NERSC website provides a list of known issues and overall guidelines for achieving high performance, but does not show when, why, and how to use DataWarp for specific workflow use cases <ref type="bibr" target="#b1">[1]</ref>. In contrast, our work analyzes two data analytics workflows and identifies their I/O signatures along with the specific workflow requirements to advise how to use DataWarp.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The NERSC Burst Buffer Architecture</head><p>NERSC's Cori system features a Burst Buffer based on Cray DataWarp <ref type="bibr" target="#b14">[17]</ref>. This architecture is built upon discrete Burst Buffer nodes (BB nodes), each containing two Intel P3608 SSDs that deliver 6.4 TiB of usable capacity and 5.7 GiB/s of bandwidth. Currently, Cori has a total of 144 BB nodes, over 900 TiB of usable capacity, and over 800 GiB/sec of peak performance.</p><p>Cray's DataWarp middleware aggregates the SSDs on each of the BB nodes and provides user jobs with dynamically provisioned private parallel file systems. Users can request a certain capacity of Burst Buffer in 200 GiB increments (which we call fragments) when submitting jobs. Each fragment is allocated on a different BB node to allow the aggregate performance of the BB allocation to scale with the requested capacity. DataWarp also designates one of the BB nodes as the metadata server for the allocation. This allocation is mounted on the job nodes when the job is launched, and it is typically torn down upon job completion. However, users may also request a persistent mode allocation, which allows a BB allocation to persist across multiple jobs.</p><p>DataWarp also offers private mode reservations where each compute node gets its own metadata server within the Burst Buffer allocation and, by extension, its own private namespace. This enables higher aggregate metadata performance since each compute node's metadata is serviced by a unique BB node.</p></div>
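The allocation and staging model described above is driven by directives in the user's batch script. As an illustrative sketch only: the directive forms (jobdw, stage_in, stage_out) follow Cray DataWarp's documented syntax, but the paths, file names, and job parameters below are hypothetical.

```python
# Illustrative sketch: generate a batch script that requests a DataWarp
# scratch allocation and stages one file in and out. The #DW directive
# forms follow Cray DataWarp's documented syntax; the paths, file names,
# and job parameters are hypothetical examples.

def datawarp_script(capacity_gib=200,
                    src="/global/cscratch1/sd/user/in.fits",
                    dst="/global/cscratch1/sd/user/out.fits"):
    return "\n".join([
        "#!/bin/bash",
        "#SBATCH -N 1 -t 00:30:00",
        # A 200 GiB request maps to one fragment on one BB node; larger
        # requests are striped across additional BB nodes.
        f"#DW jobdw capacity={capacity_gib}GiB access_mode=striped type=scratch",
        f"#DW stage_in source={src} destination=$DW_JOB_STRIPED/in.fits type=file",
        f"#DW stage_out source=$DW_JOB_STRIPED/out.fits destination={dst} type=file",
        "srun ./workflow_stage $DW_JOB_STRIPED",
    ])

print(datawarp_script())
```

A persistent reservation, as used for CAMP later in the paper, would instead reference an existing allocation with a #DW persistentdw name=... directive rather than creating a per-job one.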
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">METHODOLOGY</head><p>In this section, we detail our performance analysis methodology and workloads used for our analyses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Workflows</head><p>The two workflows studied in this paper were selected because they stress the I/O subsystem in very different ways: CAMP is limited by metadata performance and SWarp is limited by data transfer performance. When discussing the workflows, we use the term "workflow pipeline" to refer to a single unit of the larger workflow. The CAMP (Community Access MODIS Pipeline) workflow processes Earth's land and atmospheric data obtained from the MODIS satellites [3, <ref type="bibr" target="#b24">27,</ref><ref type="bibr" target="#b13">16]</ref>. It transforms the MODIS data from a swath space and time coordinate system (latitude and longitude) into a sinusoidal tiling system (tiles using sinusoidal projection). The MODIS data for CAMP consists of small geometa files in plain text format and swath products as Hierarchical Data Format (HDF) files. Each geometa file is only a few KB in size and is used by all the swath products from a particular satellite. Each swath product has several files per day, each of which is approx. 1.1 MB in size and contains the product data in the swath space and time coordinate system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">CAMP</head><p>The CAMP workflow consists of two processing steps: a) builddb, which assembles and maps swaths to their corresponding sinusoidal tiles, and b) reproject, which converts the MODIS products from the swath coordinate system to the sinusoidal tiling system. Figure <ref type="figure" target="#fig_0">1</ref> shows the high-level representation of the CAMP workflow, including the data staging operations to and from the Burst Buffer. The workflow pipeline in this paper transforms one MODIS product's swath coordinates for one day into one specific tile. CAMP is written in Python and generates an intermediate SQLite database to provide the mapping for the reproject stage. We use Conda, with the Anaconda Python distribution, to install CAMP on DataWarp.</p></div>
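The builddb/reproject split centres on a small SQLite mapping database. The following toy sketch is not CAMP's actual schema (the table, columns, and MODIS file names are invented for illustration); it shows the kind of many-small-transaction SQLite access pattern that later dominates CAMP-builddb's metadata load.

```python
# Toy sketch of a builddb-like swath-to-tile mapping kept in SQLite.
# Table and column names are hypothetical; the real CAMP schema is not
# shown in the paper.
import sqlite3

conn = sqlite3.connect(":memory:")  # on DataWarp this would be a file
conn.execute("CREATE TABLE swath_tile (swath_file TEXT, tile_id TEXT)")
# Many small inserts/commits are one source of the metadata-heavy,
# small-transaction I/O profile observed for CAMP-builddb.
for swath, tile in [("MOD021KM.A2016001.0000.hdf", "h08v05"),
                    ("MOD021KM.A2016001.0005.hdf", "h08v05"),
                    ("MOD021KM.A2016001.0010.hdf", "h09v05")]:
    conn.execute("INSERT INTO swath_tile VALUES (?, ?)", (swath, tile))
    conn.commit()

# A reproject-like step looks up the swaths contributing to one tile.
rows = conn.execute(
    "SELECT swath_file FROM swath_tile WHERE tile_id = ?",
    ("h08v05",)).fetchall()
print(len(rows))  # 2 swaths map to tile h08v05 in this toy example
```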
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">SWarp</head><p>The SWarp workflow combines overlapping raw images of the night sky into high-quality reference images. It is used in the Dark Energy Camera Legacy Survey (DECaLS) to produce high-quality images of 14,000 deg² of northern hemisphere sky. In this survey, each SWarp workflow pipeline produces an image for a 0.25 deg² "brick" of sky. The average input to each workflow pipeline is 16 input images of 32 MiB each and 16 input weight maps of 16 MiB each.</p><p>The SWarp workflow pipeline consists of a data resampling stage and a data combination stage. The data resampling stage interpolates the raw images and creates resampled images which can be trivially stacked. The data combination stage reads back the resampled images and then performs a reduction over the pixels to produce a single stacked image. The raw, resampled and stacked images are all in the Flexible Image Transport System (FITS) file format. The DAG when using a Burst Buffer is similar to CAMP: input images and weight map files are staged in prior to the data resampling stage and the combined image is staged out after the data combination stage. SWarp is written in C and multithreaded with POSIX threads.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Workload Configuration</head><p>The workflow pipelines are run in their production configuration on Cori and all I/O is directed to DataWarp mount points. The DataWarp reservation is configured to use a shared namespace and one fragment of capacity. A job reservation is used for SWarp and a persistent reservation is used for CAMP (in order to retain the CAMP Python software environment between jobs). The Integrated Performance Monitoring (IPM) profiling tool [2] is used to collect run time, memory usage and time in different I/O calls for each workflow stage. The workflow pipelines are then replicated on 1 to 64 compute nodes (with 1 workflow pipeline per compute node) and I/O is directed to a fixed storage reservation of 1 DataWarp fragment. This allows us to study how run time is affected by the saturation of the storage resource.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">RESULTS</head><p>The high-level characteristics of the stages in a single workflow pipeline are shown in Table <ref type="table">1</ref>, which reports time and memory measurements obtained with 1 compute node and 1 DataWarp fragment. The workflow stages are found to spend 10-30% of their time in I/O. This is the best achievable I/O time and can only get worse as more workflow pipelines contend for the same storage resource. Figure <ref type="figure" target="#fig_2">2</ref> shows the scaling of SWarp-resample. The results show that wall clock time remains relatively constant until about 16 workflow pipelines and that I/O time is dominated by data rather than metadata operations. Figure <ref type="figure" target="#fig_3">3</ref> shows that the scaling of CAMP-builddb is limited by metadata performance. One source of these metadata operations is the startup of Python applications, which is a known scalability issue in Python HPC applications <ref type="bibr" target="#b12">[15]</ref>: Python searches every directory in the Python path for the files providing a package. In spite of this, the dominant source of metadata load in CAMP-builddb is the transactions to the SQLite database. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">DISCUSSION</head><p>In this section, a) we discuss the key characteristics of the workflows analyzed and use the information to highlight the effective use of Burst Buffers and, b) we apply this knowledge to explain how to achieve the optimum performance with the DataWarp implementation of a Burst Buffer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Efficient use of Burst Buffers</head><p>The key findings from our experimental analyses are: 1. A single workflow pipeline does not provide the I/O parallelism needed to make efficient use of Burst Buffers. The data analytics workflows studied in this paper consist of single-process applications which perform I/O with a single thread of execution, which is poorly matched with the need to keep many I/O requests in flight to utilize the capabilities of the flash storage hardware. Staging input data on demand, at access time, is also a poor fit for these workflows: the one-time cost of staging may not be hidden by significant data reuse, and automatic file movement would transfer the intermediate files to the PFS unnecessarily. 4. It is valuable to leave data in the Burst Buffer tier for longer than a single batch job. We have found that input files and software environments are reused across workflow pipelines.</p><p>• The input data for data analytics workflows are generally Write Once Read Many times (WORM). In the SWarp workflow a single input image often contributes to multiple regions of the sky. Therefore it is wasteful to re-stage the same input file multiple times for each workflow pipeline.</p><p>• The software environment is reused in every single workflow pipeline. In the CAMP workflow the Python environment is responsible for some of the I/O. The role of "support I/O" (e.g. Python packages) is rarely mentioned in the context of Burst Buffers. It is useful to stage the software environment once to avoid the overhead and wear of staging it repeatedly. Long-term data residency is not a good fit for today's Burst Buffers, however, because they do not provide data redundancy; this imposes a data management burden upon the developer.</p></div>
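The WORM observation above suggests deduplicating stage-in across workflow pipelines rather than re-staging per pipeline. A minimal sketch of such a staging planner (the file names are hypothetical):

```python
# Sketch: plan stage-in for a batch of workflow pipelines while staging
# each WORM input file only once. File names are hypothetical examples.

def plan_stage_in(pipelines):
    """pipelines: list of per-pipeline input file lists. Returns the files
    to stage, in first-use order, skipping files already staged."""
    staged, plan = set(), []
    for inputs in pipelines:
        for f in inputs:
            if f not in staged:  # reuse the copy already in the Burst Buffer
                staged.add(f)
                plan.append(f)
    return plan

bricks = [["img_001.fits", "img_002.fits"],
          ["img_002.fits", "img_003.fits"],  # img_002 overlaps two bricks
          ["img_001.fits", "img_003.fits"]]
print(plan_stage_in(bricks))  # each image staged once
```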
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Efficient use of DataWarp</head><p>DataWarp storage reservations on Cori consist of multiple storage fragments of size 200 GiB. The scaling studies show that both SWarp and CAMP are limited by DataWarp performance rather than capacity. SWarp and CAMP have an aggregate capacity requirement of up to 2.6 GiB and 150 MiB per workflow pipeline, respectively (Table <ref type="table">1</ref>). However, performance saturates at approximately 16 workflow pipelines per DataWarp fragment, well before the 200 GiB of capacity is fully utilized. This means that excess capacity must be reserved to sustain performance in a scaled-out workflow. Metadata bottlenecks, such as those seen in CAMP-builddb, can be addressed by combining the reservation of excess capacity with the private mode feature of DataWarp.</p></div>
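The trade-off between capacity-driven and performance-driven reservation sizing can be made concrete with a back-of-the-envelope calculation using the numbers above (200 GiB fragments, saturation at roughly 16 pipelines per fragment). The helper below is illustrative arithmetic, not part of DataWarp itself.

```python
# Back-of-the-envelope sizing of a DataWarp reservation: fragments needed
# by capacity vs. fragments needed to sustain performance.
import math

FRAGMENT_GIB = 200   # DataWarp allocation granularity on Cori
SATURATION = 16      # observed pipelines per fragment before performance saturates

def fragments_needed(n_pipelines, gib_per_pipeline):
    by_capacity = math.ceil(n_pipelines * gib_per_pipeline / FRAGMENT_GIB)
    by_performance = math.ceil(n_pipelines / SATURATION)
    return max(by_capacity, by_performance)

# SWarp at ~2.6 GiB per pipeline: 64 pipelines need only ~166 GiB of
# capacity (1 fragment) but 4 fragments to sustain performance, i.e.
# 800 GiB reserved for a 166.4 GiB working set.
print(fragments_needed(64, 2.6))  # 4
```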
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">CONCLUSION</head><p>In this paper we analyzed the performance of two scientific workflows running on the Cori supercomputer with the DataWarp Burst Buffer. We show that a single workflow pipeline does not have the parallelism to utilize the capabilities of the flash storage hardware. We also show that the workflows have different I/O performance characteristics: SWarp is bound by data transfer performance and CAMP (specifically CAMP-builddb) is bound by metadata performance as the workflows are scaled out. These results are used to give general advice about using Burst Buffers more efficiently and to provide specific advice for DataWarp.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The CAMP workflow: i) staging operations move the data from the parallel file system to the Burst Buffer and vice-versa, ii) builddb and reproject transform the swath products to a sinusoidal tiling system.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Figures 2 and 3 show how I/O time changes with concurrency for the most time-consuming stage of each workflow. I/O time is divided into time spent in metadata operations and data operations. The experiments are repeated three times at each node count and the plots show the mean time per workflow pipeline stage. The error bars simply show the range of mean times over the three experiments.</figDesc></figure>
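The aggregation described in this caption (mean time per stage over three repeats, with error bars spanning the range of means) can be sketched as follows; the timing values are hypothetical placeholders, not measurements from the paper.

```python
# Sketch: summarize repeated timing measurements the way the plots do --
# mean time per workflow pipeline stage plus the min/max range across
# the repeated experiments. The values below are placeholders.

def summarize(repeats):
    """repeats: per-experiment mean times (seconds) at one node count."""
    mean = sum(repeats) / len(repeats)
    return mean, min(repeats), max(repeats)  # point and error-bar extent

times_at_16_nodes = [12.1, 11.8, 12.6]  # hypothetical repeats
mean, lo, hi = summarize(times_at_16_nodes)
print(round(mean, 2), lo, hi)
```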
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Scaling of SWarp-resample with number of workflow pipelines</figDesc><graphic coords="3,323.39,322.72,225.95,169.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Scaling of CAMP-builddb with number of workflow pipelines</figDesc><graphic coords="4,60.37,53.80,225.95,169.46" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Lab, provided by the Director, Office of Science and Office of Science, Office of Advanced Scientific Computing Research (ASCR) of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The authors would also like to thank Rollin Thomas for help with installing the CAMP Python software environment on DataWarp.</p></div>
			</div>

			<div type="references">

				<listBibl>


<biblStruct xml:id="b1">
	<monogr>
		<ptr target="http://www.nersc.gov/users/computational-systems/cori/burst-buffer/" />
		<title level="m">NERSC website</title>
				<imprint>
			<date type="published" when="2016-08-31">31 August 2016</date>
		</imprint>
	</monogr>
	<note>Burst Buffer</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Trinity / NERSC-8 Use Case Scenarios</title>
		<idno>SAND 2013-2941 P</idno>
		<ptr target="https://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Documents/trinity-NERSC8-use-case-v1.2a.pdf" />
		<imprint>
			<date type="published" when="2013-04">Apr. 2013 (accessed 4 October 2016)</date>
		</imprint>
		<respStmt>
			<orgName>Los Alamos National Laboratory, Sandia National Laboratories, NERSC</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Storage challenges at los alamos national lab</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Grider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kettering</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Manzanares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mcclelland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torres</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torrez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)</title>
				<imprint>
			<date type="published" when="2012-04">April 2012</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Montage: a grid-enabled engine for delivering custom science-grade mosaics on demand</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">B</forename><surname>Berriman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Deelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Good</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Jacob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Katz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Kesselman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Laity</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Prince</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-H</forename><surname>Su</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Scheduling data-intensive workflows on storage constrained resources</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bharathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chervenak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS &apos;09</title>
				<meeting>the 4th Workshop on Workflows in Support of Large-Scale Science, WORKS &apos;09<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page">10</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Characterization of scientific workflows</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bharathi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chervenak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Deelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Vahi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Third Workshop on Workflows in Support of Large-Scale Science</title>
				<imprint>
			<date type="published" when="2008-11">Nov 2008</date>
			<biblScope unit="page" from="1" to="10" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Accelerating Science with the NERSC Burst Buffer Early User Program</title>
		<author>
			<persName><forename type="first">W</forename><surname>Bhimji</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Cray User Group CUG</title>
				<imprint>
			<date type="published" when="2016-05">May 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Lessons Learned from a Hero I/O Run on Hopper</title>
		<author>
			<persName><forename type="first">S</forename><surname>Byna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Uselton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Knaak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">H</forename><surname>He</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2013 Cray User Group Meeting</title>
				<meeting><address><addrLine>Napa, CA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Small-file access in parallel file systems</title>
		<author>
			<persName><forename type="first">P</forename><surname>Carns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vilayannur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kunkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ludwig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Symposium on Parallel &amp; Distributed Processing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Analyses of Scientific Workflows for Effective Use of Future Architectures</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Daley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dosanjh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Wright</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC-15)</title>
				<meeting>the 6th International Workshop on Big Data Analytics: Challenges, and Opportunities (BDAC-15)<address><addrLine>Austin, TX</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Data management challenges of data-intensive scientific workflows</title>
		<author>
			<persName><forename type="first">E</forename><surname>Deelman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chervenak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">8th IEEE International Symposium on Cluster Computing and the Grid, CCGRID &apos;08</title>
				<imprint>
			<date type="published" when="2008-05">May 2008</date>
			<biblScope unit="page" from="687" to="692" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">GPAW - massively parallel electronic structure calculations with Python-based software</title>
		<author>
			<persName><forename type="first">J</forename><surname>Enkovaara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Romero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shende</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Mortensen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="17" to="25" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">CAMP: Community Access MODIS Pipeline</title>
		<author>
			<persName><forename type="first">V</forename><surname>Hendrix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ryu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Van Ingen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Jackson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Agarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Future Generation Computer Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="418" to="429" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Architecture and Design of Cray DataWarp</title>
		<author>
			<persName><forename type="first">D</forename><surname>Henseler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Landsteiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Petesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wright</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Wright</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Cray User Group (CUG)</title>
				<imprint>
			<date type="published" when="2016-05">May 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC Clusters</title>
		<author>
			<persName><forename type="first">S</forename><surname>Herbein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">H</forename><surname>Ahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lipari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">R</forename><surname>Scogland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stearman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grondona</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Garlick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Springmeyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Taufer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC &apos;16</title>
				<meeting>the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC &apos;16<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="69" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">On the role of burst buffers in leadership-class storage systems</title>
		<author>
			<persName><forename type="first">N</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Carns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Carothers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Grider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Crume</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Maltzahn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST)</title>
				<imprint>
			<date type="published" when="2012-04">Apr. 2012</date>
			<biblScope unit="page" from="1" to="11" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">On timely staging of HPC job input data</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Monti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Butt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Vazhkudai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Parallel and Distributed Systems</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="1841" to="1851" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A multi-dimensional classification model for scientific workflow characteristics</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plale</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, Wands &apos;10</title>
				<meeting>the 1st International Workshop on Workflow Approaches to New Data-centric Science, Wands &apos;10<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A user-level infiniband-based file system and checkpoint strategy for burst buffers</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mohror</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moody</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gamblin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">R</forename><surname>de Supinski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Maruyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Matsuoka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)</title>
				<imprint>
			<date type="published" when="2014-05">May 2014</date>
			<biblScope unit="page" from="21" to="30" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Standish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Carland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">K</forename><surname>Lockwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Pfeiffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tatineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">C</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lamberth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cherkas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Brodmerkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Jaeger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rajagopal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Curran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">J</forename><surname>Schork</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Bioinformatics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">304</biblScope>
			<date type="published" when="2015-12">Dec 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Managing I/O Interference in a Shared Burst Buffer System</title>
		<author>
			<persName><forename type="first">S</forename><surname>Thapaliya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bangalore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lofstead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mohror</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moody</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">45th International Conference on Parallel Processing (ICPP)</title>
				<imprint>
			<date type="published" when="2016-08">Aug 2016</date>
			<biblScope unit="page" from="416" to="425" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">On the Role of NVRAM in Data-intensive Architectures: An Evaluation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Van Essen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pearce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ames</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gokhale</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 26th International Parallel and Distributed Processing Symposium</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="703" to="714" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Development of a burst buffer system for data-intensive applications</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Oral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pritchard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Vasko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yu</surname></persName>
		</author>
		<idno>CoRR, abs/1505.01765</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">MODIS land data storage, gridding, and compositing methodology: Level 2 grid</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Wolfe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Roy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Vermote</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Geoscience and Remote Sensing</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1324" to="1338" />
			<date type="published" when="1998-07">Jul 1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Optimizing center performance through coordinated data staging, scheduling and recovery</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Vazhkudai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Pike</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Cobb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mueller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, SC &apos;07</title>
				<meeting>the 2007 ACM/IEEE Conference on Supercomputing, SC &apos;07<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page">11</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
