=Paper=
{{Paper
|id=Vol-1800/short4
|storemode=property
|title=Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers
|pdfUrl=https://ceur-ws.org/Vol-1800/short4.pdf
|volume=Vol-1800
|authors=Christopher S. Daley,Devarshi Ghoshal,Glenn K. Lockwood,Sudip Dosanjh,Lavanya Ramakrishnan,Nicholas J. Wright
|dblpUrl=https://dblp.org/rec/conf/sc/DaleyGLDRW16
}}
==Performance Characterization of Scientific Workflows for the Optimal Use of Burst Buffers==
<pdf width="1500px">https://ceur-ws.org/Vol-1800/short4.pdf</pdf>
<pre>
     Performance Characterization of Scientific Workflows for
               the Optimal Use of Burst Buffers

            Christopher S. Daley, Devarshi Ghoshal, Glenn K. Lockwood, Sudip Dosanjh,
                            Lavanya Ramakrishnan, Nicholas J. Wright.
                                          Lawrence Berkeley National Laboratory
                                                    1 Cyclotron Rd
                                                     Berkeley, CA
                      [csdaley,dghoshal,glock,sudip,lramakrishnan,njwright]@lbl.gov

Copyright held by the author(s).
WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah

ABSTRACT
Scientific discoveries are increasingly dependent upon the analysis of large volumes of data from
observations and simulations of complex phenomena. Scientists compose the complex analyses as
workflows and execute them on large-scale HPC systems. The workflow structures are in contrast with
monolithic single simulations that have often been the primary use case on HPC systems.
Simultaneously, new storage paradigms such as Burst Buffers are also becoming available on HPC
platforms. In order to maximize the performance of data analysis workflows today it is critical to
determine the characteristics of the workflows. Obtaining a deeper understanding of the workflows
helps us identify opportunities to leverage the capabilities of the Burst Buffer. In this paper, we
analyze the performance characteristics of the Burst Buffer and two representative scientific
workflows. We measure the performance of these workflows using the Burst Buffer, allowing us to make
recommendations for the optimal use of Burst Buffers by future workflows.

Keywords
Burst Buffer; DataWarp; Workflow; HPC

1.    INTRODUCTION
   The science drivers for high-performance computing (HPC) are broadening with the proliferation of
high-resolution observational instruments and the emergence of completely new data-intensive
scientific domains. Scientific workflows that chain processing and data together are becoming
critical for managing these analyses on HPC systems. Thus, while providers of supercomputing
resources must continue to support the extreme bandwidth requirements of traditional supercomputing
applications, centers must now also deploy resources that are capable of supporting the requirements
of these emerging data-intensive workflows. In sharp contrast to the highly coherent, sequential,
large-transaction reads and writes that are characteristic of traditional HPC checkpoint-restart
workloads [11], data-intensive workflows have been shown to often utilize non-sequential,
metadata-intensive, and small-transaction reads and writes [13, 23]. Parallel file systems in today’s
supercomputers have been optimized for more traditional HPC workloads [12]. The rapid growth in I/O
demands from data-intensive workflows is placing new performance and optimization requirements on
future HPC I/O subsystems [13]. It is therefore essential to develop methods to quantitatively
characterize the I/O needs of data-intensive workflows to ensure that correct resources can be
deployed with the correct balance of performance characteristics.
   The emergence of data-intensive workflows has coincided with the emergence of flash devices being
integrated into the HPC I/O subsystem as a “Burst Buffer”, a performance-optimized storage tier that
resides between compute nodes and the high-capacity parallel file system (PFS). The Burst Buffer was
originally conceived for the massive bandwidth requirements of checkpoint-restart workloads for
extreme-scale simulation [19]. The tier buffers bursts of I/O traffic to enable the PFS to service a
lower bandwidth load spread over a longer time period. However, the flash-based storage media
underlying Burst Buffers are also substantially faster than spinning disk for the non-sequential and
small-transaction I/O workloads of data-intensive workflows. This motivates using the media for use
cases beyond buffering of I/O requests, such as providing a temporary scratch space, coupling
workflow stages, and in-transit processing [4].
   Today’s commercially available Burst Buffer solutions [17] expose their flash through the POSIX
API, which enables workflows to easily leverage the technology’s capabilities. We need to understand
and optimize the use of Burst Buffers to serve the needs of data-intensive workflows. Thus, it is
essential to understand workflows’ specific I/O requirements in the context of both flash-based
storage media and the I/O stack through which applications utilize the Burst Buffer.
   In this paper, we characterize two of the production data analytics workflows used at the National
Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, and we
present an analysis of their performance on the production Burst Buffer resource deployed as a part
of NERSC’s Cori system. The paper is organized as follows. Section 2 presents the background for the
paper: related work and the details of the NERSC Burst Buffer architecture. Section 3 details our
approach to scalable I/O characterization for both workflows and Section 4 presents a detailed
analysis of the I/O requirements of these workflows. We discuss efficient use of Burst Buffers in
Section 5 and provide conclusions in Section 6.

2.    BACKGROUND
   In this section we describe related work and the NERSC Burst Buffer architecture.

2.1   Related Work
Scientific Workflows. Data-intensive scientific workflows have been shown to process large amounts of
data with varied I/O characteristics [16, 21, 9, 7]. Deelman et al. [14] highlight several challenges
in data management for data-intensive scientific workflows. Several strategies have been proposed to
optimize data management for scientific workflows in HPC environments [28, 20, 8]. However, Burst
Buffers add another layer in the storage hierarchy, adding to the data management challenges for
scientific workflows. Hence, it is important to characterize scientific workflows to optimally use
Burst Buffers based on their I/O characteristics. In this paper, we evaluate and characterize
multiple workflows with different I/O profiles to understand the optimal use of Burst Buffers.
Burst Buffers. Several uses of Burst Buffers have been shown in order to mitigate the I/O bottlenecks
of data-intensive workloads [19, 6, 22, 25]. Most studies surrounding the design and use of Burst
Buffers have so far focused on the I/O characteristics of individual applications [26] or small
components within workflows [23]. However, research into optimizing scientific workflows with diverse
I/O and storage requirements for Burst Buffers is still in its infancy, and a limited body of work
presently exists [13, 5]. Beyond single applications and workflows, researchers are investigating
I/O-aware scheduling on systems with a Burst Buffer. Herbein et al. [18] demonstrate that system
utilization can be improved by using application drain bandwidth between the Burst Buffer and PFS as
a scheduling constraint. Thapaliya et al. [24] show how different Burst Buffer allocation policies
and the order of servicing I/O requests affect total application throughput on a system with a shared
Burst Buffer.
DataWarp. DataWarp is Cray’s implementation of a Burst Buffer, and few guidelines exist for how to
use it optimally for scientific workflows. Bhimji et al. show performance results for a collection of
applications selected as part of NERSC’s Early User Program [10]. The results focus on application
I/O bandwidth on DataWarp and the PFS. The NERSC website provides a list of known issues and overall
guidelines for achieving high performance, but does not show when, why and how to use DataWarp for
specific workflow use cases [1]. Our work has analyzed two data analytics workflows and identified
I/O signatures along with the specific workflow requirements to advise how to use DataWarp.

2.2    The NERSC Burst Buffer Architecture
   NERSC’s Cori system features a Burst Buffer based on Cray DataWarp [17]. This architecture is built
upon discrete Burst Buffer nodes (BB nodes), each containing two Intel P3608 SSDs that deliver 6.4
TiB of usable capacity and 5.7 GiB/s of bandwidth. Currently, Cori has a total of 144 BB nodes, over
900 TiB of usable capacity, and over 800 GiB/s of peak performance.
   Cray’s DataWarp middleware aggregates the SSDs on each of the BB nodes and provides user jobs with
dynamically provisioned private parallel file systems. Users can request a certain capacity of Burst
Buffer in 200 GiB increments (which we call fragments) when submitting jobs. Each fragment is
allocated on a different BB node to allow the aggregate performance of the BB allocation to scale
with the requested capacity. DataWarp also designates one of the BB nodes as the metadata server for
the allocation. This allocation is mounted on the job nodes when the job is launched, and it is
typically torn down upon job completion. However, users may also request a persistent mode
allocation, which allows a BB allocation to persist across multiple jobs.
   DataWarp also offers private mode reservations where each compute node gets its own metadata server
within the Burst Buffer allocation and, by extension, its own private namespace. This enables higher
aggregate metadata performance since each compute node’s metadata is serviced by a unique BB node.

3.    METHODOLOGY
   In this section, we detail our performance analysis methodology and the workloads used for our
analyses.

3.1     Workflows
   The two workflows studied in the paper were selected because they stress the I/O subsystem in very
different ways: CAMP is limited by metadata performance and SWarp is limited by data transfer
performance. When discussing the workflows, we use the term “workflow pipeline” to refer to a single
unit of the larger workflow.

3.1.1    CAMP
   The CAMP (Community Access MODIS Pipeline) workflow processes Earth’s land and atmospheric data
obtained from MODIS satellite data [3, 27, 16]. It transforms the MODIS data from a swath space and
time coordinate system (latitude and longitude) into a sinusoidal tiling system (tiles using
sinusoidal projection). The MODIS data for CAMP consists of small geometa files in plain text format
and swath products as Hierarchical Data Format (HDF) files. Each geometa file is only a few KB in
size and is used by all the swath products from a particular satellite. Each swath product has
several files per day, each of which is approximately 1.1 MB in size and contains the product data in
the swath space and time coordinate system.
   The CAMP workflow consists of two processing steps: a) builddb, which assembles and maps swaths to
their corresponding sinusoidal tiles, and b) reproject, which converts the MODIS products from a
swath coordinate system to a sinusoidal tiling system. Figure 1 shows the high-level representation
of the CAMP workflow that includes the data staging operations to and from the Burst Buffer. The
workflow pipeline in this paper transforms one MODIS product’s swath coordinates for one day into one
specific tile. CAMP is written in Python and generates an intermediate SQLite database to provide the
mapping for the reproject stage. We use Conda, which uses the Anaconda Python distribution, to
install CAMP on DataWarp.

Figure 1: CAMP workflow: i) staging operations move the data from the parallel file system to the
Burst Buffer and vice-versa, ii) builddb and reproject transform the swath products to a sinusoidal
tiling system.

3.1.2    SWarp
   The SWarp workflow combines overlapping raw images of the night sky into high quality reference
images. It is used in the Dark Energy Camera Legacy Survey (DECaLS) to produce high quality images of
14,000 deg² of northern hemisphere sky. In this survey, each SWarp workflow pipeline produces an
image for a 0.25 deg² “brick” of sky. The average input to each workflow pipeline is 16 × 32 MiB
input images and 16 × 16 MiB input weight maps.
   The SWarp workflow pipeline consists of a data resampling stage and a data combination stage. The
data resampling stage interpolates the raw images and creates resampled images which can be trivially
stacked. The data combination stage reads back the resampled images and then performs a reduction
over the pixels to produce a single stacked image. The raw, resampled and stacked images are all in
Flexible Image Transport System (FITS) file format. The directed acyclic graph (DAG) when using a
Burst Buffer is similar to CAMP: input images and weight map files are staged in prior to the data
resampling stage and the combined image is staged out after the data combination stage. SWarp is
written in C and multithreaded with POSIX threads.

3.2     Workload Configuration
   The workflow pipelines are run in their production configuration on Cori and all I/O is directed to
DataWarp mount points. The DataWarp reservation is configured to use a shared namespace and one
fragment of capacity. A job reservation is used for SWarp and a persistent reservation is used for
CAMP (in order to retain the CAMP Python software environment between jobs). The Integrated
Performance Monitoring (IPM) profiling tool [2] is used to collect run time, memory usage and time in
different I/O calls for each workflow stage. The workflow pipelines are then replicated on 1 to 64
compute nodes (with 1 workflow pipeline per compute node) and I/O is directed to a fixed storage
reservation of 1 DataWarp fragment. This allows us to study how run time is affected by the
saturation of the storage resource.

4.    RESULTS
   The high-level characteristics of the stages in a single workflow pipeline are shown in Table 1.
The workflow stages are found to spend 10-30% of their time in I/O. This is the best achievable I/O
time and can only get worse as more workflow pipelines contend for the same storage resource.

                             SWarp       SWarp      CAMP       CAMP
                             resample    coadd      builddb    reproject
   Compute threads               16         16          1          1
   I/O threads                    1          1          1          1
   Wall time (s)               10.7        4.7       15.3        9.2
   I/O time (s)                 2.2        1.2        2.1        1.5
   I/O time (%)                20.3       26.0       13.5       16.6
   Peak mem. (MiB)            108.8     1064.7       96.1       93.0
   Total file size (MiB)     1686.5     1016.8       74.1       77.5

Table 1: Time and memory measurements achieved with 1 compute node and 1 DataWarp fragment

   Figures 2 and 3 show how I/O time changes with concurrency for the most time-consuming stage of
each workflow. I/O time is divided into time spent in metadata operations and data operations. The
experiments are repeated three times at each node count and the plots show the mean time per workflow
pipeline stage. The error bars show the range of mean times over the three experiments.
   Figure 2 shows the scaling of SWarp-resample. The results show that wall clock time remains
relatively constant until about 16 workflow pipelines and that I/O time is dominated by data rather
than metadata operations. Figure 3 shows that the scaling of CAMP-builddb is limited by metadata
performance. One source of these metadata operations is the startup of Python applications, which is
known to be a scalability issue in Python HPC applications [15]. It happens because Python searches
for files providing a package in every directory in the Python path. In spite of this, the dominant
source of metadata load in CAMP-builddb is the transactions to the SQLite database.

Figure 2: Scaling of SWarp-resample with number of workflow pipelines

Figure 3: Scaling of CAMP-builddb with number of workflow pipelines

5.    DISCUSSION
   In this section, we a) discuss the key characteristics of the workflows analyzed and use the
information to highlight the effective use of Burst Buffers, and b) apply this knowledge to explain
how to achieve the optimum performance with the DataWarp implementation of a Burst Buffer.

5.1    Efficient use of Burst Buffers
  The key findings from our experimental analyses are:
  1. A single workflow pipeline does not provide the I/O parallelism needed to make efficient use
     of Burst Buffers. The data analytics workflows studied in this paper consist of single-process
     applications which perform I/O with a single thread of execution. This is poorly matched with
     the need to have multiple I/O streams to obtain the peak performance from Burst Buffer Flash
     storage. Unfortunately, single I/O stream workflow pipelines are a common feature of high
     throughput data analytics workflows. We show that better utilization of Burst Buffer resources
     is possible by executing multiple concurrent workflow pipelines against the same unit of Burst
     Buffer storage. Our results indicate that a single unit of DataWarp storage on Cori can sustain
     the I/O requests from approximately 16 concurrent workflow pipelines before there is any slow
     down.
  2. A scaled out workflow pipeline is often limited by metadata performance. Our analysis has
     found significant metadata costs originating from database transactions, Python initialization
     and opening many small files. The aggregated metadata operations from multiple workflow
     pipelines can easily saturate a single metadata server, as shown in the CAMP-builddb workflow
     stage.
  3. It is valuable to explicitly control the data in the Burst Buffer tier. The workflows read
     input data sets and produce a number of intermediate files which can be discarded once there
     are final results, e.g. the resampled images in SWarp and the SQLite database in CAMP.
     Therefore, we do not expect automatic file movement between the Burst Buffer and the PFS to
     benefit these two data analytics workflows. This is because the one-time cost of staging the
     input data at access time may not be hidden by significant data reuse. Automatic file movement
     would also transfer the intermediate files to the PFS unnecessarily.
  4. It is valuable to leave data in the Burst Buffer tier for longer than a single batch job. We
     have found that input files and software environments are reused across workflow pipelines.
        • The input data for data analytics workflows are generally Write Once Read Many times
           (WORM). In the SWarp workflow a single input image often contributes to multiple regions
           of the sky. Therefore it is wasteful to re-stage the same input file multiple times for
           each workflow pipeline.
        • The software environment is reused in every single workflow pipeline. In the CAMP
           workflow the Python environment is responsible for some of the I/O. The role of “support
           I/O” (e.g. Python packages) is rarely mentioned in the context of Burst Buffers. It is
           useful to stage the software environment once to avoid the overhead and wear of
           repeatedly staging the software environment.
     Long-term data residency is not a good fit for today’s Burst Buffers because they do not
     provide data redundancy. This imposes a data management burden upon the developer.

5.2    Efficient use of DataWarp
   DataWarp storage reservations on Cori consist of multiple storage fragments of size 200 GiB. The
scaling studies show that both SWarp and CAMP are limited by DataWarp performance rather than
capacity. SWarp and CAMP have an aggregate capacity requirement of up to 2.6 GiB and 150 MiB per
workflow pipeline, respectively (Table 1). However, performance saturates at approximately 16
workflow pipelines per DataWarp fragment, well before the 200 GiB of capacity is fully utilized. This
means that excess capacity must be reserved to sustain performance in a scaled out workflow.
Metadata bottlenecks, such as seen in CAMP-builddb, can be addressed by combining the reservation of
excess capacity with the private mode feature of DataWarp.

6.    CONCLUSION
   In this paper we analyzed the performance of two scientific workflows running on the Cori
supercomputer with the DataWarp Burst Buffer. We show that a single workflow pipeline does not have
the parallelism to utilize the capabilities of the Flash storage hardware. We also show that the
workflows have different I/O performance characteristics: SWarp is bound by data transfer performance
and CAMP (specifically CAMP-builddb) is bound by metadata performance as the workflows are scaled
out. The results are used to give general advice about using Burst Buffers more efficiently and to
provide specific advice for DataWarp.

Acknowledgments
This work was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley
Lab, provided by the Director, Office of Science, Office of Advanced Scientific Computing Research
(ASCR) of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used
resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User
Facility supported by the Office of Science of the U.S. Department of Energy under Contract No.
DE-AC02-05CH11231. The authors would also like to thank Rollin Thomas for help with installing the
CAMP Python software environment on DataWarp.

7.    REFERENCES
 [1] Burst Buffer. NERSC website:
     http://www.nersc.gov/users/computational-systems/cori/burst-buffer/; accessed 31 August 2016.
 [2] IPM. https://github.com/nerscadmin/IPM; accessed 13 July 2016.




 [3] NASA MODIS Website. http://modis.gsfc.nasa.gov/.
 [4] Trinity / NERSC-8 Use Case Scenarios. Technical Report SAND 2013-2941 P, Los Alamos National
     Laboratory, Sandia National Laboratories, NERSC, Apr. 2013.
     https://www.nersc.gov/assets/Trinity--NERSC-8-RFP/Documents/trinity-NERSC8-use-case-v1.2a.pdf;
     accessed 4 October 2016.
 [5] APEX Workflows. Technical report, Los Alamos National Laboratory, NERSC, and Sandia National
     Laboratories, Los Alamos, NM, 2016.
 [6] J. Bent, G. Grider, B. Kettering, A. Manzanares, M. McClelland, A. Torres, and A. Torrez.
     Storage challenges at Los Alamos National Lab. In IEEE 28th Symposium on Mass Storage Systems
     and Technologies (MSST), pages 1–5, Apr. 2012.
 [7] G. B. Berriman, E. Deelman, J. C. Good, J. C. Jacob, D. S. Katz, C. Kesselman, A. C. Laity,
     T. A. Prince, G. Singh, and M.-H. Su. Montage: a grid-enabled engine for delivering custom
     science-grade mosaics on demand, 2004.
 [8] S. Bharathi and A. Chervenak. Scheduling data-intensive workflows on storage constrained
     resources. In Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science,
     WORKS ’09, pages 3:1–3:10, New York, NY, USA, 2009. ACM.
 [9] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M. H. Su, and K. Vahi. Characterization of
     scientific workflows. In 2008 Third Workshop on Workflows in Support of Large-Scale Science,
     pages 1–10, Nov. 2008.
[10] W. Bhimji et al. Accelerating Science with the NERSC Burst Buffer Early User Program. In Cray
     User Group CUG, May 2016.
[11] S. Byna, A. Uselton, D. Knaak, and Y. H. He. Lessons Learned from a Hero I/O Run on Hopper. In
     2013 Cray User Group Meeting, Napa, CA, 2013.
[12] P. Carns, S. Lang, R. Ross, M. Vilayannur, J. Kunkel, and T. Ludwig. Small-file access in
     parallel file systems. In 2009 IEEE International Symposium on Parallel & Distributed
     Processing, pages 1–11. IEEE, May 2009.
[13] C. S. Daley, L. Ramakrishnan, S. Dosanjh, and N. J. Wright. Analyses of Scientific Workflows
     for Effective Use of Future Architectures. In Proceedings of the 6th International Workshop on
     Big Data Analytics: Challenges, and Opportunities (BDAC-15), Austin, TX, 2015.
[14] E. Deelman and A. Chervenak. Data management challenges of data-intensive scientific workflows.
     In Cluster Computing and the Grid, 2008. CCGRID ’08. 8th IEEE International Symposium on, pages
     687–692, May 2008.
[15] J. Enkovaara, N. A. Romero, S. Shende, and J. J. Mortensen. GPAW - massively parallel
     electronic structure calculations with Python-based software. Procedia Computer Science,
     4:17–25, 2011.
[16] V. Hendrix, L. Ramakrishnan, Y. Ryu, C. van Ingen, K. R. Jackson, and D. Agarwal. CAMP:
     Community Access MODIS Pipeline. Future Generation Computer Systems, 36:418–429, 2014.
[17] D. Henseler, B. Landsteiner, D. Petesch, C. Wright, and N. Wright. Architecture and Design of
     Cray DataWarp. In Cray User Group CUG, May 2016.
[18] S. Herbein, D. H. Ahn, D. Lipari, T. R. Scogland, M. Stearman, M. Grondona, J. Garlick,
     B. Springmeyer, and M. Taufer. Scalable I/O-Aware Job Scheduling for Burst Buffer Enabled HPC
     Clusters. In Proceedings of the 25th ACM International Symposium on High-Performance Parallel
     and Distributed Computing, HPDC ’16, pages 69–80, New York, NY, USA, 2016. ACM.
[19] N. Liu, J. Cope, P. Carns, C. Carothers, R. Ross, G. Grider, A. Crume, and C. Maltzahn. On the
     role of burst buffers in leadership-class storage systems. In IEEE 28th Symposium on Mass
     Storage Systems and Technologies (MSST), pages 1–11, Apr. 2012.
[20] H. M. Monti, A. R. Butt, and S. S. Vazhkudai. On timely staging of HPC job input data. IEEE
     Transactions on Parallel and Distributed Systems, 24(9):1841–1851, 2013.
[21] L. Ramakrishnan and B. Plale. A multi-dimensional classification model for scientific workflow
     characteristics. In Proceedings of the 1st International Workshop on Workflow Approaches to New
     Data-centric Science, Wands ’10, pages 4:1–4:12, New York, NY, USA, 2010. ACM.
[22] K. Sato, K. Mohror, A. Moody, T. Gamblin, B. R. de Supinski, N. Maruyama, and S. Matsuoka. A
     user-level InfiniBand-based file system and checkpoint strategy for burst buffers. In Cluster,
     Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on, pages 21–30,
     May 2014.
[23] K. A. Standish, T. M. Carland, G. K. Lockwood, W. Pfeiffer, M. Tatineni, C. C. Huang,
     S. Lamberth, Y. Cherkas, C. Brodmerkel, E. Jaeger, L. Smith, G. Rajagopal, M. E. Curran, and
     N. J. Schork. Group-based variant calling leveraging next-generation supercomputing for
     large-scale whole-genome sequencing studies. BMC Bioinformatics, 16(1):304, Dec. 2015.
[24] S. Thapaliya, P. Bangalore, J. Lofstead, K. Mohror, and A. Moody. Managing I/O Interference in
     a Shared Burst Buffer System. In 2016 45th International Conference on Parallel Processing
     (ICPP), pages 416–425, Aug. 2016.
[25] B. Van Essen, R. Pearce, S. Ames, and M. Gokhale. On the Role of NVRAM in Data-intensive
     Architectures: An Evaluation. In 2012 IEEE 26th International Parallel and Distributed
     Processing Symposium, pages 703–714. IEEE, May 2012.
[26] T. Wang, S. Oral, M. Pritchard, K. Vasko, and W. Yu. Development of a burst buffer system for
     data-intensive applications. CoRR, abs/1505.01765, 2015.
[27] R. E. Wolfe, D. P. Roy, and E. Vermote. MODIS land data storage, gridding, and compositing
     methodology: Level 2 grid. IEEE Transactions on Geoscience and Remote Sensing, 36(4):1324–1338,
     Jul. 1998.
[28] Z. Zhang, C. Wang, S. S. Vazhkudai, X. Ma, G. G. Pike, J. W. Cobb, and F. Mueller. Optimizing
     center performance through coordinated data staging, scheduling and recovery. In Proceedings
     of the 2007 ACM/IEEE Conference on Supercomputing, SC ’07, pages 55:1–55:11, New York, NY, USA,
     2007. ACM.
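
The capacity and bandwidth figures quoted in Section 2.2 can be cross-checked with a few lines of
arithmetic. The Python sketch below is illustrative only and is not part of the measured results.
The function names are hypothetical, and the bandwidth bound assumes that every fragment lands on a
distinct BB node and that bandwidth scales perfectly, which a shared production system need not
achieve. With the quoted per-node figures, 144 BB nodes give 921.6 TiB and 820.8 GiB/s, consistent
with the "over 900 TiB" and "over 800 GiB/s" stated in the text.

```python
import math

# Figures quoted in Section 2.2 for Cori's Burst Buffer.
BB_NODES = 144
CAPACITY_PER_NODE_TIB = 6.4     # usable capacity per BB node
BANDWIDTH_PER_NODE_GIBS = 5.7   # bandwidth per BB node
FRAGMENT_GIB = 200              # allocation granularity ("fragment")


def system_totals():
    """Return (usable capacity in TiB, bandwidth in GiB/s) summed over all BB nodes."""
    return BB_NODES * CAPACITY_PER_NODE_TIB, BB_NODES * BANDWIDTH_PER_NODE_GIBS


def fragments_for(capacity_gib):
    """Number of 200 GiB fragments needed to cover a requested capacity."""
    return max(1, math.ceil(capacity_gib / FRAGMENT_GIB))


def bandwidth_upper_bound(capacity_gib):
    """Idealised bandwidth bound in GiB/s (assumption: one fragment per BB node,
    perfect scaling, no contention from other jobs)."""
    nodes_used = min(fragments_for(capacity_gib), BB_NODES)
    return nodes_used * BANDWIDTH_PER_NODE_GIBS


if __name__ == "__main__":
    capacity, bandwidth = system_totals()
    print(f"system: {capacity:.1f} TiB, {bandwidth:.1f} GiB/s")
    print(f"1000 GiB request: {fragments_for(1000)} fragments")
```

Under these assumptions a 1000 GiB request maps to five fragments, so the only way to obtain more
aggregate bandwidth from DataWarp is to request more capacity.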

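Section 4 attributes part of the CAMP-builddb metadata load to Python searching every directory on
its path for the files that provide a package. The sketch below gives a rough feel for how that
search multiplies. It is a simplified model rather than a trace of the interpreter: CPython caches
directory listings, so the real number of system calls will differ, and the helper name
candidate_probes is hypothetical.

```python
import importlib.machinery
import sys


def candidate_probes(module_name, search_path=None):
    """List the locations a path-based import could have to examine for one module.

    Simplified model: for each directory on the search path, the module may be a
    package directory or a file with any recognised source, bytecode or extension
    suffix.
    """
    search_path = sys.path if search_path is None else search_path
    suffixes = importlib.machinery.all_suffixes()
    probes = []
    for directory in search_path:
        base = f"{directory}/{module_name}"
        probes.append(base)                                  # package directory
        probes.extend(base + suffix for suffix in suffixes)  # module files
    return probes


if __name__ == "__main__":
    # Candidate locations for a single import in the current environment.
    print(len(candidate_probes("sqlite3")))
```

Because the candidate count grows with the length of the path and is incurred for each imported
module in each pipeline, replicating a Python pipeline across many compute nodes multiplies the
lookups directed at a single metadata server.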

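Section 4 identifies SQLite transactions as the dominant source of metadata load in CAMP-builddb.
One plausible mechanism is that, in SQLite's default rollback-journal mode, each write transaction
creates a journal file next to the database and deletes it on commit, so the number of transactions
translates directly into file create and delete operations. The sketch below only counts
transactions for two commit strategies. It is not CAMP's code: how CAMP issues its transactions is
not described in this paper, and the table, column and tile names are hypothetical.

```python
import os
import sqlite3
import tempfile


def insert_rows(db_path, rows, commit_every_row):
    """Insert rows into a hypothetical swath-to-tile table.

    Returns the number of write transactions committed for the inserts.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS swath_tile (swath TEXT, tile TEXT)")
    conn.commit()
    commits = 0
    for row in rows:
        conn.execute("INSERT INTO swath_tile VALUES (?, ?)", row)
        if commit_every_row:
            conn.commit()   # one transaction (and one journal file) per row
            commits += 1
    if not commit_every_row:
        conn.commit()       # a single transaction covering all rows
        commits += 1
    conn.close()
    return commits


if __name__ == "__main__":
    rows = [(f"swath{i}", "h08v05") for i in range(100)]
    with tempfile.TemporaryDirectory() as tmp:
        per_row = insert_rows(os.path.join(tmp, "per_row.db"), rows, True)
        batched = insert_rows(os.path.join(tmp, "batched.db"), rows, False)
    print(per_row, batched)
```

Both strategies store the same rows; they differ only in how many transactions, and therefore how
many journal-file operations, reach the file system.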

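Section 5.2 argues that DataWarp reservations for these workflows should be sized by performance
rather than by capacity. The sketch below turns that guidance into arithmetic, using the 200 GiB
fragment size and the approximate saturation point of 16 workflow pipelines per fragment reported
above. The function name and the idea of taking the larger of the two fragment counts are
assumptions made for illustration; the saturation point is workload dependent.

```python
import math

FRAGMENT_GIB = 200            # DataWarp allocation granularity on Cori (Section 2.2)
PIPELINES_PER_FRAGMENT = 16   # approximate saturation point reported in Section 5.2


def reservation_for(pipelines, gib_per_pipeline):
    """Return (fragments, reserved GiB, fraction of the reserved capacity in use)."""
    by_performance = math.ceil(pipelines / PIPELINES_PER_FRAGMENT)
    by_capacity = math.ceil(pipelines * gib_per_pipeline / FRAGMENT_GIB)
    fragments = max(1, by_performance, by_capacity)
    reserved_gib = fragments * FRAGMENT_GIB
    return fragments, reserved_gib, pipelines * gib_per_pipeline / reserved_gib


if __name__ == "__main__":
    # 64 concurrent SWarp pipelines at up to 2.6 GiB each (Section 5.2).
    fragments, reserved, used = reservation_for(64, 2.6)
    print(f"{fragments} fragments, {reserved} GiB reserved, {used:.1%} of capacity used")
```

For 64 concurrent SWarp pipelines at 2.6 GiB each, capacity alone would fit in one fragment
(166.4 GiB), but the performance constraint calls for four fragments, so roughly 21% of the reserved
800 GiB would hold data. This is the excess capacity that Section 5.2 says must be reserved.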
</pre>