MRbox: Simplifying Working with Remote Heterogeneous Analytics and Storage Services via Localised Views

Athina Kyriakou
National Technical University of Athens, Zografou, Greece
athina.skyriakou@gmail.com

Iraklis A. Klampanos
National Centre for Scientific Research "Demokritos", Agia Paraskevi, Greece
iaklampanos@iit.demokritos.gr

ABSTRACT
The management, analysis and sharing of big data usually involves interacting with multiple heterogeneous remote and local resources. Performing data-intensive operations in this environment is typically a non-automated and arduous task that often requires non-experts to have deep knowledge of the underlying technical details. MapReduce box (MRbox) is an open-source experimental application that aims to lower the barrier of technical expertise needed to use powerful big data analytics tools and platforms. MRbox extends the Dropbox interaction paradigm, providing a unifying view of the data shared across multiple heterogeneous infrastructures, as if they were local. It also enables users to schedule and execute analytics on remote computational resources by just interacting with local files and folders. MRbox currently supports Hadoop and ownCloud/B2DROP services, and MapReduce jobs can be scheduled and executed. We hope to further expand MRbox so that it unifies more types of resources, and to explore ways for users to interact with complex infrastructures more simply and intuitively.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
A number of multidisciplinary scientific and business problems require the use of advanced analytics tools and the management, processing and sharing of big data across local and remote platforms, services and cloud infrastructures. Users often define complex analytics pipelines using workflow management systems, which can also orchestrate their execution. However, in practice, this orchestration is only partially automatic, because the tools, resources and execution environments used by different organisations are highly heterogeneous, while the majority of workflow management systems provide solutions to field-specific problems [8]. As a result, researchers and data analysts are forced to deal with the technical details of the underlying tools and resources.

In response to this challenge, MRbox attempts to lower the barrier of technical knowledge required to use the needed tools and infrastructures, and to hide the complexity of big data management from non-experts. To this end, we investigate providing local views on remote computation and file storage cloud resources, extending the paradigm of cloud storage, synchronisation and data exchange services, such as Dropbox [5]. Using the file system of the local operating system, MRbox provides an overview of the data stored in the connected remote infrastructures and enables users to perform computation jobs on them, as if they were local, and to share the files produced easily.

Figure 1: The concept of MRbox is to simplify interacting with multiple analytics and big data resources by extending the cloud storage paradigm.

More specifically, users can delete, modify or move data sets just by interacting with local files and folders, without the need to know explicitly in which remote or local resource they reside. In addition, in the current prototype, users have the ability to schedule and execute MapReduce jobs [3] on a remote Hadoop cluster [12] from their local machine. The input data can be stored in any connected resource, but users do not have to run specialised HDFS commands [12] on their terminal to move them to a specific Hadoop cluster and to issue the MapReduce job. Lastly, output data generated by MapReduce processes are fetched locally for ad hoc analytics and are pushed onto the B2DROP data exchange service [6], so that users can share their results with colleagues and with the outside world.

MRbox could be useful to different actors. It can be used by (i) data analysts, researchers and engineers working on big data crunching problems, but also by (ii) systems and e-infrastructures that seek to seamlessly integrate with third-party big data and data management resources. MRbox is an open-source project hosted on GitHub¹.

¹ https://github.com/AthinaKyriakou/mrbox

2 ASSUMPTIONS AND USE CASES
For the development of MRbox, the two main assumptions made are: (i) users do not necessarily have full control of the needed remote resources; they only have rights to create, delete, modify and relocate files in their remote folders and to run computation jobs on them; (ii) when working with large and complex data sets, moving them should be avoided unless necessary [2].

For the development of the current prototype, the following use cases were identified. Firstly, users should easily connect to the integrated remote infrastructures, just by running the MRbox application. Secondly, they need to have a complete view of the file system hierarchy of the remote resources from their local file system. When a remote data set exceeds the maximum file size that can be replicated on the local machine, a link file is created locally, containing only the path to the remote file. Thirdly, users of MRbox should have a live preview of the files residing in a remote resource by running file system commands in their terminal (e.g. head, tail), even when the files do not exist physically on the local machine. Moreover, users should be able to schedule computation jobs that will be executed in remote processing infrastructures by referring to files as if they were local. Lastly, users need to have seamless access to the output files generated by the scheduled jobs, in order to do further analysis or to share them with their peers.
3 SYSTEM OVERVIEW
This section summarises the main components and features of the application.

3.1 A Sample Session With and Without MRbox
As an example, let us consider the scheduling of a MapReduce job, which is the currently supported computation framework. In the table below we compare the set of actions that a user needs to perform with and without using MRbox.
Using MRbox:
1. Code the Map and Reduce functions locally.
2. Move the input file (or a link to it) into the local MRbox folder, if it is not already there.
3. Create a yaml file specifying the local paths to the Map and Reduce functions, the input file and the desired output location. Move the yaml file into the local MRbox folder.
Outputs are automatically synced to the local MRbox folder and to B2DROP. Links are created if the files are larger than a predefined size.

Without using MRbox:
1. Code the Map and Reduce functions locally.
2. If the input data set is in the local file system, copy it to an HDFS cluster via HDFS terminal commands. If it is on a remote resource, use the resource-specific API or instructions to copy it to HDFS.
3. Use the Hadoop Streaming API to run the job. Move and generally manage the generated files using HDFS commands as needed.
Use HDFS terminal commands to fetch the outputs locally. Use an ownCloud client or API to copy them to B2DROP.
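To make step 1 concrete, the sketch below shows a minimal word-count mapper and reducer written for the Hadoop Streaming model, where each script reads from standard input and writes tab-separated key/value pairs to standard output. The file names and the word-count task are illustrative assumptions; MRbox does not prescribe a particular language or problem.

```python
#!/usr/bin/env python3
# mapper.py -- emits "<word>\t1" for every word read from stdin (Hadoop Streaming contract).
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums counts of consecutive identical keys (Streaming delivers input sorted by key).
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, _, count = line.rstrip("\n").partition("\t")
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```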
3.2 Supported Resources
MRbox currently assumes a UNIX-like operating system on the local host and supports the Apache Hadoop framework [13] and the B2DROP service [6].

Apache Hadoop is a widely used, open-source MapReduce framework. It includes two main modules: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. HDFS is the distributed file system primarily used by Hadoop applications, providing high data-access performance, fault tolerance and native support for large data sets. MRbox establishes a connection to the desired HDFS on a remote cluster and interacts with Hadoop MapReduce via the Hadoop command line tools provided.

B2DROP [6] is a user-friendly data synchronisation and exchange service created by the European Data Infrastructure (EUDAT²) for research communities. It provides a secure storage environment for long-tail but still volatile data that are subject to active research, while facilitating the process of sharing them and keeping them up-to-date across different machines. It offers up to 20GB of storage per user. B2DROP provides automatic synchronisation via ownCloud³, an open-source file synchronisation and sharing tool. Alternatively, the service can be accessed on the Web via an intuitive user interface, or on the local machine via a WebDAV client.

² https://eudat.eu/
³ https://owncloud.com/
3.3 MRbox Configuration
To connect to a remote HDFS, the local host needs to be configured as a client node. This usually requires having a copy of the Hadoop distribution locally available and configured to access the remote cluster [15]. In MRbox.conf, the configuration file of MRbox, the user needs to specify the host and port of the HDFS NameNode, as well as the path to the local Hadoop installation. The user can also determine the absolute paths of the local and HDFS folders that will be kept in sync, and the maximum size of the files that will be retrieved locally once created in an integrated remote resource. Depending on the distribution policy for MRbox, the Hadoop part of the configuration may also be pre-configured, saving the end user from even being aware of the connection specifics of the remote Hadoop cluster.

In addition, the B2DROP service is currently used by installing the ownCloud Desktop Synchronisation Client on the local machine, following the B2DROP documentation. Both B2DROP and MRbox monitor the same local folder. A more fine-grained integration of ownCloud in MRbox will be developed in the future, e.g. by modifying the existing client.
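The paper does not give the exact option names used in MRbox.conf, so the following is only a minimal sketch, under assumed section and key names, of what such a configuration and its parsing could look like with Python's standard configparser.

```python
# Sketch: reading an MRbox-style configuration file. Section and key names are hypothetical;
# the paper only states that the NameNode host/port, the local Hadoop path, the synced
# local/HDFS folders and a maximum local file size are configured.
import configparser

EXAMPLE_CONF = """
[hdfs]
namenode_host = namenode.example.org
namenode_port = 9000
hadoop_home = /opt/hadoop

[sync]
local_dir = /home/alice/mrbox
hdfs_dir = /user/alice/mrbox
max_local_file_bytes = 104857600   ; files above ~100 MB stay remote; a link is kept locally
"""

config = configparser.ConfigParser(inline_comment_prefixes=(";",))
config.read_string(EXAMPLE_CONF)

hdfs_url = f"hdfs://{config['hdfs']['namenode_host']}:{config['hdfs']['namenode_port']}"
size_limit = config.getint("sync", "max_local_file_bytes")
print(hdfs_url, config["sync"]["local_dir"], size_limit)
```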
3.4 Managing Files
MRbox creates folders locally and on the integrated remote resources, under the root paths specified in the MRbox.conf file. In order to keep track of local and remote instances of files and to promote consistency, MRbox implements a catalogue. This section describes this catalogue and how it stores mappings of local paths to remote ones, the synchronisation process between the local MRbox folder, HDFS and B2DROP, the implementation of links to comply with local size constraints, and the use of checksums for data validation.
3.4.1 The Local Catalogue. The local catalogue is used to maintain mappings between local and remote files and directories. In the current prototype of MRbox, the local catalogue maps local paths to the paths of the corresponding files and directories on HDFS, making it possible for the user to interact with the HDFS of a remote cluster through the local file system hierarchy. Its current implementation uses SQLite [10], a lightweight, self-contained SQL database engine that does not require a separate server process and is therefore embedded in the application itself. For each file and folder, a record is created in the database consisting of the path, the most recent modification time and the checksum of the local copy and of the respective copy on HDFS. Each record also has an attribute specifying whether the local object is a file, a link (see Section 3.4.3) or a directory.
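A minimal sketch of what such a catalogue could look like in SQLite is shown below. The table and column names are hypothetical; the paper only specifies that each record holds the paths, modification time, checksums and the object type.

```python
# Sketch of a local catalogue in SQLite, assuming hypothetical table/column names.
import sqlite3

conn = sqlite3.connect("mrbox_catalogue.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS catalogue (
        local_path   TEXT PRIMARY KEY,   -- path inside the local MRbox folder
        hdfs_path    TEXT,               -- corresponding path on the remote HDFS
        local_mtime  REAL,               -- most recent local modification time
        local_chksum TEXT,               -- checksum of the local copy
        hdfs_chksum  TEXT,               -- checksum of the HDFS copy
        obj_type     TEXT CHECK (obj_type IN ('file', 'link', 'dir'))
    )
""")

def register(local_path, hdfs_path, mtime, local_chksum, obj_type):
    """Insert or update the record for a local object (HDFS checksum filled in later)."""
    conn.execute(
        "INSERT OR REPLACE INTO catalogue VALUES (?, ?, ?, ?, NULL, ?)",
        (local_path, hdfs_path, mtime, local_chksum, obj_type),
    )
    conn.commit()
```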
3.4.2 Synchronisation. Synchronisation between the local MRbox folder and HDFS is one-way, as only the local folder can be monitored for changes. Our current implementation makes use of the Python Watchdog library [11], which monitors a designated local folder for changes. Depending on the type of event identified, one of the following actions takes place:

• on_created(): When a file, link or directory is created locally, it is registered in the local catalogue. For a file or directory, a copy is created on the remote HDFS. If the created file has a yaml extension, MRbox attempts to run a MapReduce job on the remote Hadoop cluster, according to the specification passed in the yaml file (Figure 2).
• on_deleted(): When a file, link or directory is deleted locally, MRbox deletes the corresponding HDFS file and the record from the local catalogue.
• on_modified(): When a file is modified locally, the HDFS file is modified accordingly and the local and HDFS checksums in the local catalogue are updated.
• on_moved(): When a file, link or directory is moved within the local MRbox folder, the corresponding file or directory is relocated accordingly within the HDFS folder.

Figure 2: Sequence of events after a file is created in the local MRbox folder.
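A minimal sketch of how such event handling can be wired up with Watchdog is shown below. The handler class, folder path and the stubbed actions are illustrative assumptions; the intent is to show the event-to-action mapping described above, not MRbox's actual implementation.

```python
# Sketch of one-way local-to-HDFS synchronisation using the Watchdog library.
# The catalogue/HDFS operations are only stubbed out with print() calls.
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class MRboxHandler(FileSystemEventHandler):
    def on_created(self, event):
        # Register in the catalogue; copy files/dirs to HDFS; yaml files would trigger a job.
        print("created:", event.src_path)

    def on_deleted(self, event):
        # Remove the HDFS copy and the catalogue record.
        print("deleted:", event.src_path)

    def on_modified(self, event):
        # Re-upload the file and refresh the stored checksums.
        if not event.is_directory:
            print("modified:", event.src_path)

    def on_moved(self, event):
        # Relocate the corresponding object within the HDFS folder.
        print("moved:", event.src_path, "->", event.dest_path)


observer = Observer()
observer.schedule(MRboxHandler(), path="/home/alice/mrbox", recursive=True)
observer.start()
try:
    while True:
        time.sleep(1)
finally:
    observer.stop()
    observer.join()
```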
                                                                        3.5    Scheduling MapReduce Jobs
   Handling Bidirectional Synchronisation. As discussed above,
the current prototype assumes that users do not necessarily have        MRbox aims to highlight the functional aspects of MapReduce
full control over the remote resources. This means that, if files       by hiding the technical implementation details from users. In the
on HDFS change by any means other than through interacting              future, MRbox could be extended to support more computational
with MRbox, MRbox will not have a way to track these changes,           frameworks. To trigger a MapReduce job users need to:
therefore leading to inconsistencies. To handle such potential                • Implement the Map and Reduce functions in two distinct
inconsistencies between the local and HDFS MRbox folders an                     files locally. Any programming language supported by the
offline synchronisation process should be scheduled to run peri-                Hadoop Streaming utility can be used [16].
odically. This is left for future work as low priority, since MRbox’s         • Specify in a yaml file the absolute paths of the mapper
main goal is to provide local views on remote HDFS resources,                   and reducer scripts in the local file system. Also, users
hiding direct access from its users.                                            need to define the relative paths of the input as well as
   In the case of B2DROP, bidirectional synchronisation is achie-               the output location. The input path can point to a file or
ved through its desktop client, which is installed locally (Figure              a link. All paths are local and users do not need to know
2). To support special functionalities of MRbox (e.g. see Section               the file structure of the remote HDFS cluster.
3.4.3), this client would have to be modified further.                        • Move or save the yaml file in the local MRbox folder to
                                                                                trigger the execution.
    3.4.3 Links. In MRbox, links are a special read-only file type
                                                                           MRbox will automatically issue the configured MapReduce
recognised by the link extension and registered in the local cata-
                                                                        job to the remote Hadoop using the Hadoop Streaming utility.
logue. Links are created to comply with local file size restrictions
                                                                        The outputs will be fetched locally in the specified output path
when large data sets are generated on the HDFS cluster and need
                                                                        as files or links. The ownCloud Desktop Client, will replicate the
to be made available locally. In the current prototype such files
                                                                        output file to B2DROP, giving the user the possibility to share it
are the outputs of MapReduce jobs on HDFS.
                                                                        with people working on the same project.
    The local file size limit can be specified in the MRbox configu-
ration. After the completion of a MapReduce job, the local file size
limit is compared against the size of the output file produced on
                                                                        3.6    Additional Tools
HDFS. If the generated file is larger than the maximum allowable        The current prototype enables users to get a live preview of data
size limit, a link file is created, which contains the path to the      sets residing on HDFS, by running mrview.py cmd path for a file,
remote file. This link can also be used in the browser to view the      link or directory in the local MRbox folder. If the specified path
file on HDFS, if this is supported by the remote cluster. Moreover,     corresponds to a file or directory, the supported UNIX command
if a link is deleted or moved within the local MRbox folder, the        is executed. If the path corresponds to a link, the data set does not
corresponding remote file will be deleted or moved respectively.        exist locally due to size constraints. Users can run head and tail
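The fetch-or-link decision can be sketched as follows. The .link extension, the helper name and the exact HDFS shell invocations are assumptions made for illustration; the paper only states that oversized outputs are represented locally by a file containing the remote path.

```python
# Sketch: fetch a job output locally, or create a .link file if it exceeds the size limit.
import subprocess
from pathlib import Path

def fetch_or_link(hdfs_path: str, local_dir: str, max_bytes: int) -> Path:
    # Ask HDFS for the file size in bytes (%b format of the -stat command).
    size = int(subprocess.check_output(
        ["hdfs", "dfs", "-stat", "%b", hdfs_path], text=True).strip())
    target = Path(local_dir) / Path(hdfs_path).name
    if size <= max_bytes:
        # Small enough: replicate the file into the local MRbox folder.
        subprocess.check_call(["hdfs", "dfs", "-get", hdfs_path, str(target)])
        return target
    # Too large: keep only a read-only link file holding the remote path.
    link = target.with_suffix(target.suffix + ".link")
    link.write_text(hdfs_path + "\n")
    return link
```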
Synchronisation of Links Between B2DROP and HDFS. In the current prototype, a link file in the local MRbox folder is synchronised verbatim onto B2DROP. However, this is not always desirable, since users would expect the complete file to be available on a data exchange resource such as B2DROP. After all, the users that need to access a file on B2DROP may not have access to the HDFS cluster where it was created. To allow for special synchronisation treatment of large files that appear as links locally, a customised synchronisation client would need to be implemented.
3.4.4 File Checksums. Checksums are used to guarantee data integrity in file transfers between the local and HDFS folders. HDFS uses a 32-bit cyclic redundancy check based on the Castagnoli polynomial (CRC32C). To perform end-to-end client-side validation, MRbox adopts the composite CRC file checksum introduced in Apache Hadoop 3.1.1, which can be enabled by setting dfs.checksum.combine.mode to COMPOSITE_CRC. The composite CRC checksum is not applicable to links and directories. In contrast to Hadoop's default MD5-of-MD5-of-CRC file checksum, which is computed across chunks and blocks, the composite CRC is independent of chunk and block configurations and describes only the logical file contents. As a result, it permits comparisons between striped and replicated files, between HDFS instances with potentially different chunk and block size configurations, as well as between HDFS and local files or other external storage systems that implement Hadoop's FileSystem interface [7, 14]. At the end of each transfer, local file checksums are computed and compared against the checksums computed on HDFS. If the checksums do not match, the transfer is repeated.
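A hedged sketch of this validate-and-retry step is given below. It assumes the HDFS-side checksum is obtained with hdfs dfs -checksum under the COMPOSITE_CRC combine mode, and it leaves the local composite-CRC computation behind a caller-supplied helper, since the paper does not detail how MRbox computes it locally.

```python
# Sketch: validate a transfer by comparing composite CRC checksums and retrying on mismatch.
# local_composite_crc() and upload() are caller-supplied helpers (hypothetical);
# the hdfs invocation assumes Hadoop >= 3.1.1 with composite CRC support.
import subprocess

def hdfs_composite_crc(hdfs_path: str) -> str:
    out = subprocess.check_output(
        ["hdfs", "dfs", "-Ddfs.checksum.combine.mode=COMPOSITE_CRC",
         "-checksum", hdfs_path],
        text=True,
    )
    return out.split()[-1]  # last token of "<path> <algorithm> <checksum>"

def transfer_with_validation(local_path, hdfs_path, upload, local_composite_crc, retries=3):
    for _ in range(retries):
        upload(local_path, hdfs_path)  # e.g. an "hdfs dfs -put" wrapper
        if local_composite_crc(local_path) == hdfs_composite_crc(hdfs_path):
            return True
    return False
```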
3.5 Scheduling MapReduce Jobs
MRbox aims to highlight the functional aspects of MapReduce by hiding the technical implementation details from users. In the future, MRbox could be extended to support more computational frameworks. To trigger a MapReduce job, users need to:

• Implement the Map and Reduce functions in two distinct files locally. Any programming language supported by the Hadoop Streaming utility can be used [16].
• Specify in a yaml file the absolute paths of the mapper and reducer scripts in the local file system. Users also need to define the relative paths of the input and of the output location. The input path can point to a file or to a link. All paths are local and users do not need to know the file structure of the remote HDFS cluster.
• Move or save the yaml file in the local MRbox folder to trigger the execution.

MRbox then automatically issues the configured MapReduce job to the remote Hadoop cluster using the Hadoop Streaming utility. The outputs are fetched locally into the specified output path as files or links. The ownCloud Desktop Client replicates the output file to B2DROP, giving the user the possibility to share it with people working on the same project.
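The exact yaml schema is not given in the paper, so the sketch below uses illustrative field names to show how such a specification could map onto a Hadoop Streaming invocation; the jar path and HDFS prefix are also assumptions.

```python
# Sketch: turn a yaml job specification (hypothetical field names) into a
# Hadoop Streaming invocation.
import subprocess
import yaml  # PyYAML

SPEC = """
mapper: /home/alice/jobs/mapper.py
reducer: /home/alice/jobs/reducer.py
input: data/input.txt          # relative to the MRbox folder; may also point to a .link
output: results/wordcount      # relative output location
"""

job = yaml.safe_load(SPEC)
streaming_jar = "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-3.1.1.jar"

cmd = [
    "hadoop", "jar", streaming_jar,
    "-files", f"{job['mapper']},{job['reducer']}",   # ship the scripts to the cluster
    "-mapper", "mapper.py",
    "-reducer", "reducer.py",
    "-input", "/user/alice/mrbox/" + job["input"],
    "-output", "/user/alice/mrbox/" + job["output"],
]
subprocess.check_call(cmd)
```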
3.6 Additional Tools
The current prototype enables users to get a live preview of data sets residing on HDFS by running mrview.py cmd path for a file, link or directory in the local MRbox folder. If the specified path corresponds to a file or directory, the supported UNIX command is executed locally. If the path corresponds to a link, the data set does not exist locally due to size constraints; in this case users can run the head and tail commands to get an overview of the HDFS data corresponding to the link. The list of supported commands for links can be extended to a complete suite of tools in the future.

4 RELATED WORK
Scientific and business workflow management systems and Workflow-as-a-Service platforms [1, 4, 9, etc.] often facilitate the processing of big data across multiple e-infrastructures. However, only a limited number of them integrate big data frameworks, such as Hadoop, directly, without focusing on a certain data organisation or on field-specific problems [17]. In addition, these systems still present challenges for big data analytics in the cloud and when used across organisations with potentially heterogeneous resources and execution environments [8]. As a result, the data operations that need to be performed are only partially automated, and professionals have to manually configure and use the involved services and the local and remote infrastructures. Finally, identifying whether a workflow management system supports a specific data lake, platform or tool is usually a challenging task, and users need to get familiar with a new Graphical User Interface (GUI) and a specific workflow model to use the system efficiently [8, 17]. In contrast to workflow management systems, MRbox does not require its users to learn a new GUI or a workflow model specification language.

Furthermore, MRbox borrows from and extends cloud storage, synchronisation and data exchange services, such as Dropbox. However, apart from synchronising files and folders and allowing users to share their data, MRbox investigates the extension of this interaction paradigm to data analytics and distributed processing, as well as to synchronisation amongst multiple resources.
5 CONCLUSIONS AND FUTURE WORK
For the processing and sharing of big data, researchers and data analysts use a plethora of tools and heterogeneous infrastructures. As a result, performing data-intensive tasks is typically non-automated and arduous, and requires technical knowledge of the underlying systems. MRbox aims to lower the barrier of technical expertise by hiding the complexity of big data technologies from non-experts. Using the file system hierarchy of the local operating system, MRbox provides a unifying view of local and remote resources and of the data residing in them, and enables users to schedule computational jobs on remote infrastructures as if they were local. In the current prototype, the Hadoop and B2DROP services are supported and MapReduce jobs can be scheduled.

This work can be improved in several ways. Firstly, we need to test MRbox under heavier workloads. A more fine-grained integration of ownCloud in MRbox will be implemented to enable users to connect directly to the B2DROP service without installing a separate client. Secondly, the ownCloud client needs to be modified and expanded further to support special functionalities of MRbox, e.g. for the case of links, the complete data set should be made available on resources such as B2DROP. Thirdly, according to a distribution policy, the connection to remote infrastructures, such as a Hadoop cluster, could be pre-configured, saving the end user from even being aware of the configuration specifics. Additionally, to verify our concept, we intend to measure the usability and performance of the current prototype via user studies covering a range of uses and workloads. Furthermore, while maintaining simplicity, we will investigate expanding to other resources and execution contexts, e.g. triggering the execution of numerical codes on MPI clusters.

A valuable, more theoretical follow-up would be the formalisation of resource types and integration policies, which would add to the extensibility of MRbox while minimising the risk of data inconsistencies and resource mismanagement. This could lead to future research towards a more general framework for describing arbitrary computational and storage resources. An alternative direction could be the exploitation of the unifying file-based view of heterogeneous resources to integrate widely used tools and user interfaces, e.g. spreadsheet applications that transparently make use of large data sets on remote HDFS clusters.

REFERENCES
[1] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock. 2004. Kepler: an extensible system for design and execution of scientific workflows. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management, 2004. 423–424. https://doi.org/10.1109/SSDM.2004.1311241
[2] Malcolm Atkinson. 2018. Pushing the Limits of Data Powered Research. https://doi.org/10.5281/zenodo.1164420
[3] Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 1 (2008), 107–113.
[4] Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J. Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira da Silva, Miron Livny, and Kent Wenger. 2015. Pegasus: a Workflow Management System for Science Automation. Future Generation Computer Systems 46 (2015), 17–35. https://doi.org/10.1016/j.future.2014.10.008 Funding Acknowledgements: NSF ACI SDCI 0722019, NSF ACI SI2-SSI 1148515 and NSF OCI-1053575.
[5] Dropbox [n.d.]. What is Dropbox - Features Overview - Dropbox. Dropbox. https://www.dropbox.com/features
[6] EUDAT Collaborative Data Infrastructure 2019. B2DROP User Documentation. EUDAT Collaborative Data Infrastructure. https://eudat.eu/services/userdoc/b2drop
[7] Google Cloud 2020. Validating data transfers between HDFS and Cloud Storage. Google Cloud. https://cloud.google.com/solutions/migration/hadoop/validating-data-transfers
[8] Samiya Khan, Syed Arshad Ali, Nabeela Hasan, Kashish Ara Shakil, and Mansaf Alam. 2019. Big data scientific workflows in the cloud: Challenges and future prospects. In Cloud computing for geospatial big data analytics. Springer, 1–28.
[9] Iraklis A. Klampanos, Chrysoula Themeli, Alessandro Spinuso, Rosa Filgueira, Malcolm Atkinson, André Gemünd, and Vangelis Karkaletsis. 2020. DARE Platform: a Developer-Friendly and Self-Optimising Workflows-as-a-Service Framework for e-Science on the Cloud. Journal of Open Source Software 5, 54 (2020), 2664. https://doi.org/10.21105/joss.02664
[10] Python Software Foundation 2020. sqlite3 — DB-API 2.0 interface for SQLite databases. Python Software Foundation. https://docs.python.org/3/library/sqlite3.html
[11] Python Software Foundation 2020. watchdog 1.0.2. Python Software Foundation. https://pypi.org/project/watchdog/
[12] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop Distributed File System. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 1–10.
[13] The Apache Software Foundation [n.d.]. Apache Hadoop. The Apache Software Foundation. https://hadoop.apache.org/
[14] The Apache Software Foundation 2019. Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts. The Apache Software Foundation. https://issues.apache.org/jira/browse/HDFS-13056
[15] The Apache Software Foundation 2019. Hadoop: Setting up a Single Node Cluster. The Apache Software Foundation. https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
[16] The Apache Software Foundation 2019. Hadoop Streaming. The Apache Software Foundation. https://hadoop.apache.org/docs/r1.2.1/streaming.html
[17] Jianwu Wang, Daniel Crawl, and Ilkay Altintas. 2009. Kepler + Hadoop: A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems. In Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science. 1–8.