Data Curation Policies for EUDAT Collaborative Data Infrastructure

© Vasily Bunakov¹, © Alexia de Casanove², © Pascal Dugénie², © Rene van Horik³, © Simon Lambert¹, © Javier Quinteros⁴, © Linda Reijnhoudt³

¹ Science and Technology Facilities Council, Harwell Campus, United Kingdom
² CINES, Montpellier, France
³ Data Archiving and Networked Services (DANS), The Hague, Netherlands
⁴ GFZ German Research Centre for Geosciences, Potsdam, Germany

vasily.bunakov@stfc.ac.uk, casanove@cines.fr, dugenie@cines.fr, rene.van.horik@dans.knaw.nl, simon.lambert@stfc.ac.uk, javier@gfz-potsdam.de, linda.reijnhoudt@dans.knaw.nl

Abstract. The work outlines an approach to the development of a data curation framework in the EUDAT Collaborative Data Infrastructure. Practical use cases are described as well as provisional results of defining granular data curation policies with high potential for their machine-executable implementation.

Keywords: data curation, e-infrastructures, long-term digital preservation, policies

Proceedings of the XIX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2017), Moscow, Russia, October 10–13, 2017

1 Introduction

EUDAT Collaborative Data Infrastructure (CDI) [1] is a European e-infrastructure of data services and information resources in support of research. This infrastructure and its services have been developed in close collaboration with over 50 research communities spanning many different scientific disciplines, with more than 20 major European research organizations, data centres and computing centres involved. Researchers, research communities and service providers can use EUDAT data services to manage research data according to their own needs.

The EUDAT services offering has emerged as a result of two consecutive FP7 and Horizon 2020 projects, with the actual services focused on different aspects of data management and data use, and supported by a variety of information technology stacks. The major EUDAT services [19] are:

• B2ACCESS – identity and authorization service;
• B2HANDLE – service for assigning and managing persistent identifiers;
• B2DROP – service for secure and trusted data exchange;
• B2SHARE – service for sharing small-scale “long tail” data;
• B2SAFE – robust, safe and highly available service for storing large-scale data in community and departmental repositories;
• B2STAGE – service for managing data transfers between EUDAT storage and high-performance computing;
• B2FIND – service for data discovery across the EUDAT infrastructure (data catalogue).

Data curation (or digital curation) is the selection, preservation, maintenance, collection and archiving of digital assets and hence is an essential part of research data management. Sensible data curation requires establishing and developing long-term repositories of digital assets for their current and future use by researchers and wider society. Collaborative data infrastructures like EUDAT that span national borders should play a significant role in research data curation.

Historically, EUDAT services have been built with only a few considerations for conscious data curation, with secure and controlled access to data being one of the major initial goals to achieve. Other aspects of data curation started playing a more prominent role when services matured to production stage and became a part of an operational collaborative infrastructure. Specifically, operational requirements of the B2SAFE service (which currently offers what long-term digital preservation projects typically call “bit-level” preservation), as well as automated data transfers across the interrelated B2DROP, B2SHARE and B2FIND services, have made it essential to systematically explore the topic of data curation in EUDAT.

The decision was made to formulate the core approach to data curation with the involvement of two prominent unrelated research communities with substantial amounts of data to manage and then, using these two use cases as a proof-of-concept for clearly formulated data curation activities, get other user communities involved.

Another decision made was to reuse the outputs of the SCAPE project [2] and the Research Data Alliance Practical Policy Working Group [3] in order to set up a reasonable data curation framework for EUDAT.

The rest of the paper outlines the core use cases, characterizes the SCAPE and RDA outputs that are deemed to be applicable in the EUDAT context, describes the mapping of SCAPE policy elements [4] to granular data policies in EUDAT, and sets directions for further work on data policies in EUDAT.

2 HERBADROP use case

2.1 Motivations and relation to EUDAT services

The HERBADROP data pilot [12] aims to offer an archival service for long-term preservation of herbarium specimen images and to develop innovative processes for extracting metadata from those images. HERBADROP follows the global trend towards scalable industrial-style digitizing of herbaria specimens. It is designed as both an archival service for long-term preservation of herbarium specimen images and a tool for analysing and extracting information written on the image by using Optical Character Recognition (OCR) analysis, both supported by CINES [6].

Making the specimen images and data available online from different institutes allows cross-domain research and data analysis for botanists and researchers with diverse interests (e.g. ecology, social and cultural history, climate change).

Herbaria hold large numbers of collections: approximately 22 million herbarium specimens exist as botanical reference objects in Germany, 20 million in France and about 500 million worldwide. High resolution images of these specimens require substantial bandwidth and disk space. New methods of extracting information from the specimen labels have been developed using OCR, but using this technology for biological specimens is particularly complex due to the presence of biological material in the image with the text, the non-standard vocabularies, and the variable and ancient fonts. Much of the information is only available using handwritten text recognition or botanical pattern recognition, which are less mature technologies than OCR [13].

The proposed platform is expected to support or even substitute costly manual data input as much as possible. The platform will also curate and enrich metadata resulting from image analysis using OCR and pattern matching.

Results are exposed as platform-independent Web services which can be effectively integrated into herbarium data management systems as well as metadata capture workflows. Since 2016, five European community partners¹ have been involved. Their contribution to the pilot represents a business model that can potentially be replicated by other institutes.

¹ The partners in the HERBADROP data pilot are: Musée National d’Histoire Naturelle (MNHN) – Paris, France; Royal Botanic Garden of Edinburgh (RBGE) – United Kingdom; Botanic Garden and Botanical Museum (BGBM) – Berlin, Germany; Digitarium – Finland; Naturalis Biodiversity Center – Netherlands.

The EUDAT B2SAFE service is used in the first step of the ingestion process. Existing images of herbarium specimens along with the associated metadata are transmitted to the CINES repository using the B2SAFE transfer service. The ingestion into B2SAFE is carried out in accordance with the centralized persistent identifier (PID) management system used in EUDAT. It is envisaged that discovery and visualization of the data objects will be performed with the EUDAT B2FIND service.

The data workflow in HERBADROP is represented by Figure 1.

Figure 1 Data workflow of the HERBADROP data pilot

2.2 Data curation scenarios for HERBADROP

The HERBADROP communities have expressed their wish to implement specific use cases such as identifying duplicates amongst specimens from the different museums. Requirements of this kind help to improve EUDAT services. Another example of a policy is long-term preservation, which involves a number of controls including file format verification and metadata quality.

Amongst HERBADROP users, two partners of the community have proposed practical scenarios for data curation: Digitarium [14] and the Royal Botanic Garden of Edinburgh (RBGE).

Scenario proposed by Digitarium (Finland)

Digitarium [14] would like to use Optical Character Recognition (OCR) data to generate metadata based on the label information available for the herbarium specimen. Firstly, a Natural Language Processing based system will be used to check OCR quality and extract relevant terms. Then metadata will be either automatically generated, or manually inserted through the transcription portal [15] but with the help of OCR data.

More generally, Digitarium would like to use EUDAT infrastructure services and integrate them into the whole digitisation process of natural history biological collections. The data flow goes from the beginning of the digitisation process, i.e. imaging, to storage, then to transcription and analysis, and finally to access. This involves data storage, high-performance computing resources, and web services in EUDAT.

Firstly, the images from the imaging station can be transferred into EUDAT storage for long-term preservation instantly or in batch. After the transfer, HPC can access the images and run OCR to extract label information and generate preliminary metadata. This metadata has to be associated with the corresponding images. The data can be openly accessed. However, the access rights of data have to be set up for different purposes, such as endangered species protection.

Secondly, using HTTP APIs, the images and their metadata can be accessed from EUDAT by data-owner portals, which makes browsing and transcribing available. Updated metadata will be transferred back into the EUDAT B2SAFE service. Different versions of metadata have to be kept.

Thirdly, the metadata is indexed. Therefore, the data can be searched or filtered based on different terms for further scientific use. HPC resources can also be used on the data for different research purposes.

Scenario proposed by RBGE (the Royal Botanic Garden of Edinburgh) in association with MNHN (Musée National d’Histoire Naturelle) – Paris

The core of the concept of HERBADROP is to harvest metadata from OCR analysis of the text that is a part of herbarium images. The choice has been to proceed to a full text analysis using the Lucene-based engine Elasticsearch [16]. The objective of this approach is to provide a powerful interface for further data curation as part of the preservation process (identifying duplicates, inducing new taxonomic relations, etc.), see [12].

Safeguarding long-term data storage is an important precondition for reliable access to herbarium specimen information. Thanks to this pilot, it is possible to envisage long-term storage for herbarium specimen images. Moreover, the specimens will be discoverable by the entire scientific community. Thus, undescribed species stored in herbaria can be examined by experts to aid identification and discovery of new species.

Distribution information for species over time can be evaluated and these data could provide evidence of the point in time when an invasive species first occurred in a certain area. Historians could analyse herbarium data to create itineraries for historical characters. The data can be used to calibrate predictive models of the oncoming changes in biodiversity patterns under global threats. This diverse information will be useful for a wide user community including conservationists, policy makers, and politicians.

3 GEOFON use case

The second use case concerns GFZ, the German Research Centre for Geosciences. GFZ provides valuable seismological services in the form of a seismological infrastructure named GEOFON [7] to research and better understand our complex system Earth.

GFZ is one of the members of the EPOS initiative (European Plate Observing System) [5] and, in this context, collaborates with two other seismological data centres related to EPOS (KNMI, INGV) in the EUDAT project.

Besides being one of the fastest earthquake information providers worldwide, GEOFON is also one of the largest nodes of the European Integrated Data Archive (EIDA) for seismological data under the ORFEUS² umbrella, which is a distributed data centre established to (a) securely archive seismic waveform data and related metadata, gathered by European research infrastructures, and (b) provide transparent access to the archives by the geosciences research communities.

² Observatories and Research Facilities for European Seismology (http://www.orfeus-eu.org/)

The internal structure of GEOFON is based on three pillars:

• A global seismic network operated in close collaboration with many partner institutions with focus on EuroMed and Indian Ocean regions. The network consists of ca. 110 high quality stations, which acquire data in real time [8].
• A global earthquake monitoring system which uses data from GEOFON and partner networks [10]. It publishes the most timely earthquake information. First automatic solutions are available a few minutes after the events and are mostly manually revised later.
• A comprehensive seismological data archive for GFZ and partner networks, for permanent networks as well as for temporary deployments.

For some GEOFON partner networks, GEOFON acts as a data centre saving a replica of the original copy and at the same time as a data distribution centre. Additionally, data from many temporary station deployments are permanently archived at GEOFON, in particular passive seismological experiments of the GFZ Geophysical Instrument Pool Potsdam (GIPP) and the German Task Force Earthquake.

Most data are open for public access, as are real-time data feeds when available. However, there is a small amount of data under an embargo period, usually for a limited amount of time (3–4 years).

3.1 Data workflow in GEOFON

GEOFON supports two scenarios for the ingestion of data into its archive: one for permanent networks and one for temporary (and most probably already finished) experiments.

Usually, raw data is transmitted to the data centre together with the metadata (technical hardware description) needed to operate with it. In the case of permanent networks, raw data is received continuously from the stations around the world via satellite using SeedLink [17], a real-time data acquisition protocol which works on TCP. The packets of each individual station are always transferred in chronological (FIFO) order.

In the case of temporary experiments, network operators usually provide first the metadata needed to use the data and, in a second phase, the data to be archived. Data can be transmitted as in the permanent networks case (SeedLink protocol), or can be sent to the data centre by the network operator using client-server tools provided by GEOFON, which perform the first quality check of the data format. In some cases, both methods could be used.

A schematic view of the workflow at GEOFON can be seen in Fig. 2. It should be noted that this workflow is also valid for many of the seismological data centres belonging to EIDA/ORFEUS, for instance the other two data centres piloting EUDAT services (KNMI and INGV).

Figure 2 Data workflow from GEOFON. It also represents the workflows from a generic seismological data centre such as the ones under the EIDA/ORFEUS initiative. Boxes in black are generic activities from the data centre. Blue boxes show activities related to the EUDAT service B2SAFE, while brown boxes show the tasks related to B2HANDLE

In both cases, permanent and temporary networks, data go through some quality checks after being received. When data are sent in real time, there is a first control by sorting the records before actually ingesting them into the archive (~1 day after reception). After 4–6 weeks, for stations that still have the buffered data, a gap filling process is started.

When data have been bulk uploaded to the data centre by the network operator, they are immediately checked to exclude overlaps. In this case, as all available data is copied off-line, there is no need to check for problems related to real-time transmission, like gaps and proper order of records, as they are checked by the automatic archiving tools.

In the case that the data is under an embargo period, the access control list is created or updated. After completion of the last steps, data is opened through standard access protocols.

The internal organization of the archive is based on an approach called SeisComP3 Data Structure (SDS). This means that files are stored under a predefined directory structure, which uses the codes of the network/station/channel used to record the data as well as the year. The continuous time series are stored in a standard seismological format called Mini-SEED. The time series are split into daily files for each recording sensor and, therefore, files are closed when the day finishes. At that moment, “new” data (recently closed files) can be processed to obtain derived products from them, for instance quality metrics on the data or detailed availability information, which are offered to our users by means of a Web service.

Once the data is archived, users can make use of any of the services provided by GEOFON to retrieve it. Considering that there are different services which can provide the data to the users, the usage statistics are centralized in one database to be able to analyse the impact of the data on the community regardless of the method used to retrieve it.

3.2 Service hosting environment with the inclusion of EUDAT services

Considering the workflow depicted in the previous section, GEOFON introduced some EUDAT services in order to automate and/or improve some of the tasks related to it.

Many services are being provided at GEOFON (e.g. interactive web portals, proprietary protocols to get data or derived products), with two of them (Station-WS and Dataselect) being particularly important, as they are international standards and the core services for the community upon which other services are built. Station-WS serves the information describing the hardware and everything related to the deployment, while Dataselect serves the data.

Two main EUDAT services have been integrated in the GEOFON workflow, namely B2SAFE and B2HANDLE. The former is used to accomplish most of the Data Management tasks, while the latter is used to manage/store Persistent Identifiers (PIDs).

As the archive is stored in a directory structure on a partition, the B2SAFE service “mounts” the archive as an external resource in read-only mode.

One of the main requirements for the Data Policies at GEOFON is the capability to trigger processes based on the inclusion of new data. In the context of B2SAFE, this can be done by means of automatic rules which are executed under certain conditions (e.g. new data ingested).

With the proper rules we can enforce that, after new data is detected by B2SAFE, a certain set of actions is executed. For instance, the derived products can be generated and data can be replicated to a partner data centre from the EUDAT CDI, the Karlsruhe Institute of Technology (KIT). Also, as part of this replication process, persistent identifiers (PIDs) are generated for each file, so that the PID can be used to globally and uniquely identify the file.

PIDs are managed and stored by means of the already mentioned service called B2HANDLE, which is based on a Handle Server and other libraries developed within the project. GFZ has broad expertise in this type of tool and, therefore, we decided to deploy our own B2HANDLE server and work with our local instance.

Each generated PID is stored with a set of key-value pairs called a “PID Record”. The information in the PID Record makes it possible, among other things, to track other copies of the file in different data centres or to validate its integrity by means of pre-calculated checksums.

3.3 Data Policies to apply at GEOFON through EUDAT services

After the formalization of the internal workflows at GEOFON, and the inclusion of requirements from the community and the data centre, we defined a set of Data Policies to be enforced by means of the tools available within EUDAT and new developments, which could be useful for different communities.

Some of them are related to the Replication process. For instance:

• replicate every new file in the archive to our internal backup server;
• if we are the official provider of the data in a file, replicate it to an off-site partner within the EUDAT CDI;
• seismological data that does not belong to us but comes from our earthquake early monitoring system should be kept for 6 months only; data still need to be replicated to the internal server;
• file deletion must not be possible in an automated way. In case the system detects that a file should be deleted, an email should be sent to the appropriate operator.

Regarding the access control of the files:

• “Restricted data” must be tagged and proper access control must be applied to them;
• access restrictions can be automatically removed after a period of time (embargo period);
• data must be accessible via an HTTP API respecting the ACL (Access Control List).

Regarding automatic metadata extraction:

• metrics derived from the data must be automatically calculated to populate some of our services when new data is ingested;
• detailed statistics related to the data access should be available for the data owners/creators;
• in case data are modified (e.g. correcting errors, filling gaps), this information should be available for future use (provenance information).

Regarding the integrity of the stored data:

• a weekly process will select ~2% of the folders in our archive and verify that the synchronization is correct; the idea is that every file will be checked at least once a year;
• check that the data is stored in SDS format;
• start and end time of network/station operation must be available and data outside this time span must not be allowed.

The identified relevant policies are being gradually implemented using generic EUDAT services and GEOFON-specific software.

4 Mapping of EUDAT data policies to SCAPE and RDA policy curation frameworks

For the design and implementation of data curation actions in EUDAT, the relevant outputs of the SCAPE project [2] and the Practical Policy Working Group of the Research Data Alliance [3] have been identified. SCAPE outputs are perceived to be of high quality owing to the advanced thinking that considered long-term digital preservation policies at a granular level suitable for machine-executable implementation. RDA Practical Policy Working Group outputs are a result of a substantial international collaborative effort including experts in the iRODS platform [11], which is the technological foundation of the EUDAT B2SAFE service.

For SCAPE, we used the catalogue of preservation policy elements [4], which is a systematized compendium of granular policies with examples of what SCAPE called “control policies” (granular statements that are easily translatable to machine-executable functions), and for the RDA Practical Policy Working Group it was their practical policy implementations report [9], which compiled a set of machine-executable functions for the iRODS platform [11].

In addition to this top-down retrospective review of the SCAPE and RDA outputs, a bottom-up analysis of control policies applicable to the GEOFON and HERBADROP use cases was performed, with a number of control policies identified as prime candidates for implementation in EUDAT B2SAFE. These policies are presented in Table 1.

Then a gap analysis was performed against SCAPE policy elements, to see whether these bottom-up identified control policies give enough coverage of the extensively defined data curation policy landscape of the SCAPE project. The SCAPE policy elements catalogue [4] is two-level, with Guidance Policies on the top level and Policy Elements on the granular level. An example of a Guidance Policy is the Authenticity Policy, which breaks down into Integrity, Reliability and Provenance as policy elements. Hence control policies in the Data integrity checks category from Table 1 correspond to the Integrity policy element of the Authenticity Policy in the SCAPE policy elements catalogue.

One noticeable gap discovered through this mapping exercise is the Digital Object lifecycle, which received due attention in the SCAPE policy landscape but is missing in the current EUDAT considerations. This gap may be hard to address as EUDAT is a collaborative project that accumulates data from a large variety of research communities with a wide range of digital object types and lifecycles. However, this discovery should inform the future operation of EUDAT services so that they could meet all reasonable (and multi-aspect) requirements for data curation and long-term digital preservation.

Table 1 Candidate control policies for implementation by GEOFON and HERBADROP

Policy category | Control policy | Policy example
Data replication | Number and location of replicas | Data should be replicated in N locations, including in locations A and B
Data replication | Timeframe for replication | Data should be replicated within the next 24 hours after the data ingestion in any particular location
Data replication | Data nodes roles | All data nodes are equivalent to read data from, but data can only be initially ingested in node X then replicated over all other nodes
Data integrity checks | The set of checksum algorithms acceptable | Checksum algorithm accepted is MD5
Data integrity checks | Periodicity and scope of integrity checks | Calculate checksums for 2% of all data assets every week, with the aim of having the entire data collection checked annually
Data and metadata formats acceptable | Data formats accepted | BMP and PNG accepted for images
Data and metadata formats acceptable | Metadata extraction from data | Upon ingestion, file name should be extracted as metadata
Data and metadata formats acceptable | Data format check procedures | Software package X should be used for data format validation
Data and metadata formats acceptable | Minimal metadata assigned upon data release | PID is a mandatory metadata element
Data access and data reuse | Embargo rules | Embargo period of N years is applied to all PDFs and images
Data access and data reuse | The set of data licenses recommended upon data release | CC-BY license should be assigned to all data released after the embargo period ends
Data access and data reuse | Data reuse statistics collection | Number of file downloads should be collected

5 Conclusion and further work

An analysis of the data curation requirements of two use cases, HERBADROP and GEOFON, has been performed, coupled with a retrospective review of the elaborated data curation policies from a dedicated EU project (SCAPE) and of the practical (machine-executable) policies that were the output of the dedicated RDA working group.

A set of granular control policies has been identified as candidates for implementation in the two use cases, and a gap analysis of these policies has been performed against the SCAPE catalogue of policy elements. A similar gap analysis should be performed against the RDA practical policies catalogue, in order to see what existing iRODS implementations can be reused for the creation of machine-executable policies in the EUDAT B2SAFE service.

After the set of identified policies is applied in the two use cases that have been involved in their formulation, the same policy framework should be applied in a larger number of research communities associated with EUDAT through its pilot programme.

The scope of projects and initiatives in data curation and long-term digital preservation can be extended beyond the SCAPE and RDA working groups; this specifically applies to popular functional models of digital preservation like OAIS [18] that we feel have not been thoroughly evaluated so far for their potential application in EUDAT.

The major result of this work is going to be a conceptually and terminologically consistent catalogue of machine-executable policies for EUDAT services that will be explicitly mapped to the requirements of the participating research communities, as well as to mature data policy frameworks developed by EU projects and international collaborations dedicated to data curation and long-term digital preservation.

The EUDAT data policies catalogue will then serve both as guidance for machine-executable policy implementations and as a validation tool to ensure the compliance of EUDAT CDI services with high-level policies of data curation and long-term digital preservation. This should allow certain EUDAT platforms such as B2SAFE to be promoted from their current status of “bit-level” data management solutions to policy-driven services where the actual set of policies can be configured according to a particular use case.

Acknowledgements

This work is supported by the EUDAT 2020 project that receives funding from the European Union’s Horizon 2020 research and innovation programme under the grant agreement No. 654065. The views expressed are those of the authors and not necessarily of the project.

References

[1] EUDAT Collaborative Data Infrastructure. https://www.eudat.eu/eudat-cdi
[2] SCAPE: Scalable Preservation Environments. http://scape-project.eu/
[3] Research Data Alliance Practical Policy Working Group. https://www.rd-alliance.org/groups/practical-policy-wg.html
[4] SCAPE Catalogue of Preservation Policy Elements. http://scape-project.eu/wp-content/uploads/2014/02/SCAPE_D13.2_KB_V1.0.pdf
[5] EPOS: European Plate Observing System. https://www.epos-ip.org/
[6] CINES: French National IT Center for Higher Education and Research. https://www.cines.fr/en/
[7] Hanka, W., Kind, R.: The GEOFON Program. Annals of Geophysics, 37 (5), Nov. 1994. ISSN 2037-416X. doi:10.4401/ag-4196
[8] GEOFON Data Centre (1993): GEOFON Seismic Network. Deutsches GeoForschungsZentrum GFZ. Other/Seismic Network. doi:10.14470/TR560404
[9] Practical Policy Implementations Report. http://dx.doi.org/10.15497/83E1B3F9-7E17-484A-A466-B3E5775121CC
[10] Hanka, W., Saul, J., Weber, B., Becker, J., Harjadi, P., Fauzi and GITEWS Seismology Group: Real-time Earthquake Monitoring for Tsunami Warning in the Indian Ocean and Beyond, Nat. Hazards Earth Syst. Sci., 10, pp. 2611-2622 (2010). doi:10.5194/nhess-10-2611-2010
[11] iRODS: Integrated Rule-Oriented Data System. https://irods.org/
[12] Haston, E., Chagnoux, S., Dugénie, P.: Herbadrop – Long-term Preservation of Herbarium Specimen Images. Proc. of the second Eudat User Forum. Rome (2016). https://www.eudat.eu/communities/long-term-preservation-of-herbarium-specimen-images
[13] Dugénie, P., Chagnoux, S.: EUDAT Data Pilot Herbadrop. Second Interim Herbadrop Data Pilot report (2016)
[14] Digitarium: Service Centre for High Performance digitization. http://digitarium.fi/en
[15] DigiWeb+digitization platform. http://digiweb.digitarium.fi/
[16] Elasticsearch Search and Analytics Engine. https://www.elastic.co
[17] SeedLink Protocol and Tools Overview. http://ds.iris.edu/ds/nodes/dmc/services/seedlink/
[18] Reference Model for an Open Archival Information System (OAIS), Recommended Practice, CCSDS 650.0-M-2 (Magenta Book). Issue 2, June 2012. CCSDS (The Consultative Committee for Space Data Systems), Washington DC (2012).
[19] EUDAT services. https://www.eudat.eu/services-support
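
Illustrative sketch. The integrity policy listed in Section 3.3 and in Table 1 (check roughly 2% of the archive every week so that everything is checked at least once a year, using MD5 checksums) can be sketched as below. This is only a minimal illustration under stated assumptions, not the GEOFON or B2SAFE implementation: the function names, the round-robin schedule over 52 weekly batches, and the dictionary of expected checksums are all hypothetical. A round-robin schedule is used here instead of random sampling because random sampling alone would not guarantee annual coverage of every folder.

```python
import hashlib
from pathlib import Path

# Assumption: one batch per week, so each batch holds about 1/52 (~2%) of the folders.
WEEKS_PER_YEAR = 52


def weekly_batch(folders, week):
    """Return the folders scheduled for checking in a given week (0-based).

    Folders are sorted and dealt out round-robin, so the 52 batches are
    disjoint and together cover every folder exactly once per year.
    """
    ordered = sorted(folders)
    return ordered[week % WEEKS_PER_YEAR::WEEKS_PER_YEAR]


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(folder, expected):
    """Return names of files that are missing or whose checksum differs.

    `expected` is a hypothetical mapping of file name to the pre-calculated
    MD5 checksum (for example, as it might be kept in a PID Record).
    """
    mismatches = []
    for name, checksum in sorted(expected.items()):
        path = Path(folder) / name
        if not path.is_file() or md5_of(path) != checksum:
            mismatches.append(name)
    return mismatches
```

In such a setup, a weekly job would call `weekly_batch` with the current week number and run `find_mismatches` on each returned folder; in line with the deletion policy of Section 3.3, any mismatch would be reported to an operator rather than repaired or deleted automatically.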