=Paper= {{Paper |id=Vol-1752/paper02 |storemode=property |title= Metadata for Nanoscience Experiments |pdfUrl=https://ceur-ws.org/Vol-1752/paper02.pdf |volume=Vol-1752 |authors=Vasily Bunakov,Tom Griffin,Brian Matthews,Stefano Cozzini |dblpUrl=https://dblp.org/rec/conf/rcdl/BunakovGMC16 }} == Metadata for Nanoscience Experiments == https://ceur-ws.org/Vol-1752/paper02.pdf
                      Metadata for nanoscience experiments
                  © Vasily Bunakov        © Tom Griffin        ©Brian Matthews
                           Science and Technology Facilities Council,
                                   Harwell, Oxfordshire, UK
         vasily.bunakov@stfc.ac.uk tom.griffin@stfc.ac.uk brian.matthews@stfc.ac.uk
                                        © Stefano Cozzini
                                 Istituto Officina dei Materiali
                                           Trieste, Italy
                                       cozzini@iom.cnr.it

Proceedings of the XVIII International Conference «Data Analytics and
Management in Data Intensive Domains» (DAMDID/RCDL’2016), Ershovo, Russia,
October 11 - 14, 2016

Abstract

Metadata is a key aspect of data management. This paper describes the work
of the NFFA project on the design of a metadata standard for the
nanoscience community. The methodology and the resulting high-level
metadata model are presented. The paper explains and illustrates the
principles of metadata design for data-intensive research. This is of value
to data management practitioners in all branches of research and technology
that imply a so-called “visitor science” model where multiple researchers
apply for a share of a certain resource on large facilities (instruments).

1 Introduction

The Nanostructures Foundries and Fine Analysis (NFFA-Europe) project
www.nffa.eu brings together European nanoscience research laboratories that
aim to provide researchers with seamless access to equipment and
computation. This will support a single entry point for research proposals
supported by the project, and a common platform to support the access and
integration of the resulting experimental data. Both physical and
computational experiments are in scope, with a vision that they complement
each other and can be mixed in the same identifiable piece of research.

The project requires setting up the IT infrastructure for managing research
proposals and substantial amounts of data resulting from physical and
computational experiments. A common metadata model that supports different
stages of the nanoscience research lifecycle is essential to a unified
researcher experience across locations, and also for the design and
operation of IT infrastructure components.

Metadata design is a part of a joint research activity within NFFA that
takes empirical input from the project participants and also takes into
account state-of-the-art standards and practices. Metadata design is an
incremental effort of the project; this work presents the first stage
resulting in a high-level metadata model that is agnostic to the actual
data management situation in participating organizations yet is able to
capture significant features of nanoscience physical and computational
experiments.

2 Approach and methodology

2.1 General approach

The major purpose of any metadata is satisfying the information needs of a
certain community. “Community” should be understood in broad terms and
includes machine agents, to ensure human-to-human, human-to-machine and
machine-to-machine interoperability.

The information needs may be generic (common with other communities) or
specific to a particular community. From the project perspective, the
information needs should be expressed as clearly formulated Use Cases for
the existing or proposed information and data management systems (IT
platforms). A good metadata design should take into account user
requirements and IT architecture, and in turn should feed considerations
for the IT architecture.

The IT architecture, the use cases and practices, and the metadata design
can be considered pillars of an enterprise architecture that includes both
technological and organizational aspects of a loosely coupled virtual
enterprise that the NFFA project is going to deliver for the European
nanoscience community.

The main purpose of the metadata design effort in the NFFA project can then
be formulated as giving adequate support for that widely defined enterprise
architecture for nanoscience. This implies metadata design from “first
principles”, i.e. by considering existing best practices of information
management, use cases for nanoscience and information technology
opportunities (and limitations) rather than adopting any existing metadata
standard.

2.2 Top-down input: relevant information management frameworks

The case for metadata collection and use can be specific to nanoscience,
yet there are general information needs that are typical for a wide variety
of users and that have been developed in other branches of science and
information management.

One of the mature information design frameworks is Functional Requirements
for Bibliographic Records (FRBR) [2]
that considers four basic information needs (user tasks) with regard to
information: “Find”, “Identify”, “Select” and “Obtain”. The ultimate goal
is of course getting the information resource, yet between searching for it
and obtaining it, the resource should be identified as the one being
sought, and selected as being useful for the user [1]. Each task may
involve certain subtasks, e.g. selection may require checks on the resource
context and on its relevance to the actual user’s needs.

Another mature information design framework of relevance is the Reference
Model for an Open Archival Information System (OAIS) [3], a popular
functional model for long-term digital preservation. If expressed in terms
of information practitioner needs (user tasks) similarly to FRBR, the OAIS
basically deals with three categories of them: “Ingest (into the archive)”,
“Manage (within the archive)” and “Disseminate (from archive)”. Each of
these tasks may be complex and involve a number of interrelated subtasks,
e.g. managing information in the archive may imply provenance and integrity
checks, managing access to information, and administration / reporting.

Overall, the OAIS framework should be able to provide good coverage of what
NFFA needs to consider for sensible data collection, archiving and
provision towards the end users (researchers in nanoscience), and the FRBR
framework should be able to cover the end user needs for information
retrieval. The respective areas of coverage and user categories relevant to
NFFA are illustrated by the following table:

Table 1 Information management frameworks and their coverage of NFFA scope.

Framework (a source    OAIS                      FRBR
of best practices)

General use case       Data collection,          Data retrieval
                       management and
                       dissemination

User categories        Data archive              End users
                       administrators            (nanoscience
                       IT specialists            researchers)

Information needs      Ingest data               Find data
(user tasks)           Manage data               Identify data
                       Disseminate data          Select data
                                                 Obtain data

Being general in nature, OAIS and FRBR are still able to provide good
recommendations for NFFA practices of information and data management. In
particular, OAIS emphasizes the need for a clear agreement between the data
producer and the archive, and for a clearly defined format for data
exchange between them (the so-called Submission Information Package),
whilst FRBR emphasizes the importance of having a clear identity for data
assets.

2.3 Bottom-up input: questionnaire responses and common vocabulary

A questionnaire was used to collect the NFFA partners’ responses about
their data management practices and most popular data management solutions.
The questionnaire inquired about the following aspects of data management
in nano-facilities:
• Intensity of experiments and of resulting data flow
• Popular data formats
• Data catalogue software
• Data catalogue openness
• Data management policy
• Metadata standards for data catalogue
• Persistent identifiers for data
• User management platform
• Popular third-party databases and information systems

In total, responses from 17 of the 20 project partners were received and
reviewed. They showed very different levels of data management maturity.
From the responses, the following priorities for the metadata design were
identified:
• One experiment to many samples and one sample to many data files
  relationships should be supported.
• A common set of metadata fields for data discoverability should be agreed
  upon, possibly based on an existing popular standard or recommendation
  for data discovery.
• User roles with different permissions for access to metadata should be
  developed. This means the metadata model will need to represent users as
  well as data.
• It is reasonable to develop a common data management policy for NFFA, or
  a set of policies with different flavours of access to data.
• Having links to external reference databases is valuable to ensure the
  high quality of metadata, yet this will mean additional effort, so it
  should be de-scoped from the initial design of metadata.

In addition to the questionnaire, where responses were collected from
research offices or relevant research programme representatives, a common
vocabulary of terms and definitions relevant to nanoscience data management
was compiled and then refined by the IT teams of participating NFFA
organizations ([5]). The vocabulary contains about twenty commonly agreed
terms with definitions; it serves as a basis for the design of information
entities (groups of metadata elements) and contributes to the earlier
mentioned NFFA “virtual enterprise” architecture.

A particularly important use case to be supported by the metadata model is
the situation when the same researcher (or research group) applies for
experimental time on more than one facility, as the nature of the
experiment may require this, yet the researcher wants a seamless experience
across nanoscience facilities, with a single entry point for data
management.

Another conclusion based on responses to the questionnaire is that
computational experiments in nanoscience are becoming common and can be
combined with physical experiments, so there should not be an artificial
division between the two.
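The relationship and role priorities identified from the questionnaire can
be made concrete with a small sketch. The Python below is only an
illustration under stated assumptions: the class names, field names and the
permission table are hypothetical and are not taken from the NFFA draft
standard; the role names merely reuse entity names that appear in Table 2.
It shows one experiment holding many samples, one sample holding many data
files, a single type for physical and computational experiments, and one
user's experiments spanning several facilities.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    """Hypothetical user roles; the paper only states that roles are needed."""
    RESEARCH_USER = "research_user"
    INSTRUMENT_SCIENTIST = "instrument_scientist"
    DATA_MANAGER = "data_manager"


@dataclass
class DataFile:
    name: str


@dataclass
class Sample:
    sample_id: str
    # One sample may produce many data files.
    data_files: List[DataFile] = field(default_factory=list)


@dataclass
class Experiment:
    experiment_id: str
    facility: str
    # Physical and computational experiments share one type (no artificial split).
    computational: bool = False
    # One experiment may involve many samples.
    samples: List[Sample] = field(default_factory=list)

    def all_files(self) -> List[DataFile]:
        """Collect every data file reachable through the experiment's samples."""
        return [f for s in self.samples for f in s.data_files]


# Assumed permission table (illustrative only): roles allowed to edit metadata.
CAN_EDIT_METADATA = {Role.INSTRUMENT_SCIENTIST, Role.DATA_MANAGER}


def can_edit(role: Role) -> bool:
    """Return True if the given role may edit metadata in this sketch."""
    return role in CAN_EDIT_METADATA


def facilities_used(experiments: List[Experiment]) -> List[str]:
    """Facilities touched by one user's experiments, sorted and de-duplicated."""
    return sorted({e.facility for e in experiments})


# Example: one physical and one computational experiment of the same user.
exp = Experiment(
    "exp-1",
    facility="Facility A",
    samples=[
        Sample("s1", [DataFile("a.dat"), DataFile("b.dat")]),
        Sample("s2", [DataFile("c.dat")]),
    ],
)
sim = Experiment("exp-2", facility="Facility B", computational=True)
```

Here `facilities_used([exp, sim])` lists both facilities for the same user,
which is the cross-facility situation a single entry point would need to
present as one body of work.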
2.4 Side input: IT architecture considerations

As an additional consideration for principal metadata design, we used the
draft NFFA Data System Architecture that defines the outline design of the
NFFA portal, which considered the generic use case of the same user
performing a measurement on two different facilities. Generic use cases
where one user wants to access data produced by another user, or wants to
release data into the public domain, are currently not being considered.
These may be considered in future, so should be taken into account within
an extensible metadata design.

The draft architecture suggests that data should be harvested from
individual facilities in a suitable “packaged” format, with METS [6] as a
potential candidate as it supports the provision of descriptive,
administrative, structural and file metadata. For the descriptive part of
metadata, the purpose of having the data assets discoverable is emphasized
in the draft architecture. For the administrative metadata, the importance
of intellectual property information and information about the data source
(provenance) is emphasized. For the structural metadata, having the
information about the organization, perhaps structured in a hierarchical
way, is suggested. For the file metadata, having the list of files that
constitute a digital object (data asset) and having pointers to external
metadata files are deemed most important.

After considering the draft architecture, the conclusion was that we could
take METS as “the role model” metadata standard for data packaging that
corresponds to a specific entity in the NFFA generic metadata model: Data
Asset. As to particular elements of metadata suggested by the IT
architecture draft, the fields for capturing intellectual property
information and provenance are the most important ones as they affect the
reusability of data assets, which should be one of the important outcomes
of the NFFA project.

3 Implementation

3.1 Metadata groups and elements

The top-down, bottom-up and side requirements resulted in the basic
structure of the proposed metadata model that is illustrated by Figure 1.

The suggested metadata elements are presented as a matrix in Table 2 to
make explicit the coverage of identified information entities (common
vocabulary terms) and of earlier identified information needs (for the
categories of them, see Section 2.2).

Certain elements are in common with the Core Scientific Metadata Model
([4]) already in use in some of the facilities. Mandatory and optional
metadata fields (attributes) for each element were defined and shared
amongst project participants for further discussion ([5]).

Figure 1 Metadata groups of elements and their purpose.

Table 2 Metadata elements and information needs coverage.

Information entity       Ingest  Manage data    Disseminate  Find  Identify  Obtain
                         data    (within NFFA   data         data  data      data
                                 portal)
Research User                                   Y            Y     Y         Y
Instrument Scientist     Y       Y
Project                                         Y            Y     Y         Y
Proposal                 Y       Y
Facility                 Y       Y              Y            Y     Y         Y
Instrument                                      Y            Y     Y
Experiment                                      Y            Y     Y
Sample                                          Y            Y     Y
Data Asset               Y       Y              Y            Y     Y         Y
Raw Data                 Y       Y              Y            Y     Y         Y
Analysed Data            Y       Y              Y            Y     Y         Y
Data Analysis            Y       Y                                 Y
Data Analysis Software   Y       Y                                 Y
Data Archive             Y       Y                                           Y
Data Manager             Y       Y                                           Y
Data Policy              Y       Y
NFFA Portal                      Y                           Y

3.2 Entity-relationship diagram

As a basis for further, more detailed metadata design and as a contribution
to the IT architecture design, the Entity-Relationship diagram presented in
Figure 2 has been agreed.

Figure 2 Entity-Relationship diagram for NFFA high-level metadata model.

3.3 Metadata operational recommendations

The metadata elements suggested are not all we need for having a successful
metadata framework in NFFA. In
addition, there should be established metadata management practices,
ideally assisted by clear recommendations for NFFA partner organizations on
how to assign and curate metadata.

For example, there are choices of how to aggregate data: all data files for
all samples measured in a particular Experiment can be assembled in one
package, and then the package is given common descriptions like Facility
name, research User name, Data Policy etc. However, this may not suit
actual data management practices or policies of certain Facilities, e.g.
they may want to make a Sample rather than an Experiment the focal point of
their metadata descriptions.

These operational aspects of NFFA metadata implementation will require
further engagement and discussions with data practitioners in NFFA.

4 Conclusion

The NFFA metadata development so far has produced an agreed common approach
with its mapping to the existing metadata frameworks and best practices. It
has defined a common vocabulary, the provisional list of mandatory and
optional attributes, and the ER diagram that can be used both in metadata
design and in IT architecture design. The high-level metadata model will be
further refined through project work in NFFA and through discussions in the
wider nanoscience community. Also, the state-of-the-art metadata
development for nanoscience that may cover specific entities in our generic
metadata model, e.g. CODATA UDS [7] for Sample, should be looked into in
more detail, to see the opportunities for mutual mapping and cross-walks
between different metadata models.

Acknowledgements

This work is supported by the Horizon 2020 NFFA-Europe project www.nffa.eu
under grant agreement no. 654360.

References

[1] Philip Hider. Information resource description: Creating and managing
    metadata. Facet Publishing, 2012.
[2] Functional Requirements for Bibliographic Records (FRBR). Final Report.
    http://archive.ifla.org/archive/VII/s13/frbr/ Retrieved 20 May 2016.
[3] Reference Model for an Open Archival Information System (OAIS),
    Recommended Practice, CCSDS 650.0-M-2 (Magenta Book). Issue 2, June
    2012. http://public.ccsds.org/publications/archive/650x0m2.pdf
    Retrieved 20 May 2016.
[4] The Core Scientific Metadata Model (CSMD).
    https://icatproject.org/user-documentation/csmd/ Retrieved 20 May 2016.
[5] Draft metadata standard for nanoscience data. NFFA project deliverable
    D11.2. February 2016.
[6] METS: Metadata Encoding and Transmission Standard.
    http://www.loc.gov/standards/mets/
[7] CODATA UDS: Uniform Description System for Materials on the Nanoscale.
    http://www.codata.org/uploads/Uniform_Description_System_Nanomaterials-Published-v01-15-02-01.pdf