=Paper= {{Paper |id=Vol-2277/paper05 |storemode=property |title= Meaningful Data Interoperability and Reuse among Heterogeneous Scientific Communities |pdfUrl=https://ceur-ws.org/Vol-2277/paper05.pdf |volume=Vol-2277 |authors=Nikolay Skvortsov |dblpUrl=https://dblp.org/rec/conf/rcdl/Skvortsov18 }} == Meaningful Data Interoperability and Reuse among Heterogeneous Scientific Communities == https://ceur-ws.org/Vol-2277/paper05.pdf
         Meaningful Data Interoperability and Reuse among
              Heterogeneous Scientific Communities
                                           © Nikolay Skvortsov
              Institute of Informatics Problems, Federal Research Center “Computer Science and
                                 Control”, Russian Academy of Sciences,
                                             Moscow, Russia
                                              nskv@mail.ru
           Abstract. FAIR data principles declare data interoperability and reuse through the use of machine- and
     human-readable specifications. Adherence to these principles has some consequences for data infrastructures
     and research communities. Meaningful data exchange and reuse by humans and machines require formal
     specifications of subject domains that accompany the data and allow automatic inference. Development of
     formal conceptual specifications in research communities might be stimulated by a necessity to reach
     semantic interoperability of data collections and components, and reuse of data resources. The data lifecycle
     hence includes collecting domain knowledge specifications, classifying all data, methods and services by these
     specifications, and collecting and sharing them for reuse. Formal inference allows meaningful search and
     verified reuse of data, methods and services from collections.
           Keywords: FAIR data principles, conceptual modeling, research community

Proceedings of the XX International Conference “Data Analytics and Management in Data Intensive
Domains” (DAMDID/RCDL’2018), Moscow, Russia, October 9-12, 2018

1 Introduction

Curating and sharing research data to make it reusable for both humans and machines has been a topical
issue in recent years. For example, the WF4Ever project [1] aims at preserving data, workflows and research
results for sharing and reuse. Research objects are declared as containers that encapsulate data, metadata,
workflows, documentation and links to external resources, and share all resources related to a piece of
research with a community.

Collaborative data infrastructures support sharing of various resources such as collections, archives,
databases, storage and computing capacities, and provide services to search, access and manage them. For
example, EUDAT [2] is a network of numerous community-specific data repositories and some of Europe’s
largest data centers using common data services for data and service providers and research communities.
The EUDAT Collaborative Data Infrastructure (CDI) is a European infrastructure of integrated data services
and resources to support research. Heterogeneous research data infrastructures interact to share research
data globally and make science open. The EOSC [3] initiative integrates services and data from research data
infrastructures, and provides curation and preservation of scientific data repositories and computing
capacity for research data analysis.

FAIR data principles [4] have gathered the basic features used in data curation and preservation and are now
being propagated in research data infrastructures and open science. These principles aim to provide data
interoperability and reuse by machines and humans. For this purpose data should be well identified,
specified with ontologies, accompanied by provenance information, and comply with known data models or
have known mappings to them.

FAIR data principles have been defined informally, so they give rise to a number of different
interpretations, including the application of Linked Data principles to provide FAIR ones [5], lists of more
detailed informal requirements based on FAIR ones [6], or just a simplified numerical rating of conformity
with FAIR principles [7]. At the same time, it seems that FAIR data principles should have some definite
consequences for the requirements on research data infrastructures. Those relevant to data semantics
problems with respect to research communities are discussed in this talk.

2 Subject domain specifications

FAIR data principles declare data interoperability and reuse through the use of machine- and human-readable
specifications. It means that data are FAIR only if there is an approach to define and clarify the semantics
of data in some domains of knowledge. Meaningful data exchange and reuse by machines (helpful for humans
too) require quite formal specifications of subject domains allowing automatic inference.

Similarity and machine learning approaches could be applied to help humans search and operate with data,
but they do not define formal specifications of the used resources or evidence-based inference over
metadata. Domain knowledge should define restrictions and permissible states of data from the viewpoint of
a specific domain. Advanced ontological and rule models should be used for metadata development.

Conceptualization and conceptual specifications are necessary not only in general domains, but also in
domains of interest of narrower and more specialized communities, as well as in overlapping domains, in
which cooperation of research teams and reuse of specifications often occur.
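
As an illustration of how a formal subject domain specification can support automatic inference, the following minimal Python sketch may help. It is not taken from the paper: the concept names (`CelestialObject`, `Star`, `VariableStar`, `Galaxy`), the attribute `temperature_k`, its permissible range and the three records are all hypothetical, and a real infrastructure would rely on an ontology language and a reasoner rather than on plain dictionaries. The sketch only shows two kinds of inference mentioned above: selecting data by a more general concept through subsumption, and checking data against restrictions inherited from the domain specification.

```python
# Hypothetical toy specification: concept -> direct superconcepts.
SUBCLASS_OF = {
    "VariableStar": {"Star"},
    "Star": {"CelestialObject"},
    "Galaxy": {"CelestialObject"},
}

# Hypothetical restrictions: concept -> attribute -> (min, max) permissible range.
RESTRICTIONS = {
    "Star": {"temperature_k": (1_000.0, 100_000.0)},
}


def superconcepts(concept):
    """Return the concept together with every concept that subsumes it."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def find(records, concept):
    """Select records whose annotated concept is subsumed by `concept`."""
    return [r for r in records if concept in superconcepts(r["concept"])]


def violations(record):
    """List attributes whose values fall outside the inherited permissible ranges."""
    problems = []
    for concept in sorted(superconcepts(record["concept"])):
        for attr, (low, high) in RESTRICTIONS.get(concept, {}).items():
            value = record["values"].get(attr)
            if value is not None and not low <= value <= high:
                problems.append(attr)
    return problems


# Hypothetical data records annotated with concepts of the specification.
records = [
    {"id": "d1", "concept": "VariableStar", "values": {"temperature_k": 5_800.0}},
    {"id": "d2", "concept": "Galaxy", "values": {}},
    {"id": "d3", "concept": "Star", "values": {"temperature_k": -5.0}},
]

# Search by the general concept "Star" also finds data annotated as "VariableStar".
stars = [r["id"] for r in find(records, "Star")]

# Records that violate a restriction of the domain specification.
invalid = [r["id"] for r in records if violations(r)]
```

In this sketch a search by `Star` also returns the record annotated only as `VariableStar`, and the record with a negative temperature is reported as violating the restriction attached to `Star`. Neither result could be obtained from keyword matching on the annotations alone.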



Most research is conducted at the intersection of several domains, so it uses the constraints of several
domains simultaneously as points of view to specify research objects. Inference in multidomain
specifications should establish relations and semantic interoperability between data belonging to
different domains.

3 Collections of methods and experiment specifications

For comprehensive investigations of specific real-world entities, it is important to share data, tools,
research results, methods and specifications defining the semantics of entities and phenomena in the domain
as well as the semantics of methods applied to them. Thus, no matter which kind of information object is
used for research, it should be supplied with metadata in terms of ontologies. Such objects include data,
metadata, publications, implementations of research methods, and workflows describing the research
processes. Inference over ontologies makes it possible to select them from collections and access them by
the selected identifiers.

Semantics-based approaches to research objects should be provided by inseparable linking of data and
well-defined methods related to objects of research. It means that method collections are considered as a
specific kind of data. Methods used in any research domain should be defined, conceptually specified and
collected in addition to general-purpose methods such as multidimensional data analysis or machine learning.
Meaningful access to known implementations of methods should be provided to humans and machines and be
understandable for both.

Experiments over data in research infrastructures are constructed using shared and interoperable data,
services and workflows. Research experiments can include data analysis, modelling in accordance with
hypotheses and testing models against observational data. Besides providing access to data and method
implementation collections, research infrastructures should include instruments for experiment support,
in particular, the formulation and testing of hypotheses [8].

4 The role of communities

Since shared semantics of research objects are becoming increasingly important for data reuse in each
specific discipline or subject domain, heterogeneous communities working in a domain should have conceptual
specifications related to their research and approaches and maintain strong commitment to them.

Communities of researchers, vendors of analytical tools and research instruments, and data owners are
interested in the long-term shared access to heterogeneous data and method collections. So the only way of
conceptualization and formal specification of a domain is development in communities stimulated by a
necessity to reach semantic interoperability of interacting components, integration of data collections,
reuse of data resources and method reproducibility due to binding to the semantics of the subject domains.

Community members (humans or machines) operate within the ontological commitment defined by shared
ontologies, i.e. they use the concepts of the subject domain in a way consistent with the theories specified
by the ontologies. Ontologies are important for the automation of consistency control over any manipulations
with the domain concepts. An interaction of communities in solving interdisciplinary problems requires
simultaneous querying using different domain vocabularies. In that case, the researchers should commit to
the specifications of several domains.

Activities of communities are defined by the data lifecycle to provide interoperability and reuse of data in
related domains. Maintenance of shared domain specifications becomes a basis for arranging collections of
data and sources, collections of specific methods, and embedding research results into such collections for
further research.

Acknowledgments

The work was supported by the Russian Foundation for Basic Research (grant 18-07-01434).

References

[1]   Belhajjame K., et al.: Workflow-Centric Research Objects: A First Class Citizen in the Scholarly
      Discourse. In: ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web
      (SePublica2012), pp. 1-12. Heraklion (2012).
[2]   Schentz H., le Franc Y.: Building a semantic repository using B2SHARE. In: EUDAT 3rd Conference (2014).
[3]   EOSC Declaration. https://ec.europa.eu/research/openscience/pdf/eosc_declaration.pdf
[4]   Wilkinson M., et al.: The FAIR Guiding Principles for scientific data management and stewardship.
      In: Scientific Data, vol. 3 (2016).
[5]   Wilkinson M.D., et al.: Interoperability and FAIRness through a novel combination of Web technologies.
      In: PeerJ Preprints 5:e2522v2 (2017). https://doi.org/10.7287/peerj.preprints.2522v2
[6]   Guidelines on FAIR Data Management in Horizon 2020. Directorate-General for Research and Innovation,
      European Commission (2016).
      http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
[7]   Doorn P., Dillo I.: FAIR Data in Trustworthy Data Repositories. DANS / EUDAT / OpenAIRE Webinar (2016).
      https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar
[8]   Skvortsov N., Kalinichenko L., Kovalev D.: Conceptualization of Methods and Experiments in Data
      Intensive Research Domains. In: Data Analytics and Management in Data Intensive Domains
      (DAMDID/RCDL 2016). CCIS, vol. 706, pp. 3-17. Springer (2017).



