<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Semantic Data Model for a FAIR Digital Repository of Heterogeneous Agricultural Digital Objects: The Case of the EUREKA FarmBook</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hercules Panoutsopoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Brewster</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spyros Fountas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Natural Resources Management and Agricultural Engineering, Agricultural University of Athens</institution>
          ,
          <addr-line>75 Iera Odos St, Athens, 11855</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Data Science, Maastricht University</institution>
          ,
          <addr-line>Maastricht, Netherlands</addr-line>
          ,
          <institution>&amp; Data Science Group, TNO</institution>
          ,
          <addr-line>Soesterberg</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>During the past few years, a significant number of agriculture-related research and development projects have been funded by the European Commission. These projects have aimed to address specific problems and have produced solutions documented in the digital objects created in their context. However, the uptake of the existing information and knowledge by the concerned stakeholders is not yet adequate. In addition, the divergence of the available digital objects in terms of their types and formats poses significant challenges in homogeneously describing them with metadata. The Horizon 2020 EUREKA project aims to address these challenges by developing a FAIR digital repository based on a semantic data model. The present document focusses on the related design decisions, as well as on the rationale for developing FAIR digital repositories based on formal data models.</p>
      </abstract>
      <kwd-group>
        <kwd>Agriculture</kwd>
        <kwd>semantic data model</kwd>
        <kwd>digital repository</kwd>
        <kwd>FAIR principles</kwd>
        <kwd>semantic web standards</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the seven years of implementation of the Horizon 2020 Framework Programme, the European
Commission invested nearly one billion euros in projects related to agriculture, forestry, and rural
development<sup>1</sup>. These projects have created a large number of digital objects conveying valuable
information and knowledge about best and innovative practices. However, the uptake and re-use
of these digital objects have not yet been realised to an adequate extent. In addition, most of these
digital objects are no longer available after the end of the projects. As a result, the various
agricultural stakeholders have little opportunity to access the available knowledge
for further research and solution development. This is the context in which the EUREKA
project (https://www.h2020eureka.eu/) is situated. More specifically, EUREKA’s goal is to
build a meta-repository (i.e., the “FarmBook”) of the digital objects created by the EU-funded,
Horizon 2020 multi-actor projects. In this way, EUREKA aspires to help
ensure the longevity of the knowledge generated in those “source” projects and, therefore,
allow it to be further circulated and used as the basis for future developments. However,
this effort faces significant challenges, mostly relating to the heterogeneity of the
digital objects to be integrated into the meta-repository.</p>
      <p>
        Semantic web technologies and standards [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which have existed for many years, provide
recommendations for the exchange and interoperability of data in the Linked Open Data (LOD)
Cloud. The more recently established FAIR principles [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have further contributed to the vision
of standards-based data sharing. However, what becomes evident from existing practices is that
most of the existing repositories are based on ad-hoc data modelling decisions and designs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
To enable the access and re-use of the variety of agricultural digital objects, the aim of EUREKA
is to develop a FAIR digital repository by drawing upon semantic web standards. To this end,
a semantic data model is proposed. This data model has been developed by considering and
re-using concepts and relations from existing ontologies. It provides a formal structure aimed
to be used for the identification and definition of metadata for describing and annotating the
digital objects available from the FarmBook. In this context, we address the following research
questions:
• Which main concepts and relations need to be captured in our semantic data
model to identify a set of metadata for annotating the breadth of digital objects coming
from the Horizon 2020 multi-actor projects?
• Which external ontologies, and which of their components, need to be considered and
re-used in the design of the proposed semantic data model?
      </p>
      <p>In the rest of the article, an overview of the issues related to the adoption of the FAIR principles
for developing FAIR digital repositories is provided. Then, our methodology and the semantic
data model are presented. Finally, some concluding remarks are provided together with accounts
of the steps to be taken next in EUREKA.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data Modelling for FAIR Digital Repositories</title>
      <p>
        The FAIR data principles have been developed as guidelines for good data sharing practices,
initially focussing on the publication and re-use of scientific data sets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. There has been a surge
of interest in applying the FAIR principles to help make data management and the stewardship
of research outputs easier, and pave the way towards repositories facilitating FAIR data sharing.
Therefore, starting from the concept of “FAIR datasets”, it is important to broaden the scope of
FAIRness to “FAIR digital objects” and “FAIR digital repositories” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        To turn these ideas into reality, we need to establish a FAIR data ecosystem [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In such an
ecosystem, “FAIR digital repositories” have a pivotal role to play by providing all the necessary
mechanisms and tools that will enable the accessibility and re-use of “FAIR digital objects”. The
main assumption underpinning our work is that a “FAIR digital repository” (a digital
repository for the housing and delivery of “FAIR digital objects”) needs to be built on a data model
designed so that digital objects have explicit metadata. In this context, the re-use of concepts
and relations from external, appropriately selected ontologies is considered a best practice in
the semantic web paradigm. This approach is particularly important for initiatives that
deal with the development of meta-repositories, as in the case of EUREKA. In this application
scenario, a well-designed data model, built in the spirit of the FAIR principles, can help
reconcile the heterogeneity of the divergent types of digital repositories and objects,
and the variety of the designs, standards, and formats that may have been considered. The
FAIR principles directly relevant to such data modelling are I1 (emphasising the need for
conceptual designs established upon standards-based, well-defined data models), as well as F2,
I2, and R1 (setting out a list of requirements for the metadata to be defined based on a formal data
model).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The EUREKA FarmBook’s Semantic Data Model</title>
      <sec id="sec-3-1">
        <title>3.1. Methodology</title>
        <p>
          The FarmBook data model’s design has been based on the methodology of Noy and McGuinness
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Despite the well-acknowledged rigour of contemporary ontology building methods (e.g.,
KNARM) and of the OBO Foundry’s principles<sup>2</sup>, we have drawn upon a widely adopted, baseline
approach that non-experts can also comprehend, given EUREKA’s target audience
(i.e., farmers, foresters, and advisors). The following steps were involved in our model’s design:
        </p>
        <p>STEP 1 - Determine the domain and scope of the ontology: The overarching goal of
the EUREKA FarmBook’s semantic data model is to provide formal descriptions of the concepts,
the concept relations, and the properties of concepts needed to be considered for defining
the metadata to homogeneously characterise the breadth of digital objects available from the
EU-funded, Horizon 2020 multi-actor projects. The design of the data model has been driven by
the following, indicative, competency questions:
• What are the subjects (topics) of the digital objects available from project X?
• Which are the geographic locations related to the content of the digital objects of type X?
• What are the digital objects of type X on subject Y, related to the geographic location Z
(e.g., the videos on pig farming in oak forests in Spain)?</p>
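        <p>The third competency question above can be illustrated with a minimal sketch over an in-memory list of triples. This is not the FarmBook implementation: the ex: identifiers and sample data are hypothetical, and the use of dct:subject and dct:spatial to link a digital object to its subject and geographic location is an assumption made for illustration only.</p>
        <preformat preformat-type="code">
```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical sample annotations; every "ex:" identifier is made up.
# dct:subject and dct:spatial are assumed linking properties.
TRIPLES = [
    Triple("ex:video1", "rdf:type", "schema:VideoObject"),
    Triple("ex:video1", "dct:subject", "ex:PigFarming"),
    Triple("ex:video1", "dct:spatial", "ex:Spain"),
    Triple("ex:video2", "rdf:type", "schema:VideoObject"),
    Triple("ex:video2", "dct:subject", "ex:PigFarming"),
    Triple("ex:video2", "dct:spatial", "ex:Greece"),
    Triple("ex:report1", "rdf:type", "schema:TextDigitalDocument"),
    Triple("ex:report1", "dct:subject", "ex:PigFarming"),
    Triple("ex:report1", "dct:spatial", "ex:Spain"),
]


def subjects_with(triples, predicate, obj):
    """Return the set of subjects carrying the given predicate/object pair."""
    return {t.subject for t in triples if t.predicate == predicate and t.obj == obj}


def find_objects(triples, object_type, topic, location):
    """Digital objects of type X, on subject Y, related to location Z."""
    typed = subjects_with(triples, "rdf:type", object_type)
    on_topic = subjects_with(triples, "dct:subject", topic)
    at_location = subjects_with(triples, "dct:spatial", location)
    return sorted(typed.intersection(on_topic, at_location))


print(find_objects(TRIPLES, "schema:VideoObject", "ex:PigFarming", "ex:Spain"))
```
        </preformat>
        <p>In a deployed repository, the same question would more likely be expressed as a query against a triple store; the sketch only shows that the three conditions of the question map onto three metadata properties.</p>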
        <p>STEP 2 - Consider re-using existing ontologies: The external ontologies considered
for the re-use of definitions of concepts, relations, and concept properties in
the design of our semantic data model are Schema.org, FRAPO, FOAF, and DCMI. These
are generic ontologies, selected to help us model basic descriptive features of a
digital object relating, among others, to the format in which information is conveyed (text,
audio, video, etc.), as well as provenance (e.g., the digital object’s creator and source
multi-actor project). We are familiar with agriculture-related controlled vocabularies (e.g.,
AGROVOC) and ontologies (e.g., FoodOn, AgrO), and acknowledge the added value of
re-using their components as part of the development of a formal data model. These structures
will be considered next in EUREKA to create a flexible model of the subject(s)
addressed in the various digital objects.</p>
        <p><sup>2</sup> http://obofoundry.org/principles/fp-000-summary.html</p>
        <p>STEP 3 - Define the classes and class hierarchy and STEP 4 - Define the properties
of the classes: The details of the definition of the model classes, their hierarchy, and the class
properties are presented in the next section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Semantic Data Model Description</title>
        <p>The FarmBook’s semantic data model is shown in Figure 1 below. The digital object made available by
a multi-actor project, the project providing it, its creator, and the geographic location mentioned
in its content are the key concepts that need to be captured in the model. A digital object’s subject
may be a term defined in (and thus, imported from) agriculture-related controlled vocabularies
(e.g., AGROVOC) or ontologies (e.g., FoodOn, AgrO), and is denoted with a rectangular shape
following a modelling convention similar to that in FoodOn. We consider more than one such
structure because of the divergence in the “source” projects’ focus.</p>
        <p>Table 1 shows the alignment between the concepts captured in our model and the components
re-used from existing ontologies. Table 2 provides information about the relations captured in
our model and the ontologies they have been imported from.</p>
        <p>To describe the full extent of the digital object categories that need to be considered for the
FarmBook’s design, a set of relevant concepts, organised in appropriate class hierarchies,
has been imported into our semantic data model from the Schema.org ontology. These
concepts are: (i) TextDigitalDocument and PresentationDigitalDocument (being subclasses of the
DigitalDocument class); (ii) AudioObject, ImageObject, and VideoObject (being subclasses of the
MediaObject class); (iii) SoftwareApplication; and (iv) Dataset. DigitalDocument, MediaObject,
SoftwareApplication and Dataset are all subclasses of the CreativeWork class, which has, in
turn, been modelled as a subclass of the frapo:Output class.</p>
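        <p>The class hierarchy described above can be written down as a small lookup table to make the subclass chain explicit. The following is a sketch only: it records the subclass links named in the text as prefixed names and assumes a single direct superclass per class, which holds for the classes listed here but is not a general property of OWL.</p>
        <preformat preformat-type="code">
```python
# Direct subclass links as described in the text (child -> parent).
# Assumption: one direct superclass per class, which suffices for this excerpt.
SUBCLASS_OF = {
    "schema:TextDigitalDocument": "schema:DigitalDocument",
    "schema:PresentationDigitalDocument": "schema:DigitalDocument",
    "schema:AudioObject": "schema:MediaObject",
    "schema:ImageObject": "schema:MediaObject",
    "schema:VideoObject": "schema:MediaObject",
    "schema:DigitalDocument": "schema:CreativeWork",
    "schema:MediaObject": "schema:CreativeWork",
    "schema:SoftwareApplication": "schema:CreativeWork",
    "schema:Dataset": "schema:CreativeWork",
    "schema:CreativeWork": "frapo:Output",
}


def ancestors(cls):
    """Return the superclasses of cls, from the most specific to the most general."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if cls is target or a (transitive) subclass of target."""
    return cls == target or target in ancestors(cls)


print(ancestors("schema:VideoObject"))
print(is_a("schema:Dataset", "frapo:Output"))
```
        </preformat>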
        <p>This is a decision based on the fact that there are project outputs not considered as “creative
work” (e.g., the network of partners established in the project). This hierarchy is illustrated in
Figure 1 by viewing the diagram from right to left (i.e., moving from a lower to a higher level of
abstraction in regard to the definition of our model’s classes). The import of specific OWL classes
and properties (namely, owl:Restriction, owl:Class, owl:onProperty, and owl:minCardinality)
has allowed us to infuse “heavier” semantics into our model and explicitly define cardinality
restrictions over specific relations (e.g., the dct:creator relation as shown in the code snippet
below).</p>
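        <p>A minimum-cardinality restriction over dct:creator can be illustrated with a minimal sketch. The cardinality value of 1 and the sample individuals are assumptions, since the text does not state them. Note also that the sketch performs a closed-world check (it reports individuals with too few recorded creators), whereas an OWL reasoner working under the open-world assumption would not treat missing values as violations.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical restriction mirroring the OWL terms named in the text:
# an owl:Restriction with owl:onProperty dct:creator and owl:minCardinality 1.
# The value 1 is an assumption made for illustration.
RESTRICTION = {"on_property": "dct:creator", "min_cardinality": 1}

# Hypothetical sample data; every "ex:" identifier is made up.
TRIPLES = [
    ("ex:video1", "dct:creator", "ex:alice"),
    ("ex:video1", "dct:creator", "ex:bob"),
    ("ex:report1", "dct:title", "Pig farming in oak forests"),
]


def violations(triples, individuals, restriction):
    """List individuals with fewer distinct values than the restriction requires."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == restriction["on_property"]:
            values[subject].add(obj)
    return [
        individual
        for individual in individuals
        if restriction["min_cardinality"] > len(values[individual])
    ]


print(violations(TRIPLES, ["ex:video1", "ex:report1"], RESTRICTION))
```
        </preformat>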
        <p>The FarmBook’s semantic data model captures a number of digital object properties, modelled
with the schema: {description, keywords, inLanguage, potentialAction, url, fileSize, dateCreated,
license} and the dct: {title, format, type} properties, as well as properties of the “source” project,
modelled with the schema: {name, alternateName, url} properties. Together, these yield a set of
metadata considered both rich and flexible enough (including administrative, structural, and provenance
information) to effectively cover the needs of the EUREKA FarmBook’s target end-users (namely,
farmers, foresters, and advisors).</p>
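        <p>The property sets listed above can be used to check that a metadata record only uses properties covered by the model. The sketch below is illustrative: the record values, the example.org URLs, and the helper name unexpected_properties are hypothetical, and schema:duration is included deliberately as a property outside the listed set.</p>
        <preformat preformat-type="code">
```python
# Property sets as listed in the text, written as prefixed names.
DIGITAL_OBJECT_PROPERTIES = frozenset(
    [f"schema:{name}" for name in (
        "description", "keywords", "inLanguage", "potentialAction",
        "url", "fileSize", "dateCreated", "license",
    )]
    + [f"dct:{name}" for name in ("title", "format", "type")]
)
PROJECT_PROPERTIES = frozenset(
    f"schema:{name}" for name in ("name", "alternateName", "url")
)


def unexpected_properties(record, allowed):
    """Return the record's property names that are not in the allowed set."""
    return sorted(set(record) - allowed)


# Hypothetical records; all values are made up for illustration.
video_record = {
    "dct:title": "Pig farming in oak forests",
    "dct:format": "video/mp4",
    "schema:inLanguage": "es",
    "schema:url": "https://example.org/videos/1",
    "schema:duration": "PT5M",  # not among the properties listed in the text
}
project_record = {
    "schema:name": "Example Project",
    "schema:url": "https://example.org",
}

print(unexpected_properties(video_record, DIGITAL_OBJECT_PROPERTIES))
print(unexpected_properties(project_record, PROJECT_PROPERTIES))
```
        </preformat>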
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Future Perspectives</title>
      <p>Despite the considerable investments of the European Commission in agriculture-related
research and development projects, the concerned stakeholders cannot easily find and access
relevant information and knowledge. In addition, there is a need for good data sharing practices.
The broad spectrum of formats in which data and information are conveyed makes a shift of focus
from “FAIR datasets” to “FAIR digital objects” and “FAIR repositories” imperative. The
EUREKA FarmBook’s data model, designed upon well-acknowledged semantic web standards, is
an initiative that paves the way towards the development of FAIR repositories of heterogeneous,
agricultural digital objects. The semantic data model proposed will be used to define a set
of metadata for the annotation of the digital objects, currently available and collected from
the various multi-actor projects, which will be housed in the FarmBook. Moreover, it is also
intended to serve as a reference model, adopted by future projects, for the description of the
variety of the data and information that will be generated in their context and become available
through their own, custom repositories. This will facilitate a uniform, standards-based approach
in the characterisation of agricultural digital objects and enable their broader uptake and re-use.</p>
      <p>The next steps in EUREKA involve the development of a graph-based structure for modelling
the subjects addressed in the FarmBook’s digital objects, based on the re-use of components
of agricultural ontologies (e.g., FoodOn, AgrO) and controlled vocabularies (e.g., AGROVOC).
Furthermore, the procedures for ingesting digital objects in the FarmBook will be established.
This will allow us to test in practice the utility of our model and the metadata set derived from it.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work presented in this document has taken place in the Horizon 2020 EUREKA project,
which has received funding from the EU under Grant Agreement No 862790.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <article-title>Linked data - the story so far</article-title>
          , in: A. P. Sheth (Ed.),
          <source>Semantic Services, Interoperability and Web Applications: Emerging Concepts</source>
          ,
          <publisher-name>IGI Global</publisher-name>, <publisher-loc>Hershey</publisher-loc>
          ,
          <year>2011</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>227</lpage>
          . doi:10.4018/978-1-60960-593-3.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          , M. Dumontier, ...,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mons</surname>
          </string-name>
          ,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . doi:10.1038/sdata.2016.18.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dunning</surname>
          </string-name>
          , M. de Smaele, J. Böhmer,
          <article-title>Are the FAIR data principles fair?</article-title>
          ,
          <source>International Journal of Digital Curation</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>177</fpage>
          -
          <lpage>195</lpage>
          . doi:10.2218/ijdc.v12i2.567.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>European Commission</collab>
          ,
          <article-title>Turning FAIR into reality: Final report and action plan from the European Commission Expert Group on FAIR Data</article-title>
          ,
          <year>2018</year>
          . URL: https://ec.europa.eu/info/sites/default/files/turning_fair_into_reality_1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Ontology development 101: A guide to creating your first ontology</article-title>
          ,
          <year>2001</year>
          . URL: https://protege.stanford.edu/publications/ontology_development/ontology101.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>