<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Reconciliation Framework for the Integration of Stocks and Fisheries Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yannis Marketakis</string-name>
          <email>marketak@ics.forth.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Tzitzikas</string-name>
          <email>tzitzik@ics.forth.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aureliano Gentile</string-name>
          <email>aureliano.gentile@fao.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bracken van Niekerk</string-name>
          <email>bracken.vannieker@fao.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Crete</institution>
          ,
          <addr-line>Heraklion</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science</institution>
          ,
          <addr-line>FORTH-ICS, Heraklion</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Food and Agriculture Organization of the United Nations</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>292</fpage>
      <lpage>297</lpage>
      <abstract>
        <p>Fisheries management relies on analyzing data with complex models and software. It includes the usually manual process of identifying and combining different pieces of information about stocks and fisheries, which is time-consuming and error-prone: firstly because there is no single source of information but many, and secondly because there are alternative ways of modeling and referring to the same piece of information. Approaches like the Global Record of Stocks and Fisheries (GRSF), which results from the semantic integration of the corresponding information from different data sources, aim to overcome such problems by providing a unified and homogeneous view of stocks and fisheries information. In this paper, we propose a reconciliation framework ensuring that similar pieces of information from heterogeneous sources are properly connected during the construction of the GRSF semantic warehouse.</p>
      </abstract>
      <kwd-group>
        <kwd>Reconciliation</kwd>
        <kwd>Entity Matching</kwd>
        <kwd>Stock</kwd>
        <kwd>Fishery</kwd>
        <kwd>Species</kwd>
        <kwd>Water Area</kwd>
        <kwd>Fishing Gear</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The main goal of fisheries management is to monitor, specify and propose regulations and rules for
protecting fishery resources, so that their sustainable exploitation is possible. Although there is no
clear and generally accepted definition of fisheries management, according to [<xref ref-type="bibr" rid="ref1">1</xref>] the main task of
fisheries management is the integrated process of information gathering, analysis, planning,
consultation, decision-making, allocation of resources and formulation and implementation, with
enforcement as necessary, of regulations or rules which govern fisheries activities in order to ensure the
continued productivity of the resources and the accomplishment of other fisheries objectives. The key
indicators for efficient fisheries management are stocks and fisheries. Stocks refer to groups of
individuals of a species occupying a well-defined spatial range (e.g. swordfish in the Mediterranean
Sea), while fisheries describe the activities leading to the harvesting of fish within a particular area,
using a particular method or equipment and for a particular purpose (e.g. the Atlantic cod fishery in the area
of East and South Greenland).</p>
      <p>Nowadays several stakeholders report information about stocks and fisheries at
regional, national and local levels. These stakeholders maintain their knowledge and publish their
contents independently of each other. Cross-references between the knowledge bases of
different stakeholders are not common. In addition, the use of common vocabularies or standards
for describing particular aspects has not been globally agreed. This leads to the creation of several data
silos, each one with its own rules and terminology.</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>The Global Record of Stocks and Fisheries (GRSF) [<xref ref-type="bibr" rid="ref5">5</xref>] aims to overcome such problems by
introducing a workflow that collects and semantically integrates stocks and fisheries information from
different databases and presents it in a unified manner. More specifically, it is the result of the
integration of (a) FIRMS [<xref ref-type="bibr" rid="ref2">2</xref>], (b) RAM [<xref ref-type="bibr" rid="ref3">3</xref>] and (c) FishSource [<xref ref-type="bibr" rid="ref4">4</xref>]. During the construction and
refreshment [<xref ref-type="bibr" rid="ref6">6</xref>] of GRSF, a reconciliation workflow is applied, ensuring that similar
resources expressed in different ways are properly linked. The remainder of the paper
describes such cases. More specifically, Section 2 discusses the problem further, Section 3
elaborates on the reconciliation framework and its applicability, and Section 4 concludes our work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Motivation</title>
      <p>Since the sources contributing stocks and fisheries information to GRSF contain complementary
information, GRSF provides a merged view of those records. Merging is applied by comparing some
key elements of the records. For stock records, the compared elements are species
and assessment area, while for fishery records they are species, fishing area, management
authority, fishing gear and flag state. The accuracy of these
elements therefore determines whether merging succeeds.</p>
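      <p>As a rough illustration of this comparison (a sketch only, not the GRSF implementation), the Python fragment below groups records that agree on all their key elements. The record layout, the field names and the sample values, including the placeholder area "area-A", are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Key elements compared per record type, as listed in the text above.
# The field names themselves are hypothetical.
KEY_ELEMENTS = {
    "stock": ("species", "assessment_area"),
    "fishery": ("species", "fishing_area", "management_authority",
                "fishing_gear", "flag_state"),
}


def merge_key(record):
    """Return the tuple of key-element values that decides merging."""
    fields = KEY_ELEMENTS[record["type"]]
    return (record["type"],) + tuple(record[f] for f in fields)


def group_for_merging(records):
    """Group the sources of records that agree on every key element."""
    groups = defaultdict(list)
    for record in records:
        groups[merge_key(record)].append(record["source"])
    return dict(groups)


# Illustrative records; the values are invented for the example.
records = [
    {"type": "stock", "source": "FIRMS",
     "species": "YFT", "assessment_area": "area-A"},
    {"type": "stock", "source": "RAM",
     "species": "YFT", "assessment_area": "area-A"},
    {"type": "stock", "source": "FishSource",
     "species": "Thunnus albacares", "assessment_area": "area-A"},
]
merged = group_for_merging(records)
```
]]></preformat>
      <p>In this toy run the first two records fall into the same group, while the third stays apart because its species is written as a scientific name rather than a code, even though it denotes the same species.</p>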
      <p>Of course, there is no single way of describing those elements. For example, marine species can
be referred to using their scientific name (e.g. Thunnus albacares), their common name in any language
(e.g. Yellowfin tuna in English), their 3-Alpha code (e.g. YFT), their APHIA ID (e.g. 127027), etc. In a
similar manner, an area can be referred to using a common name, a FAO major area code (e.g. 37.3.1),
a GFCM code, an LME code, an ISO3 code of the exclusive economic zone, etc. The same applies to
all the aforementioned elements.</p>
      <p>Moreover, different sources commonly use different terminologies for describing their
contents. For example, FIRMS usually refers to species using their 3-Alpha codes, RAM reports
their scientific name and taxonomy, and FishSource uses various combinations. Therefore,
before applying any merging activity in GRSF, a knowledge base must be built that holds the
different identifiers of each resource, so that they can be efficiently compared.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Reconciliation Framework</title>
      <p>The reconciliation framework is responsible for constructing a knowledge base consisting of the
different identifiers of the entities. The knowledge base can afterwards be exploited during the
construction of the GRSF, so that (a) the merging of records from different sources is executed
properly and (b) all the records in GRSF use the preferred terminologies with respect to the GRSF
guidelines. Below we introduce the conceptual backbone of the constructed knowledge base,
describe the reconciliation framework workflow and provide more information about the
applicability of the framework for the purposes of GRSF.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Model</title>
      <p>The conceptual model of the reconciliation framework of GRSF is shown in
the upper part of Figure 1. The core element is Term, which is associated with the elements Identifier and
Information Object. More specifically, a Term can have a preferred identifier, one or more alternative
identifiers, and many additional information objects. Two terms are compared through their
identifier elements: if they share at least one identifier,
they are merged into one. To this end, all the available identifiers are compared, and a
match is confirmed only if all the values of an identifier match (i.e. identifier, type, index). For terms without
any identifier, the additional information objects are compared instead. The main difference is
that when two terms have the same additional information object (same value and type), and at least
one of them does not have any identifier, a merge is only suggested and it is up to the
user to approve it. The lower part of Figure 1 shows an indicative example for the term about the species
with scientific name “Thunnus albacares”.</p>
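      <p>The matching rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the framework's actual code: the class and function names are hypothetical, and treating the common name as an information object is an assumption made only for the example.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identifier:
    value: str
    type: str
    index: int  # order of preference; 1 marks the preferred identifier


@dataclass(frozen=True)
class InformationObject:
    value: str
    type: str


@dataclass
class Term:
    identifiers: set = field(default_factory=set)
    info_objects: set = field(default_factory=set)


def compare(a, b):
    """Return 'merge', 'suggest' or 'distinct' for two terms."""
    # Frozen dataclasses compare field by field, so a shared element means
    # that value, type and index all match.
    if a.identifiers & b.identifiers:
        return "merge"
    # A shared information object only yields a suggestion, and only when
    # at least one of the two terms has no identifier at all.
    lacks_identifier = not a.identifiers or not b.identifiers
    if lacks_identifier and (a.info_objects & b.info_objects):
        return "suggest"
    return "distinct"


def merge(a, b):
    """Union the identifiers and information objects of two matched terms."""
    return Term(a.identifiers | b.identifiers, a.info_objects | b.info_objects)


yft = Identifier("YFT", "3-Alpha code", 1)
aphia = Identifier("127027", "APHIA ID", 2)
name = InformationObject("Yellowfin tuna", "common name")  # assumed type

t1 = Term({yft}, {name})
t2 = Term({yft, aphia})
t3 = Term(set(), {name})

r12 = compare(t1, t2)
r13 = compare(t1, t3)
r23 = compare(t2, t3)
```
]]></preformat>
      <p>Here the first two terms share the identifier YFT and are merged, the third term has no identifier but shares an information object with the first one, so a merge is only suggested, and the second and third terms have nothing in common.</p>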
      <p>Identifier elements are sufficient for comparing and merging two terms. In practice, this means that as
soon as two terms share at least one identifier they are matched, no matter whether that
identifier is the preferred or an alternative one. Furthermore, each identifier is associated with an index
representing the order of preference. Once all the terms have been constructed, the indexes are used for
selecting the most preferred identifier of a term: the preferred identifier (which always has
index = 1) is chosen and, if it does not exist, the alternative identifier with the lowest index is used. In the
example of Figure 1 the preferred identifier is YFT (3-Alpha code), the second most preferred is 127027
(APHIA ID), and so on. The types and indexes of the identifiers are specified during the configuration
of the GRSF construction/refresh workflow. An indicative configuration of the types and indexes of the
identifiers for GRSF is given in Section 3.3.</p>
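      <p>The selection of the most preferred identifier can be sketched as below. The function name and the data layout are assumptions made for illustration; the indexes follow the species example discussed above (3-Alpha code first, APHIA ID second, scientific name third).</p>
      <preformat><![CDATA[
```python
# Identifier types and their order of preference for species.
SPECIES_INDEXES = {"3-Alpha code": 1, "APHIA ID": 2, "Scientific name": 3}


def most_preferred(identifiers):
    """Pick the value with the lowest index from (value, type) pairs."""
    ranked = [(SPECIES_INDEXES[id_type], value) for value, id_type in identifiers]
    if not ranked:
        return None
    # Tuples sort by index first, so min() returns the most preferred entry.
    return min(ranked)[1]


# A term that carries all three identifiers.
full = most_preferred([
    ("127027", "APHIA ID"),
    ("YFT", "3-Alpha code"),
    ("Thunnus albacares", "Scientific name"),
])

# A term without a preferred identifier falls back to the lowest index left.
partial = most_preferred([
    ("Thunnus albacares", "Scientific name"),
    ("127027", "APHIA ID"),
])
```
]]></preformat>
      <p>With these inputs the first call selects YFT, and the second call, where no 3-Alpha code is available, falls back to the APHIA ID 127027.</p>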
    </sec>
    <sec id="sec-5">
      <title>3.2. Workflow</title>
      <p>After its construction, the reconciliation KB is exploited during the construction of the GRSF
records. This is mandatory, so that records are fully compliant with the GRSF guidelines, which specify
how the different information of GRSF records should be presented. This is achieved through lookup
methods that locate terms in the KB. As soon as a term is located, its preferred identifier is
used for the construction of the corresponding GRSF record. If it does not exist, the alternative identifier
with the lowest index is used.</p>
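      <p>A lookup of this kind might look like the following sketch. The class ReconciliationKB and its methods are hypothetical names introduced for the example and do not describe the actual GRSF code; the KB is reduced to an in-memory index.</p>
      <preformat><![CDATA[
```python
class ReconciliationKB:
    """Toy in-memory knowledge base (hypothetical, for illustration only)."""

    def __init__(self):
        self._terms = []      # each term: sorted list of (index, type, value)
        self._by_value = {}   # (type, value) -> position in self._terms

    def add_term(self, identifiers):
        """Register a term given (value, type, index) triples."""
        position = len(self._terms)
        self._terms.append(sorted((index, id_type, value)
                                  for value, id_type, index in identifiers))
        for value, id_type, _ in identifiers:
            self._by_value[(id_type, value)] = position

    def lookup(self, value, id_type):
        """Return the lowest-index identifier value of the term, or None."""
        position = self._by_value.get((id_type, value))
        if position is None:
            return None
        return self._terms[position][0][2]


kb = ReconciliationKB()
kb.add_term([
    ("YFT", "3-Alpha code", 1),
    ("127027", "APHIA ID", 2),
    ("Thunnus albacares", "Scientific name", 3),
])

# A source record that names the species by its scientific name is
# rewritten to use the preferred identifier of the located term.
record = {"species": kb.lookup("Thunnus albacares", "Scientific name")}
unknown = kb.lookup("XXX", "3-Alpha code")
```
]]></preformat>
      <p>The record built here ends up with the species value YFT, whichever identifier the source used, while a value that is not in the KB yields no term.</p>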
    </sec>
    <sec id="sec-6">
      <title>3.3. Application</title>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Preferred ID / Index</th>
              <th>Alternative IDs / Index</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>3-Alpha[7] code / 1</td>
              <td>APHIA[8] ID / 2; Scientific Name / 3</td>
            </tr>
            <tr>
              <td>FAO[9] code / 1</td>
              <td>GFCM[12] / 2; LME[10] / 3; MRGID[11] / 4; ISO-3[13] (EEZ) / 5</td>
            </tr>
            <tr>
              <td>ISSCFG[14] code / 1</td>
              <td>ISSCFG abbrev. / 2; ISSCFG category / 3</td>
            </tr>
            <tr>
              <td>ISO-3 / 1</td>
              <td>Legal name / 2</td>
            </tr>
            <tr>
              <td>Acronym / 1</td>
              <td>Full name / 2</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-7">
      <title>4. Conclusion</title>
      <p>This paper presented a reconciliation framework that facilitates the semantic integration
of data sources about stocks and fisheries and is compliant with the GRSF guidelines and best practices.
Although we focus on this particular use case, the framework is generic and can be used in
other domains as well. In this paper we have described its configuration on top of the GRSF
construction workflow, with the objective of efficiently comparing and locating similar entities that
have been described using different identifiers.</p>
    </sec>
    <sec id="sec-8">
      <title>5. Acknowledgements</title>
      <p>This work has received funding from the European Union’s Horizon 2020 innovation action
BlueCloud (Grant agreement No 862409).</p>
    </sec>
    <sec id="sec-9">
      <title>6. References</title>
      <p>[8] W. Appeltans, M.J. Costello, B. Vanhoorne, W. Decock, L. Vandepitte, F. Hernández, J. Mees, E.
Vanden Berghe, 2008. Aphia for a World Register of Marine Species (WoRMS). VLIZ Special
Publication.</p>
      <p>[9] FAO Major Fishing Areas. URL: https://www.fao.org/fishery/en/area/search</p>
      <p>[10] Large Marine Ecosystems (LME). URL: https://lmehub.net/</p>
      <p>[11] Marine Regions. URL: https://www.marineregions.org/</p>
      <p>[12] General Fisheries Commission for the Mediterranean (GFCM). URL:
https://www.fao.org/gfcm/about/en/</p>
      <p>[13] ISO 3166. URL: https://www.iso.org/iso-3166-country-codes.html</p>
      <p>[14] The International Standard Statistical Classification of Fishing Gear (ISSCFG). URL:
https://data.apps.fao.org/catalog/dataset/the-international-standard-statistical-classification-offishing-gear-isscfg</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Cochrane</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <source>A Fishery Manager's Guidebook</source>
          , John Wiley &amp; Sons,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <source>Fisheries and Resources Monitoring System</source>
          ,
          <year>2022</year>
          . URL: http://firms.fao.org/firms/en
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <source>The RAM Legacy Stock Assessment Database</source>
          ,
          <year>2022</year>
          . URL: https://www.ramlegacy.org/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>Sustainable Fisheries Partnership</collab>
          ,
          <source>FishSource</source>
          ,
          <year>2022</year>
          . URL: https://www.fishsource.org/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Marketakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Minadakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mountantonakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Candela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mangiacrappa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Perciante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Castelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taconet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gentile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gorelli</surname>
          </string-name>
          ,
          <article-title>Towards a Global Record of Stocks and Fisheries</article-title>
          ,
          <source>8th International Conference on Information and Communication Technologies in Agriculture, Food &amp; Environment</source>
          , September 21-24,
          <year>2017</year>
          , Chania, Crete, Greece.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Marketakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gentile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Niekerk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Taconet</surname>
          </string-name>
          ,
          <article-title>On the Evolution of Semantic Warehouses: The Case of Global Record of Stocks and Fisheries</article-title>
          ,
          <source>14th International Conference on Metadata and Semantics Research, Special Track on Metadata &amp; Semantics for Agriculture, Food &amp; Environment (MTSR'20)</source>
          , Madrid,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <source>ASFIS List of Species for Fishery Statistics Purposes</source>
          . URL: https://www.fao.org/fishery/en/collection/asfis/en
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>