=Paper=
{{Paper
|id=Vol-3293/paper53
|storemode=property
|title=A Reconciliation Framework for the Integration of Stocks and Fisheries Information
|pdfUrl=https://ceur-ws.org/Vol-3293/paper53.pdf
|volume=Vol-3293
|authors=Yannis Marketakis,Yannis Tzitzikas,Aureliano Gentile,Bracken van Niekerk,Marc Taconet
|dblpUrl=https://dblp.org/rec/conf/haicta/MarketakisTGNT22
}}
==A Reconciliation Framework for the Integration of Stocks and Fisheries Information==
Yannis Marketakis (1), Yannis Tzitzikas (1,2), Aureliano Gentile (3), Bracken van Niekerk (3) and Marc Taconet (3)

(1) Institute of Computer Science, FORTH-ICS, Heraklion, Greece

(2) Computer Science Department, University of Crete, Heraklion, Greece

(3) Food and Agriculture Organization of the United Nations, Rome, Italy

Proceedings of HAICTA 2022, September 22–25, 2022, Athens, Greece

EMAIL: marketak@ics.forth.gr (A. 1); tzitzik@ics.forth.gr (A. 2); aureliano.gentile@fao.org (A. 3); bracken.vannieker@fao.org (A. 4); marc.tacone@fao.org (A. 5)

ORCID: 0000-0002-0417-2526 (A. 1); 0000-0001-8847-2130 (A. 2); 0000-0002-6542-132x (A. 3); 0000-0001-8537-3305 (A. 4); 0000-0002-3103-6204 (A. 5)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

===Abstract===
Fisheries management relies on analyzing data using complex models and software. It includes the usually manual process of identifying and combining different pieces of information about stocks and fisheries. This process is time-consuming and error-prone for two reasons: there is no single source of information but many, and there are alternative ways of modeling and referring to the same piece of information. Approaches like the Global Record of Stocks and Fisheries (GRSF), which result from the semantic integration of the corresponding information from different data sources, aim to overcome such problems by providing a unified and homogeneous view of stocks and fisheries information. In this paper, we propose a reconciliation framework ensuring that similar pieces of information from heterogeneous sources are properly connected during the construction of the semantic warehouse of GRSF.

Keywords: Reconciliation, Entity Matching, Stock, Fishery, Species, Water Area, Fishing Gear

===1. Introduction===
The main goal of fisheries management is to monitor, specify and propose regulations and rules for protecting fishery resources, so that their sustainable exploitation is possible. Although there is no clear and generally accepted definition of fisheries management, according to [1] the main task of fisheries management is the integrated process of information gathering, analysis, planning, consultation, decision-making, allocation of resources and formulation and implementation, with enforcement as necessary, of regulations or rules which govern fisheries activities in order to ensure the continued productivity of the resources and the accomplishment of other fisheries objectives.

The key indicators for efficient fisheries management are stocks and fisheries. Stocks refer to groups of individuals of a species occupying a well-defined spatial range (e.g. swordfish in the Mediterranean Sea), while fisheries describe the activities leading to the harvesting of fish within a particular area, using a particular method or equipment and with a particular purpose (e.g. the Atlantic cod fishery in the area of East and South Greenland).

Nowadays several stakeholders report information about stocks and fisheries at regional, national and local levels. These stakeholders maintain their knowledge and publish their contents independently of each other. However, cross-references between the knowledge bases of different stakeholders are uncommon. In addition, the use of common vocabularies or standards for describing particular aspects has not been globally agreed.
This leads to the creation of several data silos, each one with its own rules and terminology. The Global Record of Stocks and Fisheries (GRSF) [5] aims to overcome such problems by introducing a workflow that collects and semantically integrates stocks and fisheries information from different databases and presents it in a unified manner. More specifically, it is the result of the integration of (a) FIRMS [2], (b) RAM [3] and (c) FishSource [4]. During the construction and refreshment [6] of GRSF, a reconciliation workflow is applied, ensuring that similar resources expressed in different ways are properly linked.

The remainder of the paper describes such cases. In Section 2 we discuss the problem further, in Section 3 we elaborate on the reconciliation framework and its applicability, and Section 4 concludes our work.

===2. Motivation===
Since the sources contributing stocks and fisheries information to GRSF contain complementary information, GRSF provides a merged view of those records. Merging is applied by comparing some key elements of the records. For stock records, the elements compared are species and assessment area, while for fisheries they are species, fishing area, management authority, fishing gear and flag state. The accuracy of these elements therefore determines whether merging succeeds.

Of course, there is no single way of describing those elements. For example, marine species can be referred to using their scientific name (e.g. Thunnus albacares), their common name in any language (e.g. Yellowfin tuna in English), their 3-Alpha code (e.g. YFT), their APHIA ID (e.g. 127027), etc. In a similar manner, an area can be referred to using a common name, a FAO area code (e.g. 37.3.1), a GFCM code, an LME code, an ISO3 code of the exclusive economic zone, etc. The same applies to all the aforementioned elements.
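To make the problem concrete, the following minimal sketch compares two invented stock records on the key elements named above (species and assessment area). The alias table, the record contents and the function names are assumptions made for illustration only; they are not part of GRSF. Only the four species identifiers are taken from the text.

```python
from typing import NamedTuple

# Hypothetical alias table: every known way of writing a species maps to one
# canonical key. The four entries are the examples given in the text.
SPECIES_ALIASES = {
    "thunnus albacares": "YFT",  # scientific name
    "yellowfin tuna": "YFT",     # common name (English)
    "yft": "YFT",                # 3-Alpha code
    "127027": "YFT",             # APHIA ID
}


class StockRecord(NamedTuple):
    source: str
    species: str          # as written by the source
    assessment_area: str  # as written by the source


def canonical_species(raw: str) -> str:
    """Return the canonical key of a species label, or the cleaned label if unknown."""
    cleaned = raw.strip().lower()
    return SPECIES_ALIASES.get(cleaned, cleaned)


def stock_merge_key(record: StockRecord) -> tuple[str, str]:
    """Stock records are compared on species and assessment area.

    For brevity the areas are assumed to use one coding scheme already.
    """
    return canonical_species(record.species), record.assessment_area.strip()


# Two invented records that describe the same stock with different species labels.
a = StockRecord("source A", "YFT", "37.3.1")
b = StockRecord("source B", "Thunnus albacares", "37.3.1")

print((a.species, a.assessment_area) == (b.species, b.assessment_area))  # False
print(stock_merge_key(a) == stock_merge_key(b))                          # True
```

Comparing the raw fields misses the match, while comparing after the identifiers have been reconciled finds it.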
Moreover, it is quite common for different sources to use different terminologies for describing their contents. For example, FIRMS usually refers to species using their 3-Alpha codes, RAM reports their scientific name and taxonomy, and FishSource uses various combinations. Therefore, before applying any merging activity in GRSF, it is necessary to build a knowledge base with the different identifiers of each resource, so that they can be efficiently compared.

===3. Reconciliation Framework===
The reconciliation framework is responsible for constructing a knowledge base consisting of the different identifiers of the entities. The knowledge base can afterwards be exploited during the construction of GRSF, so that (a) the merging of records from different sources can be executed properly and (b) all the records in GRSF use the preferred terminologies with respect to the GRSF guidelines. Below we introduce the conceptual backbone of the constructed knowledge base, describe the workflow of the reconciliation framework, and provide more information about the applicability of the framework for the purposes of GRSF.

====3.1. Model====
The conceptual model for the reconciliation framework of GRSF is shown in the upper part of Figure 1. The core element is Term, which is associated with the elements Identifier and Information Object. More specifically, a Term can have a preferred identifier, one or more alternative identifiers, and many additional information objects. Two terms are compared using their identifier elements. In particular, if two terms share at least one identifier, they are merged into one. To detect this, all the available identifiers are compared, and two identifiers match only if all their values are equal (i.e. identifier value, type and index). For terms that have no identifier, their additional information objects are compared instead.
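The comparison rules of the model can be sketched as follows. This is an illustrative reading of the description above, not the GRSF implementation: the class layout, the type labels (such as "3-alpha") and the function name compare are assumptions, and the index values follow the species ordering used in the paper's example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identifier:
    value: str
    type: str   # e.g. "3-alpha", "aphia-id", "scientific-name" (labels assumed)
    index: int  # order of preference; 1 marks the preferred identifier


@dataclass(frozen=True)
class InformationObject:
    value: str
    type: str   # e.g. "common-name-en" (label assumed)


@dataclass
class Term:
    entity_type: str
    identifiers: set[Identifier] = field(default_factory=set)
    info_objects: set[InformationObject] = field(default_factory=set)


def compare(a: Term, b: Term) -> str:
    """Return "merge", "suggest" or "distinct" for two terms."""
    if a.entity_type != b.entity_type:
        return "distinct"
    # Frozen dataclasses compare field by field, so a shared element means
    # that value, type and index all agree.
    if a.identifiers & b.identifiers:
        return "merge"
    # Information objects only count when a term lacks identifiers, and they
    # only produce a suggestion that a user has to approve.
    if (not a.identifiers or not b.identifiers) and a.info_objects & b.info_objects:
        return "suggest"
    return "distinct"


common_name = InformationObject("Yellowfin tuna", "common-name-en")
scientific = Identifier("Thunnus albacares", "scientific-name", 3)

full = Term("Species", {Identifier("YFT", "3-alpha", 1), scientific}, {common_name})
by_name = Term("Species", {scientific})
no_ids = Term("Species", info_objects={common_name})

print(compare(full, by_name))    # merge
print(compare(full, no_ids))     # suggest
print(compare(by_name, no_ids))  # distinct
```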
Unlike an identifier match, a match on an additional information object (same value and type) where at least one of the terms has no identifier does not merge the terms automatically. Instead, a merge is suggested and it is up to the user to approve it. The lower part of Figure 1 shows an indicative example for the species term with scientific name “Thunnus albacares”.

Figure 1: The Reconciliation framework data model

Identifier elements are enough for comparing and merging two terms. In practice, this means that as soon as two terms share at least one identifier they are matched, regardless of whether that identifier is the preferred one or an alternative one. Furthermore, each identifier is associated with an index representing its order of preference. Once all terms have been constructed, the indexes are used to select the most preferred identifier of a term: the preferred identifier (which always has index 1) is chosen, and if it does not exist, the alternative identifier with the lowest index is used. In the example of Figure 1 the preferred identifier is YFT (3-Alpha code), the second most preferred is 127027 (APHIA ID), and so on. The types and indexes of the identifiers are specified during the configuration of the GRSF construction/refresh workflow. An indicative configuration of the types and indexes of the identifiers for GRSF is given in Section 3.3.

====3.2. Workflow====
Figure 2 shows the overall workflow used for constructing GRSF and the place of the reconciliation workflow within it. A detailed description of the construction workflow and the refreshment of GRSF can be found in [5] and [6]. The reconciliation KB is constructed using as input the transformed data as well as data from external sources. For each concrete resource, a new entry is added to the knowledge base with the available information.
Of course, some information may be missing when a new term is created. The reconciliation framework complements the missing information as soon as it becomes available elsewhere (e.g. in another record, a transformed source, or an external source). In particular, if a term with a particular identifier already exists, its information is complemented with the new information; otherwise a new term is created.

Figure 2: The GRSF construction workflow

Figure 3 illustrates an example with three stock records that refer to the same species, where each source documents that species differently. S1 creates a new term of entity type Species, with preferred identifier “YFT” and an additional identifier “Thunnus albacares” of type scientific name. S2 does not create a new term; instead, it enhances the term created by S1, because both have the same additional identifier of type scientific name. Finally, S3 further enhances the term with an additional identifier.

Figure 3: Creating and enhancing a term

After its construction, the reconciliation KB is exploited during the construction of the GRSF records. This is mandatory so that records are fully compliant with the GRSF guidelines, which specify how the information in GRSF records should be presented. It is achieved through lookup methods that locate terms in the KB. As soon as a term is located, its preferred identifier is used for the construction of the corresponding GRSF record. If the preferred identifier does not exist, the alternative identifier with the lowest index is used.

====3.3. Application====
Table 1 shows the list of term types that are reconciled using the reconciliation framework for the purposes of GRSF. For each term type, we provide the preferred identifier and the alternative ones, as agreed in the GRSF guidelines.
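For the Species term type, whose agreed order is 3-Alpha code, then APHIA ID, then scientific name, the create-or-enhance step and the lookup of Section 3.2 can be sketched as below. This is a simplified illustration under stated assumptions, not the GRSF implementation: the class and method names are hypothetical, each term stores a single value per identifier type, and the exact contents of records S2 and S3 are assumed because the text does not list them.

```python
from dataclasses import dataclass, field
from typing import Optional

# Preference order for species identifiers; the type labels are assumed names.
SPECIES_INDEX = {"3-alpha": 1, "aphia-id": 2, "scientific-name": 3}


@dataclass
class Term:
    entity_type: str
    identifiers: dict[str, str] = field(default_factory=dict)  # type -> value


class ReconciliationKB:
    """Hypothetical in-memory knowledge base of reconciled terms."""

    def __init__(self, index_by_type: dict[str, int]):
        self.index_by_type = index_by_type
        self.terms: list[Term] = []

    def lookup(self, entity_type: str, id_type: str, value: str) -> Optional[Term]:
        """Find the term of the given entity type carrying this identifier."""
        for term in self.terms:
            if term.entity_type == entity_type and term.identifiers.get(id_type) == value:
                return term
        return None

    def add(self, entity_type: str, identifiers: dict[str, str]) -> Term:
        """Enhance the term that shares an identifier, otherwise create a new one."""
        for id_type, value in identifiers.items():
            term = self.lookup(entity_type, id_type, value)
            if term is not None:
                # setdefault keeps recorded values and only fills the gaps.
                for new_type, new_value in identifiers.items():
                    term.identifiers.setdefault(new_type, new_value)
                return term
        term = Term(entity_type, dict(identifiers))
        self.terms.append(term)
        return term

    def preferred(self, term: Term) -> str:
        """Return the identifier with the lowest index (index 1 is the preferred one)."""
        best_type = min(term.identifiers, key=self.index_by_type.__getitem__)
        return term.identifiers[best_type]


kb = ReconciliationKB(SPECIES_INDEX)
kb.add("Species", {"3-alpha": "YFT", "scientific-name": "Thunnus albacares"})      # S1
kb.add("Species", {"scientific-name": "Thunnus albacares"})                        # S2 (assumed)
kb.add("Species", {"scientific-name": "Thunnus albacares", "aphia-id": "127027"})  # S3 (assumed)

term = kb.lookup("Species", "aphia-id", "127027")
print(len(kb.terms), kb.preferred(term))  # 1 YFT
```

All three records end up in one term, and a lookup by any of its identifiers returns the 3-Alpha code as the value to use in the GRSF record.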
{| class="wikitable"
|+ Table 1: The term types and their identifiers
! Term Type !! Preferred ID / Index !! Alternative IDs / Index
|-
| Species || 3-Alpha code [7] / 1 || APHIA ID [8] / 2; Scientific Name / 3
|-
| Areas (assessment, fishing) || FAO code [9] / 1 || GFCM [12] / 2; LME [10] / 3; MRGID [11] / 4; ISO-3 [13] (EEZ) / 5
|-
| Fishing Gear || ISSCFG code [14] / 1 || ISSCFG abbrev. / 2; ISSCFG category / 3
|-
| Flag State || ISO-3 / 1 || Legal name / 2
|-
| Management Authority || Acronym / 1 || Full name / 2
|}

===4. Conclusion===
This paper presents a reconciliation framework that facilitates the semantic integration of data sources about stocks and fisheries and is compliant with the GRSF guidelines and best practices. Although we focus on this particular use case, the framework is generic and can be used in other domains as well. In this paper we have described its configuration on top of the GRSF construction workflow, with the objective of efficiently comparing and locating similar entities that have been described using different identifiers.

===5. Acknowledgements===
This work has received funding from the European Union’s Horizon 2020 innovation action BlueCloud (Grant agreement No 862409).

===6. References===
[1] K. L. Cochrane and S. M. Garcia, A Fishery Manager's Guidebook, John Wiley & Sons, 2009.

[2] Fisheries and Resources Monitoring System, 2022. URL: http://firms.fao.org/firms/en

[3] The RAM Legacy Stock Assessment Database, 2022. URL: https://www.ramlegacy.org/

[4] Sustainable Fisheries Partnership, FishSource, 2022. URL: https://www.fishsource.org/

[5] Y. Tzitzikas, Y. Marketakis, N. Minadakis, M. Mountantonakis, L. Candela, F. Mangiacrappa, P. Pagano, C. Perciante, D. Castelli, M. Taconet, A. Gentile, G. Gorelli, Towards a Global Record of Stocks and Fisheries, 8th International Conference on Information and Communication Technologies in Agriculture, Food & Environment, September 21-24, 2017, Chania, Crete, Greece.

[6] Y. Marketakis, Y. Tzitzikas, A. Gentile, B. van Niekerk, and M. Taconet, On the Evolution of Semantic Warehouses: The Case of Global Record of Stocks and Fisheries, 14th International Conference on Metadata and Semantics Research, Special Track on Metadata & Semantics for Agriculture, Food & Environment (MTSR'20), Madrid, 2020.

[7] ASFIS List of Species for Fishery Statistics Purposes. URL: https://www.fao.org/fishery/en/collection/asfis/en

[8] W. Appeltans, M. J. Costello, B. Vanhoorne, W. Decock, L. Vandepitte, F. Hernández, J. Mees, E. Vanden Berghe, Aphia for a World Register of Marine Species (WoRMS), VLIZ Special Publication, 2008.

[9] FAO Major Fishing Areas. URL: https://www.fao.org/fishery/en/area/search

[10] Large Marine Ecosystems (LME). URL: https://lmehub.net/

[11] Marine Regions. URL: https://www.marineregions.org/

[12] General Fisheries Commission for the Mediterranean (GFCM). URL: https://www.fao.org/gfcm/about/en/

[13] ISO 3166. URL: https://www.iso.org/iso-3166-country-codes.html

[14] The International Standard Statistical Classification of Fishing Gear (ISSCFG). URL: https://data.apps.fao.org/catalog/dataset/the-international-standard-statistical-classification-of-fishing-gear-isscfg