Embrapa’s contributions to integrate Brazilian agricultural vocabularies: Agrotermos in AgroPortal

Milena Ambrosio Telles1,*, Clement Jonquet2, Bibiana Teixeira de Almeida1, Jaudete Daltio1, Celina Maki Takemura1, Leandro Henrique Mendonça de Oliveira1 and Maria de Cléofas Faggion Alencar1

1 Empresa Brasileira de Pesquisa Agropecuária (Embrapa), Brazil
2 MISTEA, Institut Agro, INRAE, University of Montpellier, France; LIRMM, CNRS, University of Montpellier, France

Abstract
The paper describes the motivation, challenges and preliminary results of changes made to Agrotermos, Embrapa’s conceptual space, to turn it into a semantic resource encoded in standard Knowledge Organization System (KOS) formats. The goal is to facilitate its maintenance and editing, promote its interoperability with other resources, and improve access to and visibility of Brazilian Portuguese semantic resources and, hence, of Embrapa’s results in Brazilian research in agriculture and related domains. These changes were made within the scope of a partnership established with AgroPortal, a web-based platform designed to support data integration, sharing and analysis, which offers tools and services to manage and use ontologies and semantic resources. Agrotermos’ infrastructure and processes are being reformulated, VocBench is being implemented as its data management tool, and its contents are being prepared and validated for publication in AgroPortal. The preliminary results of the changes to Agrotermos’ infrastructure show that both VocBench and AgroPortal substantially improve editing, the visualization of semantic content, and the validation of mappings and alignments between vocabularies.

Keywords
semantic resources, knowledge organization systems (KOS), interoperability, terminology

1. Introduction and Motivation
Human knowledge is structured mostly through written language, and human languages recorded in the form of texts produce immense sets of data and information.
Organizations involved in research, development and innovation, such as the Brazilian Agricultural Research Corporation (Embrapa) or France’s National Research Institute for Agriculture, Food and Environment (INRAE), face challenges in processing all their data to extract knowledge for decision support [1]. Knowledge Organization Systems (KOS) that model the semantic structures of specialized domains become essential to process, organize, systematize and manage such data and information, and hence to foster innovation [2]. These KOS, materialized as semantic resources or artefacts (such as ontologies, terminologies and thesauri), enable the identification of patterns, trends and insights in complex datasets and provide valuable support to strategic decisions [3]. Embrapa was established to lay the technical and technological foundations for tropical agriculture and animal farming, and is one of the largest agricultural research corporations in the world [4]. The company’s technologies and solutions are produced and offered mostly in Brazilian Portuguese, but have good outreach potential across the whole tropical world. Aside from Brazil, Portuguese is an official language in eight other countries on four continents and is spoken by over 260 million people worldwide.

Proceedings of the 17th Seminar on Ontology Research in Brazil (ONTOBRAS 2024) and 8th Doctoral and Masters Consortium on Ontologies (WTDO 2024), Vitória, Brazil, October 07-10, 2024.
* Corresponding author.
Email: milena.telles@embrapa.br (M. A. Telles); clement.joqnuet@inrae.fr (C. Jonquet); bibiana.almeida@embrapa.br (B. T. d. Almeida); jaudete.daltio@embrapa.br (J. Daltio); celina.takemura@embrapa.br (C. M. Takemura); leandro.oliveira@embrapa.br (L. H. M. d. Oliveira); cleofas.alencar@embrapa.br (M. d. C. F. Alencar)
ORCID: 0000-0001-9523-9724 (M. A. Telles); 0000-0002-2404-1582 (C. Jonquet); 0000-0003-0539-5008 (B. T. d. Almeida); 0000-0002-4984-4832 (J. Daltio); 0000-0002-6516-559X (C. M. Takemura); 0000-0002-5628-3682 (L. H. M. d. Oliveira); 0000-0003-3167-6903 (M. d. C. F. Alencar)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

With the aim of addressing some of these challenges, using terminology to add value to and expand the company’s knowledge management expertise, capacity and outreach, Embrapa created Agrotermos, a conceptual space for knowledge representation of agriculture and its related areas. Operational constraints have prevented this semantic resource from being used to its full potential, which motivated a thorough study of the improvements and updates needed in both Agrotermos’ infrastructure and its processes. The goal of this paper is to present an overview of the challenges faced and of the technological and procedural strategy currently being put in place to make Agrotermos more operational and available to its users. Besides this introduction, the article is divided into three parts, which describe: 1) the scenario of the ongoing work, with a brief description of the objects involved and the main challenges encountered; 2) the results achieved so far; and 3) the preliminary conclusions and the next steps to be followed.

2. Research Scenario

2.1. Agrotermos
Since 2018, Embrapa has had a designated Permanent Commission for Controlled Vocabularies, Agriterminologies and Agrisemantics (GTermos), which is committed to assembling, sharing and disseminating terminology, and to providing terminological and semiotic support to the knowledge, data and information management initiatives of Embrapa and its partners. Its actions aim to increasingly align Agrotermos with global standardisation initiatives, such as those of the W3C, and with trends such as the FAIR Principles (Findable, Accessible, Interoperable, Reusable) for knowledge, data and information management processes within the company.
GTermos, a team of ten people, is responsible for managing and editing Agrotermos. Agrotermos (https://sistemas.sede.embrapa.br/agrotermos/) was conceived to aggregate the main controlled vocabularies on agriculture and related areas available in Portuguese [5]. Built on the method and technology proposed by the Global Agricultural Concept Space (GACS) [6], Agrotermos contains the full Brazilian Thesaurus Agrícola Nacional (Thesagro, https://sistemas.agricultura.gov.br/tematres/vocab/index.php) and the Portuguese content of Agrovoc, the multilingual thesaurus of FAO (Food and Agriculture Organization of the United Nations), as well as other resources created by Embrapa, along with the alignments and mappings created by the GTermos team between these resources. Agrotermos comprises a platform for organizing, qualifying and offering the terminology data and semantic applications produced within the company.

Agrotermos’ version 1.0, built in 2014, contains nearly 56 thousand terms and merges the content of multiple sources into one single conceptual space, so that they can all be used uniformly in an integrated approach. It can currently be accessed in two ways: (1) through a web interface, intended for human use; and (2) through an application programming interface (API) based on a REST (Representational State Transfer) web service, for machine-to-machine communication. These interfaces enable searching and browsing the semantic resource’s content (concepts and their relations) as served from a relational database. This version of Agrotermos has been accessible only through its custom eponymous application and did not exist as an independent source file (typically using the SKOS representation language and its corresponding RDF syntaxes) that could be distributed to users and explored by multiple tools and products using semantic web standards.
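To make the gap concrete: a relational row per concept carries much of the information that a SKOS file exposes as `skos:Concept` resources with labels and `skos:broader` or `skos:exactMatch` links. The Python sketch below serializes such records to Turtle. It is a minimal illustration under stated assumptions, not Agrotermos’ actual export: the `ConceptRecord` layout, the `http://example.org/agrotermos/` namespace and all example values are hypothetical, and only the SKOS property names come from the W3C SKOS vocabulary.

```python
# Minimal sketch, assuming a hypothetical concept-table layout; this is not
# Agrotermos' real database schema or export code.
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/agrotermos/"  # hypothetical namespace for concept URIs


@dataclass
class ConceptRecord:
    concept_id: str  # primary key of the (hypothetical) concept table
    pref_label: str  # preferred term in Brazilian Portuguese
    alt_labels: list[str] = field(default_factory=list)
    broader_ids: list[str] = field(default_factory=list)  # IDs of broader concepts
    exact_matches: list[str] = field(default_factory=list)  # full URIs in other vocabularies


def escape(text: str) -> str:
    """Escape backslashes, quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(scheme_id: str, records: list[ConceptRecord]) -> str:
    """Serialize concept records as a SKOS concept scheme in Turtle syntax."""
    out = [f"@prefix skos: <{SKOS}> .", f"@prefix at: <{BASE}> .", ""]
    out.append(f"at:{scheme_id} a skos:ConceptScheme .")
    for r in records:
        out.append("")
        out.append(f"at:{r.concept_id} a skos:Concept ;")
        out.append(f"    skos:inScheme at:{scheme_id} ;")
        for alt in r.alt_labels:
            out.append(f'    skos:altLabel "{escape(alt)}"@pt-BR ;')
        for parent in r.broader_ids:
            out.append(f"    skos:broader at:{parent} ;")
        for uri in r.exact_matches:
            out.append(f"    skos:exactMatch <{uri}> ;")
        # The prefLabel comes last so that it can close the statement with "."
        out.append(f'    skos:prefLabel "{escape(r.pref_label)}"@pt-BR .')
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    rows = [
        ConceptRecord("c1", "leguminosas"),
        ConceptRecord("c2", "soja", alt_labels=["Glycine max"], broader_ids=["c1"],
                      exact_matches=["http://example.org/other-vocab/soybean"]),
    ]
    print(to_turtle("scheme", rows))
```

Run as a script, it prints a small scheme in which `at:c2` is narrower than `at:c1` and mapped to a placeholder URI in another vocabulary. A production export would more likely rely on an RDF library and the vocabulary’s published URIs; plain string building is used here only to keep the sketch dependency-free.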
Meanwhile, since GTermos is also part of the Agrovoc editorial community as the Brazilian Portuguese editor [7], Agrotermos’ curation models and practices have moved increasingly closer to those of Agrovoc, and hence to the relevant international standards, processes, resources and tools. Over time, new challenges and requirements have arisen, and improvements to Agrotermos became necessary, including the need for GTermos/Embrapa to be able to distribute Agrotermos, either whole or in parts, as a semantic resource encoded in a standard format such as SKOS. In 2024, ten years after Agrotermos’ creation, GTermos and its partners within and outside Embrapa formed new task forces to address these issues and to select and implement up-to-date, standards-aligned procedures and infrastructures.

2.2. AgroPortal
AgroPortal (https://agroportal.lirmm.fr/) is a web-based platform that provides access to a wide range of semantic resources and ontologies in the agri-food domain [8]. It is designed to support data integration, sharing and analysis by offering tools and services to manage and use ontologies and semantic resources effectively. The platform is based on the OntoPortal technology [9] and features ontology hosting, search, versioning, visualization, commenting, recommendation and semantic annotation, as well as the storage and exploitation of ontology alignments, all within a fully semantic-web-compliant infrastructure. It currently hosts 188 semantic resources in different formats and languages.

2.3. Research Challenges
The main challenges faced by Agrotermos are twofold:
1. Upgrading its technological infrastructure to one that is sourced by and operates under up-to-date semantic standards; and
2. Offering a visualization (distribution) interface that enables different users (librarians, text editors, terminologists, etc.) to easily access and use its data.
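Besides its web interface, AgroPortal, being built on OntoPortal, exposes its content through a REST API, which is one way of giving machine users access to resources grouped under Agrotermos. The sketch below only builds a request URL for the API’s `/search` endpoint, restricted through the `ontologies` parameter to a list of acronyms. The acronyms used here (`AGROVOC`, `THESAGRO`, `VOCGEO`) are assumptions to be checked against AgroPortal, an API key obtained from an AgroPortal account is required, and no network call is made.

```python
# Sketch: build an AgroPortal (OntoPortal) REST API search URL restricted to
# the resources of the Agrotermos group. Acronyms and key are placeholders.
from urllib.parse import urlencode

API_BASE = "https://data.agroportal.lirmm.fr"

# Assumed acronyms for the three resources; verify them on AgroPortal.
AGROTERMOS_ACRONYMS = ("AGROVOC", "THESAGRO", "VOCGEO")


def search_url(term: str, api_key: str, ontologies=AGROTERMOS_ACRONYMS) -> str:
    """Return a /search URL for `term`, limited to the given ontology acronyms."""
    params = {
        "q": term,
        "ontologies": ",".join(ontologies),  # comma-separated list of acronyms
        "apikey": api_key,
    }
    return f"{API_BASE}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("soja", "YOUR-API-KEY"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen` and `json.load`) would return the search results as JSON; restricting the query by acronym mirrors, for programs, the kind of restricted view that a portal slice offers to human users.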
Regarding Agrotermos’ technological infrastructure, the main challenge was to map and address all the main technological and procedural aspects required to transform it into a fully interoperable semantic resource that meets both the international standards for resources of its kind and the FAIR Principles. This means that the technology adopted should be a standards-compliant, easy-to-use, web-accessible platform for the curation and editing of different kinds of semantic resources. It should support different user profiles, to enable hierarchical decisions and reviews, as well as different output formats. This “vocabulary management tool” should allow semantic resources to be handled in Portuguese (Brazilian Portuguese) and English, and mappings and alignments to be created between Agrotermos’ own resources and other semantic resources, e.g., controlled vocabularies with either a poly- or a monohierarchical structure. Furthermore, its outputs should be seamlessly interoperable with other semantic resources. This required revisiting, testing and redefining the processes involved in editing Agrotermos and inserting new semantic resources into it.

To address these issues, and in partnership with AgroPortal, GTermos decided to adopt SKOS natively to encode the content of Agrotermos and to rely on a vocabulary platform to distribute, serve and visualize this content. Of SKOS and OWL, the two W3C recommendations for representing semantic resources, SKOS was the more appropriate: it provides the constructs needed for the level of semantics required by Thesagro and VocGeo, which have no need for the complex semantic layer of OWL, and it is also the representation language already adopted by Agrovoc. This choice provides greater visibility for Agrotermos’ data and, consequently, for Brazilian agriculture. In AgroPortal, new knowledge insights become possible between Agrotermos and the other ontologies and semantic resources in the repository.
Moreover, this choice is consistent with the fact that Agrovoc, partly included in Agrotermos, is itself hosted and served by AgroPortal. Nevertheless, adopting SKOS and using AgroPortal to distribute Agrotermos’ resources would address only the requirements related to the worldwide standard “distribution” of its results. There remained a need to adopt new standard tools and methods to “produce” the resources, using standard vocabulary editing software such as VocBench [10].

3. Results
We have rethought the whole original Agrotermos infrastructure to address the aforementioned issues. Using more structured technologies for managing, visualizing and interoperating Agrotermos’ terminological data and semantic resources required rather thorough technological advancements. Figure 1 shows the proposed new structure.

Figure 1: Overview of Agrotermos’ proposed new structure.

The new structure equips the GTermos team with the multilingual platform VocBench to manage, edit and produce SKOS distributions of the resources contained in Agrotermos. VocBench was chosen as the data management tool for Agrotermos based on the team’s own experience in editing the Brazilian Portuguese content in Agrovoc. VocBench is a web-based, multilingual, collaborative platform for managing OWL ontologies, SKOS thesauri, OntoLex-Lemon lexicons, generic RDF datasets and other semantic resources. Data from Thesagro and from other terminological datasets, controlled vocabularies and/or ontologies generated by Embrapa and its partners may be incorporated into the new Agrotermos environment. Portuguese-language Agrovoc data were part of the original Agrotermos database. However, since Agrovoc is now fully integrated into AgroPortal, and all the language editing for Agrovoc takes place within its own VocBench project, there is no longer a need to replicate its content in Agrotermos’ VocBench.
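Section 2.3 requires the editing tool to handle controlled vocabularies with either a poly- or a monohierarchical structure. SKOS itself does not forbid a concept from having several `skos:broader` parents, so which case a vocabulary falls into has to be read from its data, for instance before incorporating it into the new environment. A minimal Python sketch, assuming the hierarchy has already been extracted as (narrower, broader) pairs of concept identifiers; the pair format and the example terms are hypothetical:

```python
# Sketch: decide whether a vocabulary is mono- or polyhierarchical from its
# skos:broader links, given as (narrower, broader) pairs of concept IDs.
from collections import defaultdict


def broader_index(pairs):
    """Map each concept to the set of its direct broader concepts."""
    index = defaultdict(set)
    for narrower, broader in pairs:
        index[narrower].add(broader)  # a set ignores duplicated links
    return index


def polyhierarchical_concepts(pairs):
    """Return, sorted, the concepts that have more than one direct broader concept."""
    return sorted(c for c, parents in broader_index(pairs).items() if len(parents) > 1)


def hierarchy_type(pairs):
    return "polyhierarchical" if polyhierarchical_concepts(pairs) else "monohierarchical"


if __name__ == "__main__":
    links = [("soja", "leguminosas"), ("soja", "oleaginosas"), ("milho", "cereais")]
    print(hierarchy_type(links), polyhierarchical_concepts(links))
```

With the example links the script reports `polyhierarchical ['soja']`, since “soja” sits under both “leguminosas” and “oleaginosas”. Only direct `skos:broader` links are considered; transitive relations are deliberately left out.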
In the new proposed structure, AgroPortal will not only display Agrotermos’ data in Brazilian Portuguese, but also align its content with all other semantic resources hosted on the portal, including Agrovoc [8]. To enable the implementation of the new tools and the establishment of new management processes for Agrotermos, four workgroups were assembled to:
1. adapt Thesagro, itself included in Agrotermos, directly from its original BINAGRI TemaTres installation to SKOS and include it in AgroPortal;
2. adapt VocGeo, a vocabulary on geoinformation produced by Embrapa (also included in the Agrotermos database), to SKOS and include it in AgroPortal;
3. study and install VocBench as the management tool for Agrotermos’ semantic resources and mappings;
4. organise a group of semantic resources in AgroPortal called Agrotermos, which for the moment contains Agrovoc, Thesagro and VocGeo, but may comprise other semantic resources from Embrapa in the future.

The data adaptation mentioned in items 1 and 2 involved generating complete SKOS versions of the resources and uploading them to AgroPortal. Thesagro and VocGeo, as well as the existing alignments and mappings between them and Agrovoc’s content in Portuguese (PT) and Brazilian Portuguese (PT-BR), have already been inserted into AgroPortal. Thesagro and VocGeo are in their final validation phase before becoming publicly available without restrictions; Agrovoc is already fully available. AgroPortal grouped these three resources within the Agrotermos group, and a slice was created for Agrotermos to allow users to search and browse an exclusive AgroPortal view of only these three semantic resources: https://agrotermos.agroportal.lirmm.fr/.

4. Conclusion and Ongoing Work
The first results of the changes to Agrotermos’ infrastructure show that both VocBench and AgroPortal substantially improve editing, data visualization, and the validation of mappings and alignments between vocabularies.
Furthermore, once VocGeo and Thesagro are fully and publicly available, they will receive a FAIR score produced by the O’FAIRe tool in AgroPortal. The GTermos team, which is at the beginning of its learning curve, still needs training to master the new tools to their full potential. In addition, the new proposed infrastructure has brought GTermos and the Thesagro team into negotiations to collaborate on a joint infrastructure and joint processes for editing both Agrotermos and Thesagro.

Acknowledgments
We thank all GTermos members and collaborators, the workgroup of the “Sistema Embrapa de Bibliotecas” (SEB), and Filipi Soares Miranda for their contributions to testing and improving Agrotermos. CJ was funded by the D2KAB project (www.d2kab.org), which received funding from the French National Research Agency (ANR-18-CE23-0017).

References
[1] I. Pierozzi Junior, P. R. B. Bertin, C. de Laia Machado, A. R. da Silva, Towards semantic knowledge maps applications: modelling the ontological nature of data and information governance in a R&D organization, InTech, Rijeka, 2018. doi:10.5772/67978.
[2] M. L. Zeng, Knowledge organization systems (KOS), Knowledge Organization 35 (2008) 160–182. doi:10.5771/0943-7444-2008-2-3-160.
[3] C. G. Duque, G. G. Bastos, Ontologia aplicada a um modelo de gestão organizacional: contribuições da ciência da informação, Ciência da Informação 46 (2017) 197–213. doi:10.18225/ci.inf.v46i1.4023.
[4] E. S. de Comunicação, Seu futuro inspira a nossa ciência, Brasília, DF, 2023.
[5] Embrapa, Agrotermos, https://sistemas.sede.embrapa.br/agrotermos/, 2014.
[6] I. Pierozzi Junior, M. C. Visoli, M. I. F. Souza, L. M. S. Cunha, I. Vacari, T. Z. Torres, Engenharia da informação: contribuições para a agricultura digital, Embrapa, 2020, pp. 192–217. URL: https://ainfo.cnptia.embrapa.br/digital/bitstream/item/217705/1/LV-Agricultura-digital-2020-cap8.pdf.
[7] FAO, Meet the Agrovoc editorial community, https://www.fao.org/agrovoc/news/meet-agrovoc-editorial-community-1, 2021.
[8] C. Jonquet, A. Toulet, E. Arnaud, S. Aubin, E. Dzalé Yeumo, V. Emonet, J. Graybeal, M.-A. Laporte, M. A. Musen, V. Pesce, P. Larmande, AgroPortal: A vocabulary and ontology repository for agronomy, Computers and Electronics in Agriculture 144 (2018) 126–143. URL: https://www.sciencedirect.com/science/article/pii/S0168169916309541. doi:10.1016/j.compag.2017.10.012.
[9] C. Jonquet, J. Graybeal, S. Bouazzouni, M. Dorf, N. Fiore, X. Kechagioglou, T. Redmond, I. Rosati, A. Skrenchuk, J. L. Vendetti, M. Musen, Ontology Repositories and Semantic Artefact Catalogues with the OntoPortal Technology, in: ISWC 2023 - 22nd International Semantic Web Conference, volume 14266 of Lecture Notes in Computer Science, Springer, Athens, Greece, 2023, pp. 38–58. URL: https://hal.science/hal-04088537. doi:10.1007/978-3-031-47243-5_3, members of the OntoPortal Alliance.
[10] A. Stellato, M. Fiorelli, A. Turbati, T. Lorenzetti, W. Gemert, D. Dechandon, C. Laaboudi-Spoiden, A. Gerencsér, A. Waniart, E. Costetchi, J. Keizer, VocBench 3: A collaborative semantic web editor for ontologies, thesauri and lexicons, Semantic Web 11 (2020) 855–881. doi:10.3233/SW-200370.