=Paper=
{{Paper
|id=Vol-1747/IP29_ICBO2016
|storemode=property
|title=Global Agricultural Concept Scheme: A Hub for Agricultural Vocabularies
|pdfUrl=https://ceur-ws.org/Vol-1747/IP29_ICBO2016.pdf
|volume=Vol-1747
|authors=Caterina Caracciolo,Tom Baker,Elizabeth Arnaud
|dblpUrl=https://dblp.org/rec/conf/icbo/CaraccioloBA16
}}
==Global Agricultural Concept Scheme: A Hub for Agricultural Vocabularies ==
Tom Baker, Independent FAO consultant, Bonn, Germany
Caterina Caracciolo, Food and Agriculture Organization of the UN (FAO), Rome, Italy
Elizabeth Arnaud, Bioversity International, Montpellier, France
Abstract: Thesauri are used to tag semi-structured documents and texts, while more complex
semantic structures are used to describe (annotate) scientific data. We are creating a
Global Agricultural Concept Scheme (GACS) by mapping AGROVOC, CABT and NALT, three major
thesauri in the area of food and agriculture, with a beta release in May 2016. We see GACS
as a hub linking user-oriented thesauri with semantically more precise domain ontologies
linking, in turn, to datasets about food and agriculture, in order to make that data more
interoperable and reusable.

Keywords: thesauri, ontologies, food, agriculture, GACS, AGROVOC, CABT, NALT, Crop Ontology

I. GLOBAL AGRICULTURAL CONCEPT SCHEME

The Food and Agriculture Organization of the United Nations (FAO), CAB International
(CABI), and the National Agricultural Library of the USDA (NAL) have long maintained
separate thesauri about agriculture, food and related topics (the AGROVOC Concept Scheme¹,
CAB Thesaurus², and NAL Thesaurus³) for use in indexing their respective bibliographic
databases: AGRIS (8 million records), CAB Abstracts (8.3 million), and Agricola (5.2
million). The thesauri provide globally identified concepts for use in automated indexing
and retrieval, subject description, natural language processing, and translation.

Having previously collaborated on mappings and common classifications, the three
organizations resolved in 2013 to explore the feasibility of pooling their most frequently
used concepts into a jointly maintained Global Agricultural Concept Scheme (GACS). GACS was
seen as the first step towards improving the coherence and interoperability of agricultural
data, a vision explored in a July 2015 workshop on “Agrisemantics”⁴ with support from the
Gates Foundation, elaborated in the Chania Declaration⁵ of May 2016, and pursued by an
Agrisemantics Working Group that is forming within the Research Data Alliance initiative.

GACS Core Beta 3.1⁶, soft-launched at the Open Harvest workshop of May 2016, provides
15,000 concepts formed by mapping and merging the most frequently used concepts from the
three source thesauri. GACS Core concepts are labeled in multiple languages, with some in
more than twenty-five languages. The soft launch opened a period of testing and feedback in
preparation for the next phase of its development, which will begin circa October 2016.
GACS Core Beta 3.1 presents a set of concepts that is considered to be fairly stable, with
URIs that are not expected to change (see an example of a concept in GACS in Fig. 1).
Problems resulting from the integration process, such as overlapping labels, have been
substantially fixed, though much detailed work remains to be done, notably the
specification of a common hierarchical structure. During this test phase, implementers are
encouraged to use GACS on an experimental basis and provide feedback.

Fig. 1 A concept in GACS

In the next phase of development, the scope of GACS will be broadened beyond the core.
Concepts from some of the source thesauri that were not included in GACS Core may be given
an id.agrisemantics.org URI in a GACS Extension to be maintained by their original owners
or, optionally, in
collaboration. The notion of a GACS Module anticipates a longer-term need to devolve
maintenance of distinct types of concepts, such as organisms or geographical names, to
communities of experts.

II. SEMANTIC ASSETS FOR FOOD AND AGRICULTURE

Information relevant to food and agriculture encompasses data collected on factors ranging
from yield and climate to demographics and markets. Information is presented in forms
ranging from narrative texts (policy, technical, and scientific documents) through
structured datasets (empirical data). Information may be graphically visualized, e.g.,
plotted onto timelines or maps, or plugged into models for nowcasting or for forecasting
trends. All types of data, from the analytical to the empirical, are required for achieving
sustainable food systems.

Thesauri provide concepts for indicating the overall topic of information resources,
usually semi-structured texts such as bibliographic abstracts and journal articles, but
also videos and courseware. Empirical data is composed of data elements with precise
definitions at defined levels of granularity. Datasets are typically serialized in formats
specific to a particular software application, and their individual data elements are named
within the context of that particular application. Interoperability across datasets is
hampered by the sheer effort required to determine equivalences among differently named
elements, then to extract sets of comparable elements from a diversity of applications and
formats. Ontologies, focused sets of related concepts specified with precise definitions
and global identifiers, are increasingly used to “annotate” data. However, ontologies too
may embody ad-hoc semantics in different degrees, and are usually totally disconnected from
the world of thesauri, thus preventing seamless access to “hard” and “soft” data alike.

III. LINKING THESAURI TO DATA VIA ONTOLOGIES

The more fuzzily defined, globally identified concepts of general-purpose, search-oriented
thesauri and concept schemes, such as GACS, may be mapped to the more precisely defined,
globally identified, domain-specific, application-oriented ontologies and, from there, to
locally defined data elements embedded in software-specific databases. An unbroken chain
may be formed linking the most general concepts to the most specific data elements.
Semantic authority control for data elements facilitates the re-use of datasets, and links
from precise ontologies to search-oriented concepts facilitate the discovery of those
datasets.

One path to data interoperability is to use appropriately defined ontologies, i.e.,
ontologies that not only enable the extraction of data from a database (a process often
called “data annotation”), but that can also situate data within the appropriate “context”:
a modeled set of data about the time and place of its collection along with any additional
elements required for its correct interpretation. Another path is to place those ontologies
in a network with other semantic assets, including the thesauri and concept schemes used to
express the “topicality” of information resources. Such an integration of semantic assets
may support, for example, an analysis of the yield gap in sub-Saharan African countries by
providing well-connected data elements across a diversity of cropwheat-related datasets
from databases and repositories along with multi-media information, and relevant literature
from main bibliographic databases like AGRIS, CABI and NAL with the goal of improving food
security.

The Agrisemantics vision points in two directions. On the one hand, we aim to turn GACS
into a more extensive network of thesauri and concept schemes to ensure the appropriate
coverage for our domain of interest. In particular, we are going to test the notion of a
GACS Extension on the example of AGROVOC. On the other hand, we aim at establishing tools
and methodologies to connect GACS and its constellation of “extensions” to multiple
domain-specific ontologies.

The first ontology we will be working with is the Crop Ontology [1], which supports data
comparison and interpretation at a higher granularity by providing a means for annotating
data elements with trait measurement method and unit or scale (see Fig. 2).

Fig. 2 Mapping from thesaurus to ontology

More specifically, a wheat data element labeled with the code “GW” in a phenotype dataset
can be mapped to the general concept “grain weight” as defined, and given global identity
(URI), in the CGIAR Crop Ontology⁷. The CO term “Grain Weight” can, in turn, be mapped to
“Grain” in AGROVOC and GACS. More information can then be discovered through a query system
using this mapping that will return, aside from datasets related to grain weight,
references to published papers where grain weight was studied.

ACKNOWLEDGMENTS

Special thanks to the GACS Working Group: Tom Baker, Caterina Caracciolo, Anton Doroszenko,
Lori Finch, Sujata Suri, and Osma Suominen.

REFERENCES

[1] Shrestha R., Matteis L., Skofic M., Portugal A., McLaren G., Hyman G., Arnaud E.: 2012.
Bridging the phenotypic and genetic data useful for integrated breeding through a data
annotation using the Crop Ontology developed by the crop communities of practice. Frontiers
in Physiology, vol. 3

¹ http://aims.fao.org/agrovoc
² http://www.cabi.org/cabthesaurus/
³ http://agclass.nal.usda.gov/
⁴ http://aims.fao.org/sites/default/files/Report_workshop_Agrisemantics.pdf
⁵ http://blog.agroknow.com/?p=5067
⁶ http://agrisemantics.org/gacs
⁷ http://www.cropontology.org
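
As an illustration of the chain described in Section III (local data element, to ontology term, to thesaurus concept), the following minimal Python sketch walks from the dataset code “GW” to an ontology term and on to a thesaurus concept, then uses both links to gather datasets and papers. All URIs, record names, and function names below are hypothetical placeholders chosen for illustration; they are not actual Crop Ontology, AGROVOC, or GACS identifiers, and the sketch is not the query system mentioned in the text.

```python
"""Sketch of a thesaurus-ontology-data mapping chain (all identifiers hypothetical)."""

# Local data element code -> ontology term URI (placeholder URI, not a real CO ID).
ELEMENT_TO_ONTOLOGY = {
    "GW": "http://example.org/crop-ontology/grain_weight",
}

# Ontology term URI -> thesaurus concept URI (placeholder, not a real GACS ID).
ONTOLOGY_TO_THESAURUS = {
    "http://example.org/crop-ontology/grain_weight": "http://example.org/gacs/grain",
}

# Toy indexes: datasets annotated with ontology terms,
# papers tagged with thesaurus concepts.
DATASETS_BY_TERM = {
    "http://example.org/crop-ontology/grain_weight": ["wheat_trial_A", "wheat_trial_B"],
}
PAPERS_BY_CONCEPT = {
    "http://example.org/gacs/grain": ["paper_001", "paper_002"],
}


def resolve_chain(element_code):
    """Return (ontology_term, thesaurus_concept) for a local code; None where unmapped."""
    term = ELEMENT_TO_ONTOLOGY.get(element_code)
    concept = ONTOLOGY_TO_THESAURUS.get(term) if term else None
    return term, concept


def discover(element_code):
    """Collect datasets (via the ontology term) and papers (via the thesaurus concept)."""
    term, concept = resolve_chain(element_code)
    return {
        "ontology_term": term,
        "thesaurus_concept": concept,
        "datasets": DATASETS_BY_TERM.get(term, []),
        "papers": PAPERS_BY_CONCEPT.get(concept, []),
    }


result = discover("GW")
print(result["datasets"], result["papers"])
```

Keeping the two mappings in separate tables mirrors the separation of roles in the text: the ontology link carries the precise meaning of the data element, while the thesaurus link is coarser and serves discovery, so a lookup on the thesaurus concept may return papers about grain in general rather than grain weight alone.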