Beverage Graph: Connecting Data about Consumable Liquids

Jessica Singer, Robert Warren
Myra Analytics, Ottawa, Ontario

Abstract

We describe the design and ongoing update of a knowledge graph and its associated ontologies, which describe beverages and their commercial availability as products. Previous approaches have focused on beverage types or brands, with limited support for tracing a product’s content or identifying the specific product being consumed by a person. This inability to link the product and its source has until now been a hindrance to nutritional studies and food traceability systems.

Keywords

Beverage Ontologies, Beverage Products, Beer Products, Juice, Consumable Liquids

1. Introduction

The Beverage Graph is an ontology-backed knowledge graph focused on beverages, their styles, brands and the packaging in which they are commercially available. Initially created to support commercial brewing activities, it has been made available for public use and to encourage linking to other knowledge graphs. It is available online through data dumps at https://rdf.ag/ or through a SPARQL endpoint at https://rdf.ag/sparql. URIs are dereferenceable and available in all RDF serializations through HTTP content negotiation.

Previous projects in this area have primarily been simple data dumps, without strong schema or ontological structures. Websites dedicated to beverage reviews occasionally offer an external API to retrieve data but lack support for shared identifiers or linked data principles[1]. Bev-On¹ was an early attempt at building a structured OWL representation of beverages, but it is now unmaintained, and an early knowledge graph project, BevGraph², is no longer active.

A missing element within all current beverage datasets is the relationship between the product that is actually handled by the consumer and the substance within the product: datasets seem to focus exclusively on one or the other.
Most nutritional and dietary datasets reference a specified measured serving of a substance rather than that of a commercial product. Product nutritional labelling also references a measured serving, sometimes disconnected from the container capacity, and may only be a statistical approximation of the generic substance rather than an empirical measurement. The separate ontological structures representing substances, containers, commercial products and their individual instances allow the graph to be properly integrated with both empirically measured and statistically approximated nutritional datasets. As most consumers interact with nutritional substances through products, this ontological bridge will enable better end-user reporting of consumption, which will lead to better nutritional analysis and recommendations.

FOIS 2021 Ontology Showcase, held at FOIS 2021 - 12th International Conference on Formal Ontology in Information Systems, September 13-17, 2021, Bolzano, Italy
singer@myraanalytics.ca (J. Singer); warren@myranalytics.ca (R. Warren)
ORCID: 0000-0002-7066-1141 (R. Warren)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ http://rdfs.co/bevon/latest/html
² https://github.com/bevgraph

2. Design

Knowledge graph, vocabulary, schema, taxonomy and ontology are all terms that have come to be used interchangeably in the literature, causing no small amount of confusion. Beverage Graph is meant to be used as an RDFS/OWL ontologically-based knowledge graph capable of integrating with as much of the food supply chain as possible. It currently numbers over 50M triples, grows daily, and is available as a data dump at https://rdf.ag or through a SPARQL endpoint at https://rdf.ag/sparql.
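As a minimal sketch of how the SPARQL endpoint named above might be queried programmatically, the following Python snippet builds (but does not send) a SPARQL Protocol GET request. Only the endpoint URL comes from the text; the helper name, the triple-counting query and the JSON results media type are illustrative assumptions, and the behaviour of the live endpoint is not verified here.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as given in the text.
ENDPOINT = "https://rdf.ag/sparql"

# Illustrative query (an assumption): count all triples in the default graph.
COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL Protocol GET request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for SPARQL JSON results; assumes the server supports this format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(COUNT_QUERY)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the query; it is left out here so that the sketch runs without network access.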
All Beverage Graph URIs are dereferenceable and accessible in most data formats through HTTP content negotiation.

The core of Beverage Graph relies heavily on the schema.org and GS1 vocabularies. Schema.org[2] is arguably one of the most successful RDFS web vocabularies currently in use. It provides support for store inventory recording, commercial offering and product variant enumeration. While an official OWL version is available on an experimental basis, we simply type the relevant terms and properties as OWL entities. GS1 Global’s Webvoc[3] is similarly available as an RDFS vocabulary that we augment using OWL classes. The GS1 Webvoc has its roots in commercial logistics and product management, providing support for the labeling of the product, its branding and its identification for inventory purposes. GS1 Webvoc also provides the gs1:packaging property and the gs1:PackagingDetails class, which permit the creation of standardized package descriptions including their dimensions and weight. Currently, neither vocabulary provides a satisfactory solution for “compound packaging” of bundled containers. We work around this issue by having intermediate packaging list its parent item and count until a standardized solution becomes available.

The largest issue in aligning these two vocabularies was the resolution of what a product is, as represented in Figure 1. We understand that Beer, Porter beers and a (hypothetical) Porter beer brewed by ACME exist as facts, but that pragmatically, ACME Porter beer can only really exist within a container. Furthermore, there is more than one size of container (variant), and each physical container is filled with beer from a specific lot (beer batch). The specific arrangement of Figure 1 leverages the strengths of both the schema.org and GS1 vocabularies to represent all aspects of a beverage product.
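The layering described above (style, branded product group, container variant, lot, individual container) can be sketched with plain Python tuples standing in for triples. This is only an illustration under stated assumptions: the `ex:` identifiers, the `ex:fromLot` property, the second serial number and the choice of `schema:isVariantOf` between lot and variant are hypothetical and are not taken from the published graph.

```python
# Plain-Python sketch of the layering: (subject, predicate, object) tuples.
SUBCLASS_OF = "rdfs:subClassOf"
IS_VARIANT_OF = "schema:isVariantOf"
FROM_LOT = "ex:fromLot"  # hypothetical property linking a container to its lot

TRIPLES = [
    ("beer:Porter", SUBCLASS_OF, "beer:Beer"),
    ("ex:AcmePorter", SUBCLASS_OF, "beer:Porter"),
    ("ex:AcmePorter", "rdf:type", "schema:ProductGroup"),
    ("ex:AcmePorter473mlCan", IS_VARIANT_OF, "ex:AcmePorter"),
    ("ex:AcmePorter473mlCan", "rdf:type", "schema:Product"),
    ("ex:Lot23", IS_VARIANT_OF, "ex:AcmePorter473mlCan"),  # assumed link
    ("ex:Lot23", "rdf:type", "schema:Product"),
    ("ex:Serial5665", FROM_LOT, "ex:Lot23"),
    ("ex:Serial5665", "rdf:type", "schema:IndividualProduct"),
    ("ex:Serial5666", FROM_LOT, "ex:Lot23"),  # hypothetical second can
    ("ex:Serial5666", "rdf:type", "schema:IndividualProduct"),
]

# Predicates that lead from a specific item towards the generic beverage.
UPWARD = {SUBCLASS_OF, IS_VARIANT_OF, FROM_LOT}


def trace_upward(node, triples=TRIPLES):
    """Follow upward links from a node until no parent remains."""
    path = [node]
    while True:
        parents = [o for s, p, o in triples if s == path[-1] and p in UPWARD]
        if not parents or parents[0] in path:  # stop at the top or on a cycle
            return path
        path.append(parents[0])


def cans_in_lot(lot, triples=TRIPLES):
    """List every individual container filled from the given lot."""
    return sorted(s for s, p, o in triples if p == FROM_LOT and o == lot)


print(trace_upward("ex:Serial5665"))
print(cans_in_lot("ex:Lot23"))
```

Walking upward from a single can reaches the generic beverage class, while walking downward from a lot lists the cans affected by a recall; in the graph itself the same traversal would be expressed as a query rather than a Python loop.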
The actual contents of beverages are represented using the Beer[4] and FoodOn[5] ontologies, which give the Beverage Graph coverage for Beers, Ciders, Meads, Juices and “Flavored” Juice Drinks, with support for coffees, teas, wines and hard liquors to come at a later date. Even in the case of untreated spring water, beverages are created through a process that transforms ingredients into a product. The design of the Beverage Graph allows the use of multiple ontologies to discover these processes. As an example, a query of the Porter class will reveal linkages to the Hops[6] ontology, which lists Golding hops as a common ingredient of the beer style. Similarly for fruit juices, we leverage the FoodOn ontology to reveal ingredients, as in Figure 2.

[Figure 1 diagram. Panels: “Beer Ontology” and “Traceability”. Nodes: beer:Beer, beer:Porter, ACME Porter, ACME Porter in 473ml Can, ACME Porter in 473ml Can Lot 23, ACME Porter in 473ml Can Lot 23 Serial 5665, 473ml Can. Edge labels: rdfs:subClassOf, schema:isVariantOf, rdf:type. Types: schema:ProductGroup, schema:Product, schema:IndividualProduct, gs1:Beverage / gs1:Product, gs1:package, FOODON1260.]

Figure 1: A fictitious ACME Porter as modeled within the graph using the Beer, GS1 and schema.org ontologies. Note that we can readily link this data to specific lots or instances to deal with beverage recalls and traceability.

[Figure 2 diagram. Nodes: Dole® pineapple juice (7551c5ae-46c5-4d0f-a09d-acc9dfffc8bb), Monte Lirio (uninstantiated), pineapple juice (unsweetened) FOODON_03305240, pineapple plant NCBITaxon_4615. Edge labels: rdf:type, rdfs:subClassOf, owl:oneOf, derivedFrom RO_0001000. Types: gs1:Beverage, schema:ProductGroup.]

Figure 2: A similar instantiation of a pineapple juice, using FoodOn pineapple juice classes.
Note FoodOn’s use of the Relation Ontology and a possible expansion to take into account the specific varietal grown for juicing purposes.

The Dole® brand pineapple juice product group is a subclass of the FoodOn pineapple juice class, which references its source fruit. The structure has enough flexibility to reference the specific cultivar instead of a generic pineapple plant. Figures 1 and 2 are simplifications of the data available within the graph and do not show properties such as packaging, manufacturer, brand, product description and location information.

The Beverage Graph uses the W3C Provenance[7] ontology to trace sources and beverage processes, the OGC Time[8] ontology for temporal annotations, the SKOS[9] vocabulary for descriptions and the OGC GeoSPARQL[10] vocabulary for geolocation information. We deliberately chose these mature, well-engineered vocabularies over simpler solutions to better support the complexity of the real-world data being represented.

Because several overlapping commercial sources are used to update the Beverage Graph, entity resolution is an important process. The graph nature of the data provides a ready-made structure on which a statistical record linkage[11] model can be built, and GeoSPARQL containment properties provide a quick means of obtaining coarse location matching when combined with the GeoNames[12] RDF dataset. When any two entities are determined to be the same, the SKOS-XL[13] vocabulary is used to convert the most recent entity node into a skos-xl:Label node which points to the authoritative entity. This approach preserves the original data provenance and allows us to “walk back” erroneous merges if needed.

We note with disappointment that vocabulary reuse seems to be a “do as I say and not as I do” principle and that similar properties are often re-implemented.
Concurrently, few graph databases provide the facilities, or are configured, to make use of ontological equivalencies when querying data. For this reason, the Beverage Graph often contains redundant properties in order to make data consumption as simple as possible. A small series of ontological statements³ is also maintained as a means of aligning temporal statements between PROV-O, Time and schema.org, as well as documenting equivalencies between common properties such as gs1:organizationName, schema:name and foaf:name. The data can be consumed with or without these ontological axioms.

3. Discussion

The construction of the graph highlighted the complexities of commercial data management, the benefits of ontological backing and the complexities of integrating different ontologically backed datasets.

In acquiring external data, commercial API designs reflect the needs and views of their owners, which can result in unexpected data representations. A product variant should reference product instances that vary on explicitly defined, specific dimensions. Consumer-facing APIs will often return product variants based on undefined conditions, which may include similar packaging type, volume or store location, and this makes automated integration difficult.

Issues in entity resolution have highlighted the usefulness of generic terms such as the GeoNames 6295630 “planet earth” entity as a generic stand-in for the locality of a brewery, as this information is not always available. This avoids the sort of issues that would occur in relational databases with null values, in that the data is always logically consistent and schematically complete even though it is factually imprecise. Operationally, this greatly reduces the complexity of entity resolution queries, as fewer exceptions must be handled.
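The effect of the “planet earth” stand-in on matching logic can be illustrated with a small Python sketch. Only the GeoNames identifier 6295630 comes from the text; the `ex:` place identifiers, the containment table and the function names are hypothetical, and the real graph would express containment through GeoSPARQL properties rather than a dictionary.

```python
# GeoNames "planet earth" entity, used as the stand-in for an unknown locality.
EARTH = "geonames:6295630"

# Hypothetical containment hierarchy: place -> the place that contains it.
CONTAINED_IN = {
    "ex:Ottawa": "ex:Ontario",
    "ex:Ontario": "ex:Canada",
    "ex:Quebec": "ex:Canada",
    "ex:Canada": EARTH,
}


def containment_chain(place):
    """Return the place followed by every place that contains it."""
    chain = [place]
    while chain[-1] in CONTAINED_IN:
        chain.append(CONTAINED_IN[chain[-1]])
    return chain


def locations_compatible(a, b):
    """Coarse match: two localities are compatible if one contains the other."""
    return a in containment_chain(b) or b in containment_chain(a)


# A brewery with an unknown locality is recorded as EARTH rather than None,
# so the same containment test applies without a special case for nulls.
print(locations_compatible("ex:Ottawa", EARTH))
print(locations_compatible("ex:Ottawa", "ex:Quebec"))
```

Because every locality is contained in the stand-in, a record without a known location remains a candidate match for any other record, and the decision is left to the remaining evidence instead of a null-handling branch.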
Ontology quality literature focuses on ontological completeness, logical consistency and structural issues[14] that are not always relevant to the actual operational use of the ontology itself. Some ontologists view “enumerative completeness” as an (unrealistic) primary objective, others rely on a reasoner reporting logical consistency, and still others insist on over-constrained ontological constructs. From an end-user perspective, design consistency is the most important aspect, though current tools and approaches may not enforce it.

Our concern with ontology reuse is poor high-level documentation and the lack of consistency (or curation) in the ontological structures used across instances. FoodOn, as an example, is a collaborative project curated by multiple people and one that has chosen to import non-ontological datasets in bulk. Consistency across multiple designers can be difficult to maintain without close coordination, and the large number of imported terms can make it difficult to distinguish the curated parts of an ontology from those still under review.

³ https://rdf.ag/o/BeverageGraph

In this case, the consequence is that there are two mechanisms for defining a fruit juice and some confusion as to whether it derives from the plant or the fruit. Both mechanisms are ontologically consistent, but they make querying the FoodOn ontology more difficult and potentially duplicate terms.

Much has been made of “the code being the documentation”, but at scale, ontological integration must be done programmatically, and these issues will not be discovered without a high-level overview of how specific real-world objects are modeled. Too often, “suggestions” are made about the proper use of an ontology when that use should be clearly specified. While well intentioned, the cost of flexibility in solving too many problems is a series of poor solutions instead of one good one.
A parallel can be made with the early experiences of the Dublin Core standards, which failed to provide an official structure for citing a bibliographic work in RDF while simultaneously publishing a dcterms:bibliographicCitation term. The only way known to the authors to use Dublin Core coherently is the Bibo[15] ontology, which provides a minimal structure to dcterms and which is being supplanted by the SPAR ontologies.

To this end, we wish to highlight the requirement for term labels, term descriptions and ontological object narratives that can explain an ontology at a high level. Too often, we read ontology documentation that focuses on the ontology itself rather than on its uses, without commenting on the instantiation of classes or how to solve actual problems within the domain.

Lastly, the semantic power of OWL2 ontologies is immense, which, in a parallel to software design, can tempt designers to use overly complex technical solutions to simple problems. Ontology end users who wish to solve their own problems will naturally gravitate to the simplest, best-documented solutions, as these have the lowest cost of implementation.

4. Applications

Beyond its initial focus on supporting beer brewers, the Beverage Graph is flexible enough in its design to support additional information about the product, the generic beverage and the packaging, including whether the packaging is recyclable and what it is made of. This opens the door to low-hanging-fruit studies on the prevalence of reusable versus recyclable packaging and their relative volumes within specific markets. As other datasets also report the nutritional / calorimetric content of products, it becomes possible to quickly generate a partial but accurate nutritional profile of a person’s diet simply by scanning the barcode located on their beverage as they consume it.

A direct application is in the resolution of the product content on an ontological basis based on its nomenclature.
Consider the case of “Cider”, which can mean an alcoholic beverage made by fermenting apples, unfiltered apple juice or (confusingly) a “Non-alcoholic Cider” sold in the context of alcoholic beverages that contains no or only trace amounts of alcohol. As the Beverage Graph reports the commercially mandated alcohol by volume (beer:abvValue / gs1:percentageOfAlcoholByVolume) for beverages, an appropriate determination can be made.

The addition of linkages to legislative ontologies may be most interesting from an analytical viewpoint, as legislation is heavily dependent on context and local culture. A direct example is the contrast between beer, an alcoholic beverage, and vanilla extract, a baking ingredient. While beer may contain alcohol, it is not mandated to and may actually be non-alcoholic. Vanilla extract is mandated to contain a certain percentage of alcohol in order to be considered an extract but is regulated as a food and not as an alcoholic beverage. Legislatively, their intended use dictates the regulatory regime under which they are controlled. From a public health perspective, it is their compositional properties that will dictate their capacity to be abused.

In closing, the Beverage Graph provides a core that other datasets and ontologies can link to or extract a working set from. It is freely available, well labeled and aims to be as open as possible.

5. Conclusion

The Beverage Graph is a maintained collection of instances and ontological classes that document commercially available beverages, their contents and their packaging. Its construction allows for integration with external datasets and lends itself to dietary, commercial and food production analysis. As additional upstream data sources are acquired, the graph will be expanded to more brands and beverage types such as coffee, tea and hard liquors.

References

[1] C. Bizer, T. Heath, K. Idehen, T.
Berners-Lee, Linked data on the web (LDOW2008), in: Proceedings of the 17th International Conference on World Wide Web, 2008, pp. 1265–1266.
[2] R. V. Guha, D. Brickley, S. Macbeth, Schema.org: Evolution of structured data on the web, Commun. ACM 59 (2016) 44–51. doi:10.1145/2844544.
[3] GS1 Web Vocabulary Standard, 1.6.1, GS1, 2015. URL: https://www.gs1.org/docs/gs1-smartsearch/GS1_Vocabulary_Standard.pdf.
[4] R. Warren, J. Singer, Beer ontology, 2021. URL: https://doi.org/10.5281/zenodo.4672337.
[5] D. M. Dooley, E. J. Griffiths, et al., FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration, npj Science of Food 2 (2018) 23.
[6] R. Warren, J. Singer, Hops ontology, 2021. URL: https://doi.org/10.5281/zenodo.4672692.
[7] PROV-O: The PROV Ontology, 2013. URL: https://www.w3.org/TR/prov-o/.
[8] S. Cox, C. Little, Time Ontology in OWL, 2013. URL: https://www.w3.org/TR/owl-time/.
[9] D. Brickley, A. Miles, SKOS Core Vocabulary Specification, W3C Working Draft, W3C, 2005. URL: http://www.w3.org/TR/swbp-skos-core-spec/.
[10] OGC GeoSPARQL - A Geographic Query Language for RDF Data, 2012. URL: http://www.opengis.net/doc/IS/geosparql/1.0.
[11] W. E. Winkler, Advanced methods for record linkage, Technical Report rr945, Statistical Research Division, U.S. Bureau of the Census, 1994.
[12] M. Wick, GeoNames ontology, 2015. URL: http://www.geonames.org/about.html.
[13] A. Miles, S. Bechhofer, SKOS-XL Simple Knowledge Organization System eXtension for Labels, Technical Report, 2009. URL: https://www.w3.org/TR/skos-reference/skos-xl.html.
[14] S. M. Gurk, C. Abela, J. Debattista, Towards ontology quality assessment, in: MEPDaW/LDQ@ESWC, 2017.
[15] B. D’Arcus, F. Giasson, Bibliographic ontology specification, 2009. URL: https://bibliontology.com/specification.html.