=Paper= {{Paper |id=Vol-3637/paper10 |storemode=property |title=Multilingual Labels for FoodOn |pdfUrl=https://ceur-ws.org/Vol-3637/paper10.pdf |volume=Vol-3637 |authors=Katherine Thornton,Kenneth Seals-Nutt,Mika Matsuzaki |dblpUrl=https://dblp.org/rec/conf/jowo/ThorntonSM23 }} ==Multilingual Labels for FoodOn== https://ceur-ws.org/Vol-3637/paper10.pdf
                                Multilingual Labels for FoodOn
                                Katherine Thornton1 , Kenneth Seals-Nutt2 and Mika Matsuzaki3
                                1
                                  WikiFCD Collaborative, Olympia, WA, USA
                                2
                                  WikiFCD Collaborative, New York, New York, USA
                                3
                                  Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, United States


                                                                         Abstract
                                                                         Data sets involving food frequently are of interest across multiple cultural contexts. Many ontologies and
                                                                         vocabularies related to food are monolingual. Enriching ontologies and vocabularies with multilingual
                                                                         data is useful for extending access to a resource to wider audiences. The WikiFCD knowledge base of
                                                                         FAIR food composition data includes mappings to Wikidata and mappings to FoodOn, a widely-used
                                                                         ontology for food. In this paper we describe a sample of six food items from WikiFCD as the basis for an
                                                                         exploration of strategies for sourcing multilingual labels from projects of the Wikimedia Foundation. We
                                                                         present five subgraphs of data related to food items sourced from Wikidata, Wikipedias and WikiFCD.
                                                                         We describe the advantages and disadvantages of sourcing labels from each subgraph. Each subgraph
                                                                         can be quickly retrieved from the Wikidata Query Service SPARQL endpoint or the SPARQL endpoint
                                                                         of WikiFCD. These strategies could be adapted to other domains seeking to enrich their data with
                                                                         multilingual labels.

                                                                         Keywords
                                                                         Food Composition, Nutri-informatics, Wikibase, Wikidata




                                Introduction
                                WikiFCD is a knowledge base of food composition data. After importing more than three
                                hundred thousand food items and their associated food composition measurements from the
                                United States Department of Agriculture’s Food Data Central (FDC), we began to add data from
                                additional countries. We have added more than three thousand food items to the knowledge
                                base along with measurements of nutritional components of those foods from published sources
                                [1]. The purpose of this knowledge base is to provide web access to published data about
                                food composition related to foods that are not found in FDC. For example, researchers have
                                documented gaps in coverage of plant foods and foods from international cuisines in FDC [2, 3].
                                WikiFCD contains food composition information for many plant foods that are not included
                                in FDC. Plant foods contain phytonutrients that support human health [4, 5, 6]. The nutrients
                                found in these minimally-processed plant foods are within the range effectively utilized by

                                IFOW 2023: Integrated Food Ontology Workshop, 9th Joint Ontology Workshops (JOWO 2023), co-located with FOIS
                                2023, 19-20 July, 2023, Sherbrooke, Québec, Canada
                                ∗
                                    Corresponding author.
                                †
                                     These authors contributed equally.
                                Envelope-Open katherine.thornton@yale.edu (K. Thornton); kenneth@seals-nutt.com (K. Seals-Nutt); mmatsuz2@jhu.edu
                                (M. Matsuzaki)
                                Orcid 0000-0002-4499-0451 (K. Thornton); 0000-0002-5926-9245 (K. Seals-Nutt); 0000-0002-7020-3757 (M. Matsuzaki)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                    CEUR
                                    Workshop
                                    Proceedings
                                                  http://ceur-ws.org
                                                  ISSN 1613-0073
                                                                       CEUR Workshop Proceedings (CEUR-WS.org)




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
human metabolism, meaning we can digest and make use of these nutrients more easily [7].
Providing the food composition data for these foods allows people who track their food intake
to get a more accurate estimate of their nutritional intake.
   We created this knowledge base using Wikibase, and it is now part of the ecosystem of
Wikibases related to Wikidata. The ecosystem of Wikibases includes all instances of Wikibase
that can be federated with the Wikidata Query Service SPARQL endpoint [8]. Wikibase is
software infrastructure for creating knowledge bases [9]. An advantage to creating a knowledge
base is that it is possible to connect the data in the knowledge base with other data on the web.
We created mappings to Wikidata for some classes of items and for some properties in WikiFCD.
These mappings allow us to combine our data with subsets of data from Wikidata, increasing
the complexity and types of questions we can ask of our data. We have integrated the FoodOn
ontology into WikiFCD so that other projects that make use of FoodOn can also make use of
data from WikiFCD [10].
   The FoodOn mappings in WikiFCD allow us to connect these food items to their corresponding
food items in Wikidata. This connection, stored in WikiFCD, opens up several pathways to
source additional information about these food items. We can think of each of the potential
graphs of food-related data in Wikidata as Wikidata subsets [11]. In this paper we describe
five subsets of data from WikiFCD and Wikidata that can be used to source labels for these
food items in languages other than English. Multilingual labels can then be used for search, in
application development, or for other purposes. For projects that make use of FoodOn, this data
set of food items could be an option for sourcing multilingual labels to expand the audiences
for other projects.
   Data in Wikidata and WikiFCD are FAIR data. The FORCE 11 community published the
FAIR data principles in 2014 to promote data publishing practices that would support open
science and open access [12]. FAIR is an acronym for Findable, Accessible, Interoperable and
Reusable, four qualities of published data that promote data sharing. By populating WikiFCD
with data from published sources, we made food composition data findable and accessible on
the web via the WikiFCD SPARQL endpoint1 . Every food item in Wikidata and in WikiFCD
has a QID, which is a Unique Resource Identifier (URI). This means that every piece of data in
our dataset has a machine-actionable URI. The mapping statements we added to our food items
that connect them to Wikidata serve as our bridges with the web of linked open data, making
our data interoperable with many additional datasets. Data in Wikidata and in WikiFCD are
available under the terms of the Creative Commons Zero license2 . Our selection of CC0 as the
license means that anyone can freely reuse the data. These aspects of publishing data in the
Wikidata and WikiFCD knowledge bases fulfills the most complete degree of FAIRness, level F,
“FAIR data, Open Access, Functionally Linked”, as described in [13]. We welcome others who
are interested in reusing data from WikiFCD.




1
    https://wikifcd.wikibase.cloud/query/
2
    https://creativecommons.org/share-your-work/public-domain/cc0/
1. The WikiFCD Knowledge Base
We implemented the WikiFCD knowledge base using Wikibase, an extension of the MediaWiki
software that was created for the Wikidata knowledge base3 . We include several subsets of
data from Wikidata in WikiFCD in order to establish useful mappings between the WikiFCD
system and Wikidata itself. These mappings allow us to ask questions of the WikiFCD dataset
in combination with data from Wikidata, increasing the breadth and complexity of the queries.
   Each item in WikiFCD has a unique identifier, or Qid. The Qid for this food item is Q135853.
Most of the statements on the food items in WikiFCD provide information about the amount of
a nutrient in a 100 gram sample of the food item. Each statement also has room for a reference.
References record the source for the statement. In WikiFCD, most reference are for a food
composition table.
   Researchers have already created knowledge graphs related to food. Fore example, FoodKG
is a knowledge graph of food data [14]. WikiFCD centers food composition data for plant
foods, specifically unprocessed or minimally processed plant foods. One difference between
WikiFCD and FoodKG is that FoodKG does not include mappings to Wikidata. When Food KG
was created there were not many food items in Wikidata. Another difference between Food KG
and WikiFCD is that WikiFCD contains multilingual data.
   Open Food Facts is a database of food composition data that is open to community data
contribution and curation [15]. Open Food Facts makes use of the UPC codes on food packag-
ing. While WikiFCD does contain data related to some packaged foods, we emphasize fruits,
vegetables, grains and other foods that are often sold without packaging.


2. Identifying Food Items in Wikidata
We used several strategies to identify candidate queries that we would use to extract a subset of
food items from Wikidata. Our aim was to find a subset of food items to match with FoodOn
identifiers. We wrote a SPARQL query for food items in Wikidata with a corresponding article
in English Wikipedia. We thought that the fact that Wikipedians had already written an article
about the food might be an indicator of the popularity of the food. While this was a useful set
of food items, we had to remove food brands from the query. We rejected the article query
results as unsuitable and tried another approach. We wrote a SPARQL query for food items
in Wikidata with a statement using Property ‘‘USDA NDB number” which is an identifier for
a food item in the United States Department of Agriculture National Nutrient Database. Due
to the fact that this identifier would only be found on food items in the Wikidata knowledge
base, and would not include food brands, we determined that this would be a more suitable
subset of food items for matching with food items from FoodOn. Our subgraph, Wikidata food
items with a USDA NDB identifier, consists of one thousand three hundred and ninety-five food
items. Due to the fact that some of these food items have multiple USDA NDB identifiers, the
subgraph contains five hundred thirteen unique food items. This seemed like a suitable corpus
of food items for which we could identify FoodOn term matches.


3
    https://wikiba.se/
Figure 1: Layout of result data produced by LexMapr


3. Matching Food Items to FoodOn
When an organization wants to integrate FoodOn with their data, they need to map foods
from their data to FoodOn identifiers. Researchers have found the LexMapr application to
provide useful matches for food items and FoodOn identifiers [16, 17, 18]. LexMapr is a Django
application that accepts CSV files as input and returns a file with information about matches
and potential matches of submitted food labels with FoodOn identifiers. We used the LexMapr
application to provide candidate matches between food items from our NDB number subset of
Wikidata and FoodOn identifiers4 . LexMapr is a Python-based Django application that reads a
list of food item labels and then returns the following columns: Sample_Desc, Processed_Sample,
Processed_Sample (With Scientific Name), Matched_Components, Match_Status(Macro Level),
and Third Party Classification, as seen in Figure 1.
   For some of the food items LexMapr generated full matches. For those items for which
LexMapr generated component matches, we manually reviewed the results. Some of the
component matches turned out to contain matches and some did not.


4. Adding Food Items Subset from Wikidata to WikiFCD
We wrote a bot script using WikidataIntegrator, a Python library, to create new food items
in WikiFCD for the food items in the Wikidata subset. Numerous research groups use Wiki-
dataIntegrator to add data to Wikidata and to Wikibases [19]. We designed the bot script to add
statements with links back to the corresponding food items in Wikidata. This set of mappings is
useful for writing SPARQL queries to ask questions about the data in WikiFCD in combination
with the data from Wikidata.


5. Leveraging Multilingual Content in Wikimedia Projects
FoodOn provides food names in English as well as alternative names in English. The FoodOn
curation team would like to include multilingual content in order to make FoodOn more useful
for people working in other linguistic contexts. Multilingual data is expensive to create and
difficult to source. The purpose of this paper is to compare and contrast five strategies for
extracting multilingual labels for food items from projects of the Wikimedia Foundation. The
4
    https://lexmapr.cidgoh.ca/user-guide/
mappings that we created between items in WikiFCD and Wikidata allow us to write federated
SPARQL queries that combine data from WikiFCD with data from Wikidata. Wikidata is a
multilingual knowledge base with support for more than three hundred human languages
[20]. We identify five graphs of multilingual data from the Wikidata knowledge base that
the FoodOn curation team may evaluate for suitability for integration into FoodOn. The five
graphs are structurally distinct in that they are modeled differently in the knowledge base, and
partially overlapping. In this way the subgraphs can be compared with one another to provide
an agreement score for various foods.
   In WikiFCD we curate multilingual labels for food items when we find them in food composi-
tion table sources. An advantage of this approach is that all multilingual labels have provenance
information and include, at the statement level, references to the source from which we gathered
the data. Using our subgraph of food items in WikiFCD for which we have created mappings
to Wikidata, we then write SPARQL queries to extract multilingual labels for the food items
from Wikidata. An advantage to writing these SPARQL queries is that we can run them again
in the future and check if additional data has been added to Wikidata. With more than twelve
thousand active editors each month contributing to Wikidata, we anticipate that the number of
multilingual labels in Wikidata will increase over time.

5.1. Article Names per Language Version of Wikipedia
The Wikidata community has created items in Wikidata for each of the articles across the
different language versions of Wikipedia [21]. There are currently more than three hundred
active Wikipedias [22]. Wikidata contains information about mappings between items and the
sitelinks to all of the corresponding articles across the different language versions of Wikipedia.
If we look at a food item such as kale, the Wikidata item for which is Q45989, we can use a
SPARQL query on the Wikidata Query Service to find links to all Wikipedias that have an article
about kale. Currently sixty-two Wikipedias have an article about kale. Each of these Wikipedia
articles has a title, and from these article titles, we can get a sense of what kale is called in these
different languages.

5.2. Multilingual Labels from Wikidata Food Items
Another option for finding multilingual data is to consult the labels on Wikidata items. The
Wikidata data model was designed with multilingual content in mind [21]. More than three
hundred human languages are supported in Wikidata [20]. If we return to the example of kale,
the Wikidata item for kale currently has labels in seventy-nine languages, a sample of which can
be seen in Figure 2. This means that there are seventeen additional labels beyond the number
of articles across the different Wikipedia language versions. We can compare these labels with
those we found from the Wikipedias to check for consistency and accuracy.

5.3. Common Names from Taxon Items in Wikidata
Wikidata has a property “Taxon common name” (P1843) that can be used on taxon items to list
common names for the organism. The common names listed using this property are another
source of multilingual labels for food items. The property has a required qualifier which indicates
Figure 2: Wikidata item for ‘kale’ showing some of the labels in a variety of languages




Figure 3: Using the Ordia application to search for ’oregano’ in Wikidata’s L namespace


that the language of the common name must also be provided in statements using this property.
In this way we not only know the label, but also the language in which it is found.

5.4. Wikidata Lexemes
Wikidata introduced support for creating and editing lexemes in 2018 [23]. The Wikidata
community creates lexemes, forms and senses in the L namespace [24]. Editors have already
added more than half a million lexical entries [25]. The words described in the L namespace
include words related to food. A fourth pathway for sourcing multilingual labels related to
foods is to leverage Wikidata’s lexeme data.
   Editors use the property ‘item for this sense’ (P5137) to connect senses to items in Wikidata.
In Figure 3 we see the result of searching for ‘oregano’ in the Ordia application [26]. In Figure 4
we see the lexeme ‘oregano’ (L324776 ) in English. Under the first sense, which has the identifier
‘L324776-S1’, we see a statement that uses ‘item for this sense’ (P5137) connecting the sense to
Figure 4: Lexeme L324776 with L324776-S1 and the statement that connects to Q92581152, the Wikidata
item for ’oregano’


the Wikidata item for oregano.


6. Food Items and Multilingual Labels
We selected six food items as a demonstration data set: purslane, kale, apple, rice, oregano and
chocolate. We wrote SPARQL queries to describe subsets of Wikidata or WikiFCD for each
technique for sourcing multilingual labels described above. This sample allows us to compare
and contrast the suitability of each subgraph of multilingual labels, and to identify advantages
and disadvantages of the different techniques. We also created a web application that allows
users to browse the dataset, available here5 .
                             Table 1: Count of labels from 3 Wikidata subgraphs
              food item      Count WD labels Count WP articles Count Lexeme Senses
              purslane                 2              40                    2
               oregano               25               95                    3
                 kale                 64               62                   4
                apple                142              182                  37
                 rice                130              179                  39
              chocolate              118              144                  24


   In Table 1 we list the count of labels that we found in three of the subgraphs: Wikidata labels,
titles of articles in different language versions of Wikipedia, and lexeme senses. For the Wikidata
labels subgraphs we find that apple, rice, and chocolate have more labels than kale, oregano or
purslane. This distribution is likely due to the popularity of these foods. We anticipate that over
time more labels will be added for all of these food items. It is interesting that Wikidata editors
have not yet added labels from some of these language versions of Wikipedia to Wikidata.


5
    https://wikifcd.k2.services/multi-lingual-table
   In terms of the historical layers of information in Wikidata, creating items for all Wikipedia
articles and adding sitelinks between Wikipedias and Wikidata was a very early step in adding
data to Wikidata [21]. The order in which data layers were added to Wikidata is likely a factor
in why there are fewer lexeme senses that have been connected to Wikidata items. The Lexeme
namespace was added to Wikidata in 2018, six years after Wikidata launched, so editors have
had less time to contribute data [23].
     Table 2: Count of labels from taxon and taxon common name subgraph of Wikidata
           taxon name         Count taxon common name Count distinct languages
        Portulaca oleracea                  172                             57
        Origanum vulgare                    148                             59
        Brassica oleracea                    22                             15
         Malus domestica                     41                             41
          Oryza sativa                       17                              4
        Theobroma cacao                      5                               2

   In Table 2 we see counts for the number of statements using the property “taxon common
name” on items for the taxa related to the food items from our sample set. The taxon to food
item relationships are: Portulaca oleracea / purslane, Origanum vulgare / oregano, Brassica
oleracea / kale, Malus domestica / apple, Oryza sativa / rice, and Theobroma cacao / chocolate.
We note that the relationships between taxa and food items are not consistently modeled in
Wikidata. For example, Brassica oleracea encompasses many cultivars of food items. We were
not able to find a closer match for kale in Wikidata at the time of writing.

6.1. Wikidata Labels Subgraph
An advantage of sourcing translations from the subgraph that consists of Wikidata food items
and their labels is that many editors see these labels. It is likely that this subgraph will grow the
most quickly out of the five discussed in this paper in the near term, due to the ease of adding
labels to the Wikidata knowledge base. Adding labels in additional languages is work that some
editors monitor [20]. Labels, descriptions and aliases are parts of Wikidata items that receive
frequent editor attention [27]. A disadvantage of sourcing translations from this subgraph is
that labels do not have references, thus we do not know the provenance of the labels. There
is some inconsistency in the subgraph of food items that they are sometimes confused with
taxon items. For example, in Figure 5, we see that the Wikidata item for ‘kale’ is both a food
item (vegetable) as well as the taxon Brassica oleracea var. sabellica. It would be preferable to
have two separate Wikidata items, one for the food item and one for the taxon as is the case for
most food items.
   While there are two taxon common names listed the Wikidata item for ‘kale’ (Q45989), there
are more taxon common names related to ‘kale’ listed on the Wikidata item for ‘Brassica oleracea’
(Q146212), thus we chose to use the item ‘Brassica oleracea’ for the count of multilingual labels
in Table 2.
   A disadvantage of sourcing translations from the subgraph that consists of Wikidata food
items and their labels are that there are no references for labels. Labels, descriptions, and aliases
Figure 5: Wikidata item for ‘kale’ is both a taxon and a vegetable


are structured differently than statements on items in Wikidata. So if we are curious about a
particular label, we can’t turn to a reference for more information. The inconsistency in data
modeling practices across editors results in some food items to also describe taxa is another
disadvantage. There should be separate Wikidata items for food items and taxa. This will likely
be cleaned up by the Wikidata community, but it will require time. Looking at the labels for
‘apple’ from this subgraph, we find labels in Faroese, ‘súrepli’, and Navaho, ‘Bilasáana’ among
the 142 labels available in Wikidata.

6.2. Article Titles from Sitelinks to Wikipedias
Sourcing labels from the subgraph of food items and their sitelinks to different language versions
of Wikipedia involves the connections between Wikidata items and Wikipedia articles, if the
food item is well-known, there may be articles in many different language versions of Wikipedia,
as seen in Figure 6. Using this subgraph has the advantage of precise food item matches to
articles about those foods, thus the article title is usually a reliable source for multilingual data.
Article titles are high-visibility in that they are frequently seen by readers, and thus errors are
corrected more quickly. Sourcing labels from the subgraph of food items and their sitelinks
with different language versions of Wikipedia has the disadvantage that new article titles get
added when new articles are written, which is not a fast process. This contrasts with the ease
of adding additional labels to Wikidata. Rather than typing a string and pressing save to add a
label in Wikidata, more effort is required for an editor to create a new article in Wikipedia, the
title of which would then become an additional multilingual label candidate.

6.3. Taxon Common Name Subgraph
Some of the advantages of sourcing translations from the subgraph that consists of Wikidata
taxon items and their taxon common names include the fact that editors can contribute references
on these statements, and that this subgraph is likely to grow over time. As of October, 2022
Figure 6: Screenshot of the visualization of labels for ‘apple’ from article titles from Wikipedia language
versions in our webapp available at https://wikifcd.k2.services/multi-lingual-table.


there are more than 780,000 uses of P1843 ‘taxon common name’.
   Sourcing labels from the subgraph of Wikidata taxon items and their taxon common names
has the disadvantage that sometimes when multiple common names are provided per language,
it will require manual review to determine which of this would be appropriate for consideration
from FoodOn. In the cases where multiple common names are provided for a language, it is
not always clear if the labels provided all refer to the same species. For example, in Figure 7
we see that several common names are listed on the Wikidata item for Origanum vulgare are
from Spanish. Without further research it is not clear which of these might be the best fit for
FoodOn. Some of them could even be for specific subspecies of Origanum vulgare, and may
need to be moved to the subspecies items as they are added to Wikidata.
   We anticipate that more Wikidata editors will continue to contribute taxon common name
statements, and that this subgraph will continue to grow over time.

6.4. Lexeme Senses Subgraph
The advantages of sourcing multilingual labels from the subgraph of Wikidata Lexeme Senses
that connect to food items include the fact that there is room for references on these statements,
and that the subgraph is likely to grow as more editors contribute to the L namespace. In Figure
8, we can see that there are a small number of labels that we can source for ’apple’ from the
L namespace. The Wikidata community creates Lexeme challenges periodically to encourage
participation in the L namespace. For example, a recent challenge related to vegetables resulted
in the addition of more of the connections between food items and lexeme senses 6 . With the
creation of the Abstract Wikipedia community, we believe it is likely that more editors will be

6
    https://dicare.toolforge.org/lexemes/challenge.php?id=64
Figure 7: Multiple common names from Spanish listed for Origanum vulgare




Figure 8: Labels for apple from the Lexeme namespace in Wikidata


drawn to contribute to the L namespace so that more data from the lexeme namespace will be
available for reuse by Abstract Wikipedia [28].

6.5. WikiFCD Common Names Subgraph
In WikiFCD we created a property, “common name” (P76), that we use to record common names
in multiple languages. We add these statements to WikiFCD whenever additional food item
names are provided in the source food composition tables. An advantage of sourcing labels
from WikiFCD is that all common name statements have a reference back to a publication.
For example, one of the food composition tables we integrated into WikiFCD is the SMILING
Food composition table for Vietnam. Food item labels are provided by the team of authors who
prepared the food composition measurements for each food item. In this food composition table
both the food item name in Vietnamese and the food item name in English are provided.
   Currently there are five hundred sixty three food items in this subgraph. A disadvantage of
sourcing labels from WikiFCD is that the corpus grows more slowly than the Wikidata subgraphs
because a smaller number of people contribute to WikiFCD than Wikidata. We anticipate that
as more people contribute data to WikiFCD this subgraph will grow more quickly. We chose to
exclude some of the labels that the SPARQL query for common names from WikiFCD returned
because they were the names of dishes rather than food items. Some food composition tables
include data for dishes of multiple ingredients as well as individual food items in their datasets.
We also found that there were differences in practices among the authors of food composition
tables. For example, some authors provide taxon information at the species level and some
provide taxon information at the varietal or subspecies levels.


7. Creating a Dataset of Multilingual Labels for Food Items
Anyone can reuse data from WikiFCD or Wikidata for any purpose. One or more of the
subgraphs described in this paper could be reused as a source of labels for any food-related
application that requires multilingual access terms for food items. Additionally these subsets
could be periodically monitored for updates from the communities. We anticipate that all of
these subgraphs will be extended by editors adding new information to these wikis. As Wikidata
editors notice errors, they will address them and improve them [29].
   We explored five distinct subgraphs in order to understand which of these would return
relevant label candidates. Another advantage of collecting labels from multiple subgraphs is
that we can cross-check labels with the results of another subgraph. For example, if we look at
the leafy green called ‘kale’ in English, for the Wikipedia article subset we find that the German
language article is titled ‘Grünkohl’. The Wikidata item has the German label ‘Grünkohl’. Seeing
the same label in different subsets increases our confidence that the label is accurate.
   Each of the food items in our sample data set had dozens of distinct language labels across
the sets of relevant subgraphs. In total we counted 173 distinct languages with labels for ‘apple’,
134 distinct languages with labels for ‘chocolate’, 68 distinct languages with labels for ‘kale’,
82 distinct languages with a label for ‘oregano’, 65 distinct languages with labels for ‘purslane’
and 151 distinct languages with labels for ‘rice’. Currently the labels in Wikidata are unevenly
distributed across the supported languages. We anticipate that as more editors join the Wikidata
community they will contribute many more labels in additional languages. Reusing multilingual
content from Wikidata in open scientific projects will increase the accessibility of data produced.
For domains such as food and human nutrition, multilingual data can be shared more widely
which could impact more people.
8. Creating Wikibases for Interoperability
We chose to create WikiFCD so that we could design a data model to accommodate food
composition data sourced from a diverse range of published sources. Our decision to use
Wikibase allowed us to provide web-based access to WikiFCD, meaning anyone can find this
data online. We had an explicit data model in mind, inspired by the structure of the published
food composition tables we drew from as our data sources. The data in WikiFCD are not all
appropriate for Wikidata. By connecting some items and some properties in WikiFCD to their
Wikidata correlates through mapping statements, we are able to treat the two resources as if
they were a single knowledge base through writing federated SPARQL queries. The SPARQL
endpoint for WikiFCD supports federated queries that allow us to combine data from WikiFCD
with data from Wikidata.
  In the future, if certain subsets of WikiFCD are of interest to the Wikidata community, we
are well-positioned to quickly contribute data to Wikidata itself. We will be able to leverage
the mappings that we created between food items with FoodOn identifiers and appropriate
Wikidata items. As more and more people create Wikibase instances for specialized data, we all
have more data to combine with different subsets of Wikidata through federated queries.


9. Conclusion
The five subgraphs we describe in this paper were created by different communities of editors
across multiple projects of the Wikimedia Foundation and the ecosystem of Wikibases. Editors
add content to different language versions of Wikipedia, to Wikidata, and to WikiFCD. As
more people contribute multilingual content, all of these projects become more accessible to
additional language communities.
   Some decisions about the suitability of sourcing labels from any combination of the subgraphs
described in this paper will require manual review. In some cases there will be multiple candidate
labels per languages, and a curator will need to evaluate them for suitability. The breadth of
languages covered by these label subgraphs and the fact that the data is free to reuse make this
an attractive source of multilingual content.
   Enriching WikiFCD with multilingual labels is a priority for our project because we want to
be sure that FAIR food composition data for a broad range of foods from diverse cuisines are
easily accessible on the web at no cost. As nutrition plays a part in maintaining health, people
who need data about foods that are not found in popular sources like Food Data Central will
be able to find food composition data in WikiFCD. We created an interactive webapp for our
sample dataset so that others can quickly compare the availability of labels across the subsets
we discussed7 .
   Developers of applications, ontologies, vocabularies, and other resources may need a free-to-
reuse source for multilingual content. As a multilingual knowledge base, Wikidata contains
labels in more than three hundred human languages. This means that anyone who needs to
source multilingual labels for words could explore Wikidata to see how complete the current
label inventory is. Depending on the domain, one or more of these subgraphs may contain

7
    https://wikifcd.k2.services/multi-lingual-table
enough multilingual labels to increase coverage for a specific project or use case. The purpose
of demonstrating five distinct subgraphs in this paper is to emphasize that there is multilingual
content in different layers of the projects of the Wikimedia Foundation.


Acknowledgments
We thank the Joint Food Ontology Working Group for productive discussions about FoodOn.
We thank the Wikidata community for continuing to improve the Wikidata knowledge base.


References
 [1] K. Thornton, K. Seals-Nutt, M. Matsuzaki, Introducing wikifcd: Many food composition
     tables in a single knowledge base, in: CEUR Workshop Proceedings, volume 2969, CEUR-
     WS, 2021.
 [2] Tidball, Tidball, Curtis, The absence of wild game and fish species from the usda national
     nutrient database for standard reference: Addressing information gaps in wild caught
     foods 53 (2014). doi:1 0 . 1 0 8 0 / 0 3 6 7 0 2 4 4 . 2 0 1 3 . 7 9 2 0 7 7 .
 [3] A. Durazzo, E. Camilli, S. Marconi, S. Lisciani, P. Gabrielli, L. Gambelli, A. Aguzzi, M. Lu-
     carini, J. Kiefer, L. Marletta, Nutritional composition and dietary intake of composite
     dishes traditionally consumed in italy, Journal of Food Composition and Analysis 77 (2019)
     115–124.
 [4] N. Monjotin, M. J. Amiot, J. Fleurentin, J. M. Morel, S. Raynal, Clinical evidence of
     the benefits of phytonutrients in human healthcare, Nutrients 14 (2022). URL: https:
     //www.mdpi.com/2072-6643/14/9/1712. doi:1 0 . 3 3 9 0 / n u 1 4 0 9 1 7 1 2 .
 [5] J. Gibbs, F. P. Cappuccio, Plant-based dietary patterns for human and planetary
     health, Nutrients 14 (2022). URL: https://www.mdpi.com/2072-6643/14/8/1614. doi:1 0 .
     3390/nu14081614.
 [6] H. Mechchate, A. El Allam, N. El Omari, N. El Hachlafi, M. A. Shariati, P. Wilairatana,
     M. S. Mubarak, A. Bouyahya, Vegetables and their bioactive compounds as anti-aging
     drugs, Molecules 27 (2022). URL: https://www.mdpi.com/1420-3049/27/7/2316. doi:1 0 .
     3390/molecules27072316.
 [7] G. Menichetti, A.-L. Barabási, Nutrient concentrations in food display universal behaviour,
     Nature Food 3 (2022) 375–382.
 [8] L. Zhou, C. Shimizu, P. Hitzler, A. M. Sheill, S. G. Estrecha, C. Foley, D. Tarr, D. Rehberger,
     The enslaved dataset: A real-world complex ontology alignment benchmark using wikibase,
     in: Proceedings of the 29th ACM International Conference on Information & Knowledge
     Management, 2020, pp. 3197–3204.
 [9] D. Diefenbach, M. D. Wilde, S. Alipio, Wikibase as an infrastructure for knowledge graphs:
     The eu knowledge graph, in: International Semantic Web Conference, Springer, 2021, pp.
     631–647.
[10] K. Thornton, K. Seals-Nutt, M. Matsuzaki, D. Damion, Reuse of the foodon ontology in a
     knowledge base of food composition data, Semantic Web Journal (2023).
[11] J. E. Labra-Gayo, A. Ammar, D. Brickley, D. F. Álvarez, A. G. Hevia, A. J. Gray, E. Prud’hom-
     meaux, D. Slater, H. Solbrig, S. A. H. Beghaeiraveri, et al., Knowledge graphs and wikidata
     subsetting, BioHackathon Europe 2020 (2021). URL: https://biohackrxiv.org/wu9et/.
[12] ”FORCE11”, The fair data principles (2014). https://www.force11.org/group/fairgroup/
     fairprinciples.
[13] B. Mons, C. Neylon, J. Velterop, M. Dumontier, L. O. B. da Silva Santos, M. D. Wilkinson,
     Cloudy, increasingly fair; revisiting the fair data guiding principles for the european open
     science cloud, Information Services & Use (2017) 1–8.
[14] S. Haussmann, O. Seneviratne, Y. Chen, Y. Ne’eman, J. Codella, C.-H. Chen, D. L. McGuin-
     ness, M. J. Zaki, Foodkg: a semantics-driven knowledge graph for food recommendation,
     in: International Semantic Web Conference, Springer, 2019, pp. 146–162.
[15] E. Chazelas, M. Deschasaux, B. Srour, E. Kesse-Guyot, C. Julia, B. Alles, N. Druesne-
     Pecollo, P. Galan, S. Hercberg, P. Latino-Martel, et al., Food additives: distribution and
     co-occurrence in 126,000 food products of the french market, Scientific reports 10 (2020)
     1–15.
[16] M. Balkey, M. Batz, G. Gopinath, G. Gosal, E. Griffiths, H. Tate, R. Timme, (v) standardizing
     the isolation source metadata for the genomic epidemiology of foodborne pathogens using
     lexmapr, IAFP 2021 (2021).
[17] D. Dooley, L. Andres-Hernandez, G. Bordea, L. Carmody, D. Cavalieri, L. Chan,
     P. Castellano-Escuder, C. Lachat, F. Mougin, F. Vitali, et al., Obo foundry food ontol-
     ogy interconnectivity, in: CEUR Workshop Proceedings, volume 2969, 2021.
[18] J. Pires, J. S. Huisman, S. Bonhoeffer, T. P. Van Boeckel, Increase in antimicrobial resistance
     in escherichia coli in food animals between 1980 and 2018 assessed using genomes from
     public databases, Journal of Antimicrobial Chemotherapy 77 (2022) 646–655.
[19] A. Waagmeester, G. Stupp, S. Burgstaller-Muehlbacher, B. M. Good, M. Griffith, O. L.
     Griffith, K. Hanspers, H. Hermjakob, T. S. Hudson, K. Hybiske, S. M. Keating, M. Manske,
     M. Mayers, D. Mietchen, E. Mitraka, A. R. Pico, T. Putman, A. Timothy, N. Queralt-Rosinach,
     L. M. Schriml, T. Shafee, D. Slenter, R. Stephan, K. Thornton, G. Tsueng, R. Tu, S. Ul-Hasan,
     E. Willighagen, C. Wu, A. I. Su, Wikidata as a knowledge graph for the life sciences, Elife
     9 (2020) e52614. URL: https://doi.org/10.7554/ELIFE.52614.
[20] L.-A. Kaffee, A. Piscopo, P. Vougiouklis, E. Simperl, L. Carr, L. Pintscher, A Glimpse into
     Babel: An Analysis of Multilinguality in Wikidata, in: Proceedings of the 13th International
     Symposium on Open Collaboration, OpenSym ’17, ACM, New York, NY, USA, 2017, pp.
     14:1–14:5. URL: https://doi.org/10.1145/3125433.3125465. doi:1 0 . 1 1 4 5 / 3 1 2 5 4 3 3 . 3 1 2 5 4 6 5 .
[21] D. Vrandečić, Wikidata: A new platform for collaborative data collection, in: Proceedings
     of the 21st International Conference Companion on World Wide Web, ACM, 2012, pp.
     1063–1064.
[22] Meta, List of wikipedias meta, discussion about wikimedia projects, 2022. URL: https:
     //meta.wikimedia.org/w/index.php?title=List_of_Wikipedias&oldid=23800107, [Online;
     accessed 8-October-2022].
[23] B. Cartoni, D. C. Aros, D. Vrandečić, S. Lertpradit, Introducing lexical masks: a new repre-
     sentation of lexical entries for better evaluation and exchange of lexicons, in: Proceedings
     of the 12th Language Resources and Evaluation Conference, 2020, pp. 3046–3052.
[24] F. Nielsen, Lexemes in wikidata: 2020 status, in: Proceedings of the 7th Workshop on
     Linked Data in Linguistics (LDL-2020), 2020, pp. 82–86.
[25] Ordia, Ordia statistics, 2022. URL: https://ordia.toolforge.org/statistics/, [Online; accessed
     13-October-2022].
[26] F. Å. Nielsen, Ordia: A web application for wikidata lexemes, in: European Semantic Web
     Conference, Springer, 2019, pp. 141–146.
[27] T. Pellissier Tanon, L.-A. Kaffee, Property label stability in wikidata: evolution and
     convergence of schemas in collaborative knowledge bases, in: Companion Proceedings of
     the The Web Conference 2018, 2018, pp. 1801–1803.
[28] D. Vrandečić, Building a multilingual wikipedia, Communications of the ACM 64 (2021)
     38–41.
[29] K. Shenoy, F. Ilievski, D. Garijo, D. Schwabe, P. Szekely, A study of the quality of wikidata,
     Journal of Web Semantics 72 (2022) 100679. URL: https://www.sciencedirect.com/science/
     article/pii/S1570826821000536. doi:h t t p s : / / d o i . o r g / 1 0 . 1 0 1 6 / j . w e b s e m . 2 0 2 1 . 1 0 0 6 7 9 .