=Paper=
{{Paper
|id=Vol-2969/paper4-IFOW
|storemode=property
|title=Introducing WikiFCD: Many Food Composition Tables in a Single Knowledge Base
|pdfUrl=https://ceur-ws.org/Vol-2969/paper4-IFOW.pdf
|volume=Vol-2969
|authors=Katherine Thornton,Kenneth Seals-Nutt,Mika Matsuzaki
|dblpUrl=https://dblp.org/rec/conf/jowo/ThorntonSM21
}}
==Introducing WikiFCD: Many Food Composition Tables in a Single Knowledge Base==
<pdf width="1500px">https://ceur-ws.org/Vol-2969/paper4-IFOW.pdf</pdf>
<pre>
Introducing WikiFCD: Many Food Composition
Tables in a Single Knowledge Base
Katherine Thornton1 , Kenneth Seals-Nutt2 and Mika Matsuzaki3
1
  WikiFCD Collaborative, Olympia, WA, USA
2
  WikiFCD Collaborative, New York, New York, USA
3
  Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, United States


                                         Abstract
                                         We introduce WikiFCD, a knowledge base of structured food composition data. This knowledge base is
                                         designed to accommodate data from different regions of the world, it is multi-lingual, and it supports open
                                         participation of any interested editor. We used Wikibase to store data from multiple food composition
                                         tables (FCTs). We mapped relevant classes of data to corresponding entities in the Wikidata knowledge
                                         base in order to support querying of food composition data alongside data about chemical compounds,
                                         metabolites, biological pathways, and data about human genes. We also make use of FoodOn to provide
                                         identifiers for food items. This knowledge base contains a growing number of FCTs that provide coverage
                                         of a broad range of cuisines and food traditions. Reusing data from this knowledge base can provide
                                         greater coverage of foods for nutrient intake tools. This knowledge base will be useful for policy makers,
                                         epidemiologists, nutrition researchers, developers of food-related applications, and people interested in
                                         food tracking.

                                         Keywords
                                         food composition, knowledge base, Wikidata, Nutri-informatics


1. Introduction
Food is an essential part of our lives, providing energy and nutrients required for health.
Suboptimal diet contributes to one in five deaths globally, making dietary improvement one
of the highest priorities in global health [1]. Our ability to accurately represent and retrieve
information on food items has an unequivocal impact on the quality of prevention and treatment
strategies we develop for nutrition-related diseases. Food composition data (FCD) - a central
piece connecting foods to health - has a rich history and diverse datasets exist around the world.
And yet, the usability as well as the interoperability of these data vary greatly, with a large
disparity between high income countries (HIC) and low and middle income countries (LMIC).
This disparity has a grave implication for global health as the “triple” burden of malnutrition
due to deficiencies or excess in macro and micronutrients are ubiquitous. Accurate and detailed
information about the composition of the foods we eat are, more than ever, needed by policy

IFOW 2021: 2nd Integrated Food Ontology Workshop, held at JOWO 2021: Episode VII The Bolzano Summer of
Knowledge, September 11-18, 2021, Bolzano, Italy
Envelope-Open katherine.thornton@yale.edu (K. Thornton); kenneth@seals-nutt.com (K. Seals-Nutt); mmatsuz2@jhu.edu
(M. Matsuzaki)
Orcid 0000-0002-4499-0451 (K. Thornton); 0000-0002-5926-9245 (K. Seals-Nutt); 0000-0002-7020-3757 (M. Matsuzaki)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
makers, researchers, software developers, and consumers as the world faces the epidemic of
nutrition-related diseases.
   Even in HIC, many food items that are consumed by people every day cannot be considered
for dietary analyses despite the existence of nutrient data for these food items [2] . As a result,
individuals are often forced to use nutrient data from “similar” food items, which may or may
not actually have similar nutrient content. This is especially true for those who consume more
ethnic minority foods.
   FCD are currently fragmented, unevenly provisioned, and published in formats ill-suited
to the web. Nutrient content of fruits, vegetables, staples, meats, and dairy products can also
vary for the same item from different areas and times because of changing characteristics such
as climate and terroir. However, the current structure for most FCDs are not well-suited for
reflecting these changes. Importantly, even though there are also wide regional variations in
foods that are commonly consumed, some places lack access to regionally appropriate FCD,
up-to-date FCD, or FCD in their own languages, leading to disparities in data availability
and accessibility and ultimately, in scientific evidence in health research. Development and
maintenance of such databases are difficult if the contributors are limited to small, closed groups
of researchers and employees in this field.
   However, it is within our power to change this situation and ensure that accurate food
composition data are available for the long tail of food items. Our solution to this challenge
is to build a knowledge base of structured food composition data using a peer production
approach. Not only is this knowledge base designed to accommodate data from different regions
of the world, it is multi-lingual, and supports open participation of any interested editor. The
knowledge base is also designed so that data is available according to FAIR principles [3]. The
free software infrastructure used to power Wikidata - Wikibase - has enabled us to develop
a unified resource encompassing many different food composition tables. We are building
our knowledge base using Wikibase because it is optimized for both human and algorithmic
curation. Opening contribution to anyone interested in this effort will both allow our dataset to
grow and involve many more people to participate in the data curation, which, as shown in
the examples of Wikipedia and Open Street Map, could lead to a creation of a large, equitable
knowledge base.
   In addition to the sheer number of potential contributors to this project - Wikidata currently
has over 250,000 active users - there are other advantages of this peer production and Wikibase
based approach to traditional methods of FCD development. First, this Wikibase instance
substantially improves the usability of FCDs from different sources for diverse users - from
WikiProjects and Wikipedia editors and viewers to academic researchers to public health
workers. Researchers have found that the absence of culturally-diverse foods in apps such as
MyFitnessPal is a barrier to using them in research [4, 5, 6, 7].
   Building a structured dataset is also a key step in identifying most appropriate data to borrow
in resource-poor settings where up-to-date, detailed, and regionally appropriate FCD are not
readily available. This new database will also open up ways to explore new research questions
to explore more nuanced nutrition data (e.g. changes in nutrient content of the same product,
depending on the climate conditions of the year), which can potentially make substantial
advances in nutrition and health research.
   We created an instance of Wikibase for this project and designed our own data models
which are flexible enough to allow us to incorporate data from heterogeneous data sources.
Connecting this knowledge base with Wikidata allows us to combine this data with cross-domain
data related to micronutrients, chemical compounds, biological pathways, human genes, and
disease information using databases like the Human Metabolome Database and Wikipathways.
Connecting the data in this way allows us to ask questions about how food choices may impact
health in a broad range of ways.


2. Development of WikiFCD
2.1. Wikidata
Wikidata went live in late 2012 [8]. The infrastructure of Wikidata is collaboratively built via
commons-based peer production [9, 10, 11]. Commons-based peer production is the name given
to open collaboration systems where users are creating content under the agreement that all
content will remain in the public domain. This means that content created by the community
can be freely reused by others. The peer production aspect refers to how users coordinate work
themselves, rather than some members of the community organizing the work tasks of other
members. Wikidata is edited by volunteers from all over the world in more than 350 languages
[12].
   In addition to a free software infrastructure, the Wikidata community also publishes all content
in the knowledge base under a Creative Commons Zero License. The Wikidata community makes
dumps of previous versions of the content of the knowledge base available. The infrastructure
of the Wikidata knowledge base is maintained by an international community of people. For
cultural heritage institutions who find the structured data in Wikidata useful for work flows,
this mans that there will be much less staff time necessary to design, build and maintain
infrastructure for this data. The data in WikiFCD complements the work of several active
communities curating data in Wikidata: the GeneWiki initiative [13, 14], the WikiPathways
community [15] The LOTUS initiative [16] and the Scholia project [17].
   Wikidata is a multilingual knowledge base, leveraging the concept mappings created through
years of conceptual alignment among the different language versions of Wikipedia [18]. This
means that more users will have access to data in their language, an important step in reducing
the dominance of the English language which disadvantages other linguistic communities.

2.2. Reusing Wikibase
We chose to create this knowledge graph of structured data published in a publicly-available
instance of the Wikibase platform, called WikiFCD1 . Wikibase is a set of extensions to the
MediaWiki software platform and is developed by the Wikimedia Foundation as free software.
Wikibase is a novel infrastructural platform for data management suitable for data from many
domains. This is the first application built on Wikibase tailored to the needs of the epidemiolog-
ical community. The output of this project will be a knowledge graph of structured data in the
form of a Wikibase instance populated with data from heterogeneous food composition tables.

    1
        https://wikifcd.wiki.opencura.com/wiki/Main_Page
   We reused three subsets from Wikidata to create some of the basic structure of our knowledge
base. Identifying and reusing subsets of Wikidata is still an emerging practice [19]. We wrote
SPARQL queries to identify all taxa with an identifier in the Germplasm Resources Information
Network (GRIN) in Wikidata. We then wrote a bot to populate these items to WikiFCD with
mappings back to the Wikidata item. The purpose of having these taxa in WikiFCD was to
be able to create statements about food items that are derived from a taxon. We did the same
process with the set of human languages and the set of countries/states. We did this so that
we could use language and country items in our statements about individual food composition
tables and to provide linguistic information about the common names of food items.
   We have systematically mapped data in WikiFCD to corresponding items and properties in
Wikidata itself. These mappings allow us to ask questions of both data sets and to make use
of the mappings between Wikidata and thousands of external data sources. These mappings
increase the breadth and complexity of data combinations we can create, using Wikidata as the
hub of connection. Multiple data visualization options are available via the Query Service of our
Wikibase instance. The Query Service is a SPARQL endpoint which supports querying the data
in the knowledge graph via the SPARQL query language. Graphs, charts, network diagrams,
and maps are some of the visualizations we will be able to offer end-users of this knowledge
base [20].
   To collect a list of food composition tables (FCTs) representative of international communities,
we consulted the resources described by the United Nations Food and Agriculture Organization
(FAO)2 . We worked from the FAO’s list of food composition tables to identify existing FCTs that
we could add to our Wikibase. We then found copies of these FCTs where possible. We then
extracted the data from these tables. The FCTs were originally published as CSV or as tabular
data encoded in a PDF.
   We populated WikiFCD with data from the USDA’s Food Data Central database. Food Data
Central has a set of APIs that can be used to access data. We wrote a client to collect data from
Food Data Central and then wrote a bot to populate WikiFCD with the data.
   We created a database model that can represent heterogeneous food composition tables. We
used this model to map multiple food composition tables so that we could then import them
into a Wikibase instance. We also support the addition of data sourced from the literature that
covers a single food. This is an advantage of our data model as well as our contribution model.
While other multi-country food composition data bases (FCDBs) combine national level FCTs
[21], we include foods that are not yet found in any country’s official FCT. We aim for broad
representation of food ways, striving to include food composition data for wild, foraged foods,
and less-commonly-eaten plant foods.
   Our alignment of food composition table data with Wikidata allows us to leverage the sum
of knowledge in the projects of the Wikimedia foundation. Because Wikimedia Commons, the
media repository of Wikimedia projects, has also been aligned with Wikidata, we will be able
to easily reuse images of food items, molecular structure models, and food dishes alongside
our projects. This query from our SPARQL endpoint3 lists all of the food items in our project
Wikibase that have an associated image in Wikimedia Commons.

    2
        http://www.fao.org/infoods/infoods/tables-and-databases/en/
    3
        https://tinyurl.com/y99qtk7p
   We used the wbstack platform to create an instance of Wikibase for testing4 . The wbstack
service provides a hosted version of Wikibase that users can load with their own data. Wikibase
is the software used to support Wikidata itself. In order to populate our system with data we
used a tool called WikidataIntegrator (WDI). WDI is a python library for interacting with data
from Wikidata [22]. WDI was created by the Su Lab of Scripps Research Institute and shared
under an open-source software license via GitHub5 . Using WDI as a framework, we wrote bots
to transfer data from FCTs to our Wikibase.
   The largest class of items in this system is that of f o o d i t e m . There are currently about
400,000 food items in the system. We have more than 300 properties in the system which we
use to describe the items. Examples of properties are Dietary Fiber (P11), and Fatty acids, total
saturated (P86).
   In order to group food items we assign identifiers from FoodOn. FoodOn is an ontology that
describes foods and the organisms from which they are derived [23]. By making use of the
FoodOn ontology we can bring together food items across diverse FCT sources. FoodOn reuses
the Composition Dietary Nutrition Ontology [24]. We plan to map our nutrient properties to
the relevant components of CDNO.
   After importing data from Food Data Central of the United States, we next imported data
from the Malawian Food Composition Table 2019 [25]. We used WikidataIntegrator to write a
bot to read data from a CSV table version of the FCT and write it to WikiFCD according to our
data model. We took several steps to prepare the data before ingest. We split out values from the
column “Food Item” and created three additional columns. We left the English language name
of the food item in the “Food Item” column. We created a new column “Taxon” to accommodate
the binomial names in italics that were previously in the “Food Item” column after the English-
language label. It was very helpful to see that the binomial names were included for so many
food items in this FCT. This information allows us to disambiguate food items. We then created
a column “Taxon ID” for the Qid of the taxon in our knowledge base. We created another new
column “Common name” for the name in parentheses for local names of these food items. The
reason we created separate columns was to prepare the file for use by our bot. For each of
these new columns the data is mapped to a separate property in our data model and our bot
will write different statements to WikiFCD using the appropriate properties. We also had to
remove the quotation marks, square brackets, and parentheses around some of the nutrient
values reported. We could not accommodate those characters in our knowledge base, so we
removed them before the bot run.
   Some creators of FCTs include references for the publications where they sourced their data.
These references are very useful for understanding how the FCT was compiled. These references
can be difficult for us to incorporate into our data model because Wikibase was designed to
accommodate references for each statement [26]. If we are unable to determine which values
were sourced from which publication, the we do not have clarity about which statement(s) on
which to put the reference. The Malawian 2019 FCT clearly indicated for each row of data
which reference was used to source data. We wrote additional bots for each FCT we ingested
into the WikiFCD system.

   4
       https://www.wbstack.com/
   5
       https://github.com/SuLab/WikidataIntegrator
2.3. Populating WikiFCD with Data
Data in WikiFCD is FAIR data. FAIR is a set of data principles [3]. By creating data that aligns
with the FAIR data principles, we ensure that this metadata is easy to find and easy to reuse.
Redundant, fragmented descriptions in siloed repositories are frustratingly incomplete. Many
governmental bodies and international consortia have endorsed the FAIR data principles as a
key aspect of their open science or open data initiatives [27]. FAIR is an acronym for findable,
accessible, interoperable and reusable. Food composition data in WikiFCD are findable in that
WikiFCD is available on the web and is openly accessible. The Qids assigned to WikiFCD items
are their unique, persistent identifiers.
   These metadata are accessible because the entity data associated with their unique ids (all
statements and references asserted about an item) are dereferencable via the HTTP protocol.
They are interoperable in that they link to many other databases and systems through the
Wikidata mappings which connect to external ids.
   These metadata are reusable due to the use of the CCO license for the content of WikiFCD.
Anyone can reuse WikiFCD data for any purpose. Publishing data in the WikiFCD knowledge
base fulfills the most complete degree of FAIRness, level F, “FAIR data, Open Access, Functionally
Linked”, as described in [27].
   We have so far curated data from the United States Department of Agriculture’s FoodData-
Central database, SMILING Indonesia, SMILING Vietnam, SMILING Thailand, SMILING Laos,
and Malawi. Our initial goal is to curate data from low and middle income countries in WikiFCD
with the aim to reduce the aforementioned data disparities in nutrition.

2.4. Mapping food items to FoodOn
FoodOn is an ontology for foods [23]. FoodOn reuses many food categories from LanguaL and
is developed according to the ontology principles of the OBO Foundary. We decided to reuse
FoodOn identifiers on our food items in WikiFCD in order to create a bridge between our food
composition data and the FoodOn ontology. We have mapped some of our food items to their
FoodOn identifiers manually as a test set. In the future we will be able to match some food
items in a semi-automated manner if we have data about the taxon from which the food item
is derived. Some food composition tables provide this information. If this information is not
provided in the FCT we will then map them manually.


3. Use Case
Even though we have only a small fraction of existing FCDs in the world, the benefit of the
creation of this Wikibase instance is apparent. We are able to query for values across all FCTs
in WikiFCD. For example we can query for a ranked list of foods that have the most to least
Docosahexaenoic acid (DHA) per 100 grams6 .
   We have also tested several federated queries that allow data from additional SPARQL end-
points to be included. For the subset of items that we have already mapped to FoodOn, we were
interested to know what metabolites are produced when humans consume these foods. We
   6
       https://tinyurl.com/y56qvvr6
wrote a federated query between WikiFCD and Wikidata to ask about the food items, FoodOn
ids, and taxa from which these foods are derived (facts stored in WikiFCD) with data about
metabolites available from Wikidata7 .
   We explored the reuse of information about biological pathways from Wikidata as well as the
supporting scientific literature from which the information was sourced by writing a federated
query between WikiFCD and Wikidata8 . The query asks for chemical compounds that are part
of a biological pathway in homo sapiens and the scientific articles that provide evidence.
   We can use Wikidata as a hub of identifiers that provide cross-references to additional
databases [28]. This means that once we have the Wikidata Qid for a resource, we find many
other identifiers for that resource from a broad range of other databases and information systems.
For example many chemical compounds have an external identifier for the Human Metabolome
Database (HMDB). We wrote a federated query for taxa listed in WikiFCD in which certain
chemical compounds are found along with the HMDB identifiers for those compounds. This
query allows us to connect food items that are derived from specific plants with a profile of
metabolites that are relevant for human health. The microbiome is recognized as playing a role
in health inequities [29]. Being able to combine these data is an important step in preparing
additional research.
   Items in Wikidata are connected to external databases or collections through the use of
properties that have the data type “external id”. More than half of all properties in Wikidata are
external id properties. Connecting Wikidata items to other resources in this way is a powerful
feature allowing us to fulfill the promise of linked open data [30]. By following external id links,
users can discover more information about the item of interest. We prioritize connecting to
multiple external projects in our curation activities. As more external identifiers are published
to the Wikidata knowledge base it grows in prominence as a cross-switch for identifiers and
vocabularies [31]. Wikidata is becoming a hub of persistent identifiers [28]. As users contribute
additional data to Wikidata it will become even more valuable.


4. Discussion
4.1. Lessons Learned
Through this development of the pilot project, we have learned several valuable lessons in
creating a global FCD. Chan et al. detail the importance of standardizing nutrition data [32].
Our experience importing FCTs into WikiFCD have illustrated how the lack of a standardized
template for food composition tables impedes data interoperability. We encourage future
creators of FCTs to use INFOODS tag names [33, 34]. Currently we need to develop a unique bot
for each FCT. In the future, if a standardized FCT were adopted, we could accomplish the same
work with a single bot built to understand the structure of the standard. This would reduce the
time researchers need to spend reusing data from different FCTs.
   We recommend that teams creating FCTs in the future consider providing mappings for food
items to their corresponding FoodOn identifiers. This step will increase precision by providing

    7
        https://tinyurl.com/yz5seocf
    8
        https://tinyurl.com/ybtgwgby
unambiguous indications of the taxonomic source of the food and which part (eg. plant leaves
vs. plant roots). Currently this information is indicated in the label of the food item in many
FCTs, but the languages for describing organisms varies. Reuse of FoodOn identifiers will also
reduce confusion related to naming differences for foods at the regional and national levels.
   In WikiFCD we established mappings from certain items to their corresponding items in
Wikidata. Queries on the WikiFCD SPARQL query endpoint can be written to include data
from Wikidata because of the fact that the endpoint supports federated querying. This allows
users to ask questions of our data that go far beyond what our dataset can answer. The ability
to connect a global FCD to a general-purpose knowledge base increases the utility of the FCD.
   Maintaining a set of mappings to Wikidata also allows us to be strategic in our curation. As
researchers estimate that there are 200,000 to 1,000,000 different metabolites synthesized by
plants [35], we determined that it is beyond the scope of our knowledge base to store these
metabolites in WikiFCD. Instead connect WikiFCD items for taxon names to the corresponding
items in Wikidata. These mappings then allow us to make federated SPARQL queries to ask
questions about plant metabolites such as “What metabolites are found in foods that are natural
products of Vaccinium deliciosum, and with what do they physically interact9 .
   Connecting Wikidata items to resources in external databases or systems allows for software
agents to discover related content automatically. This allows us to benefit from complementary
work and provides infrastructure for connecting information that was previously fragmented
across multiple systems. In the domain of food, Wikidata has external identifiers for several
large food databases. For example, P r o p e r t y P 4 7 2 9 “INRAN Italian Food ID” is used to link food
items with the Italian national nutrient database. Through the use of P r o p e r t y P 4 7 2 9 the pages
dedicated to these resources can be connected to their corresponding items in Wikidata. By
making use of this property, we can use the INRAN identifier to find additional information
about the food item in the INRAN database. In this way, Wikidata serves as a hub of identifiers
that connect to external resources.

4.2. Building the WikiFCD Community
Using Wikibase as infrastructure has allowed the Wikidata community to engage in peer-
production and collaborative ontology engineering [11]. We identified peer-production and
collaborative ontology engineering as vital components to include in the vision of WikiFCD.
Wikibase offers a novel method to change the current state of FCDs and bring a peer-produced
knowledge base to the field of nutrition research. The current project explores whether a
Wikibase-based FCD can be an effective method of developing a more equitable and compre-
hensive knowledge base in nutrition. WikiFCD is distinct from previous attempts in compiling
a global FCD in that it allows the community members, or “peers”, to become involved in the
database development directly. The project aims to empower users from low resource settings to
fully utilize available nutrient data to answer their own questions, identify knowledge gaps, and
engage in improving the database. In successful peer production communities like Wikipedia,
projects have garnered efforts from hundreds of thousands of volunteers. The involvement of a
large number of “peers” in the production has a potential to successfully building a global FCD.

    9
        https://tinyurl.com/y7qplyjh
Figure 1: Common names listed for Tomato in WikiFCD


   Developing a successful online community can be challenging. Given strong interests and
support we have received from the communities of nutrition researchers, we believe that we
will be able to attract participants of the WikiFCD community. Additionally, our team includes
experienced Wikimedians and academic researchers in the field of online communities and peer
production who are equipped with extensive experiences in peer production communities and
in-depth knowledge on theories of online community development. Furthermore, we will also
be developing a mobile meal-planning application with options to contribute missing data to
WikiFCD, which helps users to connect personal needs to community needs and lowers hurdles
in contributing to this knowledge base.
   One of the reasons we chose Wikibase was the support for multiple concurrent editors.
This means that many different people can contribute data to WikiFCD at the same time. If a
community has recently gathered data for their own FCT, they are welcome to add their data
to WikiFCD. Use the search box to find a food item. If you’d like to add data for tomato, then
start editing the item for T o m a t o , f r e s h 10 . If you’d like to add a name for this food item in a
language that is not yet listed under c o m m o n n a m e then click on a d d v a l u e as seen in Figure 1.
After entering text, then click on a d d q u a l i f i e r select l a n g u a g e o f w o r k o r n a m e and then enter
the language of the word you just contributed. Then provide a reference for your statement.
You can reference a webpage or a published source. If you want to reference a published source,
simply create a new item for that source in WikiFCD using the c r e a t e n e w i t e m link available
from the sidebar menu.
   If there is an image of your food item available within Wikimedia Commons, then it is also
possible to connect that image to the WikiFCD food item. If you would like to connect a food
item to an image from Commons, you can use P59 “image” as the connecting property11 .

4.3. WikiFCD as a Model Database
Finally, WikiFCD will also serve as an example of setting up a peer-produced knowledge base,
helping others who are interested in creating one for their own needs (e.g. local organic farming

    10
         https://wikifcd.wiki.opencura.com/wiki/Item:Q135084
    11
         https://wikifcd.wiki.opencura.com/wiki/Property:P59
communities) while retaining an ability to make federated queries to other Wikibase-based
databases like Wikidata. This knowledge base will be useful for policy makers, epidemiologists,
nutrition researchers, developers of food-related applications, and people interested in food
tracking. This knowledge base will provide a low-cost data publishing option for areas of the
world with limited budgetary resources for data promulgation. From a technology perspective,
many national food composition tables are currently publishing one-star or two-star data. We
will provide the enabling technology for any organization to publish five-star linked open data
that meets the FAIR data guidelines at no cost. We hope that this project becomes the first step
to creating a federated community of food and nutrition knowledge producers.


5. Conclusion
Providing infrastructure for researchers and policy makers who need accurate food composition
data requires a team of technologists working in close collaboration with domain experts.
Populating the resource with data is work that can be shared by anyone interested in food data.
We have created a resource that emphasizes ease of data reuse as well as ease of data addition.
   If successful, WikiFCD can lead to reduction in data disparities and also enable users to pursue
research questions and projects that are currently difficult to explore. WikiFCD will also be able
to identify knowledge gaps in FCDs (e.g. missing nutrient information for regional foods). Our
system also has the advantage of making federated queries to other Wikibase databases, which
will substantially expand the scope of research questions that can be explored. Furthermore, if
subsets of the data are appropriate for other Wikibase instances like Wikidata, we will be able
to provide machine-actionable ShEx schemas that will help us prepare data for other systems.
In this way the data will be readily-available for incorporation into other Wikibase instances if
desired.


References
 [1] R. Fallaize, A. L. Macready, L. Butler, J. Ellis, J. Lovegrove, An insight into the public
     acceptance of nutrigenomic-based personalised nutrition, Nutrition research reviews 26
     (2013) 39–48.
 [2] M. C. Ocké, S. Westenbrink, C. T. van Rossum, E. H. Temme, W. van der Vossen-Wijmenga,
     J. Verkaik-Kloosterman, The essential role of food composition databases for public
     health nutrition – experiences from the netherlands, Journal of Food Composition
     and Analysis 101 (2021) 103967. URL: https://www.sciencedirect.com/science/article/pii/
     S0889157521001678. doi:h t t p s : / / d o i . o r g / 1 0 . 1 0 1 6 / j . j f c a . 2 0 2 1 . 1 0 3 9 6 7 .
 [3] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak,
     N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, et al., The fair guiding
     principles for scientific data management and stewardship, Scientific data 3 (2016) 160018.
 [4] M. Egan, A. Fragodt, M. Raats, C. Hodgkins, M. Lumbers, The importance of harmonizing
     food composition data across europe, European journal of clinical nutrition 61 (2007)
     813–821.
 [5] S. Shimbo, A. Hayase, M. Murakami, I. Hatai, K. Higashikawa, C.-S. Moon, Z.-W. Zhang,
     T. Watanabe, H. Iguchi, M. Ikeda, Use of a food composition database to estimate daily
     dietary intake of nutrient or trace elements in japan, with reference to its limitation, Food
     Additives & Contaminants 13 (1996) 775–786.
 [6] A. Durazzo, E. Camilli, S. Marconi, S. Lisciani, P. Gabrielli, L. Gambelli, A. Aguzzi, M. Lu-
     carini, J. Kiefer, L. Marletta, Nutritional composition and dietary intake of composite
     dishes traditionally consumed in italy, Journal of Food Composition and Analysis 77 (2019)
     115–124.
 [7] A. Trichopoulou, S. Soukara, E. Vasilopoulou, Traditional foods: a science and society
     perspective, Trends in Food Science & Technology 18 (2007) 420–427.
 [8] D. Vrandečić, Wikidata: A new platform for collaborative data collection, in: Proceedings
     of the 21st International Conference Companion on World Wide Web, ACM, 2012, pp.
     1063–1064.
 [9] Y. Benkler, Coase’s penguin, or, linux and the nature of the firm, Yale Law Journal (2002)
     369–446.
[10] Y. Benkler, A. Shaw, B. M. Hill, Peer production: a modality of collective intelligence,
     Collective Intelligence (2013).
[11] C. Müller-Birn, B. Karran, J. Lehmann, M. Luczak-Rösch, Peer-production system or
     collaborative ontology engineering effort: What is wikidata?, in: Proceedings of the 11th
     International Symposium on Open Collaboration, ACM, 2015, p. 20.
[12] F. Erxleben, M. Günther, M. Krötzsch, J. Mendez, D. Vrandečić, Introducing wikidata to
     the linked data web, in: The Semantic Web–ISWC 2014, Springer, 2014, pp. 50–65.
[13] A. Waagmeester, G. Stupp, S. Burgstaller-Muehlbacher, B. M. Good, M. Griffith, O. L.
     Griffith, K. Hanspers, H. Hermjakob, T. S. Hudson, K. Hybiske, S. M. Keating, M. Manske,
     M. Mayers, D. Mietchen, E. Mitraka, A. R. Pico, T. Putman, A. Timothy, N. Queralt-Rosinach,
     L. M. Schriml, T. Shafee, D. Slenter, R. Stephan, K. Thornton, G. Tsueng, R. Tu, S. Ul-Hasan,
     E. Willighagen, C. Wu, A. I. Su, Wikidata as a knowledge graph for the life sciences, Elife
     9 (2020) e52614. URL: https://doi.org/10.7554/ELIFE.52614.
[14] E. Mitraka, A. Waagmeester, S. Burgstaller-Muehlbacher, L. M. Schriml, A. I. Su, B. M.
     Good, Wikidata: A platform for data integration and dissemination for the life sciences
     and beyond, bioRxiv (2015) 031971.
[15] M. Martens, A. Ammar, A. Riutta, A. Waagmeester, D. N. Slenter, K. Hanspers, R. A. Miller,
     D. Digles, E. N. Lopes, F. Ehrhart, et al., Wikipathways: connecting communities, Nucleic
     Acids Research 49 (2021) D613–D621.
[16] A. Rutz, M. Sorokina, J. Galgonek, D. Mietchen, E. Willighagen, J. Graham, R. Stephan,
     R. Page, J. Vondrášek, C. Steinbeck, et al., Open natural products research: Curation and
     dissemination of biological occurrences of chemical structures through wikidata, bioArxiv
     (2021). URL: https://doi.org/10.1101/2021.02.28.433265.
[17] F. Å. Nielsen, D. Mietchen, E. Willighagen, Scholia, scientometrics and wikidata, in:
     European Semantic Web Conference, Springer, 2017, pp. 237–259.
[18] S. Burgstaller-Muehlbacher, A. Waagmeester, E. Mitraka, J. Turner, T. Putman, J. Leong,
     C. Naik, P. Pavlidis, L. Schriml, B. M. Good, et al., Wikidata as a semantic framework for
     the gene wiki initiative, Database 2016 (2016) baw015.
[19] J. E. Labra-Gayo, A. Ammar, D. Brickley, D. F. Álvarez, A. G. Hevia, A. J. Gray, E. Prud’hom-
     meaux, D. Slater, H. Solbrig, S. A. H. Beghaeiraveri, et al., Knowledge graphs and wikidata
     subsetting, BioHackathon Europe 2020 (2021). URL: https://biohackrxiv.org/wu9et/.
[20] S. Malyshev, M. Krötzsch, L. González, J. Gonsior, A. Bielefeldt, Getting the most out of
     wikidata: semantic technology usage in wikipedia’s knowledge graph, in: International
     Semantic Web Conference, Springer, 2018, pp. 376–394.
[21] P. M. Finglas, R. Berry, S. Astley, Assessing and improving the quality of food composition
     databases for nutrition and health applications in europe: the contribution of eurofir,
     Advances in Nutrition 5 (2014) 608S–614S.
[22] W. Andra, G. Stupp, B.-M. Sebastian, B. M. Good, G. Malachi, O. L. Griffith, H. Kristina,
     H. Henning, T. S. Hudson, H. Kevin, et al., Wikidata as a knowledge graph for the life
     sciences, eLife 9 (2020).
[23] D. M. Dooley, E. J. Griffiths, G. S. Gosal, P. L. Buttigieg, R. Hoehndorf, M. C. Lange, L. M.
     Schriml, F. S. Brinkman, W. W. Hsiao, Foodon: a harmonized food ontology to increase
     global food traceability, quality control and data integration, npj Science of Food 2 (2018)
     1–10.
[24] L. Andrés-Hernández, A. Baten, R. Azman Halimi, R. Walls, G. J. King, Knowledge
     representation and data sharing to unlock crop variation for nutritional food security,
     Crop Science 60 (2020) 516–529.
[25] S. N. D. A. van Graan, S. K. D. W. A. Masters, K. S. D. F. P. Phiri, A. M. Mwangwela, The
     malawi food composition database (mafoods) (2020).
[26] D. Vrandečić, M. Krötzsch, Wikidata: a free collaborative knowledgebase, Communications
     of the ACM 57 (2014) 78–85. URL: https://web.archive.org/web/20190311200511/http://
     cacm.acm.org/magazines/2014/10/178785-wikidata/fulltext. doi:1 0 . 1 1 4 5 / 2 6 2 9 4 8 9 .
[27] B. Mons, C. Neylon, J. Velterop, M. Dumontier, L. O. B. da Silva Santos, M. D. Wilkinson,
     Cloudy, increasingly fair; revisiting the fair data guiding principles for the european open
     science cloud, Information Services & Use (2017) 1–8.
[28] J. Neubert, Wikidata as a linking hub for knowledge organization systems? integrating an
     authority mapping into wikidata and learning lessons for KOS mappings, in: Proceedings
     of the 17th European Networked Knowledge Organization Systems Workshop co-located
     with the 21st International Conference on Theory and Practice of Digital Libraries 2017
     (TPDL 2017), Thessaloniki, Greece, September 21st, 2017., 2017, pp. 14–25. URL: http:
     //ceur-ws.org/Vol-1937/paper2.pdf.
[29] K. R. Amato, M.-C. Arrieta, M. B. Azad, M. T. Bailey, J. L. Broussard, C. E. Bruggeling,
     E. C. Claud, E. K. Costello, E. R. Davenport, B. E. Dutilh, H. A. Swain Ewald, P. Ewald,
     E. C. Hanlon, W. Julion, A. Keshavarzian, C. F. Maurice, G. E. Miller, G. A. Preidis, L. Se-
     gurel, B. Singer, S. Subramanian, L. Zhao, C. W. Kuzawa, The human gut microbiome
     and health inequities, Proceedings of the National Academy of Sciences 118 (2021).
     URL: https://www.pnas.org/content/118/25/e2017947118. doi:1 0 . 1 0 7 3 / p n a s . 2 0 1 7 9 4 7 1 1 8 .
     arXiv:https://www.pnas.org/content/118/25/e2017947118.full.pdf.
[30] E. Hyvönen, Publishing and using cultural heritage linked data on the semantic web,
     Synthesis Lectures on the Semantic Web: Theory and Technology 2 (2012) 1–159.
[31] M. L. Zeng, J. Qin, Metadata, American Library Association, 2016.
[32] L. Chan, N. Vasilevsky, A. Thessen, J. McMurry, M. Haendel, The landscape of
     nutri-informatics: a review of current resources and challenges for integrative nutri-
     tion research, Database 2021 (2021). URL: https://doi.org/10.1093/database/baab003.
     doi:1 0 . 1 0 9 3 / d a t a b a s e / b a a b 0 0 3 .              arXiv:https://academic.oup.com/database/article-
     p d f / d o i / 1 0 . 1 0 9 3 / d a t a b a s e / b a a b 0 0 3 / 3 6 1 1 0 5 0 2 / b a a b 0 0 3 . p d f , baab003.
[33] P. Puwastien, Issues in the development and use of food composition databases, Public
     health nutrition 5 (2002) 991–999.
[34] U. Charrondiere, B. Burlingame, Identifying food components: Infoods tagnames and
     other component identification systems, Journal of Food Composition and Analysis 20
     (2007) 713–716.
[35] A. Durazzo, L. D’Addezio, E. Camilli, R. Piccinelli, A. Turrini, L. Marletta, S. Marconi,
     M. Lucarini, S. Lisciani, P. Gabrielli, et al., From plant compounds to botanicals and back:
     A current snapshot, Molecules 23 (2018) 1844.

</pre>