A Proposal for a European Large Knowledge
            Repository in Advanced Food Composition Tables
                      for Assessing Dietary Intake

       Oscar Coltell1,2, Francisco Madueño1, Zoe Falomir1, and Dolores Corella2,3
    1 Department of Computing Languages and Systems, Universitat Jaume I, Castellón, Spain

          {oscar.coltell, francisco.madueno, zfalomir}@uji.es
              2 CIBER Physiopathology of Obesity and Nutrition (CIBEROBN),

                         Institute of Health Carlos III, Madrid, Spain
        3 Department of Preventive Medicine and Public Health, University of Valencia,

                                        Valencia, Spain
                                dolores.corella@uv.es


         Abstract. This paper proposes the design and development of a European
         Repository of Knowledge on Advanced Food Composition Tables (FCTs), based
         on the existing national FCTs. The requirements of the system, the
         interoperability strategies, and the cooperation of each national FCT in
         maintaining and updating the repository are discussed.

         Keywords: Knowledge repositories, Food Composition Tables (FCTs), Joint
         Programming Initiative in A Healthy Diet.


1        Introduction

The study of the interaction between diet and the genome is crucial to prevent and treat
cardiovascular diseases, some cancers, type 2 diabetes, etc. Assessing a person's diet
is a tedious task, and in practice only a portion of the intake information is evaluated,
from which the participant's habitual intake is extrapolated. To obtain enough
statistical power to cope with measurement errors and changes in diet, it is necessary to
obtain repeated measures of dietary information from a large number of participants
over time. To extract information about participants' diets, nutritionists use Food
Frequency Questionnaires (FFQs), 24-hour dietary recalls (24HDRs), dietary records or
dietary histories [1]. These surveys collect the foods or dishes consumed, which can be
transformed into energy and nutrient intake using Food Composition Tables (FCTs).

When conducting large multicenter studies involving individuals from several countries,
one limitation is the difficulty of acquiring, harmonizing and standardizing data
across the different populations. In 2008, a pioneering initiative in this regard
was carried out by the “European Food Information Resource AISBL” (EuroFIR


adfa, p. 1, 2011.
© Springer-Verlag Berlin Heidelberg 2011
AISBL)1, an international non-profit association (AISBL), whose aim was: “the
development, management, publication and exploitation of food composition data, and the
promotion of international cooperation and harmonization through improved data
quality, database searchability, standards development, dissemination and training for
all users and stakeholders”. The research objective addressed here is a proposal for a
knowledge network repository with four basic types of knowledge (food composition,
dish composition, dietary patterns and diet-disease effects), which can enhance the
EuroFIR project with new methods and techniques from the fields of large knowledge
repositories, data mining, and ontology engineering.

Last June 14 in The Hague, the conference of the Joint Programming Initiative2 (JPI)
“A Healthy Diet for a Healthy Life” was held, and the 2010-2020 roadmap for harmonizing
and structuring research efforts in the area of food, nutrition and health was presented.
The goal of the JPI conference was to define the Strategic Research Agenda for the
period 2011-2020 and beyond3, whose main aims are to provide a holistic approach to:
(i) identifying the key factors that affect diet-related diseases, (ii) discovering new
relevant parameters and mechanisms, and (iii) defining strategies that contribute to the
development of actions, policies and innovative products suitable for reducing the burden
of diet-related diseases. The JPI Agenda developed a subroadmap for each of the three
key interacting research areas that were identified and described in the previous
Vision Document4 of the JPI. The Research Areas (RA) are the following:
RA1-Determinants of diet and physical activity; RA2-Diet and food production; and
RA3-Diet-related chronic diseases.

Each research area roadmap in the Agenda presents two prime initiatives, one for
2012-2014 and one for 2015-2019. The prime initiative for RA1 (2012-2014) is “Establish a
European transdisciplinary research network on determinants of dietary and physical
activity behaviors and the relation with health and best practice implementation
strategies for sustainable changes”. This initiative is a research challenge whose
preparatory work is the collection, integration and assessment of monitoring systems,
databases, determinants and outcome assessments. One of the research needs for meeting
this challenge is to establish and maintain an integrated trans-disciplinary database, with
potential for secondary analysis by interested researchers with specific research
hypotheses, assuming the initial data are collected according to best practice in biological,
behavioral, socio-economic and environmental science traditions.


1   EuroFIR. http://www.eurofir.net/. (Last access in August 6, 2012).
2   JPI Conference: https://www.healthydietforhealthylife.eu/hdhlconference/ (Last access in
    August 6, 2012).
3   The JPI Strategic Research Agenda for the period 2011-2020 and beyond.
    https://www.healthydietforhealthylife.eu/index.php?index=25. (Last access in August 6,
    2012).
4   The JPI Vision Paper (September 2010)
    https://www.healthydietforhealthylife.eu/index.php?index=24. (Last access in August 6, 2012).
Technically speaking, the research challenge of creating a European FCT (EFCT) entails
a technological challenge in the field of large databases and large repositories.
The Scientific Advisory Board of the JPI, called DEDIPAC, claimed that the EFCT
should not be a “data” or “information” database, but a knowledge network repository
with contributions from at least 27 European countries. The specific challenge is
to organize the existing knowledge, the supporting infrastructures and the associated
management requirements of the databases containing national Food Composition Tables
(FCTs), and to integrate them into a large knowledge repository. Traditionally, FCTs
were tables in which a portion of each single food was broken down into energy,
macronutrients, micronutrients and other components. The standard portion size is
100 g, but some FCTs take the edible part of the food (e.g., discarding the peel in
oranges; in this case, 100 g of edible orange), and other FCTs take the whole food (e.g.,
the whole 100 g of orange, including the peel). Moreover, macronutrients are grouped
into families such as lipids, proteins and carbohydrates, while micronutrients comprise
minerals, vitamins and amino acids. Usually, each FCT record contains around 50
components, although the number of components may vary between FCTs. National and
private (academic or enterprise) FCTs, although they can be standardized and
biochemically validated, usually differ from country to country (or according to the aims
and resources of the academic organization or enterprise).
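
The difference between the two portion conventions can be illustrated with a small calculation. The following is a minimal sketch, not part of any existing FCT tool: it assumes that the discarded part (e.g., the peel) contributes none of the component of interest, and the function name and the numeric values are hypothetical.

```python
def to_edible_basis(value_per_100g_whole: float, edible_fraction: float) -> float:
    """Convert an amount per 100 g of whole food to an amount per 100 g of edible food.

    Assumption: the discarded part (peel, bone, shell...) contains none of the
    component, so the whole amount is concentrated in the edible fraction.
    """
    if not 0.0 < edible_fraction <= 1.0:
        raise ValueError("edible_fraction must be in (0, 1]")
    return value_per_100g_whole / edible_fraction


# Illustrative values only (not taken from any FCT): 40 mg of a component per
# 100 g of whole orange, with 80 % of the fruit being edible.
per_100g_edible = to_edible_basis(40.0, 0.80)
print(round(per_100g_edible, 2))
```

Under this assumption, two FCTs that report the same food on different bases can be compared only after one of them has been converted with the edible fraction used by its source.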

With the evolution of information and communication technologies, FCTs were
converted into databases and, later, Web services were added to allow online access to
them. However, the FCT databases inherited the drawbacks of the traditional FCTs,
and some new problems emerged: for example, lack of service due to site
saturation or network breakdowns, access restricted to active members (who
have paid the corresponding fee), lack of programmatic access (a set of procedures
to handle queries coming from applications), availability only in the native language,
and so on. This is the situation of the European FCTs provided by the FAO5 or EuroFIR6.

The aim of this paper is to discuss a proposal for designing and developing a European
Repository of Knowledge on Advanced FCTs and related knowledge (food composi-
tion, dish composition, dietary patterns and diet-disease effects, and semantic connec-
tions between them) based on the existing national FCTs, their system interoperability
strategies, and the cooperation of each national FCT for maintaining and updating the
repository.

To achieve this aim, the following strategies are discussed in this paper: (i) a process
for retrieving data from the different national resources and populating the Repository
(Section 2); (ii) the viability of the current software resources and protocols that can be
used to integrate the different FCT databases (Section 3); and (iii) new methods and


5   FAO. Food Composition Tables–Europe. http://www.fao.org/infoods/tables_europe_en.stm.
    (Last access in August 6, 2012).
6   EuroFIR How to access FCDBs.
    http://www.eurofir.net/food_information/food_composition_databases/how_access_fcdbs.
    (Last access in August 6, 2012).
techniques for generating and extracting knowledge from the Repository (Section 4).
Finally, some conclusions are provided.


2      Designing a Process for Retrieving Data and Populating the
       Repository

The process for retrieving data from the different national resources and populating the
Repository can be very complex because the national FCT databases have been developed
according to each country's objectives, culture, funding and interests. Thus, data
structures, nomenclatures, the number of food components included and even formats and
units (English or International Metric systems: e.g., quantities in grams vs. quantities in
ounces) are not shared. Moreover, each database has different access protocols and
restrictions (e.g., public vs. private access, human interface vs. programmatic interface or
both, etc.). Therefore, before discussing how the technical approach could be applied,
political groundwork is needed to reach agreements on data sharing, open access
protocols, and medical and nutritional interests. Despite the above-mentioned
complexity, the process can be outlined as a workflow composed of four steps:

STEP1: defining a Minimal Set of FCT data (MS-FCT). The MS-FCT is the common
data that every FCT database holds in the same or a similar format (with no need for
transformation or conversion). In addition, the Standard Set of FCT data (SS-FCT)
must be defined. The SS-FCT is the standardized data that every FCT database
should contain according to the strategic objectives of the knowledge repository
(homogeneity, integration, interoperability).
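
One possible reading of STEP1 can be sketched with set operations: the MS-FCT as the components present in every national FCT, and the SS-FCT as the components present in at least one of them. This is only an illustration under that assumption; the country codes, component names and function names below are hypothetical, and a real definition of the SS-FCT would follow the strategic objectives agreed by the partners.

```python
from functools import reduce

# Hypothetical component lists for three national FCTs (illustrative only).
national_fcts = {
    "ES": {"energy", "protein", "fat", "carbohydrate", "vitamin_c"},
    "FR": {"energy", "protein", "fat", "carbohydrate", "calcium"},
    "DE": {"energy", "protein", "fat", "carbohydrate", "vitamin_c", "iron"},
}


def minimal_set(fcts):
    """Components held by every FCT (one reading of the MS-FCT)."""
    return reduce(set.intersection, fcts.values())


def standard_set(fcts):
    """Components held by at least one FCT (one reading of the SS-FCT)."""
    return reduce(set.union, fcts.values())


def missing_components(fcts):
    """For each FCT, the SS-FCT components it does not yet provide."""
    target = standard_set(fcts)
    return {country: target - components for country, components in fcts.items()}


ms_fct = minimal_set(national_fcts)
ss_fct = standard_set(national_fcts)
gaps = missing_components(national_fcts)
print(sorted(ms_fct))
print({country: sorted(missing) for country, missing in gaps.items()})
```

The per-country gaps give a rough measure of the homogenization effort each national FCT would face before it matches the standard set.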

STEP2: defining the knowledge levels in the repository. Initially, we have defined the
following levels (see Fig. 1):
1. Level 1: Food Composition. Basic knowledge about the composition of each food,
   with the following variations: national FCT source, determination methods for
   each component, local and regional variations of the food, and original language.
2. Level 2: Dish Composition. Knowledge about the composition of dishes in terms of
   single foods, the standard portions (in Metric and English measures) and their
   corresponding images, the corresponding recipes (the same food mixture differs
   according to the cooking process), and the local and regional variations in recipes
   and portions.
3. Level 3: Dietary patterns. Knowledge about dietary patterns discovered in
   nutritional studies using data mining strategies. From dietary patterns, it would be
   possible to generate dietary models to apply in the kind of studies described in the
   prime initiatives of the JPI research areas.
4. Level 4: Diet-disease effects. Knowledge about associations and interactions
   between diet and disease (via genetic and phenotypic factors), recommendations for
   specific populations (e.g., people with celiac disease), foods that raise the risk of
   specific diseases, foods that lower the risk of specific diseases, etc.
All of these levels should run in cooperation with every national FCT database trust,
providing full access to authorized sources, an agreed level of service and frequent
updates to guarantee the quality and accuracy of the knowledge provided in the repository.


Fig. 1. Repository environment and functional structure. From each EU partner database, or set
 of databases (FCT, dietary monitoring systems, standardized determinants and standardized
  outcome assessments), an MS-FCT is provided. After homogenization and integration
 processes, an SS-FCT is generated to update the Repository. The Queries path shows how
queries flow between levels. Semantic relationships are defined only between adjacent levels
               and show how to extract knowledge from the repository.
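
The constraint stated in the caption of Fig. 1, namely that semantic relationships link only adjacent levels, implies that a query between distant levels has to pass through the intermediate ones. A minimal sketch of that idea follows; the class and function names are hypothetical and no actual query engine is implied.

```python
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """The four knowledge levels defined in STEP2."""
    FOOD_COMPOSITION = 1
    DISH_COMPOSITION = 2
    DIETARY_PATTERNS = 3
    DIET_DISEASE_EFFECTS = 4


def are_adjacent(a: Level, b: Level) -> bool:
    """Semantic relationships exist only between immediately neighbouring levels."""
    return abs(int(a) - int(b)) == 1


def query_path(source: Level, target: Level) -> List[Level]:
    """Levels a query traverses, in order, when moving from source to target."""
    step = 1 if target >= source else -1
    return [Level(v) for v in range(int(source), int(target) + step, step)]


path = query_path(Level.DIET_DISEASE_EFFECTS, Level.FOOD_COMPOSITION)
print([level.name for level in path])
```

For example, a question about which nutrients are involved in a diet-disease association would be resolved through dietary patterns and dish composition before reaching the food composition level.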

STEP3: studying, designing, developing and applying current software resources and
protocols to integrate the different EU partners’ FCT databases and other data (Fig. 1),
generating the corresponding sets of MS-FCT, for retrieving data from the different
national resources.

STEP4: populating and maintaining the Repository, mainly injecting standardized data
from the different national resources under the SS-FCT approach, but also using direct
built-in methods and interfaces. It should be noted that the information is generated on
national resources and not in the Repository.
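
As an illustration of STEP4, the sketch below converts quantities reported in different mass units to grams and stores each value together with its national source, so that the origin of the information remains visible in the Repository. The record layout, the source identifiers and the function names are assumptions made for this example; only the conversion factors are standard definitions.

```python
# Grams per unit; the avoirdupois ounce and pound values are exact by definition.
GRAMS_PER_UNIT = {
    "g": 1.0,
    "mg": 1e-3,
    "kg": 1e3,
    "oz": 28.349523125,
    "lb": 453.59237,
}


def to_grams(quantity: float, unit: str) -> float:
    """Convert a mass expressed in a supported unit to grams."""
    try:
        return quantity * GRAMS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


def populate(repository: dict, source: str, records) -> dict:
    """Store standardized copies of national records, keyed by their source.

    records: iterable of (food, component, quantity, unit) tuples (hypothetical layout).
    """
    for food, component, quantity, unit in records:
        repository[(source, food, component)] = to_grams(quantity, unit)
    return repository


repository = {}
# Illustrative records only, not taken from any national FCT.
populate(repository, "fct_a", [("orange", "protein", 0.9, "g")])
populate(repository, "fct_b", [("orange", "protein", 900.0, "mg")])
print(repository)
```

Keeping the source in the key reflects the point made in STEP4: the Repository holds standardized copies, while the information itself is still generated and maintained by the national resources.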
3       The Viability of the Current Software Resources and
        Protocols

FCTs allow foods or dishes to be mapped to their corresponding energy and nutrients. In
Nutritional Epidemiology, this is crucial because of the proven relation between
diet and some diseases [2], for example cardiovascular diseases [3-5], diabetes [6-7]
and obesity [8-10], whose study requires large amounts of data for statistical
analysis. The development of the proposed Large Knowledge Repository is therefore
a colossal and challenging task involving current and new technologies that
undoubtedly have an initial cost but may pay off in the long term.

In previous work [11], our group developed some medium-scale projects in the area
of medical informatics for automating nutritional questionnaires and calculating the
nutritional composition of meals from several FCTs, using an ontology to translate
the components of the different FCTs to a common name. That ontology, named
Nutriontology (NO), runs on an independent platform, which also contains the physical
databases of all the FCTs and applies interoperability strategies to manage database
access. Moreover, NO is part of a set of ontologies managed by an upper-level ontology
named NutriGenOntology (NGO). Another independent generic Web platform, named
“Project”, manages the set of automated nutritional questionnaires and the database of
participants (and other data) corresponding to one nutritional study. The
communication between NO and a Project is performed through Web services. In fact,
Project is a template that is instantiated on a particular platform whenever a new
nutritional study is started; the platform then adopts the study name or acronym (e.g.,
Fituveroles, Obenutic, Obenomics, etc.). Therefore, we consider that this pilot system
developed by our group, which combines ontologies and web services appropriately, can
be a starting point for achieving an integrated European FCT.
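
The idea of translating component names from different FCTs to a common name can be sketched with a plain lookup table. This is not the implementation of Nutriontology, whose internals are not described here; the FCT identifiers, the local names and the function name are hypothetical.

```python
# Hypothetical mapping: (FCT identifier, local component name) -> common name.
COMMON_NAME = {
    ("fct_es", "proteínas"): "protein",
    ("fct_fr", "protéines"): "protein",
    ("fct_es", "hidratos de carbono"): "carbohydrate",
    ("fct_fr", "glucides"): "carbohydrate",
}


def translate_record(fct_id: str, record: dict):
    """Rename the components of one FCT record to their common names.

    Components without a mapping are returned separately so that they can be
    reviewed instead of being silently dropped.
    """
    translated, unmapped = {}, {}
    for local_name, value in record.items():
        key = (fct_id, local_name.strip().lower())
        if key in COMMON_NAME:
            translated[COMMON_NAME[key]] = value
        else:
            unmapped[local_name] = value
    return translated, unmapped


# Illustrative record only.
translated, unmapped = translate_record(
    "fct_fr", {"Protéines": 3.2, "Glucides": 12.0, "Sel": 0.1}
)
print(translated, unmapped)
```

An ontology offers more than such a table (synonyms, hierarchies, relations between components), but the lookup captures the basic service that a study platform would request through a Web service call.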

Besides, current information repository technology is insufficient to
achieve the integration and interoperability levels expected of such repositories,
and heterogeneity in the data is not managed efficiently. For example, Semantic
MediaWiki7 already addresses the unit conversion problem, but only at a very basic
level. Another option, taking into account the very large scale of our proposal, is to
define two broad strategies for the two levels (Fig. 1): level 1 with integration and
interoperability; level 2 with homogenization. To integrate the different FCT databases,
one suitable solution is to combine semantic mappings for modelling FCT structures with
semantic operations for retrieving data from the different national resources, and then to
generate the corresponding MS-FCTs. Homogenization at the second level, under the
SS-FCT approach, could foster the enhancement and specialization of existing data mining
methods and techniques. Other solutions may be considered, since some intelligent systems
can cope with heterogeneity and interoperability at all levels. Thus, it is too early for


7   Semantic MediaWiki repository.
    http://semantic-mediawiki.org/wiki/Help:Custom_units#Converting_between_proportional_units.
    (Last access in August 5, 2012).
comparing the cost of addressing heterogeneity and interoperability versus the cost of
homogenization in the proposed repository.


4      Developing New Methods and Techniques for Generating and
       Extracting Knowledge from the Repository


It is necessary to define a standard language (e.g., an XML-based language) for
representing the Minimal Set of FCT data and the Standard Set of FCT data, both
including the four basic types of knowledge the Repository has to manage: food
composition, dish composition, dietary patterns and diet-disease effects. First, however,
the characteristics of these types of knowledge and the challenges derived from them
must be identified.
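
To make the idea of an XML-based representation concrete, the sketch below serializes one food composition entry and reads it back with Python's standard library. The element and attribute names (foodEntry, component, source, portion) are invented for this example and do not correspond to any agreed schema; the values are illustrative only.

```python
import xml.etree.ElementTree as ET


def food_entry_to_xml(food_id, name, source, components):
    """Build an XML element for one food entry.

    components: {component name: (amount per portion, unit)}.
    All element and attribute names are hypothetical.
    """
    entry = ET.Element(
        "foodEntry", {"id": food_id, "source": source, "portion": "100g-edible"}
    )
    ET.SubElement(entry, "name").text = name
    comps = ET.SubElement(entry, "components")
    for comp_name, (amount, unit) in components.items():
        comp = ET.SubElement(comps, "component", {"name": comp_name, "unit": unit})
        comp.text = str(amount)
    return entry


# Illustrative values, not taken from any FCT.
entry = food_entry_to_xml(
    "F001", "orange", "ES", {"energy": (47, "kcal"), "protein": (0.9, "g")}
)
xml_text = ET.tostring(entry, encoding="unicode")

# Round trip: parse the text and look up one component by name.
parsed = ET.fromstring(xml_text)
protein = parsed.find("components/component[@name='protein']")
print(xml_text)
print(protein.text, protein.get("unit"))
```

Recording the source and the portion convention as attributes of each entry is one way to keep the national differences visible after exchange.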

The food composition knowledge tells us which elements are in one standard portion
(100 g of edible portion or net intake) of each food: macronutrients (proteins, fat and
carbohydrates), micronutrients (amino acids, minerals and vitamins), other components
(water, alcohol, caffeine, etc.), and the corresponding total energy of the whole portion.
In the biochemical analysis carried out to compile the FCT, each sample is taken from
raw food wherever possible, with minor exceptions, to avoid nutrient alterations caused by
cooking processes. Therefore, the primary source of the information is the food
composition biochemical analysis performed by each national food authority. This kind of
analysis is made once, unless a new and better biochemical technique comes onto the
market. The secondary source of information is the FCT itself. It may change when
new food entries are added (the most usual case) or existing ones are revised (very rarely).
Moreover, there are some standards on FCT structure and organization. The derived
challenge is, firstly, to homogenize FCT entries into a common set of components,
nomenclatures and formats/units under the MS-FCT approach while keeping national
differences; and secondly, to integrate and combine all national FCT entries into a
maximal concept, the SS-FCT. The latter would fill gaps in the data for each individual
food in an FCT by combining data from the other FCTs.
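
The gap-filling role of the SS-FCT can be sketched as a merge that prefers the values of one FCT and borrows missing components from the others, recording where each value came from. The data layout, the source identifiers and the function name are assumptions made for illustration.

```python
def fill_gaps(primary, others):
    """Complete one food entry with components borrowed from other FCTs.

    primary: (source, {component: value or None})
    others:  list of (source, {component: value or None}), in order of preference
    Returns {component: (value, source)} so that provenance is preserved.
    """
    source, components = primary
    merged = {
        name: (value, source)
        for name, value in components.items()
        if value is not None
    }
    for other_source, other_components in others:
        for name, value in other_components.items():
            if name not in merged and value is not None:
                merged[name] = (value, other_source)
    return merged


# Illustrative values only.
merged = fill_gaps(
    ("ES", {"protein": 0.9, "iron": None}),
    [("FR", {"protein": 1.0, "iron": 0.1}), ("DE", {"iron": 0.2, "zinc": 0.07})],
)
print(merged)
```

Because every borrowed value carries its source, national differences are kept while the combined entry covers more components than any single FCT.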

The dish composition knowledge describes the three main aspects of each dish: which
foods it contains and in what quantity/proportion each individual food contributes, what
cooking process has been applied, and what the size of the portion is. The proportion
of each individual food determines the calculation of edible portions for obtaining the
food composition from the FCT. The list of individual foods is not static, owing to
national, regional, local and, of course, home variations, although the main components
are kept (e.g., apple pie is no longer apple pie when apple is replaced by peach). Each
kind of cooking process alters the properties of the food (e.g., vitamin or fiber
degradation, fat substitution, etc.). Hence, FCTs cannot be applied directly, but only
with cooking corrections. The size of the portion describes how big a dish is and what
quantity of food it contains. Here, a specific problem arises from the term “dish”,
because food can be solid, liquid or semi-liquid. When describing a portion of
solid food, we use the traditional meaning of a physical dish (or similar) and
measures in grams or ounces/pounds. However, when describing a portion of
liquid or semi-liquid food, we have to use a different container, such as a glass or cup,
and measures in milliliters or fluid ounces/pints. Usually, portions are categorized as
small, medium and large, where each category is assigned a quantity in weight or
volume, but the quantity depends on the nature of the food itself. Moreover, there is
no standard (or de facto standard) for dish structure and portions, although cooking
alterations are well studied and quantified. Therefore, the primary source of the
information consists of, on the one hand, published tables of cooked food properties and,
on the other hand, published collections of recipes in books, journals, the Web, etc. The
derived challenge in this case is to define a Minimal Common Recipe Catalog (MCRC)
that can be used in the scientific environment for assessing dish composition in the
Repository. The MCRC should include the “official” composition of each dish plus
cooking variants, standardized portions and units according to the state of the food
(solid, liquid, semi-liquid).
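
The calculation described above, in which the composition of a dish is derived from its ingredients and then corrected for cooking, can be sketched as a weighted sum. The sketch assumes that the FCT values are given per 100 g of edible portion and that cooking corrections can be expressed as one multiplicative retention factor per component; both the factor values and the composition values are hypothetical.

```python
def dish_composition(ingredients, fct, retention=None):
    """Estimate the composition of one dish from its ingredients.

    ingredients: {food: grams of edible portion in the dish}
    fct:         {food: {component: amount per 100 g of edible portion}}
    retention:   {component: fraction retained after cooking}, default 1.0
    """
    retention = retention or {}
    totals = {}
    for food, grams in ingredients.items():
        for component, per_100g in fct[food].items():
            factor = retention.get(component, 1.0)
            amount = grams / 100.0 * per_100g * factor
            totals[component] = totals.get(component, 0.0) + amount
    return totals


# Illustrative values only, not taken from any FCT or retention table.
fct = {
    "apple": {"energy": 50.0, "vitamin_c": 5.0},
    "flour": {"energy": 350.0, "vitamin_c": 0.0},
}
totals = dish_composition(
    {"apple": 200.0, "flour": 100.0}, fct, retention={"vitamin_c": 0.5}
)
print(totals)
```

A recipe catalog such as the proposed MCRC would supply the ingredient lists and standardized portions, while published tables of cooked food properties would supply the correction factors.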

The dietary patterns knowledge shows us common profiles of food intake among the
persons to whom dietary assessment questionnaires were administered. Dietary patterns
are usually inferred from the participants in nutritional studies and can later be reviewed
and organized into well-established patterns. Therefore, the primary source of the
information is the set of discovered dietary patterns, and the secondary source is the
collection of scientific publications describing other patterns. The derived challenge in
this case is to achieve a standard catalog of well-established patterns for making
comparisons in each nutritional study.

The diet-disease effects knowledge shows us the associations and interactions between
diet and diseases, where diet may act as a risk or protective factor for individuals with
(genetic) susceptibility to a particular disease. In fact, associations and interactions are
not analyzed with respect to a particular meal or food, but with respect to specific dietary
patterns. Thus, dietary patterns and disease are strongly related. Therefore, the main
source of the information is the set of statistically significant diet-disease associations
and interactions discovered in nutritional studies and published in journals. The derived
challenge in this case is to gather knowledge about diet-disease associations and
interactions that is as complete and accurate as possible.


5      Conclusions

This paper proposes a framework for designing and developing a European repository of
knowledge for Food Composition Tables, and describes the scenarios and the steps
for constructing it. The main idea is to construct the knowledge base in a scalable way,
moving from standardized knowledge towards population-dependent knowledge. The main
challenge is to integrate repositories belonging to different nations (with many issues
arising from the use of different data structures, different nomenclatures, and different
formats and units). Moreover, FCTs are extended with three additional types of knowledge
(dish composition, dietary patterns and diet-disease effects) coming from other
biomedical/biological data sources, for mining associations and interactions between
diseases and food by means of dietary patterns.

A pilot approach was carried out by our group, which developed some medium-scale
projects in the area of medical informatics for automating nutritional questionnaires
and calculating the nutritional composition of meals from several FCTs, using an
ontology to translate the components of the different FCTs to a common name.
Based on the success of this approach, we propose a solution for integrating all
European FCTs based on ontologies and web services, together with asynchronous web
technologies to ensure minimal response time in knowledge queries, modular services,
and the best possible underlying data organization.

Acknowledgements. This work has been partially funded by grants GEWIMICS
(SAF2009-12304, MICINN), AGL2010-22319-C03 (MICINN), BEST/2011/261
(GVA), ACOMP/2011/145 (GVA), and CIBER “Physiopathology of Obesity and Nu-
trition” (ISCIII-FIS). CIBERobn is an initiative of the ISCIII.


References
 1. Falomir Z., Arregui M., Madueño F., Coltell O., Corella D.: Automation of Food
    Questionnaires in Medical Studies: a state-of-the-art review and future prospects.
    Comp. Biol. Med. (in press, accepted on 25/07/2012 with DOI
    10.1016/j.compbiomed.2012.07.008) (2012)
 2. Feart C., Alles B., Merle B., Samieri C., Barberger-Gateau P.: Adherence to a Mediterra-
    nean diet and energy, macro-, and micronutrient intakes in older persons. J. Physiol. Bio-
    chem. (Epub ahead of print. PubMed PMID: 22760695) (2012)
 3. Ganguly R., Pierce G.N.: Trans fat involvement in cardiovascular disease. Mol. Nutr. Food.
    Res. 56(7), 1090-1096 (2012)
 4. de Oliveira Otto M.C., Mozaffarian D., Kromhout D., Bertoni A.G., Sibley C.T., Jacobs
    D.R. Jr, Nettleton J.A.: Dietary intake of saturated fat by food source and incident cardio-
    vascular disease: the Multi-Ethnic Study of Atherosclerosis. Am. J. Clin. Nutr. 96(2), 397-
    404 (2012)
 5. Hansen-Krone I.J., Enga K.F., Njølstad I., Hansen J.B., Braekkan S.K.: Heart healthy diet
    and risk of myocardial infarction and venous thromboembolism. The Tromsø Study.
    Thromb Haemost. 108(3). (Epub ahead of print. PubMed PMID: 22739999) (2012)
 6. Rivellese A.A., Giacco R., Costabile G.: Dietary Carbohydrates for Diabetics. Curr. Ather-
    oscler. Rep. (Epub ahead of print. PubMed PMID: 22847773) (2012)
 7. Guldbrand H., Dizdar B., Bunjaku B., Lindström T., Bachrach-Lindström M., Fredrikson
    M., Ostgren C.J., Nystrom F.H.: In type 2 diabetes, randomisation to advice to follow a low-
    carbohydrate diet transiently improves glycaemic control compared with advice to follow a
    low-fat diet producing a similar weight loss. Diabetologia. 55(8), 2118-2127 (2012)
 8. Corella D., Arnett D.K., Tucker K.L., Kabagambe E.K., Tsai M., Parnell L.D., Lai C.Q.,
    Lee Y.C., Warodomwichit D., Hopkins P.N., Ordovas J.M.: A high intake of saturated fatty
    acids strengthens the association between the fat mass and obesity-associated gene and BMI.
    J. Nutr. 141(12), 2219-2225 (2011)
 9. Bulló M., Garcia-Aloy M., Martínez-González M.A., Corella D., Fernández-Ballart J.D.,
    Fiol M., Gómez-Gracia E., Estruch R., Ortega-Calvo M., Francisco S., Flores-Mateo G.,
    Serra-Majem L., Pintó X., Covas M.I., Ros E., Lamuela-Raventós R., Salas-Salvadó J.: As-
    sociation between a healthy lifestyle and general obesity and abdominal obesity in an elderly
    population at high cardiovascular risk. Prev. Med. 53(3), 155-161 (2011)
10. Foster G.D., Shantz K.L., Vander Veur S.S., Oliver T.L., Lent M.R., Virus A., Szapary
    P.O., Rader D.J., Zemel B.S., Gilden-Tsai A.: A randomized trial of the effects of an
    almond-enriched, hypocaloric diet in the treatment of obesity. Am. J. Clin. Nutr. 96(2),
    249-254 (2012)
11. Fabregat A., Arregui M., Barrera E., Portolés O., Corella D., Coltell O.: NutriGeneOntol-
    ogy: A Biomedical Ontology for Nutrigenomics. In: Proceedings of the 2008 International
    Conference on Biomedical Engineering and Informatics; 2008, vol. 1, pp. 915-919. IEEE
    Computer Society, New York (2008)