<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantics-Driven Ingredient Substitution in the FoodKG</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Shir</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minor E. Gor</string-name>
          <email>gordom6g@rpi.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ching-Hu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>h L. M</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guinn</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research</institution>
          ,
          <addr-line>Yorktown Heights NY 10598</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy NY 12180</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>People who would like to improve their eating patterns can make small changes in their diet by substituting ingredients in recipes. "Good" substitutions may have multiple dimensions and definitions; one definition is to maintain the recipe while adhering to constraints that may include personal preferences, allergies, and nutritional or other dietary considerations. Our proposed system automatically finds and ranks plausible substitution options using a knowledge graph of food information. We evaluate our substitute ranking heuristic using a novel data set of ground-truth substitutions, showing promising preliminary results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Eating habits can play an important role in improving personal health. For
example, patients diagnosed with diabetes typically receive recommendations to
monitor their intake of specific nutrients from foods (like carbohydrates and
protein) to treat their condition. If patients find that some nutrients that they are
monitoring need adjustments, they may choose to modify their diet by
substituting ingredients in some of their recipes. Such substitutions can remove restricted
types of ingredients (e.g., common allergens) or replace ingredients that most
negatively affect patient health (e.g., replacing potatoes to reduce carbohydrate
intake). Substituting individual ingredients rather than strictly following a new
meal plan allows patients to eat familiar meals while maintaining their dietary
goals.</p>
      <p>
        Our work aims to empower people to improve their eating habits by
simplifying the process of identifying ingredient substitutions that satisfy their dietary
constraints. We facilitate this using our FoodKG [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], a knowledge graph of recipe
and ingredient information. We use linked information about ingredient
classification and nutrition to filter valid options for ingredient substitutions. We also
automatically rank plausible ingredient substitutions using several heuristics for
substitutability. Our approach differs from previous work, which has
tended to identify substitutions using only either similarity metrics [
        <xref ref-type="bibr" rid="ref1 ref11 ref2">1, 2, 11</xref>
        ] or
explicit rules based on types of foods [
        <xref ref-type="bibr" rid="ref12 ref4 ref5 ref6">4, 5, 6, 12</xref>
        ]. In this paper, we present
methods and preliminary results for two contributions: a novel heuristics-based
approach to ranking ingredient substitutions and an evaluation using a
"ground-truth" data set of substitutable ingredients.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Ranking Plausible Ingredient Substitutions</title>
      <p>Intuitively, our approach to ingredient substitution ranking should consider
similarities in the properties of ingredients as well as the similarity of the recipes in
which they are used. We also must consider how to determine whether ingredients
that are "similar" are also good substitutes. Some combinations of ingredients
(e.g., garlic and olive oil) may be similar in the sense that they co-occur
frequently in recipes, but they may not be substitutable. On the other hand, some
ingredients that often are good substitutes (e.g., potatoes and cauliflower) may
be dissimilar in terms of some properties like their food classification.</p>
      <p>
        In order to create a ranked set of substitution options, we seek to develop a
heuristic combining several scores based on latent and explicit semantic
information about ingredients. We use two sources of latent semantics in the form of
word embeddings based on ingredient names. The first is a Word2Vec [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] model
from Recipe1M [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], trained over ingredient names and recipe instructions. The
second is a pre-trained word embedding model from spaCy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Cosine similarity
is used to compare ingredient names with the expectation that good ingredient
substitutions would have similar embeddings.
      </p>
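      <p>The embedding comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the three-dimensional toy vectors are assumptions invented for the example, whereas real vectors would come from the Word2Vec or spaCy models.</p>
      <preformat><![CDATA[
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def rank_by_embedding(target, embeddings):
    """Rank every other ingredient by cosine similarity to the target."""
    target_vec = embeddings[target]
    scored = [
        (name, cosine_similarity(target_vec, vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration only (hypothetical values).
toy_embeddings = {
    "potato": [0.9, 0.1, 0.2],
    "cauliflower": [0.8, 0.2, 0.3],
    "olive oil": [0.1, 0.9, 0.1],
}
ranking = rank_by_embedding("potato", toy_embeddings)
```
]]></preformat>
      <p>With these toy vectors, "cauliflower" ranks above "olive oil" as a candidate for "potato" because its vector points in nearly the same direction as the target's.</p>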
      <p>
        For explicit semantic information, we compute two substitutability scores
based on the intuitions that (1) good substitutions should pair well with
similar ingredients, and (2) good substitutions should be used in similar recipes.
Additionally, we use the linked ontology of food from FoodOn [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to further
generalize the ingredients and recipe. For example, rather than just analyzing
whether ingredients occur together with "whole milk," we can further
generalize the ingredient as "milk" or as a "dairy food product." We believe the rich
semantic information from FoodOn will allow our approach to more effectively
identify substitutions for a wider variety of ingredients than previous works.
      </p>
      <p>Combined Heuristic and Filtering. We use a heuristic combining the
aforementioned scoring metrics to rank the substitutability of ingredients. We also
experiment with a filtering strategy for substitution options based
on the assumption that super- or sub-classes of the target ingredient are not
the most useful "substitutes." For example, this filtering strategy would remove
the option of "yellow onions" as a substitute for "onions" in a recipe. Linked
information about ingredient nutrition can also be used to restrict substitution
options to those meeting certain nutritional criteria (e.g., potato substitutes with lower
carbs). Figure 1 shows a high-level overview of the substitution ranking process.
      </p>
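      <p>One way to picture the combined heuristic and the class-based filter is the sketch below. The text does not state how the scores are combined, so the weighted sum, the equal weights, the metric names, the single-parent class hierarchy, and all scores are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
def ancestors(ingredient, parent_of):
    """All superclasses of an ingredient in a toy single-parent hierarchy."""
    found = set()
    current = parent_of.get(ingredient)
    while current is not None and current not in found:
        found.add(current)
        current = parent_of.get(current)
    return found


def is_super_or_subclass(target, candidate, parent_of):
    """True if the candidate is a superclass or a subclass of the target."""
    return (candidate in ancestors(target, parent_of)
            or target in ancestors(candidate, parent_of))


def combined_score(scores, weights):
    """Weighted sum of per-metric scores (an assumed combination rule)."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_substitutes(target, candidate_scores, weights, parent_of,
                     use_filter=True):
    """Rank candidates, optionally dropping super- and sub-classes."""
    ranked = []
    for candidate, scores in candidate_scores.items():
        if use_filter and is_super_or_subclass(target, candidate, parent_of):
            continue
        ranked.append((candidate, combined_score(scores, weights)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical class links, metric names, weights, and scores.
parent_of = {"yellow onion": "onion", "onion": "vegetable"}
weights = {"word2vec": 0.25, "spacy": 0.25, "cooccurrence": 0.25, "recipe": 0.25}
candidate_scores = {
    "yellow onion": {"word2vec": 0.9, "spacy": 0.9, "cooccurrence": 0.8, "recipe": 0.8},
    "shallot": {"word2vec": 0.7, "spacy": 0.6, "cooccurrence": 0.8, "recipe": 0.7},
    "leek": {"word2vec": 0.5, "spacy": 0.6, "cooccurrence": 0.6, "recipe": 0.5},
}
filtered = rank_substitutes("onion", candidate_scores, weights, parent_of)
unfiltered = rank_substitutes("onion", candidate_scores, weights, parent_of,
                              use_filter=False)
```
]]></preformat>
      <p>In this toy setup, "yellow onion" tops the unfiltered ranking but is dropped once the filter is enabled, leaving "shallot" and "leek". A nutrition-based restriction could be added as a further predicate over the candidates.</p>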
      <p>Ground-Truth Data Collection. Because there is no widely accepted
gold-standard data set for ingredient substitutions, we selected
an online source to serve as a "ground truth" for a wide variety of
ingredient substitutions. We collected substitution data from The Cook's Thesaurus (foodsubs.com),
yielding a set of 2,300 substitution pairs (i.e., pairs of "target ingredient" and
"substitute ingredient"). The data set contains 1,161 unique ingredients and
provides substitutions for 928 target ingredients.</p>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>We evaluate our results by producing a list of ranked substitution options for each
ingredient and comparing it against our ground-truth data. Since our method
produces substitution rankings for each target ingredient, we frame our approach
as an information retrieval problem and compute mean average precision (MAP)
and mean reciprocal rank (MRR) as our evaluation metrics. MAP assesses how well
our algorithm ranks all correct substitutes, which matters because the number of
correct substitutes varies by ingredient (e.g., in our evaluation data set,
"potato" had 6 substitutes while "white asparagus" had only 1). MRR reflects the
position of the highest-ranked correct substitution for each ingredient. With
our filtering strategy applied, our combined heuristic achieves a MAP of 0.418
and MRR of 0.499 over our ground-truth data. Compared against simple
baselines that use only word embedding similarity, our heuristic improves on
both the Word2Vec model (MAP 0.297 and MRR 0.385) and spaCy
(MAP 0.274 and MRR 0.371).</p>
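      <p>For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the evaluation code used here: average precision is normalized by the number of ground-truth substitutes (one common convention), and the ingredient lists are invented.</p>
      <preformat><![CDATA[
```python
def average_precision(ranked, relevant):
    """Average of precision@k taken at each rank k that holds a correct item."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Normalize by the number of ground-truth items (assumed convention).
    return precision_sum / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first correct item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(rankings, ground_truth):
    """Return (MAP, MRR) averaged over all target ingredients."""
    targets = list(ground_truth)
    ap = [average_precision(rankings.get(t, []), ground_truth[t]) for t in targets]
    rr = [reciprocal_rank(rankings.get(t, []), ground_truth[t]) for t in targets]
    return sum(ap) / len(targets), sum(rr) / len(targets)


# Invented toy data; not drawn from the actual ground-truth set.
ground_truth = {
    "potato": {"cauliflower", "turnip"},
    "white asparagus": {"green asparagus"},
}
rankings = {
    "potato": ["cauliflower", "rice", "turnip"],
    "white asparagus": ["leek", "green asparagus"],
}
map_score, mrr_score = evaluate(rankings, ground_truth)
```
]]></preformat>
      <p>MRR looks only at the first correct item in each list, while MAP rewards placing every correct substitute near the top, so the two metrics can diverge for targets with many valid substitutes.</p>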
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>Although our approach performed better than simple
baselines, there are many limitations in the methods and evaluation. Our filtering
strategy was not always able to filter out undesirable options
because of incorrect links to FoodOn classes as well as limitations of the
information captured in FoodOn. In addition, the filter sometimes removed
options that our ground-truth data lists as valid substitutions. For example,
garlic powder is classified as a subclass of garlic in FoodOn, which led our
approach to incorrectly filter it out of garlic's substitution options. This reflects our
reuse of an ontology whose subclass relationships were perhaps intended for
different uses than we expected.</p>
      <p>Although relying on a single website as our ground truth allowed our
evaluation to avoid subjectivity issues that may arise from user studies, it is also
limited in its scope. While the source of our evaluation data provides a wide
variety of substitution options, it does not cover all possible substitutions that
people may use (e.g., many websites cite zucchini as a valid substitute for
potatoes, but it was not in our ground-truth data). Finally, our current evaluation
method does not assess other factors that may be involved in judging the
goodness of substitutions (e.g., the fitness of a substitution for a particular type of
diet, or the fitness of a substitute in vastly different types of recipes).</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>Using FoodKG and its linked information about ingredient classification, our
approach for identifying ingredient substitutions shows promise when compared
against simple baselines. Our current best heuristic achieves a MAP of 0.418 and
MRR of 0.499 on our collected ground-truth substitution data set. Work remains
to improve both our strategies for ranking ingredient substitutes and their
evaluation. Moving forward, we are also expanding our approach to include more
explicit semantics about the "healthiness" of substitutions in combination with
more sources to develop a more comprehensive evaluation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work is partially supported by IBM Research AI through the AI Horizons
Network.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Achananuparp</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Extracting food substitutes from food diary via distributional similarity</article-title>
          .
          <source>CoRR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Akkoyunlu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manfredotti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornuejols</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darcel</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delaere</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Investigating substitutability of food items in consumption data</article-title>
          , p. 5 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dooley</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
          </string-name>
          , E.J.,
          <string-name>
            <surname>Gosal</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buttigieg</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoehndorf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schriml</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brinkman</surname>
            ,
            <given-names>F.S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsiao</surname>
          </string-name>
          , W.W.L.:
          <article-title>FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration</article-title>
          .
          <source>npj Science of Food</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          –
          <fpage>10</fpage>
          (Dec
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gaillard</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Infante-Blanco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nauer</surname>
          </string-name>
          , E.:
          <article-title>Tuuurbine: A Generic CBR Engine over RDFS</article-title>
          .
          <source>In: Case-Based Reasoning Research and Development</source>
          , vol.
          <volume>8765</volume>
          , pp.
          <volume>140</volume>
          –
          <fpage>154</fpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gaillard</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nauer</surname>
          </string-name>
          , E.:
          <article-title>Improving ingredient substitution using formal concept analysis and adaptation of ingredient quantities with mixed linear optimization</article-title>
          . In: Computer Cooking Contest Workshop, Frankfurt, Germany,
          <source>September 28-30</source>
          ,
          <year>2015</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1520</volume>
          , pp.
          <volume>209</volume>
          –
          <fpage>220</fpage>
          . CEUR-WS.org (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gaillard</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lieber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nauer</surname>
          </string-name>
          , E.:
          <article-title>Adaptation of TAAABLE to the CCC'2017 Mixology and Salad Challenges, Adaptation of the Cocktail Names</article-title>
          .
          <source>In: International Conference on Case-Based Reasoning (ICCBR) Computer Cooking Contest Workshop</source>
          . pp.
          <volume>253</volume>
          –
          <fpage>268</fpage>
          .
          <string-name>
            <surname>Trondheim</surname>
          </string-name>
          ,
          <string-name>
            <surname>Norway</surname>
          </string-name>
          (Jun
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Haussmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seneviratne</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ne'eman</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Codella</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mcguinness</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation</article-title>
          , pp.
          <volume>146</volume>
          –
          <issue>162</issue>
          (10
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
          </string-name>
          , I.:
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Marin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biswas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            Ofli,
            <given-names>F.</given-names>
            ,
            <surname>Hynes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Aytar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Recipe1m+: A dataset for learning cross-modal embeddings for cooking recipes and food images</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficient estimation of word representations in vector space</article-title>
          . In: Bengio,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>LeCun</surname>
          </string-name>
          , Y. (eds.) 1st
          <source>International Conference on Learning Representations, ICLR</source>
          <year>2013</year>
          , Scottsdale, Arizona, USA, May 2-
          <issue>4</issue>
          ,
          <year>2013</year>
          , Workshop Track Proceedings (
          <year>2013</year>
          ), http://arxiv.org/abs/1301.3781
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Food recipe alternation and generation with natural language processing techniques</article-title>
          .
          <source>In: 2020 IEEE 36th International Conference on Data Engineering Workshops (ICDEW)</source>
          . pp.
          <volume>94</volume>
          –
          <issue>97</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Skjold</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , Øynes, M.,
          <string-name>
            <surname>Bach</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aamodt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Intellimeal - enhancing creativity by reusing domain knowledge in the adaptation process</article-title>
          .
          <source>In: Proceedings of ICCBR</source>
          <year>2017</year>
          <article-title>Workshops (CAW, CBRDL</article-title>
          , PO-CBR), Trondheim, Norway, June 26-28,
          <year>2017</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2028</volume>
          , pp.
          <volume>277</volume>
          –
          <fpage>284</fpage>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>