Appetitoso: A Search Engine for Restaurant Retrieval based on Dishes
Gianni Barlacchi¹,², Azad Abad¹, Emanuele Rossinelli³, Alessandro Moschitti¹,⁴
¹ Department of Information Engineering and Computer Science, University of Trento
² TIM Semantics and Knowledge Innovation Lab, Trento
³ Kloevolution S.r.l.
⁴ Qatar Computing Research Institute, HBKU
{gianni.barlacchi,e.rossinelli,amoschitti}@gmail.com
azad.abad@unitn.it

Abstract

English. Recent years have seen an impressive development and diffusion of web applications for the food domain, e.g., Yelp and TripAdvisor. These mainly exploit text for searching and retrieving food facilities, e.g., restaurants, cafés, pizzerias. The main features of such applications are the location and the quality of the facilities, where quality is inferred from users' reviews. More recent options also enable search based on restaurant categories, e.g., Japanese, Italian, Mexican. In this work, we introduce Appetitoso¹, an innovative approach for finding restaurants based on the dishes a user would like to taste rather than on the names of food facilities or their general categories.

Italiano (English translation). Recently, there has been an impressive development and diffusion of web applications for the food domain, e.g., Yelp and TripAdvisor. These mainly exploit text to search for and retrieve food outlets, e.g., restaurants, bars, pizzerias. The main features used by these applications are the location and the quality of the facilities serving food, where quality is inferred from user reviews. More recent options also allow searching by restaurant category, e.g., Japanese, Italian, Mexican. This article introduces Appetitoso, a new way of finding food outlets based on the dishes the customer wants to taste rather than on the restaurant name or on general categories.

1 Introduction

In the late 2000s, we witnessed the explosion of TripAdvisor², the world's largest travel site, which offers advice about hotels and restaurants. In a few years, it revolutionized the restaurant industry by allowing its users to search restaurants by location, by broad food category (e.g., Mexican, Italian, French), and by the reviews and ratings provided by other users.

However, user expectations have evolved over time: finding a restaurant is no longer enough, and people now consider finer-grained properties of food, e.g., a particular way of cooking a dish along with its specific ingredients. Thus, there is a clear gap between what the market offers and the emerging trends.

In this work, we present Appetitoso, a search engine that looks for restaurants based on dishes. This approach is designed to help users find a restaurant when they already have a specific dish in mind, using fine-grained properties of the dish.

Appetitoso integrates state-of-the-art retrieval models, such as BM25, with a domain-specific knowledge base describing the properties of, and the similarity relations between, different Italian dishes. This knowledge is very useful: in our experiments, we show that it greatly boosts dish retrieval.

Appetitoso is available as a mobile phone application (Android and iOS) and as a website, released in 2014 in two languages, English and Italian. It is an end-to-end application for finding restaurants that offer the desired dish. We evaluated it using a set of 547 popular queries typed by its users in the cities of Rome, Milan and Florence.

In the remainder of this paper, Section 2 reports related work on systems for automatic food recommendation; Section 3 introduces Appetitoso, its knowledge base and the food search engine; Section 4 describes our experiments on restaurant retrieval in Italian; and finally, Section 5 provides our conclusions.

¹ http://www.appetitoso.it
² http://www.tripadvisor.com

2 Related Work

Nowadays, data analysis is becoming fundamental in many fields. From telecommunications to social media, the huge amount of available data allows scientists and researchers to address previously unsolved problems (Barlacchi et al., 2015). The food domain is one of the fields in which emerging big data techniques have proven very promising and able to impact people's everyday lives. In recipe recommendation, for instance, Teng et al. (2012) proposed an approach based on networks of ingredients built from a dataset of recipes. To capture both ingredient relations and users' knowledge for combining ingredients in new recipes, they created two separate networks used for recipe recommendation.

Moreover, Ahn et al. (2011) explored the impact of flavor compounds on ingredient combinations through a network-based approach. An interesting application was developed by IBM with Chef Watson³, which is part of the company's cognitive computing applications. The system models the chemical compounds of different ingredients, together with textual information extracted from thousands of recipes, to suggest new recipes with innovative ingredient combinations.

Among the different kinds of data, text is surely one of the richest sources of information from which we can extract a wide range of statements about food. The use of text in the food domain has been widely explored, with promising results for different tasks. These range from measuring sentiment in food reviews (Kang et al., 2012) and relation extraction (Wiegand and Klakow, 2013; Wiegand et al., 2012) to predicting recipe attributes using review text (Druck, 2013).

³ https://www.ibmchefwatson.com

3 Appetitoso

We introduce the idea of searching for a dish and then finding the best restaurants that can offer it. Thus, the aim of our search engine, Appetitoso, is to find the best restaurants offering dishes relevant to the user's request. Starting from a query with food-related content, e.g., bistecca alla fiorentina (t-bone steak), the system retrieves places that satisfy the location constraint and, at the same time, prepare the desired dish or similar dishes.

Appetitoso retrieves restaurants from a semi-structured database, the Food Taste Knowledge Base (FKB), which contains text descriptions of dishes and restaurants. We inserted part of these manually and gathered the rest from various sources, such as food blogs, restaurant reviews and food guides. The search process is divided into two phases. In the first phase, the user types a query and a location, e.g., the address of a target place or the current user position captured by GPS. Both are sent to Appetitoso's search engine, which retrieves a list of related dishes from FKB. The results are grouped by dish name and shown to the user in different course categories, i.e., antipasto/entrée, primo/first course, secondo/second course, dessert. The input location is used to restrict the search to the area of interest, relying on the restaurant positions available in FKB.

The second phase of the search process is devoted to selecting the best restaurant. Once the user chooses a dish from the list above, Appetitoso provides a list of the restaurants that offer that food speciality; all the restaurants offering a given dish are stored in FKB. Additionally, Appetitoso provides a DishScore⁴ for each restaurant, which measures the goodness of the dish in that restaurant. Fig. 1 shows the high-level architecture of the system. In the next section, we illustrate our FKB, which enables accurate retrieval of similar dishes.

⁴ We only inserted restaurants with a good reputation into FKB. To generate the DishScore, we trained a logistic regression over 5 different review scores, e.g., 1 star, 2 stars, etc. We used various features, e.g., TripAdvisor and food guide scores. A full description is, however, beyond the scope of the current paper.

[Figure 1: Architecture of Appetitoso. Labeled components: Web, Dishes Databases, Food Guides, NLP pipeline to gather and analyze data, Index Dishes (Dish Name, Ingredients, Tags, Similars), Query, Location, Search Dishes, Present Search Results.]
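The two-phase search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Appetitoso's implementation. The `Offer` record, the functions `phase_one`/`phase_two`, the haversine distance filter with a hypothetical 2 km radius, and all example data are hypothetical. The token-overlap test in phase one is only a placeholder for the dish ranking described in Section 3.2.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Offer:
    """Hypothetical FKB-style row: one dish as served by one restaurant."""
    dish_name: str
    course: str        # e.g. "primo", "secondo"
    restaurant: str
    lat: float
    lon: float
    dish_score: float  # stand-in for the DishScore


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def phase_one(query, user_lat, user_lon, offers, radius_km=2.0):
    """Phase 1: dishes near the user, grouped by course and by dish name."""
    terms = set(query.lower().split())
    grouped = defaultdict(set)
    for o in offers:
        if haversine_km(user_lat, user_lon, o.lat, o.lon) > radius_km:
            continue  # the location restricts the search area
        if terms & set(o.dish_name.lower().split()):  # toy relevance test only
            grouped[o.course].add(o.dish_name)  # the set collapses duplicate dish names
    return {course: sorted(names) for course, names in grouped.items()}


def phase_two(dish_name, user_lat, user_lon, offers, radius_km=2.0):
    """Phase 2: nearby restaurants offering the chosen dish, best DishScore first."""
    hits = [o for o in offers
            if o.dish_name == dish_name
            and haversine_km(user_lat, user_lon, o.lat, o.lon) <= radius_km]
    hits.sort(key=lambda o: o.dish_score, reverse=True)
    return [o.restaurant for o in hits]


# Hypothetical data: two restaurants in central Florence, one in Milan.
offers = [
    Offer("bistecca alla fiorentina", "secondo", "Trattoria A", 43.7700, 11.2500, 0.9),
    Offer("bistecca alla fiorentina", "secondo", "Osteria B", 43.7750, 11.2600, 0.7),
    Offer("ribollita", "primo", "Osteria B", 43.7750, 11.2600, 0.8),
    Offer("bistecca alla fiorentina", "secondo", "Ristorante C", 45.4642, 9.1900, 0.95),
]
menu = phase_one("bistecca", 43.7731, 11.2560, offers)
# menu == {"secondo": ["bistecca alla fiorentina"]}
best = phase_two("bistecca alla fiorentina", 43.7731, 11.2560, offers)
# best == ["Trattoria A", "Osteria B"]; Ristorante C is outside the radius
```

In this sketch, phase one returns each dish name once per course, mirroring the grouping by dish name. The DishScore is used only in phase two, to order restaurants for the chosen dish.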

3.1 The Food Taste Knowledge Base (FKB)

A quick analysis of Italian menus clearly shows that, in many cases, the name of a dish is not enough to understand its content. Names alone therefore do not support an accurate similarity measure between dishes. Thus, we created FKB, which also organizes dishes in a hierarchical structure where each node is connected to others when there is a similarity between them.

For instance, Bucatini alla amatriciana (bucatini with amatriciana sauce) can be derived from Spaghetti alla amatriciana (spaghetti with amatriciana sauce), since the only difference between the two dishes is the type of pasta (spaghetti vs. bucatini). In this case, we mark Spaghetti alla amatriciana as the template for Bucatini alla amatriciana. The relation is one-to-many: one dish can be a template for many others, but each dish can be assigned to only one template. Since every restaurant can have its own way of preparing a dish, multiple instances of the same dish can be present in FKB; we differentiate them by adding the restaurant ID.

There is no single defined way to assess the similarity between two dishes: they may be similar because they are made with similar ingredients or because they are cooked in the same way. We therefore built the FKB hierarchy with a semi-automatic approach: we used name similarity to select similar candidates, which were then manually annotated by food experts. We manually populated FKB with data collected from the web, food guides and food blogs. Every dish belonging to a restaurant is represented by the following information:

- ID: unique identifier of the dish.
- Name of the dish: the name of the dish as reported in the restaurant menu.
- Ingredients: list of the principal ingredients. When the ingredients are not provided by the restaurant, we use a list of common ingredients for the dish (e.g., ingredients from online recipes).
- Tags: list of tags useful to characterize the dish. The tag list does not include ingredients, only categorical information that can help characterize the dish (e.g., meat or fish).
- Similar dishes: list of similar dishes, defined according to the hierarchy described above.
- Template: ID of the template dish, if present.
- Restaurant: information about the restaurant that cooks this dish (e.g., restaurant name and restaurant ID).
- DishScore: a value that indicates the goodness of the dish. It is calculated taking into account many factors, such as the reputation of the restaurant in cooking that dish, the number of mentions in food guides and the sentiment extracted from food blogger articles and restaurant reviews.

This hierarchical organization is very powerful and allows us to easily keep track of similarities that are not explicit. Fig. 2 shows an example of connections between similar dishes. It is worth mentioning that Appetitoso aims to suggest only restaurants that have a good reputation for cooking the target dishes, e.g., restaurants in Rome that are famous for pasta alla carbonara. Consequently, this limits the number of dishes contained in FKB, and thus the territory coverage. On the other hand, it makes it possible to create a manually checked resource.

[Figure 2: Connection between similar fish dishes. Nodes: Spaghetti alla trabaccolara, Carbonara di mare, Paccheri con pesce spada, Spaghetti allo scoglio, Linguine all'astice.]

3.2 Dish Retrieval

Italy has long and varied traditions of preparing food: different kinds of cuisine can be found even in nearby cities. This makes Italian food incredibly varied and fascinating but, at the same time, difficult to interpret from a linguistic viewpoint, since the same dish can be called in many different ways. For example, in Florence people call the common dish Zuppa di cipolle (onion soup) Carabaccia. As a consequence, the underlying retrieval problem cannot be addressed by a simple word matching approach. Even when a dish is conceptually the same as another, different restaurants (e.g., in different locations) have their own way of naming it.
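The naming problem above is why matching on dish names alone falls short. The following is a minimal, self-contained sketch, not Appetitoso's Lucene-based implementation. It ranks hypothetical FKB-style records with Okapi BM25 (Robertson et al., 1995) over four separate fields (name, ingredients, tags, similar dishes) and sums the per-field scores with weights. It uses the IDF variant log(1 + (N - DF + 0.5) / (DF + 0.5)).

The following are assumptions for illustration: the records in `DISHES`, the field weights in `WEIGHTS`, the class `BM25Field` and function `search`, and the parameter values k = 1.2 and b = 0.75 (commonly used BM25 defaults). The paper sets its field weights by cross-validation.

```python
import math
from collections import Counter

# Hypothetical FKB-style records; the values are illustrative, not taken from FKB.
DISHES = [
    {"id": "d1", "name": "carabaccia",
     "ingredients": "cipolle pane brodo olio",
     "tags": "zuppa vegetariano",
     "similar": "zuppa di cipolle"},
    {"id": "d2", "name": "ribollita",
     "ingredients": "cavolo nero fagioli pane",
     "tags": "zuppa vegetariano",
     "similar": "pappa al pomodoro"},
    {"id": "d3", "name": "bistecca alla fiorentina",
     "ingredients": "manzo sale pepe",
     "tags": "carne griglia",
     "similar": "tagliata costata"},
]
FIELDS = ["name", "ingredients", "tags", "similar"]
WEIGHTS = {"name": 1.0, "ingredients": 0.5, "tags": 0.3, "similar": 0.8}  # hypothetical


class BM25Field:
    """Okapi BM25 over one text field of a collection of records."""

    def __init__(self, docs, k=1.2, b=0.75):
        self.tokens = [d.split() for d in docs]
        self.k, self.b = k, b
        self.n = len(docs)
        self.avg_len = sum(len(t) for t in self.tokens) / self.n
        # Document frequency: number of records whose field contains the term.
        self.df = Counter(term for t in self.tokens for term in set(t))

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        """BM25 score of record i for the query, on this field only."""
        tf = Counter(self.tokens[i])
        norm = self.k * (1 - self.b + self.b * len(self.tokens[i]) / self.avg_len)
        return sum(self.idf(q) * (self.k + 1) * tf[q] / (tf[q] + norm)
                   for q in query.split() if tf[q] > 0)


def search(query, dishes=DISHES, weights=WEIGHTS):
    """Rank dish IDs by a weighted sum of per-field BM25 scores."""
    indexes = {f: BM25Field([d[f] for d in dishes]) for f in FIELDS}
    scored = []
    for i, d in enumerate(dishes):
        s = sum(weights[f] * indexes[f].score(query, i) for f in FIELDS)
        if s > 0:
            scored.append((s, d["id"]))
    return [dish_id for _, dish_id in sorted(scored, reverse=True)]


ranked = search("zuppa di cipolle")  # ["d1", "d2"]: Carabaccia first
name_only = search("zuppa di cipolle",
                   weights={"name": 1.0, "ingredients": 0.0, "tags": 0.0, "similar": 0.0})
# name_only == []: no stored dish is *named* "zuppa di cipolle"
```

On this toy data, the name-only configuration finds nothing for the query zuppa di cipolle. The weighted combination ranks Carabaccia first through its similar-dishes, ingredients and tags fields, which mirrors the point made above about naming variation.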

To tackle the problem above, we verified the hypothesis that a search engine can achieve better results if it considers further information, such as ingredients and tags. This approach significantly improves the accuracy of the retrieved list compared to the simple word matching approach.

More specifically, we applied BM25 (Robertson et al., 1995) to FKB. Given a dish query Q with terms q_1, ..., q_n and a representation of a candidate dish D, BM25 ranks the latter according to the following score:

$$s(Q, D) = \sum_{i=1}^{n} \frac{\mathrm{IDF}(q_i) \cdot (k + 1) \cdot \mathrm{TF}(q_i, D)}{k \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgD}}\right) + \mathrm{TF}(q_i, D)},$$

where k and b are two free parameters that control, respectively, the impact of the term frequency (TF) and of the document length through the ratio |D|/avgD. Here, |D| is the length of D and avgD is the average document length over the whole collection. Finally, IDF(q_i) is the Inverse Document Frequency of the query term q_i, computed as:

$$\mathrm{IDF}(q_i) = \log\left(1 + \frac{N - \mathrm{DF}(q_i) + 0.5}{\mathrm{DF}(q_i) + 0.5}\right),$$

where N is the total number of documents in the collection and DF(q_i) is the document frequency of the term q_i.

Additionally, we created four different indexes⁵ with the information contained in FKB, i.e., (i) dish name, (ii) ingredients, (iii) tags and (iv) similar dishes. Each index is built using the words describing the corresponding item. Thus, when we query a dish, we first retrieve four different sets of results. Since these sets have different importance, we then combine them by assigning different weights, which are set using cross-validation.

⁵ We use Lucene (McCandless et al., 2010).

4 Experiments

Our experiments aim at demonstrating the effectiveness of our models on the task of dish retrieval. We used well-known metrics: Precision at rank 1 (P@1), Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). P@1 indicates the percentage of queries with a correct answer (i.e., the desired dish) found in the first position. MRR is computed as:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{\mathrm{rank}(q)},$$

where rank(q) is the position of the first correct answer in the retrieved list. For a set of queries Q, MAP is the mean of the average precision scores for each query:

$$\mathrm{MAP} = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \mathrm{AveP}(q).$$

Since FKB contains multiple instances of the same dish, we evaluated the result list after collapsing entries with the same dish name. It is worth mentioning that dish names are not standard, so some dishes are the same despite having slightly different names. To make them more similar, we normalized name forms by removing spaces, articles and punctuation. We considered a set of 547 popular queries typed by users in Milan (396 queries), Rome (73 queries) and Florence (78 queries). The number of retrieved dishes varies across queries, with averages of 22.8, 22.3 and 37.4 for Florence, Milan and Rome, respectively. For each retrieved dish, we manually annotated its relevance with respect to the input query. Note that, in FKB, the same dish is associated with all the restaurants that offer it; thus, restaurant retrieval is a side effect of dish retrieval.

We considered two baselines for evaluating our model, namely, String Matching and BM25. The first is based on simple string matching between the query and the dish names. The second is BM25 applied to dish names only. We refer to our system (BM25 applied to the four indexes described in Sec. 3.2) as Appetitoso.

Table 1 shows the results of the baselines and of our model by city and overall (All). Appetitoso largely outperforms both String Matching and BM25 applied to names only, e.g., by up to 32 and 24 absolute percentage points in MRR and MAP, respectively.

Table 1: Ranking evaluation for different models (scores in %).

Model                                   City       MRR     MAP     P@1
Baselines
String Matching (on entire names)       Milan      53.28   53.28   53.28
                                        Rome       71.23   71.23   71.23
                                        Florence   44.87   44.87   44.87
                                        All        56.46   56.46   56.46
BM25 (on names only)                    Milan      69.75   65.44   68.18
                                        Rome       63.86   60.32   58.90
                                        Florence   42.31   40.94   37.18
                                        All        58.64   55.56   54.75
Our Model
Appetitoso (names, ingredients,         Milan      95.35   85.69   93.43
  tags, similar names)                  Rome       87.40   76.23   84.93
                                        Florence   83.55   75.38   78.21
                                        All        88.76   79.10   85.52

5 Conclusion

In this paper, we presented Appetitoso, a semantic search engine for food. The aim of the search engine is to provide users with a way of searching restaurants by dishes rather than just by the restaurant's address or cuisine type. We showed that, given the complexity of dish naming, a semi-structured database of dishes can largely improve BM25. Overall, Appetitoso shows good performance, e.g., achieving 88.76% MRR and 79.10% MAP over all cities.

In the future, we would like to include more complex unstructured data, such as descriptions of the dishes, and also explore word embeddings for the food domain. Moreover, it is also important to increase the coverage of the system by adding more dishes to FKB. Even if manual annotation is important, and in some cases fundamental, it represents a bottleneck for the expansion process. For this reason, future work will need to consider approaches that automatically extract dish entities from text (e.g., NER for food).

Acknowledgments

We would like to thank the Appetitoso team for making the system available and for providing us with the data for this work. This work has been partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action) and by an IBM Faculty Award. The first author was supported by a fellowship from TIM. Many thanks to the anonymous reviewers for their valuable suggestions.

References

Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow, and Albert-László Barabási. 2011. Flavor network and the principles of food pairing. Scientific Reports, 1.

Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. 2015. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data, 2.

Gregory Druck. 2013. Recipe attribute prediction using review text as supervision. In Cooking with Computers 2013, IJCAI Workshop.

Hanhoon Kang, Seong Joon Yoo, and Dongil Han. 2012. Senti-lexicon and improved naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications, 39(5):6000–6010.

Michael McCandless, Erik Hatcher, and Otis Gospodnetic. 2010. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA.

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.

Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference, pages 298–307. ACM.

Michael Wiegand and Dietrich Klakow. 2013. Towards the detection of reliable food-health relationships. NAACL 2013, page 69.

Michael Wiegand, Benjamin Roth, and Dietrich Klakow. 2012. Data-driven knowledge extraction for the food domain. In KONVENS, pages 21–29.