=Paper=
{{Paper
|id=Vol-1749/paper7
|storemode=property
|title=Appetitoso: A Search Engine for Restaurant Retrieval based on Dishes
|pdfUrl=https://ceur-ws.org/Vol-1749/paper7.pdf
|volume=Vol-1749
|authors=Gianni Barlacchi,Azad Abad,Emanuele Rossinelli,Alessandro Moschitti
|dblpUrl=https://dblp.org/rec/conf/clic-it/BarlacchiARM16
}}
==Appetitoso: A Search Engine for Restaurant Retrieval based on Dishes==
Gianni Barlacchi (1,2), Azad Abad (1), Emanuele Rossinelli (3), Alessandro Moschitti (1,4)

(1) Department of Information Engineering and Computer Science, University of Trento

(2) TIM Semantics and Knowledge Innovation Lab, Trento

(3) Kloevolution S.r.l.

(4) Qatar Computing Research Institute, HBKU

{gianni.barlacchi,e.rossinelli,amoschitti}@gmail.com, azad.abad@unitn.it
===Abstract===
Recent years have seen an impressive development and diffusion of web applications for the food domain, e.g., Yelp and TripAdvisor. These mainly exploit text for searching and retrieving food facilities, e.g., restaurants, cafés and pizzerias. The main features of such applications are the location and the quality of the facilities, where quality is extrapolated from the users’ reviews. More recent options also enable search based on restaurant categorization, e.g., Japanese, Italian, Mexican. In this work, we introduce Appetitoso (http://www.appetitoso.it), an innovative approach for finding restaurants based on the dishes a user would like to taste rather than on the name of food facilities or their general categories.

===1 Introduction===
In the late 2000s, we witnessed the explosive growth of TripAdvisor (http://www.tripadvisor.com), the world’s largest travel site, which offers advice about hotels and restaurants. In a few years, it has revolutionized the restaurant industry, allowing its users to search for restaurants by location, broad food categories (e.g., Mexican, Italian, French), and reviews and ratings provided by other users.

However, user expectations have evolved over time: looking for restaurants is no longer enough, and people now consider finer-grained properties of food, e.g., a particular way of cooking a dish along with its specific ingredients. Thus, there is a clear gap between what the market offers and the emerging trends.

In this work, we present Appetitoso, a search engine that looks for restaurants based on dishes. This approach is designed to help users who already have a specific dish in mind to find a restaurant, using fine-grained properties of the dish.

Appetitoso integrates state-of-the-art retrieval models, such as BM25, with a domain-specific knowledge base describing properties of, and similarity relations between, different Italian dishes. This knowledge is very useful: in our experiments, we show that it greatly boosts dish retrieval.

Appetitoso is available as a mobile phone application (e.g., for Android and iOS) and as a website, released in 2014 in two languages, English and Italian. It is an end-to-end application for finding restaurants that offer the desired dish. We evaluated it using a set of 547 popular queries typed by its users in the cities of Rome, Milan and Florence.

The remainder of this paper is organized as follows. In Section 2, we report related work on systems for automatic food recommendation. In Section 3, we introduce Appetitoso, its knowledge base and the food search engine. In Section 4, we describe our experiments on restaurant retrieval for the Italian language. Finally, in Section 5, we provide our conclusions.
===2 Related Work===
Nowadays, data analysis is becoming fundamental in many fields. From telecommunications to social media, the huge amount of available data allows scientists and researchers to address previously unsolved problems (Barlacchi et al., 2015). The food domain is one of the fields in which emerging big data techniques have proved very promising and able to impact people’s everyday life. In recipe recommendation, for instance, Teng et al. (2012) proposed an approach based on networks of ingredients built from a dataset of recipes. In order to capture both ingredient relations and users’ knowledge for combining ingredients in new recipes, they created two separate networks used for recipe recommendation.

Moreover, Ahn et al. (2011) explored the impact of flavor compounds on ingredient combinations through a network-based approach. An interesting application was developed by IBM with Chef Watson (https://www.ibmchefwatson.com), which is part of the cognitive computing applications developed by the company. The system models the chemical compounds of different ingredients together with textual information extracted from thousands of recipes in order to suggest new ones using innovative ingredient combinations.

Among the different kinds of data, text surely represents one of the richest sources of information, from which we can extract a wide range of statements about food. The use of text in the food domain has been widely explored, showing promising results with different models, ranging from the measurement of sentiment in food reviews (Kang et al., 2012) and relation extraction (Wiegand and Klakow, 2013; Wiegand et al., 2012), to the prediction of recipe attributes from review text (Druck, 2013).

===3 Appetitoso===
We introduce the idea of searching for a dish and then finding the best restaurants that offer it. Thus, the aim of our search engine, Appetitoso, is to find the best restaurants offering dishes relevant to the user’s request. Starting from a query with food-related content, e.g., bistecca alla fiorentina (t-bone steak), the system retrieves places that satisfy the constraint on the location and, at the same time, prepare the desired dish or similar dishes.

[Figure 1: Architecture of Appetitoso. Labels recoverable from the figure: Web, Databases, Food Guides; NLP pipeline to gather and analyze data; Index Dishes (Ingredients, Dish Name, Similars, Tags); Location, Query; Search Dishes; Present Search Results.]

Appetitoso retrieves restaurants from a semistructured database, the Food Taste Knowledge Base (FKB), which contains text descriptions of dishes and restaurants. We inserted part of them manually and gathered the rest from various sources such as food blogs, restaurant reviews and food guides. The search process is divided into two phases. In the first phase, the user types a query and a location, e.g., the address of a target place or the current user position captured by GPS. Both are sent to Appetitoso’s search engine, which retrieves a list of related dishes from FKB. The results are grouped by dish name and shown to the user in different course categories, i.e., antipasto/entree, primo/first course, secondo/second course, dessert. The input location is used to restrict the search area of interest, relying on the restaurant position available in FKB.

The second phase of the search process is devoted to selecting the best restaurant. Once the user chooses a dish from the list above, Appetitoso provides a list of restaurants that offer that speciality; all the restaurants offering that dish are stored in FKB. Additionally, Appetitoso provides a DishScore for each restaurant, which measures how good the dish is in that restaurant. (We only inserted restaurants that have a good reputation into FKB. In order to generate the DishScore, we trained a logistic regression over 5 different review scores, e.g., 1 star, 2 stars, etc., using various features, e.g., TripAdvisor and food guide scores. A full description is, however, beyond the scope of the current paper.) Fig. 1 shows the high-level architecture of the system. In the next section, we illustrate our FKB, which enables accurate retrieval of similar dishes.
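As a rough illustration of the two-phase flow described above, the following Python sketch filters hypothetical FKB entries by distance from the user's location, groups the matching dishes by course, and then ranks the restaurants that serve a chosen dish by DishScore. The record fields, the 5 km radius, the naive substring matcher (a stand-in for the real search engine) and the sample data are all assumptions made for illustration; the paper does not specify how the search area is restricted.

```python
import math
from collections import defaultdict

# Hypothetical FKB entries: one row per (dish, restaurant) pair.
FKB = [
    {"dish": "Bistecca alla fiorentina", "course": "secondo", "restaurant": "Trattoria A",
     "lat": 43.7696, "lon": 11.2558, "dish_score": 0.91},
    {"dish": "Bistecca alla fiorentina", "course": "secondo", "restaurant": "Osteria B",
     "lat": 43.7731, "lon": 11.2500, "dish_score": 0.84},
    {"dish": "Tagliata di manzo", "course": "secondo", "restaurant": "Osteria B",
     "lat": 43.7731, "lon": 11.2500, "dish_score": 0.77},
    {"dish": "Bistecca alla fiorentina", "course": "secondo", "restaurant": "Ristorante C",
     "lat": 41.9028, "lon": 12.4964, "dish_score": 0.95},  # Rome: outside the search area
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def search_dishes(query, lat, lon, radius_km=5.0):
    """Phase 1: dishes near the location that match the query, grouped by course."""
    terms = query.lower().split()
    by_course = defaultdict(set)
    for row in FKB:
        near = haversine_km(lat, lon, row["lat"], row["lon"]) <= radius_km
        # Placeholder matcher: any query term appears in the dish name.
        match = any(t in row["dish"].lower() for t in terms)
        if near and match:
            by_course[row["course"]].add(row["dish"])
    return {course: sorted(names) for course, names in by_course.items()}

def rank_restaurants(dish, lat, lon, radius_km=5.0):
    """Phase 2: nearby restaurants serving the chosen dish, best DishScore first."""
    rows = [r for r in FKB
            if r["dish"] == dish
            and haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]
    rows.sort(key=lambda r: r["dish_score"], reverse=True)
    return [r["restaurant"] for r in rows]

florence = (43.7700, 11.2550)
dishes = search_dishes("bistecca", *florence)
restaurants = rank_restaurants("Bistecca alla fiorentina", *florence)
```

In this sketch the location acts as a hard filter in both phases, so a highly scored restaurant in another city never appears; whether the deployed system uses a fixed radius or some other notion of area is not stated in the paper.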
===3.1 The Food Taste Knowledge Base (FKB)===
A quick analysis of Italian menus clearly shows that, in many cases, the name of a dish is not enough to understand its content, which means that names alone do not support an accurate similarity measure between dishes. Thus, we created FKB, which also organizes dishes in a hierarchical structure, where each node is connected to others whenever there is a similarity between them.

For instance, Bucatini alla amatriciana (bucatini with amatriciana sauce) can be extended from Spaghetti alla amatriciana (spaghetti with amatriciana sauce) since the only difference between the two dishes is the type of pasta (spaghetti vs. bucatini). In this case, we marked the first dish as a template for the second one. The relation is one-to-many: one dish can be a template for many others, but each dish can be assigned to only one template. Since every restaurant can have its own way of preparing a dish, multiple instances of the same dish can be present in the FKB. We differentiate them by adding the restaurant ID.

There is no defined way to assess the similarity between two dishes: they may be similar because they are made from similar ingredients or because they are cooked in the same way. For this reason, we built the FKB hierarchy with a semi-automatic approach. We used name similarity to select similar candidates, which were then manually annotated by food experts. We manually populated FKB with data collected from the web, food guides and food blogs. Every dish belonging to a restaurant is represented by means of the following information:

- ID: unique identifier for the dish.

- Name of the dish: the name of the dish as reported in the restaurant menu.

- Ingredients: list of the principal ingredients. When the ingredients are not provided by the restaurant, we use a list of common ingredients for the dish (e.g., ingredients from online recipes).

- Tags: list of tags useful to characterize the dish. The tag list does not include ingredients but only categorical information that can help to characterize the dish (e.g., meat or fish).

- Similar dishes: list of similar dishes defined according to our hierarchy described above.

- Template: ID of the template dish, if it is present.

- Restaurant: information about the restaurant that cooks this dish (e.g., restaurant name and restaurant ID).

- DishScore: a value that indicates the goodness of the dish. It is calculated taking into account many factors such as the reputation of the restaurant in cooking that dish, the number of mentions in food guides and the sentiment extracted from food blogger articles and restaurant reviews.

[Figure 2: Connection between similar fish dishes. Nodes: Spaghetti alla trabaccolara, Carbonara di mare, Paccheri con pesce spada, Spaghetti allo scoglio, Linguine all’astice.]

This hierarchical organization is very powerful and allows us to easily keep track of similarities that are not explicit. Fig. 2 shows an example of connections between similar dishes. It is worth mentioning that Appetitoso aims to suggest only restaurants that have a good reputation for cooking the target dishes, e.g., restaurants in Rome that are famous for pasta alla carbonara. Consequently, this limits the number of dishes contained in the FKB and thus the territory coverage. On the other hand, it makes it possible to create a manually checked resource.

===3.2 Dish Retrieval===
Italy has long and varied traditions of preparing food: it is possible to find different kinds of cuisine even in nearby cities. This makes Italian food incredibly varied and fascinating but, at the same time, difficult to interpret from a linguistic viewpoint. The same dish can be called by many different names. In Florence, for example, people call the common dish Zuppa di cipolle (onion soup) Carabaccia, with the consequence that the underlying retrieval problem cannot be addressed by a simple word matching approach. Indeed, even if a dish is conceptually the same as another, different restaurants (e.g., in different locations) have their own way of naming it.
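To make the record layout of Section 3.1 and the naming problem concrete, here is a minimal Python sketch. The `Dish` dataclass mirrors the fields listed above, but the field names, the IDs, the ingredient and tag values, and the two matching helpers are illustrative assumptions and not the actual FKB schema. It shows that a query for Zuppa di cipolle misses a dish listed as Carabaccia when only names are compared, while the similar-dishes field recovers it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Dish:
    """One FKB entry: a dish as served by one restaurant (field names are assumed)."""
    dish_id: str
    name: str
    restaurant_id: str
    ingredients: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    similar: List[str] = field(default_factory=list)  # names of similar dishes
    template_id: Optional[str] = None                 # at most one template per dish
    dish_score: float = 0.0

def tokens(text: str) -> set:
    """Lower-cased whitespace tokens of a name."""
    return set(text.lower().split())

def name_match(query: str, dish: Dish) -> bool:
    """Simple word matching: every query word must occur in the dish name."""
    return tokens(query) <= tokens(dish.name)

def extended_match(query: str, dish: Dish) -> bool:
    """Also accept a match against the names of the dish's similar dishes."""
    return name_match(query, dish) or any(
        tokens(query) <= tokens(other) for other in dish.similar
    )

# Illustrative entry; the values are not taken from the real FKB.
carabaccia = Dish(
    dish_id="d1",
    name="Carabaccia",
    restaurant_id="r42",
    ingredients=["cipolle", "pane", "brodo"],
    tags=["zuppa"],
    similar=["Zuppa di cipolle"],
)

query = "zuppa di cipolle"
found_by_name = name_match(query, carabaccia)
found_with_similar = extended_match(query, carabaccia)
```

Keeping `template_id` as a single optional value reflects the one-to-many constraint stated above: a dish may serve as a template for many others, but points to at most one template itself.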
To tackle the problem above, we tested the hypothesis that a search engine can achieve better results if it considers further information such as ingredients and tags. This approach significantly improves the accuracy of the retrieved list compared to the simple word matching approach.

More specifically, we applied BM25 (Robertson et al., 1995) to FKB. Given a dish query Q and a representation of a candidate dish D, BM25 ranks the latter according to the following score:

$$ s(Q, D) = \sum_{i=0}^{n} \frac{\mathrm{IDF}(q_i) \cdot \left((k + 1) \cdot \mathrm{TF}(q_i, D)\right)}{k \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgD}}\right) + \mathrm{TF}(q_i, D)}, $$

where k and b are two free parameters that modify, respectively, the impact of the term frequency (TF) and of the document length through the term $|D| / \mathrm{avgD}$; $|D|$ is the document length and avgD is the average document length over the whole dataset. Finally, $\mathrm{IDF}(q_i)$ is the Inverse Document Frequency of the query term $q_i$, computed as:

$$ \mathrm{IDF}(q_i) = \log\left(1 + \frac{N - \mathrm{DF}(q_i) + 0.5}{\mathrm{DF}(q_i) + 0.5}\right), $$

where N is the total number of documents in the collection, and $\mathrm{DF}(q_i)$ is the document frequency of the term $q_i$.

Additionally, we created four different indexes (using Lucene; McCandless et al., 2010) with the information contained in FKB, i.e., the (i) dish name, (ii) ingredients, (iii) tags and (iv) similar dishes. Each index is built using the words describing the corresponding item. Thus, when we query a dish, we first retrieve four different sets of results and then, since they have different importance, we combine them by assigning different weights, which are set using cross-fold validation.

===4 Experiments===
Our experiments aim at demonstrating the effectiveness of our models on the task of dish retrieval. We used the following well-known metrics: Precision at rank 1 (P@1), Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). P@1 indicates the percentage of queries with a correct answer (e.g., the desired dish) found in the first position. The MRR is computed as follows:

$$ \mathrm{MRR} = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{\mathrm{rank}(q)}, $$

where rank(q) is the position of the first correct answer in the retrieved list. For a group of queries Q, MAP is the mean over the average precision scores for each query:

$$ \mathrm{MAP} = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \mathrm{AveP}(q). $$

Because FKB contains multiple instances of the same dish, we evaluated the collapsed list of results by considering the dish name. It is worth mentioning that the names of the dishes are not standardized, so some dishes are the same despite having slightly different names. To make them more similar, we normalized name forms by removing spaces, articles and punctuation. We considered a set of 547 popular queries typed by users in Milan (396 queries), Rome (73 queries) and Florence (78 queries). The number of retrieved dishes varies across queries, with averages of 22.8, 22.3 and 37.4 for Florence, Milan and Rome, respectively. For each retrieved dish, we manually annotated its relevance with respect to the input query. It should be noted that the same dish is associated (in FKB) with all of the restaurants that offer it. Thus, restaurant retrieval is a side effect of dish retrieval.

We considered two baselines for evaluating our model, namely, String Matching and BM25. The first is based on simple string matching between the query and the dish names. The second is BM25 applied to dish names only. We refer to our system (BM25 applied to the 4 indexes as described in Sec. 3.2) as Appetitoso.

Table 1 shows the results of the baselines and our model by city and overall (All). Appetitoso largely outperforms String Matching and BM25 applied to names only: on all cities combined, for example, it gains 32.3 absolute percentage points in MRR over String Matching and 23.5 absolute percentage points in MAP over BM25.

{| class="wikitable"
|+ Table 1: Ranking evaluation for different models (values in %)
! Model !! City !! MRR !! MAP !! P@1
|-
| rowspan="4" | Baseline: String Matching (on entire names) || Milan || 53.28 || 53.28 || 53.28
|-
| Rome || 71.23 || 71.23 || 71.23
|-
| Florence || 44.87 || 44.87 || 44.87
|-
| All || 56.46 || 56.46 || 56.46
|-
| rowspan="4" | Baseline: BM25 (on names only) || Milan || 69.75 || 65.44 || 68.18
|-
| Rome || 63.86 || 60.32 || 58.90
|-
| Florence || 42.31 || 40.94 || 37.18
|-
| All || 58.64 || 55.56 || 54.75
|-
| rowspan="4" | Our model: Appetitoso (names, ingredients, tags, similar names) || Milan || 95.35 || 85.69 || 93.43
|-
| Rome || 87.40 || 76.23 || 84.93
|-
| Florence || 83.55 || 75.38 || 78.21
|-
| All || 88.76 || 79.10 || 85.52
|}
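The contrast in Table 1 between BM25 on names only and BM25 over the four indexes can be illustrated with a small, self-contained Python sketch. It implements the BM25 score and IDF as written in Section 3.2, scores each field separately, sums the field scores with weights, and then computes P@1, MRR and MAP as defined in Section 4. The toy dishes, the field weights, the values k = 1.2 and b = 0.75, the weighted-sum combination and the normalization of average precision by the number of relevant dishes are all assumptions for illustration; the paper uses Lucene indexes, sets the weights by cross-fold validation, and does not report these details.

```python
import math
from collections import Counter

K, B = 1.2, 0.75  # common BM25 defaults, assumed here

def bm25_scores(query, docs):
    """BM25 score of every tokenised document in `docs` for the query tokens."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n or 1.0
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
            norm = K * (1 - B + B * len(d) / avg_len) + tf[q]
            s += idf * (K + 1) * tf[q] / norm
        scores.append(s)
    return scores

# Toy stand-in for FKB: one text per index (name, ingredients, tags, similar).
DISHES = [
    {"name": "carabaccia", "ingredients": "cipolle pane brodo",
     "tags": "zuppa", "similar": "zuppa di cipolle"},
    {"name": "zuppa di farro", "ingredients": "farro fagioli",
     "tags": "zuppa", "similar": "minestra di farro"},
    {"name": "spaghetti allo scoglio", "ingredients": "spaghetti cozze vongole",
     "tags": "pesce", "similar": "linguine all'astice"},
]
WEIGHTS = {"name": 1.0, "ingredients": 0.5, "tags": 0.3, "similar": 0.8}  # hypothetical

def rank(query, fields):
    """Weighted sum of per-field BM25 scores; dishes with zero score are dropped."""
    q = query.lower().split()
    total = [0.0] * len(DISHES)
    for f in fields:
        docs = [d[f].split() for d in DISHES]
        for i, s in enumerate(bm25_scores(q, docs)):
            total[i] += WEIGHTS[f] * s
    order = sorted(range(len(DISHES)), key=lambda i: total[i], reverse=True)
    return [DISHES[i]["name"] for i in order if total[i] > 0]

def reciprocal_rank(ranked, relevant):
    for pos, name in enumerate(ranked, start=1):
        if name in relevant:
            return 1.0 / pos
    return 0.0

def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for pos, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0

# Hypothetical relevance judgments: query -> set of relevant dish names.
QUERIES = {
    "zuppa di cipolle": {"carabaccia"},
    "spaghetti scoglio": {"spaghetti allo scoglio"},
}

def evaluate(fields):
    rr, ap, p1 = [], [], []
    for query, relevant in QUERIES.items():
        ranked = rank(query, fields)
        rr.append(reciprocal_rank(ranked, relevant))
        ap.append(average_precision(ranked, relevant))
        p1.append(1.0 if ranked and ranked[0] in relevant else 0.0)
    n = len(QUERIES)
    return {"MRR": sum(rr) / n, "MAP": sum(ap) / n, "P@1": sum(p1) / n}

names_only = evaluate(["name"])
all_fields = evaluate(["name", "ingredients", "tags", "similar"])
```

On this toy data, the names-only ranking never retrieves Carabaccia for the query zuppa di cipolle, while the four-field ranking places it first because the ingredients and similar-dishes indexes contribute matching terms. The numbers it produces say nothing about the real system; only the mechanism is the point.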
===5 Conclusion===
In this paper we presented Appetitoso, a semantic search engine for food. The aim of the search engine is to provide users with a way of searching for restaurants by dishes rather than just by the restaurants’ address or cuisine type. We show that, given the complexity of dish naming, a semistructured database for dishes can largely improve BM25. Overall, Appetitoso shows good performance, e.g., achieving an MRR of 88.76% (Table 1). In the future, we would like to include more complex unstructured data, such as the description of the dishes, and also explore the use of word embeddings for the food domain. Moreover, it is also important to increase the coverage of the system by adding more dishes to the FKB. Even if manual annotation is important, and in some cases fundamental, it represents a bottleneck for the expansion process. For this reason, in the future it will be necessary to consider approaches that automatically extract dish entities from text (e.g., NER for food).

===Acknowledgments===
We would like to thank the Appetitoso team for making the system available and for providing us with the data for this work. This work has been partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action) and by an IBM Faculty Award. The first author was supported by a fellowship from TIM. Many thanks to the anonymous reviewers for their valuable suggestions.

===References===
Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow, and Albert-László Barabási. 2011. Flavor network and the principles of food pairing. Scientific Reports, 1.

Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. 2015. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data, 2.

Gregory Druck. 2013. Recipe attribute prediction using review text as supervision. In Cooking with Computers 2013, IJCAI workshop.

Hanhoon Kang, Seong Joon Yoo, and Dongil Han. 2012. Senti-lexicon and improved naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications, 39(5):6000–6010.

Michael McCandless, Erik Hatcher, and Otis Gospodnetic. 2010. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA.

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.

Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference, pages 298–307. ACM.

Michael Wiegand and Dietrich Klakow. 2013. Towards the detection of reliable food-health relationships. NAACL 2013, page 69.

Michael Wiegand, Benjamin Roth, and Dietrich Klakow. 2012. Data-driven knowledge extraction for the food domain. In KONVENS, pages 21–29.