Appetitoso: A Search Engine for Restaurant Retrieval based on Dishes

Gianni Barlacchi (1,2), Azad Abad (1), Emanuele Rossinelli (3), Alessandro Moschitti (1,4)
1 Department of Information Engineering and Computer Science, University of Trento
2 TIM Semantics and Knowledge Innovation Lab, Trento
3 Kloevolution S.r.l.
4 Qatar Computing Research Institute, HBKU
{gianni.barlacchi,e.rossinelli,amoschitti}@gmail.com, azad.abad@unitn.it

Abstract

English. Recent years have seen an impressive development and diffusion of web applications in the food domain, e.g., Yelp and TripAdvisor. These mainly exploit text for searching and retrieving food facilities, e.g., restaurants, cafés, pizzerias. The main features of such applications are the location and the quality of the facilities, where quality is extrapolated from users' reviews. More recent options also enable search based on restaurant categorization, e.g., Japanese, Italian, Mexican. In this work, we introduce Appetitoso¹, an innovative approach for finding restaurants based on the dishes a user would like to taste rather than on the name of food facilities or their general categories.

Italian (translated). Recently, there has been an impressive development and spread of web applications for the food domain, e.g., Yelp and TripAdvisor. These mainly exploit text for searching and retrieving eating places, e.g., restaurants, bars, pizzerias. The main features used by these applications are the location and the quality of the establishments serving food, where quality is extrapolated from user reviews. More recent options also allow searching by restaurant category, e.g., Japanese, Italian, Mexican. This article introduces Appetitoso, a new way of finding eating places based on the dishes the customer wants to taste rather than on the restaurant name or on general categories.

1 Introduction

In the late 2000s, we witnessed the explosion of TripAdvisor², the world's largest travel site, which offers advice about hotels and restaurants. Within a few years, it revolutionized the restaurant industry, allowing its users to search restaurants by location, by broad food categories (e.g., Mexican, Italian, French), and by reviews and ratings provided by other users.

However, user expectations have evolved over time: looking for restaurants is no longer enough, and people now consider finer-grained properties of food, e.g., a particular way of cooking a dish along with its specific ingredients. Thus, there is a clear gap between what the market offers and the emerging trends.

In this work, we present Appetitoso, a search engine that looks for restaurants based on dishes. This approach is designed to help users who already have a specific dish in mind to find restaurants, using fine-grained properties of the dish. Appetitoso integrates state-of-the-art retrieval models, such as BM25, with a domain-specific knowledge base describing properties of, and similarity relations between, different Italian dishes. This knowledge is very useful: in our experiments, we show that it greatly boosts dish retrieval. Appetitoso, released in 2014 in two languages, English and Italian, is available as a mobile phone application (Android and iOS) and as a website. It is an end-to-end application for finding restaurants offering the desired dish. We evaluated it using a set of 547 popular queries typed by its users in the cities of Rome, Milan and Florence.

In the remainder of this paper, Section 2 reports related work on systems for automatic food recommendation, Section 3 introduces Appetitoso, its knowledge base and the food search engine, Section 4 describes our experiments on restaurant retrieval in the Italian language, and Section 5 provides our conclusions.

¹ http://www.appetitoso.it
² http://www.tripadvisor.com
2 Related Work

Nowadays, data analysis is becoming fundamental in many fields. From telecommunications to social media, the huge amount of available data allows scientists and researchers to address previously unsolved problems (Barlacchi et al., 2015). The food domain is one of the fields in which emerging big data techniques have proven to be very promising and able to impact people's everyday life. In recipe recommendation, for instance, Teng et al. (2012) proposed an approach based on networks of ingredients built from a dataset of recipes. To capture both ingredient relations and users' knowledge for combining ingredients in new recipes, they created two separate networks used for recipe recommendation. Moreover, Ahn et al. (2011) explored the impact of flavor compounds on ingredient combinations through a network-based approach. An interesting application was developed by IBM with Chef Watson³, which is part of the company's cognitive computing applications. The system models the chemical compounds of different ingredients, together with textual information extracted from thousands of recipes, to suggest new recipes using innovative ingredient combinations.

Among the different kinds of data, text is surely one of the richest sources of information from which a wide range of statements about food can be extracted. The use of text in the food domain has been widely explored, showing promising results with different models, ranging from the measurement of sentiment in food reviews (Kang et al., 2012) and relation extraction (Wiegand and Klakow, 2013; Wiegand et al., 2012) to the prediction of recipe attributes using review text (Druck, 2013).

³ https://www.ibmchefwatson.com

3 Appetitoso

We introduce the idea of searching for a dish and then finding the best restaurants that can offer it. Thus, the aim of our search engine, Appetitoso, is to find the best restaurants offering dishes relevant to the user's request. Starting from a query with food-related content, e.g., bistecca alla fiorentina (t-bone steak), the system retrieves places that satisfy the location constraint and, at the same time, prepare the desired dish or similar dishes.

Appetitoso retrieves restaurants from a semistructured database, the Food Taste Knowledge Base (FKB), which contains text descriptions of dishes and restaurants: we partly inserted them manually and partly gathered them from various sources such as food blogs, restaurant reviews and food guides. The search process is divided into two phases. First, the user types the query and a location, e.g., the address of a target place or the current user position captured by GPS. Both are sent to Appetitoso's search engine, which retrieves a list of related dishes from FKB. The results are grouped by dish name and shown to the user under different course categories, i.e., antipasto/entree, primo/first course, secondo/second course, and dessert. The input location is used to restrict the search to the area of interest, relying on the restaurant positions available in FKB.

The second phase of the search process is devoted to selecting the best restaurant. Once the user chooses a dish from the list above, Appetitoso provides a list of restaurants that offer that food speciality; all the restaurants offering the dish are stored in FKB. Additionally, Appetitoso provides a DishScore⁴ for each restaurant, which measures the goodness of the dish in that restaurant. Fig. 1 shows the high-level architecture of the system. In the next section, we illustrate our FKB, which enables accurate retrieval of similar dishes.

[Figure 1: Architecture of Appetitoso. Components: Web input (Query, Location); Dishes Databases and Food Guides, processed by an NLP pipeline to gather and analyze data; an Index of dishes over Dish Name, Ingredients, Tags and Similars; Search Dishes; Present Search Results.]

⁴ We only inserted restaurants that have a good reputation in FKB. To generate the DishScore, we trained a logistic regression over 5 different review scores, e.g., 1 star, 2 stars, etc. We used various features, e.g., TripAdvisor and food guide scores. This description is, however, beyond the scope of the current paper.
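The two-phase process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Appetitoso's implementation: the `DishInstance` schema, the haversine radius filter with a hypothetical 2 km default, and the token-overlap `relevance` function (a placeholder standing in for the BM25 retrieval described in Section 3.2) are all ours.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DishInstance:
    """One FKB entry: a dish as cooked by one restaurant (hypothetical schema)."""
    name: str
    course: str          # e.g. "antipasto", "primo", "secondo", "dessert"
    restaurant: str
    lat: float
    lon: float
    dish_score: float    # stand-in for the DishScore


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def relevance(query, name):
    """Placeholder relevance (not BM25): fraction of query tokens in the dish name."""
    q = query.lower().split()
    n = set(name.lower().split())
    return sum(t in n for t in q) / len(q) if q else 0.0


def phase1_search_dishes(fkb, query, lat, lon, radius_km=2.0):
    """Phase 1: restrict to the area of interest, retrieve related dishes,
    and group them by dish name under each course category."""
    nearby = [d for d in fkb if haversine_km(lat, lon, d.lat, d.lon) <= radius_km]
    best = {}
    for d in nearby:
        score = relevance(query, d.name)
        if score > 0:
            key = (d.course, d.name)
            best[key] = max(best.get(key, 0.0), score)
    grouped = defaultdict(list)
    for (course, name), _ in sorted(best.items(), key=lambda kv: -kv[1]):
        grouped[course].append(name)
    return dict(grouped)


def phase2_restaurants(fkb, dish_name, lat, lon, radius_km=2.0):
    """Phase 2: nearby restaurants offering the chosen dish, best DishScore first."""
    offers = [d for d in fkb if d.name == dish_name
              and haversine_km(lat, lon, d.lat, d.lon) <= radius_km]
    return [d.restaurant for d in sorted(offers, key=lambda d: -d.dish_score)]


if __name__ == "__main__":
    fkb = [
        DishInstance("bistecca alla fiorentina", "secondo", "Trattoria A", 43.7700, 11.2550, 0.9),
        DishInstance("bistecca alla fiorentina", "secondo", "Osteria B", 43.7710, 11.2560, 0.7),
        DishInstance("ribollita", "primo", "Osteria B", 43.7710, 11.2560, 0.8),
    ]
    print(phase1_search_dishes(fkb, "bistecca", 43.7696, 11.2558))
    # {'secondo': ['bistecca alla fiorentina']}
    print(phase2_restaurants(fkb, "bistecca alla fiorentina", 43.7696, 11.2558))
    # ['Trattoria A', 'Osteria B']
```

Keeping the two phases as separate functions mirrors the user interaction: the first call returns dish names only, and restaurant ranking happens only after a dish is chosen.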
3.1 The Food Taste Knowledge Base (FKB)

A quick analysis of Italian menus clearly shows that, in many cases, the name of a dish is not enough to understand its content, which means that names alone do not support accurate similarity measures between dishes. Thus, we created FKB, which also organizes dishes in a hierarchical structure where each node is connected to others when there is a similarity between them.

For instance, Bucatini alla amatriciana (bucatini with amatriciana sauce) can be derived from Spaghetti alla amatriciana (spaghetti with amatriciana sauce), since the only difference between the two dishes is the type of pasta (spaghetti vs. bucatini). In this case, we mark Spaghetti alla amatriciana as the template of Bucatini alla amatriciana. The relation is one-to-many: one dish can be a template for many others, but each dish can be assigned to only one template. Since every restaurant can have its own way of preparing a dish, multiple instances of the same dish can be present in FKB; we differentiate them by adding the restaurant ID.

This hierarchical organization is very powerful and allows us to easily keep track of similarities that are not explicit. Fig. 2 shows an example of connections between similar dishes. It is worth mentioning that Appetitoso aims to suggest only restaurants with a good reputation for cooking the target dishes, e.g., restaurants in Rome that are famous for pasta alla carbonara. Consequently, this limits the number of dishes contained in FKB and thus its territorial coverage. On the other hand, it makes it possible to create a manually checked resource.

[Figure 2: Connection between similar fish dishes: Spaghetti alla trabaccolara, Carbonara di mare, Paccheri con pesce spada, Spaghetti allo scoglio, Linguine all'astice.]

Since there is no defined way to assess the similarity between two dishes (they may be similar because they are made with similar ingredients or because they are cooked in the same way), we built the FKB hierarchy with a semi-automatic approach: we used name similarity to select similar candidates, which were then manually annotated by food experts. We manually populated FKB with data collected from the web, food guides and food blogs. Every dish belonging to a restaurant is represented by means of the following information:
- ID: unique identifier for the dish.
- Name of the dish: the name of the dish as reported in the restaurant menu.
- Ingredients: list of the principal ingredients. When the ingredients are not provided by the restaurant, we use a list of common ingredients for the dish (e.g., ingredients from online recipes).
- Tags: list of tags useful to characterize the dish. The tag list does not include ingredients, only categorical information that can help to characterize the dish (e.g., meat or fish).
- Similar dishes: list of similar dishes defined according to the hierarchy described above.
- Template: ID of the template dish, if present.
- Restaurant: information about the restaurant that cooks the dish (e.g., restaurant name and restaurant ID).
- DishScore: a value that indicates the goodness of the dish. It is computed taking into account many factors, such as the reputation of the restaurant in cooking that dish, the number of mentions in food guides and the sentiment extracted from food-blogger articles and restaurant reviews.
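A minimal Python sketch of an FKB entry, of the template relation, and of the candidate step of the semi-automatic approach, assuming a hypothetical `FKBDish` schema that mirrors the fields listed above. The paper does not say which name-similarity measure was used; `difflib.SequenceMatcher` and the 0.6 threshold are illustrative assumptions, and the candidate pairs would still go to food experts for annotation.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations
from typing import Optional


@dataclass
class FKBDish:
    """One FKB entry, mirroring the fields listed above (hypothetical schema)."""
    id: str
    name: str                                              # name as on the menu
    ingredients: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)          # categorical, e.g. "meat"
    similar: list[str] = field(default_factory=list)       # IDs of similar dishes
    template: Optional[str] = None                         # ID of the template dish
    restaurant_id: Optional[str] = None
    dish_score: Optional[float] = None


def template_root(dish_id: str, fkb: dict[str, FKBDish]) -> str:
    """Follow template links upward; each dish has at most one template."""
    seen = set()
    while fkb[dish_id].template is not None and dish_id not in seen:
        seen.add(dish_id)                  # guards against accidental cycles
        dish_id = fkb[dish_id].template
    return dish_id


def variants_of(template_id: str, fkb: dict[str, FKBDish]) -> list[str]:
    """One-to-many side of the relation: dishes whose template is template_id."""
    return [d.id for d in fkb.values() if d.template == template_id]


def candidate_pairs(fkb: dict[str, FKBDish], threshold: float = 0.6):
    """Propose similar-dish pairs by name similarity, for manual expert review."""
    pairs = []
    for a, b in combinations(fkb.values(), 2):
        ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a.id, b.id, ratio))
    return sorted(pairs, key=lambda p: -p[2])


if __name__ == "__main__":
    fkb = {
        "d1": FKBDish("d1", "Spaghetti alla amatriciana",
                      ["spaghetti", "guanciale", "pomodoro", "pecorino"], ["meat"]),
        "d2": FKBDish("d2", "Bucatini alla amatriciana",
                      ["bucatini", "guanciale", "pomodoro", "pecorino"], ["meat"], template="d1"),
        "d3": FKBDish("d3", "Tiramisù", ["mascarpone", "caffè", "savoiardi"], ["dessert"]),
    }
    print(template_root("d2", fkb))   # d1
    print(variants_of("d1", fkb))     # ['d2']
    for a, b, r in candidate_pairs(fkb):
        print(a, b, round(r, 2))      # only the amatriciana pair passes the threshold
```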
3.2 Dish Retrieval

Italy has long and varied traditions in food preparation: it is possible to find different kinds of cuisine even in nearby cities. This makes Italian food incredibly varied and fascinating but, at the same time, difficult to interpret from a linguistic viewpoint. The same dish can be called in many different ways. In Florence, for example, people call the common dish Zuppa di cipolle (onion soup) Carabaccia, with the consequence that the underlying retrieval problem cannot be addressed with a simple word matching approach. Indeed, even if a dish is conceptually the same as another, different restaurants (e.g., in different locations) have their own way of naming it.

To tackle this problem, we verified the hypothesis that a search engine can achieve better results if it considers further information such as ingredients and tags. This approach significantly improves the accuracy of the retrieved list compared to the simple word matching approach. More specifically, we applied BM25 (Robertson et al., 1995) to FKB. Given a dish query Q = {q_1, ..., q_n} and a representation of a candidate dish D, BM25 ranks the latter according to the following score:

$$s(Q, D) = \sum_{i=1}^{n} \frac{IDF(q_i) \cdot (k + 1) \cdot TF(q_i, D)}{k \cdot \left(1 - b + b \cdot \frac{|D|}{avgD}\right) + TF(q_i, D)}$$

where k and b are two free parameters that modulate, respectively, the impact of the term frequency TF(q_i, D) and of the document length through the term |D|/avgD; |D| is the document length and avgD is the average of |D| over the whole dataset. Finally, IDF(q_i) is the inverse document frequency of the query term q_i, computed as:

$$IDF(q_i) = \log\left(1 + \frac{N - DF(q_i) + 0.5}{DF(q_i) + 0.5}\right)$$

where N is the total number of documents in the collection and DF(q_i) is the document frequency of the term q_i.

Additionally, we created four different indexes⁵ with the information contained in FKB, i.e., (i) dish name, (ii) ingredients, (iii) tags and (iv) similar dishes. Each index is built from the words describing the corresponding item. Thus, when we query a dish, we first retrieve four different sets of results and then, since they have different importance, we combine them by assigning different weights, which are set using cross-validation.

⁵ We use Lucene (McCandless et al., 2010).
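The scoring and the four-index combination above can be sketched in Python as follows: a small per-field BM25 that implements the s(Q, D) and IDF formulas of this section, plus a weighted sum over the name, ingredients, tags and similar-dishes fields. The defaults k = 1.2 and b = 0.75, the pre-tokenized input, and the field weights in the example are assumptions for illustration; the paper sets its weights by cross-validation and relies on Lucene rather than code like this.

```python
import math
from collections import Counter

FIELDS = ("name", "ingredients", "tags", "similar")  # the four FKB-derived indexes


class BM25Index:
    """BM25 over a single field, following s(Q, D) and IDF(q_i) as defined above."""

    def __init__(self, docs, k=1.2, b=0.75):
        self.k, self.b = k, b                                   # assumed defaults
        self.tf = [Counter(tokens) for tokens in docs]          # TF(q_i, D)
        self.length = [len(tokens) for tokens in docs]          # |D|
        self.n_docs = len(docs)                                 # N
        self.avg_len = sum(self.length) / self.n_docs if self.n_docs else 0.0  # avgD
        self.df = Counter(t for tokens in docs for t in set(tokens))           # DF(q_i)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        """s(Q, D_i), summing only over query terms that occur in document i."""
        total = 0.0
        for term in query:
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue  # an absent term contributes nothing to the sum
            norm = self.k * (1 - self.b + self.b * self.length[i] / self.avg_len)
            total += self.idf(term) * (self.k + 1) * f / (norm + f)
        return total


def build_indexes(dishes):
    """dishes: list of dicts mapping each field name to a list of tokens."""
    return {f: BM25Index([d[f] for d in dishes]) for f in FIELDS}


def combined_scores(query, indexes, weights):
    """Weighted sum of the four per-field BM25 scores for every dish."""
    n = indexes["name"].n_docs
    return [sum(weights[f] * indexes[f].score(query, i) for f in FIELDS) for i in range(n)]


if __name__ == "__main__":
    dishes = [
        {"name": ["carabaccia"], "ingredients": ["cipolle", "pane", "brodo"],
         "tags": ["zuppa", "vegetariano"], "similar": ["zuppa", "di", "cipolle"]},
        {"name": ["zuppa", "di", "pesce"], "ingredients": ["pesce", "pomodoro"],
         "tags": ["zuppa", "pesce"], "similar": ["cacciucco"]},
    ]
    weights = {"name": 1.0, "ingredients": 0.5, "tags": 0.3, "similar": 0.8}  # hypothetical
    query = "zuppa cipolle".split()
    indexes = build_indexes(dishes)
    print([round(indexes["name"].score(query, i), 3) for i in range(2)])  # names only
    print([round(s, 3) for s in combined_scores(query, indexes, weights)])
```

With these toy entries, Carabaccia scores zero when only names are indexed, but ranks first once ingredients, tags and similar dishes contribute, which is the effect the Carabaccia example motivates.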
4 Experiments

Our experiments aim at demonstrating the effectiveness of our models on the task of dish retrieval. We used the well-known metrics Precision at rank 1 (P@1), Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). P@1 indicates the percentage of queries with a correct answer (i.e., the desired dish) in the first position. For a set of queries Q, MRR is computed as

$$MRR = \frac{1}{|Q|} \sum_{q=1}^{|Q|} \frac{1}{rank(q)}$$

where rank(q) is the position of the first correct answer in the retrieved list, and MAP is the mean of the average precision scores over all queries:

$$MAP = \frac{1}{|Q|} \sum_{q=1}^{|Q|} AveP(q)$$

Since FKB contains multiple instances of the same dish, we evaluated the collapsed list of results by considering the dish name. It is worth mentioning that dish names are not standardized, so the same dish may appear under slightly different names. To make them more comparable, we normalized name forms by removing spaces, articles and punctuation. We considered a set of 547 popular queries typed by users in Milan (396 queries), Rome (73 queries) and Florence (78 queries). The number of retrieved dishes varies across queries, with averages of 22.8, 22.3 and 37.4 for Florence, Milan and Rome, respectively. For each retrieved dish, we manually annotated its relevance with respect to the input query. Note that the same dish is associated (in FKB) with all of the restaurants that offer it; thus, restaurant retrieval is a side effect of dish retrieval.

We considered two baselines for evaluating our model, namely String Matching and BM25. The first is based on simple string matching between the query and the dish names. The second is BM25 applied to dish names only. We refer to our system (BM25 applied to the four indexes described in Sec. 3.2) as Appetitoso.
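The evaluation protocol above can be sketched in Python as follows: dish names are normalized by removing spaces, articles and punctuation, multiple instances of the same dish are collapsed, and P@1, MRR and MAP are averaged over queries (as percentages, as in Table 1). The Italian article list, the keep-first collapsing rule and the function names are our assumptions; AveP here divides by the number of relevant dishes for the query, which coincides with the relevant retrieved dishes when, as in the paper, only retrieved dishes are annotated.

```python
import re

# Assumed list of Italian articles to drop; the paper only says "articles".
ARTICLES = {"il", "lo", "la", "i", "gli", "le", "l", "un", "uno", "una"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and articles, and remove spaces."""
    tokens = re.findall(r"\w+", name.lower())  # punctuation, incl. apostrophes, splits tokens
    return "".join(t for t in tokens if t not in ARTICLES)


def collapse(ranked_names):
    """Merge instances of the same dish, keeping its first (best) rank."""
    seen, out = set(), []
    for name in ranked_names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


def precision_at_1(ranked, relevant):
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def reciprocal_rank(ranked, relevant):
    for pos, dish in enumerate(ranked, start=1):
        if dish in relevant:
            return 1.0 / pos
    return 0.0


def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for pos, dish in enumerate(ranked, start=1):
        if dish in relevant:
            hits += 1
            total += hits / pos  # precision at each relevant position
    return total / len(relevant) if relevant else 0.0


def evaluate(runs):
    """runs: list of (ranked dish names, relevant dish names), one pair per query."""
    p1 = rr = ap = 0.0
    for ranked_names, relevant_names in runs:
        ranked = collapse(ranked_names)
        relevant = {normalize(n) for n in relevant_names}
        p1 += precision_at_1(ranked, relevant)
        rr += reciprocal_rank(ranked, relevant)
        ap += average_precision(ranked, relevant)
    n = len(runs)
    return {"P@1": 100 * p1 / n, "MRR": 100 * rr / n, "MAP": 100 * ap / n}


if __name__ == "__main__":
    runs = [
        (["Carabaccia", "Zuppa di pesce", "La carabaccia"], ["carabaccia"]),
        (["Spaghetti allo scoglio", "Linguine all'astice"], ["Linguine all'astice"]),
    ]
    print(evaluate(runs))  # {'P@1': 50.0, 'MRR': 75.0, 'MAP': 75.0}
```

Collapsing before scoring matters: without it, "La carabaccia" would count as a second, separate result for the first query.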
Table 1 shows the results of the baselines and of our model by city and overall (All). Appetitoso largely outperforms String Matching and BM25 applied to names only, e.g., on all queries (All) by up to 32 and 24 absolute percentage points in MRR and MAP, respectively.

Table 1: Ranking evaluation for different models (scores in %).

Model                                   City      MRR    MAP    P@1
Baselines
String Matching (on entire names)       Milan     53.28  53.28  53.28
                                        Rome      71.23  71.23  71.23
                                        Florence  44.87  44.87  44.87
                                        All       56.46  56.46  56.46
BM25 (on names only)                    Milan     69.75  65.44  68.18
                                        Rome      63.86  60.32  58.90
                                        Florence  42.31  40.94  37.18
                                        All       58.64  55.56  54.75
Our Model
Appetitoso (names, ingredients,         Milan     95.35  85.69  93.43
  tags, similar names)                  Rome      87.40  76.23  84.93
                                        Florence  83.55  75.38  78.21
                                        All       88.76  79.10  85.52

5 Conclusion

In this paper, we presented Appetitoso, a semantic search engine for food. The aim of the search engine is to provide users with a way of searching for restaurants by dishes rather than just by the restaurants' address or cuisine type. We showed that, given the complexity of dish naming, a semistructured database of dishes can largely improve BM25. Overall, Appetitoso shows good performance, e.g., achieving 88.76% in MRR (79.10% in MAP). In the future, we would like to include more complex unstructured data, such as dish descriptions, and to explore word embeddings for the food domain. Moreover, it is also important to increase the coverage of the system by adding more dishes to FKB. Even if manual annotation is important, and in some cases fundamental, it represents a bottleneck for the expansion process. For this reason, in the future it will be necessary to consider approaches that automatically extract dish entities from text (e.g., NER for food).

Acknowledgments

We would like to thank the Appetitoso team for making the system available and for providing us with the data for this work. This work has been partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action) and by an IBM Faculty Award. The first author was supported by a fellowship from TIM. Many thanks to the anonymous reviewers for their valuable suggestions.

References

Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow, and Albert-László Barabási. 2011. Flavor network and the principles of food pairing. Scientific Reports, 1.

Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. 2015. A multi-source dataset of urban life in the city of Milan and the Province of Trentino. Scientific Data, 2.

Gregory Druck. 2013. Recipe attribute prediction using review text as supervision. In Cooking with Computers 2013, IJCAI workshop.

Hanhoon Kang, Seong Joon Yoo, and Dongil Han. 2012. Senti-lexicon and improved naïve Bayes algorithms for sentiment analysis of restaurant reviews. Expert Systems with Applications, 39(5):6000–6010.

Michael McCandless, Erik Hatcher, and Otis Gospodnetic. 2010. Lucene in Action, Second Edition: Covers Apache Lucene 3.0. Manning Publications Co., Greenwich, CT, USA.

Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.

Chun-Yuen Teng, Yu-Ru Lin, and Lada A. Adamic. 2012. Recipe recommendation using ingredient networks. In Proceedings of the 4th Annual ACM Web Science Conference, pages 298–307. ACM.

Michael Wiegand and Dietrich Klakow. 2013. Towards the detection of reliable food-health relationships. NAACL 2013, page 69.

Michael Wiegand, Benjamin Roth, and Dietrich Klakow. 2012. Data-driven knowledge extraction for the food domain. In KONVENS, pages 21–29.