RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 42 Re-CoSKQ: Towards POIs Recommendation Using Collective Spatial Keyword Queries Ramon Hermoso∗ Sergio Ilarri Raquel Trillo-Lado rhermoso@unizar.es silarri@unizar.es raqueltl@unizar.es Department of Computer Science and Department of Computer Science and Department of Computer Science and Systems Engineering, University of Systems Engineering, I3A, University Systems Engineering, I3A, University Zaragoza of Zaragoza of Zaragoza Zaragoza, Spain Zaragoza, Spain Zaragoza, Spain ABSTRACT We believe that exploiting spatial keyword querying as a basis The goal of collective spatial keyword queries is to retrieve, from to build recommender systems is an interesting research avenue a spatial database, a group of spatial items such that the descrip- to explore. Therefore, combining both fields of research, in this tion of the items included in that set (typically based on the use of position paper we present the idea of Re-CoSKQ, a recommender keywords) is completely covered by the query’s keywords. More- system that uses CoSKQ to provide a set of items that semantically over, it ensures that the items retrieved are as near as possible to covers the keywords of a query (even if they do not match perfectly) the query location and have the lowest inter-item distances. We and minimizes the cost, in terms of the distance to get to them and argue that using this concept in the field of recommender systems the similarity between query keywords and item descriptions. could be useful. Therefore, in this position paper, we outline the As a problem statement, let us consider a set U = {u 1, ..., un } of idea of Re-CoSKQ, an adaptation of Collective Spatial Keyword users spending their time in a city as tourists. Let O = {o 1, ..., om } Query (CoSKQ) for recommender systems in the tourism domain to be a set of POIs, i.e., spots with some kind of relevant attraction for provide the user with a set of Points of Interest (POIs) that satisfy visitors. Examples of POIs could be museums, monuments, parks, his/her queries both geographically and semantically. or buildings with some historical flavour, just to mention a few. Now, let oi .κ = {k 1, ..., k j } be a set of keywords with which a CCS CONCEPTS POI oi ∈ O is described. These keywords can usually be retrieved in an automated way by using semantically-annotated resources. • Retrieval tasks and goals → Recommender systems. Moreover, every POI oi ∈ O is placed in a location denoted by oi .λ. KEYWORDS Re-CoSKQ uses collective spatial keyword querying in order to cope with the location of POIs and users and also with the similarity Collective spatial keyword querying, recommender systems, tourism between the keywords in the user’s query and the description of the POIs. Let q = ⟨λ, κ⟩ be a user’s query, where q.λ represents 1 INTRODUCTION the user’s location and q.κ stands for the query split in keywords Recommender systems (RS) have been studied for several decades, (only relevant words for the search are taken into account). The aiming to facilitate item selection as part of the user’s decision- main goal is to provide a method to return a set of items O ′ ⊆ O making processes [11]. One of the hard challenges of recommender which semantically covers the keywords in q.κ and also ensures systems is to provide successful responses to user queries, especially that their cost, in terms of distance –between the POIs and the user when little information is available. In most RS approaches, alge- who issued the query– and the similarity of terms, is minimal. braic operations with user-item rating matrices allow predicting The next sections intend to shed some light into the problem and the future likeness of new items for a user (e.g., using collaborative present the approach we have envisioned to deal with it. Specifically, filtering, content-based, or hybrid approaches). However, when the the rest of the paper is structured as follows. First, Section 2 we suitability of the suggested items depends on different features such revise the concept of CoSKQ. Then, in Section 3, we present the as the location of items and users, textual descriptions of items, or Re-CoSKQ approach. In Section 4, we sketch an evaluation proposal. the (sometimes blurry) query description, those approaches face Finally, in Section 5, we conclude with a summary and some future new problems to address. For example, for the recommendation of work. points of interest (POIs), the location of the items and the user, as well as other context attributes, may play a key role [6]. 2 BACKGROUND: CoSKQ The idea of Collective Spatial Keyword Querying (CoSKQ) emer- As we have previously stated, CoSKQ attempts to find the solution ged some years ago as a promising technique to query spatial to the problem of retrieving a group of spatial objects that collec- databases containing information about items and their location [2]. tively match the user preferences given specific locations (of the It puts forward a smart solution to retrieve a group of spatial items user and also of the objects) and a set of keywords. The method is such that the description of the items included in that set (typically designed to work with spatial databases, so it does much effort on based on the use of keywords) is completely covered by the query’s providing an efficient computation, in terms of the data structure keywords and assures that the items are as near as possible to the used and how data are accessed [2, 3]. Although going in depth on query location and have the lowest inter-item distances. the subtle considerations of the method is out of the scope of this ∗ All authors contributed equally to this research. paper, we summarize how it works applied to our domain. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 43 It uses the concept of IR-tree data structures [4] to efficiently R5 R6 store information about POIs. This type of structure allows indexing objects and the keywords which describe them as well as their R3 R4 R1 R2 spatial position. IR-trees are a type of balanced trees in which each R3 R4 R1 R2 leaf node contains an item o (a POI object), a bounding rectangle of o and an item identifier, while each non-leaf node in the tree contains o1 o2 o6 o8 o3 o4 o5 o7 o9 o10 a pointer to a child node, a Minimum Bounding Rectangle (MBR) of all rectangles in entries of the child node, and an item identifier Figure 3: Resulting IR-tree containing data for the example containing the set of all keywords in the entries of the child node. Moreover, each leaf node contains a pointer to an inverted file with the keywords that describe the POIs stored in that node. Figure 1 3 Re-CoSKQ APPROACH depicts an example for which CoSKQ may offer a solution with a We present Re-CoSKQ as an instantiation of the CoSKQ problem, query q and a set of POIs {o 1, ..., o 10 }. Figures 2 and 3 show how especially designed for recommendations in the tourism domain the data are geographically partitioned and stored in an IR-tree. (i.e., the user is a tourist and the items are points of interest that the user may want to visit). The most common instantiation of CoSKQ assumes that the set of keywords describing the POIs in the query o1 o2 o3 result must contain, at least, all the keywords contained in the o4 query [2]. Formally, q.κ ⊆ ∪oi′ ∈O′ oi′ .κ, where O ′ is the set of POIs calculated as a result of a user issuing the query q; for simplicity, o6 q o7 from now on, we will use o ∈ O ′ to avoid oi′ . However, there o5 are scenarios in which this assumption must hold some more hard o8 o9 constraints. For instance, when tackling a recommendation problem, o10 we need to ensure not only that the keyword query is covered by the resulting O ′ but also that both the maximum distance between the query location and any of the POIs in O ′ and the maximum Figure 1: Example of a possible scenario distance between any two POIs in O ′ are minimized. Moreover, in this paper, we do not assume that q.κ can be fully covered. Actually, we claim that this assumption may derive in empty sets in many recommendation scenarios where queries are o1 o2 o3 expressed, for instance, with different vocabularies, or where they R3 R1 cannot be easily solved with the given descriptions of POIs. Thus, o4 R6 we believe that it is important to provide query outcomes even R5 when full keyword coverage is not possible. In order to do that, we o6 q propose to use a similarity function to calculate how similar the o7 R4 o5 keywords in q.κ are compared to those in ∪o ∈O′ . For example, given o8 o9 R2 q.κ = {outdoors, animals, kids}, if located nearby, one of the POIs included in the outcome could be a zoo, which could be described by o10 a set of keywords {open-air, birds, snakes, mammals, f amily}. This object would never be returned using a classic CoSKQ approach, but considering the semantic similarity between keywords one can Figure 2: Item positioning for the example easily observe that the terms are related, since birds, snakes and CoSKQ presents different algorithmic solutions based on min- mammals are types of animals, outdoors and open-air are synonyms imizing a cost function. The chosen cost function may vary de- and kids are part of families. We will present how to cope with this pending on the authors of each specific proposal and the scenario when presenting different cost functions. where it is applied. Different cost functions, taking into account the distances between items and query locations, can be found in 3.1 Cost Analysis [2, 3]. It has been proved that solving a spatial group keyword query Re-CoSKQ attempts to minimize the cost of finding an appropriate is an NP-complete problem [2], i.e., the performance of an exact set of POIs for a given query q. This cost is modelled as a function algorithm does not present itself as a reasonable solution, in terms that depends on distances between the query and the locations of of running time and I/O cost [7]. For that reason, some approxima- POIs as well as between the keywords. Different equations have tion algorithms have been developed to calculate the output sets been proposed to model cost in the CoSKQ problem [1, 2, 10]. In the of objects [2, 3, 7, 12]. Besides, in special cases, the application of following, we redefine some of them for the Re-CoSKQ problem. an exact algorithm may be plausible, especially when the number TYPE 1. A linear combination of the maximum distance be- of keywords in the query is small. Some exact algorithms, based tween the query location and any POI in O ′ , the maximum pairwise on dynamic programming for minimizing the cost function are distance between any two POIs in O ′ , and the maximum of the presented in [2, 3, 7]. semantic distance between the query keywords (q.κ) and the set of Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 44 keywords in O ′ , i.e. ∪o ∈O′ o.κ. It is formally defined in Eq. 1. calculated with different geometrical approaches. In the following, cost (q, O′ ) = α · max′ [dist (q .λ, o .λ)] + β · max ′ [dist (o 1 , o 2 )] we point out some possible functions. o∈O o 1 ,o 2 ∈O (1) Euclidean distance. It is probably the most common distance +ω · max [dist (k 1 , k 2 )] function used in the literature for many different types of problems k 1 ∈q .κ ,k 2 ∈∪o∈O′ o .κ and domains. It is formally defined by Eq. 5: where α + β + ω = 1 are weights to denote the relevance of each tn v of the three types of distances involved, which allow adding up Õ distances which may have different ranges of values. dist (q .λ, o .λ) = (q .λ i − o .λ i )2 (5) i =1 TYPE 2. This type of function defines cost as the maximum of the three factors in the TYPE 1 function; i.e., the highest value We assume that the position of queries and POIs are given by a pair between the maximum distance among the query location and of coordinates ⟨lat, lonд⟩. This distance may work well when the any POI in O ′ , the maximum pairwise distance between any two routes between POIs are roughly calculated or the users can walk POIs in O ′ , and the maximum of the semantic distance between straight from any location to another. query keywords (q.κ) and the set of keywords in ∪o ∈O′ o.κ. This is L1 -Norm. It is another well-known distance function, also known formally defined by Eq. 2, where again weights α, β and ω are used. as Manhattan distance. It calculates the sum of the magnitudes of cost (q, O′ ) = max α · max′ [dist (q .λ, o .λ)] , β · max ′ [dist (o 1 , o 2 )] ,  the vectors in a space, i.e., the sum of absolute difference of the o∈O o 1 ,o 2 ∈O components of the vectors (see Eq. 6). ω· max [dist (k 1 , k 2 )] k 1 ∈q .κ ,k 2 ∈∪o∈O′ o .κ n Õ (2) dist (q .λ, o .λ) = |q .λ i − o .λ i | (6) i =1 TYPE 3. This function uses a min-max approach, linearly com- bining the minimum distance between the query location and any We use 2-dimension spaces, denoted by location coordinates. This POI with the maximum values for pairwise distance between any distance may be suitable for grid-based scenarios, e.g., POIs in a city two POIs in O ′ and the semantic distance between query keywords connected by roads/paths, or halls in a museum linked by corridors. (q.κ) and the set of keywords in O ′ , i.e., ∪o ∈O′ o.κ (see Eq. 3). Geodesic distance. It is the type of function we need if we use a graph to model how POIs and users are connected. The geodesic cost (q, O′ ) = α · min′ [dist (q .λ, o .λ)] + β · max ′ [dist (o 1 , o 2 )] o∈O o 1 ,o 2 ∈O distance is defined as the shortest path between two vertices in a (3) +ω · max [dist (k 1 , k 2 )] graph. This is useful when modeling a scenario with weighted edges, k 1 ∈q .κ ,k 2 ∈∪o∈O′ o .κ since some extra information can be added (e.g., about congested again with α + β + ω = 1. routes or crowded halls). Many algorithms can be used to calculate TYPE 4. This is a unified cost function, adapted from [3], that shortest paths in graphs (e.g., the Dijkstra’s algorithm). generalizes types 1 to 3 in one function. It is presented in Eq. 4. POI-to-POI distance. (dist(o 1, o 2 )) could also be called intra-  1  ϕ2 "  Õ ϕ POI distance, since it calculates the distance between two POIs. Note cost (q, O′ ) = α · (dist (q .λ, o .λ))ϕ1 1 again that the location of o ∈ O ′ is denoted by o.λ. As we assume o∈O′   ϕ2 a 2-dimension space in Re-CoSKQ, we can reduce the calculation + β · max ′ dist (o 1 , o 2 ) (4) of this distance to the problem of calculating the location distance. o 1 ,o 2 ∈O Thus, the same functions described above may apply to this case.   ϕ2 # ϕ12 Term distance. (dist(k 1, k 2 )) is the distance we use to calculate + ω· max dist (k 1 , k 2 ) how similar two different keywords are. In this case, we compare k 1 ∈q .κ ,k 2 ∈∪o∈O′ o .κ the query keywords (q.κ) and the keywords in ∪o ∈O′ o.κ. In the with α + β + ω = 1, ϕ 1 ∈ {−∞, 1, ∞} and ϕ 2 ∈ {1, ∞} . The ϕ 1 cost function, we try to minimize the maximum distance between and ϕ 2 values stand for tuning parameters, allowing to describe the q.κ set and o.κ in a pairwise basis. In order to calculate the sim- the previous cost functions (types 1-3) by varying their values. For ilarity between keywords, we adhere to ontology-based measures, example, an instantiation with α, β, ω = 13 , ϕ 1 = ∞ and ϕ 2 = 1 typically used in semantic web approaches. This type of measures results in a Type 1 cost function with the weights α, β, and ω usually calculates the similarity according to structured knowledge indicated: defined by an ontology. In the following, we propose some functions  1 Õ cost (q, O′ ) = max (dist (q .λ, o .λ))+ that we consider to be suitable for the Re-CoSKQ problem; once the 3 ′o∈O similarity has been estimated, we should provide a way to calculate the distance associated to it, such as dist(k 1, k 2 ) = 1 − sim(k 1, k 2 ).  + max ′ dist (o 1 , o 2 ) + max dist (k 1 , k 2 ) o 1 ,o 2 ∈O k 1 ∈q .κ ,k 2 ∈∪o∈O′ o .κ Similarity based on concept closeness. This measure takes into account the closeness of the concepts in the hierarchical tree rep- 3.2 Distance Analysis resenting the ontology. It is based on the relatedness property pre- sp(k 1 ,k 2 ) As we have pointed out, there exist different distance functions sented in [8] and is defined as sim(k 1, k2) = 1 − 2D , where needed to calculate the cost in Re-CoSKQ. Analyzing any of the pro- sp(·) is a function that returns the shortest path between the two posed cost functions, we can observe that there are three different terms in the ontology tree and 2D denotes the maximum distance distance instantiations, as we explain in the following. between any two concepts in the ontology (D is the ontology depth). Location distance. (dist(q.λ, o.λ)) refers to the physical dis- Similarity based on closeness and concept depth. This measure, tance between the query location and a POI’s location. It can be proposed in [9], takes into account the closeness of keywords in Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). RecTour 2019, September 19th, 2019, Copenhagen, Denmark. 45 the ontology but also the depth in the ontology tree where they can resources, we will use exact and approximate algorithms to test be found (see Eq. 7). It assumes that the semantics of concepts are their performance (running time and approximation ratio). more general in higher levels. Thus, the higher we find the concept in the ontology the lower the similarity while, on the contrary, the 5 CONCLUSIONS AND FUTURE WORK lower we find the concepts the higher the similarity. In this position paper, we have presented the idea of Re-CoSKQ, a βh −β h collective spatial keyword query approach for recommender sys- ( e −α l e β h −e −β h if k1 , k2 sim(k 1 , k 2 ) = e +e (7) tems, where keyword coverage is not assumed, by considering 1 other wise keyword similarity. We have tackled the problem as a minimization where l is the shortest path between k 1 and k 2 and h is the depth of problem, for which we have defined some cost functions. the least common subsumer of both concepts. Parameters α, β > 0 We are currently working on an empirical evaluation to test are weights to modulate the contribution of these factors. the approach and its benefits over other POI recommendation ap- proaches. Furthermore, we would like to extend the approach to 3.3 Outline of Processing Issues for Re-CoSKQ group recommendation; that is, different users in different locations will issue their queries and the opportunity of group visits (groups Several algorithms address the problem of implementing CoSKQ. of people visiting the same items together) will be explored, so the CoSKQ is an NP-complete problem, so exact algorithms (e.g., the problem becomes more complex, since O ′ must contain suitable linear programming approaches presented in [2, 3]) only make POIs that satisfy all users (or at least a set of them). We are also sense when the number of query keywords is low. However, on interested in dynamic environments where both the POIs and users average conditions, an approximate algorithm is needed. Different could potentially be on the move and context conditions can change approaches using greedy techniques and pruning steps have been quickly over time. Finally, we also intend to consider other spatial presented to reduce the needed resources. Due to lack of space, we distance calculation approaches, such as heuristic searches (e.g., by using A⋆ algorithms), as well as other approaches to compute term omit further details and refer the reader to [2, 3] for further revision on approximate algorithms for CoSKQ. Re-CoSKQ needs to tackle distances (e.g., a word embedding approach such as word2vec). an optimization problem to try to minimize the cost function. Acknowledgments 4 EVALUATION PROPOSAL Work supported by the project TIN2016-78011-C4-3-R (AEI/FEDER, UE) and Most works on CoSKQ focus their evaluation on measuring the the Government of Aragon (Group Reference T35_17D, COSMOS group) performance (in terms of running time) and approximation ratio. and co-funded with Feder 2014-2020 “Construyendo Europa desde Aragon”. Nevertheless, when applying this approach to recommender sys- tems we are not only interested in these issues, as the quality of REFERENCES the recommendation is also key. An interesting problem is that [1] Xin Cao, Gao Cong, Tao Guo, Christian S Jensen, and Beng Chin Ooi. 2015. Efficient processing of spatial group keyword queries. ACM Transactions on full coverage is assumed in classic CoSKQ; that is, ∪o ∈O′ o.κ is Database Systems (TODS) 40, 2 (2015), 13:1–13:48. assumed to contain, at least, all keywords in q.κ. This is not the [2] Xin Cao, Gao Cong, Christian S Jensen, and Beng Chin Ooi. 2011. Collective case of Re-CoSKQ, where the coverage is estimated by keyword spatial keyword querying. In 2011 ACM SIGMOD Int. Conference on Management of Data. ACM, 373–384. similarity. Moreover, the evaluation usually needs a ground truth to [3] Harry Kai-Ho Chan, Cheng Long, and Raymond Chi-Wing Wong. 2018. On compare with, in order to be able to calculate accuracy metrics such generalizing collective spatial keyword queries. IEEE Transactions on Knowledge as precision and recall. As far as we know, there is no dataset anno- and Data Engineering 30, 9 (2018), 1712–1726. [4] Gao Cong, Christian S Jensen, and Dingming Wu. 2009. Efficient retrieval of the tated with this type of information. Thus, we propose to first define top-k most relevant spatial web objects. Proceedings of the VLDB Endowment 2, 1 a set of interesting and representative keyword queries and then (2009), 337–348. [5] María del Carmen Rodríguez-Hernández, Sergio Ilarri, Ramón Hermoso, and manually annotate a dataset containing POIs descriptions with the Raquel Trillo-Lado. 2017. DataGenCARS: A Generator of Synthetic Data for the keywords by assigning each POI to a set of predefined categories Evaluation of Context-Aware Recommendation Systems. Pervasive and Mobile (much smaller than the number of keywords) defined based on the Computing 38, Part 2 (2017), 516–541. [6] María del Carmen Rodríguez-Hernández, Sergio Ilarri, Raquel Trillo-Lado, and queries that have been selected for evaluation, in order to define a Ramón Hermoso. 2015. Location-Aware Recommendation Systems: Where We dataset with information that can represent a suitable ground truth Are and Where We Recommend to Go. In Int. Workshop on Location-Aware to compare with. Precision and recall may be calculated by com- Recommendations (LocalRec), Vol. 1405. CEUR Workshop Proceedings, 1–8. [7] Yunjun Gao, Jingwen Zhao, Baihua Zheng, and Gang Chen. 2015. Efficient paring the retrieved POIs according to the categories specified by collective spatial keyword query processing on road networks. IEEE Transactions the user in the query. Regarding the items, there are many datasets on Intelligent Transportation Systems 17, 2 (2015), 469–480. [8] Claudia Leacock and Martin Chodorow. 1998. Combining local context and that contain information about geographic locations and keyword WordNet similarity for word sense identification. WordNet: An electronic lexical descriptions; a tailored synthetic dataset could also be generated database 49, 2 (1998), 265–283. by using DataGenCARS [5]. All this could be complemented with a [9] Yuhua Li, Zuhair A Bandar, and David McLean. 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE user-centered evaluation. Transactions on Knowledge and Data Engineering 15, 4 (2003), 871–882. The main idea in the empirical evaluation is to show the benefits [10] Cheng Long, Raymond Chi-Wing Wong, Ke Wang, and Ada Wai-Chee Fu. 2013. of the proposal and test how different cost functions behave, tuning Collective spatial keyword queries: a distance owner-driven approach. In 2013 ACM SIGMOD Int. Conference on Management of Data. ACM, 689–700. different parameters. We are also interested in the scalability of the [11] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to Recom- proposal, so tests with different numbers of query keywords and mender Systems Handbook. In Recommender Systems Handbook. Springer. [12] Pengfei Zhang, Huaizhong Lin, Bin Yao, and Dongming Lu. 2017. Level-aware POIs (as well as simultaneous queries/users) should be carried out. collective spatial keyword queries. Information Sciences 378 (2017), 194–214. Moreover, in order to check the feasibility concerning the use of Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).