Query Embedding Learning for Context-based Social Search

Cheng-Te Li and Yi-Chun Chen
Institute of Data Science, National Cheng Kung University, Taiwan
chengte@mail.ncku.edu.tw, guava830420@gmail.com

ABSTRACT
Recommending individuals through keywords is an essential and common search task in online social platforms such as Facebook and LinkedIn. However, users often have only an impression of the desired targets, depicted by labels of social contexts (e.g., gender, interests, skills, visited locations, and employment). Assuming each user is associated with a set of labels, we propose a novel task, Search by Social Contexts (SSC), in online social networks. SSC is a kind of query-based people recommendation, recommending the desired target based on a set of user-specified query labels. We develop the method Social Query Embedding Learning (SQEL) to deal with SSC. SQEL aims to learn the feature representation (i.e., embedding vector) of the query, along with user feature vectors derived from graph embedding, and uses the learned query vectors to find the targets via similarity. Experiments conducted on Facebook and Twitter datasets exhibit satisfying accuracy and encourage more advanced efforts on search by social contexts.

KEYWORDS
social search, context query, query embedding, social network

1 INTRODUCTION
Users in online social services such as Facebook, Instagram, and LinkedIn often perform search tasks to find particular persons. Given the name of the target person, the goal of social search is to accurately recommend the targets the user desires. However, users frequently do not know the names of the targets and have only very limited information about them (e.g., gender, interests, hometown, company, and school). On the other hand, users sometimes want to find persons (e.g., experts or celebrities) who meet certain requirements (e.g., skills, awards, and visited locations). The common setting of both scenarios is that the target persons can be depicted by a set of labels. We therefore believe it is promising to develop a system that allows such social search without specifying the names of targets. In this paper, we propose a novel search task, Search by Social Contexts (SSC), which is to find the desired target persons using the social network behind users and the labels associated with users. SSC can also be considered as a query-based people recommendation: by specifying a small set of labels depicting the target, rather than her name, SSC is designed to recommend a list of persons such that the true desired target is covered and ranked at the top position in the list.

The intuition behind Search by Social Contexts is that there should be latent correlation between the input query labels and the target, reflected through the interactions between users and labels and the connections between users in social networks. To perform search and recommendation by contexts, we leverage node feature representation learning in graphs to model the hidden correlation between query labels and potential target users in the latent space. A novel method, Social Query Embedding Learning (SQEL), is developed to learn the feature representation (i.e., embedding vector) of the query, and to use the learned feature vector to find the users with the highest vector similarity scores as the recommended targets. Our basic idea is that the query embedding feature vector is an ensemble of an accurate query language model and the proximity distribution between the query and other labels. The former depicts the semantic similarity considering the interactions between users and labels in the latent feature space, while the latter quantifies the connectivity of query labels in the graph structure. We conduct experiments using Facebook and Twitter datasets to validate whether SQEL can truly find the targets. The results exhibit that SQEL outperforms a number of competing methods in terms of accuracy.

Related Work. Kleinberg [5] first defines social search in a network: searching over a social graph to find the set of individuals closest to a given user. Vieira et al. [12] re-formulate the task as a structural ranking problem via friend recommendation for users. Though Schenkel et al. [9] recommend items possessing a given set of tags in a social tagging graph, social interactions are neglected. With search log data from LinkedIn and Facebook, Huang et al. [4] unfold the relationships between graph distances and user search behaviors. On the other hand, Roth et al. [8] and McAuley and Leskovec [6] group and recommend friends based on the social activities of people. Though Mouratidis et al. [7] use both graph and geographical distance for people search, the labels associated with users are not considered. Last, a recent relevant study is Person-of-Interest Search (POI-Search) [3], which allows user-specified labels and an input user as the query. Our SSC is more challenging since the input user is not given. In addition, requiring the input user in POI-Search is neither realistic nor generalized, because the search task can be performed by anyone, rather than only by users of the specific social service. Nevertheless, we compare with POI-Search in the experiments.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 PROBLEM STATEMENT
We first define a number of terms and notations, followed by the problem definition of Search by Social Contexts.
Definition 1: Social Network. A social network is a weighted graph G = (V, E). Each node v ∈ V has a set of labels. Each edge represents the interaction between two nodes. Edge weights refer to the interaction cost, where lower values indicate better interactions.

Definition 2: Context Query. A context query is a set of labels q, in which ∀ c_i ∈ q: c_i ∈ L, where L is the universe set of labels.

Definition 3: User-Label Interaction. A user-label interaction r_ij = (u_i, c_j) is a connection that links user u_i ∈ V with label c_j ∈ L if u_i possesses c_j. We obtain the set of user-label interactions R = {r_ij} from all of the interactions between users and labels.

Definition 4: Hybrid Graph. A hybrid graph is a weighted graph H = (V_H, E_H), where the node set is the union of the user set V and the label set L, i.e., V_H = V ∪ L, and the edge set is the union of the social tie set E and the user-label interaction set R, i.e., E_H = E ∪ R. Note that we create a node for each label in the hybrid graph. The hybrid graph jointly models the connections between users and the interactions between users and labels.

Problem Definition: Search by Social Contexts (SSC). Given a social network G, the label set L, the user-label interactions R, a context query q, and a budget parameter k, the task of search by social contexts (SSC) is to recommend a ranking list of k users S (|S| = k) such that the ground-truth target u* is not only covered by list S (i.e., u* ∈ S), but also ranked at the highest possible position in S.
3 THE PROPOSED METHOD
To deal with the task of search by social contexts, we propose a novel method, Social Query Embedding Learning (SQEL). The main idea of SQEL is to learn the feature representation (i.e., embedding vector) of the query, and to use the learned feature vector to find the users with the highest vector similarity scores as the results. SQEL consists of four main steps. First, based on the constructed hybrid graph, we learn the feature representation of each node using node2vec [2], which maps nodes to a low-dimensional feature space that maximizes the likelihood of preserving the graph neighborhoods of users and labels. For each label c ∈ L and each user u ∈ V, we derive d-dimensional vectors c⃗ and u⃗, where d is set to 128 throughout this work. When the query is given, the second step is to build the query language model (Section 3.3) that estimates the probability of any label c given the set of query labels q. The third step is to learn the feature representation of the query (Section 3.1): the query embedding vector q⃗ is derived using the query language model and the graph proximity between nodes in the hybrid graph. The last step is to generate the recommended list of users based on the similarity between the feature vectors of the query and the users (Section 3.2).

3.1 Learning Query Representation
We leverage maximum likelihood estimation (MLE) to learn the feature representation (i.e., embedding vector) of the query. The central idea is to maximize the likelihood of a query embedding probability distribution towards a query language model and the proximity distribution from query labels to other labels in the hybrid graph H. The underlying assumption is that the query feature vector is an ensemble of the query language model and the graph proximity distribution: the query feature vector should reflect not only the co-occurrence between labels, but also how users adopt labels in the graph. In other words, we seek to learn, by MLE, the query embedding feature vector that is closest to the fusion of the query language model and the graph proximity distribution. We first define the probability of each label c given a query vector q⃗, termed the query embedding distribution, as below:

p(c | q⃗) = s(c⃗, q⃗) / Σ_{c′∈L} s(c⃗′, q⃗),    (1)

where s(·, ·) is a similarity function between two feature vectors, c⃗ is the embedding feature vector of label c, and L is the set of all labels. For the query q, we assume there is a query language model θ_q, which quantifies the importance of each query label c_i ∈ q. Let q⃗* be the optimal query feature vector, i.e., the query embedding distribution (Eq. 1) that is closest to the ensemble of the query language model θ_q and the proximity distribution p_H(q → c). We therefore propose to find the query feature vector that maximizes the following log-likelihood function:

q⃗* = arg max_{q⃗} Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) log p(c | q⃗),    (2)

where the proximity distribution p_H(q → c) from query labels to other labels is derived using Random Walk with Restart [11] in the hybrid graph H, and α ∈ [0, 1] determines the relative importance between the query language model and the proximity distribution; we explain how to obtain θ_q in Section 3.3. We empirically set α = 0.7 by default, which yields the best results. Directly maximizing the objective in Eq. 2 is time-consuming due to the normalization in the denominator of Eq. 1. Moreover, this normalization depends on the query vector, so it cannot be precomputed offline. We therefore adopt the relaxation [14] that assumes the normalization term is equal for all query vectors (note that this is a heuristic). Consequently, we can replace p(c | q⃗) in Eq. 2 with the similarity s(c⃗, q⃗). Since the similarity still depends on the query vector q⃗, the objective remains a function of the query.

We use a softmax-style function to estimate the similarity between learned feature vectors. We define the similarity function s(·, ·) as:

s(c⃗, c⃗′) = exp( (Σ_{i=1}^{d} c⃗_i c⃗′_i) / (∥c⃗∥ ∥c⃗′∥) ),    (3)

where d is the dimension of the learned feature vectors. Without loss of generality, we assume each learned label feature vector is a unit vector (i.e., its norm equals 1). We can then rewrite the objective of Eq. 2 as:

q⃗* = arg max_{q⃗} Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) · (Σ_{i=1}^{d} c⃗_i q⃗_i) / ∥q⃗∥.    (4)

By constraining the query feature vector to be a unit vector, i.e., ∥q⃗∥ = 1, we obtain the objective in the form of a Lagrange function:

L(q⃗, λ) = Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) Σ_{i=1}^{d} c⃗_i q⃗_i + λ ( 1 − Σ_{i=1}^{d} (q⃗_i)² ),    (5)

where λ is the Lagrange multiplier. Taking the first derivatives of the Lagrange function gives:

∂L/∂q⃗_i = Σ_{c∈L} c⃗_i ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) − 2λ q⃗_i,
∂L/∂λ = 1 − Σ_{i=1}^{d} (q⃗_i)²,    (6)

where q⃗_i is the i-th entry of the query feature vector q⃗. Setting the partial derivatives in Eq. 6 to zero yields the closed-form solution:

q⃗_i = Σ_{c∈L} c⃗_i ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) / sqrt( Σ_{j=1}^{d} [ Σ_{c∈L} c⃗_j ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) ]² ).    (7)

Eventually we obtain each entry of the query feature vector q⃗ via the closed-form formula in Eq. 7.
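The closed form of Eq. 7 is simply the mixture-weighted sum of the label vectors, renormalized to unit length. A minimal NumPy sketch (our own illustration, not the authors' code; the inputs p(c | θ_q) and p_H(q → c) are assumed to be given as arrays):

```python
import numpy as np

def query_embedding(C, p_lm, p_prox, alpha=0.7):
    """Closed-form query vector of Eq. 7.

    C      : (|L|, d) matrix of label embeddings (rows assumed unit-norm)
    p_lm   : (|L|,)  query language model probabilities p(c | theta_q)
    p_prox : (|L|,)  RWR proximity distribution p_H(q -> c)
    alpha  : trade-off between the two distributions (0.7 by default)
    """
    w = alpha * p_lm + (1 - alpha) * p_prox   # per-label mixture weight
    q = C.T @ w                               # numerator of Eq. 7
    return q / np.linalg.norm(q)              # denominator: unit-normalize
```

Dividing by the norm implements the denominator of Eq. 7, so the returned vector satisfies the unit-norm constraint ∥q⃗∥ = 1.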
3.2 Deriving the Ranking List
Since our goal is to recommend a ranking list S of k targeted users, we estimate the social affinity between the query q and each user u_i ∈ V based on the learned embedding vectors u⃗_i and q⃗. Users with higher social affinity scores to the query are considered more relevant, and are thus ranked at higher positions in the final list S. We select the top k users with the highest social affinity scores. We define the social affinity δ(u_i, q) between user u_i and query q using the similarity between the two embedding vectors defined in Eq. 3: δ(u_i, q) = s(u⃗_i, q⃗) / Σ_j s(u⃗_j, q⃗). We can simplify the computation of δ(u_i, q) by discarding the denominator, which is the same for all users, i.e., δ(u_i, q) ≈ s(u⃗_i, q⃗).
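Because all vectors are unit-normalized and exp(·) is monotone, ranking by s(u⃗, q⃗) of Eq. 3 gives the same order as ranking by the plain dot product. A small sketch of the top-k selection (the function name is ours):

```python
import numpy as np

def rank_users(U, q, k=10):
    """Top-k users by social affinity (Section 3.2).

    U : (n_users, d) user embedding matrix (rows unit-norm)
    q : (d,) unit-norm query vector
    For unit vectors, s(u, q) = exp(u . q) is monotone in the dot
    product, so sorting by U @ q reproduces the ranking of Eq. 3.
    """
    scores = U @ q
    top = np.argsort(-scores)[:k]   # indices of the k highest affinities
    return top, scores[top]
```

Skipping the exp and the shared denominator changes no ranks, only the absolute scores.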
3.3 Query Language Model
Word embedding techniques have been shown to improve the accuracy of document retrieval [1][13]. Based on this idea, to develop an accurate language model θ_q for the query, we extend the embedding from vocabulary terms to label nodes in the graph. To estimate the probability of any label c given the language model θ_q, i.e., p(c | θ_q), we impose a conditional independence assumption between query labels: when users perform context queries, they tend to describe the underlying target using clues from multiple aspects that might not be directly correlated (e.g., q = {NYC, Python, Spaghetti, BMW}). Note that this assumption may not hold for all user queries; we make it to simplify the computation, and leave the modeling of correlation between query labels to future work.

To estimate the query language model p(c | θ_q), Bayes' rule is applied: p(c | θ_q) = p(θ_q | c) p(c) / p(q) ≈ p(θ_q | c) p(c), where the term p(q) can be neglected as it is independent of label c. Based on the abovementioned conditional independence between query labels, we obtain the query language model:

p(c | θ_q) ≈ p(c_1, c_2, ..., c_k | c) p(c) = p(c) Π_{i=1}^{k} p(c_i | c),    (8)

where {c_i | i = 1, 2, ..., k} is the set of query labels and k is its size. We compute p(c_i | c) in Eq. 8 using the similarity of Eq. 3 with the learned embedding feature vectors, which capture the high-order relationships between labels and users in the hybrid graph:

p(c_i | c) = s(c⃗_i, c⃗) / Σ_{c_j∈L} s(c⃗_j, c⃗).    (9)

In addition, we compute p(c) based on the law of total probability: p(c) ≈ Σ_{c_j∈L} s(c⃗_j, c⃗). A label with higher similarity to all the other labels receives higher probability.
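Eqs. 8–9 can be sketched with a pairwise label-similarity matrix. The following is our own illustration under the stated assumptions; the final normalization is added for convenience so the result is a proper distribution, and is not part of Eq. 8 itself.

```python
import numpy as np

def sim_matrix(C):
    """Pairwise s(c, c') = exp(dot / (||c|| ||c'||)) of Eq. 3."""
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return np.exp(Cn @ Cn.T)

def language_model(C, query_idx):
    """p(c | theta_q) of Eq. 8 for every label c.

    C         : (|L|, d) label embedding matrix
    query_idx : indices of the query labels c_1 ... c_k in C
    """
    S = sim_matrix(C)
    p_cond = S / S.sum(axis=0, keepdims=True)  # Eq. 9: p(c_i | c) per column c
    p_prior = S.sum(axis=0)                    # p(c) ~ total similarity mass
    p = p_prior * np.prod(p_cond[query_idx, :], axis=0)  # Eq. 8 product
    return p / p.sum()                         # normalize (for convenience)
```

Only the relative magnitudes of p(c | θ_q) matter downstream, since they are mixed with the proximity distribution in Eq. 7.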
4 EVALUATION
The experiments are conducted on Facebook and Twitter social network data provided by SNAP (http://snap.stanford.edu/data/egonets-Facebook.html and http://snap.stanford.edu/data/egonets-Twitter.html). The data statistics are shown in Table 1, in which CC denotes the clustering coefficient and APL the average path length of the social network G. Labels in both datasets include school, hometown, employer, skill, gender, location, position, etc. In the social networks, we compute the edge weights using the Jaccard coefficient: w_{u,v} = 1 − (|L_u ∩ L_v| / |L_u ∪ L_v|), where L_u is the set of labels of node u. Our general aim is to examine whether the proposed SQEL can accurately find the desired targets for the task of search by social contexts.

Table 1: Statistics of Facebook and Twitter data.

            #nodes     #edges      CC     APL
Facebook     4,039     88,234    0.606    4.7
Twitter     81,306  1,768,149    0.565    4.5

We compare SQEL with four competing methods: (a) Label Matching (LM): use the query label set Q to find a list of possible target users u whose labels have the highest overlap with Q, LM(u, Q) = |L_u ∩ Q| / |L_u|; (b) Center-Piece Subgraph (CePS) [10] with the OR operation: take the users containing at least one label in Q as the source nodes, run the CePS algorithm, and return the list of possible targets u with the highest CePS scores CePS(u, Q); (c) CePS+LM: the selection procedure is similar to CePS, but returns the list of possible target users with the highest average rank scores of CePS and LM, CePSLM(u, Q) = ½ (rank(CePS(u, Q)) + rank(LM(u, Q))); and (d) POI-Search [3]: a greedy heuristic algorithm that finds the target covering the query labels while having strong social interactions and higher proximity scores towards the query labels.

In the evaluation setting, we first choose users at random as the ground-truth targets. Then, for each ground-truth target user ū, we randomly select k labels from the label set of users {ū} ∪ N(ū), denoted L_{ū ∪ N(ū)}, to be the query label set Q_ū, where N(ū) is the set of immediate neighbors of ū in the social network G, L_X = ∪_{v∈X} L_v is the union of label sets of the users in X, and |Q_ū| = k. We set k = 10 by default, and vary k to simulate different extents of knowledge about the target. To control the hardness of the search task, we introduce a hardness parameter τ ∈ [0, 1]: in selecting the query labels, we randomly choose τk labels from L_{N(ū)} and (1−τ)k labels from L_ū. Higher τ values make the task more challenging. The design of τ also reflects that real users sometimes provide inaccurate evidence about the desired targets. We set τ = 0.6 by default. We generate 500 pairs of query Q_i and corresponding ground-truth user ū_i based on the above setting, i.e., Q_i ∈ T, |T| = 500.
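The query-sampling protocol above can be sketched as follows. This is a hypothetical helper, not the authors' code; note that overlap between neighbor labels and the target's own labels can shrink the sampled set below k.

```python
import random

def make_query(target, neighbors, labels_of, k=10, tau=0.6, rng=random):
    """Sample a query label set Q for ground-truth target `target`
    (Section 4 protocol): about tau*k labels come from the neighbors'
    labels L_{N(u)}, the remaining (1-tau)*k from the target's own L_u."""
    own = list(labels_of[target])
    nbr = sorted({c for v in neighbors for c in labels_of[v]})
    n_hard = min(int(round(tau * k)), len(nbr))  # indirect (harder) labels
    n_easy = min(k - n_hard, len(own))           # direct labels of the target
    return set(rng.sample(nbr, n_hard)) | set(rng.sample(own, n_easy))
```

Passing a seeded `random.Random` as `rng` makes the 500 query/target pairs reproducible.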
Given the top-n targets S_n(Q_i) returned by a method and the ground-truth target set S*(Q_i) = {ū_i}, we define the evaluation metric as accuracy: accuracy = Σ_{Q_i∈T} hit(S_n(Q_i), S*(Q_i)) / |T|, where hit(S_n(Q_i), S*(Q_i)) = 1 if |S_n(Q_i) ∩ S*(Q_i)| ≠ 0, and 0 otherwise. As n increases, the search task tends to be easier and obtains higher accuracy. We set n = 5 by default.

The evaluation plan consists of two parts. The first is the general evaluation: varying the number of query labels k = |Q| to show the performance of SQEL; a larger Q means that the query contains more information depicting the target. The second is the sensitivity analysis: varying the hardness τ – higher τ introduces more indirect labels that might be less relevant to the target, and thus makes the task more difficult – and varying the parameter α in Eq. 2 – higher α makes the query embedding learned more from the query language model.

[Figure 1: Accuracy scores by varying the number of query labels in Facebook and Twitter data.]

Experimental results are shown in Figure 1. By varying the number of query labels k, we find that the proposed SQEL outperforms the other competing methods, especially as the number of query labels increases, because more labels provide more evidence about the desired targets. Though the integration (CePS+LM) of the structural connectivity of CePS and the label matching of LM can boost effectiveness, its accuracy values are still lower than SQEL's. In addition, SQEL also clearly and consistently outperforms the recent work POI-Search. These results imply that learning the feature representations of the query and the users can effectively capture the search intent and accurately retrieve the targets.

[Figure 2: Accuracy by varying (a) the hardness parameter τ, and (b) α in Eq. 2.]

Figure 2 shows the results of the sensitivity analysis for (a) the hardness τ and (b) the parameter α that determines the importance between the query language model and the proximity distribution in Eq. 2. In Figure 2(a), higher values of τ lead to lower accuracy scores while lower τ produces better performance, since more labels directly describing the target make the task easier. The results are consistent on both the Facebook and Twitter datasets. On the other hand, in Figure 2(b), too high or too low values of α lead to worse accuracy; the performance tends to be best when α is around 0.7, which is therefore used as the default value. These results also indicate that the influence of the query language model is more significant than that of the proximity distribution in learning the query representation.

5 CONCLUSIONS
This paper proposes the task of Search by Social Contexts (SSC) to recommend the target individuals desired by the user based on a set of query labels in social networks. The main technical contribution is the development of Social Query Embedding Learning (SQEL), which learns the feature representation of the query considering both the query language model and the proximity distribution in a user-label hybrid graph. Experimental results show promising accuracy. We also plan to evaluate SQEL through a user study via a Facebook application under development. Future work will extend SSC by imposing spatio-temporal contexts – the targets are required to reflect the current time and location of the user.

REFERENCES
[1] Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, and Gareth J.F. Jones. 2015. Word Embedding Based Generalized Language Model for Information Retrieval. In SIGIR.
[2] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD.
[3] Hsun-Ping Hsieh, Cheng-Te Li, and Rui Yan. 2015. I See You: Person-of-Interest Search in Social Networks. In SIGIR.
[4] S.-W. Huang, D. Tunkelang, and K. Karahalios. 2014. The Role of Network Distance in LinkedIn People Search. In SIGIR.
[5] J. Kleinberg. 2006. Social Networks, Incentives, and Search. In SIGIR.
[6] J. McAuley and J. Leskovec. 2012. Learning to Discover Social Circles in Ego Networks. In NIPS.
[7] K. Mouratidis, J. Li, Y. Tang, and N. Mamoulis. 2015. Joint Search by Social and Spatial Proximity. IEEE TKDE.
[8] M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias, and R. Merom. 2010. Suggesting Friends Using the Implicit Social Graph. In KDD.
[9] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, and G. Weikum. 2008. Efficient Top-k Querying over Social-Tagging Networks. In SIGIR.
[10] H. Tong and C. Faloutsos. 2006. Center-Piece Subgraph: Problem Definition and Fast Solutions. In KDD.
[11] H. Tong, C. Faloutsos, and J.-Y. Pan. 2006. Fast Random Walk with Restart and Its Applications. In ICDM.
[12] M. V. Vieira, B. M. Fonseca, R. Damazio, P. B. Golgher, D. D. C. Reis, and B. Ribeiro-Neto. 2007. Efficient Search Ranking in Social Networks. In CIKM.
[13] Hamed Zamani and W. Bruce Croft. 2016. Embedding-based Query Language Models. In ICTIR.
[14] Hamed Zamani and W. Bruce Croft. 2016. Estimating Embedding Vectors for Queries. In ICTIR.