Query Embedding Learning for Context-based Social Search

Cheng-Te Li and Yi-Chun Chen
Institute of Data Science, National Cheng Kung University, Taiwan
chengte@mail.ncku.edu.tw, guava830420@gmail.com

ABSTRACT
Recommending individuals through keywords is an essential and common search task in online social platforms such as Facebook and LinkedIn. However, users often have only an impression of the desired targets, depicted by labels of social contexts (e.g., gender, interests, skills, visited locations, and employment). Assuming each user is associated with a set of labels, we propose a novel task, Search by Social Contexts (SSC), in online social networks. SSC is a kind of query-based people recommendation, recommending the desired target based on a set of user-specified query labels. We develop the method Social Query Embedding Learning (SQEL) to deal with SSC. SQEL aims to learn the feature representation (i.e., embedding vector) of the query, along with user feature vectors derived from graph embedding, and uses the learned query vectors to find the targets via similarity. Experiments conducted on Facebook and Twitter datasets exhibit satisfying accuracy and encourage more advanced efforts on search by social contexts.

KEYWORDS
social search, context query, query embedding, social network

1 INTRODUCTION
Users in online social services such as Facebook, Instagram, and LinkedIn often perform search tasks to find particular persons. Given the name of the target person, the goal of social search is to accurately recommend the targets the user desires. However, users frequently do not know the names of the targets and have only very limited information about them (e.g., gender, interests, hometown, company, and school). On the other hand, users sometimes want to find persons (e.g., experts or celebrities) who meet certain requirements (e.g., skills, awards, and visited locations). The common setting of both scenarios is that the target persons can be depicted by a set of labels. We therefore believe it is promising to develop a system that allows such social search without specifying the names of targets. In this paper, we propose a novel search task, Search by Social Contexts (SSC), which is to find the desired target persons using the social network behind users and the labels associated with users. SSC can also be considered as a query-based people recommendation: by specifying a small set of labels depicting the target, rather than her name, SSC is designed to recommend a list of persons such that the true desired target is covered and ranked at the top position in the list.

The intuition behind Search by Social Contexts is that there should be latent correlation between the input query labels and the target, reflected through the interactions between users and labels and the connections between users in social networks. To perform search and recommendation by contexts, we leverage node feature representation learning in graphs to model the hidden correlation between query labels and potential target users in the latent space. A novel method, Social Query Embedding Learning (SQEL), is developed to learn the feature representation (i.e., embedding vector) of the query, and to use the learned feature vector to find the users with the highest vector similarity scores as the recommended targets. Our basic idea is that the query embedding feature vector is an ensemble of an accurate query language model and the proximity distribution between the query and other labels. The former depicts the semantic similarity considering the interactions between users and labels in the latent feature space, while the latter quantifies the connectivity of query labels in the graph structure. We conduct experiments using Facebook and Twitter datasets to validate whether SQEL can truly find the targets. The results exhibit that SQEL outperforms a number of competing methods in terms of accuracy.

Related Work. Kleinberg [5] first defines social search in a network: searching over a social graph to find the set of individuals closest to a given user. Vieira et al. [12] re-formulate the task as a structural ranking problem via friend recommendation for users. Though Schenkel et al. [9] recommend items possessing a given set of tags in a social tagging graph, social interactions are neglected. With search log data from LinkedIn and Facebook, Huang et al. [4] unfold the relationships between graph distances and user search behaviors. On the other hand, Roth et al. [8] and McAuley and Leskovec [6] group and recommend friends based on the social activities of people. Though Mouratidis et al. [7] use both graph and geographical distance for people search, the labels associated with users are not considered. Last, a recent relevant study is Person-of-Interest Search (POI-Search) [3], which allows user-specified labels and an input user as the query. Our SSC is more challenging since the input user is not given. In addition, requiring the input user in POI-Search is neither realistic nor generalized, because the search task can be performed by anyone, rather than only by users of the specific social service. Nevertheless, we compare with POI-Search in the experiments.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 PROBLEM STATEMENT
We first define a number of terms and notations, followed by the problem definition of Search by Social Contexts.
Definition 1: Social Network. A social network is a weighted graph G = (V, E). Each node v ∈ V has a set of labels. Each edge represents the interaction between two nodes. Edge weights refer to the interaction cost, where lower values indicate better interactions.

Definition 2: Context Query. A context query is a set of labels q, in which ∀ c_i ∈ q: c_i ∈ L, where L is the universe set of labels.

Definition 3: User-Label Interaction. A user-label interaction r_ij = (u_i, c_j) is a connection that links user u_i ∈ V with label c_j ∈ L if u_i possesses c_j. We obtain the set of user-label interactions R = {r_ij} from all of the interactions between users and labels.

Definition 4: Hybrid Graph. A hybrid graph is a weighted graph H = (V_H, E_H), where the node set is the union of the user set V and the label set L, i.e., V_H = V ∪ L, and the edge set is the union of the social tie set E and the user-label interaction set R, i.e., E_H = E ∪ R. Note that we create a node for each label in the hybrid graph. The hybrid graph jointly models the connections between users and the interactions between users and labels.

Problem Definition: Search by Social Contexts (SSC). Given a social network G, the label set L, the user-label interactions R, a context query q, and a budget parameter k, the task of search by social contexts (SSC) is to recommend a ranking list of k users S (|S| = k) such that the ground-truth target u* is not only covered by list S (i.e., u* ∈ S), but also ranked at the highest possible position in S.
3 THE PROPOSED METHOD
To deal with the task of search by social contexts, we propose a novel method, Social Query Embedding Learning (SQEL). The main idea of SQEL is to learn the feature representation (i.e., embedding vector) of the query, and to use the learned feature vector to find the users with the highest vector similarity scores as the results. SQEL consists of four main steps. First, based on the constructed hybrid graph, we learn the feature representation of each node using node2vec [2], which maps nodes to a low-dimensional feature space that maximizes the likelihood of preserving the graph neighborhoods of users and labels. For each label c ∈ L and each user u ∈ V, we derive d-dimensional vectors c⃗ and u⃗, where d is set to 128 throughout this work. When the query is given, the second step is to build the query language model (Section 3.3) that estimates the probability of any label c given the set of query labels q. The third step is to learn the feature representation of the query (Section 3.1): the query embedding vector q⃗ is derived using the query language model and the graph proximity between nodes in the hybrid graph. The last step is to generate the recommended list of users based on the similarity between the feature vectors of the query and the users (Section 3.2).

3.1 Learning Query Representation
We leverage maximum likelihood estimation (MLE) to learn the feature representation (i.e., embedding vector) of the query. The central idea is to maximize the likelihood of a query embedding probability distribution towards a query language model and the proximity distribution from query labels to other labels in the hybrid graph H. The underlying assumption is that the query feature vector is an ensemble of the query language model and the graph proximity distribution: the query feature vector should reflect not only the co-occurrence between labels, but also how users adopt labels in the graph. In other words, we seek to learn, by MLE, the query embedding feature vector that is closest to the fusion of the query language model and the graph proximity distribution. We first define the probability of each label c given a query vector q⃗, termed the query embedding distribution, as below:

p(c | q⃗) = s(c⃗, q⃗) / Σ_{c′∈L} s(c⃗′, q⃗),    (1)

where s(·, ·) is a similarity function between two feature vectors, c⃗ is the embedding feature vector of label c, and L is the set of all labels. For the query q, we assume there is a query language model θ_q, which quantifies the importance of each query label c_i ∈ q. Let q⃗* be the optimal query feature vector, i.e., the query embedding distribution (Eq. 1) that is closest to the ensemble of the query language model θ_q and the proximity distribution p_H(q → c). We therefore propose to find the query feature vector that maximizes the following log-likelihood function:

q⃗* = arg max_{q⃗} Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) log p(c | q⃗),    (2)

where the proximity distribution p_H(q → c) from query labels to other labels is derived using Random Walk with Restart [11] in the hybrid graph H, and α ∈ [0, 1] determines the relative importance between the query language model and the proximity distribution; we explain how to obtain θ_q in Section 3.3. We empirically set α = 0.7 by default, which yields the best results. Directly maximizing the objective in Eq. 2 is time-consuming due to the normalization in the denominator of Eq. 1. Moreover, this normalization depends on the query vector, so it cannot be precomputed offline. We therefore adopt the relaxation [14] that assumes the normalization term is equal for all query vectors (note that this is a heuristic). Consequently, we can replace p(c | q⃗) in Eq. 2 with the similarity s(c⃗, q⃗). Since the similarity still depends on the query vector q⃗, the objective remains a function of the query.

We use a softmax-style function to estimate the similarity between learned feature vectors. We define the similarity function s(·, ·) as:

s(c⃗, c⃗′) = exp( (Σ_{i=1}^{d} c⃗_i c⃗′_i) / (∥c⃗∥ ∥c⃗′∥) ),    (3)

where d is the dimension of the learned feature vectors. Without loss of generality, we assume each learned label feature vector is a unit vector (i.e., its norm equals 1). We can then rewrite the objective of Eq. 2 as:

q⃗* = arg max_{q⃗} Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) · (Σ_{i=1}^{d} c⃗_i q⃗_i) / ∥q⃗∥.    (4)

By constraining the query feature vector to be a unit vector, i.e., ∥q⃗∥ = 1, we obtain the objective in the form of a Lagrange function:

L(q⃗, λ) = Σ_{c∈L} ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) Σ_{i=1}^{d} c⃗_i q⃗_i + λ ( 1 − Σ_{i=1}^{d} (q⃗_i)² ),    (5)

where λ is the Lagrange multiplier. Taking the first derivatives of the Lagrange function gives:

∂L/∂q⃗_i = Σ_{c∈L} c⃗_i ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) − 2λ q⃗_i,
∂L/∂λ = 1 − Σ_{i=1}^{d} (q⃗_i)²,    (6)

where q⃗_i is the i-th entry of the query feature vector q⃗. Setting the partial derivatives in Eq. 6 to zero yields the closed-form solution:

q⃗_i = Σ_{c∈L} c⃗_i ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) / sqrt( Σ_{j=1}^{d} [ Σ_{c∈L} c⃗_j ( α · p(c | θ_q) + (1 − α) · p_H(q → c) ) ]² ).    (7)

Eventually we obtain each entry of the query feature vector q⃗ via the closed-form formula in Eq. 7.
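The closed form of Eq. 7 is simply the mixture-weighted sum of the label vectors, renormalized to unit length. A minimal NumPy sketch (our own illustration, not the authors' code; the inputs p(c | θ_q) and p_H(q → c) are assumed to be given as arrays):

```python
import numpy as np

def query_embedding(C, p_lm, p_prox, alpha=0.7):
    """Closed-form query vector of Eq. 7.

    C      : (|L|, d) matrix of label embeddings (rows assumed unit-norm)
    p_lm   : (|L|,)  query language model probabilities p(c | theta_q)
    p_prox : (|L|,)  RWR proximity distribution p_H(q -> c)
    alpha  : trade-off between the two distributions (0.7 by default)
    """
    w = alpha * p_lm + (1 - alpha) * p_prox   # per-label mixture weight
    q = C.T @ w                               # numerator of Eq. 7
    return q / np.linalg.norm(q)              # denominator: unit-normalize
```

Dividing by the norm implements the denominator of Eq. 7, so the returned vector satisfies the unit-norm constraint ∥q⃗∥ = 1.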
3.2 Deriving the Ranking List
Since our goal is to recommend a ranking list S of k targeted users, we estimate the social affinity between the query q and each user u_i ∈ V based on the learned embedding vectors u⃗_i and q⃗. Users with higher social affinity scores to the query are considered more relevant, and are thus ranked at higher positions in the final list S. We select the top k users with the highest social affinity scores. We define the social affinity δ(u_i, q) between user u_i and query q using the similarity between the two embedding vectors defined in Eq. 3: δ(u_i, q) = s(u⃗_i, q⃗) / Σ_j s(u⃗_j, q⃗). We can simplify the computation of δ(u_i, q) by discarding the denominator, which is the same for all users, i.e., δ(u_i, q) ≈ s(u⃗_i, q⃗).
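Because all vectors are unit-normalized and exp(·) is monotone, ranking by s(u⃗, q⃗) of Eq. 3 gives the same order as ranking by the plain dot product. A small sketch of the top-k selection (the function name is ours):

```python
import numpy as np

def rank_users(U, q, k=10):
    """Top-k users by social affinity (Section 3.2).

    U : (n_users, d) user embedding matrix (rows unit-norm)
    q : (d,) unit-norm query vector
    For unit vectors, s(u, q) = exp(u . q) is monotone in the dot
    product, so sorting by U @ q reproduces the ranking of Eq. 3.
    """
    scores = U @ q
    top = np.argsort(-scores)[:k]   # indices of the k highest affinities
    return top, scores[top]
```

Skipping the exp and the shared denominator changes no ranks, only the absolute scores.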
3.3 Query Language Model
Word embedding techniques have been shown to improve the accuracy of document retrieval [1][13]. Based on this idea, to develop an accurate language model θ_q for the query, we extend the embedding from vocabulary terms to label nodes in the graph. To estimate the probability of any label c given the language model θ_q, i.e., p(c | θ_q), we impose a conditional independence assumption between query labels: when users perform context queries, they tend to describe the underlying target using clues from multiple aspects that might not be directly correlated (e.g., q = {NYC, Python, Spaghetti, BMW}). Note that this assumption may not hold for all user queries; we make it to simplify the computation, and leave the modeling of correlation between query labels to future work.

To estimate the query language model p(c | θ_q), Bayes' rule is applied: p(c | θ_q) = p(θ_q | c) p(c) / p(q) ≈ p(θ_q | c) p(c), where the term p(q) can be neglected as it is independent of label c. Based on the abovementioned conditional independence between query labels, we obtain the query language model:

p(c | θ_q) ≈ p(c_1, c_2, ..., c_k | c) p(c) = p(c) Π_{i=1}^{k} p(c_i | c),    (8)

where {c_i | i = 1, 2, ..., k} is the set of query labels and k is its size. We compute p(c_i | c) in Eq. 8 using the similarity of Eq. 3 with the learned embedding feature vectors, which capture the high-order relationships between labels and users in the hybrid graph:

p(c_i | c) = s(c⃗_i, c⃗) / Σ_{c_j∈L} s(c⃗_j, c⃗).    (9)

In addition, we compute p(c) based on the law of total probability: p(c) ≈ Σ_{c_j∈L} s(c⃗_j, c⃗). A label with higher similarity to all the other labels receives higher probability.
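Eqs. 8–9 can be sketched with a pairwise label-similarity matrix. The following is our own illustration under the stated assumptions; the final normalization is added for convenience so the result is a proper distribution, and is not part of Eq. 8 itself.

```python
import numpy as np

def sim_matrix(C):
    """Pairwise s(c, c') = exp(dot / (||c|| ||c'||)) of Eq. 3."""
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return np.exp(Cn @ Cn.T)

def language_model(C, query_idx):
    """p(c | theta_q) of Eq. 8 for every label c.

    C         : (|L|, d) label embedding matrix
    query_idx : indices of the query labels c_1 ... c_k in C
    """
    S = sim_matrix(C)
    p_cond = S / S.sum(axis=0, keepdims=True)  # Eq. 9: p(c_i | c) per column c
    p_prior = S.sum(axis=0)                    # p(c) ~ total similarity mass
    p = p_prior * np.prod(p_cond[query_idx, :], axis=0)  # Eq. 8 product
    return p / p.sum()                         # normalize (for convenience)
```

Only the relative magnitudes of p(c | θ_q) matter downstream, since they are mixed with the proximity distribution in Eq. 7.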
4 EVALUATION
The experiments are conducted on Facebook and Twitter social network data provided by SNAP (http://snap.stanford.edu/data/egonets-Facebook.html and http://snap.stanford.edu/data/egonets-Twitter.html). The data statistics are shown in Table 1, in which CC denotes the clustering coefficient and APL the average path length of the social network G. Labels in both datasets include school, hometown, employer, skill, gender, location, position, etc. In the social networks, we compute the edge weights using the Jaccard coefficient: w_{u,v} = 1 − (|L_u ∩ L_v| / |L_u ∪ L_v|), where L_u is the set of labels of node u. Our general aim is to examine whether the proposed SQEL can accurately find the desired targets for the task of search by social contexts.

Table 1: Statistics of Facebook and Twitter data.

            #nodes     #edges      CC     APL
Facebook     4,039     88,234    0.606    4.7
Twitter     81,306  1,768,149    0.565    4.5

We compare SQEL with four competing methods: (a) Label Matching (LM): use the query label set Q to find a list of possible target users u whose labels have the highest overlap with Q, LM(u, Q) = |L_u ∩ Q| / |L_u|; (b) Center-Piece Subgraph (CePS) [10] with the OR operation: take the users containing at least one label in Q as the source nodes, run the CePS algorithm, and return the list of possible targets u with the highest CePS scores CePS(u, Q); (c) CePS+LM: the selection procedure is similar to CePS, but returns the list of possible target users with the highest average rank scores of CePS and LM, CePSLM(u, Q) = ½ (rank(CePS(u, Q)) + rank(LM(u, Q))); and (d) POI-Search [3]: a greedy heuristic algorithm that finds the target covering the query labels while having strong social interactions and higher proximity scores towards the query labels.

In the evaluation setting, we first choose users at random as the ground-truth targets. Then, for each ground-truth target user ū, we randomly select k labels from the label set of users {ū} ∪ N(ū), denoted L_{ū ∪ N(ū)}, to be the query label set Q_ū, where N(ū) is the set of immediate neighbors of ū in the social network G, L_X = ∪_{v∈X} L_v is the union of label sets of the users in X, and |Q_ū| = k. We set k = 10 by default, and vary k to simulate different extents of knowledge about the target. To control the hardness of the search task, we introduce a hardness parameter τ ∈ [0, 1]: in selecting the query labels, we randomly choose τk labels from L_{N(ū)} and (1−τ)k labels from L_ū. Higher τ values make the task more challenging. The design of τ also reflects that real users sometimes provide inaccurate evidence about the desired targets. We set τ = 0.6 by default. We generate 500 pairs of query Q_i and corresponding ground-truth user ū_i based on the above setting, i.e., Q_i ∈ T, |T| = 500.
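The query-sampling protocol above can be sketched as follows. This is a hypothetical helper, not the authors' code; note that overlap between neighbor labels and the target's own labels can shrink the sampled set below k.

```python
import random

def make_query(target, neighbors, labels_of, k=10, tau=0.6, rng=random):
    """Sample a query label set Q for ground-truth target `target`
    (Section 4 protocol): about tau*k labels come from the neighbors'
    labels L_{N(u)}, the remaining (1-tau)*k from the target's own L_u."""
    own = list(labels_of[target])
    nbr = sorted({c for v in neighbors for c in labels_of[v]})
    n_hard = min(int(round(tau * k)), len(nbr))  # indirect (harder) labels
    n_easy = min(k - n_hard, len(own))           # direct labels of the target
    return set(rng.sample(nbr, n_hard)) | set(rng.sample(own, n_easy))
```

Passing a seeded `random.Random` as `rng` makes the 500 query/target pairs reproducible.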
Given the top-n targets S_n(Q_i) returned by a method and the ground-truth target set S*(Q_i) = {ū_i}, we define the evaluation metric as accuracy: accuracy = Σ_{Q_i∈T} hit(S_n(Q_i), S*(Q_i)) / |T|, where hit(S_n(Q_i), S*(Q_i)) = 1 if |S_n(Q_i) ∩ S*(Q_i)| ≠ 0, and 0 otherwise. As n increases, the search task tends to be easier and obtains higher accuracy. We set n = 5 by default.

The evaluation plan consists of two parts. The first is the general evaluation: varying the number of query labels k = |Q| to show the performance of SQEL; a larger Q means that the query contains more information depicting the target. The second is the sensitivity analysis: varying the hardness τ – higher τ introduces more indirect labels that might be less relevant to the target, and thus makes the task more difficult – and varying the parameter α in Eq. 2 – higher α makes the query embedding learned more from the query language model.

[Figure 1: Accuracy scores by varying the number of query labels in Facebook and Twitter data.]

Experimental results are shown in Figure 1. By varying the number of query labels k, we find that the proposed SQEL outperforms the other competing methods, especially as the number of query labels increases, because more labels provide more evidence about the desired targets. Though the integration (CePS+LM) of the structural connectivity of CePS and the label matching of LM can boost effectiveness, its accuracy values are still lower than SQEL's. In addition, SQEL also clearly and consistently outperforms the recent work POI-Search. These results imply that learning the feature representations of the query and the users can effectively capture the search intent and accurately retrieve the targets.

[Figure 2: Accuracy by varying (a) the hardness parameter τ, and (b) α in Eq. 2.]

Figure 2 shows the results of the sensitivity analysis for (a) the hardness τ and (b) the parameter α that determines the importance between the query language model and the proximity distribution in Eq. 2. In Figure 2(a), higher values of τ lead to lower accuracy scores while lower τ produces better performance, since more labels directly describing the target make the task easier. The results are consistent on both the Facebook and Twitter datasets. On the other hand, in Figure 2(b), too high or too low values of α lead to worse accuracy; the performance tends to be best when α is around 0.7, which is therefore used as the default value. These results also indicate that the influence of the query language model is more significant than that of the proximity distribution in learning the query representation.

5 CONCLUSIONS
This paper proposes the task of Search by Social Contexts (SSC) to recommend the target individuals desired by the user based on a set of query labels in social networks. The main technical contribution is the development of Social Query Embedding Learning (SQEL), which learns the feature representation of the query considering both the query language model and the proximity distribution in a user-label hybrid graph. Experimental results show promising accuracy. We also plan to evaluate SQEL through a user study via a Facebook application under development. Future work will extend SSC by imposing spatio-temporal contexts – the targets are required to reflect the current time and location of the user.

REFERENCES
[1] Debasis Ganguly, Dwaipayan Roy, Mandar Mitra, and Gareth J.F. Jones. 2015. Word Embedding Based Generalized Language Model for Information Retrieval. In SIGIR.
[2] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD.
[3] Hsun-Ping Hsieh, Cheng-Te Li, and Rui Yan. 2015. I See You: Person-of-Interest Search in Social Networks. In SIGIR.
[4] S.-W. Huang, D. Tunkelang, and K. Karahalios. 2014. The Role of Network Distance in LinkedIn People Search. In SIGIR.
[5] J. Kleinberg. 2006. Social Networks, Incentives, and Search. In SIGIR.
[6] J. McAuley and J. Leskovec. 2012. Learning to Discover Social Circles in Ego Networks. In NIPS.
[7] K. Mouratidis, J. Li, Y. Tang, and N. Mamoulis. 2015. Joint Search by Social and Spatial Proximity. IEEE TKDE.
[8] M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias, and R. Merom. 2010. Suggesting Friends Using the Implicit Social Graph. In KDD.
[9] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, and G. Weikum. 2008. Efficient Top-k Querying over Social-Tagging Networks. In SIGIR.
[10] H. Tong and C. Faloutsos. 2006. Center-Piece Subgraph: Problem Definition and Fast Solutions. In KDD.
[11] H. Tong, C. Faloutsos, and J.-Y. Pan. 2006. Fast Random Walk with Restart and Its Applications. In ICDM.
[12] M. V. Vieira, B. M. Fonseca, R. Damazio, P. B. Golgher, D. D. C. Reis, and B. Ribeiro-Neto. 2007. Efficient Search Ranking in Social Networks. In CIKM.
[13] Hamed Zamani and W. Bruce Croft. 2016. Embedding-based Query Language Models. In ICTIR.
[14] Hamed Zamani and W. Bruce Croft. 2016. Estimating Embedding Vectors for Queries. In ICTIR.