Benchmarking the Privacy-Preserving People Search

Shuguang Han, Daqing He and Zhen Yue
School of Information Sciences, University of Pittsburgh
135 N Bellefield Ave., Pittsburgh, PA, United States
shh69@pitt.edu, dah44@pitt.edu, zhy18@pitt.edu

ABSTRACT
People search is an important topic in information retrieval. Many previous studies on this topic employed social networks to boost search performance by incorporating either local network features (e.g. the common connections between the querying user and candidates in social networks), or global network features (e.g. PageRank), or both. However, the available social network information can be restricted by the privacy settings of the users involved, which in turn would affect the performance of people search. Therefore, in this paper, we focus on the privacy issues in people search. Because privacy-concerned networks are unavailable, we propose simulating different privacy settings with a public social network. Our study examines the influence of privacy concerns on the local and global network features, and their impact on the performance of people search. Our results show that: 1) the privacy concerns of different people in the network have different influences; people with higher association (i.e. higher degree in a network) have a much greater impact on the performance of people search; 2) local network features are more sensitive to privacy concerns, especially when such concerns come from high-association people in the network who are also related to the querying user. As the first study on this topic, we hope to generate further discussion of these issues.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Search process

Keywords
People Search; Privacy-preserving networks; Privacy-preserving people search

1. INTRODUCTION
Modern search engines often assume that their search algorithms should return the most relevant documents to a query. However, on many occasions, users actually want to look for relevant people rather than documents. For example, company recruiters may need to find appropriate candidates for a job opening [1], or conference chairs may need to invite the right experts to form a program committee [2]. These topics have been studied as expert finding problems in the information retrieval community [3], where an expert is often defined as a person who has domain knowledge of a given topic. However, expert finding is only one type of people search task. Many other scenarios, such as finding appropriate collaborators [4] or thesis committee members [5], require not only topical expertise matching but also social matching [6], because a higher social similarity makes it easier for people to connect.

In order to perform social matching, retrieval systems need to access users' social networks and return the potential candidates who have either direct or indirect connections with the given users. However, privacy has been identified as a major concern in many social network services [7, 8]: users often either opt out of certain social networks or provide incomplete or even fake social network information. Early research has shown that many data mining algorithms may not work, or may even harm the user experience, when equipped with such incomplete and noisy social information [9]. Recently, researchers have started to incorporate social information into people search systems, and the coauthor networks generated from scholarly publications were often utilized [4, 5, 10]. However, probably because coauthor networks raise fewer privacy concerns, little attention has been paid to the privacy-related issues in people search. Furthermore, there is no study on how incomplete social networks would affect the performance of people search systems.

In this paper, we are particularly interested in the privacy issues in people search and the impact of these issues on people search performance. The TREC experience demonstrates that the lack of appropriate test beds is a critical drawback for studying search problems. Considering the difficulty of obtaining an open privacy-concerned social network and the expense of constructing such a network from scratch for research purposes, we propose in this paper to simulate the privacy-concerned social network using the publicly available coauthor networks. Note that users in many social network services are able to keep both their profiles and their social connections private. In this paper, we focus on the privacy issues of sharing social connections.

The key assumption of our simulation is that a coauthor network has the same or similar network characteristics as a privacy-concerned social network. Our simulation approach is founded on existing studies [11-13], which state that many real-world social networks (including coauthor networks and many other privacy-concerned networks such as Facebook social networks) share the same patterns: they are small-world networks and their degree distributions are highly skewed. Newman [14] studied the assortative patterns (the preference for connecting with people who share similar features) of social networks. He found that social networks showed assortatively mixed patterns, whereas technological and biological networks seem to be disassortative. Therefore, it is reasonable to assume that coauthor networks and many privacy-preserving networks (because they are both social networks) share some important common characteristics. Coauthor networks, which are publicly available, can thus be used as a surrogate for studying privacy-preserving social networks. In the remaining part of this paper, all the privacy-related discussions are based on the coauthor network and coauthor network-based people search.

In order to study the impact of privacy concerns on people search performance, we need to examine how social network information is used in existing people search systems. We refer to global network features as the features that are propagated through the whole network, while local network features are those that are directly related to the ego-network of the querying user [15]. Some people search systems adopted only local network features [4], whereas others used both local and global network features [5, 10]. For example, Han et al. [5] took into consideration both the local social similarity between the querying user and each returned candidate (measured by the proportion of common social connections) and the global authority of each returned candidate (measured by the PageRank value computed on the whole social network). They found that combining both global and local network features with topic relevance would provide better support for modeling diverse people search contexts and further augment the search experience. Since both the global and local network features played important roles in people search systems [5], the study of privacy needs to consider both.

Both the local and global network features could be influenced by the completeness of social network information. Therefore, a privacy-preserving network with many private (unrevealed) social connections would affect the calculation of the global and local network features, which may in turn affect the people search performance. The incomplete social contexts of the querying user and the network candidates affect the calculation of the proportion of common coauthors between them. This is the reason why we examine the impact of privacy concerns on the local network features for both network candidates and the querying user. When analyzing the local network features, we study the privacy settings of querying users and candidates separately. The global network features rely on information propagation through the whole network, which is only related to network candidates. We therefore study global network features for network candidates only.

In summary, we identify that privacy-preserving people search is still an almost untouched research topic. In this paper, we make the first attempt to provide some benchmarks by simulating privacy-preserving networks and examining how these networks affect the performance of people search. To achieve the goal of this study, we need to properly simulate different types of privacy-preserving networks. A privacy-preserving network is essentially a subset of the full network, so we model different privacy concerns as different sampling strategies (the purpose is to sample a subset of privacy-concerned people). We discuss the sampling strategies in Section 2. To be specific, our research questions are:

- RQ1: How can different types of privacy-preserving social networks be properly simulated?
- RQ2: How does each type of privacy-preserving network affect the global and local network features?
- RQ3: How do the global and local network features derived from privacy-preserving networks further affect the people search performance?

2. DATASET AND METHODOLOGY

2.1 Experiment Dataset
Our experiments in this paper reuse the user study data and the publication collection presented in Han et al. [5]. The dataset used in that study was an academic publication collection containing 219,677 conference papers from the ACM Digital Library. These papers were published in academic conferences (the full list of conferences is available at the ACM Digital Library, http://dl.acm.org/proceedings.cfm) between 1990 and 2013. Only publicly available information about a paper (the title, abstract and authors) was collected. The unique identifier assigned by the ACM Digital Library was used to identify each author, and no further author disambiguation step was performed. In total, the collection contains 253,390 unique authors and 953,685 coauthor connection instances. The collection therefore contains both content information about papers (title and abstract) and the social network of authors (i.e., the coauthor network).

The goal of the user study presented in Han et al. [5] was to evaluate a people search system. The study involved four different people search tasks, each of which aimed to find 5 candidates satisfying a querying user's search need. Two systems were used in the study: a baseline plain content-based people search system and an experimental system that enhances people search with three interactive facets: content relevance, social similarity between the user and a candidate (the local network feature) and the authority of a candidate (the global network feature). The experimental system allowed the querying users to tune the importance associated with each facet in order to generate a better candidate search result. 24 participants were recruited for the user study. At the beginning of the user study, each participant was asked to provide their publications and their social connections (such as advisors). In the post-task questionnaire, the participants were asked to rate the relevance of each marked candidate on a five-point Likert scale (1 as non-relevant and 5 as highly relevant).

We reuse the data from [5] in the following ways. First, we use the same academic publication collection, which contains both the papers and the coauthor networks. Secondly, we use the marked highly relevant candidates (i.e., those with ratings higher than 3) from the user study as our ground truth, which is further used to measure the effectiveness of the search algorithms under different privacy-preserving network scenarios.

2.2 Configuring Privacy-Preserving Networks
We identify two different types of users in our study: the people who initiate the people search requests (i.e. the participants in the user study, called the querying users) and the people in the publication collection and the coauthor networks (called the candidates). We treat them differently because: 1) although many querying users would be on the coauthor network, some others may not be; 2) more importantly, we believe that the calculation of local network features can be influenced by the privacy settings of the querying users as well as the candidates, and the impact of the privacy settings of different users would be different.

2.2.1 Modeling Privacy for the Candidates
Although privacy settings are related to various factors, those factors result in a common outcome: a user either has privacy concerns or not. We assume that there is a probability (i.e. p_i) for each candidate i being privacy-concerned. Based on the different roles that people can play in a network, we think that modeling privacy concern as being associated with the candidate's degree of association (i.e. the coauthor relationships) on the network is a reasonable approach to studying the impact of privacy settings for people with different roles. There are two extremes for candidates to have privacy concerns: 1) the candidates with the top degree of association have privacy concerns; or 2) the candidates with the bottom degree of association have privacy concerns.

Suppose that for each candidate i, his/her degree of association on the network is d_i and the maximum degree on the network is d_max. Eq. 1 provides one formula, with a parameter λ, for modeling how candidates with different degrees of association on the network come to have privacy concerns. When λ is set to negative or positive values, we obtain different simulations in which either bottom-degree or top-degree candidates have more privacy concerns. The absolute value of λ corresponds to the power of the emphasis on top-degree or bottom-degree candidates. When λ is set to 0, the distribution is uniform and each user has an equal probability.

$$p_i = \left(\frac{d_i}{d_{max}}\right)^{\lambda} \qquad \text{(Eq. 1)}$$

Besides λ, we need another parameter to control the proportion of candidates on the network who have privacy concerns (denoted p_b). In this paper, we test nine different values of p_b (from 0.1 to 0.9 in steps of 0.1) and, under each p_b, different values of λ. For each pair <λ, p_b>, we sample 10 different runs to remove the bias. Our reported results are based on the average over those 10 runs. To be specific, suppose that we have N candidates and that N × p_b of them have privacy concerns. The goal of sampling, therefore, is to return N × p_b sampled privacy-concerned candidates. Our sampling algorithm is a "sampling without replacement" (see Figure 1).

Algorithm: Sampling privacy-concerned candidates
Input: N, p_b and λ
Output: N × p_b privacy-concerned candidates U
Procedure:
    compute p_i using Eq. 1 for every candidate and store the values in array P[]
    for run = 1 to 10:
        M = N; S = sum of P[]; U = empty
        for i = 1 to N × p_b:              // sampling N × p_b candidates
            randomly generate a number r in [0, S)
            for a = 1 to M:
                if P[1] + ... + P[a] ≥ r:
                    put the corresponding candidate into U
                    S = S - P[a]
                    remove P[a] from P[]
                    M = M - 1
                    break
Figure 1: Algorithm for generating the privacy-concerned candidates

2.2.2 Modeling Privacy for the Querying Users
The local network feature in this paper refers to the proportion of common social connections between the querying users and the existing candidates. Therefore, the privacy settings of both will influence the calculation of the local network feature. Modeling privacy concerns for the candidates has been discussed above; here we present our modeling of the privacy concerns of the querying users. The social connections of the querying users were obtained from the users themselves in the user study (for more details see Han et al. [5]). In that study, each participant was asked to provide his/her personal information as well as his/her close social connections.

Privacy-conscious users may provide no personal social information or only incomplete information. In our study, therefore, we introduce the completeness of the provided information (p_c) as the indicator of the querying user's privacy concerns. It is measured by the percentage of social connections that a querying user provided over the complete "oracle" social connections of that user. The "oracle" social connections are simulated by the user-provided information from the user study in [5], because the users were explicitly asked to provide complete social connections during the user experiment.

In this paper, we test eleven different values of p_c (from 0.0 to 1.0 in steps of 0.1). Note that p_c = 0.0 corresponds to the scenario in which we do not have any social information for the querying user, and p_c = 1.0 means that the complete social connections of the querying user are available. When p_c is set to the other values, we can only use partial social connections. To remove the sampling bias, we randomly sample the incomplete social connections in 10 runs, and the reported results are based on the average over the 10 runs.

2.3 Experiment Setup
Our study involves two sets of experiments. The first set examines the impact of various privacy settings on the computation of the global and local network features. The second set tests their further influence on people search.

2.3.1 Testing the Impacts on Global Network Feature
Since the local network feature is directly related to the querying users, it is difficult to study it independently. In contrast, the global network feature is computed through propagation on the whole network and is independent of the querying users. So, we only examine the influence of different privacy settings on the global network feature in this section.

The global network feature of a candidate is represented by his/her authority value, which is measured by the PageRank value on the coauthor network. We first compute the authority value (pr_a) for each candidate a using the whole network information. These are treated as the ground-truth values. To test the impact of a privacy setting, we re-compute the authority value (pr_a^p) for the candidate a when different portions of the people on the network do not share their social connections because of privacy concerns. We use the Mean Absolute Error (MAE) between the new authority values and the ground-truth authority values over all of the authors as the indication of the impact of privacy concerns (see Eq. 2).

$$MAE = \frac{1}{N}\sum_{a=1}^{N}\left| pr_a^{p} - pr_a \right| \qquad \text{(Eq. 2)}$$

2.3.2 Testing the Impacts on People Search
When examining the impact of different privacy-preserving networks on the people search performance, we adopted the user study data from Han et al. [5]. In that experiment setting, the effectiveness of a people search was affected by three facets: content relevance, the local network feature and the global network feature. The three facets are displayed to the querying users so that the users can directly configure the importance of each facet. To test the influence of using privacy-preserving networks, we could directly test the impact on a live system by comparing system performance in two scenarios: one with the complete network and the other with privacy-preserving networks. However, this would be very time-consuming and might be unable to detect subtle differences. Therefore, we decided to conduct a simulation study based on the queries and marked candidates from [5].

We assume that a querying user u issued several queries in order to finish a task and that, under K of those queries, u marked at least one candidate. We name those K queries the effective queries. We assume that the purpose of each effective query is to retrieve the best-matched candidates (i.e. the ground truth). Although the ordering of those K queries may reveal their importance in the whole search process, we do not consider such information in this paper for simplicity. Therefore, for each effective query in [5], we compute three scores: the query-candidate content match S_C, the local network feature S_L and the global network feature S_G. Those scores are transformed into logarithmic values and combined linearly. In a live system, the querying users can tune the importance of each facet: w_c (for S_C), w_g (for S_G) and w_l (for S_L). The computation of each score and their integration are the same as in Han et al. [5]. The integration score S is computed using Eq. 3. The candidates are ranked based on this score.

$$S = w_c \log S_C + w_g \log S_G + w_l \log S_L \qquad \text{(Eq. 3)}$$

For each effective query, different configurations of w_c, w_g and w_l yield different search performance. Lacking real user interactions, we cannot know how users would set those weights. In the simulation study, we assume that users are able to tune the weights to the configuration that achieves the best search performance. The search performance of each effective query q_i is measured by the Average Precision (AP) under the best configuration of w_c, w_g and w_l, as shown in Eq. 4. The AP is computed using the ground-truth data (the marked candidates for a task with ratings higher than 3.0) from the user study in Han et al. [5]. The ground truth is built for each user-task pair, so that all of the K effective queries within one user-task pair share the same ground truth. The search performance of each user-task pair is then measured by the Mean Average Precision (MAP) over all of the K effective queries, as shown in Eq. 5. The search performance thus depends on S_C, S_G and S_L (as shown in Eq. 3), which are determined by the available information in the privacy-preserving networks. The comparison of privacy-preserving networks can therefore be transformed into a comparison of the MAP.

$$AP(q_i) = \max_{w_c, w_g, w_l} AP(q_i \mid w_c, w_g, w_l) \qquad \text{(Eq. 4)}$$

$$MAP = \frac{1}{K}\sum_{i=1}^{K} AP(q_i) \qquad \text{(Eq. 5)}$$

3. IMPACTS ON GLOBAL FEATURE IN PRIVACY-PRESERVING NETWORKS
In this section, we study how different privacy-preserving networks influence the computation of the global network feature and how this further affects the performance of people search.

3.1 Impacts on Global Network Feature
We simulate different privacy-preserving networks by setting different λ in Eq. 1. We compare five values of λ in this paper: -1.0, -0.5, 0.0, +0.5 and +1.0. Under each λ, we then adopt the sampling procedure described in Section 2.2.1 to choose a certain percentage (p_b in Figure 1) of privacy-concerned candidates. To measure the impact on the computation of the global network feature, we measure the MAE between its values on the full network and on the sampled privacy-preserving networks.

The MAE results are shown in Figure 2. As stated, when λ is set to 0.0, the candidates on the network have a uniform probability (p_i in Eq. 1) of being concerned about sharing social connections. We treat it as one of the baselines. We also set λ to negative values to simulate the scenario in which candidates with low association degrees have more privacy concerns. Since those low-association-degree candidates only affect a small proportion of the connections on the network, we suspected that they would have less impact. The results in Figure 2 confirm our expectation. In addition, λ with smaller negative values (i.e., bigger absolute values) results in slightly better MAE, which is not surprising given our expectation.

Setting λ to a positive value corresponds to the scenario in which the candidates with high association degrees have a higher chance of having privacy concerns. Since those candidates are usually well connected in networks, we anticipate a higher impact from their privacy concerns. We see in Figure 2 that the MAE curves for both positive λ values are above the baseline. When more of the privacy-concerned candidates are sampled from high-degree candidates (i.e. comparing λ = +0.5 with λ = +1.0), we see an increase in the MAE.

[Figure 2 plot. Series: λ = -1.0, λ = -0.5, λ = 0.0, λ = +0.5, λ = +1.0. X axis ticks: 0.1 to 0.9; Y axis ticks: 0.00 to 0.50.]
Figure 2: The impacts of different privacy-preserving networks on the calculation of global network feature. We measure the impacts using MAE. X axis: p_b in Figure 1; Y axis: the MAE. Each value is aggregated over 10 runs. (MAE, the smaller the better)

3.2 Impacts on the People Search
We further study how different privacy-preserving networks affect the people search performance. We took λ = -1.0 (+1.0) as the upper (lower) bound based on the result in Figure 2 and still used λ = 0.0 as the baseline.

To simulate and measure the people search performance, we need to set appropriate parameters (w_c, w_g and w_l) in Eq. 3. Since we only focus on the impact of the global network feature in this section, we set the weight for the local network feature to w_l = 0. We estimate the parameters based on the full network information, and assume that the same parameters also apply to the privacy-preserving networks. We acknowledge the limitation of not tuning the parameters for each network. We think the parameters reveal users' objective view of the importance of each facet, which remains the same under different networks. The parameters we used in this section are w_c = 1.0 and w_g = 0.1.

The MAP evaluations under different privacy-preserving networks (different values of λ and p_b) are shown in Figure 3. We also plot the MAP performance using the full network information (the red solid line) as an upper bound baseline. We find that the results of λ = -1.0 are very similar to the upper bound baseline even when p_b is as large as 0.9. This is because here only the low-degree candidates have privacy concerns, while the core candidates with medium or high degree remain in the network. In contrast, the results of λ = +1.0 (high-degree people have more privacy concerns) show a clear impact on the people search performance even when p_b is as small as 0.1 or 0.2. This is because many core candidates with the top degree of association are removed from the network.

Although the maximal change of MAP is a 3.87% drop (relative percentage when λ = +1.0 and p_b = 0.8, compared to the "Full Networks"), the changes for all p_b are still significant under the Wilcoxon Sign Test (e.g. p-value = 0.040 for p_b = 0.1, p-value = 0.016 for p_b = 0.2, p-value = 0.000 for p_b = 0.3, etc.). Again, the results of λ = 0.0 lie between those of λ = +1.0 and those of λ = -1.0, because the high- and low-degree candidates have the same probability of being sampled as the privacy-concerned candidates.

[Figure 3 plot. Series: Full Networks, λ = -1.0, λ = 0.0, λ = +1.0. X axis ticks: 0.1 to 0.9; Y axis ticks: 0.20 to 0.28.]
Figure 3: The impacts of new global network feature under different privacy-preserving networks on the performance of people search. The impact is measured by MAP. X axis: p_b in Figure 1; Y axis: the MAP. Each value for different λ (except the "Full Networks") is aggregated over 10 runs. (MAP, the bigger the better)

4. IMPACTS ON LOCAL FEATURE IN PRIVACY-PRESERVING NETWORKS
In this section, we try to understand the impact of privacy-preserving networks on the local network feature. Since it is related to both the candidates and the querying users, we study the privacy settings for both types of users.

4.1 Impact of Candidates' Privacy Setting on Local Network Feature
Since we are focusing on the local network feature in this section, we set w_g = 0.0. To find appropriate weights for S_C and S_L (i.e. the optimal w_c and w_l), we re-examine users' people search process based on the user study data and find the corresponding optimal parameters that maximize the people search performance over all effective queries. As in Section 3.2, in this process we use the full network information and assume that the same parameter setting also applies to the privacy-preserving networks. The best parameters we chose are w_c = 1.0 and w_l = 0.082. We also use the same parameters in Section 4.2.

The MAP evaluations on different privacy-preserving networks are shown in Figure 4, where we examine the results of three different λ values: -1.0, 0.0 and +1.0. In addition, we consider the "Full Networks" as an upper bound baseline, as we did in Section 3.2. We find that the local network feature produces more improvement in the performance of people search than the global network feature: the MAP equals 0.2352 for the global network feature (combined with content relevance) while it equals 0.2752 for the local network feature (combined with content relevance) when using the full network information. The difference is significant under the Wilcoxon Sign Test (p = 0.003). However, we observe that the local network feature is more sensitive to the privacy setting than the global network feature: the maximum MAP change for λ = 0.0 is less than 0.01 for the global network feature (as shown in Figure 3), while it is more than 0.035 for the local network feature (as shown in Figure 4).

We further find that removing the high-degree candidates (i.e., λ = +1.0) has a great impact: the performance drops substantially even when only a small portion of candidates have privacy concerns (p_b = 0.1 or 0.2). This indicates the important role that the high-degree candidates play in the computation of the local network feature. We think this may be because most of the desired candidates (i.e. candidates in the ground truth) for our user study are actually directly or indirectly connected to the top-degree candidates. However, this is not the case when λ = -1.0, where less well-connected candidates are removed. The MAP of randomly selecting candidates (λ = 0.0) to have privacy concerns lies between that of λ = -1.0 and that of λ = +1.0.

[Figure 4 plot. Series: Full Networks, λ = -1.0, λ = 0.0, λ = +1.0. X axis ticks: 0.1 to 0.9; Y axis ticks: 0.20 to 0.28.]
Figure 4: The impacts of new local network feature under different privacy-preserving networks on the performance of people search. The impact is measured by MAP. X axis: p_b in Figure 1; Y axis: the MAP. Each value for different λ (except the "Full Networks") is aggregated over 10 runs. (MAP, the bigger the better)

4.2 Impacts of Querying Users' Privacy Setting on Local Network Feature
The last privacy setting we examined is related to the completeness of the social information provided by the querying users; that is, we test the influence of different settings of p_c (see Section 2.2.2 for its definition) on people search performance. The MAP evaluations over different p_c are shown in Figure 5. "No Social Info." means that we do not use the local network feature. "Full Social Info." corresponds to the scenario in which we can obtain the complete social connections of the user and use them to compute the local network feature. "No Social Info." serves as the lower bound of the MAP, whereas "Full Social Info." acts as the upper bound.

[Figure 5 plot. Series: No Social Info., Partial Social Info., Full Social Info. X axis ticks: 0.1 to 0.9; Y axis ticks: 0.20 to 0.28.]
Figure 5: The impacts of new local network feature under different privacy settings of querying users on the people search performance. X axis: p_c, i.e. the completeness of user-provided social information; Y axis: the MAP. Each point for the "Partial Social Info." is averaged over 10 runs. (MAP, the bigger the better)

We observe that the upper bound is significantly better than the lower bound (+15.58%, with p-value = 0.001 under the Wilcoxon Sign Test), which indicates the usefulness of involving the local network feature of the querying users in the people search process. We also find that the search performance keeps increasing steadily as more social information about the querying user becomes available (the dotted red line labeled "Partial Social Info.").

5. CONCLUSION AND DISCUSSIONS
People search has been extensively studied in recent years. Many researchers identified that social network information is an important resource for improving people search performance [4, 5, 10, 15]. Social networks can be used to infer the local network feature between the querying users and candidates, as well as the global network feature regarding the candidates. However, both the local and global network features can be highly affected by the privacy settings of querying users and candidates. Although privacy issues have become increasingly important in recent years, their impact on people search has not been studied yet. This may be due to the difficulty of obtaining a privacy-preserving social network and making it openly available for research purposes. Therefore, in this paper, we focus on simulating privacy-preserving social networks using a publicly available network: the academic coauthor network. The privacy concerns could come from either the querying users or the candidates in the networks. We studied their impacts separately. For the querying users, we treated the completeness of social information as a parameter to simulate the scenario in which users do not provide full social information. For the candidates, we introduced the proportion of candidates who have privacy concerns and the strength of association (i.e. his/her degree in the network) as two parameters. We assume that candidates' privacy concerns are correlated with their degree of association in the network.

When using the full network information, we find that both the local and global network features provide significant boosts to the performance of people search (compared to not using the social network). However, compared to the global network feature, the local network feature provides greater improvements. Using the simulated networks, we also find that privacy-preserving networks have a significant influence on the performance of people search with both the local and global network features (compared to the use of complete network information).

In addition, we observe that candidates in different roles exert different impacts on the computation of the global network feature, and they further impose different influences on the people search process. The privacy concerns of the high-degree candidates in the network have more impact. Since the local network feature is related to both the querying users and the candidates in the networks, we find that the privacy concerns of both have a significant impact on the search performance. The privacy concerns of high-degree candidates have a bigger influence on people search than those of lower-degree candidates, especially when those high-degree candidates are related to the querying user. We also find that if the querying users provide more social connections, the search performance increases steadily.

We acknowledge that there are still several limitations in this paper. First of all, our simulation study assumed that the purpose of each query is to find the best-matching candidates, so we did not differentiate the deeper intentions of different queries. However, it has been observed that users may develop different strategies in their search processes, so that some queries may only be used to filter out certain non-relevant candidates. Identifying the search intention behind each query would give us a better understanding of the impact of privacy concerns. This is one future direction.

Secondly, we also assumed that each querying user is able to tune the weights for each feature to the optimal configuration, which may not be the case in a live search system. Users may behave differently than we expected: they may not necessarily tune for the optimal parameters and find the best-matched candidates.

Finally, we tested the impacts of the local and global network features separately, whereas we know that privacy concerns affect people search systems such as PeopleExplorer on both features. In addition, we studied the privacy settings for the querying users and candidates separately. In real settings, all these factors should be studied together.

6. REFERENCES
[1] Rodriguez, M., Posse, C. and Zhang, E. Multiple objective optimization in recommender systems. ACM, 2012.
[2] Han, S., Jiang, J., Yue, Z. and He, D. Recommending program committee candidates for academic conferences. In Proceedings of the 2013 workshop on Computational scientometrics: theory & applications (San Francisco, California, USA, 2013). ACM.
[3] Balog, K., Azzopardi, L. and Rijke, M. d. Formal models for expert finding in enterprise corpora. In SIGIR 2006, Seattle, Washington, USA, 2006. ACM.
[4] Chen, H.-H., Gou, L., Zhang, X. and Giles, C. L. Collabseer: a search engine for collaboration discovery. In JCDL '11. ACM, New York, NY, USA, 231-240.
[5] Han, S., He, D., Jiang, J. and Yue, Z. Supporting exploratory people search: a study of factor transparency and user control. In CIKM 2013, San Francisco, California, USA, 2013. ACM.
[6] Terveen, L. and McDonald, D. W. Social matching: A framework and research agenda. ACM Transactions on Computer-Human Interaction (TOCHI), 12, 3 (2005), 401-434.
[7] Acquisti, A. and Gross, R. Imagined communities: Awareness, information sharing, and privacy on the Facebook. Springer, 2006.
[8] Dwyer, C., Hiltz, S. R. and Passerini, K. Trust and Privacy Concern Within Social Networking Sites: A Comparison of Facebook and MySpace. 2007.
[9] Agrawal, R. and Srikant, R. Privacy-preserving data mining. ACM SIGMOD Record, 29, 2 (2000), 439-450.
[10] Zhang, J., Tang, J. and Li, J. Expert finding in a social network. Springer, 2007.
[11] Ugander, J., Karrer, B., Backstrom, L. and Marlow, C. The anatomy of the Facebook social graph. arXiv preprint arXiv:1111.4503 (2011).
[12] Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286, 5439 (1999), 509-512.
[13] Watts, D. J. and Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature, 393, 6684 (1998), 440-442.
[14] Newman, M. E. Assortative mixing in networks. Physical Review Letters, 89, 20 (2002), 208701.
[15] Han, S., He, D., Brusilovsky, P. and Yue, Z. Coauthor prediction for junior researchers. In Proceedings of the 6th international conference on Social Computing, Behavioral-Cultural Modeling and Prediction (Washington, DC, 2013). Springer-Verlag.
Our next step is to conduct a live user experiment to study how users interact with the search system under different privacy-preserving networks. 2 http://crystal.exp.sis.pitt.edu:8080/PeopleExplorer/