=Paper=
{{Paper
|id=None
|storemode=property
|title=The Development and Application of an Evaluation Methodology for Person Search Engines
|pdfUrl=https://ceur-ws.org/Vol-763/posterC.pdf
|volume=Vol-763
|dblpUrl=https://dblp.org/rec/conf/eurohcir/BrenneckeMW11
}}
==The Development and Application of an Evaluation Methodology for Person Search Engines==
Roland Brenneke, Thomas Mandl, Christa Womser-Hacker
Information Science, University of Hildesheim
Marienburger Platz 22, Germany
roland.brenneke@gmx.de, mandl@uni-hildesheim.de, womser@uni-hildesheim.de
Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of euroHCIR2011.
EuroHCIR 2011. The 1st European Workshop on Human-Computer Interaction and Information Retrieval. July 4th 2011. Newcastle, UK

ABSTRACT
This paper presents a user-oriented evaluation methodology for comparing person search services on the Web. Many established system-oriented methods from information retrieval cannot be applied to this domain. We apply the methodology in a test comparing the person search engines yasni, pipl.com and 123people. The user study with over 30 participants showed that the coverage of data object types in the results differs considerably between the engines. In particular, the number of pictures and social network entries which are presented by the systems and perceived by the test users differs greatly. The results also revealed a tendency to judge people more positively when more information was found.

1. INTRODUCTION
Person search engines are important specialized search services on the Web. These systems consult other services for information about a person and integrate it in one interface. They can be regarded as meta search services or one-stop shops for personal information. They are mostly tailored to ordinary people and not to celebrities and other famous people. As such, person search differs from named entity search in general.

With Web 2.0 and its ease of publishing content on the Web, many people deposit a great deal of information about themselves, or content they created, on various sites. Users need the proper information competence to foresee the consequences of such behavior. Often, users are advised not to publish too much information. Online reputation management is becoming an important issue. For users, social networks and person search services raise information-ethical questions about the use of personal information.

Searching for information about others is a very frequent information need and a reason for using a search service. According to Google Trends, the most popular person search services receive over 200,000 hits per day. However, 90% of users do not rely on person search engines but use general Web search or go directly to social networks to find out about people. Nevertheless, 10% is still a significant share, and hit rates for person search engines are constantly high. In addition, many of these searches may have a high impact. Many recruiters use person search engines for checking on candidates.

A questionnaire study among 548 enterprises was published in 2010 [5]. This Social Media HR Report 2010 revealed that in 2009 over 59% of the companies had used the Internet to check on applicants. Almost 10% had already turned down an application because of information on the Web. Companies that do not use the Web for checking on applicants state that lack of time and ethical questions are the main reasons not to do so [5]. An international study showed that this behaviour is more widespread in the US than in European countries [3]. Interviews with decision makers in German companies revealed that they are well aware of the potential of retrieving applicant information [11].

The use of person search engines for job applicants is only one potential usage scenario; however, it is a very prominent one. Beyond that, there are many reasons why a user would want to search for a person. Despite the use of a named entity in the search, the information need is rather vague and can be rephrased as "Find out something about person X".

The success of a person search engine depends on many factors. Person search engines are meta services which extract results from a large variety of online media. The presentation of these results in the user interface is an essential factor for the success of the search service. If a result is far down on the result page and the user never scrolls there, potentially relevant items cannot be found. That means that the search capability is only one success factor for person search engines. Consequently, our experiment was designed as a user test. We intended to evaluate the user experience and the success with the person search engine as a tool, and neither specific system components nor absolute retrieval performance.

2. RELATED WORK
The evaluation of retrieval systems is central to information retrieval research because system performance cannot be predicted. The most influential retrieval evaluation methodology is the Cranfield paradigm. Information retrieval research has adopted an evaluation scheme which tries to ignore subjective differences between users in order to be able to compare systems and algorithms. The user is replaced by a prototypical and constant user. Relevance judgments are provided by domain experts [8, 10].
Cranfield evaluations have often been criticised. The main objections come from advocates of user-oriented studies. The search situation of users depends on many individual and contextual factors which can only be captured in user experiments [6]. The real user experience and the success in a real-world situation cannot be measured with laboratory-style experiments based on the Cranfield paradigm [12].

Person search engines have a higher chance to succeed than general purpose search services. Retrieval with named entities is known to be easier than searches without named entities [9]. The selection of a person search engine hints at the type of result. Consequently, synonymy between names and words is a smaller problem than in general purpose search engines. Synonymy between names, on the other hand, is a big challenge for person search engines.

3. METHODOLOGY
The balance between control and realism is a challenge for each experiment. For the presented study, we chose a user experiment to test person search engines because an approach purely dedicated to retrieval power does not mirror the user experience for person search engines well. It is necessary to limit the realism in a user experiment in order to allow comparison across participants in the test. We selected a job applicant scenario in order to make the experiment interesting for the users. Applicant search is a very prominent usage type. The method was successful in making the experiment attractive. The test users liked the experiment very much and, through word of mouth, more people wanted to register for the experiment than were needed.

The selection of persons for the task defines the content for the test. It seemed necessary to identify people for whom much information can be found on the Web. If there were no videos, work results such as presentations, or social network entries, then the performance of the person search engine could not be tested with our experiment. So even if the persons selected are not representative, in terms of the amount of online information, of the whole population or of all persons who are indexed in a person search service, it increases the validity of the test to select persons with a large amount of online information.

Three people with similar qualifications were carefully selected. For them, a job profile was developed. The job description and the name of each applicant were given to the participants, who were asked to search for these people, who would be interviewed for the position, and to check whether they were appropriate. Each of the candidates was well qualified for the job but had one negative aspect in his online data. One was an advocate of nuclear power, and the job was offered by an alternative energy company. The second applicant was a serial entrepreneur who portrayed himself on Facebook in pictures with attractive women and sports cars. The third applicant had party photos online in which he could be seen smoking cigarettes, and he described himself as lazy in one social network while he had a very business-oriented self-image in another social network.

Obviously, such a scenario has some limitations. Person search engines need to disambiguate between people with the same name. We decided to choose people whose names are not ambiguous in order to have the same difficulty for each person. Such issues are evaluated in the system-oriented campaign WePS [1].

We selected people who had posted a large amount of information about themselves in the network. Again, this was done to obtain similar and comparable difficulty for the three test cases. Three person search engines were selected for the comparative test. We chose yasni, pipl.com and 123people because they were very popular at the time of the study according to Google Trends. All three companies claim that they exploit only information available on the public Web.

4. STUDY
Students of the University of Hildesheim were recruited through a student mailing list. Participation was voluntary and no compensation was given. None of the participants had a computer science background. They were all frequent Internet users and had searched for people before, but only 10% had used a person search engine before. The others used Google or social networks to find information on people.

The issue of relevance is always a crucial one in information retrieval evaluation. In our study, any item could contribute to the full picture of the applicant. Despite the clearly defined scenario, it remains vague which information is needed and what type of information is useful. It is difficult to assign relevance to items or even weights to categories. The user interfaces of the person search engines present the items in categories such as social network entries or videos.

A questionnaire study [7] showed that users search mainly for the following items, in the order presented, when retrieving information about a specific person:
• Contact information
• Profile on a social network
• Photo
• Information about professional accomplishments or interests

The most frequently searched item, contact information, does not apply to our scenario because the persons had sent a letter of application. The next two most frequent items are included. The fourth item is rather vague, as are some of the items that follow, as far as the categories of person search engines are concerned. As a consequence, the data available does not justify the assignment of weights to some items. In our study, all clicks on items were scored equally. The results will also show which of the items were most popular. The time per applicant was limited to 10 minutes. The entire experiment took 45 minutes on average, including the pre- and post-questionnaire.

One search service modified its interface after the first two tests. It was therefore necessary to eliminate three test sessions from the results and recruit further test users. This shows that not only the dynamics of the personal data present a challenge for the test but also the ongoing modifications of the search engines. Overall, 34 participants took part in the experiment. Due to the problems caused by the relaunch of one service, we could consider the experiments of 10 users of 123people, 11 users of Pipl and 10 users of Yasni.

Each test person worked with one search engine on all three applicants. This between-groups approach was applied mainly to avoid a long learning phase for each of the person search engines. All tests were recorded with appropriate software.
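The between-groups design of the study, in which every participant works with exactly one engine, can be sketched as a balanced assignment. This is only an illustration under stated assumptions: the paper does not say how participants were allocated to engines, so the shuffling, the function name `assign_between_groups` and the participant IDs are hypothetical.

```python
import random


def assign_between_groups(participants, engines, seed=0):
    """Assign each participant to exactly one engine with balanced group sizes.

    Assumption: allocation is random; the paper does not describe the
    actual allocation procedure.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {engine: [] for engine in engines}
    # Deal participants out round-robin so group sizes differ by at most one.
    for index, participant in enumerate(shuffled):
        groups[engines[index % len(engines)]].append(participant)
    return groups


# Hypothetical IDs for the 31 sessions that entered the analysis.
participants = [f"P{n:02d}" for n in range(1, 32)]
engines = ["123people", "Pipl", "Yasni"]
groups = assign_between_groups(participants, engines)
group_sizes = {engine: len(members) for engine, members in groups.items()}
```

With 31 participants and three engines, one group has eleven members and the other two have ten. Which engine receives the larger group is arbitrary in this sketch.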
Figure 1: Popularity of person search engines according to Google Trends
5. RESULTS
The result description focuses on the information perceived by
users and the performance of the test users in the application task.
The information items clicked by the users were categorized. It
can be seen that the services lead to a similar number of clicks
when summed up over all users. Each of the services resulted in
between 110 and 120 clicks for its ten test persons (eleven in the case of
Pipl). Each engine leads to a
sufficient number of entries and has abundant information on the
applicants in our scenario. This was a goal of the test design and
was accomplished.
The types of information encountered differed considerably between
the engines. 123people facilitates access to
photos, whereas Pipl leads more users to social network entries. A
comparative analysis for the services for the most popular item
types is shown in Table 1.
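The click analysis described here, with every click scored equally and clicks grouped by item category, can be sketched as follows. The click log is invented for illustration and is not the study's data; the function names and the record layout `(engine, participant, category)` are assumptions. Because the groups differ in size (eleven Pipl users versus ten for the other engines), the sketch also reports clicks per participant.

```python
from collections import Counter, defaultdict


def tally_clicks(click_log):
    """Count clicks per engine and item category, each click weighted equally."""
    counts = defaultdict(Counter)
    for engine, _participant, category in click_log:
        counts[engine][category] += 1
    return counts


def clicks_per_participant(click_log, group_sizes):
    """Normalise total clicks per engine by the number of test persons."""
    totals = Counter(engine for engine, _participant, _category in click_log)
    return {engine: totals[engine] / size for engine, size in group_sizes.items()}


# Invented example records, NOT data from the study.
click_log = [
    ("123people", "P01", "Photo"),
    ("123people", "P01", "Photo"),
    ("123people", "P02", "Homepage/Blog"),
    ("Pipl", "P03", "Social network"),
    ("Pipl", "P04", "Social network"),
    ("Pipl", "P04", "Photo"),
    ("Yasni", "P05", "Business network"),
]
group_sizes = {"123people": 2, "Pipl": 2, "Yasni": 1}

counts = tally_clicks(click_log)
per_participant = clicks_per_participant(click_log, group_sizes)
```

Equal weighting follows the study's decision not to assign category weights. A weighted variant would need relevance weights that, according to the paper, the available data does not justify.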
In the post-test questionnaire, users were asked about their
subjective impression of the service they had used. For overall
satisfaction, 123people was rated highest. For page structure,
pipl received the best grades, and for the coverage of different
business networks, yasni was rated as most successful. In the latter
case, the finding from the objective click data was confirmed.
Further details on the results are provided in [2].
Figure 2: Clicks on items in the three person search engines
Table 1: Comparison of data types encountered
Item               123people   Pipl   Yasni
Photo              ++          +−     −−
Business network   −           −      ++
Social network     −           ++     +
Homepage/Blog      +           +      +−
Microblog          +           +−     +
Yellow pages       +−          −−     +
Forum post         −           +−     +
Videoclip          +           +−     +−
Publication        no rating
Presentation       no rating
Email address      no rating
Address            no rating
Phone number       no rating
Perception scale: ++ Excellent, + Good, +− Moderate, − Poor, −− Unperceived
No rating was possible for the last five items because of the very low number of clicks.
For two services, applicant 1 was selected by the majority of the test users. These two services had identified the most items for this applicant. For yasni, applicant 2 was chosen as the best applicant despite the fact that the other two services found on average 10 items more for this person. Applicant 3 was given the last place for all three person search services. For each service, he is the applicant with the fewest items. There might be a trend to rate people higher when more information is available online.

6. RESUME
We presented a holistic evaluation methodology for person search engines. The performance of these search services is measured by observing the perception of test users. The test methodology is built on a realistic scenario and use case, but it does not cover all the relevant quality aspects of person search engines. The important capability to resolve the ambiguity of names was not dealt with. In future work, it might be promising to develop a performance-based test for this task only.

The complete information seeking behaviour and its success are also not measured with our test. In a realistic scenario, people might access the social media networks through a person search engine and continue their search mainly there. This issue could be resolved by observing real behaviour.

In the test, the search engine 123people was the winner. It not only led users to the highest number of items, but it was also subjectively judged to be the best person search engine. However, in several aspects other systems performed better and were judged better. The evaluation showed that the different tools are all based on the freely available data on the Web but that they lead to different results. The most sought items in our test were photos, entries and profiles in social and business networks, and personal homepages. Each of the engines exhibited a strength in one of these items, e.g. 123people for photos because they are shown as top results. This is also confirmed by the questionnaire study among American recruiters [7].

For users who publish information about themselves, and who thereby become information providers, the issue of information competence will become more and more important. Personal Online Identity Management is a growing field and several new companies are entering the market.

7. REFERENCES
[1] Artiles, J.; Borthwick, A.; Gonzalo, J.; Sekine, S.; Amigó, E. 2010. WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks. In: CLEF Working Notes. http://nlp.uned.es/weps/weps-3/papers
[2] Brenneke, R. 2010. Evaluation von Personensuchmaschinen und Umgang mit persönlichen Daten im Internet. Master Thesis, University of Hildesheim, Germany. International Information Management.
[3] CrossTab Marketing Services. 2010. Europäischer Datenschutztag: Studie zur Online Reputation. Trustworthy Computing Group, Microsoft (ed.). http://www.microsoft.com/germany/sicherheit/datenschutzstudie.mspx
[4] Hellmann, R.; Griesbaum, J.; Mandl, T. 2010. Quality in Blogs: How to find the best User Generated Content. In: 13th Intl Conf on Business Information Systems (BIS 2010) Berlin, 3.-5. May. Berlin et al.: Springer [LNBIP 47] pp. 47-58.
[5] Zur Jacobsmühlen, T. 2010. Social Media HR Report 2010. Stepstone.de & HRM.de (eds.). http://www.jacobsmuehlen.de/studie/
[6] Lamm, K.; Greve, W.; Mandl, T.; Womser-Hacker, C. 2010. The Influence of Expectation and System Performance on User Satisfaction with Retrieval Systems. In: Proc EVIA 2010: The First Intl Workshop on Evaluating Information Access. National Institute of Informatics (NII), Tokyo, Japan, June 15-18. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings8/EVIA/09-EVIA2010-LammK.pdf
[7] Madden, M.; Smith, A. 2010. Reputation Management and Social Media: How people monitor their identity and search for others online. PEW Internet & American Life Project. http://pewinternet.org/Reports/2010/Reputation-Management.aspx
[8] Mandl, T. 2008. Recent Developments in the Evaluation of Information Retrieval Systems: Moving Toward Diversity and Practical Applications. In: Informatica – An Intl. Journal of Computing and Informatics vol. 32. pp. 27-38.
[9] Mandl, T.; Womser-Hacker, C. 2005. The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation. In: Proc 2005 ACM SAC Symposium on Applied Computing (SAC). Santa Fe, New Mexico, USA. March 13.-17. 2005. pp. 1059-1064.
[10] Robertson, S. 2008. On the history of evaluation in IR. In: Journal of Information Science 34(4). pp. 439-456.
[11] Schäuble, T.; Griesbaum, J.; Mandl, T. 2009. Mehrwertpotenziale von Online-Social-Business-Netzwerken für die Personalbeschaffung von Fach- und Führungskräften. In: Informatik 2009 - Beiträge 39. Jahrestagung der Gesellschaft für Informatik e.V. (GI) Lübeck [LNI P-154] pp. 2166-2180.
[12] Tawileh, W.; Mandl, T.; Griesbaum, J. 2010. Evaluation of five web search engines in Arabic language. In: LWA – Lernen - Wissensentdeckung - Adaptivität: Proc Workshopwoche GI, Universität Kassel. Workshop Information Retrieval. http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir1.pdf