=Paper=
{{Paper
|id=None
|storemode=property
|title=The Development and Application of an Evaluation Methodology for Person Search Engines
|pdfUrl=https://ceur-ws.org/Vol-763/posterC.pdf
|volume=Vol-763
|dblpUrl=https://dblp.org/rec/conf/eurohcir/BrenneckeMW11
}}
==The Development and Application of an Evaluation Methodology for Person Search Engines==
<pdf width="1500px">https://ceur-ws.org/Vol-763/posterC.pdf</pdf>
<pre>
              The Development and Application of an
        Evaluation Methodology for Person Search Engines
           Roland Brenneke                                    Thomas Mandl                          Christa Womser-Hacker
          Information Science                               Information Science                          Information Science
        University of Hildesheim                          University of Hildesheim                     University of Hildesheim
         Marienburger Platz 22                             Marienburger Platz 22                        Marienburger Platz 22
               Germany                                           Germany                                      Germany
    roland.brenneke@gmx.de                            mandl@uni-hildesheim.de                    womser@uni-hildesheim.de

ABSTRACT                                                                 Web search or go directly to social networks to find out about
                                                                         people. Nevertheless, 10% is still a significant share and hit rates
This paper presents a user oriented evaluation methodology for           for person search engines are constantly high. In addition, many
comparing person search services on the Web. Many established            of these searches may have a high impact. Many recruiters use
system oriented methods from information retrieval cannot be             person search engines for checking on candidates.
applied to this domain. Our user oriented methodology is applied
                                                                         A questionnaire study among 548 enterprises was published in
to a test comparing the person search engines yasni, pipl.com and
                                                                         2010 [5]. This Social Media HR Report 2010 revealed that in
123people. The user study with over 30 participants led to
                                                                         2009 over 59% of the companies used the internet to check
relevant results. The coverage of data object types within the
                                                                         on applicants. Almost 10% had already turned down an
person search engine results is quite different. Especially the
                                                                         application because of information on the Web. Companies who
amount of pictures and social media network entries which are
                                                                         do not use the Web for checking on applicants state that lack of
presented by the systems and which are perceived by the test users
                                                                         time and ethical questions are the main reasons not to do so [5].
differs greatly. The results also revealed a tendency to judge people
more positively when more information was found.                         An international study showed that this behaviour is more
                                                                         widespread in the US than in European countries [3]. Interviews
                                                                         with decision makers in German companies revealed that they are
                                                                         well aware of the potential of retrieving applicant information
1. INTRODUCTION                                                          [11].
Person search engines are important specialized search services on
the Web. These systems consult other services for information            The use of person search engines for job applicants is only one
about a person and integrate it in one interface. They can be            potential usage scenario; however, it is a very prominent one.
regarded as meta search services or one-stop shops for personal          Other than that, there are many reasons why a user would want
information. Mostly, they are tailored for normal people and not         to search for a person. And despite the use of a named entity in
for celebrities and other famous people. As such, they differ            the search, the information need is rather vague and can be
from named entity search in general.                                     rephrased as “Find out something about person X”.
Especially with Web 2.0 and its ease of publishing content on            The success of a person search engine depends on many factors.
the Web, many people deposit much information about themselves or        Person search engines are meta services which extract results from
content they created on various sites. Users need to have the            a large variety of different online media. The presentation of these
proper information competence to foresee the consequences of             results in the user interface is an essential factor for the success of
such behavior. Often, users are advised not to publish too much          the search service. If a result is far down on the result page and
information. Online reputation management becomes an                     the user never scrolls there, potentially relevant items cannot be
important issue. On the side of the users, social networks and           found. That means that the search capability is only one success
person search services lead to information ethical considerations        factor for person search engines. Consequently, our experiment
about the use of personal information.                                   was designed as a user test. We intended to evaluate the user
                                                                         experience and the success with the tool person search engine and
Searching for information about others is a very frequent                neither specific system components nor absolute retrieval
information need and a reason for using a search service.                performance.
According to Google Trends, the most popular person search
services receive over 200,000 hits per day. However, 90% of the
users do not rely on person search engines but they use general          2. RELATED WORK
                                                                         The evaluation of retrieval systems is central in information
 Copyright © 2011 for the individual papers by the papers' authors.      retrieval research because the system performance cannot be
 Copying permitted only for private and academic purposes. This volume   predicted. The most influential retrieval evaluation methodology
 is published and copyrighted by the editors of euroHCIR2011.            is called the Cranfield paradigm. Information retrieval research
                                                                         has adopted an evaluation scheme which tries to ignore subjective
 EuroHCIR 2011. The 1st European Workshop on Human-Computer              differences between users in order to be able to compare systems
 Interaction and Information Retrieval. July 4th 2011. Newcastle, UK     and algorithms. The user is replaced by a prototypical and
                                                                         constant user. Relevance judgments are provided by domain
                                                                         experts [8, 10].
Cranfield evaluations have often been criticised for several             We selected people who had posted a large amount of information
reasons. The main objections come from advocates of user                 about themselves in the network. Again, this was done to obtain
oriented studies. The search situation of users depends on many          similar and comparable difficulty for the three test cases. Three
individual and contextual factors which can only be captured in          person search engines were selected for the comparative test. We
user experiments [6]. The real user experience and the success in        chose yasni, pipl.com and 123people because they were very
a real world situation cannot be measured with the laboratory style      popular at the time of the study according to Google Trends. All
experiments based on the Cranfield paradigm [12].                        three companies claim that they exploit only information available
Person search engines have a higher chance to succeed than               on the public Web.
general purpose search services. The retrieval with named entities
is known to be easier than searches without named entities [9].          4. STUDY
The selection of a person search engine hints at the type of result.     Students of the University of Hildesheim were recruited through a
Consequently, synonymy between names and words is a smaller              mailing list of students. Participation was voluntary and no
problem than in general purpose search engines. Synonymy                 compensation was given. None of the participants had a computer
between names, on the other hand, is a big challenge for person          science background. They all were frequent Internet users and had
search engines.                                                          searched for people before but only 10% had used a person search
                                                                         engine before. The others used Google or social networks to find
3. METHODOLOGY                                                           information on people.
The balance between control and realism is a challenge for each          The issue of relevance is always a crucial one in information
experiment. For the presented study, we chose a user experiment          retrieval evaluation. In our study, any item could contribute to the
to test person search engines because an approach purely                 full picture of the applicant. Despite the clearly defined scenario,
dedicated to retrieval power does not mirror the user experience         it remains vague which information is needed and what type of
for person search engines well. It is necessary to limit the realism     information is useful. It is difficult to assign relevance to items or
in a user experiment in order to allow comparison across                 even weights to categories. The user interfaces of the person
participants in the test. We selected a job applicant scenario in        search engines present the items in categories such as social
order to make the experiment interesting for the users. Applicant        network entries or videos.
search is a very prominent usage type. The method was successful         A questionnaire study [7] showed that users search mainly for the
in making the experiment attractive. The test users liked the            following items in the order presented when retrieving
experiment very much and through word of mouth, more                     information about a specific person:
volunteers wanted to register for the experiment than were needed.
                                                                              •    Contact information
The selection of persons for the task defines the content for the
                                                                              •    Profile on a social network
test. It seemed necessary to identify people for whom much
                                                                              •    Photo
information can be found on the Web. If there were no videos,
working results like presentations or social network entries, then            •    Information about professional accomplishments or
the performance of the person search engine could not be tested                    interests
with our experiment. So even if the persons selected are not
representative in terms of amount of online information for the          The most frequently sought item, contact information, does not
whole population or all persons who are indexed in a person              apply to our scenario because the persons had sent a letter of
search service, it increases the validity of the test to select persons  application. The next two most frequent items are included. The
with a large amount of online information.                               fourth item is rather vague, as are some of the items that follow, as
                                                                         far as the categories of person search engines are concerned. As a
Three people were carefully selected who had similar                     consequence, the data available does not justify the assignment of
qualifications. For them, a job profile was developed which was          weights to some items. In our study, all clicks on items were
given to the participants together with the names of the people.         scored equally. The results will also show which of the items were
The users were asked to search for these people who would be             most popular. The time per applicant was limited to 10 minutes.
interviewed for the position and check if they were appropriate.         The entire experiment took 45 minutes on average including the
The job description and the name of each applicant were given to         pre- and post-test questionnaires.
the test persons. Each of the candidates was well qualified for the
job but had one negative aspect in his online data. One was an           One search service modified the interface after the first two tests.
advocate of nuclear power and the job was offered by an                  So it was necessary to eliminate three test sessions from the
alternative energy company. The second applicant was a serial            results and recruit further test users. This shows that not only the
entrepreneur who portrayed himself on Facebook in pictures with          dynamics of the personal data presents a challenge for the test but
attractive women and sports cars. The third applicant had party          also the ongoing modifications of the search engine. Overall, 34 users
photos online where he could be seen smoking cigarettes and he           took part in the experiment. Due to the problems of a relaunch of
considered himself as lazy in one social network while he had a          one service, we could consider the experiments of 10 users of
very business oriented self image in another social network.             123people, 11 users of Pipl and 10 users of Yasni.

Obviously, such a scenario has some limitations. Person search           Each test person worked with one search engine on all three
engines need to disambiguate between people with the same                applicants. This between-groups approach was mainly
name. We decided to choose people who are not ambiguous in               applied to avoid a long learning phase for each of the person
order to have the same difficulty for each person. Such issues are       search engines. All tests were recorded with appropriate software.
evaluated in the system oriented campaign WEPS [1].
                                   Figure 1: Popularity of person search engines according to Google Trends

5. RESULTS
The result description focuses on the information perceived by
users and the performance of the test users in the application task.
The information items clicked by the users were categorized. It
can be seen that the services lead to a similar number of clicks
when summed up over all users. Each of the services resulted in
between 110 and 120 clicks for the ten test persons. In the case of
Pipl, 11 test persons were considered. Each engine leads to a
sufficient number of entries and has abundant information on the
applicants in our scenario. This was a goal of the test design and
was accomplished.
The type of information which was encountered was quite
different. It can easily be seen that 123people facilitates access to
photos whereas Pipl leads more users to social network entries. A
comparative analysis for the services for the most popular item
types is shown in Table 1.
In the post test questionnaire, users were asked about their
subjective impression of the service they had used. In the overall
satisfaction, 123people was rated highest. For the page structure,
Pipl received the best grades, and for the coverage of different
business networks, Yasni was rated as most successful. In the latter
case, the finding from the objective click data was confirmed.
Further details on the results are provided in [2].
                                   Figure 2: Clicks on items in the three person search engines


                                                   Table 1: Comparison of data types encountered

                        Item                    123people      Pipl      Yasni
                        Photo                      ++           +−        −−
                        Business network           −            −         ++
                        Social network             −            ++        +
                        Homepage/Blog              +            +         +−
                        Microblog                  +            +−        +
                        Yellow pages               +−           −−        +
                        Forum post                 −            +−        +
                        Videoclip                  +            +−        +−

                        Publication, Presentation, Email address, Address, Phone number:
                        no rating is possible because of a very low number of clicks.

                        Perception: ++ Excellent, + Good, +− Moderate, − Poor, −− Unperceived

For two services, applicant 1 was selected by the majority of the    [3] CrossTab Marketing Services. 2010. Europäischer
test users. These two services had identified most items for this        Datenschutztag: Studie zur Online Reputation
applicant. For yasni, applicant 2 was chosen as the best                 Trustworthy Computing Group, Microsoft (ed.).
applicant despite the fact that the other two services found on           http://www.microsoft.com/germany/sicherheit/datenschutzstudie.
average 10 items more for this person. Applicant 3 was given              mspx
the last place for all three person search services. For each        [4] Hellmann, R.; Griesbaum, J.; Mandl, T. 2010. Quality in
service, he is the applicant with the fewest items. There might be       Blogs: How to find the best User Generated Content. In:
a trend to rate people higher when more information is available         13th Intl Conf on Business Information Systems (BIS 2010)
online.                                                                  Berlin, 3.-5. May. Berlin et al.: Springer [LNBIP 47] pp.
                                                                         47-58.
6. SUMMARY                                                           [5] Zur Jacobsmühlen, T. (2010): Social Media HR Report
We presented a holistic evaluation methodology for person                2010 Stepstone.de & HRM.de (eds.).
search engines. The performance of these search services is              http://www.jacobsmuehlen.de/studie/
measured by observing the perception of test users. The test
methodology is built on a realistic scenario and use case but it     [6] Lamm, K.; Greve, W.; Mandl, T.; Womser-Hacker, C.
does not cover all the relevant quality aspects of person search         2010. The Influence of Expectation and System
engines. The important capability to resolve the ambiguity of            Performance on User Satisfaction with Retrieval Systems.
names was not dealt with. In future work, it might be promising          In: Proc EVIA 2010: The First Intl Workshop on
to develop a performance based test for this task only.                  Evaluating Information Access June 2010 National
                                                                         Institute of Informatics (NII) Tokyo, Japan, June 15-18,
The complete information seeking behaviour and its success are           http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings
also not measured with our test. In a realistic scenario, people         8/EVIA/09-EVIA2010-LammK.pdf
might access the social media networks through a person search
engine and continue their search mainly there. This issue could      [7] Madden, M.; Smith, A. 2010. Reputation Management and
be resolved by observing real behaviour.                                 Social Media: How people monitor their identity and
                                                                         search for others online. PEW Internet & American Life
In the test, the search engine 123people was the winner. It not          Project. http://pewinternet.org/Reports/2010/Reputation-
only led users to the highest number of items, but it was also           Management.aspx
subjectively judged to be the best person search engine.
However, in several aspects other systems performed better and       [8] Mandl, T. 2008. Recent Developments in the Evaluation of
were judged better. The evaluation showed that the different             Information Retrieval Systems: Moving Toward Diversity
tools are all based on the freely available data on the Web but          and Practical Applications. In: Informatica – An Intl.
that they lead to different results. The most sought items in our        Journal of Computing and Informatics vol. 32. pp. 27-38.
test were photos, entries and profiles in social and business        [9] Mandl, T.; Womser-Hacker, C. 2005. The Effect of Named
networks and personal homepages. Each of the engines                     Entities on Effectiveness in Cross-Language Information
exhibited a strength in one of these items, e.g. 123people for           Retrieval Evaluation. In: Proc 2005 ACM SAC Symposium
photos because they are shown as top results. This is also               on Applied Computing (SAC). Santa Fe, New Mexico,
confirmed by the questionnaire study among American                      USA. March 13.-17. 2005. pp. 1059-1064.
recruiters [7].                                                      [10] Robertson, S. 2008. On the history of evaluation in IR. In:
For the users who publish information about themselves and                Journal of Information Science 34(4). pp. 439-456
who become information providers by doing that, the issue of         [11] Schäuble, T.; Griesbaum, J.; Mandl, T. 2009.
information competence will become more and more important.               Mehrwertpotenziale von Online-Social-Business-Netzwerken für
Personal Online Identity Management is a growing field and                die Personalbeschaffung von Fach- und Führungskräften.
several new companies are entering the market.                            In: Informatik 2009 - Beiträge 39. Jahrestagung der
                                                                          Gesellschaft für Informatik e.V. (GI) Lübeck [LNI P-154]
7. REFERENCES                                                             pp. 2166 – 2180.
[1] Artiles, J.; Borthwick, A.; Gonzalo, J.; Sekine, S.; Amigó,
    E. 2010. WePS-3 Evaluation Campaign: Overview of the             [12] Tawileh, W.; Mandl, T.; Griesbaum, J. 2010. Evaluation of
    Web People Search Clustering and Attribute Extraction                 five web search engines in Arabic language. In: LWA–
    Tasks. In: CLEF Working Notes                                         Lernen - Wissensentdeckung – Adaptivität: Proc Work-
    http://nlp.uned.es/weps/weps-3/papers                                 shopwoche GI, Universität Kassel. Workshop Information
                                                                          Retrieval.
[2] Brenneke, R. 2010. Evaluation von Personen-                           http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir1.pdf
    suchmaschinen und Umgang mit persönlichen Daten im
    Internet. Master Thesis, University of Hildesheim,
    Germany. International Information Management.

</pre>