<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Development and Application of an Evaluation Methodology for Person Search Engines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roland Brenneke</string-name>
          <email>roland.brenneke@gmx.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Mandl</string-name>
          <email>mandl@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christa Womser-Hacker</string-name>
          <email>womser@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Science, University of Hildesheim</institution>
          ,
          <addr-line>Marienburger Platz 22</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <abstract>
        <p>This paper presents a user-oriented evaluation methodology for comparing person search services on the Web. Many established system-oriented methods from information retrieval cannot be applied to this domain. We apply the methodology in a test comparing the person search engines yasni, pipl.com and 123people. A user study with over 30 participants showed that the coverage of data object types in the results differs considerably between the engines. In particular, the numbers of pictures and social network entries that the systems present and that the test users perceive differ greatly. The results also revealed a tendency to judge people more positively when more information was found.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. RELATED WORK</title>
      <p>
        The evaluation of retrieval systems is central in information
retrieval research because the system performance cannot be
predicted. The most influential retrieval evaluation methodology
is called the Cranfield paradigm. Information retrieval research
has adopted an evaluation scheme which tries to ignore subjective
differences between users in order to be able to compare systems
and algorithms. The user is replaced by a prototypical and
constant user. Relevance judgments are provided by domain
experts [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ].
      </p>
      <p>
        Cranfield evaluations have often been criticised. The main
objections come from advocates of user-oriented studies. The
search situation of users depends on many individual and
contextual factors which can only be captured in
user experiments [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The real user experience and success in a real-world
situation cannot be measured with laboratory-style
experiments based on the Cranfield paradigm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Person search engines have a higher chance to succeed than
general-purpose search services. Retrieval with named entities
is known to be easier than searches without named entities [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
By choosing a person search engine, the user already indicates the
type of result expected. Consequently, ambiguity between names and
ordinary words is a smaller problem than in general-purpose search
engines. Ambiguity between identical names, on the other hand, is a
big challenge for person search engines.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. METHODOLOGY</title>
      <p>The balance between control and realism is a challenge for every
experiment. For this study, we chose a user experiment to test
person search engines because an approach that measures retrieval
power alone does not reflect the user experience of person search
engines well. Realism has to be limited in a user experiment so
that results can be compared across participants. We selected a
job applicant scenario in order to make the experiment interesting
for the users. Applicant search is a very prominent use of person
search engines. The scenario succeeded in making the experiment
attractive: the test users liked it very much and, through word of
mouth, more volunteers wanted to register than were needed.</p>
      <p>The selection of persons for the task defines the content of the
test. It seemed necessary to identify people for whom much
information can be found on the Web. If there were no videos,
work products such as presentations, or social network entries,
the performance of the person search engine could not be tested
with our experiment. The persons selected are therefore not
representative, in terms of the amount of online information, of
the whole population or of all persons indexed in a person search
service. Nevertheless, selecting persons with a large amount of
online information increases the validity of the test.</p>
      <p>
        Three people with similar qualifications were carefully
selected. A job profile was developed for them and given to the
participants together with the names of the three applicants.
The participants were asked to search for these people, who were
to be interviewed for the position, and to check whether they were
suitable. Each of the candidates was well qualified for the
job but had one negative aspect in his online data. The first was an
advocate of nuclear power, while the job was offered by an
alternative energy company. The second was a serial
entrepreneur who portrayed himself on Facebook in pictures with
attractive women and sports cars. The third had party
photos online in which he could be seen smoking cigarettes, and he
described himself as lazy in one social network while presenting a
very business-oriented self-image in another.
      </p>
      <p>
        Obviously, such a scenario has some limitations. Person search
engines need to disambiguate between people with the same
name. We decided to choose people whose names are not ambiguous in
order to have the same difficulty for each person. Disambiguation is
evaluated in the system-oriented campaign WePS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
We also selected people who had posted a large amount of information
about themselves online. Again, this was done to obtain
similar and comparable difficulty for the three test cases. Three
person search engines were selected for the comparative test. We
chose yasni, pipl.com and 123people because they were very
popular at the time of the study according to Google Trends. All
three companies claim that they exploit only information available
on the public Web.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. STUDY</title>
      <p>Students of the University of Hildesheim were recruited through a
student mailing list. Participation was voluntary and no
compensation was given. None of the participants had a computer
science background. All were frequent Internet users and had
searched for people before, but only 10% had previously used a person
search engine. The others had used Google or social networks to find
information on people.</p>
      <p>Relevance is always a crucial issue in information
retrieval evaluation. In our study, any item could contribute to the
full picture of the applicant. Despite the clearly defined scenario,
it remains vague which information is needed and what type of
information is useful. It is therefore difficult to assign relevance to
items or even weights to categories. The user interfaces of the person
search engines present the items in categories such as social
network entries or videos.</p>
      <p>
        A questionnaire study [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] showed that, when retrieving information about a specific
person, users search mainly for the following items, in this order:
      </p>
      <list list-type="order">
        <list-item><p>Contact information</p></list-item>
        <list-item><p>Profile on a social network</p></list-item>
        <list-item><p>Photo</p></list-item>
        <list-item><p>Information about professional accomplishments or interests</p></list-item>
      </list>
      <p>The most frequently sought item, contact information, does not
apply to our scenario because the persons had sent a letter of
application. The next two most frequent items are included in our
study. The fourth item is rather vague, as are some of the less
frequent items that follow it, when compared with the categories of
person search engines. Consequently, the available data do not
justify assigning different weights to items. In our study, all
clicks on items were scored equally. The results also show which of
the items were most popular. The time per applicant was limited to
10 minutes. The entire experiment took 45 minutes on average,
including the pre- and post-test questionnaires.</p>
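      <p>As a minimal sketch of this equal-weight scoring (not the tooling used in the study), the clicks of the recorded sessions can be tallied per service and per item category, with every click counting as 1. The click log, the category labels and the function name <monospace>score_clicks</monospace> are hypothetical and serve only as illustration.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical click log: one (service, category) pair per clicked item.
# The entries are illustrative only and are not data from the study.
clicks = [
    ("123people", "photo"),
    ("123people", "photo"),
    ("123people", "social network"),
    ("pipl", "social network"),
    ("pipl", "social network"),
    ("yasni", "business network"),
]


def score_clicks(click_log):
    """Tally clicks per service and per (service, category) pair.

    Every click contributes exactly 1, so no category is weighted
    more heavily than another.
    """
    per_service = Counter(service for service, _ in click_log)
    per_category = Counter(click_log)
    return per_service, per_category


per_service, per_category = score_clicks(clicks)
for service, total in per_service.items():
    print(service, total)
```
      </preformat>
      <p>Because no weights are applied, the per-service totals always add up to the number of logged clicks, and the per-category tallies show directly which item types were most popular.</p>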
      <p>One search service modified its interface after the first two tests,
so it was necessary to eliminate three test sessions from the
results and recruit further test users. This shows that not only the
dynamics of the personal data but also ongoing modifications of the
search engines present a challenge for the test. Overall, 34 participants
took part in the experiment. Because of the relaunch of
one service, we could consider the sessions of 10 users of
123people, 11 users of pipl and 10 users of yasni.</p>
      <p>Each test person worked with one search engine on all three
applicants. This between-groups design was chosen mainly
to avoid a long learning phase for each of the person
search engines. All tests were recorded with appropriate software.</p>
    </sec>
    <sec id="sec-4">
      <title>5. RESULTS</title>
      <p>The description of the results focuses on the information perceived by
users and on the performance of the test users in the application task.
The information items clicked by the users were categorized. The
services led to a similar number of clicks when summed over all
users: each service resulted in between 110 and 120 clicks for its
test persons, with ten test persons each for 123people and yasni and
11 for pipl. Each engine led to a sufficient number of entries and had
abundant information on the applicants in our scenario. This was a
goal of the test design and was accomplished.</p>
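      <p>Since the pipl group had 11 test persons and the other two groups had ten, click totals are easier to compare when divided by group size. The following sketch assumes this normalisation, which is not reported in the study itself. The group sizes are taken from the text, whereas the click total of 115 per service is a hypothetical value inside the reported range of 110 to 120, and the function name <monospace>clicks_per_participant</monospace> is likewise an assumption.</p>
      <preformat>
```python
# Group sizes as reported in the study.
GROUP_SIZES = {"123people": 10, "pipl": 11, "yasni": 10}

# Hypothetical click total per service, chosen inside the reported
# range of 110 to 120; the actual totals are not given in the text.
ASSUMED_TOTAL_CLICKS = 115


def clicks_per_participant(total_clicks, participants):
    """Return the mean number of clicked items per test person."""
    if participants == 0:
        raise ValueError("a group needs at least one participant")
    return total_clicks / participants


means = {
    service: clicks_per_participant(ASSUMED_TOTAL_CLICKS, size)
    for service, size in GROUP_SIZES.items()
}
sessions_analysed = sum(GROUP_SIZES.values())

for service, mean in means.items():
    print(f"{service}: {mean:.1f} clicks per participant")
print(f"sessions analysed: {sessions_analysed}")
```
      </preformat>
      <p>With identical totals, the 11-person group yields a lower mean per participant than the two 10-person groups. The group sizes add up to 31 sessions, which matches the 34 participants minus the three eliminated sessions.</p>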
      <p>The type of information encountered differed considerably
between the services. 123people facilitates access to
photos, whereas pipl leads more users to social network entries. A
comparative analysis of the services for the most popular item
types is shown in Table 1.</p>
      <p>
        In the post-test questionnaire, users were asked about their
subjective impression of the service they had used. For overall
satisfaction, 123people was rated highest. For page structure,
pipl received the best grades, and for the coverage of different
business networks, yasni was rated as most successful. In the latter
case, the subjective rating confirmed the finding from the objective
click data. Further details on the results are provided in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        For two services, applicant 1 was selected as the best applicant
by the majority of the test users. These two services had identified
the most items for this applicant. For yasni, applicant 2 was chosen
as the best applicant, even though the other two services found on
average 10 more items for this person. Applicant 3 was ranked
last for all three person search services. For each
service, he was the applicant with the fewest items. There might be
a trend to rate people higher when more information is available
online.
      </p>
    </sec>
    <sec id="sec-5">
      <title>6. CONCLUSION</title>
      <p>We presented a holistic evaluation methodology for person
search engines. The performance of these search services is
measured by observing what test users perceive. The test
methodology is built on a realistic scenario and use case, but it
does not cover all relevant quality aspects of person search
engines. The important capability of resolving the ambiguity of
names was not addressed. In future work, it might be promising
to develop a performance-based test for this task alone.
The complete information seeking behaviour and its success are
also not measured by our test. In a realistic scenario, people
might access the social networks through a person search
engine and continue their search mainly there. This issue could
be addressed by observing real behaviour.</p>
      <p>
        In the test, the search engine 123people was the winner. It not
only led users to the highest number of items, but was also
subjectively judged to be the best person search engine.
However, in several aspects other systems performed better and
were judged better. The evaluation showed that the different
tools all draw on freely available data on the Web but
lead to different results. The most sought-after items in our
test were photos, entries and profiles in social and business
networks, and personal homepages. Each of the engines
exhibited a strength in one of these item types, e.g. 123people for
photos, because it shows them as top results. This is also
confirmed by the questionnaire study among American
recruiters [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>For users who publish information about themselves, and
who thereby become information providers, the issue of
information competence will become more and more important.
Personal online identity management is a growing field, and
several new companies are entering the market.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Artiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Borthwick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sekine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks</article-title>
          . In: CLEF Working Notes http://nlp.uned.es/weps/weps-3/papers
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Brenneke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Evaluation von Personensuchmaschinen und Umgang mit persönlichen Daten im Internet</article-title>
          .
          <source>Master Thesis</source>
          , University of Hildesheim, Germany. International Information Management.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>CrossTab Marketing Services</collab>
          .
          <year>2010</year>
          .
          <article-title>Europäischer Datenschutztag: Studie zur Online Reputation</article-title>
          . Trustworthy Computing Group, Microsoft (ed.). http://www.microsoft.com/germany/sicherheit/datenschutzstudie.mspx
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Griesbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Quality in Blogs: How to find the best User Generated Content</article-title>
          .
          In:
          <source>13th Intl Conf on Business Information Systems (BIS 2010)</source>
          , Berlin, 3.-5. May. Berlin et al.: Springer [LNBIP 47] pp.
          <fpage>47</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Zur Jacobsmühlen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2010</year>
          .
          <source>Social Media HR Report 2010</source>
          . Stepstone.de &amp; HRM.de (eds.). http://www.jacobsmuehlen.de/studie/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Lamm</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Greve</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Womser-Hacker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>The Influence of Expectation and System Performance on User Satisfaction with Retrieval Systems</article-title>
          .
          In:
          <source>Proc EVIA 2010: The First Intl Workshop on Evaluating Information Access</source>
          . National Institute of Informatics (NII), Tokyo, Japan, June 15-18, 2010. http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings8/EVIA/09-EVIA2010-LammK.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Madden</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Reputation Management and Social Media: How people monitor their identity and search for others online</article-title>
          .
          <source>Pew Internet &amp; American Life Project</source>
          . http://pewinternet.org/Reports/2010/ReputationManagement.aspx
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Recent Developments in the Evaluation of Information Retrieval Systems: Moving Toward Diversity and Practical Applications</article-title>
          . In:
          <source>Informatica - An Intl. Journal of Computing and Informatics</source>
          vol.
          <volume>32</volume>
          . pp.
          <fpage>27</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Womser-Hacker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>The Effect of Named Entities on Effectiveness in Cross-Language Information Retrieval Evaluation</article-title>
          .
          In:
          <source>Proc 2005 ACM SAC Symposium on Applied Computing (SAC)</source>
          . Santa Fe, New Mexico, USA. March 13.-17. 2005
          . pp.
          <fpage>1059</fpage>
          -
          <lpage>1064</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Robertson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>On the history of evaluation in IR</article-title>
          .
          In:
          <source>Journal of Information Science</source>
          <volume>34</volume>
          (
          <issue>4</issue>
          ). pp.
          <fpage>439</fpage>
          -
          <lpage>456</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Schäuble</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Griesbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Mehrwertpotenziale von Online-Social-Business-Netzwerken für die Personalbeschaffung von Fach- und Führungskräften</article-title>
          . In:
          <source>Informatik 2009 - Beiträge 39. Jahrestagung der Gesellschaft für Informatik e.V. (GI)</source>
          , Lübeck [LNI P-154] pp.
          <fpage>2166</fpage>
          -
          <lpage>2180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Tawileh</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mandl</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Griesbaum</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Evaluation of five web search engines in Arabic language</article-title>
          . In:
          <source>LWA Lernen - Wissensentdeckung - Adaptivität: Proc Workshopwoche GI</source>
          , Universität Kassel. Workshop Information Retrieval. http://www.kde.cs.uni-kassel.de/conf/lwa10/papers/ir1.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>