<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Personalized Recommendations in Police Photo Lineup Assembling Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ladislav Peska</string-name>
          <email>peska@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hana Trojanova</string-name>
          <email>trojhanka@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Psychology, Faculty of Arts, Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Software Engineering, Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2203</volume>
      <fpage>157</fpage>
      <lpage>160</lpage>
      <abstract>
        <p>In this paper, we present a novel application domain for recommender systems: police photo lineups. Photo lineups play a significant role in eyewitness identification, prosecution and the subsequent conviction of suspects. Unfortunately, there are many cases where lineups have led to the conviction of innocent persons. One of the key factors contributing to incorrect identifications is an unfairly assembled (biased) lineup, i.e., a lineup in which the suspect differs significantly from all other candidates. Although assembling a fair lineup is both highly important and time-consuming, only a handful of tools are available to simplify the task. We describe our work towards using recommender systems for the photo lineup assembling task. Initially, two non-personalized recommending methods were evaluated: one based on the visual descriptors of persons and the other on their content-based attributes. Next, several personalized hybrid techniques combining both methods based on the feedback from forensic technicians were evaluated. Some of the personalized techniques significantly improved the results of both non-personalized techniques w.r.t. nDCG and recall@top-k.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Evidence from eyewitnesses often plays a significant role
in criminal proceedings. A very important part is the lineup,
i.e., eyewitness identification of the perpetrator. Lineups
may lead to the prosecution and subsequent conviction of the
perpetrator. Yet there are cases where lineups have played a
role in the conviction of an innocent suspect. This forensic
method consists of the recognition of persons or things and
thus is linked with a wide range of psychological processes
such as perception, memory, and decision making. Those
processes can be influenced by the lineup itself. In order to
prevent witnesses from making incorrect identifications, the
lineup assembling task is among the top research topics of
the psychology of eyewitness identification [
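        <xref ref-type="bibr" rid="ref1">1</xref>, <xref ref-type="bibr" rid="ref5">4</xref>, <xref ref-type="bibr" rid="ref9">6</xref>, <xref ref-type="bibr" rid="ref12">9</xref>, <xref ref-type="bibr" rid="ref13">10</xref>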
        ].
      </p>
      <p>
        The sources of error in eyewitness identifications are
numerous. Some variables affecting error probability are on
the side of the witness (e.g., level of attention, age or
ethnicity) and the event (e.g., distance, lighting, time of the
day), and in general cannot be controlled [6, 9]. Controllable
variables include the method of questioning, the identification
procedure, interaction with investigators, and similar factors [
        <xref ref-type="bibr" rid="ref12">9</xref>, <xref ref-type="bibr" rid="ref13">10</xref>
        ].
      </p>
      <p>One of the principal recommendations for inhibiting
errors in identification is to assemble lineups according to
the lineup fairness principle [1, 5]. Lineup fairness is usually
assessed on the basis of data obtained from "mock
witnesses" - people who have not seen the offender, but
received a short description of him/her. Lineup fairness
measures the bias against the suspect and defines the
assembled lineup as fair if mock witnesses are unable to
identify the suspect based only on a brief textual description.
See Figure 1 for an example of a highly biased lineup.</p>
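      <p>The fairness criterion above can be made concrete with a small calculation. One way to operationalize it, assumed here purely for illustration and not taken from the cited studies, is to compare the share of mock witnesses who pick the suspect with the chance level 1/n of an n-person lineup, and to ask via a one-sided binomial test whether the excess could be due to chance. The function names and the example data are hypothetical.</p>
      <preformat preformat-type="code">
```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def mock_witness_check(choices, suspect, lineup_size):
    """Compare how often mock witnesses pick the suspect with the chance level.

    choices     -- the lineup member picked by each mock witness
    suspect     -- identifier of the suspect's photo
    lineup_size -- number of photos in the lineup (suspect + fillers)
    """
    n = len(choices)
    k = choices.count(suspect)
    # In a fair lineup the suspect is no more likely to be picked than any filler.
    chance = 1.0 / lineup_size
    # One-sided binomial test: a small p-value means mock witnesses pick the
    # suspect more often than uniform guessing among lineup members explains.
    p_value = binomial_upper_tail(k, n, chance)
    return k / n, chance, p_value


# Hypothetical six-person lineup shown to 30 mock witnesses.
choices = ["S"] * 15 + ["F1", "F2", "F3", "F4", "F5"] * 3
rate, chance, p_value = mock_witness_check(choices, "S", lineup_size=6)
print(f"suspect chosen by {rate:.0%} (chance level {chance:.0%}), p = {p_value:.1e}")
```
      </preformat>
      <p>Under this reading, a lineup in which half of the mock witnesses identify the suspect from a brief description alone would be flagged as biased, since the expected share for a six-person lineup is about one in six.</p>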
      <p>
        Assembling photo lineups, i.e., finding candidates for
filling the lineup for a particular suspect, according to the
lineup fairness principle is a challenging and
timeconsuming task involving the exploration of large datasets
of candidates. In the recent years, some research projects [
        <xref ref-type="bibr" rid="ref14">4,
11</xref>
        ] as well as commerce activities, e.g., elineup.org, aimed
to simplify the process of eyewitness identifications.
However, they mostly focused on the lineup administration
and do not support intelligent lineup assembling.
      </p>
      <p>From the point of view of recommender systems, lineup
assembling is a rather specific task for several reasons. Users
of the system are respected experts who assemble lineups
regularly, although usually not on a daily basis. Therefore,
we can expect a steady flow of feedback from long-term
users. Also, each lineup assembling task is highly unique,
i.e., the same suspect hardly ever appears in multiple lineups.
Thus, some popular approaches incorporating collaborative
filtering [2] or “the wisdom of the crowd” cannot be applied
in this scenario. Last but not least, the relevance judgement
is largely based on the visual appearance and/or similarity of
the suspect and the lineup candidate.</p>
      <p>In this paper, we describe our work in progress towards
designing recommender systems that aid users in assembling fair
lineups. In our previous work, we evaluated two
non-personalized, item-based recommending strategies [8].
Based on the initial evaluation of the non-personalized methods,
we propose a content-based personalized approach
combining both non-personalized techniques, aiming to
re-rank the list of proposed candidates according to the
long-term preferences of the user.</p>
      <p>More specifically, the main contributions of this paper are:
        <list list-type="bullet">
          <list-item>
            <p>A proposed and evaluated hybrid personalized recommendation method.</p>
          </list-item>
          <list-item>
            <p>A dataset of assembled lineups with both positive and negative training examples.</p>
          </list-item>
        </list>
      </p>
      <p>To the best of our knowledge, our work is the first
application of recommender systems principles to the lineup
assembling task.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Item-Based Recommendations</title>
      <sec id="sec-2-1">
        <title>2.1 Dataset of Lineup Candidates</title>
        <p>Although there are several commercial lineup databases<sup>1</sup>,
such datasets need to be applied with care due to the problem
of localization. Not only are racial groups highly different,
e.g., in North America (where the datasets are mostly based)
and in Central Europe, but other aspects such as common
clothing patterns, haircuts or make-up trends also vary greatly
across countries and continents. The underlying datasets should
follow the same localization as the suspect in order to inhibit
the bias of detecting strangers or having an incorrect ethnicity
in a lineup. We evaluated the proposed methods in the context
of the Czech Republic. Although the majority of the population
is Caucasian, mostly of Czech, Slovak, Polish and German
nationality, there are large Vietnamese and Romany minorities,
which make lineup assembling more challenging. We collected the
dataset of candidate persons from the wanted and missing
persons application<sup>2</sup> of the Police of the Czech Republic. In
total, we collected data about 4,423 missing or wanted
males. All records contained a photo, nationality, age and
appearance characteristics such as (facial) hair color and
style, eye color, figure shape, tattoos and more. More
information about the dataset may be found in [8].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Item-Based Recommending Strategies for Lineup Assembling</title>
        <p>In our previous work [8], we proposed two
non-personalized recommending strategies, where the list of
proposed candidates is based on the similarity between the
suspect and lineup candidates. We rely on the underlying
assumption that lineup fairness can be approximated
through the similarity of the suspect and the fillers, i.e., by filling
lineups with candidates similar to the suspect, we aim to ensure
that the lineups remain unbiased.</p>
        <p>Content-based Recommendation Strategy (CB-RS)
leverages the collected content-based attributes of
candidates. We employed the Vector Space Model [3] with
binarized features, TF-IDF weighting and cosine similarity.
The CB-RS strategy was intended to be closely similar to the
attribute-based searching, which is commonly available in
lineup assembling tools.</p>
        <p><sup>1</sup> E.g., http://elineup.org</p>
        <p><sup>2</sup> aplikace.policie.cz/patrani-osoby/Vyhledavani.aspx</p>
        <p><sup>3</sup> The ordering of candidates proposed by each method was maintained, i.e., the randomness was applied on the …</p>
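        <p>The CB-RS strategy described above can be illustrated as a Vector Space Model over binarized attributes with TF-IDF weighting and cosine similarity. This is a minimal NumPy sketch under our own assumptions, not the implementation from [8]: the IDF variant log(N/df), the toy attribute matrix and the helper names tfidf_weight and cosine_top_k are hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def tfidf_weight(binary_features):
    """TF-IDF weighting of a binary person-by-attribute matrix.

    With binarized features the term frequency is 0 or 1, so a present
    attribute simply receives its inverse document frequency log(N / df).
    """
    X = np.asarray(binary_features, dtype=float)
    df = X.sum(axis=0)  # number of persons having each attribute
    idf = np.log(X.shape[0] / np.maximum(df, 1.0))  # guard against unused attributes
    return X * idf


def cosine_top_k(weighted, suspect_idx, k=20):
    """Return the k persons most cosine-similar to the suspect and their scores."""
    W = np.asarray(weighted, dtype=float)
    norms = np.linalg.norm(W, axis=1)
    norms[norms == 0] = 1.0  # empty profiles: avoid division by zero
    unit = W / norms[:, None]
    sims = unit @ unit[suspect_idx]
    sims[suspect_idx] = -np.inf  # never propose the suspect himself
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]


# Hypothetical toy data: 5 persons x 4 binarized attributes
# (say "dark hair", "beard", "glasses", "tattoo"); person 0 is the suspect.
persons = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
top, scores = cosine_top_k(tfidf_weight(persons), suspect_idx=0, k=3)
print(top, scores.round(3))
```
        </preformat>
        <p>Because rare attributes receive a larger IDF, sharing a rare attribute with the suspect counts more than sharing a common one, which loosely mirrors how an attribute-based search narrows down candidates.</p>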
        <p>Recommendation Based on Visual Features (Visual-RS)
leverages the similarity of visual descriptors obtained from a
pre-trained CNN (in our case, VGG-Face [7], a VGG network
trained for face recognition). More information is
available in our previous work [8].</p>
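        <p>For illustration only, Visual-RS ranking can be sketched assuming that the VGG-Face descriptors have already been extracted into one fixed-length vector per person and that similarity follows the non-personalized scoring function given in Section 3, i.e., 1/(1 + L1 distance). Descriptor extraction is not shown; the function names and the toy three-dimensional vectors are hypothetical, and real VGG-Face descriptors are much higher-dimensional.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def l1_similarity(suspect_vec, candidate_matrix):
    """Score 1 / (1 + L1 distance) between the suspect and every candidate row."""
    diff = np.asarray(candidate_matrix, dtype=float) - np.asarray(suspect_vec, dtype=float)
    return 1.0 / (1.0 + np.abs(diff).sum(axis=1))


def visual_top_k(descriptors, suspect_idx, k=20):
    """Rank persons by descriptor similarity to the suspect (suspect excluded)."""
    D = np.asarray(descriptors, dtype=float)  # one precomputed descriptor per row
    scores = l1_similarity(D[suspect_idx], D)
    scores[suspect_idx] = -np.inf
    return np.argsort(-scores, kind="stable")[:k]


# Hypothetical 3-dimensional "descriptors" for 4 persons; person 0 is the suspect.
descriptors = [
    [0.1, 0.5, 0.2],
    [0.2, 0.4, 0.2],
    [0.9, 0.1, 0.7],
    [0.1, 0.5, 0.3],
]
print(visual_top_k(descriptors, suspect_idx=0, k=2))
```
        </preformat>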
      </sec>
      <sec id="sec-2-4">
        <title>2.3 Evaluation of Item-Based Recommenders</title>
        <p>To make this paper self-contained, let us briefly describe
the results of non-personalized recommendation strategies.</p>
        <p>The evaluation was based on a user study of domain
experts, i.e., forensic technicians, whose task was to select
the best lineup candidates out of the ones recommended by both
techniques. More specifically, 30 persons were selected
from the dataset to play the role of suspects. For each
suspect, both non-personalized recommendation strategies
proposed top-20 candidates, which were merged into a single
list<sup>3</sup> and displayed together with the suspect to the domain
experts. The domain experts selected the most suitable candidates;
these were considered as positively preferred. Participants
were instructed to maintain lineup fairness principles; they
were allowed to produce incomplete lineups if no more
suitable candidates were available, or to select more candidates
if they were equally eligible.</p>
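        <p>Footnote 3 states only that the ordering of candidates proposed by each method was maintained while some randomness was applied; the exact merging procedure is not described here. One scheme consistent with that statement, a random interleaving that keeps each method's internal order and shows a candidate proposed by both methods once, could look as follows. It is a hypothetical sketch, not the procedure used in the study, and the candidate IDs are invented.</p>
        <preformat preformat-type="code">
```python
import random


def merge_preserving_order(list_a, list_b, seed=None):
    """Randomly interleave two ranked candidate lists.

    At every step the next candidate is taken from the head of one of the
    non-empty lists, chosen at random, so each list's internal order is kept.
    A candidate proposed by both methods is shown once, at the first position
    where it is drawn.
    """
    rng = random.Random(seed)
    queues = [list(list_a), list(list_b)]
    merged, seen = [], set()
    while any(queues):
        source = rng.choice([q for q in queues if q])
        candidate = source.pop(0)
        if candidate not in seen:
            seen.add(candidate)
            merged.append(candidate)
    return merged


# Hypothetical top-5 lists of candidate IDs from Visual-RS and CB-RS.
visual_rs = [101, 102, 103, 104, 105]
cb_rs = [201, 103, 202, 203, 204]
print(merge_preserving_order(visual_rs, cb_rs, seed=7))
```
        </preformat>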
        <p>The evaluation was performed by seven forensic
technicians from the Czech Republic, with 202 assembled
lineups and 800 selected candidates in total. Table 1
illustrates the overall results of the user study. One can observe
that although Visual-RS clearly outperformed CB-RS, the
candidates recommended by CB-RS were also selected quite
often. Together with the surprisingly small intersection (1.83%)
between the lists of recommended candidates and the
relatively high level of disagreement among participants on
the selected candidates, the results indicate that a merged,
personalized strategy is plausible. Furthermore, as the mean
rank of the selected candidates was relatively high for both
methods (8 and 9 out of 20, respectively), there is room for a
re-ranking approach.</p>
      </sec>
    </sec>
    <sec id="sec-3">
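      <title>3 Personalized Recommendations</title>
      <p>Based on the evaluation of non-personalized, item-based recommending techniques, we hypothesized that the proposed recommendations can be further improved by employing some content-based personalized techniques. We approach this task through state-of-the-art machine learning methods as follows.</p>
      <sec id="sec-3-4">
        <p>Suppose that for an arbitrary user <inline-formula><tex-math>u</tex-math></inline-formula>, his/her previous interactions with the system are in the form of triples <inline-formula><tex-math>I_u = \{(o_s, o_c, r)\}</tex-math></inline-formula>, where <inline-formula><tex-math>o_s</tex-math></inline-formula> is the suspect of some previously created lineup, <inline-formula><tex-math>o_c</tex-math></inline-formula> is a recommended candidate, and <inline-formula><tex-math>r = 1</tex-math></inline-formula> if <inline-formula><tex-math>o_c</tex-math></inline-formula> was selected for the lineup and <inline-formula><tex-math>r = 0</tex-math></inline-formula> otherwise.</p>
        <p>Furthermore, both <inline-formula><tex-math>o_s</tex-math></inline-formula> and <inline-formula><tex-math>o_c</tex-math></inline-formula> can be represented by three sets of attributes:
          <list list-type="bullet">
            <list-item>
              <p><inline-formula><tex-math>A_V</tex-math></inline-formula> represents the visual descriptor based on the VGG-Face network.</p>
            </list-item>
            <list-item>
              <p><inline-formula><tex-math>A_{CB}</tex-math></inline-formula> are the TF-IDF values of the content-based attributes of each object.</p>
            </list-item>
            <list-item>
              <p>The union of both sets: <inline-formula><tex-math>A_V \cup A_{CB}</tex-math></inline-formula>.</p>
            </list-item>
          </list>
        </p>
        <p>Suppose that the equations below represent the scoring functions of the non-personalized recommending strategies:</p>
        <disp-formula><tex-math>s_V(o_s, o_c) = \frac{1}{1 + \sum_{a \in A_V} \lvert o_s^a - o_c^a \rvert}</tex-math></disp-formula>
        <disp-formula><tex-math>s_{CB}(o_s, o_c) = \frac{1}{1 + \sum_{a \in A_{CB}} \lvert o_s^a - o_c^a \rvert}</tex-math></disp-formula>
        <p>Now, let us define a personalized classification/regression task<sup>4</sup> with the training examples constructed as follows. For each <inline-formula><tex-math>(o_s, o_c, r) \in I_u</tex-math></inline-formula>, the output variable is <inline-formula><tex-math>y = r</tex-math></inline-formula>, and the list of independent variables (features) <inline-formula><tex-math>X_A</tex-math></inline-formula> is constructed as the absolute difference of the suspect's and the candidate's attributes for a set of attributes <inline-formula><tex-math>A</tex-math></inline-formula>:</p>
        <disp-formula><tex-math>\forall a \in A: \; x_a := \lvert o_s^a - o_c^a \rvert</tex-math></disp-formula>
        <p>Given an arbitrary classification method <inline-formula><tex-math>M</tex-math></inline-formula>, the model of user preferences <inline-formula><tex-math>M_{u,A}</tex-math></inline-formula> is trained by applying <inline-formula><tex-math>M</tex-math></inline-formula> on the per-user training set <inline-formula><tex-math>\{(X_A, y)\}</tex-math></inline-formula>. When the user starts a new lineup task with some new suspect <inline-formula><tex-math>\bar{o}_s</tex-math></inline-formula>, the lineup candidates are ranked according to their probability of being selected for the lineup:</p>
        <disp-formula><tex-math>\hat{r}_c := P\left(r(\bar{o}_s, o_c) = 1 \mid M_{u,A}\right)</tex-math></disp-formula>
        <p>We would like to note that such a recommendation scenario is quite challenging, as we do not have any feedback from the current lineup and need to rely solely on the long-term user preferences (note the relation to the page zero problem or the homepage recommendation problem). On the other hand, quite complex learning methods can be used, because the time span between two consecutive lineup assembling tasks performed by the same forensic technician tends to be rather large.</p>
        <p>The following preference learning methods were evaluated<sup>5</sup>:
          <list list-type="bullet">
            <list-item>
              <p>Non-personalized similarity based on the <inline-formula><tex-math>L_1</tex-math></inline-formula> distances (baseline)</p>
            </list-item>
            <list-item>
              <p>Linear regression (denoted as LM in the evaluation)</p>
            </list-item>
            <list-item>
              <p>Lasso regression (Lasso)</p>
            </list-item>
            <list-item>
              <p>Decision tree (Dec. tree)</p>
            </list-item>
            <list-item>
              <p>Gradient boosted tree (GBT)</p>
            </list-item>
          </list>
        </p>
        <p>As the initial evaluation of the proposed methods was only partially successful (machine learning methods were able to significantly improve the baseline only in the case of the <inline-formula><tex-math>A_{CB}</tex-math></inline-formula> attribute set), we further proposed a hybrid approach integrating two components:
          <list list-type="bullet">
            <list-item>
              <p>Predictions of a selected machine learning method on the <inline-formula><tex-math>A_{CB}</tex-math></inline-formula> attribute set, denoted <inline-formula><tex-math>\hat{r}_{CB}</tex-math></inline-formula>.</p>
            </list-item>
            <list-item>
              <p>Predictions based on the non-personalized <inline-formula><tex-math>L_1</tex-math></inline-formula> distance metric applied on the <inline-formula><tex-math>A_V</tex-math></inline-formula> attribute set, denoted <inline-formula><tex-math>\hat{r}_V</tex-math></inline-formula> (i.e., <inline-formula><tex-math>s_V</tex-math></inline-formula>).</p>
            </list-item>
          </list>
        </p>
        <p>Both prediction techniques are aggregated via probabilistic sum, i.e.,</p>
        <disp-formula><tex-math>\hat{r}_c := \hat{r}_{CB} + \hat{r}_V - \hat{r}_{CB} \cdot \hat{r}_V</tex-math></disp-formula>
        <p>This approach is denoted as hybrid in the evaluation.</p>
        <p><sup>4</sup> Please note that although classification is a natural choice due to the binary output variable, the final output of the method should be a ranking of candidates. Thus, we also evaluate several regression-based machine learning methods, and in the case of classification methods, we use the positive class probability score as the ranking.</p>
        <p><sup>5</sup> We use the methods' implementation from the scikit-learn package, http://scikit-learn.org.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.2 Evaluation of Personalized Recommendations</title>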
        <p>The main goal of the evaluation of personalized recommendations is to clarify whether long-term user preferences, i.e., those collected while assembling previous lineups, can be utilized to improve the list of recommended candidates for the current lineup.</p>
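        <p>The personalized re-ranking whose benefit this evaluation measures can be sketched end to end. The following is a minimal illustration, not the authors' implementation: it builds features as absolute attribute differences between suspect and candidate from a user's past (suspect, candidate, selected) triples, trains a per-user model, and ranks the candidates of a new lineup by the probabilistic sum of the model's selection probability on content-based attributes and the non-personalized 1/(1 + L1 distance) score on visual descriptors. A scikit-learn DecisionTreeClassifier with an arbitrary max_depth=3 stands in for the evaluated methods; the function names, attribute vectors and toy history are hypothetical.</p>
        <preformat preformat-type="code">
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def pair_features(suspect, candidates):
    """x_a = |o_s^a - o_c^a| for every attribute a; one row per candidate."""
    cands = np.atleast_2d(np.asarray(candidates, dtype=float))
    return np.abs(cands - np.asarray(suspect, dtype=float))


def l1_score(suspect, candidates):
    """Non-personalized score 1 / (1 + sum_a |o_s^a - o_c^a|)."""
    return 1.0 / (1.0 + pair_features(suspect, candidates).sum(axis=1))


def train_user_model(history, model=None):
    """Fit a per-user model on past (suspect, candidate, r) triples."""
    X = np.vstack([pair_features(s, c) for s, c, _ in history])
    y = np.array([r for _, _, r in history])
    if model is None:
        model = DecisionTreeClassifier(max_depth=3, random_state=0)  # stand-in method
    return model.fit(X, y)


def hybrid_rank(model, suspect_cb, cands_cb, suspect_v, cands_v):
    """Rank candidates of a new lineup by the probabilistic sum of both scores."""
    # Column 1 of predict_proba is P(r = 1) because classes_ == [0, 1].
    r_cb = model.predict_proba(pair_features(suspect_cb, cands_cb))[:, 1]
    r_v = l1_score(suspect_v, cands_v)
    r = r_cb + r_v - r_cb * r_v
    return np.argsort(-r, kind="stable"), r


# Hypothetical toy history: selected candidates resemble the suspect on the
# content-based attributes, rejected ones are random persons.
rng = np.random.default_rng(0)
history = []
for _ in range(40):
    s = rng.random(5)
    history.append((s, s + rng.normal(0.0, 0.05, 5), 1))
    history.append((s, rng.random(5), 0))
model = train_user_model(history)

# New lineup: candidate 0 differs strongly from the suspect, candidate 1 is close.
new_cb = np.full(5, 0.5)
cands_cb = [new_cb + 0.45, new_cb + 0.01]
new_v = np.zeros(4)
cands_v = [np.full(4, 2.0), np.full(4, 0.05)]
order, scores = hybrid_rank(model, new_cb, cands_cb, new_v, cands_v)
print(order, scores.round(3))
```
        </preformat>
        <p>With a regression method such as LM or Lasso, predict would replace predict_proba; since regression outputs are not confined to [0, 1], they would likely need clipping or rescaling before the probabilistic sum is meaningful.</p>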
        <p>In order to confirm this hypothesis, we performed an offline evaluation on the dataset of assembled lineups collected during the evaluation of item-based recommendations. The resulting dataset contained 7,659 records in total (800 positive and 6,859 negative), i.e., 1,094 records per user on average. The proposed methods were evaluated with a 10-fold cross-validation protocol applied on the lineups, i.e., all candidates of a lineup belong to the same fold.</p>
        <p>Hyperparameters of the methods were tuned via grid search using an internal leave-one-lineup-out protocol.</p>
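        <p>The protocol described in the two preceding paragraphs maps naturally onto scikit-learn's group-aware splitters: GroupKFold keeps every lineup within a single outer fold, and GridSearchCV with LeaveOneGroupOut tunes hyperparameters by holding out one training lineup at a time. This is a minimal single-user sketch under stated assumptions: the decision tree stands in for any of the evaluated methods, the max_depth grid is hypothetical, and the inner selection uses the estimator's default score (accuracy) because the tuning criterion is not specified here. The function name evaluate_user and the synthetic data are illustrative.</p>
        <preformat preformat-type="code">
```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold, LeaveOneGroupOut
from sklearn.tree import DecisionTreeClassifier


def evaluate_user(X, y, lineup_ids, n_splits=10):
    """Lineup-grouped k-fold CV with inner leave-one-lineup-out grid search.

    X          -- pair features, one row per candidate displayed to the user
    y          -- 1 if the candidate was selected, else 0
    lineup_ids -- lineup of each row; all rows of a lineup stay in one fold
    Returns {lineup_id: (predicted scores, true labels)} for held-out lineups.
    """
    results = {}
    outer = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in outer.split(X, y, groups=lineup_ids):
        search = GridSearchCV(
            DecisionTreeClassifier(random_state=0),  # stand-in for LM, Lasso, GBT, ...
            param_grid={"max_depth": [2, 3, 5]},  # hypothetical grid
            cv=LeaveOneGroupOut(),  # inner protocol: leave one lineup out
        )  # scoring=None: the tree's default score (accuracy) picks max_depth
        search.fit(X[train_idx], y[train_idx], groups=lineup_ids[train_idx])
        scores = search.predict_proba(X[test_idx])[:, 1]
        test_lineups = lineup_ids[test_idx]
        for lid in np.unique(test_lineups):
            mask = test_lineups == lid
            results[lid] = (scores[mask], y[test_idx][mask])
    return results


# Hypothetical data for one user: 12 lineups with 20 displayed candidates each,
# 4 of which were selected; selected candidates have smaller attribute differences.
rng = np.random.default_rng(1)
lineup_ids = np.repeat(np.arange(12), 20)
y = np.tile(np.r_[np.ones(4, dtype=int), np.zeros(16, dtype=int)], 12)
X = rng.random((y.size, 5)) + 0.5 * (1 - y)[:, None]
results = evaluate_user(X, y, lineup_ids)
print(len(results), "held-out lineups")
```
        </preformat>
        <p>Grouping by lineup matters because candidates of the same lineup share a suspect; splitting them across folds would let the model see the test suspect during training.</p>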
        <p>For each tested lineup, each recommending method re-ranks the objects originally displayed to the forensic technicians according to the computed relevance <inline-formula><tex-math>\hat{r}_c</tex-math></inline-formula> (selected candidates should appear at the top of the list). We measure normalized discounted cumulative gain (nDCG), recall at top-10 and … on the average results for all evaluated users and lineups.</p>
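        <p>Both metrics can be computed per lineup from the re-ranked list and the binary selection labels. The sketch below assumes the common binary-relevance nDCG with a log2 position discount; the exact nDCG variant used in the evaluation is not specified here, and the function names are illustrative.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def dcg(relevance):
    """Discounted cumulative gain with a log2 discount (positions start at 1)."""
    rel = np.asarray(relevance, dtype=float)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))


def ndcg(scores, labels):
    """nDCG of the ranking induced by scores for binary labels (1 = selected)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = np.asarray(labels, dtype=float)
    ideal = dcg(np.sort(labels)[::-1])  # all selected candidates on top
    return dcg(labels[order]) / ideal if ideal > 0 else 0.0


def recall_at_k(scores, labels, k=10):
    """Share of the selected candidates that appear among the top-k ranked ones."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = np.asarray(labels)
    total = labels.sum()
    return float(labels[order[:k]].sum() / total) if total > 0 else 0.0


# One hypothetical lineup: 4 displayed candidates, the 2nd and 4th were selected.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [0, 1, 0, 1]
print(round(ndcg(scores, labels), 3), recall_at_k(scores, labels, k=2))  # prints: 0.651 0.5
```
        </preformat>
        <p>Averaging these per-lineup values over all lineups and users yields the aggregate figures; a perfect re-ranking gives nDCG = 1.</p>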
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusions</title>
      <p>The main aim of this work in progress was to analyze the
applicability of recommender systems principles to the
problem of photo lineup assembling. Although the photo
lineup assembling task is both important and time-consuming,
state-of-the-art tools do not provide an intelligent search API
beyond simple attribute search, and, to the best of our knowledge,
apart from our work, there are no papers utilizing recommending
principles in the lineup assembling task.</p>
      <p>After the initial evaluation of item-based recommending
algorithms, we proposed several variants of content-based
personalized recommending algorithms utilizing long-term
preferences of the user. The offline evaluation confirmed
that long-term preferences can be used to improve the final
ranking of candidates, however, only in the case of
content-based attributes.</p>
      <p>The proposed approaches remained ineffective in the case of
visual descriptors, so one direction of our future work is to
further analyze this problem and to provide solutions suitable
also for visual descriptors. Siamese networks merging both
content-based and visual descriptors seem particularly
suitable for the task. Another option is to use visual
descriptors as a base for short-term user preferences, i.e.,
those expressed in the current lineup, and to refine the
recommended objects based on the already selected
candidates.</p>
      <p>The textual description of the suspect also plays an important
role in lineup assembling, as forensic technicians often
try to select candidates that match mentioned, highly
specific features, e.g., scars, skin defects, a specific haircut,
etc. Another direction of our future work aims to
incorporate searching for these specific features in a “guided
recommendation” API. Selecting specific regions of interest
within the suspect’s photo seems to be a suitable initial
strategy.</p>
      <p>Finally, the long-term goal of our work is to move from
the recommendation of candidates to the recommendation of
assembled lineups and to provide ready-to-use software for
forensic technicians.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Czech grants
GAUK232217 and Q48.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Brigham</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Applied issues in the construction and expert assessment of photo lineups</article-title>
          .
          <source>Applied Cognitive Psychology</source>
          . (
          <year>1999</year>
          ). DOI:https://doi.org/10.1002/(SICI)1099-0720(199911)13:1+&lt;S73::AID-ACP631&gt;3.3.CO;2-W.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <year>2008</year>
          .
          <article-title>Collaborative Filtering for Implicit Feedback Datasets</article-title>
          .
          <source>Proceedings of the 2008 Eighth IEEE International Conference on Data Mining</source>
          (Washington, DC, USA,
          <year>2008</year>
          ),
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Lops</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          et al.
          <year>2011</year>
          .
          <article-title>Content-based Recommender Systems: State of the Art and Trends</article-title>
          .
          <source>Recommender Systems Handbook</source>
          . F. Ricci et al., eds. Springer US.
          <fpage>73</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>MacLin</surname>
            ,
            <given-names>O.H.</given-names>
          </string-name>
          et al.
          <year>2005</year>
          .
          <article-title>PC_eyewitness and the sequential superiority effect: Computer-based lineup administration</article-title>
          .
          <source>Law and Human Behavior</source>
          . (
          <year>2005</year>
          ). DOI:https://doi.org/10.1007/s10979-005-3319-5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          et al.
          <year>2017</year>
          .
          <article-title>Evaluating lineup fairness: Variations across methods and measures</article-title>
          .
          <source>Law and Human Behavior</source>
          . (
          <year>2017</year>
          ). DOI:https://doi.org/10.1037/lhb0000203.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Meissner</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Brigham</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          <year>2001</year>
          .
          <article-title>Thirty Years of Investigating the Own-Race Bias in Memory for Faces: A Meta-Analytic Review</article-title>
          . Psychology, Public Policy, and Law.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Parkhi</surname>
            ,
            <given-names>O.M.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Deep Face Recognition</article-title>
          .
          <source>Proceedings of the British Machine Vision Conference 2015</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Peska</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trojanova</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Towards recommender systems for police photo lineup</article-title>
          . ACM International Conference Proceeding Series (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Shapiro</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Penrod</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>1986</year>
          .
          <article-title>Meta-Analysis of Facial Identification Studies</article-title>
          . Psychological Bulletin.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Steblay</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          et al.
          <year>2003</year>
          .
          <article-title>Eyewitness Accuracy Rates in Police Showup and Lineup Presentations: A Meta-Analytic Comparison</article-title>
          .
          <source>Law and Human Behavior</source>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Valentine</surname>
            ,
            <given-names>T.R.</given-names>
          </string-name>
          et al.
          <year>2007</year>
          .
          <article-title>How can psychological science enhance the effectiveness of identification procedures? An international comparison</article-title>
          .
          <source>Public Interest Law Reporter</source>
          . DOI:https://doi.org/10.1017/CBO9781107415324.004.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>