=Paper=
{{Paper
|id=Vol-2960/paper14
|storemode=property
|title=LiGAN: Recommending Artificial Fillers for Police Photo Lineups (Short paper)
|pdfUrl=https://ceur-ws.org/Vol-2960/paper14.pdf
|volume=Vol-2960
|authors=Patrik Dokoupil,Ladislav Peska
|dblpUrl=https://dblp.org/rec/conf/recsys/DokoupilP21
}}
==LiGAN: Recommending Artificial Fillers for Police Photo Lineups (Short paper)==
Patrik Dokoupil, Ladislav Peska

Faculty of Mathematics and Physics, Charles University, Malostranské nám. 25, Prague, Czech Republic

Contact: patrik.dokoupil1996@gmail.com (P. Dokoupil); ladislav.peska@matfyz.cuni.cz (L. Peska)

3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27 to October 1, 2021, Amsterdam, Netherlands

© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===

Police photo lineups are an important part of criminal proceedings, where the task is to identify the perpetrator among photos of other persons (fillers). To prevent major errors in criminal proceedings, lineups should be unbiased (i.e. the suspect and fillers should share similar appearance characteristics). The capability to assemble unbiased lineups is often hindered by the lack of effective methods to explore the database of fillers (i.e. good fillers are hard to find), but also by the insufficient size of the database itself (i.e. no good fillers exist). In this demo, we present the LiGAN application, which aims at on-the-fly recommendation of artificial fillers for police photo lineups. We consider this to be a highly novel recommending task, where items can be generated with arbitrary density and arbitrary precision with respect to the (estimated) user's needs. LiGAN utilizes the StyleGAN2 architecture to generate images, an identity-preserving autoencoder for suspect seeding and optional model fine-tuning for individual lineups. It recommends fillers based on their semantic proximity to the suspect, or as an interpolation between suspect and filler images. As such, LiGAN aims to contribute to both the filler existence and the filler findability problems.

Keywords: Recommender Systems, Police Photo Lineups, Generative Adversarial Networks

===1. Introduction and Related Work===

Eyewitness identification of suspects is an important part of criminal proceedings. It often leads to the prosecution and eventual conviction of crime perpetrators, but it is also prone to human error [1]. There are documented cases where incorrect eyewitness testimony led to the false accusation and conviction of innocent suspects, and therefore error-proof methods for eyewitness identification are an intensively studied research subject.

One of the recommended approaches is identification via photo lineup. In this case, a witness receives a selection of several photos (usually four to eight), where one depicts the suspect and the others depict additional persons (so-called fillers) who are known not to have been at the crime scene. The idea behind the photo lineup is that only a reasonably certain witness can identify the perpetrator if similar fillers are present [2]. As such, the requirement for suspect-filler similarity is crucial. In the criminal psychology literature, the suspect-filler similarity problem is often formulated in terms of (un)biased lineups: the lineup is biased if the suspect's photo possesses considerably different appearance characteristics. Those can be both features of the person (age, skin color, face shape, haircut, etc.) and features of the photograph (background, angle, figure size, etc.). Examples of biased and unbiased lineups are depicted in Figure 1.

Figure 1: Examples of an unbiased (left) and a biased (right) photo lineup. For the sake of convenience, red borders denote the suspect (note that no such distinction is given in actual lineups). While the suspect on the left resembles the appearance of the other persons in the lineup, the suspect on the right differs considerably (younger, no beard). Images were generated by the LiGAN tool and do not show real persons.

In current police practice, photo lineups are still mostly constructed manually by browsing through the (limited) database of available fillers. This brings two problems. First, manual browsing is rather tedious, so either the construction of unbiased lineups takes an excessive amount of time, or (partially) biased lineups are produced. The second problem comes from the features of the filler database, which (mainly due to various legal constraints) often contains only several thousand photos. In addition, various appearance characteristics are often not represented evenly, so suitable fillers may be unavailable for some suspects and constructing an unbiased lineup is not possible [3].

In our previous work, we focused on the first problem and considered it from the perspective of content-based recommender systems (RS) [4]. We utilized the semantic similarity of photos induced by a pre-trained convolutional network and recommended fillers similar to the suspect as well as to the other members of the lineup constructed so far. This approach reduced the time needed for the task, but we did not tackle the database size problem.

In this demo paper we present LiGAN, an experimental application based on Generative Adversarial Networks (GANs). LiGAN provides on-the-fly generation and recommendation of artificial fillers for police photo lineups. With this approach, we aim to contribute to solving both the problem of database size and the problem of filler discovery. Nonetheless, recommending artificial objects, which (in theory) can be constructed with unlimited density and unlimited proximity to the user's needs, brings interesting theoretical challenges as well. In the next section we describe the LiGAN application, while we briefly present some of the theoretical challenges in the discussion section.

While the proposed application domain (recommending artificial fillers for photo lineups) is brand new, there are some related approaches in other domains. GANs themselves are frequently present in the RS literature, but are rarely used for image synthesis [5]. One notable exception is the fashion domain, where GANs are often used to construct artificial clothing [6, 7, 8, 9]. An underlying motivation of these approaches is to help designers find new styles of products that users might like although they do not exist yet.

Kang et al. [6] use a conditional GAN where the generator receives a user and an item category and then produces items that are most consistent with the given category as well as the user's preferences. The main difference from our approach is the usage of a conditional GAN (i.e. the generator is directly conditioned on the product category), while we do not utilize conditioning, but instead employ an identity-preserving encoder to reconstruct the suspect image. Analogous differences can also be found between our approach and the work of Yang et al. [7], Shih et al. [8] and Kumar et al. [9], who focus on generating compatible fashion items.

===2. LiGAN Application===

From the user's perspective, LiGAN is a classical single-page web application (see Figure 2). It allows the user to upload the suspect's photo, select recommended fillers or provide additional feedback, and iteratively construct the lineup. The main components of LiGAN's backend are the StyleGAN2 generator <math>G</math>, the identity-preserving encoder <math>E_{id}</math> and the recommending component <math>R</math>. The encoder transforms images into corresponding style vectors that are utilized by the generator to construct artificial images w.r.t. the supplied style. The recommending component is responsible for modifying the style vectors so that suitable fillers are provided to the user. LiGAN features a REST-like webserver that encapsulates the generator components and tracks individual user sessions (e.g. for the sake of model fine-tuning). Due to space limitations we only briefly describe the main principles behind LiGAN; details can be found in [10].

Figure 2: Screenshot of the LiGAN application.

====2.1. On-Demand Fillers Generation====

For image generation, we utilized the state-of-the-art StyleGAN2 [11] architecture. StyleGAN2 training is conducted as a zero-sum game between two model components. The generator <math>G</math> receives a random seed vector <math>z \in \mathcal{Z}</math> (512 dimensions; see footnote 1) and aims to generate images that fit into the training dataset. The discriminator <math>D</math> aims to distinguish between real and generated images. We trained the model from scratch on a dataset of missing and wanted persons from two Central European countries. After the pre-processing steps, the dataset contained over 90,000 passport-style photos with a resolution of 256 × 256 pixels.

Footnote 1: For the sake of feature disentanglement and generator stability, StyleGAN2 uses a mapping network to transform the seed vector <math>z \in \mathcal{Z}</math> into a style vector <math>w \in \mathcal{W}</math> (512 dimensions) that is supplied to all layers of the StyleGAN architecture.

Instead of constructing a simple dataset of generated figures, we decided to embrace the opportunity to generate fillers on demand based on the suspect's photograph. This approach provides more versatility than just selecting from a fixed dataset (e.g. it allows fine-tuning the model for a particular suspect or reacting to the user's feedback). We relied on StyleGAN's similarity-preservation feature, i.e. that similar input vectors produce similar output images. In order to exploit this feature, we trained an identity-preserving encoder <math>E_{id}</math> that aims to minimize the distance between <math>\mathit{img}</math> and <math>\overline{\mathit{img}}</math>, where <math>\overline{\mathit{img}} = G(E_{id}(\mathit{img}))</math>.

Several encoder architectures and distance metrics were considered, but training an encoder into the original input vector space <math>\mathcal{Z}</math> or the style vector space <math>\mathcal{W}</math> was not successful. The resulting reconstructions <math>\overline{\mathit{img}}</math> were either of insufficient quality or too different from the original <math>\mathit{img}</math> (see Figure 3 left). We suspected that too much information is lost with the reduction into <math>\mathcal{Z}</math> or <math>\mathcal{W}</math> and therefore we extended the encoder's output space to allow supplying a different style vector for each of StyleGAN's layers, similarly to [12]. That is, the encoder produces a matrix of style vectors <math>w^{+} \in \mathcal{W}^{+}</math> with 12 × 512 dimensions. This extension considerably improved identity preservation.

Once the <math>w^{+}</math> mapping is obtained, similar fillers can in theory be generated by small variations of the vector. In our early attempts, we implemented these variations as random sampling from a hyperball around the particular <math>w^{+}</math> vector. Nonetheless, sampling from the <math>\mathcal{W}^{+}</math> space often provided poor results (see Figure 3 middle). Therefore, we prepend a PCA dimensionality reduction before the sampling phase. PCA was trained on the <math>w^{+}</math> vectors corresponding to a sample of 10,000 randomly generated images and, after hyperparameter tuning, the output dimensionality was set to 256. This helps to focus the sampling procedure on images that better resemble real persons.

In theory, we can generate persons with arbitrarily close style vectors that would be nearly indistinguishable from the suspect. However, this is not a desired output as it would render the lineup identification impossible. Instead, a certain level of noise needs to be introduced in the filler generation procedure. Also, newly generated fillers may be sampled from the space around the style vectors of already selected fillers as well. This could help to remove the "centering" effect, i.e. that the suspect is in an imaginary center of all fillers' appearance characteristics and is therefore easier to identify. In both cases, the necessary levels of diversity and of filler-based recommendations are not known up front and should be assessed online based on the user's feedback. The recommending component described in the next section is responsible for the appropriate selection of fillers' style vectors.

The crucial part of LiGAN's design is the identity-preserving encoder. The quality of the suspect's reconstruction from the learned style matrix directly affects the ability to propose relevant fillers. However, despite our effort, the results were sometimes not satisfactory (see Figure 3 right). To cope with this problem, we allow fine-tuning of the encoder <math>E_{id}</math> and the generator <math>G</math> for the particular suspect's image. Such fine-tuning is rather fragile: if enough steps are performed, it eventually causes a mode collapse. Therefore, the time allowed for fine-tuning is limited and the user is allowed to modify it if necessary. Nonetheless, in several cases, fine-tuning subjectively improved the results of the identity-preserving transformation, as can be seen in Figure 3 right-bottom.

Figure 3: Illustrative examples behind LiGAN design choices: variants of encoder architecture and output space (left), using dimensionality reduction before sampling (middle) and fine-tuning the generation network for a particular suspect (right). Original images were taken from the training dataset (left) and the FEI Face Database [13] (right).

Overall, the filler generation procedure is as follows. Upon receipt of the suspect's photo <math>\mathit{img}_s</math>, a corresponding reduced style vector <math>w_s^{PCA} = PCA(E_{id}(\mathit{img}_s))</math> is generated. This vector, together with the vectors <math>w_{l_i}^{PCA}</math> of already selected lineup members, is supplied to the recommending component, which outputs the vectors of recommended fillers <math>w_{f,1}^{PCA}, \ldots, w_{f,k}^{PCA}</math>. Then, all fillers' vectors are transformed back to the <math>\mathcal{W}^{+}</math> space via inverse PCA and StyleGAN2's generator is used to generate the individual images: <math>\mathit{img}_{f,i} = G(PCA^{-1}(w_{f,i}^{PCA}))</math>. These images are then presented to the user.

The user has several feedback options (asking for less similar, more like this or more similar recommendations, triggering interpolation between a filler and the suspect, or initiating fine-tuning) that may modify the internal model of LiGAN and trigger a new recommendation process.

====2.2. Fillers Recommendation====

Once the image generator and the identity-preserving encoder are established, the important question is how to select fillers (or their corresponding style vectors). We assume that two key concepts should be considered during the selection process. First, fillers should maintain a certain level of diversity from the suspect, but the user should have some means to tune this diversity. Second, fillers should be mainly generated based on the suspect, but already selected fillers may play some role in the recommendation process as well.

As the expected level of diversity is unknown up front, we decided to learn it on-line using Thompson sampling multi-armed bandits [14]. Specifically, we construct a series of recommenders <math>r_i \in \mathcal{R}</math>. Each recommender <math>r_i</math>, upon receiving a source vector <math>w_s^{PCA}</math>, samples a filler from a hollow hyperball around it, i.e. from a space bounded by two spheres with the center at <math>w_s^{PCA}</math> and diameters <math>d_{i-1}</math> and <math>d_i</math>. The i-th diameter is constructed as <math>d_i = \mathit{base} \cdot c^{i}</math>, where <math>\mathit{base}</math> is an initial diameter and <math>c</math> is a steepness hyperparameter governing how quickly we converge towards more or less similar recommendations. As such, the recommender preceding the current one, <math>r_{i-1}</math>, generates strictly more similar fillers, while the next recommender, <math>r_{i+1}</math>, generates strictly less similar fillers than the current one. In the current version of LiGAN, we kept <math>c = 1.2</math> and leave experiments with the steepness factor for future work.

We follow the same approach to generate the final list of recommendations as proposed by Brodén et al. [14], with one important distinction: the list of eligible recommenders changes based on user feedback. We start with recommenders <math>r_0</math>, <math>r_1</math> and <math>r_2</math>, each of them receiving equal initial consumption statistics (i.e. the <math>\alpha_0</math> and <math>\beta_0</math> parameters from Eq. 1). For each recommended position and each eligible recommender <math>r_i</math>, a random value <math>b_i</math> is sampled from a beta distribution parameterized by its consumption statistics, and the recommender with the highest value is selected to fill this position. Specifically,

<math>b_i \sim \mathrm{Beta}(\alpha_0 + \mathit{pos}_i,\ \beta_0 + \mathit{shown}_i - \mathit{pos}_i) \qquad (1)</math>

where <math>\mathit{pos}_i</math> denotes the sum of positive feedback (e.g. selecting a recommended filler for the lineup) received by recommender <math>r_i</math> and <math>\mathit{shown}_i</math> denotes the total volume of recommendations given by <math>r_i</math>.

With this solution alone, recommendations can be tuned over time to have a desired distance from the suspect, but only within a fixed pre-defined range. This is impractical, as estimating such a range is very tricky and it may also differ for various areas of the style vector space. Therefore, we provide users with explicit options to increase or decrease the distance between the suspect and the recommended fillers (i.e., the "More similar" and "Less similar" buttons). Each time a button is pressed, the recommender selection process is performed as usual, but the actual recommender that provides the recommendation is shifted in the direction of the expressed user desire. For example, if the user clicked the "Less similar" button and <math>r_i</math> is selected via Thompson sampling to fill the position, recommender <math>r_{i+k}</math> is used instead. If the user hits the "Less similar" button again, <math>r_{i+2k}</math> is used, and so on. Furthermore, if the user selects a filler supplied by recommender <math>r_{i+2k}</math>, it is added to the pool of initially eligible recommenders with appropriate consumption statistics, so the next time the suspect is submitted, more appropriate initial recommendations are given. The <math>k</math> hyperparameter governs the steepness of the similarity traversal steps. We set <math>k = 3</math>, i.e., in the initial case the adjacent triple of recommenders would be utilized.

In addition to the selection-based positive feedback, we also consider that simply asking for more or less similar results is a form of (weaker) positive feedback. Therefore, all recommenders involved in the generation of the next list of recommendations receive a small volume of positive feedback. As such, convergence towards proper diversity thresholds is secured even if no filler is selected and the user, e.g., starts to fine-tune the model.

Next, for each recommended position, we select at random with a fixed probability whether the suspect (p = 0.7) or one of the fillers (p = 0.3) should be utilized as the center of the sampling process. We opted for this simple procedure mainly to gain some initial feedback on both approaches. In future work, we would like to focus on modelling a joint probability based on both the suspect and the fillers, similarly to what [4] does for a fixed set of candidates.

Finally, LiGAN also allows users to manually decrease the desired diversity between the suspect and a selected filler through image interpolations. In this case, two photos <math>(\mathit{img}_s, \mathit{img}_f)</math> are supplied and a linear interpolation between the corresponding <math>w_s^{PCA}</math> and <math>w_f^{PCA}</math> vectors is calculated. LiGAN then displays fillers corresponding to the individual interpolated points. Due to the reasonable level of feature disentanglement in the StyleGAN architecture, interpolated fillers empirically provide a smooth transition from one person to another.

===3. Discussion and Outlook===

By developing the LiGAN application, we hope both to contribute to the practical problem of unbiased lineup construction and to provide foundations for a novel sub-area of RS: recommending artificially generated objects.

Artificial fillers have the potential to improve the lineup construction process if the following conditions are met: 1) we can generate images of sufficient quality, 2) potential witnesses cannot reliably distinguish between real and artificial photos, 3) we can pre-select suitable filler candidates automatically and 4) legal conditions are met. Although additional improvements are necessary, we believe that LiGAN shows that the first three conditions are feasible. The first condition is mainly a question of computational power and data availability, as shown in other StyleGAN2 applications [11]. We consider the current LiGAN generator sufficient for a showcase, but plan to expand both the image resolution and the diversity of the training data in the future.

For the second condition, we conducted a user study with 80 participants to evaluate their capability to distinguish between real and generated photos. Participants received a list of photos, both real and generated, and their task was to select the generated ones. Average precision per user was 0.65, while average recall was 0.39, so users performed slightly better than random guessing, which can be considered a success.

The ability to recommend reasonable fillers should be tested further, but the first empirical results seem promising as long as the suspect's appearance characteristics are sufficiently represented in the training data. Legal challenges (although interesting) are out of the scope of our research. However, we believe that before such questions may even be raised, the technical feasibility has to be sufficiently demonstrated. Nonetheless, even before legal issues are solved, artificial fillers may prove beneficial e.g. for police training (no need to consider privacy issues as with real persons' photos).

Filler recommendation in LiGAN is rather basic at the moment. We approached the problem as session-based recommendation with on-line learning and background knowledge represented by the persons' style vectors. According to the common nomenclature, the suspect's and selected fillers' photos play the role of items "visited" in the current session. From this perspective, asking for more or less similar recommendations as well as interpolations can be considered special cases of recommendation critiquing.

Furthermore, we would like to note that once there is an unbounded volume of candidates for recommendation, many commonly utilized recommending approaches have to be reconsidered before application. For instance, recommending items most similar to the user's profile (i.e. the suspect's photo) does not seem sensible, as we can easily generate near-duplicates with no practical applicability. The need for diversity, novelty, coverage or fairness of representation greatly increases, but many paradigms used to incorporate these metrics were tailored for a finite set of items [15, 16, 17]. Sampling from the recommendable objects and subsequent post-processing is a plausible first approach, but it may be more interesting to incorporate e.g. diversity or fairness preservation into the sampling process itself.

In the current version of LiGAN we only tackled this problem via on-line learning of the sampling radius, but we believe that re-formulating e.g. per-list diversity preservation as a continuous probability distribution problem may be interesting future work. Also, several directions of long-term user preference may be explored, e.g. learning a personalized sampling radius for individual style dimensions, or focusing on the interplay between the suspect-based and filler-based distances.

===Acknowledgments===

The work on this paper has been supported by Czech Science Foundation project GACR-19-22071Y and by Charles University grant SVV-260588. Source codes can be obtained from https://gitlab.mff.cuni.cz/dokoupipa/ligan-thesis/-/tree/recsys. The LiGAN application can be accessed from http://gpulab.ms.mff.cuni.cz:7022/.

===References===

[1] J. Mansour, J. Beaudry, N. Kalmet, M. I. Bertrand, R. C. L. Lindsay, Evaluating lineup fairness: Variations across methods and measures, Law and Human Behavior 41 (2016). doi:10.1037/lhb0000203.

[2] S. Clark, M. Erickson, J. Breneman, Probative value of absolute and relative judgments in eyewitness identification, Law and Human Behavior 35 (2010) 364–80. doi:10.1007/s10979-010-9245-1.

[3] A. N. Bergold, P. S. Heaton, Does filler database size influence identification accuracy?, Law and Human Behavior 42 (2018) 227–243.

[4] L. Peška, H. Trojanová, Lineit: Similarity search and recommendation tool for photo lineup assembling, in: Database and Expert Systems Applications, Springer International Publishing, Cham, 2019, pp. 199–209.

[5] Y. Deldjoo, T. D. Noia, F. A. Merra, A survey on adversarial recommender systems: From attack/defense strategies to generative adversarial networks, ACM Comput. Surv. 54 (2021) 35:1–35:38. doi:10.1145/3439729.

[6] W. Kang, C. Fang, Z. Wang, J. J. McAuley, Visually-aware fashion recommendation and design with generative image models, in: 2017 IEEE International Conference on Data Mining, ICDM 2017, New Orleans, LA, USA, November 18-21, 2017, IEEE Computer Society, 2017, pp. 207–216. doi:10.1109/ICDM.2017.30.

[7] Z. Yang, Z. Su, Y. Yang, G. Lin, From recommendation to generation: A novel fashion clothing advising framework, in: 2018 7th International Conference on Digital Home (ICDH), 2018, pp. 180–186. doi:10.1109/ICDH.2018.00040.

[8] Y. Shih, K. Chang, H. Lin, M. Sun, Compatibility family learning for item recommendation and generation, in: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), AAAI Press, 2018, pp. 2403–2410.

[9] S. Kumar, M. D. Gupta, c+ gan: Complementary fashion item recommendation, CoRR abs/1906.05596 (2019). URL: http://arxiv.org/abs/1906.05596. arXiv:1906.05596.

[10] P. Dokoupil, Generating synthetic data for an assembly of police lineups, Master's thesis, Charles University, 2021. URL: https://dspace.cuni.cz/handle/20.500.11956/127394.

[11] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, T. Aila, Analyzing and improving the image quality of stylegan, in: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, IEEE, 2020, pp. 8107–8116. doi:10.1109/CVPR42600.2020.00813.

[12] E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, D. Cohen-Or, Encoding in style: a stylegan encoder for image-to-image translation, CoRR abs/2008.00951 (2020). URL: https://arxiv.org/abs/2008.00951. arXiv:2008.00951.

[13] C. E. Thomaz, G. A. Giraldi, A new ranking method for principal components analysis and its application to face image analysis, Image and Vision Computing 28 (2010) 902–913. doi:10.1016/j.imavis.2009.11.005.

[14] B. Brodén, M. Hammar, B. J. Nilsson, D. Paraschakis, Ensemble recommendations via thompson sampling: An experimental study within e-commerce, in: IUI '18, ACM, 2018, pp. 19–29.

[15] J. Carbonell, J. Goldstein, The use of mmr, diversity-based reranking for reordering documents and producing summaries, in: SIGIR '98, ACM, New York, NY, USA, 1998, pp. 335–336.

[16] L. Malecek, L. Peska, Fairness-preserving group recommendations with user weighting, in: Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP '21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 4–9. doi:10.1145/3450614.3461679.

[17] H. Steck, Calibrated recommendations, in: RecSys '18, ACM, 2018, pp. 154–162.
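===Illustrative sketch of the recommendation step (Section 2.2)===

The recommendation procedure of Section 2.2 (diameters growing geometrically with the recommender index, Thompson sampling over eligible recommenders as in Eq. 1, the shift by k after a "Less similar" click, the 0.7 / 0.3 choice of sampling center, and linear interpolation) can be sketched in plain Python as follows. This is a minimal sketch and not the LiGAN source code: all function names are hypothetical, the values of base, alpha0 and beta0 are placeholders because the paper does not report them, and drawing the radius uniformly inside the hollow hyperball is an assumption. The style vectors are stand-ins for the 256-dimensional PCA-reduced vectors; the encoder, inverse PCA and generator are out of scope here.

```python
import math
import random


def diameter(i, base, c):
    """d_i = base * c**i, the outer diameter of recommender r_i's shell."""
    return base * c ** i


def sample_from_shell(center, i, base, c, rng):
    """Draw a vector whose distance from `center` lies in [d_{i-1}/2, d_i/2].

    Assumption: the radius is drawn uniformly; the paper does not state
    which distribution is used inside the hollow hyperball.
    """
    r_in, r_out = diameter(i - 1, base, c) / 2, diameter(i, base, c) / 2
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.uniform(r_in, r_out)
    return [m + radius * x / norm for m, x in zip(center, direction)]


def thompson_select(eligible, stats, alpha0, beta0, rng):
    """Pick the eligible recommender with the highest draw b_i (Eq. 1).

    `stats[i]` is a (pos_i, shown_i) pair of feedback statistics.
    """
    draws = {}
    for i in eligible:
        pos, shown = stats.get(i, (0.0, 0.0))
        draws[i] = rng.betavariate(alpha0 + pos, beta0 + shown - pos)
    return max(draws, key=draws.get)


def recommend_one(w_suspect, w_fillers, eligible, stats, shift, rng,
                  base=1.0, c=1.2, k=3, alpha0=1.0, beta0=1.0, p_suspect=0.7):
    """Return (recommender index, style vector) for one list position.

    `shift` counts "Less similar" clicks (negative for "More similar").
    base, alpha0 and beta0 are placeholder values, not taken from the paper.
    """
    chosen = thompson_select(eligible, stats, alpha0, beta0, rng) + shift * k
    use_suspect = not w_fillers or rng.random() < p_suspect
    center = w_suspect if use_suspect else rng.choice(w_fillers)
    return chosen, sample_from_shell(center, chosen, base, c, rng)


def interpolate(w_s, w_f, steps):
    """Evenly spaced points strictly between w_s and w_f."""
    ts = [j / (steps + 1) for j in range(1, steps + 1)]
    return [[(1 - t) * a + t * b for a, b in zip(w_s, w_f)] for t in ts]


rng = random.Random(0)
suspect = [0.0] * 256  # stand-in for the PCA-reduced style vector of the suspect
stats = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
idx, w_filler = recommend_one(suspect, [], [0, 1, 2], stats, shift=1, rng=rng)
print(idx, round(math.dist(suspect, w_filler), 3))
```

In this sketch, one "Less similar" click (shift=1) moves the recommender chosen among r_0, r_1, r_2 by k = 3 positions, so the filler is drawn from a shell farther from the suspect. Updating the statistics after feedback and adding a successful shifted recommender to the eligible pool would be done by the caller.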