=Paper=
{{Paper
|id=Vol-1892/paper1
|storemode=property
|title=Body Measure-aware Fashion Product Recommendations: Evaluating the Predictive Power of Body Scan Data
|pdfUrl=https://ceur-ws.org/Vol-1892/paper1.pdf
|volume=Vol-1892
|authors=Alexander Piazza,Jochen Süßmuth,Freimut Bodendorf
|dblpUrl=https://dblp.org/rec/conf/recsys/PiazzaSB17
}}
==Body Measure-aware Fashion Product Recommendations: Evaluating the Predictive Power of Body Scan Data==
* Alexander Piazza, University of Erlangen-Nuremberg, Institute of Information Systems, Lange Gasse 20, Nuremberg 90403, alexander.piazza@fau.de
* Jochen Süßmuth, University of Erlangen-Nuremberg, Chair of Computer Graphics, Cauerstraße 11, Erlangen 91058, jochen.suessmuth@fau.de
* Freimut Bodendorf, University of Erlangen-Nuremberg, Institute of Information Systems, Lange Gasse 20, Nuremberg 90403, freimut.bodendorf@fau.de

===ABSTRACT===
Fashion consumers are faced with large and fast-changing product offerings. The fashion purchase decision process is complex, as the consumer has to consider various influencing factors such as current fashion trends, which fashion products fit their personality, and which products fit their physical appearance, for example hair color or body measures. Based on novel technologies, 3D body avatars can be reconstructed from 3D or 2D data. From these avatars, body measures can be determined. The objective of this research is to investigate the predictive performance of body measures extracted from a 3D body scanner for predicting fashion item preferences. To this end, item preferences and body scans from 200 users were collected. From the body scans, 11 body measures are extracted and integrated into a prediction model using Factorization Machines. The results of a cross-validation show that including body measurements significantly improves the prediction performance of the recommendation model, especially in new user scenarios, when no information about the fashion product preferences of the active user is known.

===KEYWORDS===
fashion product recommendation, body scan data, factorization machines, feature-driven recommendation

===1 INTRODUCTION===
Fashion products have the highest turnover of all product categories sold via e-commerce worldwide. Online fashion retailers offer their consumers large product ranges. For instance, the German online retailer Zalando states that it has 150,000 products from 1,500 brands in its assortment (footnote 1).

Consumer behavior research indicates that an overly broad product offer can cause choice overload for the consumer, which leads to a delayed or abandoned purchase decision [7] and can decrease the consumers' satisfaction with their purchase decision [17]. This effect has also been identified in online fashion purchase decisions [11]. To avoid choice overload, online fashion stores should limit the options for consumers in such a way that they are shown sufficient product variety but can, at the same time, make informed decisions [11]. Fashion purchase decisions are especially complex, as multiple aspects of the user act as influencing factors, such as personality, emotion, or general fashion trends, as well as physical appearance such as height [9], skin color, or body type [12]. The user has to decide not only based on which fashion products she or he likes in general, but also, for example, which colors fit his or her hair and skin color, and which cut of the clothes emphasizes or covers certain body parts.

Usually, recommender systems are applied to filter relevant products for the visitors of online stores, leveraging techniques like collaborative filtering or content-based filtering. The central assumption of collaborative filtering is that users who have demonstrated similar preferences in the past will have similar preferences in the future [3]. Therefore, conventional recommender systems based on collaborative filtering only consider the product preferences per user in the form of ratings, views, or purchases, and derive similarities between users from these data. However, due to the wide adoption of technologies like social networks and mobile phones, as well as novel technologies for storing and processing large datasets, rich side information about users, items, and their interaction is available. Therefore, further research is needed to investigate mechanisms for integrating such rich side information into recommendation systems, as well as to assess the impact of such side information on the prediction quality [18].

Based on novel technologies, information about the users' physical appearance can be acquired. For instance, 3D models of bodies or faces can be constructed with low-cost 3D scanners [21] [22], or reconstructed from 2D photos [8] [5]. User studies indicate that consumers perceive 3D scanners as useful for online shopping in general [20], as well as in the fashion product context [12], as long as data protection is ensured. However, according to a recent literature review of apparel product recommendation research, there is a research gap regarding the potential of body scan measurements to enrich consumer profiles in apparel recommendation [4].

The objective of this research is to investigate the predictive power of body measures for predicting fashion product preferences of individual users. To this end, a dataset containing fashion product preferences as well as body measurements extracted from body scans is collected, and offline experiments are conducted to compare the impact of these measurements on the prediction accuracy.

In this paper, the data collection and the resulting data set are described in Section 2, and the evaluation is described in Section 3. Finally, in Section 4 the preliminary results are discussed and planned next steps for further research are outlined.

Footnote 1: https://www.zalando.de/presse zahlen-und-fakten/

ComplexRec 2017, Como, Italy. 2017. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
Published on CEUR-WS, Volume 1892. ComplexRec 2017, August 31, 2017, Como, Italy.

===2 DATA COLLECTION===
====2.1 Data collection process====
For data collection, volunteers were recruited from the School of Business at the University of Erlangen-Nuremberg. To obtain one rather large sample instead of two smaller ones, data collection was concentrated on only one gender. The decision was made to focus on female participants, as a higher variation in body shapes is expected and female participants are assumed to have stronger opinions regarding their fashion preferences.

The data collection process consists of two steps. In the first step, participants were scanned using a low-cost body scanner which creates three-dimensional polygonal mesh data. Based on the mesh data, 11 key body measurements per participant were extracted. In the second step, the participants filled out an online questionnaire. In the questionnaire, the participants reported their height and weight and indicated the color of their hair, eyes, and skin. They also classified their body shape into one of eight shapes. Finally, every participant indicated their preference towards 36 fashion products by answering the question of whether they would buy the displayed clothing or not on a seven-point Likert scale (1 = do not want to buy it at all; 7 = would buy it definitely). For the set of fashion products, upper and lower apparel that either accentuates or masks certain body features was selected.

====2.2 Resulting data-set====
In total, 200 persons participated in the survey and body scanning. The distribution of the eleven body measures is illustrated in Figure 1. Across the 200 valid scans, the average age of the participants is 22.35 years with a standard deviation of 2.94 years.

[Plot: measurement distribution per body measure; x-axis labels: arm, breast, hip, belly, back, leg out, leg in, neckbreast, back width, waist, calves]
Figure 1: Distribution of the eleven body measures extracted from the body scans (N=200). The x-axis shows the body measures.

Figure 2: Exemplary polygonal meshes resulting from the body scans.

===3 EVALUATION===
====3.1 Evaluation protocol====
For determining the prediction quality, the rating is converted from the seven-point Likert scale to a binary rating indicating whether a consumer would buy the product or not: ratings ≤ 5 are transformed to not interested (=0), and ratings > 5 to interested (=1). In the following, the term rating prediction is therefore interpreted as a classification problem.

During the evaluation, the data is split into a test and a training set using a 10-fold cross-validation, as illustrated in Figure 4. Each user is randomly assigned to one of the ten folds, which in turn serve as the test user-set while the remaining folds form the train user-set. Furthermore, to simulate the new user scenarios, the 36 items are randomly divided into a test item-set of 24 items and a train item-set of 12 items. The split is based on a random selection of items, which is illustrated in Figure 3, where the darker gray bars indicate the items belonging to the test item-set. For each iteration through the ten folds, the algorithm is provided with all 36 ratings of all of the users in the train user-set. Furthermore, to investigate the new user scenarios, the algorithm is provided with no, two, or five item ratings of each test user, randomly selected from the train item-set. For all three scenarios, the ratings for the same 20 products of the test item-set are predicted.

For determining the prediction quality, the metrics precision (Equation 1), recall (Equation 2), and the F-measure (Equation 3) are used. In machine learning research, various approaches are used to aggregate the F-measures from the individual cross-validation results, which leads to diverging results. In this research, we use the formulation of the F-measure suggested by Forman and Scholz (2010), which resulted in the least bias [2].
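The protocol of Section 3.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' evaluation code: the function names, the random seeds, and the toy predictions are hypothetical, and the F-measure aggregation assumes that the variant meant in [2] is the one computed from the true positive, false positive, and false negative counts summed over all folds.

```python
import random
from typing import Dict, List, Sequence, Tuple


def binarize(rating: int, threshold: int = 5) -> int:
    """Map a 1-7 Likert rating to 1 (interested) if it exceeds the threshold, else 0."""
    return 1 if rating > threshold else 0


def assign_user_folds(user_ids: Sequence[int], n_folds: int = 10, seed: int = 0) -> Dict[int, int]:
    """Randomly assign every user to exactly one of n_folds folds (seed is an assumption)."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {user: position % n_folds for position, user in enumerate(shuffled)}


def split_items(item_ids: Sequence[int], n_test: int = 24, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split the items into a test item-set and a train item-set."""
    rng = random.Random(seed)
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_test], shuffled[n_test:]


def pooled_f_measure(folds: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """F-measure (Equation 3) from TP/FP/FN counts summed over all folds.

    Each fold is a pair (true labels, predicted labels) of binary values.
    """
    tp = fp = fn = 0
    for y_true, y_pred in folds:
        for truth, prediction in zip(y_true, y_pred):
            tp += int(truth == 1 and prediction == 1)
            fp += int(truth == 0 and prediction == 1)
            fn += int(truth == 1 and prediction == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# 200 users in ten folds, 36 items split into 24 test and 12 train items.
folds_of_users = assign_user_folds(range(200))
test_items, train_items = split_items(range(36))

# Two toy folds with made-up labels and predictions.
toy_folds = [([1, 0, 1], [1, 1, 0]), ([1, 1, 0, 0], [1, 1, 0, 1])]
f_pooled = pooled_f_measure(toy_folds)
```

Summing the counts before computing the F-measure avoids undefined per-fold values when a fold happens to contain no positive predictions, which is one reason to prefer it over averaging per-fold F-measures.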
Figure 3: All 36 fashion products ordered by global preference. The products in darker gray color are selected for the test-set. (x-axis: items ordered by global preference; y-axis: value.)

Figure 4: Illustration of the applied 10-fold cross-validation schema. (Users are divided into Fold 1 to Fold 10, forming the train user-set and the test user-set; items are divided into the train item-set and the test item-set.)

:<math>\text{precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}</math> (1)

:<math>\text{recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}</math> (2)

:<math>F = \frac{2 \times \text{True Positive}}{2 \times \text{True Positive} + \text{False Positive} + \text{False Negative}}</math> (3)

====3.2 Factorization Machines====
In recommender systems research, various approaches have been suggested to consider additional information besides the ratings of users per item. The additional information can be integrated via pre-filtering or post-filtering, where conventional recommendation algorithms are applied and the input data or the results are filtered by the additional information. Another approach is to build multidimensional models, also referred to as contextual models, where the additional information is directly integrated into the prediction model [1]. Recently, tensor decomposition approaches in particular have gained traction, enabling the direct modeling of multi-modal information. Previous research focused especially on the Tucker decomposition [19], the Parallel Factor Analysis (PARAFAC) [6], and Pairwise Interaction Tensor Factorization (PITF) [16], which demonstrated high prediction quality. However, the main drawback of these methods is that their adaptation to non-categorical factors is difficult and error-prone [14].

As an alternative, Factorization Machines (FM) were introduced, which combine the advantages of the tensor decomposition approaches, namely making predictions in highly sparse and multi-modal conditions, with the ability of support vector machines (SVM) to be a general predictor [13]. With FM, the user, rating, and additional information can be modeled as feature vectors having categorical or continuous values. The target variable y can be a real-valued rating or a binary value [13]. FM have been demonstrated to be an effective approach to integrate contextual information, like mood, into movie recommender systems [15]. The key aspect of FM is that the interactions between the input variables are not estimated directly, but through a low-rank approximation. Within this paper, the FM model shown in Equation 4 is considered, which models pairwise interactions between the input variables through the dot products <math>\langle v_i, v_j \rangle</math> of the rows of the factor matrix <math>V \in \mathbb{R}^{n \times k}</math>. The variable <math>k \in \mathbb{N}_0^+</math> represents the number of latent variables. Appropriate values of k have to be determined empirically. On the one hand, the value should be large enough that relevant interactions in the data can be captured. On the other hand, restricting the value of k, and therefore the expressiveness of the model, might lead to better generalization [13].

:<math>\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j</math> (4)

In this paper, the reference implementation of FM within the LibFM library is used [14]. For learning models based on FM, optimization methods based on Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS), and Markov-Chain Monte Carlo (MCMC) have been proposed and are implemented in LibFM. We decided to use the MCMC optimization, as this approach has the fewest hyperparameters that have to be tuned.

===4 RESULTS AND DISCUSSION===
The resulting F-measure values are illustrated in Figure 5. The models were built with the number of latent variables set to k ∈ {16, 32, 64, 128}. As mentioned in subsection 3.1, three new user scenarios are investigated, in which no, two, or five ratings of the user are known. In the first scenario, the models that include the measurement information of the user perform considerably better than the models without this information. In this scenario, the best model without measurement information has an F-value of 0.40, whereas the best model with measurement information has an F-value of 0.49. This advantage shrinks in the second scenario, where two items are known, and vanishes in the last scenario, where five items are known. The precision of the models having body measurement information is higher in all cases, but at the same time the recall is lower compared to the models having no measure information.

Figure 5: Resulting predictive performance per new user scenario. (Panels: 0-items, 2-items, 5-items; x-axis: number of latent variables, 16, 32, 64, 128; y-axis: F-measure; series: no measure, with measure.)

In total, the empirical results indicate that body measures possess significant predictive power in the context of apparel recommendation, especially in new user scenarios, where no previous product preference information about the user is available. In practice, one such situation could be when consumers are scanned in a store for the first time.

Further research is needed to identify which specific body measures have predictive power and which rather introduce noise into the model. Of the eleven measures used, it can be expected that some measures, like the hip or belly measure, give more information to the model than, for example, the calves measure. Another possibility to integrate body scan information is to assign each scan to a distinctive body shape class [10], or to use the principal components from the morphable model approach [22]. In addition, the impact of hair and skin color nuances on the predictive performance will be investigated in further research.

===REFERENCES===
* [1] Gediminas Adomavicius and Alexander Tuzhilin. 2008. Context-aware Recommender Systems. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08). ACM, New York, NY, USA, 335–336. DOI:http://dx.doi.org/10.1145/1454008.1454068
* [2] George Forman and Martin Scholz. 2010. Apples-to-apples in Cross-validation Studies: Pitfalls in Classifier Performance Measurement. SIGKDD Explor. Newsl. 12, 1 (2010), 49–57. DOI:http://dx.doi.org/10.1145/1882471.1882479
* [3] David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. 1992. Using Collaborative Filtering to Weave an Information Tapestry. Commun. ACM 35, 12 (1992), 61–70. DOI:http://dx.doi.org/10.1145/138859.138867
* [4] Congying Guan, Shengfeng Qin, Wessie Ling, and Guofu Ding. 2016. Apparel recommendation system evolution: an empirical review. International Journal of Clothing Science and Technology 28, 6 (2016), 854–879. DOI:http://dx.doi.org/10.1108/IJCST-09-2015-0100
* [5] Peng Guan, Alexander Weiss, Alexandru O. Balan, and Michael J. Black. 2009. Estimating human shape and pose from a single image. In Computer Vision, 2009 IEEE 12th International Conference on. 1381–1388.
* [6] Richard Harshman. 1970. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics 16 (1970).
* [7] Sheena S. Iyengar and Mark R. Lepper. 2000. When choice is demotivating: Can one desire too much of a good thing? Journal of personality and social psychology 79, 6 (2000), 995.
* [8] Luo Jiang, Juyong Zhang, Bailin Deng, Hao Li, and Ligang Liu. 2017. 3D Face Reconstruction with Geometry Details from a Single Image. (2017). http://arxiv.org/pdf/1702.05619
* [9] Michelle R. Jones and Valerie L. Giddings. 2010. Tall women's satisfaction with the fit and style of tall women's clothing. Jnl of Fashion Mrkting and Mgt 14, 1 (2010), 58–71. DOI:http://dx.doi.org/10.1108/13612021011025438
* [10] Jeong Yim Lee, Cynthia L. Istook, Yun Ja Nam, and Sun Mi Park. 2007. Comparison of body shape between USA and Korean women. Int Jnl of Clothing Sci & Tech 19, 5 (2007), 374–391. DOI:http://dx.doi.org/10.1108/09556220710819555
* [11] Komal Nagar and Payal Gandotra. 2016. Exploring Choice Overload, Internet Shopping Anxiety, Variety Seeking and Online Shopping Adoption Relationship: Evidence from Online Fashion Stores. Global Business Review 17, 4 (2016), 851–869. DOI:http://dx.doi.org/10.1177/0972150916645682
* [12] Jaekyung Park, Yunja Nam, Kueng-mi Choi, Yuri Lee, and Kyu-Hye Lee. 2009. Apparel consumers' body type and their shopping characteristics. Jnl of Fashion Mrkting and Mgt 13, 3 (2009), 372–393. DOI:http://dx.doi.org/10.1108/13612020910974500
* [13] Steffen Rendle. 2010. Factorization Machines. In Proceedings of the 2010 IEEE International Conference on Data Mining (ICDM '10). IEEE Computer Society, Washington, DC, USA, 995–1000. DOI:http://dx.doi.org/10.1109/ICDM.2010.127
* [14] Steffen Rendle. 2012. Factorization Machines with libFM. ACM Trans. Intell. Syst. Technol. 3, 3 (2012), 57:1–57:22.
* [15] Steffen Rendle, Zeno Gantner, Christoph Freudenthaler, and Lars Schmidt-Thieme. 2011. Fast Context-aware Recommendations with Factorization Machines. In 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2011, Beijing, China. ACM, New York, NY.
* [16] Steffen Rendle and Lars Schmidt-Thieme. 2010. Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, New York, NY, USA, 81–90. DOI:http://dx.doi.org/10.1145/1718487.1718498
* [17] Barry Schwartz. 2016. The paradox of choice: Why more is less (revised edition ed.).
* [18] Yue Shi, Martha Larson, and Alan Hanjalic. 2014. Collaborative Filtering Beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges. ACM Comput. Surv. 47, 1 (2014), 3:1–3:45. DOI:http://dx.doi.org/10.1145/2556270
* [19] Ledyard R. Tucker. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 3 (1966), 279–311. DOI:http://dx.doi.org/10.1007/BF02289464
* [20] Christian Zagel and Jochen Süßmuth. 2013. Nutzenpotenziale maßgetreuer 3D Avatare aus Low-cost Bodyscannern. HMD: Praxis der Wirtschaftsinformatik 50 (2013), 48–57. DOI:http://dx.doi.org/10.1007/BF03342068
* [21] Christian Zagel, Jochen Süßmuth, and Freimut Bodendorf. 2013. Automatische Rekonstruktion eines 3D Körpermodells aus Kinect Sensordaten. In Wirtschaftsinformatik Proceedings 2013. Paper 35. http://aisel.aisnet.org/wi2013/35
* [22] Michael Zollhöfer, Michael Martinek, Günther Greiner, Marc Stamminger, and Jochen Süßmuth. 2011. Automatic Reconstruction of Personalized Avatars from 3D Face Scans. Computer Animation and Virtual Worlds (Proceedings of CASA 2011) 22, 2-3 (2011), 195–202.
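As a supplementary illustration of the factorization machine model in Equation 4 (Section 3.2), the prediction can be sketched with NumPy. This is a hedged sketch and not the LibFM implementation used in the paper: the function names, the feature layout (one-hot user, one-hot item, followed by the body measures), the dimensions, and the randomly drawn parameters are all assumptions made for illustration. The second function uses the linear-time reformulation of the pairwise term described in [13].

```python
import numpy as np


def fm_predict_naive(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """Evaluate Equation 4 literally, with the double sum over all pairs i < j."""
    n = x.shape[0]
    y = w0 + float(w @ x)
    for i in range(n):
        for j in range(i + 1, n):
            y += float(V[i] @ V[j]) * x[i] * x[j]
    return y


def fm_predict_fast(x: np.ndarray, w0: float, w: np.ndarray, V: np.ndarray) -> float:
    """Same model, using the identity
    sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2).
    """
    linear_part = V.T @ x                 # shape (k,): sum_i v_if * x_i
    squared_part = (V ** 2).T @ (x ** 2)  # shape (k,): sum_i v_if^2 * x_i^2
    return w0 + float(w @ x) + 0.5 * float(np.sum(linear_part ** 2 - squared_part))


def build_features(user_idx: int, item_idx: int, n_users: int, n_items: int,
                   measures: np.ndarray) -> np.ndarray:
    """Hypothetical feature layout: one-hot user, one-hot item, then continuous body measures."""
    x = np.zeros(n_users + n_items + len(measures))
    x[user_idx] = 1.0
    x[n_users + item_idx] = 1.0
    x[n_users + n_items:] = measures
    return x


# Toy dimensions: 5 users, 36 items, 11 body measures, k = 16 latent variables.
rng = np.random.default_rng(0)
n_users, n_items, n_measures, k = 5, 36, 11, 16
x = build_features(2, 7, n_users, n_items, rng.normal(size=n_measures))
n = x.shape[0]

# Randomly drawn parameters stand in for a trained model.
w0 = 0.1
w = rng.normal(scale=0.1, size=n)
V = rng.normal(scale=0.1, size=(n, k))

y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same score up to floating-point error. In a new user scenario the user's one-hot column carries little learned information, so the interactions between the body measure columns and the item columns are what the model can draw on.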