Rethinking Conventional Collaborative Filtering for Recommending Daily Fashion Outfits

Anders Kolstad, Özlem Özgöbek, Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
andekol@stud.ntnu.no, {ozlem.ozgobek, jon.atle.gulla}@ntnu.no

Simon Litlehamar
Accenture AS, Fornebu, Norway
simon.litlehamar@accenture.com

August 2017

Keywords: Recommender Systems, Machine Learning, Fashion Recommendation, Collaborative Filtering, Internet of Things

A conventional collaborative filtering approach using a standard utility matrix fails to capture the aspect of matching clothing items when recommending daily fashion outfits. Moreover, it is challenged by the new user cold-start problem. In this paper, we describe a novel approach for guiding users in selecting daily fashion outfits, by providing outfit recommendations from a system consisting of an Internet of Things wardrobe enabled with RFID technology and a corresponding mobile application. We show where a conventional collaborative filtering approach falls short when recommending fashion outfits, and how our novel approach, powered by machine learning algorithms, shows promising results in the domain of fashion recommendation. Evaluation of our novel approach using a real-world dataset demonstrates the system's effectiveness and its ability to provide daily outfit recommendations that are relevant to the users. A non-comparable evaluation of the conventional approach is also given.

CCS CONCEPTS

• Information systems → Evaluation of retrieval results; Web applications; • Computing methodologies → Classification and regression trees;

1 INTRODUCTION

Selecting an outfit every morning is a task that many people struggle with, often due to time constraints or the feeling of having nothing to wear. In [20], Pruit argues that our selection of an outfit influences other people’s impressions of us, and that it is of high importance to our cultural lives. Moreover, the average Norwegian has 359 unique garments in their closet [15]. This suggests that people need guidance and suggestions for selecting an outfit from their clothing haystack each morning.

Klepp and Laitala found that 20% of clothes bought by Norwegians were never or rarely used [15]. A reason for this might be that they did not actually like the item they bought or that the item did not match any existing clothing items in their closet. This information is very valuable to the clothing retailer. With such information, the retailer can map the customer’s taste profile and generate targeted ads for the customer, reducing the number of unnecessary purchases and increasing the number of satisfied customers.

Generating such outfit suggestions and targeted ads can be made a reality by recommender systems. A recommender system tries to predict the rating value of a user-item combination, where the user has indicated their ratings for other items in the past [1]. The system tracks these ratings by receiving user feedback. User feedback is classified as explicit or implicit. Explicit feedback is when the user explicitly rates an item on, e.g., a 5-star scale. Implicit feedback records other user interactions, e.g., how long a user spends on a web page on a certain topic. With the ratings retrieved through user feedback, the recommender system can predict the user’s ratings of new items, and suggest the items with a high predicted rating. One of the most successful recommendation techniques is collaborative filtering (CF) [22]. CF recommends items on the assumption that users who have interacted in similar ways before will have common interests in the future as well. Conventional CF bases its recommendations on a matrix called the utility matrix, which captures every rating value for the user-item combinations known to the system [17]. Table 1 shows an example of such a matrix, consisting of user-item combinations of users and movies. A known challenge in CF is the new user cold-start problem: how to recommend items to new users who have not rated any items yet. Suppose we were to introduce a fourth user in Table 1. The user-item combinations for this fourth user would all have ’?’ as a value. How to then recommend items to this user is not an easy task.
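To make the cold-start problem concrete, the following minimal sketch stores a small utility matrix as nested dictionaries and adds a new user without ratings. The user names, movie names, and rating values are made up for illustration and are not the contents of Table 1; the helper functions are hypothetical.

```python
# Hypothetical utility matrix; None stands for an unknown rating ('?').
utility = {
    "user_1": {"movie_a": 5, "movie_b": None, "movie_c": 1},
    "user_2": {"movie_a": 4, "movie_b": 2, "movie_c": None},
    "user_3": {"movie_a": None, "movie_b": 5, "movie_c": 4},
}


def add_new_user(matrix, user):
    """Add a user who has not rated anything yet."""
    items = next(iter(matrix.values())).keys()
    matrix[user] = {item: None for item in items}


def co_rated_items(matrix, u, v):
    """Items rated by both users; user similarity is computed over these."""
    return [item for item, rating in matrix[u].items()
            if rating is not None and matrix[v][item] is not None]


add_new_user(utility, "user_4")
overlap = {v: co_rated_items(utility, "user_4", v)
           for v in utility if v != "user_4"}
print(overlap)  # every list is empty, so no neighbor can be found for user_4
```

Because the new user shares no rated items with anyone, a similarity-based method has nothing to compare, which is the situation described above.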

Recommending individual items, such as in Table 1, is what nearly all recommender systems focus on. In recent years, recommendation of collections, such as music playlists [12, 13, 23], has gained a lot of attention. Hansen and Golbeck identified some key aspects that affect the recommendation of collections [10]. One aspect that especially applies to outfit recommendation is the co-occurrence interaction effect. Matching clothing items (items that go well together) will have a positive interaction effect when they co-occur, and will therefore generate a more relevant outfit recommendation for the user.

In [16], we proposed Connected Closet, a system consisting of an Internet of Things wardrobe enabled with an RFID reader, so that clothing items with RFID tags can be checked in and out of the closet, generating implicit feedback on clothing items the user likes. Using a mobile application, the user can give explicit feedback on outfits they like, and receive daily outfit recommendations based on outside temperature and wardrobe inventory. In this paper, we describe an implementation of the proposed system. We show where a conventional CF approach falls short in terms of the new user cold-start problem and where it fails to capture the co-occurrence effect between items. Moreover, we propose a novel CF approach that mitigates the shortcomings of the conventional approach and implement the novel approach in the proposed system. Evaluations using a real-world dataset are performed on both approaches.

The main contributions of this paper are:
(1) A novel CF approach for recommending daily fashion outfits.
(2) An accuracy evaluation of the approach using different classification algorithms.

This work is a joint effort between the Smartmedia program¹ at NTNU² and Accenture Norway³. The Smartmedia program is researching mobile context-aware recommender systems, while Accenture’s main goal in this work is to research modern technology for building web-based information systems and to keep track of key technology trends, such as the Internet of Things.

The rest of the paper is structured as follows. In Section 2, we give an overview of related work, followed by a description of the proposed system in Section 3. Section 4 introduces the concept of outfit recommendation. The recommendation approaches are described in Section 5 and Section 6. Evaluation of the approaches is given in Section 7. We conclude with a summary and discuss future work in Section 8.

2 RELATED WORK

There are not many systems addressing daily outfit recommendations from either an Internet of Things wardrobe or a virtual wardrobe. In this section, we give an overview of the state of the art, identify gaps in these works, and show where our system differs from past work and how it complements previous work.

Dumeljic et al. propose a virtual wardrobe implemented as a mobile application [6]. By explicitly stating the user’s current mood, the user can add clothing items that best fit the mood to the virtual inventory. In [6], the outfit recommendation approach is not described and has not been implemented in the system. Moreover, a user study of ten people was conducted, where they concluded that mood is a motivator for selecting outfits, but that users would be more invested in the system if it also considered weather.

¹ http://research.idi.ntnu.no/SmartMedia/
² http://www.ntnu.edu/
³ https://www.accenture.com/no-en

In [19], Limaksornkul et al. also propose a mobile application used as a virtual wardrobe. They try to solve the problem of efficiently managing closet inventory and guiding users in selecting clothes based on the user’s fashion style, trends, their friends’ styles, weather, and occasion. In the mobile application, the users can manage their clothes, and receive statistical-based, weather-based, and event-based clothing suggestions. The statistical-based recommendation engine is preliminary and is the only approach that takes the user’s preferences into account. Moreover, no evaluation of the system is given.

A smart wardrobe system is proposed by Goh et al. in [9]. Here, garments attached to RFID tags can be scanned in the user’s closet. Using a system application, the user can get clothing recommendations based on the user’s mood, preferred color, and occasion.

Yu-Chu et al. propose a recommendation system using a modified Bayesian network for generating outfit recommendations from the user’s clothing items enabled with RFID tags stored in a smart wardrobe [24]. By taking weather, season, and occasion into consideration, the system first selects a top, and then finds bottoms that match the selected top. The process of selecting a bottom depends on user feedback rating the combination. An experiment with 10 users concluded that the proposed system produced more satisfied users than a baseline using a basic Bayesian network without user feedback.

An important aspect that needs to be mentioned is that virtual wardrobes are heavily dependent on explicit user feedback, while Internet of Things wardrobes can make use of implicit user feedback as well.

As seen in the works above, most of the recommender systems are preliminary and do not contain clear steps for the recommendation algorithm. The ones that do have an implemented recommender system only have user studies and lack an accuracy evaluation of their recommendations. In this paper, we describe a fully implemented prototype, using an architecture similar to [9] and [24], enabled with a novel recommendation approach evaluated on a real-world dataset. To the best of our knowledge, our novel approach is a completely unique way of generating recommendations using CF. This is mostly because the majority of CF recommender systems today are heavily based on the utility matrix [22], which is not present in our approach.

3 SYSTEM OVERVIEW

In this section, we describe the architecture of the smart wardrobe proposed in [16]. Moreover, we explain how the users receive recommendations through the mobile application, which is a part of the architecture. We built and implemented a prototype of the whole system and created a short demonstration video available at https://goo.gl/rZBZqo.

3.1 Architecture

[Figure: architecture diagram with the labels RFID, Closet, and Mobile Application]

with the system. Such clothing items can be manually scanned through the RFID reader. When a scan occurs, a message is broadcast to multiple services deployed in the cloud. These services include, among others, a recommender service and an inventory service. By communicating with each other and a third-party Weather API, they provide outfit recommendations to the Mobile Application.

3.2 Mobile Application

When the user opens the mobile application, they are shown a recommendation for an outfit that suits today’s temperature and is inside the user’s closet. By swiping through a list, the user is shown multiple recommended outfits. Moreover, the user can modify the recommended outfit by using the arrows that correspond to each clothing item. By clicking a Save button, the user gives explicit positive feedback on the displayed outfit, indicating that this outfit is one of their favorites.

4 OUTFIT RECOMMENDATION

We define an outfit, denoted o, as a tuple of two items, c_1 and c_2, where c_1 is a top and c_2 is a bottom. Although clothing outfits can also contain more or fewer than two items, the current version of our system only addresses outfits of two items. This is based on the assumption that most outfits consist of one top and one bottom. Recommendation of outfits consisting of a one-piece, e.g., a dress, or with additional accessories, is planned for later research.

4.1 Inclusion Criteria

To ensure that the user receives outfit recommendations that are relevant for a given day, we define inclusion criteria for the clothing items that can be part of a recommended outfit. The inclusion criteria are defined as follows:
(1) The clothing item must be inside the closet. The status of the item is determined by the latest RFID tag scan.
(2) The clothing item must be suitable for the current weather. Items are stored in a database with a suitable temperature range property. This is the range of temperatures in which a clothing item is comfortable to wear. The outside temperature at the time of recommendation must be inside the item’s suitable temperature range.
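The two inclusion criteria, and the top-bottom pairs that can be formed from the items that satisfy them, can be sketched as follows. This is an illustrative sketch only: the `Item` fields, the item names, and the temperature ranges are assumptions and do not reflect the schema of the actual database.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Item:
    name: str
    kind: str         # "top" or "bottom"
    in_closet: bool   # status according to the latest RFID scan
    min_temp: float   # lower bound of the suitable temperature range (deg C)
    max_temp: float   # upper bound of the suitable temperature range (deg C)


def included_items(items, outside_temp):
    """Keep items that are in the closet and suitable for the temperature."""
    return [it for it in items
            if it.in_closet and it.min_temp <= outside_temp <= it.max_temp]


def candidate_outfits(items):
    """All (top, bottom) pairs that can be formed from the given items."""
    tops = [it for it in items if it.kind == "top"]
    bottoms = [it for it in items if it.kind == "bottom"]
    return list(product(tops, bottoms))


closet = [
    Item("t-shirt", "top", True, 15, 35),
    Item("wool sweater", "top", True, -20, 12),
    Item("shirt", "top", False, 5, 30),      # currently checked out
    Item("jeans", "bottom", True, -5, 28),
    Item("shorts", "bottom", True, 18, 40),
]

eligible = included_items(closet, outside_temp=25)
outfits = candidate_outfits(eligible)
print([(top.name, bottom.name) for top, bottom in outfits])
```

At 25 °C the wool sweater is excluded by its temperature range and the shirt by its closet status, so only outfits built on the t-shirt remain.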

All clothing items that are owned by a user u_i and fit the inclusion criteria are represented as a set I(u_i). All outfit combinations that can be generated from I(u_i) are added to the set O(u_i).

4.2 User Ratings

The favored outfits indicated (explicitly or implicitly) by the user are stored in the system using unary positive-only values. Outfits that have not been rated are outfits that the user either does not like or has not seen or used together from the user’s closet C(u_i). Unrated outfits will be referred to as ’neutral’ outfits in the rest of this paper.

4.3 Recommended Outfits

The list of recommended outfits that the user receives in the mobile application is generated by the system’s recommender service, which returns the set R(u_i) of recommended outfits for the user.

4.4 Notation

All the notations defined in this section are summarized in Table 2. These notations will be used throughout the paper.

5 RETHINKING CONVENTIONAL CF

In this section, we introduce an approach for outfit recommendation using a conventional utility matrix for collaborative filtering. We discuss where this approach falls short, and introduce a novel approach for outfit recommendation using an outfit-item matrix.

5.1 Conventional CF Approach

An obvious solution to recommending fashion outfits is to map the users’ favorite outfits onto a utility matrix U, consisting of users and outfits. Then, using a neighborhood model, one could predict new outfits for a user by comparing the user’s interaction pattern with those of users with a similar interaction pattern. To recommend the daily outfits R(u_i), we need to match the predicted outfits with the items that fit the inclusion criteria, I(u_i), and filter out outfits that do not consist only of such items. The approach is illustrated in Figure 3.

The first problem with this approach is that it can only recommend outfits that have been favored by other users. In other words, it cannot generate completely new outfits, and therefore fails to capture the co-occurrence effect between individual items. Another problem with this approach is that it is challenged by the new user cold-start problem. Users who have not favored any outfits or checked out any items cannot receive recommendations. Lastly, privacy is becoming a huge concern in recommender systems [2, 3], and in this approach, we store all the users’ ratings in one centralized matrix, which poses a considerable risk to the users’ privacy.

5.2 Novel Outfit-Item Matrix Approach

By basing our recommendations on the idea that users who have similar items in their closets will also have similar taste in outfits, we propose a novel approach where we rethink the conventional approach by completely transforming the utility matrix. In Figure 4, we create a matrix Z, where the columns represent outfits, and the rows represent the clothing items that compose the outfit. Each outfit is associated with a weight w. This weight is the number of users who have favored the outfit, and the weights of all outfits are collected in W. Using Z and W, we train a classifier using a classification model. Outfits that have been favored by users and have an associated weight above 0 will be classified as ’positive’, while outfits with an associated weight of 0 will be classified as ’neutral’. When the model has been trained, we generate all the possible outfit combinations O(u_i) of the items that fit the inclusion criteria for the given user u_i. By using the classifier, we can now recommend the outfits that are classified as ’positive’ to the user, R(u_i).
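As a rough illustration of how Z, W, and the class labels relate, the sketch below encodes each outfit as a binary vector over the known clothing items and counts how many users have favored it. The item names and favorites are invented, and the code stores one list per outfit, i.e., the transpose of the layout described for Figure 4, because that is the usual sample-per-row input for a classifier.

```python
from collections import Counter

# Hypothetical item catalogue and user favorites.
items = ["shirt", "sweater", "jeans", "chinos"]
item_index = {name: k for k, name in enumerate(items)}

favorites = {
    "u1": {("shirt", "jeans"), ("sweater", "jeans")},
    "u2": {("shirt", "jeans")},
    "u3": {("sweater", "chinos")},
}

all_outfits = [
    ("shirt", "jeans"),
    ("shirt", "chinos"),
    ("sweater", "jeans"),
    ("sweater", "chinos"),
]


def encode(outfit):
    """Binary vector with a 1 for every item that is part of the outfit."""
    vector = [0] * len(items)
    for name in outfit:
        vector[item_index[name]] = 1
    return vector


# Weight of an outfit = number of users who have favored it.
counts = Counter(o for favs in favorites.values() for o in favs)

Z = [encode(o) for o in all_outfits]
W = [counts[o] for o in all_outfits]
labels = ["positive" if w > 0 else "neutral" for w in W]

for outfit, vector, w, label in zip(all_outfits, Z, W, labels):
    print(outfit, vector, w, label)
```

The pairs (Z, labels) are what a classification model would be trained on; the choice of model is left open here.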

One advantage of this approach is that it captures the co-occurrence interaction effect between two clothing items. This is because it considers the clothing items that an outfit is composed of, instead of just looking at the outfits as a whole. Moreover, it is not challenged by the new user cold-start problem because we assume that people who own similar clothing items will have the same taste in outfits as well. Lastly, this approach has a large advantage in terms of user privacy, because it does not need to store the user-item combinations in one centralized matrix.

In Figure 5, we give an example of a possible recommendation pipeline that can occur in our system using the novel approach. To the left is the set of all the clothing items owned by the user. By inputting this and the current outside temperature at the user’s location, the function f1 filters the items and generates possible outfits for recommendation with respect to the inclusion criteria. These outfits are then input to f2, which follows the same steps as described in Figure 4. At the end of the pipeline, we get the generated set of recommended outfits that is displayed in the mobile application.

Although not implemented in our system, this approach could easily be used by a clothing retailer to generate targeted ads by inputting clothing items from the retailer together with the user’s clothing items in C(u_i). Then, the clothing retailer could recommend new outfits that the users might want to buy, or individual items that would make a great outfit with clothing items already owned by the user.

[Figures 4 and 5: outfit-item matrix with weights w_1, ..., w_k and the classification model; recommendation pipeline with I(u_i), the filter function f1 (25 ℃), outfits o_1, o_2, and f2]

In this section, we present the recommendation model for our novel approach using different classification models. The chosen classification models are widely known and perform well in many domains [4, 5]. The classification models also include a baseline classifier. Moreover, we introduce some neighborhood models that are applied with the conventional approach.

6.1 Classification Models

Naïve Bayes. Assuming that the attributes of a sample are conditionally independent given the sample’s class label, Naïve Bayes assigns a test sample the class label Y that maximizes the numerator in this equation [18]:

$$P(Y \mid X) = \frac{P(Y) \prod_{i=1}^{d} P(X_i \mid Y)}{P(X)}, \tag{1}$$

where X is a set of d attributes.
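The decision rule in Equation 1 can be sketched for binary attributes as shown below. Two choices here are assumptions that the text does not specify: the Bernoulli likelihood for each attribute and the Laplace smoothing parameter `alpha`. The training data is a toy example, and class priors can be passed explicitly through the hypothetical `priors` argument.

```python
import math


def train_bernoulli_nb(X, y, priors=None, alpha=1.0):
    """Estimate P(Y) and P(X_i = 1 | Y) from binary samples."""
    model = {}
    d = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        # Laplace-smoothed estimate of P(X_i = 1 | Y = c) for each attribute.
        theta = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for i in range(d)]
        prior = priors[c] if priors else len(rows) / len(X)
        model[c] = (prior, theta)
    return model


def log_numerator(model, x, c):
    """log of P(Y = c) * prod_i P(X_i = x_i | Y = c)."""
    prior, theta = model[c]
    total = math.log(prior)
    for xi, t in zip(x, theta):
        total += math.log(t if xi else 1.0 - t)
    return total


def predict(model, x):
    """Class label that maximizes the numerator of Equation 1."""
    return max(model, key=lambda c: log_numerator(model, x, c))


def predict_proba(model, x):
    """Posterior P(Y | X), obtained by normalizing the numerators."""
    logs = {c: log_numerator(model, x, c) for c in model}
    top = max(logs.values())
    exps = {c: math.exp(v - top) for c, v in logs.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}


X = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]]
y = ["positive", "neutral", "positive", "positive"]
model = train_bernoulli_nb(X, y)
print(predict(model, [1, 0, 1, 0]), predict_proba(model, [1, 0, 1, 0]))
```

Working in log space avoids numerical underflow when the number of attributes d is large, since the product in the numerator becomes a sum.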

Adaptive Boosting (AdaBoost). In recent years, classification techniques known as ensemble methods have gained a lot of attention. One of the most popular ones is AdaBoost. It aggregates over a set of weak learners h_t(x) that tend to perform slightly better than a random classifier. The final classifier H(x) is then obtained by combining the weak learners through a weighted majority voting scheme using this equation [7]:

$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right), \tag{2}$$

where α_t is the assigned weight for each weak learner.

To pick the weak learners, each training sample is associated with a weight indicating its importance. AdaBoost will then pick its weak learners in a forward stage-wise manner by focusing on predicting the high-weight samples correctly.
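The interplay between sample weights, weak learners, and the weighted vote in Equation 2 can be sketched with a small discrete AdaBoost. As a simplifying assumption, the weak learners here are one-feature decision stumps on binary inputs with labels in {-1, +1}; the toy data (a majority vote over three bits) is invented for illustration.

```python
import math


def stump_predict(stump, x):
    """A stump votes `polarity` if the chosen feature is 1, else -polarity."""
    feature, polarity = stump
    return polarity if x[feature] == 1 else -polarity


def best_stump(X, y, weights):
    """Stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            stump = (feature, polarity)
            err = sum(w for x, label, w in zip(X, y, weights)
                      if stump_predict(stump, x) != label)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds=3):
    """Return a list of (alpha_t, h_t) pairs built stage by stage."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        if err >= 0.5:          # no stump is better than random guessing
            break
        err = max(err, 1e-10)   # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, x))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def classify(ensemble, x):
    """Sign of the alpha-weighted vote of the weak learners (Equation 2)."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data: the label is +1 when at least two of the three bits are set.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(x) >= 2 else -1 for x in X]

ensemble = adaboost(X, y, rounds=3)
print([classify(ensemble, x) for x in X] == y)
```

No single stump separates this data, but after each round the reweighting pushes the next stump toward the samples that the previous ones got wrong, and the weighted vote of three stumps fits the training set.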

Gradient Boosting. Another popular ensemble method that relies on a set of weak learners is called Gradient Boosting. It follows the same fundamental idea as AdaBoost, but instead of focusing on the sample weights when picking its weak learners, it focuses on gradients [ 8 ].

Uniform. As a baseline, we use a classifier that generates class predictions uniformly at random.

6.2 Neighborhood Models

To predict the ratings of the user-outfit combinations in the matrix U, given in Figure 3, we apply the user-based neighborhood model [1]. This model predicts user ratings by finding users that have rated similar outfits. To find similar users, we can apply different similarity measures. In our model, we apply Jaccard (JAC) and cosine similarity (COS) as defined by Equation 3:

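$$\mathrm{Sim}_{JAC}(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{Sim}_{COS}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{3}$$

After user similarities have been calculated, we can predict the ratings r̂_ui of unrated outfits using this formula:

$$\hat{r}_{ui} = \frac{\sum_{v} \mathrm{Sim}(u, v) \, r_{vi}}{\sum_{v} |\mathrm{Sim}(u, v)|} \tag{4}$$

6.3 Ranking Model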

To rank the outfits that are predicted for the user in R(u_i) using the novel approach, we assign each predicted outfit o_j a ranking score equal to the classifier’s probability of the class label being ’positive’, P(w_j > 0 | o_j). It should be noted that this is not a personalized ranking model, but as seen from our results, it performed well for each individual user.
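A minimal sketch of this ranking step is given below. The probabilities are hypothetical values standing in for the output of a trained classifier, and `rank_outfits` is an illustrative helper rather than part of the described system.

```python
def rank_outfits(positive_probability, top_n=None):
    """Sort outfits by the probability of the 'positive' class, highest first."""
    ranked = sorted(positive_probability.items(),
                    key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical classifier output for the outfits of one user.
positive_probability = {
    ("t-shirt", "jeans"): 0.81,
    ("t-shirt", "shorts"): 0.35,
    ("shirt", "jeans"): 0.92,
}

for outfit, score in rank_outfits(positive_probability, top_n=2):
    print(outfit, score)
```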

The conventional approach does not use classification models, so the probability of the predicted class label is not available. Instead, the outfits are ranked according to the predicted rating calculated using the similarity measures.
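The similarity measures of Equation 3, the prediction rule of Equation 4, and the resulting ranking can be sketched for unary (like-only) data as follows. Representing each user by the set of liked outfits, and treating an outfit that a neighbor has not liked as a rating of 0, are assumptions of this sketch; the user and outfit identifiers are made up.

```python
import math


def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets of liked outfits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of the binary like-vectors behind the two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def predict_rating(likes, user, outfit, sim=jaccard):
    """Similarity-weighted average of the neighbors' ratings (Equation 4)."""
    numerator = denominator = 0.0
    for other, liked in likes.items():
        if other == user:
            continue
        s = sim(likes[user], liked)
        numerator += s * (1.0 if outfit in liked else 0.0)
        denominator += abs(s)
    return numerator / denominator if denominator else 0.0


def rank_unrated(likes, user, outfits, sim=jaccard):
    """Unrated outfits sorted by predicted rating, highest first."""
    unrated = [o for o in outfits if o not in likes[user]]
    scored = [(o, predict_rating(likes, user, o, sim)) for o in unrated]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


likes = {
    "u1": {"o1", "o2"},
    "u2": {"o1", "o2", "o3"},
    "u3": {"o4"},
    "u4": {"o1", "o4"},
}

print(rank_unrated(likes, "u1", ["o1", "o2", "o3", "o4"]))
```

Note that only outfits already liked by some neighbor can receive a non-zero predicted rating, which mirrors the limitation of the conventional approach discussed in Section 5.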

7 EXPERIMENTS

In this section, we describe the setting in which our experiment was performed. We give a detailed description of the dataset that was used and present the results of the different models that were evaluated. The main goals of the experiments are to demonstrate the effectiveness of the system and to compare and select the best classification model for our system.

7.1 Dataset

The dataset is scraped from Polyvore.com. Polyvore is a social media site where users can create clothing outfits by matching individual clothing items. Other users can then ’like’ these outfits by clicking a ’like button’.

From the available outfits at Polyvore, we first gathered the most liked outfits from the last 3 months. We filtered these outfits so that they only contained a top and a bottom. Then, we collected other outfits that these items were also a part of, and filtered them. Lastly, we gathered all the user likes for each of the outfits we had gathered. Table 3 describes the size of the dataset.

From the gathered dataset, we have 260 outfits that are classified as ’positive’ and 5,917 that are classified as ’neutral’. This means that the dataset has an imbalance of approximately 23 to 1.

In total, there are 158 individual clothing items in the dataset. This means that the feature vectors used in the classification models will be relatively sparse binary vectors of 158 dimensions.

7.2 Evaluation Methods

To evaluate our novel approach, we iterated through the following procedure for all users with at least 20 outfit likes. We hide each of the user’s ground-truth favorite outfits from the system by decreasing the outfits’ corresponding weights in W by 1. Then, we train the classification model using Z and W. Moreover, with the assumption that a user only owns items that are among the items the user likes, we generate outfit combinations, assuming all of the items in C(u_i) fit the inclusion criteria. We then compare the predicted class labels of the generated outfit combinations to the true favorite outfits of the user. We also ran the procedure a second time, but now randomly removing 50% of the users’ tops and bottoms in C(u_i). This was done to simulate outfit recommendations from a half-empty closet. In Table 4, we summarize some statistics for the test sets that were generated by running these methods. As seen in this table, there are, on average, quite many outfits being classified for each user, O(u_i), compared to the true number of the user’s favorite outfits, O(u_i)_TP.

To reduce the dimensionality of the samples and to detect items that are interrelated, the multivariate analysis technique called principal component analysis was applied to the samples before training the models [14]. The reduction is done by transforming to a new set of uncorrelated features ordered so that the first ones retain most of the original variation.

For evaluating the conventional approach using the different neighborhood models, we first randomly removed 30% of the user likes from the utility matrix. Then, we predicted all outfit likes for each user, and filtered them with respect to I(u_i) using the same assumption as above. The recommended outfits were then compared to the true outfit likes.

If we look at the task of recommending the outfits as retrieving all relevant items (outfits) from a collection of outfits separated into two classes, relevant and not relevant, we can apply the popular accuracy metrics from information retrieval systems. In our case, we say that the relevant outfits are the ones classified as ’positive’, and the not relevant are the outfits classified as ’neutral’. Then, we can use a popular metric known as Recall. It measures the ratio of relevant items retrieved to the number of all relevant items available [11]:

$$\mathrm{Recall} = \frac{|\text{relevant items retrieved}|}{|\text{all relevant items}|} \tag{5}$$

In this paper, we also report Recall@N, which is the Recall in a ranked list considering only the first N elements. We compute Recall and Recall@N by averaging over the result for each user u_i.
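Recall, Recall@N, and the per-user averaging can be sketched as below. The ranked lists and sets of relevant outfits are hypothetical, and returning 0.0 for a user without relevant outfits is a convention chosen for this sketch.

```python
def recall(retrieved, relevant):
    """Share of the relevant outfits that appear among the retrieved ones."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


def recall_at_n(ranked, relevant, n):
    """Recall computed on the first n entries of a ranked list."""
    return recall(ranked[:n], relevant)


def mean_recall_at_n(ranked_by_user, relevant_by_user, n):
    """Average of Recall@N over all users."""
    values = [recall_at_n(ranked_by_user[u], relevant_by_user[u], n)
              for u in ranked_by_user]
    return sum(values) / len(values)


# Hypothetical ranked recommendations and ground-truth favorites.
ranked_by_user = {
    "u1": ["o3", "o1", "o7", "o2", "o9"],
    "u2": ["o4", "o8", "o6", "o5", "o1"],
}
relevant_by_user = {
    "u1": {"o1", "o2", "o5"},
    "u2": {"o4", "o5"},
}

print(mean_recall_at_n(ranked_by_user, relevant_by_user, n=2))
```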

A way to graphically display the tradeoff between the true positive rate and the false positive rate is the receiver operating characteristic (ROC) curve. The true positive rate is the same as Recall, and the false positive rate is the ratio of non-relevant items retrieved to the number of all non-relevant items available. The ROC curve is well suited for comparing the performance of classifiers, where the best classifiers tend to be located in the upper left corner of the diagram. The classifiers that perform best on average will have a large area under the ROC curve (AUC) [11].
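One point on a ROC curve, and the AUC, can be computed as in the following sketch. The AUC is calculated through its pairwise interpretation (the probability that a relevant item is scored above a non-relevant one, counting ties as one half), which equals the area under the ROC curve; the scores and labels are invented.

```python
def rates_at_threshold(scores, labels, threshold):
    """(true positive rate, false positive rate) when retrieving score >= threshold."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / positives, fp / negatives


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores; label 1 marks a relevant ('positive') outfit.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

print(rates_at_threshold(scores, labels, threshold=0.65))
print(auc(scores, labels))
```

A classifier that gives every item the same score obtains an AUC of 0.5 under this definition, which matches the value expected from random guessing.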

To evaluate the ranking via utility, we sum the utility of an outfit j to a user u over a ranked recommended list of size L. By summing this value over all m users, we obtain the R-score as follows [1]:

$$\text{R-score} = \sum_{u=1}^{m} \sum_{j \in I_u,\, v_j \le L} \frac{\max\{r_{uj}, 0\}}{2^{(v_j - 1)/\alpha}}, \tag{6}$$

where v_j is the rank of outfit j and r_uj is the ground-truth rating of outfit j. α is the half-life, set to 5 in our experiments. The higher the R-score, the more the true favorite outfits of each user tend to appear at the top of the ranked list.
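The R-score can be sketched as below, using a half-life of 5 as in the experiments. The ranked lists and ratings are hypothetical, and outfits missing from a user's rating dictionary are treated as having rating 0, which is an assumption of this sketch.

```python
def r_score_user(ranked, ratings, list_size, half_life=5.0):
    """Utility of one user's ranked list, discounted exponentially by rank."""
    total = 0.0
    for rank, outfit in enumerate(ranked[:list_size], start=1):
        gain = max(ratings.get(outfit, 0), 0)
        total += gain / 2 ** ((rank - 1) / half_life)
    return total


def r_score(ranked_by_user, ratings_by_user, list_size, half_life=5.0):
    """Sum of the per-user utilities over all users."""
    return sum(r_score_user(ranked_by_user[u], ratings_by_user[u],
                            list_size, half_life)
               for u in ranked_by_user)


# Hypothetical data: a rating of 1 marks a ground-truth favorite outfit.
ranked_by_user = {
    "u1": ["o1", "o2", "o3", "o4", "o5", "o6"],
    "u2": ["o6", "o5", "o4", "o3", "o2", "o1"],
}
ratings_by_user = {
    "u1": {"o1": 1, "o6": 1},
    "u2": {"o6": 1},
}

print(r_score(ranked_by_user, ratings_by_user, list_size=6))
```

With a half-life of 5, a favorite at rank 6 contributes half as much as a favorite at rank 1, so lists that place favorites early receive a higher score.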

7.3 Results and Discussion

In this section, we present our results and discuss some insights we obtained while running the experiments. By the end of this section we will have answered the following questions:

Q1. How do the different classification models compare using our novel approach?
Q2. How does closet size affect the recommendation results?
Q3. To what extent can the conventional approach be used to recommend new outfits to the users?

The evaluation method for the novel approach was performed using the classification models in Section 6.1. For Naïve Bayes, the best configuration was setting a prior probability of 0.99 for the ’neutral’ class label and 0.01 for the ’positive’ class. This was mostly due to the 23 to 1 imbalance in the dataset. AdaBoost gave the best result using decision trees as weak learners and with a learning rate of 1.0. Gradient Boosting performed best with similar configurations.

In Table 5, we report AUC, Accuracy, and Recall for the predicted class labels for all of the outfits that were tested when simulating a full closet. On the right-hand side of the table, we also report the R-score and Recall@N in a ranked list of L outfits. Because each user has a different number of clothes in their closet, every user is recommended a ranked list of varying length L. The best performing model in each category is highlighted by underlining its result. As seen in the table, Gradient Boosting and AdaBoost are the dominating models in all categories. On average and overall, Gradient Boosting performs best, while in a top-L ranked list, AdaBoost performs slightly better. For N > 5, Gradient Boosting was, at maximum, only .006 points better than AdaBoost in Recall@N. In terms of the R-score, AdaBoost is superior to Gradient Boosting. Because of this, we conclude that AdaBoost is the model yielding the highest utility to the users.

In Figure 6, we plot a ROC curve for the different models used to generate a single ranked list of user-outfit pairs. This type of ROC curve is sometimes referred to as a global ROC curve [21]. As indicated by the gray dotted line, AdaBoost is the best model at a false positive rate of 20%, predicting 86% of the users’ favorite outfits. As the false positive rate increases, Gradient Boosting becomes slightly superior to AdaBoost. On average, Gradient Boosting and AdaBoost dominate the two other models with an AUC of .864 and .885, respectively. Naïve Bayes yields a satisfactory AUC of .704, while from the Uniform model we got the expected AUC of .500.

The high values of AUC and R-score are a strong indication that the non-personalized ranking model performs quite well and even better than expected.

[Figure 7: number of recommendations (axis from 0 to 60) for favorite outfits and novel outfits]

Figure 7 shows the distribution of outfit recommendations in a top-20 list recommended to the users with at least 20 outfit likes. In total, 196 unique outfits were recommended to the users, where 33 of them were novel outfits, never favored by any user in the past. This shows that a wide range of outfits end up in the users’ recommended top lists.

The experiment on a half-empty closet resulted in no change in terms of overall Recall and, at most, a .005 decrease in AUC. For this reason, we do not report any results beyond this. Besides the fact that fewer clothing items will result in fewer outfit recommendations, we conclude that closet size has little effect on the recommendations.

In Table 6, results from the evaluation of the conventional approach are given. The table shows Recall@N in a ranked list of M outfits. Because M is much lower than L, we only report up to N = 5 (as opposed to N = 20 in the evaluation of the novel approach). Note that the results are not comparable to the results in Table 5, as they are derived using an approach that is fundamentally different. The best performing model is highlighted with underlined results. As the numbers indicate, the approach generates new outfit recommendations for the users with satisfactory accuracy. However, these outfit recommendations are, as argued in Section 5, only outfits that have been composed and favored by other users in the past. Therefore, we conclude that this approach is insufficient when it comes to recommending novel and personalized daily outfits.

[Figure 6: ROC curves, true positive rate against false positive rate (both from 0.0 to 1.0), for Naive Bayes, Gradient Boosting, AdaBoost, and Uniform]

[Table 6 (fragment): Recall@5: .050, .250]

8 CONCLUSION AND FUTURE WORK

We have introduced a novel approach for recommending daily fashion outfits from a smart closet. Our novel approach mitigates a wide range of challenges faced by a conventional approach that tries to recommend daily fashion outfits. Evaluation of our novel approach demonstrates the method’s effectiveness, and its ability to provide users with accurate and novel outfit recommendations.

The results from the evaluation helped us select which model to deploy in the system. R-score, AUC, and Recall@N are the most useful measures regarding each individual user. Since AdaBoost achieved the highest R-score and AUC, it was chosen as the main classifier and implemented with the novel approach in the recommender service deployed in the cloud. It should be noted that Gradient Boosting achieved slightly better results in Recall@N, but we regard this difference as insignificant and conclude that AdaBoost is indeed the best fit for our system.

A non-comparable evaluation of the conventional approach was performed to see to what extent it could recommend daily outfits. The accuracy results are acceptable, but due to the approach’s many challenges, it cannot be considered an efficient method for recommending daily outfits.

Although we have demonstrated the system’s performance using a real-world dataset, a full-scale evaluation using data gathered from physical clothes enabled with RFID tags is planned for future work. The current state of the system should be considered an early prototype and is premature for such a full-scale evaluation. Because of this, these plans are preliminary and we consider other research topics to be more important at the current stage. These topics include content-based outfit recommendation and recommendation of garments to be recycled or donated. With these research topics, we intend to incorporate additional contextual factors such as season, user’s occasion, and user’s body type.

ACKNOWLEDGMENTS

This work is an extension to a prototype of the proposed system initially developed during an internship at Accenture. The authors would like to thank everyone involved in the internship for their contributions prior to this work.

The authors would also like to thank everyone at Accenture who has provided valuable feedback on this research.

REFERENCES

[1] Charu Aggarwal. 2016. Recommender Systems: The Textbook (1st ed.). Springer Publishing Company, Incorporated.

[2] Arnaud Berlioz, Arik Friedman, Mohamed Ali Kaafar, Roksana Boreli, and Shlomo Berkovsky. 2015. Applying Differential Privacy to Matrix Factorization. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 107-114.

[3] Smriti Bhagat, Udi Weinsberg, Stratis Ioannidis, and Nina Taft. 2014. Recommending with an Agenda: Active Learning of Private Attributes Using Matrix Factorization. In Proceedings of the 8th ACM Conference on Recommender Systems (RecSys '14). ACM, New York, NY, USA, 65-72.

[4] Rich Caruana and Alexandru Niculescu-Mizil. 2006. An Empirical Comparison of Supervised Learning Algorithms. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06). ACM, New York, NY, USA, 161-168.

[5] Thomas G. Dietterich. 2000. Ensemble Methods in Machine Learning. In Proceedings of the First International Workshop on Multiple Classifier Systems (MCS '00). Springer-Verlag, London, UK, 1-15.

[6] Bojana Dumeljic, Martha Larson, and Alessandro Bozzon. 2014. Moody Closet: Exploring Intriguing New Views on Wardrobe Recommendation. In Proceedings of the First International Workshop on Gamification for Information Retrieval (GamifIR '14). ACM, New York, NY, USA, 61-62.

[7] Yoav Freund and Robert E Schapire. 1997. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 55, 1 (Aug. 1997), 119-139.

[8] Jerome Friedman. 2000. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (2000), 1189-1232.

[9] K. N. Goh, Y. Y. Chen, and E. S. Lin. 2011. Developing a smart wardrobe system. In 2011 IEEE Consumer Communications and Networking Conference (CCNC). 303-307.

[10] Derek Hansen and Jennifer Golbeck. 2009. Mixing It Up: Recommending Collections of Items. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM, New York, NY, USA, 1217-1226.

[11] Jonathan Herlocker, Joseph A. Konstan, Loren G. Terveen, and John T. Riedl. 2004. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 22, 1 (Jan. 2004), 5-53.

[12] Kurt Jacobson, Vidhya Murali, Edward Newett, Brian Whitman, and Romain Yon. 2016. Music Personalization at Spotify. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16). ACM, New York, NY, USA, 373-373.

[13] Dietmar Jannach, Lukas Lerche, and Iman Kamehkhosh. 2015. Beyond "Hitting the Hits": Generating Coherent Music Playlist Continuations with the Right Tracks. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 187-194.

[14] Ian Jolliffe. 2002. Principal component analysis. Wiley Online Library.

[15] Ingunn Grimstad Klepp and Kirsi Laitala. 2016. Clothing consumption in Norway. Technical report 2. Oslo and Akershus University College, Oslo. In Norwegian.

[16] Anders Kolstad, Özlem Özgöbek, Jon Atle Gulla, and Simon Litlehamar. 2017. Connected Closet - A Semantically Enriched Mobile Recommender System for Smart Closets. In Proceedings of the 13th International Conference on Web Information Systems and Technologies (WEBIST 2017). 298-305.

[17] Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets (2nd ed.). Cambridge University Press, New York, NY, USA.

[18] David Lewis. 1998. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. In Proceedings of the 10th European Conference on Machine Learning (ECML '98). Springer-Verlag, London, UK, 4-15.

[19] Chantima Limaksornkul, Duangkamol Na Nakorn, Onidta Rakmanee, and Wantanee Viriyasitavat. 2014. Smart Closet: Statistical-based apparel recommendation system. In Student Project Conference (ICT-ISPC), 2014 Third ICT International. IEEE, 155-158.

[20] John C Pruit. 2015. Getting Dressed. Popular Culture as Everyday Life (2015).

[21] Andrew I. Schein, Alexandrin Popescul, Lyle H. Ungar, and David M. Pennock. 2002. Methods and Metrics for Cold-start Recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02). ACM, New York, NY, USA, 253-260.

[22] Xiaoyuan Su and Taghi M. Khoshgoftaar. 2009. A Survey of Collaborative Filtering Techniques. Adv. in Artif. Intell. 2009, Article 4 (Jan. 2009).

[23] Andreu Vall. 2015. Listener-Inspired Automated Music Playlist Generation. In Proceedings of the 9th ACM Conference on Recommender Systems (RecSys '15). ACM, New York, NY, USA, 387-390.

[24] Lin Yu-Chu, Yuusuke Kawakita, Etsuko Suzuki, and Haruhisa Ichikawa. 2012. Personalized Clothing-Recommendation System Based on a Modified Bayesian Network. In Proceedings of the 2012 IEEE/IPSJ 12th International Symposium on Applications and the Internet (SAINT '12). IEEE Computer Society, Washington, DC, USA, 414-417.