Visualizable and explicable recommendations obtained from price estimation functions

Claudia Becerra, Fabio Gonzalez
Intelligent Systems Research Laboratory (LISI)
Universidad Nacional de Colombia, Bogota
{cjbecerrac, fagonzalezo}@unal.edu.co

Alexander Gelbukh
Center for Computing Research (CIC)
National Polytechnic Institute (IPN), Mexico DF, 07738, Mexico
gelbukh@gelbukh.com

ABSTRACT
Collaborative filtering is one of the most common approaches in current recommender systems. However, the historical data and customer profiles necessary for this approach are not always available. Similarly, new products are constantly launched to the market lacking historical information. We propose a new method to deal with these "cold start" scenarios: designing price-estimation functions used for making recommendations based on cost-benefit analysis. Experimental results, using a data set of 836 laptop descriptions, showed that such price-estimation functions can be learned from data. Moreover, they can also be used to formulate interpretable recommendations that explain to users how product features determine prices. Finally, a 2D visualization of the proposed recommender system is provided.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human information processing; H.4.2 [Types of Systems]: Decision support

General Terms
Experimentation

Keywords
Apriori recommendation, Cold-start recommendation, Price estimation functions

1. INTRODUCTION
The internet and e-commerce grow exponentially. As a result, the decision-making process about products and services is becoming increasingly complex. These processes involve hundreds and even thousands of choices and a growing number of heterogeneous features for each product. This is mainly due to the introduction and constant evolution of new markets, technologies and products.

Unfortunately, human capacity for decision-making is too limited to address the complexity of this scenario. Studies in the field of psychology have shown that human cognitive capacities are limited to five to nine alternatives for simultaneous comparison [17, 14]. Consequently, making a purchasing decision at an e-commerce store that does not provide tools to assist decision-making is a task that largely overwhelms human capacities. Moreover, several studies have shown that this problem generates adverse effects on people, such as regret over the selected option, dissatisfaction due to poor justification for the decision, uncertainty about the idea of "best option", and overload of time, attention and memory (see [20]).

Many recommender-system approaches have addressed this problem through collaborative filtering [6] based on product content (i.e. descriptions) and on customer information [2, 9]. This approach recommends products similar to those chosen by similar users. On the other hand, latent-semantics approaches [10] have been successfully used to build affinity measures between products and users. Most of the aforementioned approaches have been applied in domains with products, such as books and movies, that remain available long enough to collect enough historical data to build a model [4, 12].

While impressive progress has been made in the field using collaborative filtering, the relevance of current approaches in domains with frequent changes in products is still an open question [8]. For example, the customer-electronics domain is characterized by products with a very short life cycle in the market and a constant renewal of technologies and paradigms. Collaborative approaches face two major problems in this scenario [13]. First, product features are constantly redefined, making it difficult for users to identify relevant attributes. Second, historical product sales data become obsolete very quickly due to frequent product substitution. This problem of making automatic recommendations without historical data is known as cold-start recommendation [18].

In this paper, we propose a new cold-start method based on an estimate of the benefit to the user of purchasing a product. This function is formulated as the difference between estimated and real prices. Therefore, our approach recommends products with a high benefit-cost ratio to find "best deals" in a data set of products. Figure 1 shows an example of such recommendations, based on utility functions, displaying 900 laptop computers; the features of the laptops below the bold line, which indicates fair prices, do not justify their prices.

The rest of the paper is organized as follows. In Section 2, the necessary background and the proposed method are presented. In Section 3, an evaluation methodology and some data refinements are proposed and applied to the model. Finally, in Section 4, some concluding remarks are briefly discussed.

Copyright is held by the author/owner(s). RecSys'11, October 23-27, 2011, Chicago, Illinois, USA.

Figure 1: Graphic recommender based on price estimates (price estimation vs. actual price, with the "fair price" line). [figure omitted]
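The cost-benefit idea, formalized in the next section as utility(x_i) = f(x_i) - y_i, can be sketched in a few lines of Python. The product names, prices and externally supplied estimates below are made up for illustration; any price estimator can stand in for f.

```python
# Sketch of the cost-benefit ("best deal") ranking: a product is recommended
# when its estimated fair price exceeds its actual market price.
# All numbers below are illustrative, not taken from the paper's data set.

def utility(estimated_price, actual_price):
    # utility(x_i) = f(x_i) - y_i; positive values mean "underpriced"
    return estimated_price - actual_price

def rank_best_deals(products):
    # products: list of (name, estimated_price, actual_price) tuples
    return sorted(products, key=lambda p: utility(p[1], p[2]), reverse=True)

products = [
    ("laptop-A", 1438.0, 899.0),   # utility +539: a "best deal"
    ("laptop-B", 1319.0, 795.0),   # utility +524
    ("laptop-C", 1000.0, 1200.0),  # utility -200: overpriced
]

print([name for name, _, _ in rank_best_deals(products)])
# → ['laptop-A', 'laptop-B', 'laptop-C']
```

A list ordered this way is exactly the top-n "apriori recommendation" list the method produces; everything else in the paper is about learning a good, explainable f.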
2. APRIORI RECOMMENDATIONS USING UTILITY FUNCTIONS
The general intuition of the method is led by the lexicographical criterion [22]: users prefer products that offer more value for their money. Clearly, this approach is not applicable to all circumstances, but it is general enough when customer profiles are not available, as in cold-start scenarios.

When a user purchases a product x_i, a utility function utility(x_i) provides an estimation of the difference between the estimated price f(x_i) and the market price y_i, that is, utility(x_i) = f(x_i) - y_i. The products in the market are represented as a set X = {x_1, x_2, ..., x_i, ..., x_N}, where each product x_i is a vector in a feature space R^M. With these data, a regression model, learned from X and the vector of prices y, generates the price estimations f(x_i) required for the calculation of the utility. Finally, the utility function is computed for all products, thus providing an ordered list with the top-n apriori recommendations.

Estimates of the price f(x_i) can be obtained with a linear-regression model:

  f(x_i) = β_0 + Σ_{m ∈ {1,...,M}} β_m x_im.    (1)

This model is equivalent to the additive value function used in the decision-making model SAW (simple additive weighting) [5], but with the coefficients β_m learned automatically. Clearly, the recommendations obtained from these estimates can be explained to users, since each term β_m x_im represents the money contribution of the m-th feature of the i-th product to the final price estimate.

The quality of the apriori recommendations obtained with the proposed method depends primarily on three factors: the amount of training data, the accuracy of the price estimates f(x_i), and the ability to extract user-understandable explanations from the regression model. Certainly, linear models such as that of eq. 1 offer good interpretability, but in many cases they generate high error rates when the interactions among features are complex; such models are known as weak regression models. On the other hand, discriminative approaches such as support vector regression [19] provide models with lower error rates but also with lower interpretability.

This trade-off can be overcome with a hybrid regression model such as 3FML (three-function meta-learner) [3]. This meta-regressor combines two different regression methods into a new, improved combined model, in a way similar to other meta-learning algorithms such as voting, bagging and AdaBoost (see [1]). Unlike these methods, 3FML uses one regression method to make price predictions and another to predict the error. As long as the former regression method is weak, stable and interpretable, the latter can be any other regression method regardless of its interpretability. As a result, the combined regression preserves the interpretability of the first regressor but with a lower error rate.

A linear regression model can be trained to learn the parameters β_m by minimizing the least-squared error on the data [15]. This first model can be used by 3FML to build a base regression model f_0(x) on the full data set. Then, this model is used to divide the data into two additional groups, depending on whether the obtained price predictions were below or above the training price, given a difference threshold θ. Next, using the same base-regression method, two additional models f_+1(x) and f_-1(x) are trained on this pair of subsets; they are called, respectively, the upper model and the lower model. Figure 3 illustrates the upper, base and lower models compared to the target function, which is the price in a data set of laptop computers. The three resulting models are combined using an aggregation mechanism, called a mixture of experts [11], with the following expression:

  f̂(x_i) = Σ_{l ∈ H} w_l(x_i) f_l(x_i),    (2)

having

  Σ_{l ∈ H} w_l(x_i) = 1,  i ∈ {1, ..., n},    (3)

where H is a dictionary of experts consisting of the base model and the two additional specialized models, H = {f_-1(x), f_0(x), f_+1(x)}. The gating coefficient w_li establishes the relevance of the l-th model in the final price prediction for the i-th product.

Figure 3: 3FML's three regression models (upper, base and lower) against the target price, with the laptops sorted by actual price. [figure omitted]

In the 3FML model, the coefficients w_li = w_l(x_i) are obtained by chaining a membership function w_l for each regression model with a function α that depends on the errors of the three models: w_l(x_i) = w_l(α(f_-1(x_i), f_0(x_i), f_+1(x_i), y_i)). These membership functions w_l(α) are similar to those used in fuzzy sets [23], but they satisfy the constraint given by eq. 3. Three examples of such functions are shown in Figure 2: one triangular and two Gaussian. Clearly, the range of the error function α must agree with the domain of the membership functions. For instance, if the domain of the membership functions is [0, 1], an appropriate function α_i must return a value close to 0.5 when y_i is better modeled by f_0(x_i); similarly, reasonable values for α_i when y_i is better modeled by f_-1(x_i) or f_+1(x_i) are, respectively, 0.0 and 1.0.

Figure 2: Triangular and Gaussian membership functions. [figure omitted]

Such a function α can be constructed arithmetically (see [3] for the triangular and Gaussian cases), and α_i can be obtained for every x_i. 3FML uses a second regression method to learn a function for α_i, called the α-learner, which seeks to predict the same target y_i, but indirectly, through the errors obtained by f_-1, f_0 and f_+1. The estimates obtained with the α-learner are used in combination with the membership functions to get the coefficients w_l(x_i). Therefore, the final predictions are obtained with a different linear model for each target price y_i. The resulting model is also linear, but different for each product instance as a function of x_i:

  f̂(x_i) = β̂_0(x_i) + Σ_{m ∈ {1,...,M}} β̂_m(x_i) x_im,    (4)

where

  β̂_0(x_i) = Σ_{l ∈ H} β_l0 w_l(x_i);  β̂_m(x_i) = Σ_{l ∈ H} β_lm w_l(x_i).

Clearly, the model in eq. 4 is as user-explainable as that of eq. 1.

The effect of the α-learner in eq. 4 is that the entire data set is clustered into three latent classes. These classes can be considered as market segments, namely high-end, mid-range and low-end products. Many commercial markets exhibit this segmentation, e.g. computers, mobile phones and cars.

3. EXPERIMENTAL VALIDATION
3.1 Data
The data is a set of 836 laptop computers, each represented by a vector of 69 attributes including the price, which is the attribute to be estimated. The data were collected by Becerra¹ from several U.S. e-commerce sites (e.g. Pricegrabber, Cnet, Yahoo) within a month during the second half of 2007. A subset of 17 features was selected using the correlation-based selection method proposed by Hall [7]. We call this dataset Laptops 17 836; all its features and their percentages of missing values are shown in Table 1.

¹ http://unal.academia.edu/claudiabecerra/teaching

Table 1: Attributes in the Laptops 17 836 data set
  Feature name               Type      % missing
  Manufacturer               Nominal   0.00%
  Processor Speed            Numeric   0.40%
  Installed Memory           Numeric   1.90%
  Operating System           Nominal   0.00%
  Processor                  Nominal   0.20%
  Memory Technology          Nominal   7.20%
  Max Horizontal Resolution  Numeric   7.90%
  Warranty-Days              Numeric   15.50%
  Infrared                   Nominal   0.00%
  Bluetooth                  Nominal   0.00%
  Docking Station            Nominal   0.00%
  Port Replicator            Nominal   0.00%
  Fingerprint                Nominal   0.00%
  Subwoofer                  Nominal   0.00%
  External Battery           Nominal   0.00%
  CDMA                       Nominal   0.00%
  Price                      Numeric   0.00%

3.2 Price estimation results
For the construction of the price-estimation function, several regression methods were used, namely: least-mean-squares linear regression (LMS) [15], the M5P regression tree [21, 16], support vector regression (SVR) [19] and the three-function meta-learner (3FML) described in the previous section. 3FML provides three interpretable linear models (upper, base and lower), which can be associated with product classes. Finally, the estimated price for each laptop was obtained by combining these three models using eq. 4, with the weights obtained from the α-learner and Gaussian membership functions.
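As a concrete illustration, the training-and-combination pipeline just described can be sketched for a single numeric feature. This is a simplified sketch, not the paper's implementation: it uses closed-form simple linear regression in place of LMS, fixes the Gaussian membership centers at 0, 0.5 and 1, and takes α as given instead of learning it with an α-learner.

```python
import math

def fit_linear(xs, ys):
    # Closed-form simple linear regression y ≈ b0 + b1*x
    # (stand-in for the LMS base regressor).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return (my - b1 * mx, b1)

def predict(model, x):
    b0, b1 = model
    return b0 + b1 * x

def gaussian(u, mu, sigma=0.25):
    # Gaussian membership function (one of the shapes in Figure 2).
    return math.exp(-((u - mu) ** 2) / (2 * sigma ** 2))

def fit_3fml(xs, ys, theta=0.0):
    # 1) base model f0 trained on the full data set
    f0 = fit_linear(xs, ys)
    # 2) split the data by the residual y - f0(x) against threshold theta
    upper = [(x, y) for x, y in zip(xs, ys) if y - predict(f0, x) > theta]
    lower = [(x, y) for x, y in zip(xs, ys) if y - predict(f0, x) <= theta]
    # 3) specialized models f+1 (upper) and f-1 (lower), falling back to f0
    fp = fit_linear(*zip(*upper)) if len(upper) > 1 else f0
    fm = fit_linear(*zip(*lower)) if len(lower) > 1 else f0
    return (fm, f0, fp)  # dictionary of experts H = {f-1, f0, f+1}

def combine(models, alpha, x):
    # eq. (2): mixture of experts, with memberships centered at 0, 0.5 and 1
    # and normalized so the gating weights sum to one (eq. 3).
    ws = [gaussian(alpha, mu) for mu in (0.0, 0.5, 1.0)]
    s = sum(ws)
    return sum((w / s) * predict(m, x) for w, m in zip(ws, models))
```

In the full method, α comes from a second regressor trained on the errors of the three models; here one would pass, for example, alpha=0.5 for a product best explained by the base model.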
The aim of the experiments is to build a model that provides a cost-benefit ranking of a set of products, where each product is represented as a vector of features. To assess the quality of this ranking, two factors are observed. First, the error of the price-estimation regression should be low, to make sure that this function provides a reasonable explanation of the data set. Second, the model must be interpretable, and the discovered knowledge must be consistent with market data. For example, if a proposed model discovers a ranking of how much money each operating system contributes to laptop prices, this ranking should be in agreement with the prices of retail versions of the same operating systems. In addition, the full feature set of the top-10 recommended products is provided along with a 2D visualization of the entire data set. These resources allow the reader, guided by a brief discussion, to qualitatively evaluate the recommendations obtained with the proposed method.

The performance of each method was measured using the root-mean-square error (RMSE), defined as:

  RMSE = sqrt( (1/|X|) Σ_i ( f̂(x_i) - y_i )² ).

The data set was randomly divided into 75% for training and 25% for testing. Ten different runs of this partition were used for each method; the ten RMSE results were averaged and are reported in Table 2, together with their standard deviations (in parentheses) and some model parameters.

Table 2: 10-run average RMSE results for price estimates obtained with several regression methods
  Regression model                        Avg. RMSE
  M5P regression tree                     239.70 (21.57)
  Least Mean Squares (LMS)                259.87 (17.90)
  ε-SVR, C = 100, linear kernel           258.93 (16.93)
  ε-SVR, C = 100, RBF kernel, γ = 7.07    230.23 (12.27)
  3FML (LMS as base model)                233.48 (14.76)
  3FML (ε-SVR, C = 100, linear kernel)    223.76 (8.57)

The method with the lowest RMSE was SVR with a complexity parameter C = 100 using radial basis functions (RBF) as kernel. However, the interpretability of this model is quite limited, given the embedded feature space induced by the kernel. On the other hand, LMS and 3FML provide a straightforward interpretation of their coefficients, which represent the contribution of each feature to the estimated price of the product. Clearly, 3FML was the method that best coped with this interpretability-accuracy trade-off.

3.3 Evaluation and feedback
In this section the price-estimation function obtained using 3FML is manually analyzed, checking the coherence of its coefficients with real facts of the market. In particular, the coefficients for the operating-system attribute, the processor attribute and the numerical features are reviewed, and, when necessary, some refinements are proposed to the data sets to deal with the issues discussed.

3.3.1 Operating System attribute analysis
Table 3 shows the distribution of the different operating systems over the entire data set of laptops, together with the abbreviations that we use to refer to them in Table 4 and Table 5.

Table 3: Proportions of operating-system occurrences in the Laptops 17 836 data set
  Operating System                 #     %
  Vista Home Premium (WinVHP)      251   30.02%
  WinXP Pro (WinXPP)               208   24.88%
  WinXP (WinXP)                    151   18.06%
  Win. Vista Business (WinVB)      137   16.39%
  Win. Vista Home Basic (WinVHB)   44    5.26%
  Mac OS (MacOS)                   34    4.07%
  Win. Vista Ultimate (WinVU)      11    1.32%
  Total                            836   100%

In order to evaluate the portion of the price-estimation model related to the operating system (OS) attribute, the coefficients of this feature are compared with Microsoft's retail prices. Table 4 shows public retail prices for Windows Vista published in 2007-3Q. Although by that date the Windows Vista operating system had already been launched for six months, many brand-new laptops still had the previous Windows XP pre-installed. Thus, for the analysis we consider Windows XP Pro equivalent to Windows Vista Business, and Windows XP equivalent to Windows Vista Home Premium. This assumption is also coherent with the observed behavior of Microsoft's price policy, which keeps the prices of previous product releases invariable during version-transition periods.

Table 4: Retail prices for different editions of Windows Vista
  O.S.           WinVHB    WinVHP    WinVB     WinVU
  Retail price*  $199.95   $259.95   $299.95   $319.95
  * http://www.microsoft.com/windows/windows-vista/compare-editions (site consulted in September 2007)

Table 5: 3FML upper, base and lower model coefficients β_s.o. for the operating-system attribute
  Upper model      Base model       Lower model
  WinVU    323.3   WinVB    185.6   WinVB    127.3
  WinVB    260.3   WinXPP   184.5   WinXPP   127.0
  WinXPP   249.8   MacOS    169.2   MacOS     95.2
  MacOS    245.9   WinVU     96.4   WinVHP    24.7
  WinVHP   116.7   WinVHP    57.0   WinVU      0.0
  WinXP     94.3   WinXP     27.8   WinVHB     0.0
  WinVHB     0.0   WinVHB     0.0   WinXP     -7.1

It is interesting to highlight the behavior of the 3FML model with Windows Vista Ultimate. Although this OS version occurs in only 1.32% of the instances (see Table 3), it is correctly recognized as the most expensive OS (see Table 4) by the upper model. This corrects an erroneous tendency of the base and lower models. In general terms, for the other OS versions, 3FML managed to predict an ordering similar to that of the retail prices (see Table 5).

3.3.2 Processor attribute coefficients
As shown in Table 1, the Laptops 17 836 data set has two features describing the main processor of each laptop: Processor Speed (numeric) and Processor (nominal). The former is the processor clock rate; the latter is a text string that contains, in most cases, the manufacturer, the product family and the model (e.g. "Intel Core 2 Duo Mobile T7200"). Unlike the OS attribute, which has only seven possible alternatives, the Processor attribute has 133 possible processor models. Moreover, the frequencies of occurrence of the processor models exhibit a Zipf-type distribution (see Figure 4): approximately half of the 836 laptops have only 8 different processors, while more than 80 processors occur in only one laptop. Part of this sparseness is due to missing information, abbreviations and formatting.

Figure 4: Distribution of the Processor attribute (log frequency of the processor models ranked by data-set frequency). [figure omitted]

The Processor attribute, as found in the data set, can have a detrimental effect on the price-estimation function. Besides, its coefficients could hardly be explained, and their evaluation against market facts could lead to misleading results. Thus, the model designation was removed from the Processor attribute, which was renamed Proc. Family. In addition, the data set was manually enriched with the following four processor-related attributes:

• L2-Cache: processor cache in kibibytes (2^10 bytes).
• Hyper Transport: front-side bus clock rate in MHz.
• Thermal Design: maximum dissipated power in watts.
• Process Technology: CMOS technology in nanometres.

This new data set is referred to as the Laptops 21 836 data set. Performance results of the new price-estimation functions are shown in Table 6. Clearly, SVR and 3FML obtained substantial improvements using this new data set.

Table 6: RMSE for regression price estimates on the Laptops 21 836 data set
  Regression model                RMSE
  ε-SVR (C = 1, linear kernel)    254.56 (11.75)
  ε-SVR (C = 100, RBF kernel)     219.16 (9.88)
  3FML*                           220.91 (10.97)
  * Base model: ε-SVR, C = 1, linear kernel. α-learner: ε-SVR, C = 100, RBF kernel.

Similarly to the analysis made for the OS attribute, processor families also have a consumer-value ranking given by their technology, which can be compared with a ranking taken from an interpretable price-estimation function. The technology ranking of Intel processors is: (1st) Core 2 Duo, Core Duo, Core Solo, Pentium Dual Core and Celeron; the same for AMD processors: (1st) Turion, Athlon and Sempron.² We extracted an ordering of the processor families from their corresponding coefficients in the 3FML models. The results for this ranking (means and standard deviations over 10 runs with different 75% training / 25% test samples) are shown in Table 7.

² See http://www.notebookcheck.net/Notebook-Processors.129.0.html for a short description of mobile processor families (site consulted in June 2011).

Table 7: Processor-family rankings obtained from the 3FML price-estimation function
  Upper model               Base model                Lower model
  Intel Core2 Duo 7.4(0.8)  Intel Core Solo 8.5(0.7)  Intel Core2 Duo 7.6(0.5)
  Intel Core Duo  7.2(1.2)  Intel Core2 Duo 8.3(0.7)  Intel Core Solo 6.2(1.7)
  Intel Core Solo 5.3(2.1)  Intel Core Duo  6.8(0.9)  AMD Athlon      5.7(3.8)
  Intel Celeron   5.1(1.7)  Pent DualCore   5.1(1.4)  Intel Core Duo  4.8(2.1)
  PowerPC         4.6(2.6)  Intel Celeron   5.0(0.7)  AMD Turion      4.8(1.8)
  Pent DualCore   3.7(1.4)  AMD Turion      3.7(1.1)  Pent DualCore   4.3(2.1)
  AMD Sempron     3.4(2.4)  PowerPC         2.9(2.1)  PowerPC         4.3(3.1)
  AMD Turion      3.3(1.8)  AMD Sempron     2.6(1.8)  Intel Celeron   3.3(1.3)
  AMD Athlon      1.8(1.4)  AMD Athlon      2.1(1.3)  AMD Sempron     2.8(2.3)

The results in Table 7 show that the upper model better ordered the processor families with a high technological ranking. Similarly, the lower model does a comparable job, recognizing the Sempron family at the lowest rank.

3.3.3 Numerical attribute coefficients
This subsection presents a brief discussion of the interpretation of the coefficients extracted from the price-estimation function for some numeric attributes (shown in Table 8). Although this interpretation is clearly subjective, it reveals some laptop-market facts that were extracted in an unsupervised way from the data.

Table 8: Coefficients for numerical attributes from the 3FML model on the Laptops 21 836 data set
  Feature name                 Upper    Base     Lower
  β_0                          0.23     0.12     0.06
  Warranty Days                0.04     0.01    -0.01
  Installed Memory            -0.11     0.17     0.10
  Max. Horizontal Resolution   0.12     0.37     0.15
  Processor Tech.              0.30     0.08     0.05
  Thermal Design              -0.01    -0.37    -0.27
  Hyper Transport             -0.02     0.25     0.05
  L2-Cache                     0.08     0.16     0.11
  Processor Speed             -0.03     0.25     0.17

For instance, consider the Thermal Design attribute. The negative coefficients reveal a fact: the less power the CPU dissipates, the higher the laptop's price. Besides, these coefficients also show that this effect influences prices more in the mid-range and low-end segments of the laptop market. Similarly, the Max. Horizontal Resolution attribute reveals that this feature has a greater impact on mid-range laptop prices.

Interestingly, there is a phenomenon revealed by the features that are easily perceived by users, such as Installed Memory, Max. Horizontal Resolution (number of horizontal pixels on screen), L2-Cache and Processor Speed: those features have considerably less effect on prices in the high-end segment than in the mid-range and low-end segments. This phenomenon can be explained by the fact that "luxury" goods justify their price more by attributes such as brand label, exclusive features and physical appearance than by their configuration.

3.4 Recommendations for users
3.4.1 Top-10 recommendations
After the quantitative evaluation (i.e. regression error) and the qualitative assessment (i.e. agreement with market facts) of the 3FML model, the resulting functions provided reasonable price estimates and support elements for explainable recommendations, such as rankings and weights of attributes.

Figure 5: Visualization of the 836 laptops' recommendation ranking (profit vs. actual price). [figure omitted]
After obtaining the price estimates, the profit for each laptop is calculated as the difference between this estimate and the real price. Table 9 shows the top-10 recommendations with the highest profit among all 836 laptops.

The first and second top-ranked laptops have similar configurations, but even small differences make comparison difficult at first sight. The second laptop has a better price, more memory, and docking-station and port-replicator slots; the first, in contrast, has a higher screen resolution and a fingerprint sensor. These differences can be compared quantitatively with the help of the coefficients provided by the model. However, a better explanation of the #1 recommended choice is a market fact extracted from the obtained manufacturer ranking shown in Table 10: the three regression models rank the Lenovo brand above HP. Therefore, the first recommended laptop becomes a "best deal" given the standard prices of Lenovo at the time. Similarly, recommendations #7, #9 and #10 seem to get their high user profit not because of their configuration features, but because of their Sony label, which obtains better positions in the manufacturer ranking than its counterparts.

The second and third recommendations differ only in the Processor Speed attribute. Clearly, the estimated cost of that difference is the numerical difference between their estimated prices, which is $42; their real price difference, however, is $50. This explains the positions assigned by the recommender system to the #2 and #3 recommendations. More pair-wise comparisons and evaluations could be made, but they are omitted due to space limitations.

These paired comparisons become cognitively more difficult as the number of features, differences and instances increases. However, the proposed recommender method provides reasonable explanations no matter how much data is involved, and these explanations can be provided at the user's request. This is important because cold-start recommender systems need to establish trust in users due to the lack of collaborative support.

Table 9: Detailed top-10 ranked recommendations
  Recommendation rank  #1          #2          #3          #4           #5
  Horizontal Resol.    900 pixels  800 pixels  800 pixels  1536 pixels  768 pixels
  Memory Tech.         DDR2        DDR2        DDR2        DDR2         DDR2
  Inst. Memory         512 MB      1024 MB     1024 MB     1024 MB      1024 MB
  Family               Core Duo    Core Duo    Core Duo    Core2 Duo    Core2 Duo
  Processor Speed      1830 MHz    1830 MHz    2000 MHz    1500 MHz     2000 MHz
  L2 Cache             ?*          ?           ?           2048 kB      4096 kB
  Hyper Transp         ?           ?           ?           667 MHz      667 MHz
  Thermal Design       ?           ?           ?           35 W         34 W
  Process Tech.        ?           ?           ?           65 nm        65 nm
  Manufacturer         Lenovo      HP          HP          Lenovo       HP
  Op. System           WinXPP      WinXPP      WinXPP      WinVB        WinXPP
  Warranty Days        1095        1095        1095        365          1095
  IBDPFWC**            YNYYYNN     YYYYYNN     YYYYYNN     NNYYNNN      YYYYYNN
  Actual Price         $899        $795        $845        $875         $849
  Estimated Price      $1,438      $1,319      $1,361      $1,383       $1,332
  Profit               $539        $524        $516        $508         $483

  Recommendation rank  #6          #7          #8          #9           #10
  Horizontal Resol.    1050 pixels 800 pixels  800 pixels  800 pixels   800 pixels
  Memory Tech.         DDR2        DDR2        DDR2        DDR2         DDR2
  Inst. Memory         1024 MB     1024 MB     1024 MB     512 MB       1024 MB
  Family               Core Duo    Core2 Duo   Core Duo    Core Duo     Core2 Duo
  Processor Speed      2000 MHz    2160 MHz    1830 MHz    1660 MHz     2000 MHz
  L2 Cache             2048 kB     4096 kB     ?           2048 kB      4096 kB
  Hyper Transp         667 MHz     667 MHz     ?           667 MHz      800 MHz
  Thermal Design       31 W        34 W        ?           31 W         35 W
  Process Tech.        65 nm       65 nm       ?           65 nm        65 nm
  Manufacturer         Lenovo      Sony        HP          Sony         Sony
  Op. System           WinXPP      WinXPP      WinXPP      WinXPP       WinVB
  Warranty Days        365         365         1095        365          365
  IBDPFWC**            NYNNYNN     NYYYNNN     YYYYYNN     NNYYNNN      NYYYYNN
  Actual Price         $845        $1,060      $868        $649         $1,080
  Estimated Price      $1,312      $1,526      $1,319      $1,093       $1,522
  Profit               $467        $466        $451        $444         $442
  * A question mark stands for a missing value.
  ** The initials I B D P F W C stand for Infrared, Bluetooth, Docking Station, Port Replicator, Fingerprint, Subwoofer and CDMA.

3.4.2 2D visualization
Ordered lists are the most common format for presenting recommendations to users. However, even with such an ordering, establishing the most appropriate choice for a particular user is a difficult task. Therefore, we propose a novel visualization method for our recommender system. Its purpose is to enable users to build a mental model of the market: when users do not have a clear aim or a defined budget, this tool provides a rough idea of the number of options and their prices. In addition, the visualization can support short-term memory, decreasing cognitive load, and highlight the recommended options.

The proposed 2D visualization is shown in Figure 5. The horizontal axis represents the actual price and the vertical axis represents the profit, which is the difference between the estimated and the actual price. Each laptop is represented as a bubble, where a larger radius and warmer colors (darker gray in the grayscale version) mean a higher profit-price difference. Besides, the ranking number is included for the top-99 recommendations.

This visualization highlights other "best deals" that are hidden in the ranking list. For instance, consider recommendation #53 (see Figure 5 at the coordinates $1550 price and $260 profit); perhaps this is an interesting option if the user's budget is over $1500. Similarly, recommendation #26 can be quickly identified as the best option for a buyer on a low budget.

The proposed visualization also allows a qualitative assessment of the price-estimation function. For instance, for the laptops above $1300 the function has difficulties predicting prices using the current set of features, while it appears to be very effective for mid-range prices. This problem could be addressed by identifying and adding to the attribute set the distinctive features of high-end laptops, namely: shockproof devices, special designs, colors, housing materials, exclusive options, etc.
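The two explanation mechanisms discussed above, per-feature price contributions and a budget-constrained "best deal" query, can be sketched as follows. The coefficients and feature values are hypothetical stand-ins; in the actual system they come from the fitted 3FML models (cf. Tables 5, 8 and 10) and differ per product instance.

```python
# Hypothetical per-feature coefficients (dollars per unit); in the actual
# system these are the beta_m(x_i) of eq. 4, different for each laptop.
COEFS = {"installed_memory_gb": 120.0, "horizontal_pixels": 0.2, "warranty_days": 0.3}
INTERCEPT = 400.0

def estimate_price(features):
    # eq. (1)/(4): price = b0 + sum_m b_m * x_m
    return INTERCEPT + sum(COEFS[k] * v for k, v in features.items())

def explain(features):
    # dollar contribution of each feature to the estimated price
    return {k: COEFS[k] * v for k, v in features.items()}

def best_deal_under(products, budget):
    # highest-profit product (estimate minus actual price) within the budget;
    # products: list of (name, features, actual_price)
    affordable = [p for p in products if p[2] <= budget]
    return max(affordable, key=lambda p: estimate_price(p[1]) - p[2], default=None)

laptop = {"installed_memory_gb": 1.0, "horizontal_pixels": 800, "warranty_days": 365}
print(explain(laptop))
# {'installed_memory_gb': 120.0, 'horizontal_pixels': 160.0, 'warranty_days': 109.5}
```

This is exactly the kind of dollar-denominated breakdown the paper argues makes a cold-start recommendation explainable, and the budget query mirrors how a user reads Figure 5 (e.g. recommendation #26 as the best low-budget option).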
Table 10: Average ranking of the Manufacturer attribute using 3FML on the Laptops 21 836 data set
  Base model           Upper model          Lower model
  Asus      10.6(0.7)  Dell      10.0(0.9)  Asus      10.2(0.8)
  Sony       9.6(1.2)  Fujitsu    9.2(1.0)  Fujitsu    9.6(1.5)
  Fujitsu    8.4(1.4)  Sony       7.4(1.8)  Sony       8.1(1.2)
  Dell       7.9(0.9)  Asus       6.8(1.8)  Dell       7.3(1.4)
  Apple      7.0(2.4)  Apple      6.1(1.3)  Apple      6.7(2.0)
  Lenovo     6.5(1.6)  Lenovo     4.1(2.4)  Lenovo     5.2(1.5)
  Toshiba    5.0(1.2)  Gateway    4.0(2.9)  Toshiba    4.7(1.5)
  Acer       4.7(1.2)  Averatec   3.8(1.3)  Acer       3.9(1.3)
  HP         2.3(0.7)  Acer       3.6(1.3)  HP         3.4(1.7)
  Averatec   2.1(1.3)  HP         3.3(1.9)  Gateway    1.9(1.0)
  Gateway    1.9(0.8)  Toshiba    2.4(1.5)  Averatec   1.9(2.1)

4. CONCLUSIONS
We presented a novel product recommender system based on an interpretable price-estimation function, which estimates the economic benefit for the customer of buying a product in a particular market. Accurate and interpretable price estimations were obtained using the 3FML (three-function meta-learner) method. This regression method allows the combination of an interpretable regressor (e.g. LMS) to estimate prices with an uninterpretable regressor (e.g. SVR) to identify the latent class of each product. The combined model obtained better price estimates than LMS, SVR and the M5P regression tree, while keeping a high level of interpretability.

The proposed method was tested with real-market data from a data set of laptops. The obtained price-estimation model was interpretable, allowing evaluation and refinement by domain experts and ensuring that the price estimates are a coherent consequence of the product features. In addition, the obtained recommendations are easy for users to understand; for instance, feature rankings (e.g. a ranking of CPUs) and feature price contributions (e.g. the cost per GB of main memory) are provided. Importantly, while the price estimates are obtained in a supervised way, other domain knowledge is extracted in an unsupervised way. Although the proposed method was tested in a particular domain (laptops), the same process can be applied to other domains that exhibit a similar number of options and features.

Moreover, a user-friendly visualization method for recommendations was proposed, using a 2D Cartesian metaphor and concrete variables such as cost and profit. This visualization allows users to make a quick mental map of a large market, to explore it and to identify recommendations in different price ranges.

In conclusion, the proposed method is flexible and can be useful in e-commerce scenarios with products that allow the construction of price-estimation functions, such as customer-electronics products. Finally, our method fills a gap where recommender systems based on historical information fail because of the lack of such information.

5. ACKNOWLEDGEMENTS
This research is funded in part by the Bogota Research Division (DIB) at the National University of Colombia, and through a grant from Colciencias, project 110152128465.

6. REFERENCES
[1] E. Alpaydin. Introduction to Machine Learning. The MIT Press, October 2004.
[2] M. Balabanovic and Y. Shoham. Fab: Content-based collaborative recommendation. Communications of the ACM, 40(3):66-72, 1997.
[3] C. Becerra and F. Gonzalez. 3-functions meta-learner algorithm: a mixture of experts technique to improve regression models. In DMIN'08: Proceedings of the 4th International Conference on Data Mining, Las Vegas.
[6] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992.
[7] M. A. Hall. Correlation-based feature selection for discrete and numeric class machine learning. In ICML '00: Proceedings of the 17th International Conference on Machine Learning, pages 359-366, San Francisco, CA, USA, 2000.
[8] E. Han and G. Karypis. Feature-based recommendation system. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM), pages 446-452, 2005.
[9] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd ACM SIGIR Conference on Information Retrieval, pages 230-237, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, pages 89-115, 2004.
[11] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79-87, 1991.
[12] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[13] M. Mandl, A. Felfernig, E. Teppan, and M. Schubert. Consumer decision making in knowledge-based recommendation. Journal of Intelligent Information Systems, 2010.
[14] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81-97, 1956.
[15] D. C. Montgomery, E. A. Peck, and G. G. Vining. Introduction to Linear Regression Analysis. Wiley-Interscience, 2006.
[16] R. J. Quinlan. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence, pages 343-348, 1992.
[17] T. L. Saaty and M. S. Ozdemir. Why the magic number seven plus or minus two. Mathematical and Computer Modelling, 38(3):233-244, 2003.
[18] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In SIGIR, 2002.
[19] B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[20] B. Schwartz. The tyranny of choice. Scientific American, pages 71-75, April 2004.
[21] Y. Wang and I. H. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the 9th European Conference on Machine Learning. Springer, 1997.
[22] K. Yoon and C. Hwang. Multiple attribute decision making: An introduction.
Sage University papers, NV, USA., 2008. 7(104), 1995. [4] J. Bennet and S. Lanning. The netflix prize. In KDD [23] L. Zadeh. Fuzzy Sets, Fuzzy Logic and Fuzzy Systems. Cup and Workshop, 2007. World Scientific, 1996. [5] R. T. Eckenrode. Weighting multiple criteria. Management Sciences, 12:180–192, 1965. 34