Visualizable and explicable recommendations obtained from price estimation functions

Claudia Becerra, Fabio Gonzalez
Intelligent Systems Research Laboratory (LISI)
Universidad Nacional de Colombia, Bogota
{cjbecerrac, fagonzalezo}@unal.edu.co

Alexander Gelbukh
Center for Computing Research (CIC)
National Polytechnic Institute (IPN), Mexico DF, 07738, Mexico
gelbukh@gelbukh.com

ABSTRACT
Collaborative filtering is one of the most common approaches in current recommender systems. However, the historical data and customer profiles necessary for this approach are not always available. Similarly, new products are constantly launched to the market lacking historical information. We propose a new method to deal with these "cold start" scenarios: designing price-estimation functions used for making recommendations based on cost-benefit analysis. Experimental results, using a data set of 836 laptop descriptions, showed that such price-estimation functions can be learned from data. Moreover, they can also be used to formulate interpretable recommendations that explain to users how product features determine prices. Finally, a 2D visualization of the proposed recommender system is provided.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human information processing; H.4.2 [Types of Systems]: Decision support

General Terms
Experimentation

Keywords
Apriori recommendation, Cold-start recommendation, Price estimation functions

1. INTRODUCTION
The internet and e-commerce grow exponentially. As a result, the decision-making process about products and services is becoming increasingly complex. These processes involve hundreds and even thousands of choices and a growing number of heterogeneous features for each product. This is mainly due to the introduction and constant evolution of new markets, technologies and products.

Unfortunately, human capacity for decision-making is too limited to address the complexity of this scenario. Studies in the field of psychology have shown that human cognitive capacities are limited to five to nine alternatives for simultaneous comparison [17, 14]. Consequently, making a purchasing decision at an e-commerce store that does not provide tools to assist decision-making is a task that largely overwhelms human capacities. Moreover, several studies have shown that this problem generates adverse effects on people, such as regret over the selected option, dissatisfaction due to poor justification for the decision, uncertainty about the idea of "best option", and overload of time, attention and memory (see [20]).

Many recommender-system approaches have addressed this problem through collaborative filtering [6] based on product content (i.e. descriptions) and on customer information [2, 9]. This approach recommends products similar to those chosen by similar users. On the other hand, latent-semantics approaches [10] have been successfully used to build affinity measures between products and users. Most of the aforementioned approaches have been applied in domains with products, such as books and movies, that remain available long enough to collect enough historical data to build a model [4, 12].

While impressive progress has been made in the field using collaborative filtering, the relevance of current approaches in domains with frequent changes in products is still an open question [8]. For example, the customer-electronics domain is characterized by products with a very short life cycle in the market and a constant renewal of technologies and paradigms. Collaborative approaches face two major problems in this scenario [13]. First, product features are constantly redefined, making it difficult for users to identify relevant attributes. Second, historical product sales data become obsolete very quickly due to frequent product substitution. This problem of making automatic recommendations without historical data is known as cold-start recommendation [18].

In this paper, we propose a new cold-start method based on an estimate of the benefit to the user of purchasing a product. This function is formulated as the difference between estimated and real prices. Therefore, our approach recommends products with a high benefit-cost ratio to find "best deals" in a data set of products. Figure 1 shows an example of such recommendations, based on utility functions, displaying 900 laptop computers; the features of the laptops below the bold line, which indicates fair prices, do not justify their prices.

The rest of the paper is organized as follows. In Section 2, the necessary background and the proposed method are presented. In Section 3, an evaluation methodology and some data refinements are proposed and applied to the model. Finally, in Section 4, some concluding remarks are briefly discussed.

Copyright is held by the author/owner(s). RecSys'11, October 23-27, 2011, Chicago, Illinois, USA.

Figure 1: Graphic recommender based on price estimates (price estimation vs. actual price, with the "fair price" line). [figure omitted]
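The cost-benefit idea, formalized in the next section as utility(x_i) = f(x_i) - y_i, can be sketched in a few lines of Python. The product names, prices and externally supplied estimates below are made up for illustration; any price estimator can stand in for f.

```python
# Sketch of the cost-benefit ("best deal") ranking: a product is recommended
# when its estimated fair price exceeds its actual market price.
# All numbers below are illustrative, not taken from the paper's data set.

def utility(estimated_price, actual_price):
    # utility(x_i) = f(x_i) - y_i; positive values mean "underpriced"
    return estimated_price - actual_price

def rank_best_deals(products):
    # products: list of (name, estimated_price, actual_price) tuples
    return sorted(products, key=lambda p: utility(p[1], p[2]), reverse=True)

products = [
    ("laptop-A", 1438.0, 899.0),   # utility +539: a "best deal"
    ("laptop-B", 1319.0, 795.0),   # utility +524
    ("laptop-C", 1000.0, 1200.0),  # utility -200: overpriced
]

print([name for name, _, _ in rank_best_deals(products)])
# → ['laptop-A', 'laptop-B', 'laptop-C']
```

A list ordered this way is exactly the top-n "apriori recommendation" list the method produces; everything else in the paper is about learning a good, explainable f.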
2. APRIORI RECOMMENDATIONS USING UTILITY FUNCTIONS
The general intuition of the method is led by the lexicographical criterion [22]: users prefer products that offer more value for their money. Clearly, this approach is not applicable to all circumstances, but it is general enough when customer profiles are not available, as in cold-start scenarios.

When a user purchases a product x_i, a utility function utility(x_i) provides an estimation of the difference between the estimated price f(x_i) and the market price y_i, that is, utility(x_i) = f(x_i) - y_i. The products in the market are represented as a set X = {x_1, x_2, ..., x_i, ..., x_N}, where each product x_i is a vector in a feature space R^M. With these data, a regression model, learned from X and the vector of prices y, generates the price estimations f(x_i) required for the calculation of the utility. Finally, the utility function is computed for all products, thus providing an ordered list with the top-n apriori recommendations.

Estimates of the price f(x_i) can be obtained with a linear-regression model:

  f(x_i) = β_0 + Σ_{m ∈ {1,...,M}} β_m x_im.    (1)

This model is equivalent to the additive value function used in the decision-making model SAW (simple additive weighting) [5], but with the coefficients β_m learned automatically. Clearly, the recommendations obtained from these estimates can be explained to users, since each term β_m x_im represents the money contribution of the m-th feature of the i-th product to the final price estimate.

The quality of the apriori recommendations obtained with the proposed method depends primarily on three factors: the amount of training data, the accuracy of the price estimates f(x_i), and the ability to extract user-understandable explanations from the regression model. Certainly, linear models such as that of eq. 1 offer good interpretability, but in many cases they generate high error rates when the interactions among features are complex; such models are known as weak regression models. On the other hand, discriminative approaches such as support vector regression [19] provide models with lower error rates but also with lower interpretability.

This trade-off can be overcome with a hybrid regression model such as 3FML (three-function meta-learner) [3]. This meta-regressor combines two different regression methods into a new, improved combined model, in a way similar to other meta-learning algorithms such as voting, bagging and AdaBoost (see [1]). Unlike these methods, 3FML uses one regression method to make price predictions and another to predict the error. As long as the former regression method is weak, stable and interpretable, the latter can be any other regression method regardless of its interpretability. As a result, the combined regression preserves the interpretability of the first regressor but with a lower error rate.

A linear regression model can be trained to learn the parameters β_m by minimizing the least-squared error on the data [15]. This first model can be used by 3FML to build a base regression model f_0(x) on the full data set. Then, this model is used to divide the data into two additional groups, depending on whether the obtained price predictions were below or above the training price, given a difference threshold θ. Next, using the same base-regression method, two additional models f_+1(x) and f_-1(x) are trained on this pair of subsets; they are called, respectively, the upper model and the lower model. Figure 3 illustrates the upper, base and lower models compared to the target function, which is the price in a data set of laptop computers. The three resulting models are combined using an aggregation mechanism, called a mixture of experts [11], with the following expression:

  f̂(x_i) = Σ_{l ∈ H} w_l(x_i) f_l(x_i),    (2)

having

  Σ_{l ∈ H} w_l(x_i) = 1,  i ∈ {1, ..., n},    (3)

where H is a dictionary of experts consisting of the base model and the two additional specialized models, H = {f_-1(x), f_0(x), f_+1(x)}. The gating coefficient w_li establishes the relevance of the l-th model in the final price prediction for the i-th product.

Figure 3: 3FML's three regression models (upper, base and lower) against the target price, with the laptops sorted by actual price. [figure omitted]

In the 3FML model, the coefficients w_li = w_l(x_i) are obtained by chaining a membership function w_l for each regression model with a function α that depends on the errors of the three models: w_l(x_i) = w_l(α(f_-1(x_i), f_0(x_i), f_+1(x_i), y_i)). These membership functions w_l(α) are similar to those used in fuzzy sets [23], but they satisfy the constraint given by eq. 3. Three examples of such functions are shown in Figure 2: one triangular and two Gaussian. Clearly, the range of the error function α must agree with the domain of the membership functions. For instance, if the domain of the membership functions is [0, 1], an appropriate function α_i must return a value close to 0.5 when y_i is better modeled by f_0(x_i); similarly, reasonable values for α_i when y_i is better modeled by f_-1(x_i) or f_+1(x_i) are, respectively, 0.0 and 1.0.

Figure 2: Triangular and Gaussian membership functions. [figure omitted]

Such a function α can be constructed arithmetically (see [3] for the triangular and Gaussian cases), and α_i can be obtained for every x_i. 3FML uses a second regression method to learn a function for α_i, called the α-learner, which seeks to predict the same target y_i, but indirectly, through the errors obtained by f_-1, f_0 and f_+1. The estimates obtained with the α-learner are used in combination with the membership functions to get the coefficients w_l(x_i). Therefore, the final predictions are obtained with a different linear model for each target price y_i. The resulting model is also linear, but different for each product instance as a function of x_i:

  f̂(x_i) = β̂_0(x_i) + Σ_{m ∈ {1,...,M}} β̂_m(x_i) x_im,    (4)

where

  β̂_0(x_i) = Σ_{l ∈ H} β_l0 w_l(x_i);  β̂_m(x_i) = Σ_{l ∈ H} β_lm w_l(x_i).

Clearly, the model in eq. 4 is as user-explainable as that of eq. 1.

The effect of the α-learner in eq. 4 is that the entire data set is clustered into three latent classes. These classes can be considered as market segments, namely high-end, mid-range and low-end products. Many commercial markets exhibit this segmentation, e.g. computers, mobile phones and cars.

3. EXPERIMENTAL VALIDATION
3.1 Data
The data is a set of 836 laptop computers, each represented by a vector of 69 attributes including the price, which is the attribute to be estimated. The data were collected by Becerra¹ from several U.S. e-commerce sites (e.g. Pricegrabber, Cnet, Yahoo) within a month during the second half of 2007. A subset of 17 features was selected using the correlation-based selection method proposed by Hall [7]. We call this dataset Laptops 17 836; all its features and their percentages of missing values are shown in Table 1.

¹ http://unal.academia.edu/claudiabecerra/teaching

Table 1: Attributes in the Laptops 17 836 data set
  Feature name               Type      % missing
  Manufacturer               Nominal   0.00%
  Processor Speed            Numeric   0.40%
  Installed Memory           Numeric   1.90%
  Operating System           Nominal   0.00%
  Processor                  Nominal   0.20%
  Memory Technology          Nominal   7.20%
  Max Horizontal Resolution  Numeric   7.90%
  Warranty-Days              Numeric   15.50%
  Infrared                   Nominal   0.00%
  Bluetooth                  Nominal   0.00%
  Docking Station            Nominal   0.00%
  Port Replicator            Nominal   0.00%
  Fingerprint                Nominal   0.00%
  Subwoofer                  Nominal   0.00%
  External Battery           Nominal   0.00%
  CDMA                       Nominal   0.00%
  Price                      Numeric   0.00%

3.2 Price estimation results
For the construction of the price-estimation function, several regression methods were used, namely: least-mean-squares linear regression (LMS) [15], the M5P regression tree [21, 16], support vector regression (SVR) [19] and the three-function meta-learner (3FML) described in the previous section. 3FML provides three interpretable linear models (upper, base and lower), which can be associated with product classes. Finally, the estimated price for each laptop was obtained by combining these three models using eq. 4, with the weights obtained from the α-learner and Gaussian membership functions.
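As a concrete illustration, the training-and-combination pipeline just described can be sketched for a single numeric feature. This is a simplified sketch, not the paper's implementation: it uses closed-form simple linear regression in place of LMS, fixes the Gaussian membership centers at 0, 0.5 and 1, and takes α as given instead of learning it with an α-learner.

```python
import math

def fit_linear(xs, ys):
    # Closed-form simple linear regression y ≈ b0 + b1*x
    # (stand-in for the LMS base regressor).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return (my - b1 * mx, b1)

def predict(model, x):
    b0, b1 = model
    return b0 + b1 * x

def gaussian(u, mu, sigma=0.25):
    # Gaussian membership function (one of the shapes in Figure 2).
    return math.exp(-((u - mu) ** 2) / (2 * sigma ** 2))

def fit_3fml(xs, ys, theta=0.0):
    # 1) base model f0 trained on the full data set
    f0 = fit_linear(xs, ys)
    # 2) split the data by the residual y - f0(x) against threshold theta
    upper = [(x, y) for x, y in zip(xs, ys) if y - predict(f0, x) > theta]
    lower = [(x, y) for x, y in zip(xs, ys) if y - predict(f0, x) <= theta]
    # 3) specialized models f+1 (upper) and f-1 (lower), falling back to f0
    fp = fit_linear(*zip(*upper)) if len(upper) > 1 else f0
    fm = fit_linear(*zip(*lower)) if len(lower) > 1 else f0
    return (fm, f0, fp)  # dictionary of experts H = {f-1, f0, f+1}

def combine(models, alpha, x):
    # eq. (2): mixture of experts, with memberships centered at 0, 0.5 and 1
    # and normalized so the gating weights sum to one (eq. 3).
    ws = [gaussian(alpha, mu) for mu in (0.0, 0.5, 1.0)]
    s = sum(ws)
    return sum((w / s) * predict(m, x) for w, m in zip(ws, models))
```

In the full method, α comes from a second regressor trained on the errors of the three models; here one would pass, for example, alpha=0.5 for a product best explained by the base model.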
The aim of the experiments is to build a model that provides a cost-benefit ranking of a set of products, where each product is represented as a vector of features. To assess the quality of this ranking, two factors are observed. First, the error of the price-estimation regression should be low, to make sure that this function provides a reasonable explanation of the data set. Second, the model must be interpretable, and the discovered knowledge must be consistent with market data. For example, if a proposed model discovers a ranking of how much money each operating system contributes to laptop prices, this ranking should be in agreement with the prices of retail versions of the same operating systems. In addition, the full feature set of the top-10 recommended products is provided along with a 2D visualization of the entire data set. These resources allow the reader, guided by a brief discussion, to qualitatively evaluate the recommendations obtained with the proposed method.

The performance of each method was measured using the root-mean-square error (RMSE), defined as:

  RMSE = sqrt( (1/|X|) Σ_i ( f̂(x_i) - y_i )² ).

The data set was randomly divided into 75% for training and 25% for testing. Ten different runs of this partition were used for each method; the ten RMSE results were averaged and are reported in Table 2, together with their standard deviations (in parentheses) and some model parameters.

Table 2: 10-run average RMSE results for price estimates obtained with several regression methods
  Regression model                        Avg. RMSE
  M5P regression tree                     239.70 (21.57)
  Least Mean Squares (LMS)                259.87 (17.90)
  ε-SVR, C = 100, linear kernel           258.93 (16.93)
  ε-SVR, C = 100, RBF kernel, γ = 7.07    230.23 (12.27)
  3FML (LMS as base model)                233.48 (14.76)
  3FML (ε-SVR, C = 100, linear kernel)    223.76 (8.57)

The method with the lowest RMSE was SVR with a complexity parameter C = 100 using radial basis functions (RBF) as kernel. However, the interpretability of this model is quite limited, given the embedded feature space induced by the kernel. On the other hand, LMS and 3FML provide a straightforward interpretation of their coefficients, which represent the contribution of each feature to the estimated price of the product. Clearly, 3FML was the method that best coped with this interpretability-accuracy trade-off.

3.3 Evaluation and feedback
In this section the price-estimation function obtained using 3FML is manually analyzed, checking the coherence of its coefficients with real facts of the market. In particular, the coefficients for the operating-system attribute, the processor attribute and the numerical features are reviewed, and, when necessary, some refinements are proposed to the data sets to deal with the issues discussed.

3.3.1 Operating System attribute analysis
Table 3 shows the distribution of the different operating systems over the entire data set of laptops, together with the abbreviations that we use to refer to them in Table 4 and Table 5.

Table 3: Proportions of operating-system occurrences in the Laptops 17 836 data set
  Operating System                 #     %
  Vista Home Premium (WinVHP)      251   30.02%
  WinXP Pro (WinXPP)               208   24.88%
  WinXP (WinXP)                    151   18.06%
  Win. Vista Business (WinVB)      137   16.39%
  Win. Vista Home Basic (WinVHB)   44    5.26%
  Mac OS (MacOS)                   34    4.07%
  Win. Vista Ultimate (WinVU)      11    1.32%
  Total                            836   100%

In order to evaluate the portion of the price-estimation model related to the operating system (OS) attribute, the coefficients of this feature are compared with Microsoft's retail prices. Table 4 shows public retail prices for Windows Vista published in 2007-3Q. Although by that date the Windows Vista operating system had already been launched for six months, many brand-new laptops still had the previous Windows XP pre-installed. Thus, for the analysis we consider Windows XP Pro equivalent to Windows Vista Business, and Windows XP equivalent to Windows Vista Home Premium. This assumption is also coherent with the observed behavior of Microsoft's price policy, which keeps the prices of previous product releases invariable during version-transition periods.

Table 4: Retail prices for different editions of Windows Vista
  O.S.           WinVHB    WinVHP    WinVB     WinVU
  Retail price*  $199.95   $259.95   $299.95   $319.95
  * http://www.microsoft.com/windows/windows-vista/compare-editions (site consulted in September 2007)

Table 5: 3FML upper, base and lower model coefficients β_s.o. for the operating-system attribute
  Upper model      Base model       Lower model
  WinVU    323.3   WinVB    185.6   WinVB    127.3
  WinVB    260.3   WinXPP   184.5   WinXPP   127.0
  WinXPP   249.8   MacOS    169.2   MacOS     95.2
  MacOS    245.9   WinVU     96.4   WinVHP    24.7
  WinVHP   116.7   WinVHP    57.0   WinVU      0.0
  WinXP     94.3   WinXP     27.8   WinVHB     0.0
  WinVHB     0.0   WinVHB     0.0   WinXP     -7.1

It is interesting to highlight the behavior of the 3FML model with Windows Vista Ultimate. Although this OS version occurs in only 1.32% of the instances (see Table 3), it is correctly recognized as the most expensive OS (see Table 4) by the upper model. This corrects an erroneous tendency of the base and lower models. In general terms, for the other OS versions, 3FML managed to predict an ordering similar to that of the retail prices (see Table 5).

3.3.2 Processor attribute coefficients
As shown in Table 1, the Laptops 17 836 data set has two features describing the main processor of each laptop: Processor Speed (numeric) and Processor (nominal). The former is the processor clock rate; the latter is a text string that contains, in most cases, the manufacturer, the product family and the model (e.g. "Intel Core 2 Duo Mobile T7200"). Unlike the OS attribute, which has only seven possible alternatives, the Processor attribute has 133 possible processor models. Moreover, the frequencies of occurrence of the processor models exhibit a Zipf-type distribution (see Figure 4): approximately half of the 836 laptops have only 8 different processors, while more than 80 processors occur in only one laptop. Part of this sparseness is due to missing information, abbreviations and formatting.

Figure 4: Distribution of the Processor attribute (log frequency of the processor models ranked by data-set frequency). [figure omitted]

The Processor attribute, as found in the data set, can have a detrimental effect on the price-estimation function. Besides, its coefficients could hardly be explained, and their evaluation against market facts could lead to misleading results. Thus, the model designation was removed from the Processor attribute, which was renamed Proc. Family. In addition, the data set was manually enriched with the following four processor-related attributes:

• L2-Cache: processor cache in kibibytes (2^10 bytes).
• Hyper Transport: front-side bus clock rate in MHz.
• Thermal Design: maximum dissipated power in watts.
• Process Technology: CMOS technology in nanometres.

This new data set is referred to as the Laptops 21 836 data set. Performance results of the new price-estimation functions are shown in Table 6. Clearly, SVR and 3FML obtained substantial improvements using this new data set.

Table 6: RMSE for regression price estimates on the Laptops 21 836 data set
  Regression model                RMSE
  ε-SVR (C = 1, linear kernel)    254.56 (11.75)
  ε-SVR (C = 100, RBF kernel)     219.16 (9.88)
  3FML*                           220.91 (10.97)
  * Base model: ε-SVR, C = 1, linear kernel. α-learner: ε-SVR, C = 100, RBF kernel.

Similarly to the analysis made for the OS attribute, processor families also have a consumer-value ranking given by their technology, which can be compared with a ranking taken from an interpretable price-estimation function. The technology ranking of Intel processors is: (1st) Core 2 Duo, Core Duo, Core Solo, Pentium Dual Core and Celeron; the same for AMD processors: (1st) Turion, Athlon and Sempron.² We extracted an ordering of the processor families from their corresponding coefficients in the 3FML models. The results for this ranking (means and standard deviations over 10 runs with different 75% training / 25% test samples) are shown in Table 7.

² See http://www.notebookcheck.net/Notebook-Processors.129.0.html for a short description of mobile processor families (site consulted in June 2011).

Table 7: Processor-family rankings obtained from the 3FML price-estimation function
  Upper model               Base model                Lower model
  Intel Core2 Duo 7.4(0.8)  Intel Core Solo 8.5(0.7)  Intel Core2 Duo 7.6(0.5)
  Intel Core Duo  7.2(1.2)  Intel Core2 Duo 8.3(0.7)  Intel Core Solo 6.2(1.7)
  Intel Core Solo 5.3(2.1)  Intel Core Duo  6.8(0.9)  AMD Athlon      5.7(3.8)
  Intel Celeron   5.1(1.7)  Pent DualCore   5.1(1.4)  Intel Core Duo  4.8(2.1)
  PowerPC         4.6(2.6)  Intel Celeron   5.0(0.7)  AMD Turion      4.8(1.8)
  Pent DualCore   3.7(1.4)  AMD Turion      3.7(1.1)  Pent DualCore   4.3(2.1)
  AMD Sempron     3.4(2.4)  PowerPC         2.9(2.1)  PowerPC         4.3(3.1)
  AMD Turion      3.3(1.8)  AMD Sempron     2.6(1.8)  Intel Celeron   3.3(1.3)
  AMD Athlon      1.8(1.4)  AMD Athlon      2.1(1.3)  AMD Sempron     2.8(2.3)

The results in Table 7 show that the upper model better ordered the processor families with a high technological ranking. Similarly, the lower model does a comparable job, recognizing the Sempron family at the lowest rank.

3.3.3 Numerical attribute coefficients
This subsection presents a brief discussion of the interpretation of the coefficients extracted from the price-estimation function for some numeric attributes (shown in Table 8). Although this interpretation is clearly subjective, it reveals some laptop-market facts that were extracted in an unsupervised way from the data.

Table 8: Coefficients for numerical attributes from the 3FML model on the Laptops 21 836 data set
  Feature name                 Upper    Base     Lower
  β_0                          0.23     0.12     0.06
  Warranty Days                0.04     0.01    -0.01
  Installed Memory            -0.11     0.17     0.10
  Max. Horizontal Resolution   0.12     0.37     0.15
  Processor Tech.              0.30     0.08     0.05
  Thermal Design              -0.01    -0.37    -0.27
  Hyper Transport             -0.02     0.25     0.05
  L2-Cache                     0.08     0.16     0.11
  Processor Speed             -0.03     0.25     0.17

For instance, consider the Thermal Design attribute. The negative coefficients reveal a fact: the less power the CPU dissipates, the higher the laptop's price. Besides, these coefficients also show that this effect influences prices more in the mid-range and low-end segments of the laptop market. Similarly, the Max. Horizontal Resolution attribute reveals that this feature has a greater impact on mid-range laptop prices.

Interestingly, there is a phenomenon revealed by the features that are easily perceived by users, such as Installed Memory, Max. Horizontal Resolution (number of horizontal pixels on screen), L2-Cache and Processor Speed: those features have considerably less effect on prices in the high-end segment than in the mid-range and low-end segments. This phenomenon can be explained by the fact that "luxury" goods justify their price more by attributes such as brand label, exclusive features and physical appearance than by their configuration.

3.4 Recommendations for users
3.4.1 Top-10 recommendations
After the quantitative evaluation (i.e. regression error) and the qualitative assessment (i.e. agreement with market facts) of the 3FML model, the resulting functions provided reasonable price estimates and support elements for explainable recommendations, such as rankings and weights of attributes.

Figure 5: Visualization of the 836 laptops' recommendation ranking (profit vs. actual price). [figure omitted]
After obtaining the price estimates, the profit for each laptop is calculated as the difference between this estimate and the real price. Table 9 shows the top-10 recommendations with the highest profit among all 836 laptops.

The first and second top-ranked laptops have similar configurations, but even small differences make comparison difficult at first sight. The second laptop has a better price, more memory, and docking-station and port-replicator slots; the first, in contrast, has a higher screen resolution and a fingerprint sensor. These differences can be compared quantitatively with the help of the coefficients provided by the model. However, a better explanation of the #1 recommended choice is a market fact extracted from the obtained manufacturer ranking shown in Table 10: the three regression models rank the Lenovo brand above HP. Therefore, the first recommended laptop becomes a "best deal" given the standard prices of Lenovo at the time. Similarly, recommendations #7, #9 and #10 seem to get their high user profit not because of their configuration features, but because of their Sony label, which obtains better positions in the manufacturer ranking than its counterparts.

The second and third recommendations differ only in the Processor Speed attribute. Clearly, the estimated cost of that difference is the numerical difference between their estimated prices, which is $42; their real price difference, however, is $50. This explains the positions assigned by the recommender system to the #2 and #3 recommendations. More pair-wise comparisons and evaluations could be made, but they are omitted due to space limitations.

These paired comparisons become cognitively more difficult as the number of features, differences and instances increases. However, the proposed recommender method provides reasonable explanations no matter how much data is involved, and these explanations can be provided at the user's request. This is important because cold-start recommender systems need to establish trust in users due to the lack of collaborative support.

Table 9: Detailed top-10 ranked recommendations
  Recommendation rank  #1          #2          #3          #4           #5
  Horizontal Resol.    900 pixels  800 pixels  800 pixels  1536 pixels  768 pixels
  Memory Tech.         DDR2        DDR2        DDR2        DDR2         DDR2
  Inst. Memory         512 MB      1024 MB     1024 MB     1024 MB      1024 MB
  Family               Core Duo    Core Duo    Core Duo    Core2 Duo    Core2 Duo
  Processor Speed      1830 MHz    1830 MHz    2000 MHz    1500 MHz     2000 MHz
  L2 Cache             ?*          ?           ?           2048 kB      4096 kB
  Hyper Transp         ?           ?           ?           667 MHz      667 MHz
  Thermal Design       ?           ?           ?           35 W         34 W
  Process Tech.        ?           ?           ?           65 nm        65 nm
  Manufacturer         Lenovo      HP          HP          Lenovo       HP
  Op. System           WinXPP      WinXPP      WinXPP      WinVB        WinXPP
  Warranty Days        1095        1095        1095        365          1095
  IBDPFWC**            YNYYYNN     YYYYYNN     YYYYYNN     NNYYNNN      YYYYYNN
  Actual Price         $899        $795        $845        $875         $849
  Estimated Price      $1,438      $1,319      $1,361      $1,383       $1,332
  Profit               $539        $524        $516        $508         $483

  Recommendation rank  #6          #7          #8          #9           #10
  Horizontal Resol.    1050 pixels 800 pixels  800 pixels  800 pixels   800 pixels
  Memory Tech.         DDR2        DDR2        DDR2        DDR2         DDR2
  Inst. Memory         1024 MB     1024 MB     1024 MB     512 MB       1024 MB
  Family               Core Duo    Core2 Duo   Core Duo    Core Duo     Core2 Duo
  Processor Speed      2000 MHz    2160 MHz    1830 MHz    1660 MHz     2000 MHz
  L2 Cache             2048 kB     4096 kB     ?           2048 kB      4096 kB
  Hyper Transp         667 MHz     667 MHz     ?           667 MHz      800 MHz
  Thermal Design       31 W        34 W        ?           31 W         35 W
  Process Tech.        65 nm       65 nm       ?           65 nm        65 nm
  Manufacturer         Lenovo      Sony        HP          Sony         Sony
  Op. System           WinXPP      WinXPP      WinXPP      WinXPP       WinVB
  Warranty Days        365         365         1095        365          365
  IBDPFWC**            NYNNYNN     NYYYNNN     YYYYYNN     NNYYNNN      NYYYYNN
  Actual Price         $845        $1,060      $868        $649         $1,080
  Estimated Price      $1,312      $1,526      $1,319      $1,093       $1,522
  Profit               $467        $466        $451        $444         $442
  * A question mark stands for a missing value.
  ** The initials I B D P F W C stand for Infrared, Bluetooth, Docking Station, Port Replicator, Fingerprint, Subwoofer and CDMA.

3.4.2 2D visualization
Ordered lists are the most common format for presenting recommendations to users. However, even with such an ordering, establishing the most appropriate choice for a particular user is a difficult task. Therefore, we propose a novel visualization method for our recommender system. Its purpose is to enable users to build a mental model of the market: when users do not have a clear aim or a defined budget, this tool provides a rough idea of the number of options and their prices. In addition, the visualization can support short-term memory, decreasing cognitive load, and highlight the recommended options.

The proposed 2D visualization is shown in Figure 5. The horizontal axis represents the actual price and the vertical axis represents the profit, which is the difference between the estimated and the actual price. Each laptop is represented as a bubble, where a larger radius and warmer colors (darker gray in the grayscale version) mean a higher profit-price difference. Besides, the ranking number is included for the top-99 recommendations.

This visualization highlights other "best deals" that are hidden in the ranking list. For instance, consider recommendation #53 (see Figure 5 at the coordinates $1550 price and $260 profit); perhaps this is an interesting option if the user's budget is over $1500. Similarly, recommendation #26 can be quickly identified as the best option for a buyer on a low budget.

The proposed visualization also allows a qualitative assessment of the price-estimation function. For instance, for the laptops above $1300 the function has difficulties predicting prices using the current set of features, while it appears to be very effective for mid-range prices. This problem could be addressed by identifying and adding to the attribute set the distinctive features of high-end laptops, namely: shockproof devices, special designs, colors, housing materials, exclusive options, etc.
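The two explanation mechanisms discussed above, per-feature price contributions and a budget-constrained "best deal" query, can be sketched as follows. The coefficients and feature values are hypothetical stand-ins; in the actual system they come from the fitted 3FML models (cf. Tables 5, 8 and 10) and differ per product instance.

```python
# Hypothetical per-feature coefficients (dollars per unit); in the actual
# system these are the beta_m(x_i) of eq. 4, different for each laptop.
COEFS = {"installed_memory_gb": 120.0, "horizontal_pixels": 0.2, "warranty_days": 0.3}
INTERCEPT = 400.0

def estimate_price(features):
    # eq. (1)/(4): price = b0 + sum_m b_m * x_m
    return INTERCEPT + sum(COEFS[k] * v for k, v in features.items())

def explain(features):
    # dollar contribution of each feature to the estimated price
    return {k: COEFS[k] * v for k, v in features.items()}

def best_deal_under(products, budget):
    # highest-profit product (estimate minus actual price) within the budget;
    # products: list of (name, features, actual_price)
    affordable = [p for p in products if p[2] <= budget]
    return max(affordable, key=lambda p: estimate_price(p[1]) - p[2], default=None)

laptop = {"installed_memory_gb": 1.0, "horizontal_pixels": 800, "warranty_days": 365}
print(explain(laptop))
# {'installed_memory_gb': 120.0, 'horizontal_pixels': 160.0, 'warranty_days': 109.5}
```

This is exactly the kind of dollar-denominated breakdown the paper argues makes a cold-start recommendation explainable, and the budget query mirrors how a user reads Figure 5 (e.g. recommendation #26 as the best low-budget option).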
Table 10: Average ranking of the Manufacturer attribute using 3FML on the Laptops 21 836 data set
  Base model           Upper model          Lower model
  Asus      10.6(0.7)  Dell      10.0(0.9)  Asus      10.2(0.8)
  Sony       9.6(1.2)  Fujitsu    9.2(1.0)  Fujitsu    9.6(1.5)
  Fujitsu    8.4(1.4)  Sony       7.4(1.8)  Sony       8.1(1.2)
  Dell       7.9(0.9)  Asus       6.8(1.8)  Dell       7.3(1.4)
  Apple      7.0(2.4)  Apple      6.1(1.3)  Apple      6.7(2.0)
  Lenovo     6.5(1.6)  Lenovo     4.1(2.4)  Lenovo     5.2(1.5)
  Toshiba    5.0(1.2)  Gateway    4.0(2.9)  Toshiba    4.7(1.5)
  Acer       4.7(1.2)  Averatec   3.8(1.3)  Acer       3.9(1.3)
  HP         2.3(0.7)  Acer       3.6(1.3)  HP         3.4(1.7)
  Averatec   2.1(1.3)  HP         3.3(1.9)  Gateway    1.9(1.0)
  Gateway    1.9(0.8)  Toshiba    2.4(1.5)  Averatec   1.9(2.1)

4. CONCLUSIONS
We presented a novel product recommender system based on an interpretable price-estimation function, which estimates the economic benefit for the customer of buying a product in a particular market. Accurate and interpretable price estimations were obtained using the 3FML (three-function meta-learner) method. This regression method allows the combination of an interpretable regressor (e.g. LMS) to estimate prices with an uninterpretable regressor (e.g. SVR) to identify the latent class of each product. The combined model obtained better price estimates than LMS, SVR and the M5P regression tree, while keeping a high level of interpretability.

The proposed method was tested with real-market data from a data set of laptops. The obtained price-estimation model was interpretable, allowing evaluation and refinement by domain experts and ensuring that the price estimates are a coherent consequence of the product features. In addition, the obtained recommendations are easy for users to understand; for instance, feature rankings (e.g. a ranking of CPUs) and feature price contributions (e.g. the cost per GB of main memory) are provided. Importantly, while the price estimates are obtained in a supervised way, other domain knowledge is extracted in an unsupervised way. Although the proposed method was tested in a particular domain (laptops), the same process can be applied to other domains that exhibit a similar number of options and features.

Moreover, a user-friendly visualization method for recommendations was proposed, using a 2D Cartesian metaphor and concrete variables such as cost and profit. This visualization allows users to make a quick mental map of a large market, to explore it and to identify recommendations in different price ranges.

In conclusion, the proposed method is flexible and can be useful in e-commerce scenarios with products that allow the construction of price-estimation functions, such as customer-electronics products. Finally, our method fills a gap where recommender systems based on historical information fail because of the lack of such information.

5. ACKNOWLEDGEMENTS
This research is funded in part by the Bogota Research Division (DIB) at the National University of Colombia, and through a grant from Colciencias, project 110152128465.

6. REFERENCES
[1] E. Alpaydin. Introduction to Machine Learning. The MIT Press, October 2004.
[2] M. Balabanovic and Y. Shoham. Fab: Content-based collaborative recommendation. Communications of the ACM, 40(3):66-72, 1997.
[3] C. Becerra and F. Gonzalez. 3-functions meta-learner algorithm: a mixture of experts technique to improve regression models. In DMIN'08: Proceedings of the 4th International Conference on Data Mining, Las Vegas.
[6] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61-70, 1992.
[7] M. A. Hall. Correlation-based feature selection for discrete and numeric class machine learning. In ICML '00: Proceedings of the 17th International Conference on Machine Learning, pages 359-366, San Francisco, CA, USA, 2000.
[8] E. Han and G. Karypis. Feature-based recommendation system. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM), pages 446-452, 2005.
[9] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd ACM SIGIR Conference on Information Retrieval, pages 230-237, 1999.
[10] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, pages 89-115, 2004.
[11] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79-87, 1991.
[12] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
[13] M. Mandl, A. Felfernig, E. Teppan, and M. Schubert. Consumer decision making in knowledge-based recommendation. Journal of Intelligent Information Systems, 2010.
[14] G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2):81-97, 1956.
[15] D. C. Montgomery, E. A. Peck, and G. G. Vining. Introduction to Linear Regression Analysis. Wiley-Interscience, 2006.
[16] R. J. Quinlan. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence, pages 343-348, 1992.
[17] T. L. Saaty and M. S. Ozdemir. Why the magic number seven plus or minus two. Mathematical and Computer Modelling, 38(3):233-244, 2003.
[18] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In SIGIR, 2002.
[19] B. Scholkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[20] B. Schwartz. The tyranny of choice. Scientific American, pages 71-75, April 2004.
[21] Y. Wang and I. H. Witten. Induction of model trees for predicting continuous classes. In Poster Papers of the 9th European Conference on Machine Learning. Springer, 1997.
[22] K. Yoon and C. Hwang. Multiple attribute decision making: An introduction.
Sage University papers, NV, USA., 2008. 7(104), 1995. [4] J. Bennet and S. Lanning. The netflix prize. In KDD [23] L. Zadeh. Fuzzy Sets, Fuzzy Logic and Fuzzy Systems. Cup and Workshop, 2007. World Scientific, 1996. [5] R. T. Eckenrode. Weighting multiple criteria. Management Sciences, 12:180–192, 1965. 34