=Paper= {{Paper |id=Vol-1453/05_SinghRangaraj_CustomerBuyingBehaviourAnalysis_Confws-15_p23 |storemode=property |title=Customer buying behaviour analysis in mass customization |pdfUrl=https://ceur-ws.org/Vol-1453/05_SinghRangaraj_CustomerBuyingBehaviourAnalysis_Confws-15_p23.pdf |volume=Vol-1453 |dblpUrl=https://dblp.org/rec/conf/confws/SinghR15 }} ==Customer buying behaviour analysis in mass customization== https://ceur-ws.org/Vol-1453/05_SinghRangaraj_CustomerBuyingBehaviourAnalysis_Confws-15_p23.pdf
                 Customer buying behaviour analysis in mass
                              customization
                                               Tilak Raj Singh 1 and Narayan Rangaraj 2


Abstract. Motivated by the importance of customer buying be-                       These consistent attribute association rules could then be used to pre-
haviour (such as correlation among product attributes/features of                  dict future configurations for production planning [15].
products configured in the past) in planning future configurations,                   Throughout this paper, we will use the terms attribute association
this paper addresses the issue that product evolution (upgrades) usu-              rules or attribute associations to specify quantities reflecting the joint
ally render information gathered from past buying behaviour at least               selection of attributes or conditional section of attributes. For exam-
partially unusable. For instance, relations among features might have              ple, figure 1 shows that 67% of configurations contain both attribute
been changed, thus making it difficult to configure the same prod-                 1 and attribute 2.
ucts again. The proposed approach aims to (1) find associations be-                   In practice, the use of specific components in the final product
tween product attributes based on the analysis of prior customer or-               assembly depends on 1) the way they are designed and 2) the way
ders (2) apply configuration rules to prune attribute association rules            customers select them. Design or engineering related dependencies
which are not controlled by customers, and (3) check whether de-                   (or restrictions) are well documented in product’s Bill-of-Material
rived attribute association rules from past orders also work for the               (BOM) or configuration rules. As the product development process
new upgraded product. Attribute associations consistent with the up-               starts well before the actual production, it is possible to know the
graded product are then used to predict configurations for produc-                 product description for a future time (2-3 years in advance) from
tion planning. We use machine learning algorithms and optimization                 BOM [16]. However, attribute associations from the customer point
techniques to address these issues.                                                of view are not known directly and can only be seen once cus-
                                                                                   tomers have placed orders. Most manufacturers utilize information
                                                                                   from product variants produced in the past to get estimates of cus-
1    Introduction                                                                  tomer demands. Customer behaviour can be expressed through as-
                                                                                   sociations between different attribute choices (e.g. joint selection)
Mass customized products (e.g. Automobiles) involve a large number
                                                                                   customers have made in the past.
of product variants which are generated by combining different pre-
                                                                                      In order to predict configurations for production planning the in-
defined features/attributes. Individual product attributes and attribute
                                                                                   formation from 1) configuration restrictions and 2) customer buying
combinations control the final consumption of vehicle components
                                                                                   behaviour should be used together. Any conflicting information be-
and sub-assemblies during the production [13]. For example, the se-
                                                                                   tween these two sources points to an inconsistency in the planning
lection of features such as a specific gearbox or a sports package
                                                                                   information. Delay in the detection of such discrepancies may result
can decide which steering wheel will be used to build the vehicle.
                                                                                   in a wrong mix of parts being produced, thus hampering production
Thus, knowledge of customer buying behaviour (correlation between
                                                                                   efficiency.
product attributes) is crucial for demand estimation of parts and sub-
assemblies for future production.                                                  Input                                                                    Output
                                                                                                      Section 2          Section 2       Section 3
   One way to get customer buying behaviour is by extrapolating the
configurations produced in the past. Due to the high degree of in-                    Product           Attribute         Pruning          Validate           Predict
dividualization and continuous changes in the product design, prod-                 configured         association       association      attribute         future con-
                                                                                    in the past           rules             rules        associations       figurations
uct evolution (upgrades) usually renders information gathered from
past buying behaviour at least partially unusable. For instance, rela-                                                   (+) Config
                                                                                    Attributes         (1)        100%                    (1)        100%    Attributes
tions among features might have been changed, thereby not allow-                    1    2   3         (1,2)      67%    rule (2 → ¬3)    (1,2)      67%     1    2   3
                                                                                    1    0   1         (1,3)      67%      (1)    100%    (1,3)      33%     1    0   1
ing configuration of the same products again. In the special case of                1    1   0         (2,3)      33%      (1,2)   67%                       1    1   0
a new product, information from existing models (having common                      1    1   1                             (1,3)   67%                       1    1   0

features) is often used to prepare the initial production plan (set of
vehicle configurations). As configuration rules of different products
are not same, it is likely that attribute associations from the existing
                                                                                           Figure 1. Predicting future configurations as per customer buying
model may not be directly applicable to the new product. Thus, we                                                     behaviour
need a mechanism to validate whether the derived attribute associa-
tion rules from past orders also work for the new upgraded product.
1
                                                                                      Figure 1 shows a sequential block diagram for predicting future
   IT-Data Analytics, Mercedes-Benz R & D India, Bangalore, Email:
  tilak.singh@daimler.com                                                          product configurations as per customer buying behaviour and config-
2 Industrial Engineering and Operations Research, Indian Institute of Tech-        uration rules. First, customer prior demand is analysed to calculate
  nology, Bombay, Powai Mumbai, India, email: narayan.rangaraj@iitb.ac.in          attribute association rules (joint and conditional correlation between




                                                                              23                    Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                                 Proceedings of the 17th International Configuration Workshop
                                                                                                                      September 10-11, 2015, Vienna, Austria
 attributes). Then, association rules which conflict with configuration             of Boolean variables (X) which satisfies given configuration rules
 restrictions are pruned. In Section 2 we discuss a machine learning                (F ). Configurations from Table 1 can be treated as customer config-
 algorithm to calculate such association rules. In Section 3 we present             urations and in the next section we will derive associations between
 optimization models which aim to build a set of future configurations              different attributes from these. Table 1 contains a list of some feasi-
 by considering 1) configuration restriction and 2) attribute associa-
 tion rules simultaneously. If we are able to find such a configuration              Order# Automatic Cruise          Reverse   Sunroof KeyLess Parktronic
                                                                                            Gear-     Con-            Cam-      (SR)    Go      (PA)
 set which matches both of the input parameters, then the result can
                                                                                            Box       trol            era               (KG)
 be used for future production planning. In case of conflicts between                       (AG)      (CC)            (RC)
 configuration restrictions and attribute association rules, further anal-           O001 1           1               1         1          1         0
 ysis is required and perhaps only a limited set of consistent associ-               O002 1           1               0         1          1         0
 ation rules can be used to predict future configurations. In this case,             O003 1           0               1         0          1         0
 our focus is to find such a consistent set of attribute association rules           O004 1           0               0         0          0         1
 and use them to predict future configurations. Section 4 focuses on                 O005 1           1               1         1          0         1
 the system implementation followed by initial computational results,                O006 1           0               0         0          0         1
 discussed in section 5.                                                             O007 1           0               1         1          1         1
                                                                                     O008 1           0               0         0          1         1
                                                                                     O009 0           0               1         0          1         1
 2     Mining attribute association rules                                            O010 0           0               1         1          0         0
 A customizable product can be configured using different combina-
 tions of attributes (features). In an automobile, attribute could be               Table 1.   Sample configurations (selected by customers) from Example 1
 body style, transmission type, sunroof or parking assistance. Cus-
 tomer buying behaviour can be studied by analysing how product
 attributes are associated with each other. Association rule mining has
 wide application in data mining for analysing and predicting cus-                  ble configurations which are created by combining given attributes
 tomer behaviour [13]. In this section, we discuss a framework for                  and satisfying configuration rule F .
 mining association rules among product attributes from a given set
 of product configurations. As association rules are extracted from                 2.2    Attribute association rules and configuration
 known product configurations, we first take a look at the characteris-                    restrictions
 tics of the configuration problem.
                                                                                    Association rule mining methodology is used to find the association
                                                                                    between variables in large transactions [1]. In our case, each config-
 2.1    Product configurations                                                      uration is expressed over a subset of attributes, where attributes are
 Let us define our product configuration problem as per [10, Defi-                  feature/variables used to configure the product. An association be-
 nition 1]: the configuration problem C can be expressed through a                  tween two disjoint sets of attributes p and q can be expressed using
 triple (X, D, F ), where:                                                          two numbers:

 • X is a set of product attributes (configuration variables) lets say                Support(p ⇒ q): This is the proportion of configurations
   {x1 , ...xn }. Where n is the total number of attributes.                          that contain both attribute sets p and q. In Example1, Support
 • D is the set of attributes finite domains d1 , d2 , ..., dn .                      (Sunroof, ReverseCamera)= 4/10 = 0.4.
 • F = {f1 , f2 , ..., fm } is a set of propositional formulas (rules or              Confidence (p ⇒ q): Given set of configuration which con-
   restrictions) over attribute set X.                                                tains attributes set p, this is the proportion of configurations
                                                                                      where attribute set q is also selected. In Example1, Confidence
    In this paper, we assume that the configuration variables xi ∈ X                  (Sunroof ⇒ ReverseCamera) = 4/5 = 0.8. If support and con-
 are boolean, hence domain di ∈ {0, 1}, ∀i ∈ X. A configuration is                    fidence are greater than user specified thresholds, then we call that
 said to be feasible if an assignment for all attributes (i ∈ X) is found             association rule “interesting”.
 which fulfils each and every proposition in F . X = x1 , ...xn is a
 set and each di is a set. In this case, each di is a binary set. In other             After applying rule mining techniques, we get a large number of
 words, {d1 , ..., dn } is a collection of sets, whereas {x1 , ..., xn } are        relations which satisfy our parameter of interestingness, although
 the elements of set X.                                                             not all of them are customer driven. Because sales transactions do
                                                                                    not explicitly state only customer selectable attributes, it is then
                                                                                    our task to identify and remove attribute relations which are driven
 Example 1                                                                          by the technical nature of the product. For example, in Table 2
                                                                                    (rule # 2) the association between two attributes CruiseCtrl ⇒
 Let us assume a car is configured using six attributes X =
                                                                                    AutomaticGearBox is given by Support = 0.3 and confidence = 1.
 {1, 2, ..., 6} ≡ { Automatic Gearbox, Cruise control, Reverse cam-
                                                                                    The high confidence between Cruise control and Automatic Gearbox
 era, Sunroof, Keyless Go, Parktronic }, D ∈ {0, 1}∀X, and F =
                                                                                    is not really driven by customer buying behaviour, but this is the only
 {f1 } where
                                                                                    way a feasible configuration using attribute Cruise control can be cre-
     f1 = {2 → 1}: Cruise control requires Automatic Gearbox.                       ated. This information is stated in the configuration rule (F = {f1 })
                                                                                    of Example 1.
 For a given set of boolean variables (attributes) and propositional                   In practical scenarios with hundreds of product attributes and thou-
 formulas, finding a feasible configuration is a Boolean Satisfiability             sands of configuration rules, identifying which attribute association
 problem where the aim is to get an assignment (true or false value)                rule is controlled by their technical dependencies is non-trivial. Also,




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                       24
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
            sr.    lhs             rhs      support     confidence                     If products are defined over sets of boolean attributes, the asso-
             1    {CC}       ⇒    {SR}        0.3            1                      ciation rules can be expressed as boolean proposition formulas. For
             2    {CC}       ⇒    {AG}        0.3            1                      example, support(p, q) can be modelled in a proposition formula to
             3    {PA}       ⇒    {AG}        0.5          0.83
                                                                                    capture the joint selection of the attribute sets p and q. The value of
             4    {KG}       ⇒    {AG}        0.5          0.83
             5    {SR}       ⇒    {RC}        0.4           0.8                     support (p, q) indicates the fraction at which corresponding boolean
             6    {SR}       ⇒    {AG}        0.4           0.8                     propositional formula (p ∧ q) evaluates as true in the configurations
             7    {CC}       ⇒    {RC}        0.2          0.66                     set. Thus, finding a consistent future demand estimate with respect to
             8    {CC}       ⇒    {KG}        0.2          0.66                     association rules is equivalent to satisfying a set of propositional for-
             9    {RC}       ⇒    {KG}        0.4          0.66                     mulas with some probability. For example, if support(p, q) = 0.2
            10    {RC}       ⇒    {AG}        0.4          0.66
                                                                                    then we want a clause (p ∧ q) is true 20% of the time in final config-
            11    {SR}       ⇒    {KG}        0.3           0.6
            12    {PA}       ⇒    {RC}        0.3           0.5                     urations. The resultant set of propositional formulas can be divided
            13    {PA}       ⇒    {KG}        0.3           0.5                     into two sets: 1) propositional formulas derived from the configura-
            14    {SR}       ⇒    {PA}        0.2           0.4                     tion problem (i.e. BOM) which has to evaluate to ”true” for every
            15    {CC}       ⇒    {PA}        0.1          0.33                     demand instance 2) propositional formulas generated from associa-
                                                                                    tion rules which are assigned a probability of being satisfied. If we
      Table 2. Association rules between two attributes from Example 1              are able to find a probability distribution on the truth assignments of
                                                                                    the boolean variables corresponds to association rules that induces
                                                                                    the given probabilities, then we will have an instance of future de-
                                                                                    mand where all calculated association rules are satisfied.
it is both error prone and time consuming to analyse and classify                      For a given set of boolean variables and set of boolean clauses,
a large set of attribute associations manually. As the configuration                determining whether it is possible to find a probability measure over
problem is used to find feasible assignments of product attributes un-              truth assignments of the boolean variables that induce the given as-
der configuration rules, we use this to state some direct dependencies              sessments is known as Probabilistic Satisfiability (PSAT) problem
among attributes. Any two attributes (let’s say p and q) can be com-                [9]. After modelling association rules as boolean propositional for-
bined in a configuration based on the following relations:                          mulas and generating a solution instance where all association rules
                                                                                    and configuration restrictions are satisfied simultaneously, we can
    Sr#      Relation    p    q                  Description
     1       ¬p ∧ ¬q     0    0     Configuration without attribute p and q         say that derived association rules are consistent with the new con-
     2        ¬p ∧ q     0    1     Configuration with attribute q but not p        figuration restrictions for the product. Our aim is to construct a con-
     3        p ∧ ¬q     1    0     Configuration with attribute p but not q        figuration set such that it reflects the same support and confidence
     4         p∧q       1    1    Configuration with both attribute p and q        values from association rules as derived from past data. Support and
                                                                                    confidence can be modelled as constraints in configuration problem
                                                                                    and the associated quantity is then used to select the configuration
    Table 3. Possible relationships among two attributes in a configuration
                                                                                    set/ solution set to check the satisfiability [17].


                                                                                    3.1    The association rules verification model
   Depending upon how many relations are satisfied from Table 3
any association rule can be classified in one of 24 possible cases. For             In this section, we present the optimization model for evaluating the
example, if we consider Cruise control and Automatic gearbox from                   consistency among attributes association rules. Before formulating
example 1, as attribute p and q, then with the given configuration                  the mathematical model, let us discuss a small example to under-
rule, we will not be able to create a configuration which satisfies 3rd             stand the underlying problem:
relation p ∧ ¬q. If all the four relations are satisfied from Table 3
then we can say that the given attributes are independent of product’s              Example 2: Let us assume we have discovered a customer
technical influence and any association derived from customer orders                behaviour from Table 1 that three different attributes (Reverse
actually reflects their buying behaviour.                                           Camera (RC), Keyless Go (KG), Parktronic (PA)) are individually
   In the other case, let us assume that the product from Example 1                 selected 60% of the time in prior demand. Now, new configuration
has been upgraded and new configuration restriction has been added                  restrictions (e.g. Upgraded product) specify that at least two of
i.e. F = {f1 , f2 } where f2 = Parktronic comes with Reverse cam-                   the attributes (out of three) have to be present in every feasible
era. Due to this new restriction configuration, O001, O003 and O010                 configuration. Now, is it feasible to assume that the attributes will be
from Table 1 will not be feasible for future product. Then, associa-                selected at the same rate as before?
tions between different product attributes need to be validated against                From given configuration rule in Example 2, at least two attributes
the new configuration rule. In the next section, we discuss a set of                have to be present in a feasible configuration, i.e. the following
optimization models for validating attribute association rules with                 boolean clause has to be true: (RC ∨ KG), (KG ∨ PA) and (RC ∨
respect to configuration restrictions.                                              PA). If we are able to get a set of configurations where the above
                                                                                    rules are satisfied and each attribute is selected 60% of the time in the
                                                                                    configuration set, then we can say that derived attribute association
3      Validation of attribute association rules
                                                                                    rules and configuration restrictions are consistent with the upgraded
In the task of validating attribute association rules for an upgraded               product.
product; our aim is to find one instance of future demand where all                    Before solving the problem of Example 2 let us formulate a gen-
derived association rules are satisfied. The future demand estimate                 eral mathematical model to detect consistency among the association
can be given in terms of a configuration set where correlation be-                  rules. The problem of validating association rules with respect to a
tween attributes is controlled by predefined association rules.                     new configuration rule can be defined as follows:




                                                                               25                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                              Proceedings of the 17th International Configuration Workshop
                                                                                                                   September 10-11, 2015, Vienna, Austria
    Let the index i refer to a logical association rule statement (Sup-              Decision variables Zi+ , Zi− are used to control the absolute devia-
 port or Confidence) defined over n Boolean variables x1 , x2 , ...xn             tion for each association rule. The matrix A lists the set of all feasible
 using Boolean propositional formulas. As an Example from Table                   configurations which we use to match association rule quantity πi . At
 2, x1 = {CC}, x2 = {SR}, the new variable π1 = Support(x1 ,x2 ) =                this point let us say that A contains all feasible orders. From model
 x1 ∧ x2 .                                                                        OP T1 if Zi+ , Zi− are zero ∀i then we can say the association rules
                                                                                  are consistent among each other and also with configuration rules as
   Let the index j refer to a configuration in the set J, the total               we are only considering feasible configurations in the matrix A.
 configuration set (typically of very large size).                                   In Example 2, let us assume that x1 = RC, x2 = KG, x3 = PA.
                                                                                  Then the configuration constraint can be written as follows:
   Data                                                                              x1 + x2 = ≥ 1, x1 + x3 = ≥ 1 and x2 + x3 = ≥ 1.
 πi is the probability of ith statement (attributes Support or Confi-                any combination of x1 , x2 , x3 which satisfies the above constraint
 dence) to              an Example from Table 2, π3 =0.4                          will be a possible configuration to use by model OP T1 i.e. as a col-
          be true. Asth                                                          umn of A-matrix. In this case, only four possible solutions are avail-
            1       if i statement is true in j th configuration
 Ai,j =                                                                           able so we can solve this example by explicitly enumerating all pos-
            0       otherwise
                                                                                  sible configurations.
 Decision variables
                                                                                                                              3
                                                                                                                              X
 Xj = Fraction representing the proportion of j in the total                                    OP TExample2 : Minimize             Zi+ + Zi−           (7)
                                                                                                                              i=1
 configuration set, a real number between 0 and 1;
 Zi+ = Positive deviation from target probability πi , a real number
                                                                                                            
                                                                                                           X
                                                                                                        1  1
                                                                                                                  +  −  
 between 0 and 1                                                                      1     1       0              Z1    Z1    0.6
                                                                                                             X2   +  −  
                                                                                                           X3  + Z2+ − Z2− = 0.6
                                                                                     1     0       1   1                                             (8)
 Zi− = Negative deviation from target probability πi , a real number
 between 0 and 1                                                                      0     1       1   1          Z3    Z3    0.6
                                                                                                             X4      +     −
                                                                                                A                  Z     Z      π
                                                                                                             X
    Objective Function
                                          X                                                              X1 + X2 + X3 + X4 = 1                          (9)
                      OP T1 : Minimize          Zi+ + Zi−             (1)
                                           i                                                               0 ≤ Xj , Zi+ , Zi− ≤ 1                      (10)

 Subject to                                                                          By solving the optimization model OP TExample2 , at optimality
                  X                                                               we get objective function value 0.2 (6= 0). There are multiple optimal
                          Aij Xj + Zi+ − Zi− = πi . . . ∀i            (2)         solutions and one is X = 0.4, 0.2, 0.4, 0 which means configuration
                      j                                                           1, 2 and 3 are used 40%, 20% and 40% of the time respectively
                                  X                                               and configuration 4 is not used in the final solution. As the objective
                                       Xj = 1                         (3)         function value of OP TExample2 model is not equal to zero, we can
                                   j                                              say that attribute association and configuration rules are not consis-
                                                                                  tent with each other. However, one drawback of the model OP T1 is
                             0 ≤ Xj , Zi+ , Zi− ≤ 1                   (4)
                                                                                  that it does not explicitly specify how many association rules are sat-
                                                                                  isfied. More specifically, we would like to be able to build a model
 3.1.1   Support                                                                  which satisfies the maximum number of association rules in case of
                                                                                  conflicting inputs as presented in Example2 . In the next subsection,
 Assigning support and confidence values to πi in model OP T1 is
                                                                                  we discuss one such model.
 to control the joint probability and conditional probability among at-
 tributes. Support of any two attributes association (p ∧ q) can be
 considered in the model OP T1 by constraint 2 as follows:                        3.2     Maximum association rules fulfilment model
                X                    +       −
                     Ap∧q,j Xj + Zp∧q   − Zp∧q   = πp∧q             (5)           Complex products such as automobile undergo enormous changes
                  j                                                               through their life cycle. Thus, it is quite likely that some of the asso-
                                                                                  ciation rules derived from past orders may not be consistent with new
 where πp∧q is equal to support (p ⇒ q). Ap∧q,j will take value one               configurations rules as discussed in our Example2 . A relevant ques-
 if both p and q are present in configuration j, otherwise zero.                  tion is whether we can maximize the coverage of association rules
                                                                                  which can be fulfilled by an upgraded product. The OP T1 model
 3.1.2   Confidence                                                               discussed in section 3.1 can only detect if all the association rules are
                                                                                  satisfiable or not. If there are conflicts, we need to ideally find the
 Confidence value from association rule mining can be controlled                  minimum number of association rules, so that if we remove those,
 through conditional probability of given attributes. Confidence (p ⇒             then all the remaining association rules become consistent.
 q) = support(p ⇒ q)/support(q) = πp|q .                                             With the same definitions of variables as in the OP T1 model, let
          X                                  +        −
                                                                                  us formulate the problem as follows:
              (Ap∧q,j − πp|q Aq,j )Xj + Zp|q     − Zp|q =0         (6)
              j
                                                                                     Decision variables:
 Above equation controls the confidence (p ⇒ q) by controlling the
                                                                                                 if ith association rule statement is not satisfied
                                                                                       
 joint selection of p and q and individual selection of attribute q in the                 1
                                                                                  yi =
 configuration.                                                                            0     otherwise




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                     26
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
     Objective Function
                                             X
                       OP T2 = Minimize             yi             (11)
                                              i

 Subject to
                 X
                       Aij Xj + Zi+ − Zi− = πi . . . ∀i            (12)
                   j
                               X
                                    Xj = 1                         (13)
                                j


                          yi ≥ Zi+ + Zi− . . . ∀i                  (14)
                                                                                  Figure 2. Column generation procedure to solve OP T1 and OP T2

                  0 ≤ Xj , Zi+ , Zi− ≤ 1; yi ∈ {0, 1}              (15)
    yi is a binary 0-1 decision variable associated with each associ-           question. If no such configuration is found, we conclude that there
 ation rule (confidence/ support). The variable yi will take value 1            is no feasible configuration which can improve our current objective
 if
 Pthere is some deviation between selection of attribute association            function value. The aim of the master problem in both model OP T1
     Aij Xj and the given rate πi . For any association rule at a time          and OP T2 is to find the optimal solution over known configurations
 j
                                                                                (given columns of the Aij -matrix) and the task of the sub model is
 only one variable Zi+ or Zi− will have a non zero value. Accord-
                                                                                to add new columns to the Aij -matrix. We will repeat this procedure
 ingly, yi will take value 0 or 1 from constraint 14. If all association
                                                                                until no configuration is found that is worth considering. In the next
 rules are consistent, yi must be zero for all i.
                                                                                section, we will discuss the formulation of sub model w.r.t. master
    In Example2 the OP T2 model will give objective function value
                                                                                problem OP T1 .
 1 i.e. if we ignore one association rule then we can satisfy the re-
 maining ones in the new product configuration. For example if we
 take only π2 , π3 = 0.6 then we can build a set of feasible configura-         3.4    The configuration generation model (Sub
 tions which can satisfy both the association rules.                                   model)
                                                                                In linear programming problems, in each iteration of the simplex
 3.3    Solution procedure
                                                                                algorithm we compute reduced costs to check if any non-basic
 Model formulation (OP T1 ) and (OP T2 ) are designed to express all                                      of the solution. To do so in model OP T1 ,
                                                                                variable can enter as partP
 possible configurations (X). This runs into the hundreds of millions!          we have to evaluate if − wi ∗ [xi ]j < 0 for any configuration j.
 There are still two different problems with the optimization model             Where wi is dual variable and [xi ]j is the new j th configuration.
 (OP T1 ) and (OP T2 ):                                                         As the value of wi will be known from solving model OP T1 , if
                                                                                we know the coefficient of j th order [xi ]j we can tell whether the
1. How to consider all possible solutions? Very large number of de-             given order will improve our objective function or not. Another way
   cision variables.                                                                                                         th
2. How to build Ai,j matrix?                                                    Plook at this problem is to build the new j configuration so that
                                                                                to
                                                                                   wi ∗ [xi ]j can be maximized.
     – Do we have to explicitly write all the columns of A?
     – Can we work with a small set of configurations and add more                 The Sub-Problem:
       when needed?                                                             Data: wi = Dual variable from the model OP T1 , associated with
                                                                                constraints 2
    The key for success here is that Aij needs not to be stored or built        B= Set of constraints derived from configuration rules
 explicitly. Based on the need, configurations can be added to Aij to           Decision Variable
 minimize the objective function. A solution for such a large scale
                                                                                                 if ith attributes association is true in new configuration j
                                                                                         
 optimization can be found using column generation [8]. We can start                        1
                                                                                [xi ]j =
 with a possible set of Xj variables. Solve OP T1 or OP T2 to decide                        0    otherwise
 which of those Xj ’s are included in the solution, and then try to             [xi ]j = new configuration for j th column of configuration ma-
 generate a new configuration so as to improve the objective function           trix Ai,j
 value.
    As discussed in figure 2, the overall problem is divided into two
 optimization problems. The first problem is to make a selection over             Objective:
 known configurations (stored in Aij − M atrix) such that attribute
                                                                                                                          X
 association rules can be matched as per objective function. In the next                        OP T 1Sub = Maximize           wi ∗ xi            (16)
 step, we want to know whether there is any feasible configuration                                                         i
 (w.r.t. configuration rules ) which improves the master problem ob-
 jective function value. Let us assume that we have an ’Oracle’ (sub-             subject to:
 model) which will give us such a configuration each time we ask the                                          B[x] ≤ b                            (17)




                                                                           27               Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                         Proceedings of the 17th International Configuration Workshop
                                                                                                              September 10-11, 2015, Vienna, Austria
 (i.e. x is a feasible configuration)

                               xi = {0, 1}                           (18)
    In this formulation, we assume that configuration restrictions are
 modelled as a set of linear constraints as per Eq. 17 [15]. The sub-
 problem will generate a possible new configuration j. If this new con-
 figuration j satisfies Eq. 19 the configuration j enter the pool. This
 will be one column of Aij matrix in OP T1 model. Each solution of
 OP T 1Sub model will give a feasible configuration after satisfying
 all configuration rules from constraint 17. Configuration rules can be
 presented as propositional formulas and then transformed to sets of               Figure 3. Implementation flow of customer drive attributes association
 linear constraints [15].                                                                                      rule mining
                             X
                                  wi ∗ xi ≥ 0                        (19)
                              i                                                   4.1 Association rule mining
 Dual costs are recomputed by solving the model OP T1 and the pro-                We use Apriori algorithm implemented in ”arules” package of R to
 cess terminates when no more configurations are found to be worth                derive a list of association rules. Number of attributes present in all
 taking in. We use IBM ILOG Cplex engine to solve the sub-problem.                customer orders are in the range of 500- 1,000. Total number of or-
 The master (OP T1 ) and the subproblem (OP T 1Sub ) may have to                  der is in the range of 10K to 1 Million. Even for very high cut off of
 be solved multiple times before the terminating criteria is satisfied.           support and confidence value ( 80%) number of generated associa-
                                                                                  tion rules ranges in tens of thousands. One way to reduce the number
 3.5    Column generation
 The procedures discussed in section 3.1 and 3.4 works together to
 find a set of consistent association rules. Model OP T1 works with a
 predefined set of configurations and optimizes the current deviation
 with given association rules target. Model OP T 1Sub is used to find a
 new configuration so that Model OP T1 objective function value im-
 proves. Implementing the column generation approach in association
 rule verification problem is done in the following way:

 1 Solve the OP T1 model with current columns of Aij matrix. In the
   first iteration, a feasible configuration can be used to initialize the
   A matrix. This iteration is used to get the dual variables which will
   be used in the new configuration generation model OP T 1Sub .
 2 Get dual variable wi from Eq. 2 of the OP T1 Model
 3 Set up a sub problem as per Section 3.4
 4 Get new column (order as 0-1 vector [x] from solution of the sub
   problem discussed in section 3.4)
 5 This generates a possible new configuration j with dual variable
   wi                           P
 6 If configuration j satisfies wi ∗ xi ≥ 0 (pricing inequality) then
                                  i
   configuration j enters as j th column of Aij
 7 Dual costs are re-computed and the process terminates when no
   more configurations satisfy the pricing inequality.


 4     Implementation                                                                 Figure 4. System overview: Deriving association rules from prior
                                                                                                      configurations using R-arules
 One of our goals is to provide a software system which can 1) ex-
 tract association rules from given configurations and is 2) able to use
 new configuration rules (upgraded product) to validate attribute as-             of association rules is by limiting the attribute association level. We
 sociation rules. Figure 3 shows the implementation flow of arriving              usually limit the association between 2 or 3 three attributes. Figure
 customer driven attribute associations for predicting future configu-            4 shows a screen shot of actual system for association rule mining.
 rations. We use Apriori algorithm through R-arules package to find a             The user can select different computational parameters such as min-
 list of interesting (by support and confidence threshold) association            imum support and confidence, rule size, limiting antecedent (left-
 among product attributes [11]. All derived association rules are then            hand-side) and consequent attributes to build the desired association
 used in the optimization model which is implemented using IBM                    rules. Association rule which are according to the user specified pa-
 ILOG Cplex 12.5 and c# .net environment. At the end, a list of all               rameters are computed from given configuration set. These associa-
 consistent association rules is available with the corresponding set of          tion rules are used as the target to predict future product configura-
 configurations.                                                                  tions.




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                     28
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
4.2     Predicting configuration set                                              are then filtered as per their direct dependencies/conflict with con-
                                                                                  figuration rules as explain in section 2.2. By doing so, we see a sig-
As a next step after computing attribute associations (support and                nificant reduction in the number of association rules. These rules are
confidence), we build the optimization model as per section 3 to                  now ready to be validated with respect to the upgraded product.
compute configurations as per target input characteristics. Two op-                  Now we apply optimization model discussed in section 3.1 to build
timization models are used:                                                       a set of configurations so that association and configuration rules are
                                                                                  met together. In most of the cases, not all computed association rules
1 Master model as per section 3.1 (OP T1 ) which models selection                 are applicable to the upgraded product. Therefore, it is important to
  of configuration such that sum of absolute deviation from target                use only consistent rules to predict the set of future configurations.
  association rules can be minimized.                                             We applied the optimization model discussed in section 3.1 to build
2 Sub model as per section 3.4 which models all configuration rules               such a configuration set.
  as linear inequalities.                                                            An important use of the set of predicted future configurations is to
                                                                                  find part demand estimates for future production. In order to see the
Both the optimization model receives input from each other in every               influence of consistent customer buying behaviour in parts demand,
iteration of column generation procedure described in section 3.5.                we computed parts frequency associated with order sets, where 1)
Optimization models run iteratively until stopping criteria (no im-               only consistent attribute associations are used to predict the order set
provement to the current solution) is met. At any iteration of column             and 2) all attribute associations available after pruning w.r.t. config-
generation procedure, sub model gives a single configuration which                uration rules are used to predict the order set. The above association
is used as new decision variable to the master problem. We have im-               rules are also supplemented with a few sales forecasts (at single at-
plemented both the optimization model using cplex 12.5. As a result               tribute level) to include estimates of new attributes which are not
of the optimization procedure, a configuration set is build adhering              present in past orders.
given association rule targets. In the next section, we will discuss our
first computation result with developed models.                                                                    Order set built with inconsistent association rules
                                                                                                                   Order set built with consistent association rules
                                                                                  Parts forecast accuracy in [%]


5     Computational Results
                                                                                                                      80
In this section, we discuss typical computational parameters and as-
sociated numbers with input data and decision variables. We have                                                      60
tested our methods and models mainly on automotive data. The his-
torical configurations are analysed at specific granularity (product-                                                 40
line/body style/market/engine type) to reflect current sales planning.
Generally, about 500-1000 unique attributes are to be specified in a                                                  20
configuration set. However, not all attributes are available for cus-
                                                                                                                       0
tomers choice as some of them are related to production. In our anal-
                                                                                                                           Segment1       Segment2        Segment3
ysis, we have considered between 100 and 200 attributes which are
available to customers. Three vehicle segments are used to test the
methodology that has been developed. Table 4 shows various param-                       Figure 5. Comparing part demand forecast accuracy between
eters of the data segments that we have selected. The number of or-               configuration set built with consistent and inconsistent attribute association
ders is the number of prior configurations used to find the attribute                                                 rules
associations (support and confidence). For this experiment, the max-
imum number of attributes in an association rule is limited to two i.e.
association among two arbitrary attributes are computed.                             Figure 5 shows the part forecast accuracy of the two scenarios
                                                                                  discussed above. The order sets are compared with real customer or-
    Experiment     # of at-   # of       # asso-    # Pruned associ-              ders to find the match with respect to estimated part number. About
    #              tributes   orders     ciation    ation rules (after            10,000 part forecasts are compared and we used part demand match-
                                         rules      applying configura-           ing parameter ±10% i.e. if the estimated value of part demand is
                                                    tion rules)                   within ±10% of actual demand then this forecast is considered good.
    Segment1       200        30,000     2,000      1,200                         With this measurement, we have compared two order set computed
    Segment2       120        25,000     1,800      1,300                         with consistent and non-consistent attribute association. In figure 5,
    Segment3       100        10,000     1,500      1,100                         for all the 3 cases we see significant improvement in forecast accu-
                                                                                  racy when consistent sets of input information are used.
Table 4.   Computational experiments with three different vehicle segments
                                                                                  6                                Related work
                                                                                  Data mining techniques in manufacturing system are widely used
   Starting from a set of configurations and attributes, we use ad-               to provide detailed insights regarding processes and products, such
ditional parameters such as minimum support and minimum confi-                    as customer segmentation, production control and quality controls
dence to compute attribute association rules. In this experiment, our             [7]. In mass customization, the uncovering of aggregated level of
aim is to find association rules which are significant (e.g. above min-           customer buying information becomes crucial due to high product
imum support and confidence support). After applying data mining                  variety [14]. Data mining techniques such as association rule min-
methods, we get a large number of attribute association rules which               ing have been used in many applications such as predicting a sub-




                                                                             29                                           Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                                                       Proceedings of the 17th International Configuration Workshop
                                                                                                                                            September 10-11, 2015, Vienna, Austria
 assembly selection, and lead to significant improvement in order ful-            of future configurations.
 filment process [13]. The main challenge in association rule min-
 ing technique is how to find useful association from a large set of
 possible attribute choices [3]. As association rules are derived from
                                                                                  REFERENCES
 historical demands, another challenge is to validate association rules            [1] R. Agrawal, T. Imielinski, and A. Swami, ‘Mining association rules
 with respect to engineering changes to the product. Configuration                     between sets of items in large databases’, in Proc. 1993 ACM-SIGMOD
                                                                                       Int. Conf. Management of Data (SIGMOD-93), Washington DC, pp.
 models which capture configuration restrictions can be used to vali-                  207–216, (1993).
 date the list of association rules. It turns out that validating product          [2] KimAllan Andersen and Daniele Pretolani, ‘Easy cases of probabilistic
 attribute association rules against configuration rules can be formu-                 satisfiability’, Annals of Mathematics and Artificial Intelligence, 33(1),
 lated as Probabilistic Satisfiability Problem (PSAT) [2]. Optimization                69–91, (2001).
                                                                                   [3] Yves Bastide, Nicolas Pasquier, Rafik Taouil, Gerd Stumme, and Lotfi
 techniques such as column generation can be used to find a solution
                                                                                       Lakhal, ‘Mining minimal non-redundant association rules using fre-
 of PSAT problem [9].                                                                  quent closed itemsets’, in Computational Logic CL 2000, eds., John
    Another way of building reasoning between product attributes                       Lloyd, Veronica Dahl, Ulrich Furbach, Manfred Kerber, Kung-Kiu Lau,
 from known configurations is through feature models [4]. The ba-                      Catuscia Palamidessi, LusMoniz Pereira, Yehoshua Sagiv, and PeterJ.
 sic idea is to deduce rules/ constraints from existing product variants               Stuckey, volume 1861 of Lecture Notes in Computer Science, 972–986,
                                                                                       Springer Berlin Heidelberg, (2000).
 to support reverse engineering [12]. Feature models, combined with                [4] Guillaume Bécan, Razieh Behjati, Arnaud Gotlieb, and Mathieu Acher,
 configuration rules, can represent hierarchical relations among dif-                  ‘Synthesis of attributed feature models from product descriptions:
 ferent product attributes, modelling a complete set of configurations                 Foundations’, Rapport de Recherche RR-8680, Inria Rennes, (feb
 implicitly [6]. However, our aim is to build a small set of explicit con-             2015).
                                                                                   [5] Guillaume Bécan, Nicolas Sannier, Mathieu Acher, Olivier Barais, Ar-
 figurations which can be used for production planning. The product
                                                                                       naud Blouin, and Benoit Baudry, ‘Automating the formalization of
 comparison matrix is another intuitive way to highlight the differ-                   product comparison matrices’, in Proceedings of the 29th ACM/IEEE
 ences between two products [5]. However, building such a matrix for                   International Conference on Automated Software Engineering, ASE
 different customer buying behaviour is a challenging task.                            ’14, pp. 433–444, New York, NY, USA, (2014). ACM.
    In our work, we have formulated configuration problem as an opti-              [6] Guillaume Bcan, Mathieu Acher, Benoit Baudry, and SanaBen Nasr,
                                                                                       ‘Breathing ontological knowledge into feature model synthesis: an em-
 mization model to give an integrated solution within the column gen-                  pirical study’, Empirical Software Engineering, 1–48, (2015).
 eration framework of validating set of association rules. Our model               [7] A.K. Choudhary, J.A. Harding, and M.K. Tiwari, ‘Data mining in man-
 takes support and confidence value to attribute associations to build a               ufacturing: a review based on the kind of knowledge’, Journal of Intel-
 set of configurations adhering given input targets. In case of conflict-              ligent Manufacturing, 20(5), 501–521, (2009).
                                                                                   [8] Gerard Cornuejols, Milind Dawande, Michel Gamache, Francois
 ing association rules, the model that has been developed attempts to
                                                                                       Soumis, Gerald Marquis, and Jacques Desrosiers, ‘A column genera-
 find the maximum number of association rules which can be satisfied                   tion approach for large-scale aircrew rostering problems’, Oper. Res.,
 after considering product configuration changes.                                      47, 247–262, (February 1999).
                                                                                   [9] Fabio G. Cozman and Lucas Fargoni di Ianni, ‘Probabilistic satisfia-
                                                                                       bility and coherence checking through integer programming’, Interna-
 7    Future work                                                                      tional Journal of Approximate Reasoning, 58(0), 57 – 70, (2015). Spe-
                                                                                       cial Issue of the Twelfth European Conference on Symbolic and Quan-
 In this paper, we have discussed a framework for learning customer                    titative Approaches to Reasoning with Uncertainty (ECSQARU 2013).
 buying behaviour through data mining and optimization-based tech-                [10] T. Hadzic, S. Sathiamoorthy, R. M. Jensen, H. R. Andersen, J. Møller,
                                                                                       and H. Hulgaard, ‘Fast backtrack free product configuration using pre-
 niques. In mass customization, due to frequent changes in products,                   compiled solution space representations’, in Proceedings of the Interna-
 we are required to validate product attribute associations learnt from                tional Conference on Economic, Technical and Organisational aspects
 customer prior demand. The association rule mining technique when                     of Product Configuration Systems, (2004).
 combined with the configuration problem gives the required frame-                [11] Michael Hahsler, Bettina Grün, and Kurt Hornik, ‘arules - a computa-
                                                                                       tional environment for mining association rules and frequent item sets’,
 work for calculating consistent and feasible attribute associations.
                                                                                       Journal of Statistical Software, 14(15), 1–25, (9 2005).
 These associations among attributes can be used as inputs for pre-               [12] Roberto E. Lopez-Herrejon, Lukas Linsbauer, Jos A. Galindo, Jos A.
 dicting configurations for future production planning. The proposed                   Parejo, David Benavides, Sergio Segura, and Alexander Egyed, ‘An
 framework uses data mining libraries from R to find association                       assessment of search-based techniques for reverse engineering feature
 rules. The integrated framework with optimization models provides                     models’, Journal of Systems and Software, 103(0), 353 – 369, (2015).
                                                                                  [13] Efthimia Mavridou, DionisisD. Kehagias, Dimitrios Tzovaras, and
 the ability to perform tests on many scenarios before using any asso-                 George Hassapis, ‘Mining affective needs of automotive industry cus-
 ciation discovered from past data to future product planning.                         tomers for building a mass-customization recommender system’, Jour-
    One application of discovering attribute associations is to use them               nal of Intelligent Manufacturing, 24(2), 251–265, (2013).
 for predicting the set of future configurations. As per our initial com-         [14] Rainer Paffrath, ‘Mining product configurator data’, in Modern Con-
                                                                                       cepts of the Theory of the Firm, eds., Gnter Fandel, Uschi Backes-
 putational results, such a configuration set results in considerable im-
                                                                                       Gellner, Manfred Schlter, and JoergE. Staufenbiel, 110–121, Springer
 provement in part demand forecasts. Currently, we only consider at-                   Berlin Heidelberg, (2004).
 tribute associations which are frequent (e.g. above minimum support              [15] Tilakraj Singh and Narayan Rangaraj, ‘Generation of predictive config-
 or confidence). In the next step, the association rule mining algorithm               urations for production planning’, in Proceedings of the 15th Interna-
 can be enhanced to look for other relations. Also, in current imple-                  tional Configuration Workshop, eds., Michel Aldanondo and Andreas
                                                                                       Falkner, pp. 79–86. CEUR Workshop Proceedings, (2013).
 mentation we simply remove attribute association which are having                [16] C. Sinz, A. Kaiser, and W. Küchlin, ‘Formal methods for the valida-
 the conflict with each other or with product configuration rules. As a                tion of automotive product configuration data’, Artificial Intelligence
 next step, we will try to develop approaches which can readjust at-                   for Engineering Design, Analysis and Manufacturing, 17(1), 75–97,
 tribute association in case of conflicts. For example, we can have the                (JAN 2003). Special issue on configuration.
                                                                                  [17] Peter Walley, Renato Pelessoni, and Paolo Vicig, ‘Direct algorithms for
 same selection rate for attributes if they are selected together. Fur-
                                                                                       checking consistency and making inferences from conditional probabil-
 ther computational tests are required to validate and improve our as-                 ity assessments’, Journal of Statistical Planning and Inference, 126(1),
 sumptions on attribute associations which can improve the selection                   119 – 151, (2004).




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                     30
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria