=Paper=
{{Paper
|id=Vol-1453/05_SinghRangaraj_CustomerBuyingBehaviourAnalysis_Confws-15_p23
|storemode=property
|title=Customer buying behaviour analysis in mass customization
|pdfUrl=https://ceur-ws.org/Vol-1453/05_SinghRangaraj_CustomerBuyingBehaviourAnalysis_Confws-15_p23.pdf
|volume=Vol-1453
|dblpUrl=https://dblp.org/rec/conf/confws/SinghR15
}}
==Customer buying behaviour analysis in mass customization==
Customer buying behaviour analysis in mass customization Tilak Raj Singh 1 and Narayan Rangaraj 2 Abstract. Motivated by the importance of customer buying be- These consistent attribute association rules could then be used to pre- haviour (such as correlation among product attributes/features of dict future configurations for production planning [15]. products configured in the past) in planning future configurations, Throughout this paper, we will use the terms attribute association this paper addresses the issue that product evolution (upgrades) usu- rules or attribute associations to specify quantities reflecting the joint ally render information gathered from past buying behaviour at least selection of attributes or conditional section of attributes. For exam- partially unusable. For instance, relations among features might have ple, figure 1 shows that 67% of configurations contain both attribute been changed, thus making it difficult to configure the same prod- 1 and attribute 2. ucts again. The proposed approach aims to (1) find associations be- In practice, the use of specific components in the final product tween product attributes based on the analysis of prior customer or- assembly depends on 1) the way they are designed and 2) the way ders (2) apply configuration rules to prune attribute association rules customers select them. Design or engineering related dependencies which are not controlled by customers, and (3) check whether de- (or restrictions) are well documented in product’s Bill-of-Material rived attribute association rules from past orders also work for the (BOM) or configuration rules. As the product development process new upgraded product. Attribute associations consistent with the up- starts well before the actual production, it is possible to know the graded product are then used to predict configurations for produc- product description for a future time (2-3 years in advance) from tion planning. We use machine learning algorithms and optimization BOM [16]. However, attribute associations from the customer point techniques to address these issues. of view are not known directly and can only be seen once cus- tomers have placed orders. Most manufacturers utilize information from product variants produced in the past to get estimates of cus- 1 Introduction tomer demands. Customer behaviour can be expressed through as- sociations between different attribute choices (e.g. joint selection) Mass customized products (e.g. Automobiles) involve a large number customers have made in the past. of product variants which are generated by combining different pre- In order to predict configurations for production planning the in- defined features/attributes. Individual product attributes and attribute formation from 1) configuration restrictions and 2) customer buying combinations control the final consumption of vehicle components behaviour should be used together. Any conflicting information be- and sub-assemblies during the production [13]. For example, the se- tween these two sources points to an inconsistency in the planning lection of features such as a specific gearbox or a sports package information. Delay in the detection of such discrepancies may result can decide which steering wheel will be used to build the vehicle. in a wrong mix of parts being produced, thus hampering production Thus, knowledge of customer buying behaviour (correlation between efficiency. product attributes) is crucial for demand estimation of parts and sub- assemblies for future production. Input Output Section 2 Section 2 Section 3 One way to get customer buying behaviour is by extrapolating the configurations produced in the past. Due to the high degree of in- Product Attribute Pruning Validate Predict dividualization and continuous changes in the product design, prod- configured association association attribute future con- in the past rules rules associations figurations uct evolution (upgrades) usually renders information gathered from past buying behaviour at least partially unusable. For instance, rela- (+) Config Attributes (1) 100% (1) 100% Attributes tions among features might have been changed, thereby not allow- 1 2 3 (1,2) 67% rule (2 → ¬3) (1,2) 67% 1 2 3 1 0 1 (1,3) 67% (1) 100% (1,3) 33% 1 0 1 ing configuration of the same products again. In the special case of 1 1 0 (2,3) 33% (1,2) 67% 1 1 0 a new product, information from existing models (having common 1 1 1 (1,3) 67% 1 1 0 features) is often used to prepare the initial production plan (set of vehicle configurations). As configuration rules of different products are not same, it is likely that attribute associations from the existing Figure 1. Predicting future configurations as per customer buying model may not be directly applicable to the new product. Thus, we behaviour need a mechanism to validate whether the derived attribute associa- tion rules from past orders also work for the new upgraded product. 1 Figure 1 shows a sequential block diagram for predicting future IT-Data Analytics, Mercedes-Benz R & D India, Bangalore, Email: tilak.singh@daimler.com product configurations as per customer buying behaviour and config- 2 Industrial Engineering and Operations Research, Indian Institute of Tech- uration rules. First, customer prior demand is analysed to calculate nology, Bombay, Powai Mumbai, India, email: narayan.rangaraj@iitb.ac.in attribute association rules (joint and conditional correlation between 23 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria attributes). Then, association rules which conflict with configuration of Boolean variables (X) which satisfies given configuration rules restrictions are pruned. In Section 2 we discuss a machine learning (F ). Configurations from Table 1 can be treated as customer config- algorithm to calculate such association rules. In Section 3 we present urations and in the next section we will derive associations between optimization models which aim to build a set of future configurations different attributes from these. Table 1 contains a list of some feasi- by considering 1) configuration restriction and 2) attribute associa- tion rules simultaneously. If we are able to find such a configuration Order# Automatic Cruise Reverse Sunroof KeyLess Parktronic Gear- Con- Cam- (SR) Go (PA) set which matches both of the input parameters, then the result can Box trol era (KG) be used for future production planning. In case of conflicts between (AG) (CC) (RC) configuration restrictions and attribute association rules, further anal- O001 1 1 1 1 1 0 ysis is required and perhaps only a limited set of consistent associ- O002 1 1 0 1 1 0 ation rules can be used to predict future configurations. In this case, O003 1 0 1 0 1 0 our focus is to find such a consistent set of attribute association rules O004 1 0 0 0 0 1 and use them to predict future configurations. Section 4 focuses on O005 1 1 1 1 0 1 the system implementation followed by initial computational results, O006 1 0 0 0 0 1 discussed in section 5. O007 1 0 1 1 1 1 O008 1 0 0 0 1 1 O009 0 0 1 0 1 1 2 Mining attribute association rules O010 0 0 1 1 0 0 A customizable product can be configured using different combina- tions of attributes (features). In an automobile, attribute could be Table 1. Sample configurations (selected by customers) from Example 1 body style, transmission type, sunroof or parking assistance. Cus- tomer buying behaviour can be studied by analysing how product attributes are associated with each other. Association rule mining has wide application in data mining for analysing and predicting cus- ble configurations which are created by combining given attributes tomer behaviour [13]. In this section, we discuss a framework for and satisfying configuration rule F . mining association rules among product attributes from a given set of product configurations. As association rules are extracted from 2.2 Attribute association rules and configuration known product configurations, we first take a look at the characteris- restrictions tics of the configuration problem. Association rule mining methodology is used to find the association between variables in large transactions [1]. In our case, each config- 2.1 Product configurations uration is expressed over a subset of attributes, where attributes are Let us define our product configuration problem as per [10, Defi- feature/variables used to configure the product. An association be- nition 1]: the configuration problem C can be expressed through a tween two disjoint sets of attributes p and q can be expressed using triple (X, D, F ), where: two numbers: • X is a set of product attributes (configuration variables) lets say Support(p ⇒ q): This is the proportion of configurations {x1 , ...xn }. Where n is the total number of attributes. that contain both attribute sets p and q. In Example1, Support • D is the set of attributes finite domains d1 , d2 , ..., dn . (Sunroof, ReverseCamera)= 4/10 = 0.4. • F = {f1 , f2 , ..., fm } is a set of propositional formulas (rules or Confidence (p ⇒ q): Given set of configuration which con- restrictions) over attribute set X. tains attributes set p, this is the proportion of configurations where attribute set q is also selected. In Example1, Confidence In this paper, we assume that the configuration variables xi ∈ X (Sunroof ⇒ ReverseCamera) = 4/5 = 0.8. If support and con- are boolean, hence domain di ∈ {0, 1}, ∀i ∈ X. A configuration is fidence are greater than user specified thresholds, then we call that said to be feasible if an assignment for all attributes (i ∈ X) is found association rule “interesting”. which fulfils each and every proposition in F . X = x1 , ...xn is a set and each di is a set. In this case, each di is a binary set. In other After applying rule mining techniques, we get a large number of words, {d1 , ..., dn } is a collection of sets, whereas {x1 , ..., xn } are relations which satisfy our parameter of interestingness, although the elements of set X. not all of them are customer driven. Because sales transactions do not explicitly state only customer selectable attributes, it is then our task to identify and remove attribute relations which are driven Example 1 by the technical nature of the product. For example, in Table 2 (rule # 2) the association between two attributes CruiseCtrl ⇒ Let us assume a car is configured using six attributes X = AutomaticGearBox is given by Support = 0.3 and confidence = 1. {1, 2, ..., 6} ≡ { Automatic Gearbox, Cruise control, Reverse cam- The high confidence between Cruise control and Automatic Gearbox era, Sunroof, Keyless Go, Parktronic }, D ∈ {0, 1}∀X, and F = is not really driven by customer buying behaviour, but this is the only {f1 } where way a feasible configuration using attribute Cruise control can be cre- f1 = {2 → 1}: Cruise control requires Automatic Gearbox. ated. This information is stated in the configuration rule (F = {f1 }) of Example 1. For a given set of boolean variables (attributes) and propositional In practical scenarios with hundreds of product attributes and thou- formulas, finding a feasible configuration is a Boolean Satisfiability sands of configuration rules, identifying which attribute association problem where the aim is to get an assignment (true or false value) rule is controlled by their technical dependencies is non-trivial. Also, Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 24 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria sr. lhs rhs support confidence If products are defined over sets of boolean attributes, the asso- 1 {CC} ⇒ {SR} 0.3 1 ciation rules can be expressed as boolean proposition formulas. For 2 {CC} ⇒ {AG} 0.3 1 example, support(p, q) can be modelled in a proposition formula to 3 {PA} ⇒ {AG} 0.5 0.83 capture the joint selection of the attribute sets p and q. The value of 4 {KG} ⇒ {AG} 0.5 0.83 5 {SR} ⇒ {RC} 0.4 0.8 support (p, q) indicates the fraction at which corresponding boolean 6 {SR} ⇒ {AG} 0.4 0.8 propositional formula (p ∧ q) evaluates as true in the configurations 7 {CC} ⇒ {RC} 0.2 0.66 set. Thus, finding a consistent future demand estimate with respect to 8 {CC} ⇒ {KG} 0.2 0.66 association rules is equivalent to satisfying a set of propositional for- 9 {RC} ⇒ {KG} 0.4 0.66 mulas with some probability. For example, if support(p, q) = 0.2 10 {RC} ⇒ {AG} 0.4 0.66 then we want a clause (p ∧ q) is true 20% of the time in final config- 11 {SR} ⇒ {KG} 0.3 0.6 12 {PA} ⇒ {RC} 0.3 0.5 urations. The resultant set of propositional formulas can be divided 13 {PA} ⇒ {KG} 0.3 0.5 into two sets: 1) propositional formulas derived from the configura- 14 {SR} ⇒ {PA} 0.2 0.4 tion problem (i.e. BOM) which has to evaluate to ”true” for every 15 {CC} ⇒ {PA} 0.1 0.33 demand instance 2) propositional formulas generated from associa- tion rules which are assigned a probability of being satisfied. If we Table 2. Association rules between two attributes from Example 1 are able to find a probability distribution on the truth assignments of the boolean variables corresponds to association rules that induces the given probabilities, then we will have an instance of future de- mand where all calculated association rules are satisfied. it is both error prone and time consuming to analyse and classify For a given set of boolean variables and set of boolean clauses, a large set of attribute associations manually. As the configuration determining whether it is possible to find a probability measure over problem is used to find feasible assignments of product attributes un- truth assignments of the boolean variables that induce the given as- der configuration rules, we use this to state some direct dependencies sessments is known as Probabilistic Satisfiability (PSAT) problem among attributes. Any two attributes (let’s say p and q) can be com- [9]. After modelling association rules as boolean propositional for- bined in a configuration based on the following relations: mulas and generating a solution instance where all association rules and configuration restrictions are satisfied simultaneously, we can Sr# Relation p q Description 1 ¬p ∧ ¬q 0 0 Configuration without attribute p and q say that derived association rules are consistent with the new con- 2 ¬p ∧ q 0 1 Configuration with attribute q but not p figuration restrictions for the product. Our aim is to construct a con- 3 p ∧ ¬q 1 0 Configuration with attribute p but not q figuration set such that it reflects the same support and confidence 4 p∧q 1 1 Configuration with both attribute p and q values from association rules as derived from past data. Support and confidence can be modelled as constraints in configuration problem and the associated quantity is then used to select the configuration Table 3. Possible relationships among two attributes in a configuration set/ solution set to check the satisfiability [17]. 3.1 The association rules verification model Depending upon how many relations are satisfied from Table 3 any association rule can be classified in one of 24 possible cases. For In this section, we present the optimization model for evaluating the example, if we consider Cruise control and Automatic gearbox from consistency among attributes association rules. Before formulating example 1, as attribute p and q, then with the given configuration the mathematical model, let us discuss a small example to under- rule, we will not be able to create a configuration which satisfies 3rd stand the underlying problem: relation p ∧ ¬q. If all the four relations are satisfied from Table 3 then we can say that the given attributes are independent of product’s Example 2: Let us assume we have discovered a customer technical influence and any association derived from customer orders behaviour from Table 1 that three different attributes (Reverse actually reflects their buying behaviour. Camera (RC), Keyless Go (KG), Parktronic (PA)) are individually In the other case, let us assume that the product from Example 1 selected 60% of the time in prior demand. Now, new configuration has been upgraded and new configuration restriction has been added restrictions (e.g. Upgraded product) specify that at least two of i.e. F = {f1 , f2 } where f2 = Parktronic comes with Reverse cam- the attributes (out of three) have to be present in every feasible era. Due to this new restriction configuration, O001, O003 and O010 configuration. Now, is it feasible to assume that the attributes will be from Table 1 will not be feasible for future product. Then, associa- selected at the same rate as before? tions between different product attributes need to be validated against From given configuration rule in Example 2, at least two attributes the new configuration rule. In the next section, we discuss a set of have to be present in a feasible configuration, i.e. the following optimization models for validating attribute association rules with boolean clause has to be true: (RC ∨ KG), (KG ∨ PA) and (RC ∨ respect to configuration restrictions. PA). If we are able to get a set of configurations where the above rules are satisfied and each attribute is selected 60% of the time in the configuration set, then we can say that derived attribute association 3 Validation of attribute association rules rules and configuration restrictions are consistent with the upgraded In the task of validating attribute association rules for an upgraded product. product; our aim is to find one instance of future demand where all Before solving the problem of Example 2 let us formulate a gen- derived association rules are satisfied. The future demand estimate eral mathematical model to detect consistency among the association can be given in terms of a configuration set where correlation be- rules. The problem of validating association rules with respect to a tween attributes is controlled by predefined association rules. new configuration rule can be defined as follows: 25 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Let the index i refer to a logical association rule statement (Sup- Decision variables Zi+ , Zi− are used to control the absolute devia- port or Confidence) defined over n Boolean variables x1 , x2 , ...xn tion for each association rule. The matrix A lists the set of all feasible using Boolean propositional formulas. As an Example from Table configurations which we use to match association rule quantity πi . At 2, x1 = {CC}, x2 = {SR}, the new variable π1 = Support(x1 ,x2 ) = this point let us say that A contains all feasible orders. From model x1 ∧ x2 . OP T1 if Zi+ , Zi− are zero ∀i then we can say the association rules are consistent among each other and also with configuration rules as Let the index j refer to a configuration in the set J, the total we are only considering feasible configurations in the matrix A. configuration set (typically of very large size). In Example 2, let us assume that x1 = RC, x2 = KG, x3 = PA. Then the configuration constraint can be written as follows: Data x1 + x2 = ≥ 1, x1 + x3 = ≥ 1 and x2 + x3 = ≥ 1. πi is the probability of ith statement (attributes Support or Confi- any combination of x1 , x2 , x3 which satisfies the above constraint dence) to an Example from Table 2, π3 =0.4 will be a possible configuration to use by model OP T1 i.e. as a col- be true. Asth umn of A-matrix. In this case, only four possible solutions are avail- 1 if i statement is true in j th configuration Ai,j = able so we can solve this example by explicitly enumerating all pos- 0 otherwise sible configurations. Decision variables 3 X Xj = Fraction representing the proportion of j in the total OP TExample2 : Minimize Zi+ + Zi− (7) i=1 configuration set, a real number between 0 and 1; Zi+ = Positive deviation from target probability πi , a real number X 1 1 + − between 0 and 1 1 1 0 Z1 Z1 0.6 X2 + − X3 + Z2+ − Z2− = 0.6 1 0 1 1 (8) Zi− = Negative deviation from target probability πi , a real number between 0 and 1 0 1 1 1 Z3 Z3 0.6 X4 + − A Z Z π X Objective Function X X1 + X2 + X3 + X4 = 1 (9) OP T1 : Minimize Zi+ + Zi− (1) i 0 ≤ Xj , Zi+ , Zi− ≤ 1 (10) Subject to By solving the optimization model OP TExample2 , at optimality X we get objective function value 0.2 (6= 0). There are multiple optimal Aij Xj + Zi+ − Zi− = πi . . . ∀i (2) solutions and one is X = 0.4, 0.2, 0.4, 0 which means configuration j 1, 2 and 3 are used 40%, 20% and 40% of the time respectively X and configuration 4 is not used in the final solution. As the objective Xj = 1 (3) function value of OP TExample2 model is not equal to zero, we can j say that attribute association and configuration rules are not consis- tent with each other. However, one drawback of the model OP T1 is 0 ≤ Xj , Zi+ , Zi− ≤ 1 (4) that it does not explicitly specify how many association rules are sat- isfied. More specifically, we would like to be able to build a model 3.1.1 Support which satisfies the maximum number of association rules in case of conflicting inputs as presented in Example2 . In the next subsection, Assigning support and confidence values to πi in model OP T1 is we discuss one such model. to control the joint probability and conditional probability among at- tributes. Support of any two attributes association (p ∧ q) can be considered in the model OP T1 by constraint 2 as follows: 3.2 Maximum association rules fulfilment model X + − Ap∧q,j Xj + Zp∧q − Zp∧q = πp∧q (5) Complex products such as automobile undergo enormous changes j through their life cycle. Thus, it is quite likely that some of the asso- ciation rules derived from past orders may not be consistent with new where πp∧q is equal to support (p ⇒ q). Ap∧q,j will take value one configurations rules as discussed in our Example2 . A relevant ques- if both p and q are present in configuration j, otherwise zero. tion is whether we can maximize the coverage of association rules which can be fulfilled by an upgraded product. The OP T1 model 3.1.2 Confidence discussed in section 3.1 can only detect if all the association rules are satisfiable or not. If there are conflicts, we need to ideally find the Confidence value from association rule mining can be controlled minimum number of association rules, so that if we remove those, through conditional probability of given attributes. Confidence (p ⇒ then all the remaining association rules become consistent. q) = support(p ⇒ q)/support(q) = πp|q . With the same definitions of variables as in the OP T1 model, let X + − us formulate the problem as follows: (Ap∧q,j − πp|q Aq,j )Xj + Zp|q − Zp|q =0 (6) j Decision variables: Above equation controls the confidence (p ⇒ q) by controlling the if ith association rule statement is not satisfied joint selection of p and q and individual selection of attribute q in the 1 yi = configuration. 0 otherwise Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 26 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria Objective Function X OP T2 = Minimize yi (11) i Subject to X Aij Xj + Zi+ − Zi− = πi . . . ∀i (12) j X Xj = 1 (13) j yi ≥ Zi+ + Zi− . . . ∀i (14) Figure 2. Column generation procedure to solve OP T1 and OP T2 0 ≤ Xj , Zi+ , Zi− ≤ 1; yi ∈ {0, 1} (15) yi is a binary 0-1 decision variable associated with each associ- question. If no such configuration is found, we conclude that there ation rule (confidence/ support). The variable yi will take value 1 is no feasible configuration which can improve our current objective if Pthere is some deviation between selection of attribute association function value. The aim of the master problem in both model OP T1 Aij Xj and the given rate πi . For any association rule at a time and OP T2 is to find the optimal solution over known configurations j (given columns of the Aij -matrix) and the task of the sub model is only one variable Zi+ or Zi− will have a non zero value. Accord- to add new columns to the Aij -matrix. We will repeat this procedure ingly, yi will take value 0 or 1 from constraint 14. If all association until no configuration is found that is worth considering. In the next rules are consistent, yi must be zero for all i. section, we will discuss the formulation of sub model w.r.t. master In Example2 the OP T2 model will give objective function value problem OP T1 . 1 i.e. if we ignore one association rule then we can satisfy the re- maining ones in the new product configuration. For example if we take only π2 , π3 = 0.6 then we can build a set of feasible configura- 3.4 The configuration generation model (Sub tions which can satisfy both the association rules. model) In linear programming problems, in each iteration of the simplex 3.3 Solution procedure algorithm we compute reduced costs to check if any non-basic Model formulation (OP T1 ) and (OP T2 ) are designed to express all of the solution. To do so in model OP T1 , variable can enter as partP possible configurations (X). This runs into the hundreds of millions! we have to evaluate if − wi ∗ [xi ]j < 0 for any configuration j. There are still two different problems with the optimization model Where wi is dual variable and [xi ]j is the new j th configuration. (OP T1 ) and (OP T2 ): As the value of wi will be known from solving model OP T1 , if we know the coefficient of j th order [xi ]j we can tell whether the 1. How to consider all possible solutions? Very large number of de- given order will improve our objective function or not. Another way cision variables. th 2. How to build Ai,j matrix? Plook at this problem is to build the new j configuration so that to wi ∗ [xi ]j can be maximized. – Do we have to explicitly write all the columns of A? – Can we work with a small set of configurations and add more The Sub-Problem: when needed? Data: wi = Dual variable from the model OP T1 , associated with constraints 2 The key for success here is that Aij needs not to be stored or built B= Set of constraints derived from configuration rules explicitly. Based on the need, configurations can be added to Aij to Decision Variable minimize the objective function. A solution for such a large scale if ith attributes association is true in new configuration j optimization can be found using column generation [8]. We can start 1 [xi ]j = with a possible set of Xj variables. Solve OP T1 or OP T2 to decide 0 otherwise which of those Xj ’s are included in the solution, and then try to [xi ]j = new configuration for j th column of configuration ma- generate a new configuration so as to improve the objective function trix Ai,j value. As discussed in figure 2, the overall problem is divided into two optimization problems. The first problem is to make a selection over Objective: known configurations (stored in Aij − M atrix) such that attribute X association rules can be matched as per objective function. In the next OP T 1Sub = Maximize wi ∗ xi (16) step, we want to know whether there is any feasible configuration i (w.r.t. configuration rules ) which improves the master problem ob- jective function value. Let us assume that we have an ’Oracle’ (sub- subject to: model) which will give us such a configuration each time we ask the B[x] ≤ b (17) 27 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria (i.e. x is a feasible configuration) xi = {0, 1} (18) In this formulation, we assume that configuration restrictions are modelled as a set of linear constraints as per Eq. 17 [15]. The sub- problem will generate a possible new configuration j. If this new con- figuration j satisfies Eq. 19 the configuration j enter the pool. This will be one column of Aij matrix in OP T1 model. Each solution of OP T 1Sub model will give a feasible configuration after satisfying all configuration rules from constraint 17. Configuration rules can be presented as propositional formulas and then transformed to sets of Figure 3. Implementation flow of customer drive attributes association linear constraints [15]. rule mining X wi ∗ xi ≥ 0 (19) i 4.1 Association rule mining Dual costs are recomputed by solving the model OP T1 and the pro- We use Apriori algorithm implemented in ”arules” package of R to cess terminates when no more configurations are found to be worth derive a list of association rules. Number of attributes present in all taking in. We use IBM ILOG Cplex engine to solve the sub-problem. customer orders are in the range of 500- 1,000. Total number of or- The master (OP T1 ) and the subproblem (OP T 1Sub ) may have to der is in the range of 10K to 1 Million. Even for very high cut off of be solved multiple times before the terminating criteria is satisfied. support and confidence value ( 80%) number of generated associa- tion rules ranges in tens of thousands. One way to reduce the number 3.5 Column generation The procedures discussed in section 3.1 and 3.4 works together to find a set of consistent association rules. Model OP T1 works with a predefined set of configurations and optimizes the current deviation with given association rules target. Model OP T 1Sub is used to find a new configuration so that Model OP T1 objective function value im- proves. Implementing the column generation approach in association rule verification problem is done in the following way: 1 Solve the OP T1 model with current columns of Aij matrix. In the first iteration, a feasible configuration can be used to initialize the A matrix. This iteration is used to get the dual variables which will be used in the new configuration generation model OP T 1Sub . 2 Get dual variable wi from Eq. 2 of the OP T1 Model 3 Set up a sub problem as per Section 3.4 4 Get new column (order as 0-1 vector [x] from solution of the sub problem discussed in section 3.4) 5 This generates a possible new configuration j with dual variable wi P 6 If configuration j satisfies wi ∗ xi ≥ 0 (pricing inequality) then i configuration j enters as j th column of Aij 7 Dual costs are re-computed and the process terminates when no more configurations satisfy the pricing inequality. 4 Implementation Figure 4. System overview: Deriving association rules from prior configurations using R-arules One of our goals is to provide a software system which can 1) ex- tract association rules from given configurations and is 2) able to use new configuration rules (upgraded product) to validate attribute as- of association rules is by limiting the attribute association level. We sociation rules. Figure 3 shows the implementation flow of arriving usually limit the association between 2 or 3 three attributes. Figure customer driven attribute associations for predicting future configu- 4 shows a screen shot of actual system for association rule mining. rations. We use Apriori algorithm through R-arules package to find a The user can select different computational parameters such as min- list of interesting (by support and confidence threshold) association imum support and confidence, rule size, limiting antecedent (left- among product attributes [11]. All derived association rules are then hand-side) and consequent attributes to build the desired association used in the optimization model which is implemented using IBM rules. Association rule which are according to the user specified pa- ILOG Cplex 12.5 and c# .net environment. At the end, a list of all rameters are computed from given configuration set. These associa- consistent association rules is available with the corresponding set of tion rules are used as the target to predict future product configura- configurations. tions. Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 28 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria 4.2 Predicting configuration set are then filtered as per their direct dependencies/conflict with con- figuration rules as explain in section 2.2. By doing so, we see a sig- As a next step after computing attribute associations (support and nificant reduction in the number of association rules. These rules are confidence), we build the optimization model as per section 3 to now ready to be validated with respect to the upgraded product. compute configurations as per target input characteristics. Two op- Now we apply optimization model discussed in section 3.1 to build timization models are used: a set of configurations so that association and configuration rules are met together. In most of the cases, not all computed association rules 1 Master model as per section 3.1 (OP T1 ) which models selection are applicable to the upgraded product. Therefore, it is important to of configuration such that sum of absolute deviation from target use only consistent rules to predict the set of future configurations. association rules can be minimized. We applied the optimization model discussed in section 3.1 to build 2 Sub model as per section 3.4 which models all configuration rules such a configuration set. as linear inequalities. An important use of the set of predicted future configurations is to find part demand estimates for future production. In order to see the Both the optimization model receives input from each other in every influence of consistent customer buying behaviour in parts demand, iteration of column generation procedure described in section 3.5. we computed parts frequency associated with order sets, where 1) Optimization models run iteratively until stopping criteria (no im- only consistent attribute associations are used to predict the order set provement to the current solution) is met. At any iteration of column and 2) all attribute associations available after pruning w.r.t. config- generation procedure, sub model gives a single configuration which uration rules are used to predict the order set. The above association is used as new decision variable to the master problem. We have im- rules are also supplemented with a few sales forecasts (at single at- plemented both the optimization model using cplex 12.5. As a result tribute level) to include estimates of new attributes which are not of the optimization procedure, a configuration set is build adhering present in past orders. given association rule targets. In the next section, we will discuss our first computation result with developed models. Order set built with inconsistent association rules Order set built with consistent association rules Parts forecast accuracy in [%] 5 Computational Results 80 In this section, we discuss typical computational parameters and as- sociated numbers with input data and decision variables. We have 60 tested our methods and models mainly on automotive data. The his- torical configurations are analysed at specific granularity (product- 40 line/body style/market/engine type) to reflect current sales planning. Generally, about 500-1000 unique attributes are to be specified in a 20 configuration set. However, not all attributes are available for cus- 0 tomers choice as some of them are related to production. In our anal- Segment1 Segment2 Segment3 ysis, we have considered between 100 and 200 attributes which are available to customers. Three vehicle segments are used to test the methodology that has been developed. Table 4 shows various param- Figure 5. Comparing part demand forecast accuracy between eters of the data segments that we have selected. The number of or- configuration set built with consistent and inconsistent attribute association ders is the number of prior configurations used to find the attribute rules associations (support and confidence). For this experiment, the max- imum number of attributes in an association rule is limited to two i.e. association among two arbitrary attributes are computed. Figure 5 shows the part forecast accuracy of the two scenarios discussed above. The order sets are compared with real customer or- Experiment # of at- # of # asso- # Pruned associ- ders to find the match with respect to estimated part number. About # tributes orders ciation ation rules (after 10,000 part forecasts are compared and we used part demand match- rules applying configura- ing parameter ±10% i.e. if the estimated value of part demand is tion rules) within ±10% of actual demand then this forecast is considered good. Segment1 200 30,000 2,000 1,200 With this measurement, we have compared two order set computed Segment2 120 25,000 1,800 1,300 with consistent and non-consistent attribute association. In figure 5, Segment3 100 10,000 1,500 1,100 for all the 3 cases we see significant improvement in forecast accu- racy when consistent sets of input information are used. Table 4. Computational experiments with three different vehicle segments 6 Related work Data mining techniques in manufacturing system are widely used Starting from a set of configurations and attributes, we use ad- to provide detailed insights regarding processes and products, such ditional parameters such as minimum support and minimum confi- as customer segmentation, production control and quality controls dence to compute attribute association rules. In this experiment, our [7]. In mass customization, the uncovering of aggregated level of aim is to find association rules which are significant (e.g. above min- customer buying information becomes crucial due to high product imum support and confidence support). After applying data mining variety [14]. Data mining techniques such as association rule min- methods, we get a large number of attribute association rules which ing have been used in many applications such as predicting a sub- 29 Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria assembly selection, and lead to significant improvement in order ful- of future configurations. filment process [13]. The main challenge in association rule min- ing technique is how to find useful association from a large set of possible attribute choices [3]. As association rules are derived from REFERENCES historical demands, another challenge is to validate association rules [1] R. Agrawal, T. Imielinski, and A. Swami, ‘Mining association rules with respect to engineering changes to the product. Configuration between sets of items in large databases’, in Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD-93), Washington DC, pp. models which capture configuration restrictions can be used to vali- 207–216, (1993). date the list of association rules. It turns out that validating product [2] KimAllan Andersen and Daniele Pretolani, ‘Easy cases of probabilistic attribute association rules against configuration rules can be formu- satisfiability’, Annals of Mathematics and Artificial Intelligence, 33(1), lated as Probabilistic Satisfiability Problem (PSAT) [2]. Optimization 69–91, (2001). [3] Yves Bastide, Nicolas Pasquier, Rafik Taouil, Gerd Stumme, and Lotfi techniques such as column generation can be used to find a solution Lakhal, ‘Mining minimal non-redundant association rules using fre- of PSAT problem [9]. quent closed itemsets’, in Computational Logic CL 2000, eds., John Another way of building reasoning between product attributes Lloyd, Veronica Dahl, Ulrich Furbach, Manfred Kerber, Kung-Kiu Lau, from known configurations is through feature models [4]. The ba- Catuscia Palamidessi, LusMoniz Pereira, Yehoshua Sagiv, and PeterJ. sic idea is to deduce rules/ constraints from existing product variants Stuckey, volume 1861 of Lecture Notes in Computer Science, 972–986, Springer Berlin Heidelberg, (2000). to support reverse engineering [12]. Feature models, combined with [4] Guillaume Bécan, Razieh Behjati, Arnaud Gotlieb, and Mathieu Acher, configuration rules, can represent hierarchical relations among dif- ‘Synthesis of attributed feature models from product descriptions: ferent product attributes, modelling a complete set of configurations Foundations’, Rapport de Recherche RR-8680, Inria Rennes, (feb implicitly [6]. However, our aim is to build a small set of explicit con- 2015). [5] Guillaume Bécan, Nicolas Sannier, Mathieu Acher, Olivier Barais, Ar- figurations which can be used for production planning. The product naud Blouin, and Benoit Baudry, ‘Automating the formalization of comparison matrix is another intuitive way to highlight the differ- product comparison matrices’, in Proceedings of the 29th ACM/IEEE ences between two products [5]. However, building such a matrix for International Conference on Automated Software Engineering, ASE different customer buying behaviour is a challenging task. ’14, pp. 433–444, New York, NY, USA, (2014). ACM. In our work, we have formulated configuration problem as an opti- [6] Guillaume Bcan, Mathieu Acher, Benoit Baudry, and SanaBen Nasr, ‘Breathing ontological knowledge into feature model synthesis: an em- mization model to give an integrated solution within the column gen- pirical study’, Empirical Software Engineering, 1–48, (2015). eration framework of validating set of association rules. Our model [7] A.K. Choudhary, J.A. Harding, and M.K. Tiwari, ‘Data mining in man- takes support and confidence value to attribute associations to build a ufacturing: a review based on the kind of knowledge’, Journal of Intel- set of configurations adhering given input targets. In case of conflict- ligent Manufacturing, 20(5), 501–521, (2009). [8] Gerard Cornuejols, Milind Dawande, Michel Gamache, Francois ing association rules, the model that has been developed attempts to Soumis, Gerald Marquis, and Jacques Desrosiers, ‘A column genera- find the maximum number of association rules which can be satisfied tion approach for large-scale aircrew rostering problems’, Oper. Res., after considering product configuration changes. 47, 247–262, (February 1999). [9] Fabio G. Cozman and Lucas Fargoni di Ianni, ‘Probabilistic satisfia- bility and coherence checking through integer programming’, Interna- 7 Future work tional Journal of Approximate Reasoning, 58(0), 57 – 70, (2015). Spe- cial Issue of the Twelfth European Conference on Symbolic and Quan- In this paper, we have discussed a framework for learning customer titative Approaches to Reasoning with Uncertainty (ECSQARU 2013). buying behaviour through data mining and optimization-based tech- [10] T. Hadzic, S. Sathiamoorthy, R. M. Jensen, H. R. Andersen, J. Møller, and H. Hulgaard, ‘Fast backtrack free product configuration using pre- niques. In mass customization, due to frequent changes in products, compiled solution space representations’, in Proceedings of the Interna- we are required to validate product attribute associations learnt from tional Conference on Economic, Technical and Organisational aspects customer prior demand. The association rule mining technique when of Product Configuration Systems, (2004). combined with the configuration problem gives the required frame- [11] Michael Hahsler, Bettina Grün, and Kurt Hornik, ‘arules - a computa- tional environment for mining association rules and frequent item sets’, work for calculating consistent and feasible attribute associations. Journal of Statistical Software, 14(15), 1–25, (9 2005). These associations among attributes can be used as inputs for pre- [12] Roberto E. Lopez-Herrejon, Lukas Linsbauer, Jos A. Galindo, Jos A. dicting configurations for future production planning. The proposed Parejo, David Benavides, Sergio Segura, and Alexander Egyed, ‘An framework uses data mining libraries from R to find association assessment of search-based techniques for reverse engineering feature rules. The integrated framework with optimization models provides models’, Journal of Systems and Software, 103(0), 353 – 369, (2015). [13] Efthimia Mavridou, DionisisD. Kehagias, Dimitrios Tzovaras, and the ability to perform tests on many scenarios before using any asso- George Hassapis, ‘Mining affective needs of automotive industry cus- ciation discovered from past data to future product planning. tomers for building a mass-customization recommender system’, Jour- One application of discovering attribute associations is to use them nal of Intelligent Manufacturing, 24(2), 251–265, (2013). for predicting the set of future configurations. As per our initial com- [14] Rainer Paffrath, ‘Mining product configurator data’, in Modern Con- cepts of the Theory of the Firm, eds., Gnter Fandel, Uschi Backes- putational results, such a configuration set results in considerable im- Gellner, Manfred Schlter, and JoergE. Staufenbiel, 110–121, Springer provement in part demand forecasts. Currently, we only consider at- Berlin Heidelberg, (2004). tribute associations which are frequent (e.g. above minimum support [15] Tilakraj Singh and Narayan Rangaraj, ‘Generation of predictive config- or confidence). In the next step, the association rule mining algorithm urations for production planning’, in Proceedings of the 15th Interna- can be enhanced to look for other relations. Also, in current imple- tional Configuration Workshop, eds., Michel Aldanondo and Andreas Falkner, pp. 79–86. CEUR Workshop Proceedings, (2013). mentation we simply remove attribute association which are having [16] C. Sinz, A. Kaiser, and W. Küchlin, ‘Formal methods for the valida- the conflict with each other or with product configuration rules. As a tion of automotive product configuration data’, Artificial Intelligence next step, we will try to develop approaches which can readjust at- for Engineering Design, Analysis and Manufacturing, 17(1), 75–97, tribute association in case of conflicts. For example, we can have the (JAN 2003). Special issue on configuration. [17] Peter Walley, Renato Pelessoni, and Paolo Vicig, ‘Direct algorithms for same selection rate for attributes if they are selected together. Fur- checking consistency and making inferences from conditional probabil- ther computational tests are required to validate and improve our as- ity assessments’, Journal of Statistical Planning and Inference, 126(1), sumptions on attribute associations which can improve the selection 119 – 151, (2004). Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors 30 Proceedings of the 17th International Configuration Workshop September 10-11, 2015, Vienna, Austria