Prediction of Customer Behavior using Machine
                   Learning: A Case Study

                     Tran Duc Quynh and Hoang Thi Thuy Dung


           International School-Vietnam National University, Hanoi, Vietnam
                              ducquynh@vnu.edu.vn
                               17071347@isvnu.vn


        Abstract. Understanding what customers want and need, and ideally
        anticipating those needs, is a constant challenge for marketers. With the advent
        of machine learning, researchers have successfully applied it to some problems in
        customer behavior analysis. For a specific problem, we need to study machine
        learning models and data preprocessing techniques to obtain the most accurate
        solution. In this paper, we consider the problem and the dataset given in Wang et
        al. (2017). The task is to predict whether a customer will accept a restaurant
        coupon recommended by an in-vehicle system, given a set of inputs describing the
        customer's driving context. The problem is made more difficult by the fact that
        most attributes are categorical and some contain missing values. We
        investigated methods to transform categorical attributes into numerical attributes
        and to handle the missing values, and then applied various classification models. The
        results showed that the proposed approach outperforms the methods
        given in Wang et al. (2017). We also obtained some interesting findings
        about the methods used and the impact of the variables on the customer decision.

        Keywords: Customer Behavior, Classification Models, Bagging


1       Introduction
Customer behavior analysis is very important for an enterprise. It helps companies
understand what customers want and need, so that they can improve their service or
offer suitable products. Thanks to customer behavior analysis, a company can
increase its sales and be more successful in business. Nowadays, digital
transformation affects all activities of enterprises. Data on products and customers
can be collected via information systems. These data may be used, with the tools of
data science, to understand customers' intentions more deeply.

    
      Copyright © by the paper’s authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). In: N. D. Vo, O.-J. Lee, K.-H. N.
Bui, H. G. Lim, H.-J. Jeon, P.-M. Nguyen, B. Q. Tuyen, J.-T. Kim, J. J. Jung, T. A. Vo
(eds.): Proceedings of the 2nd International Conference on Human-centered Artificial
Intelligence (Computing4Human 2021), Da Nang, Viet Nam, 28-October-2021,
published at http://ceur-ws.org
   
       Corresponding author.

    In recent years, machine learning methods have attracted growing interest for
problems in customer behavior analysis. These problems can be formulated as
supervised or unsupervised learning tasks. The models used may be clustering,
regression or classification models, such as decision trees, random forests, support
vector machines, neural networks and logistic regression. Sifa et al. (2015) used
decision trees to identify a number of important controllable characteristics
(interactions, playtime, location) and to understand what influences a customer's
purchase choice. Larivière and Van den Poel (2005) proposed a method based on
random forests for the customer retention prediction problem. Shrirame et al. (2020)
employed data visualization, natural language processing and machine learning to
explore the demographics of an organization. Dou (2020) used ensemble learning to
predict the online purchase behavior of customers. Wang et al. (2017) proposed a
Bayesian framework for learning rule sets for classification and applied it to
predicting customer intention in in-vehicle recommendation systems. Cao et al.
(2019) proposed a deep learning model to analyze customer churn.
    Although machine learning is promising for customer behavior prediction
problems, the number of studies in this field is limited. Besides, data must be
preprocessed before a machine learning model is applied. Moreover, no single method
works well for every dataset. Hence, it is necessary to study preprocessing methods
and machine learning models to improve the results for a specific dataset. In this
paper, we consider the problem of customer intention prediction and the data given in
Wang et al. (2017). Our approach is to tackle the original dataset directly instead of
dividing it into 5 parts. We also propose two methods to handle missing values and
then apply several machine learning models to the resulting datasets. A comparison
based on AUC shows that our approach is better than the methods given in Wang et al.
(2017). We explore the performance of the machine learning models to find the best
one and then identify the important features that have strong effects on the customer
decision.
    The paper is structured as follows. Section 2 introduces the problem and the dataset
used in this research. Section 3 presents the methodology, and results are reported in
Section 4. The conclusion is given in Section 5.


2      Problem Statement and Data

2.1    Problem Statement
Research on predicting customer behavior has broadened in many directions and
continues to advance with the help of new technology. However, there is still little
research on specific solutions that help businesses choose, from the wide range of
machine learning methods, the one best suited to their problem. This study takes a
new approach that covers a variety of prediction models and improves their accuracy,
giving marketers a broad and clear view of how machine learning can help them and
more options for applying machine learning models in developing their businesses. We
investigate data collected from a system installed in a car that recommends coupons
from local businesses. As the output is binary (whether or not the driver accepts the
coupon), we choose classification models such as decision tree, random forest and
support vector machine for analysis and prediction. We also apply several
preprocessing schemes to the data to improve performance, and measure the prediction
accuracy of each. The results are then compared to identify the best model for
predicting customer behavior. Moreover, the importance of each feature is estimated
to gain a deeper understanding and to help improve the overall system.

2.2    Data
The in-vehicle coupon recommendation dataset contains 23 attributes and 12,684
records. It was gathered through a survey on Amazon Mechanical Turk and published on
the UCI website in 2017:
https://archive.ics.uci.edu/ml/datasets/in-vehicle+coupon+recommendation. The
survey describes various driving scenarios, including the destination, current time,
weather, passenger, and so on, and then asks respondents whether they would accept
the coupon if they were the driver.
   The goal of the prediction problem is to forecast whether a client will accept a
coupon for a specific venue based on demographic and contextual factors. Answers that
the user will drive there "right away" or "later before the coupon expires" are labeled
"Y = 1", whereas answers rejecting the coupon are labeled "Y = 0". We consider five
categories of coupons: bars, takeaway food restaurants, coffee shops, low-cost
restaurants (under $20 per person), and high-cost restaurants (between $20 and $50
per person). The difficulty of this problem comes from the attributes: most of them
(19 of 23) are categorical, and five attributes contain missing values.


3      Methodology

3.1      Preprocessing data

An overview of the dataset shows that it contains a number of missing values and that
the classes are fairly balanced, with an acceptance rate of approximately 50%.
Furthermore, the models require numerical input. Hence, we propose several methods
to handle these issues.
   To convert variables from categorical to numerical type, we combine two methods:
integer encoding (mapping) and one-hot encoding (the get_dummies function). Mapping
is used for ordinal attributes such as age, time, expiration and income, while
get_dummies is used for nominal attributes. To deal with missing values, we use two
methods: imputing by mode and imputing by random forest. With mode imputing, the
missing values of an attribute are replaced by its most frequent value, while with
random forest imputing they are replaced by the predictions of a random forest
classification model. In addition, we scale the feature values to see whether this
improves model performance. The method we use is min-max scaling, which rescales
each feature to the range [0, 1]. Combining the above methods, we create the 4
sub-datasets described in Table 1.

                                    Table 1. Building 4 sub-datasets.

      Sub-datasets             Mode imputing                         Random forest imputing
      No scale                 d1                                    d3
      Scale                    d2                                    d4
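
The construction of d1 to d4 can be sketched as follows. This is a minimal illustration on a toy table, not the code used in the paper: the column values, the ordinal mappings, the helper names and the random forest settings are all assumptions, as is the choice to leave the label Y out of the imputation model's predictors.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the coupon data; values are illustrative only.
raw = pd.DataFrame({
    "age":     ["21", "26", "31", "21", "26", "31", "21", "26"],
    "weather": ["Sunny", "Rainy", "Sunny", "Snowy",
                "Sunny", "Rainy", "Sunny", "Snowy"],
    "Bar":     ["never", "1~3", None, "never", "1~3", "never", None, "never"],
    "Y":       [1, 0, 1, 0, 1, 0, 1, 0],
})

AGE_ORDER = {"21": 0, "26": 1, "31": 2}   # assumed ordinal mapping
BAR_ORDER = {"never": 0, "1~3": 1}        # assumed ordinal mapping


def encode(df):
    """Integer-encode ordinal columns, one-hot encode nominal ones."""
    out = df.copy()
    out["age"] = out["age"].map(AGE_ORDER)
    out["Bar"] = out["Bar"].map(BAR_ORDER)   # missing values stay NaN
    return pd.get_dummies(out, columns=["weather"], dtype=int)


def impute_mode(df, col):
    """Replace missing values of `col` by its most frequent value."""
    out = df.copy()
    out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def impute_random_forest(df, col, target="Y"):
    """Predict missing values of `col` from the other feature columns."""
    out = df.copy()
    features = [c for c in out.columns if c not in (col, target)]
    known = out[col].notna()
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(out.loc[known, features], out.loc[known, col].astype(int))
    out.loc[~known, col] = model.predict(out.loc[~known, features])
    return out


def scale(df, target="Y"):
    """Min-max scale every feature column to [0, 1]; keep the label as is."""
    cols = [c for c in df.columns if c != target]
    scaled = pd.DataFrame(MinMaxScaler().fit_transform(df[cols]),
                          columns=cols, index=df.index)
    scaled[target] = df[target]
    return scaled


encoded = encode(raw)
d1 = impute_mode(encoded, "Bar")            # mode imputing, no scale
d2 = scale(d1)                              # mode imputing, scale
d3 = impute_random_forest(encoded, "Bar")   # random forest imputing, no scale
d4 = scale(d3)                              # random forest imputing, scale
```

In this sketch only one column has missing values; with several such columns, the imputation model's predictors would themselves need to be complete (for example, by imputing one column at a time).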


3.2           Machine Learning Method

In this research, we use classification methods of machine learning. Classification
predictive modeling is the task of approximating the mapping function from input
variables to discrete output variables. The basic goal is to determine which category
or class new data belongs to. For the experiments, we adopted several different
classification approaches, selected for their extensive use, well-understood behavior,
and promising results in a range of classification tasks. Our goal was to examine
classifiers that differ in the functional form of the classification boundaries they
can learn, as well as classifiers that are based on distinct assumptions about the
relationships between features. We studied the performance of a set of classification
models including decision tree, random forest, support vector machine (SVM),
feedforward neural network (MLP), logistic regression, Bagging, AdaBoost and XGBoost.
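
As a rough illustration, the model set listed above could be assembled with scikit-learn as below. The hyperparameters, the synthetic data and the train/test split are assumptions made for the sketch; the paper does not report its settings. XGBoost comes from the separate xgboost package, so it is added only if that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumed settings; defaults are used wherever the paper gives no detail.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    # Bagging with a random forest as the base estimator.
    "Bagging": BaggingClassifier(
        RandomForestClassifier(n_estimators=10, random_state=0),
        n_estimators=10, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

try:
    from xgboost import XGBClassifier   # optional dependency
    models["XGBoost"] = XGBClassifier()
except ImportError:
    pass

# Synthetic data stands in for one of the preprocessed sub-datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Fit every model and record its hold-out accuracy.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
```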

3.3      Estimating Model Performance
   We use the k-fold cross-validation approach with 3 measures: accuracy, F1
score, and AUC (the area under the ROC curve). Accuracy is the most intuitive
performance metric: it is the ratio of correctly predicted observations to all
observations. The F1 score is the harmonic mean of precision and recall, so it takes
both false positives and false negatives into account. The AUC score measures
separability, that is, how well the model ranks positive examples above negative ones.
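
The evaluation procedure might look like the sketch below, which scores one model with k-fold cross-validation on the three measures. The value k = 5, the shuffling, the random forest settings and the synthetic data are assumptions; the paper does not state the value of k it used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

# Synthetic data stands in for one of the preprocessed sub-datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# k = 5 is an assumed value.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

result = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv,
    scoring=("accuracy", "f1", "roc_auc"),
)

# Average each measure over the k folds.
summary = {m: result[f"test_{m}"].mean()
           for m in ("accuracy", "f1", "roc_auc")}
```

Repeating this for every model and every sub-dataset would fill a table with one row per model and measure and one column per sub-dataset.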


4        Results and Evaluation

All performance results are shown in Table 2 below:

                                     Table 2. Performance results.

 Model                  Measure        d1         d2         d3         d4
 Decision tree          acc            0.69       0.69       0.69       0.69
                        f1             0.73       0.73       0.73       0.73
                        auc            0.68       0.68       0.68       0.68
 Random forest          acc            0.76       0.76       0.75       0.75
                        f1             0.79       0.79       0.79       0.79
                        auc            0.83       0.83       0.83       0.83
 Logistic regression    acc            0.69       0.69       0.69       0.68
                        f1             0.74       0.73       0.74       0.73
                        auc            0.74       0.74       0.74       0.74
 SVC                    acc            0.69       0.69       0.69       0.69
                        f1             0.74       0.74       0.74       0.74
                        auc            0.74       0.74       0.74       0.74
 MLP                    acc            0.74       0.74       0.74       0.75
                        f1             0.78       0.78       0.78       0.78
                        auc            0.81       0.81       0.81       0.81
 Bagging                acc            0.76       0.76       0.76       0.76
                        f1             0.80       0.80       0.80       0.80
                        auc            0.83       0.83       0.83       0.83
 Adaboost               acc            0.68       0.68       0.68       0.68
                        f1             0.73       0.73       0.73       0.73
                        auc            0.74       0.74       0.74       0.74
 XGBoost                acc            0.72       0.72       0.72       0.72
                        f1             0.77       0.77       0.77       0.77
                        auc            0.79       0.79       0.79       0.79

The accuracy results range from 68% to 76% and are nearly equal across the 4 datasets.
Therefore, the imputation method and the scaling have little effect on prediction
accuracy.


                               Fig. 1. F1 score of models.

From Figure 1, we can see that the F1 scores of the 8 models are quite close to each
other, all above 70%. The bagging model has the highest F1 score, 80%, followed by
random forest. This demonstrates that bagging helps to enhance the performance of
random forest classification.
    Compared with the most closely related work on the same dataset, Wang et al.
(2017), we obtained a better result: an AUC of 83% with the bagging model, against
approximately 73% for their Bayesian rule sets under the same measure. This study can
therefore be considered an improvement in prediction quality for marketers. As a
result, we suggest using ensemble learning methods such as Bagging for this dataset
and for the customer purchasing intention prediction problem in general.
    For a deeper understanding of the data and the scenario, we use the best model,
bagging with a random forest base estimator, to estimate the importance of each
feature (see Fig. 2).


                               Fig. 2. Feature importance.

The coupon attribute, i.e. the type of coupon that the system recommended to the
driver, is the most important feature. Two other important features are expiration
and CoffeeHouse. The features car and toCoupon_GEQ5 have the least importance, so
they could be removed from the model. Among the five types of coupon (bar, coffee
house, take away, expensive restaurant, and less expensive restaurant), the coffee
house coupon contributes most to the results, which means that customers tend to
accept that kind of coupon. The others, in descending order, are bar, take away, less
expensive restaurant, and then expensive restaurant.
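
One way to obtain such a ranking from a bagging ensemble of random forests is to average the impurity-based importances of the member forests. The sketch below does this on synthetic data in which only the first feature carries signal. The averaging scheme, the data and all settings are assumptions, since the paper does not state how the importances in Fig. 2 were computed.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

# Synthetic data: the label depends (almost) only on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=400) > 0).astype(int)

# Bagging with a random forest as the base estimator (assumed settings).
bag = BaggingClassifier(
    RandomForestClassifier(n_estimators=20, random_state=0),
    n_estimators=5, random_state=0,
).fit(X, y)

# Average the importances of the member forests, mapping each forest's
# importances back to the original feature positions.
importances = np.zeros(X.shape[1])
for forest, cols in zip(bag.estimators_, bag.estimators_features_):
    importances[cols] += forest.feature_importances_
importances /= len(bag.estimators_)

# Feature indices from most to least important.
ranking = np.argsort(importances)[::-1]
```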


5      Conclusion
The above results lead to a conclusion about which of the 8 models best fits the
in-vehicle coupon recommendation dataset. Before models are applied for analysis and
prediction, the dataset should be preprocessed by filling missing values by mode or
by random forest; scaling is not necessary. The model that gives the best performance
is bagging classification with random forest as the base estimator.
   This research investigated the challenge of forecasting consumer purchasing
decisions from easily quantifiable aspects of the purchasing context. The findings
and discussion showed that the proposed methods are effective for the given dataset
(and promising for the problem of customer intention prediction in general).
   This research is limited in that it covers only several models from the large and
diverse field of machine learning. However, it can serve as a source of ideas for
further research, in line with the growing demand for combining technology with other
industries.

References
 1. Asghar, N. 2016. Yelp dataset challenge: Review rating prediction. arXiv preprint
    arXiv:1605.05362.
 2. Buckinx, W.; Verstraeten, G.; and Van den Poel, D. 2007. Predicting customer loyalty using
    the internal transactional database. Expert Systems with Applications 32(1):125–134.
 3. Ding, Y.; DeSarbo, W. S.; Hanssens, D. M.; Jedidi, K.; Lynch Jr., J. G.; and Lehmann,
    D. R. 2020. The past, present, and future of measurements and methods in marketing
    analysis. Marketing Letters. https://doi.org/10.1007/s11002-020-09527-7.
 4. Dou, X. 2020. Online purchase behavior prediction and analysis using ensemble learning.
    In Proc. IEEE 5th International Conference on Cloud Computing and Big Data Analytics
    (ICCCBDA) 532–536. https://doi.org/10.1109/icccbda49378.2020.9095554.
 5. Kaefer, F.; Heilman, C. M.; and Ramenofsky, S. D. 2005. A neural network application to
    consumer classification to improve the timing of direct marketing activities. Computers &
    Operations Research 32(10):2595–2615.
 6. Ladas, A.; Garibaldi, J.; Scarpel, R.; and Aickelin, U. 2014. Augmented neural networks for
    modelling consumer indebtedness (sic). In Proc. IEEE International Joint Conference on
    Neural Networks 3086–3093.
 7. Larivière, B., and Van den Poel, D. 2005. Predicting customer retention and profitability by
    using random forests and regression forests techniques. Expert Systems with Applications
    29(2):472–484.
 8. Cao, S.; Liu, W.; Chen, Y.; and Zhu, X. 2019. Deep learning based customer churn
    analysis. In Proc. 11th International Conference on Wireless Communications and Signal
    Processing (WCSP) 1–6. doi: 10.1109/WCSP.2019.8927877.
 9. Sifa, R.; Hadiji, F.; Runge, J.; Drachen, A.; Kersting, K.; and Bauckhage, C. 2015. Predicting
    purchase decisions in mobile free-to-play games. In Proc. Conference on Artificial
    Intelligence and Interactive Digital Entertainment.
10. Shrirame, V.; Sabade, J.; Soneta, H.; and Vijayalakshmi, M. 2020. Consumer behavior
    analytics using machine learning algorithms. In Proc. IEEE International Conference on
    Electronics, Computing and Communication Technologies (CONECCT) 1–6. doi:
    10.1109/CONECCT50063.2020.9198562.
11. Wang, T.; Rudin, C.; Doshi-Velez, F.; Liu, Y.; Klampfl, E.; and MacNeille, P. 2017. A
    Bayesian framework for learning rule sets for interpretable classification. Journal of
    Machine Learning Research 18(1):2357–2393.
12. Xie, Y.; Li, X.; Ngai, E.; and Ying, W. 2009. Customer churn prediction using improved
    balanced random forests. Expert Systems with Applications 36(3):5445–5449.