<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Entity Embedding in Artificial Neural Networks: A Novel Approach to Sales Data Analysis and Forecasting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christina Jayakumaran</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sengole Merlin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaishali R Kulkarni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thompson Stephan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Punitha S</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Graphic Era (Deemed to be University)</institution>
          ,
          <addr-line>Dehradun, Uttarakhand</addr-line>
          ,
          <country country="IN">India -</country>
          <addr-line>248002</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science and Engineering, Loyola-ICAM College of Engineering and Technology</institution>
          ,
          <addr-line>Chennai, Tamil Nadu</addr-line>
          ,
          <country country="IN">India -</country>
          <addr-line>600034</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Strategic Education Inc.</institution>
          ,
          <addr-line>Texas</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This study delves into the realm of sales forecasting, a critical component for strategic business planning, encompassing staff scheduling, inventory management, and supply chain optimization. At its core, the research investigates the efficacy of advanced predictive analytics in sales forecasting, leveraging historical and current data to unearth patterns that guide business decision-making. The focus is primarily on the application and comparative analysis of two sophisticated algorithms: eXtreme gradient boost (XGBoost) and deep neural networks (DNNs). These methods are explored for their potential to enhance forecasting accuracy using sales data. A notable aspect of this research is exploring entity embedding within the artificial neural network (ANN), highlighting its relevance and application in the context of sales data analysis. This comprehensive approach aims to offer insights into the most effective predictive models for sales forecasting, contributing to the broader field of predictive analytics in business.</p>
      </abstract>
      <kwd-group>
        <kwd>ANN</kwd>
        <kwd>Entity Embedding</kwd>
        <kwd>Sales Forecasting</kwd>
        <kwd>Time Series Analysis</kwd>
        <kwd>XGBoost</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the rapidly evolving business landscape, the ability to accurately forecast sales has become a
cornerstone for strategic planning and operational efficiency. Sales forecasting, a critical component in
the vast domain of supply chain management, significantly impacts areas like inventory management,
staff scheduling, and supply chain optimization. The accuracy of sales forecasts directly influences a
company’s ability to make informed decisions, manage resources effectively, and maintain a competitive
edge in the market [1]. Despite its critical importance, traditional sales forecasting methods often
fall short in today’s dynamic and complex market environments. These methods, typically grounded in
statistical analysis of historical data, struggle to adapt to the nonlinear and evolving patterns of
consumer behavior and market trends. The limitation of traditional approaches in handling large and
varied datasets underscores the need for more advanced and sophisticated forecasting techniques. In
response to these challenges, this paper introduces machine learning as a transformative solution for
enhancing the accuracy and dynamism of sales forecasts. Machine learning, with its capability to
process and learn from vast amounts of data, presents an opportunity to develop more robust and
adaptable forecasting models. Among the various machine learning algorithms, XGBoost and DNNs are
notable for their efficacy in predictive modeling tasks. This research aims to explore and evaluate
the application of these two algorithms in the context of sales data analysis, offering a comparative
analysis of their performance and suitability.</p>
      <p>A novel aspect of this study is the incorporation of entity embedding within ANN for sales data
analysis. Entity embedding, a technique for transforming categorical variables into numerical forms, is
particularly relevant in sales forecasting, where data often comprises a mix of categorical and numerical
information. By integrating entity embedding, the research aims to enhance the predictive capability of
ANNs, enabling them to handle the intricacies of sales data more effectively.</p>
      <p>The primary objective of this paper is to present a comprehensive analysis of XGBoost and DNNs
in sales forecasting, highlighting the benefits and limitations of each approach. Additionally, the
paper explores the innovative application of entity embedding within ANNs, aiming to contribute to
the broader field of predictive analytics in business. The structure of the paper includes a detailed
examination of existing sales forecasting techniques, the methodology for implementing the machine
learning models, and a comparative analysis of the performance of XGBoost and DNNs, culminating in
a discussion of the implications and potential applications of entity embedding in sales data analysis [2].
Through this research, we aim to provide valuable insights and methodologies for businesses seeking
to improve their sales forecasting capabilities in the face of rapidly changing market dynamics.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>The use of machine learning (ML) techniques in predictive analytics for businesses has been widely
studied. For instance, an ML model combined with a rule engine can predict sales from historical data
and feed the predictions for the following week to a front-end application for display [3]. The XGBoost
sales prediction model created for the Walmart dataset served as a reference point for the model
presented in this paper [4]. That work reports that the error scores of the XGBoost model are 16.3% and
15.4% lower than those of linear regression and ridge regression models, respectively. The XGBoost
algorithm is also applied in [5]; its insights on feature engineering through XGBoost feature
importance ranking have been adopted in the proposed model.</p>
      <p>XGBoost is frequently cited as a highly effective ML model that can be applied to many problems.
However, the key to developing an efficient model is to choose the right features beforehand, by
performing data analysis and feature engineering [6]. In recent years, XGBoost has been
widely used in forecasting because it performs comparatively better than classic regression models and
does so in a shorter time [7]. ANNs also perform well in
comparison to traditional models such as linear regression and support vector machines.
When applied to Walmart sales data, ANNs were found to have the lowest RMSE
scores among these models [8]. In fast-paced industries,
quick and reliable sales forecasting models are an invaluable resource. One such example is the fashion
or retail industry. Sales predictions using NN models in this industry have proven to perform well,
with low root mean square percentage error (RMSPE) and mean square error (MSE) scores [9]. Another
essential practice in the retail industry is promotional sales, which can be cold-start forecasted using
gradient boosting algorithms [10].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Existing Systems</title>
      <p>
        Most companies use two types of sales forecasting: quantitative and qualitative.
Quantitative sales forecasting requires numerical data; commonly used data
includes consumer spending and economic trends. In its simplest form, quantitative sales forecasting
uses linear equations to calculate predicted values, as given in (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ).
      </p>
      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n</tex-math>
        </disp-formula>
        where y is the predicted value, the x terms are the input variables, and the beta terms are the
coefficients of the linear model.
As more data is collected, forecasting models become more complex, so generic linear models will not
predict reliable forecast values. Quantitative models are also not useful for businesses with no
historical data, or with limited historical data. These models perform much better with data from
established businesses. Qualitative sales forecasting is used in multiple applications and has a broader
scope than quantitative sales forecasting. Qualitative methods are classified as the jury
of executive opinion, the Delphi technique, the sales force composite, and the survey of buyer intentions.
      </p>
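      <p>As a minimal sketch of the linear forecasting idea in (1), the following Python example fits a line with a single predictor by ordinary least squares and extrapolates one step ahead. The function names and the weekly sales figures are hypothetical and serve only as an illustration; they are not part of the proposed system.</p>
      <preformat>
```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the fitted line at x."""
    return b0 + b1 * x


# Made-up weekly sales figures, for illustration only.
weeks = [1, 2, 3, 4, 5]
sales = [110.0, 120.0, 130.0, 140.0, 150.0]

b0, b1 = fit_simple_linear(weeks, sales)
forecast_week_6 = predict(b0, b1, 6)
print(b0, b1, forecast_week_6)
```
      </preformat>
      <p>A model of this kind extrapolates a straight trend, which is why it becomes unreliable once the sales pattern is nonlinear or depends on many interacting features.</p>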
    </sec>
    <sec id="sec-4">
      <title>4. Proposed Architecture</title>
      <p>For the proposed system, sales data was collected from 1115 Rossmann stores in Germany. This data was
wrangled and analyzed. The system predicts the sales of each store in the company for the next 6
weeks. Economic features cover the different activities in promoting, distributing, and consuming
various products and services. The forecasting of sales is based on these economic
features, and it also covers several visual and display options referred to as temporal features. The other parameters
include promotion policies and steps in promoting the product, the season of the sale (including
school holidays), store locations and competing stores, the time of the year, etc. The overall
proposed architecture for the sales data forecasting and analysis is depicted in Figure 1. Commonsense
reasoning and analytic knowledge methods are used for building the model. These two methods are
applied during the analysis of the data and help in reaching a definitive conclusion. Two
machine learning models are developed: an XGBoost model specific to the Rossmann company, and
a neural network generic to all companies for their sales forecasting. The distribution of
stores over assortments is given in Table 1.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Implementation</title>
      <sec>
        <title>5.1. Data</title>
        <p>The data used here was collected from Kaggle, a popular website for data science projects.
The Rossmann drugstore dataset, which was published for a sales prediction competition, is used.
Google Trends data, weather data, and state (store location) data for each Rossmann store
on each day were used as external factors to predict the sales. The training data contains 1,017,210
records and the testing data contains 41,089 records. Exploratory Data Analysis (EDA)
reveals some significant correlations. Figure 2 shows the distribution of the store types and its effect on
the sales value. Figure 3 depicts the day of the week and its correlation with the sales value for each of
the store types. The number of sales and the number of customers have a strong positive correlation,
which means that more customers go together with higher sales. This, however, was an expected trend. If the
store offers a promotion (Promo 1), then the number of customers increases, which leads to more sales.
Conversely, if the store offers a consecutive promotion (Promo 2), then the number of customers remains the
same or even decreases slightly. The Pearson correlation matrix of the attributes is depicted in Figure
4. The weekly and yearly sales trends are depicted in Figure 5 and Figure 6,
respectively. The following points give a general overview of the data as a whole:</p>
        <p>• Store Type A sells the most and is the most crowded.</p>
        <p>• Store Type D has the best Sale_Per_Customer, which indicates a larger buyer cart. We could also assume
that the stores of this type are in rural areas, so customers prefer buying more but less often.</p>
        <p>• The low Sale_Per_Customer amount for Store Type B suggests that people shop there
essentially for small things. This may also indicate an “urban” label for this store type, as it is
more accessible to the public, and customers do not mind shopping there from time to time during the
week.</p>
        <p>• Customers tend to buy more on Mondays when there is one promotion running (Promo) and on
Sundays when there is no promotion at all.</p>
        <p>• Promo2 alone does not seem to be correlated with any significant change in the Sales amount.</p>
      </sec>
      <sec id="sec-5-1">
        <title>5.2. Time Series Analysis of Data</title>
        <p>A time series analysis is carried out per store type rather than per individual store.
Overall sales seem to increase, but not for Store Type C (third from the top). Even though Store Type
A is the best-selling store type in the dataset, it may follow the same decreasing trajectory as Store
Type C did. The non-randomness of the time series and a high lag-1 autocorrelation are common to each pair of
plots, so these series probably need a higher order of differencing (d/D).
Type A and type B: both types show seasonality at certain lags. For type A, it is every 12th observation,
with positive spikes at the 12 (s) and 24 (2s) lags and so on. For type B, it is a weekly trend with positive
spikes at the 7 (s), 14 (2s), 21 (3s), and 28 (4s) lags. Type C and type D: the plots of these two types are more
complex. The autocorrelation and partial autocorrelation of each store type are shown in Figure 7. It seems that
each observation is correlated with its adjacent observations.</p>
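        <p>The seasonal spikes described above can be illustrated with a small sample autocorrelation computation. The sketch below is written in plain Python and uses a synthetic series with an exact 7-day pattern; the series values and the function name are assumptions made for illustration and do not come from the Rossmann data.</p>
        <preformat>
```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    # Total sum of squared deviations (the lag-0 autocovariance times n).
    variance_sum = sum((value - mean) ** 2 for value in series)
    # Sum of products of deviations that are `lag` steps apart.
    covariance_sum = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance_sum / variance_sum


# Synthetic daily sales with an exact weekly pattern, repeated for 8 weeks.
weekly_pattern = [5.0, 6.0, 7.0, 8.0, 9.0, 12.0, 0.0]
series = weekly_pattern * 8

acf = {lag: autocorrelation(series, lag) for lag in (1, 7, 14)}
for lag, value in acf.items():
    print(lag, round(value, 3))
```
        </preformat>
        <p>For a series with a weekly period, the values at lags 7 and 14 stand out against the lag-1 value, which mirrors the positive spikes at the s and 2s lags reported for Store Type B.</p>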
      </sec>
      <sec id="sec-5-2">
        <title>5.3. XGBoost</title>
        <p>
          Boosting is considered to be an optimization method. Gradient boosting is a machine learning algorithm that
is frequently used in regression and classification. It produces a prediction model as an ensemble of
weak learners, mostly decision trees, which are combined to create one strong learner. XGBoost, in particular, is used
for supervised learning tasks. XGBoost uses a regularization term to control the
complexity of the model, which helps to prevent overfitting.
XGBoost is a popular tree ensemble model that consists of a set of classification and regression
trees (CART). Each tree assigns the input instances to distinct leaves, and each leaf is allotted a score
value. A CART leaf thus carries a real score along with the decision value. The use of real scores allows
a richer interpretation than classification alone. XGBoost is optimized for boosted tree
algorithms. Figure 8 shows the order of importance of each feature and the RMSPE values against
the training iterations. XGBoost provides a better framework than plain gradient boosting and it is
faster than existing gradient boosting algorithms. XGBoost includes a linear model solver as well as
tree learning algorithms, and it supports various objective functions, covering regression, classification, and ranking.
The objective function used is given in (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ).
        </p>
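        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>\mathrm{obj}(\theta) = L(\theta) + \Omega(\theta)</tex-math>
          </disp-formula>
          where <inline-formula><tex-math>L(\theta)</tex-math></inline-formula> is the training loss and <inline-formula><tex-math>\Omega(\theta)</tex-math></inline-formula> refers to regularization.
        </p>
        <p>Figure 8: (a) Feature importance based on XGBoost. (b) RMSPE during the training phase of XGBoost.</p>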
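        <p>To make the two terms of the objective concrete, the sketch below evaluates a squared-error training loss plus a tree complexity penalty of the form gamma * T + 0.5 * lambda * (sum of squared leaf scores), where T is the number of leaves. This penalty follows the form described in the XGBoost documentation; the choice of squared error, the function names, and all numeric values are assumptions made for illustration and are not the settings used in this study.</p>
        <preformat>
```python
def training_loss(y_true, y_pred):
    """Squared-error training loss summed over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def tree_regularization(leaf_scores, gamma, lam):
    """Complexity penalty of one tree: gamma * T + 0.5 * lam * sum(w^2)."""
    num_leaves = len(leaf_scores)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(y_true, y_pred, trees, gamma=1.0, lam=1.0):
    """Training loss plus the summed penalty of every tree in the ensemble."""
    penalty = sum(tree_regularization(tree, gamma, lam) for tree in trees)
    return training_loss(y_true, y_pred) + penalty


# Hypothetical ensemble with one tree and two leaves scoring 2.0 and 5.0.
# Sample 1 falls into the first leaf, sample 2 into the second leaf.
trees = [[2.0, 5.0]]
y_true = [3.0, 5.0]
y_pred = [2.0, 5.0]

loss = training_loss(y_true, y_pred)
penalty = tree_regularization(trees[0], gamma=1.0, lam=1.0)
total = objective(y_true, y_pred, trees)
print(loss, penalty, total)
```
        </preformat>
        <p>Raising gamma or lambda makes additional leaves and large leaf scores more expensive, which is how the regularization term discourages overly complex trees.</p>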
      </sec>
      <sec id="sec-5-3">
        <title>5.4. Artificial Neural Network</title>
        <p>
          In a neural network, the inputs are multiplied by their corresponding weights and then summed together,
as depicted in Figure 9. The sum is passed through an activation function, which determines which neurons are
activated within the network. The corresponding output is a single value. The output y is
represented in (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ).
        </p>
        <p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>y = g(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n)</tex-math>
          </disp-formula>
          where g is the activation function, the x terms are the inputs, the w terms are the corresponding
weights, and the first weight acts as the bias.
The activation functions considered are the sigmoid and the ReLU functions. The sigmoid function maps
its input to a value between 0 and 1 and is therefore frequently used in machine learning models that work
with probabilities. This function is both monotonic and differentiable. The Rectified Linear Unit, or ReLU,
is a widely used activation function. In ReLU, the gradient is positive for any positive input value,
but there is no gradient (gradient = 0) for negative values. ReLU ranges from 0 to infinity. Both the
function and its derivative are monotonic.
        </p>
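        <p>The weighted sum and the two activation functions can be sketched in a few lines of Python. The input values, weights, and bias below are arbitrary numbers chosen for illustration, and the function names are hypothetical.</p>
        <preformat>
```python
import math


def sigmoid(z):
    """Logistic function: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Arbitrary example values; the weighted sum here is 0.5 + 0.5 - 2.0 = -1.0.
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.5

out_relu = neuron_output(inputs, weights, bias, relu)
out_sigmoid = neuron_output(inputs, weights, bias, sigmoid)
print(out_relu, out_sigmoid)
```
        </preformat>
        <p>With a negative weighted sum, ReLU outputs zero while the sigmoid still returns a small positive value, which illustrates the different output ranges of the two functions.</p>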
      </sec>
      <sec id="sec-5-4">
        <title>5.5. Entity Embedding</title>
        <p>Entity embedding maps categorical variables into Euclidean spaces. The neural
network learns this mapping during the training phase. Compared with one-hot encoding, embedding reduces
memory usage and increases the speed of neural networks. Similar values are mapped close to each other in
the embedding space, which reveals the intrinsic properties of the categorical variables. Entity embedding is
especially useful for data sets with many high-cardinality features, where other methods tend to overfit.
In this study, the data sparsity problem is overcome
by representing the discrete category features in a continuous space. The similarity between categories is reflected
by the distance between their points, so that nearby data points can be used
to approximate missing data points.</p>
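        <p>The lookup performed by an embedding layer can be sketched as follows. The example shows that selecting a row of the embedding matrix is equivalent to multiplying a one-hot vector by that matrix, and that similarity between categories can be read off as a Euclidean distance. The 2-dimensional vectors are made-up values standing in for trained weights; they are not the embeddings learned in this study, and the function names are hypothetical.</p>
        <preformat>
```python
import math

store_types = ["a", "b", "c", "d"]
index = {category: i for i, category in enumerate(store_types)}

# Made-up 2-dimensional embedding matrix, one row per category.
# In a real network these rows are weights learned during training.
embedding_matrix = [
    [0.9, 0.1],   # store type a
    [-0.8, 0.7],  # store type b
    [0.8, 0.2],   # store type c
    [0.1, -0.9],  # store type d
]


def embed(category):
    """Embedding lookup: return the row that belongs to the category."""
    return embedding_matrix[index[category]]


def one_hot(category):
    """One-hot encoding with one position per category."""
    return [1.0 if i == index[category] else 0.0 for i in range(len(store_types))]


def one_hot_times_matrix(category):
    """Multiply the one-hot vector by the embedding matrix."""
    vector = one_hot(category)
    dim = len(embedding_matrix[0])
    return [
        sum(vector[i] * embedding_matrix[i][d] for i in range(len(vector)))
        for d in range(dim)
    ]


def distance(first, second):
    """Euclidean distance between two categories in the embedding space."""
    return math.dist(embed(first), embed(second))


print(embed("c"), one_hot_times_matrix("c"))
print(distance("a", "c"), distance("a", "b"))
```
        </preformat>
        <p>The lookup stores only a short dense vector per category instead of a long sparse one, which is the source of the memory and speed benefit over one-hot encoding.</p>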
        <p>Figure 10 and Figure 11 focus on the overall sales performance of a single store (Store ID =
1). It can be seen that Saturdays are the most profitable days of the week. These figures highlight sales
optimization at the granular level (a single store on a particular day). Attention is also
given to how each store type performs: Store Type A has the highest percentage of overall sales, with an
overwhelming 53.9%. The performance measured by the RMSPE metric and the running time is reported in
Table 2.</p>
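        <p>For reference, the RMSPE metric can be computed as in the sketch below. It assumes the usual definition, the square root of the mean squared relative error, and it skips records with zero actual sales, as is commonly done for the Rossmann competition data; both points are assumptions, since the exact evaluation code of this study is not given. The sales values are invented for illustration.</p>
        <preformat>
```python
import math


def rmspe(actual, predicted):
    """Root mean square percentage error, ignoring zero actual values."""
    squared_terms = [
        ((a - p) / a) ** 2 for a, p in zip(actual, predicted) if a != 0
    ]
    if not squared_terms:
        raise ValueError("RMSPE is undefined when every actual value is zero")
    return math.sqrt(sum(squared_terms) / len(squared_terms))


# Invented daily sales; the third record has zero sales and is skipped.
actual_sales = [100.0, 200.0, 0.0]
predicted_sales = [110.0, 180.0, 5.0]

score = rmspe(actual_sales, predicted_sales)
print(round(score, 4))
```
        </preformat>
        <p>Because each error is divided by the actual value, RMSPE weighs a miss of 10 units on a low-sales day more heavily than the same miss on a high-sales day.</p>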
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Two machine learning models, namely XGBoost and an entity embedding neural network,
are used for sales forecasting in stores. The task involved predicting the sales on any given day at any
store. Previous work in the domain, including time series algorithms in machine
learning, was studied and adapted. Patterns and outliers were identified through analysis of the data, and this analysis
improved the prediction algorithms. XGBoost performed best at prediction, slightly better than the neural
network. However, a neural network is suggested for forecasting the sales of companies
whose sales trends deviate from those of Rossmann, whose sales were used in this project as an experimental dataset.
The major criterion used for fitting is the overall prediction error rather than the
specific decomposition of the error into bias and variance. The uncorrelated sales responses in various data
stores are presented using RMSPE.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>Authors acknowledge the support received from Graphic Era Deemed to be University, Dehradun, India.</p>
      <p>[5] Y. Niu, Walmart sales forecasting using xgboost algorithm and feature engineering, in: 2020
International Conference on Big Data &amp; Artificial Intelligence &amp; Software Engineering (ICBASE), 2020,
pp. 458–461.</p>
      <p>[6] S. Ghosh, C. Banerjee, A predictive analysis model of customer purchase behavior using modified
random forest algorithm in cloud environment, in: 2020 IEEE 1st International Conference for
Convergence in Engineering (ICCE), 2020, pp. 239–244.</p>
      <p>[7] R. P, S. M, Predictive analysis for big mart sales using machine learning algorithms, in: 2021
5th International Conference on Intelligent Computing and Control Systems (ICICCS), 2021, pp.
1416–1421.</p>
      <p>[8] J. Chen, W. Koju, S. Xu, Z. Liu, Sales forecasting using deep neural network and shap techniques,
in: 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things
Engineering (ICBAIE), 2021, pp. 135–138.</p>
      <p>[9] C. Giri, Y. Chen, Deep learning for demand forecasting in the fashion and apparel retail industry,
Forecasting 4 (2022) 565–581.</p>
      <p>[10] C. Aguilar-Palacios, S. Muñoz-Romero, J. L. Rojo-Álvarez, Cold-start promotional sales forecasting
through gradient boosted-based contrastive explanations, IEEE Access 8 (2020) 137574–137586.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahlemeyer-Stubbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coleman</surname>
          </string-name>
          ,
          <article-title>A Practical Guide to Data Mining for Business and Industry</article-title>
          , 1st ed., Wiley Publishing,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Seyedan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mafakheri</surname>
          </string-name>
          ,
          <article-title>Predictive big data analytics for supply chain demand forecasting: Methods, applications, and research opportunities</article-title>
          ,
          <source>Journal of Big Data</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saqib</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Alyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ur</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zeb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zareei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <article-title>Effective demand forecasting model using business intelligence empowered with machine learning</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>116013</fpage>
          -
          <lpage>116023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dairu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shilong</surname>
          </string-name>
          ,
          <article-title>Machine learning model for sales forecasting by using xgboost</article-title>
          ,
          in:
          <source>2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>480</fpage>
          -
          <lpage>483</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>