A Knowledge-Based Platform for the Classification of Accounting Documents*

Alessia Amelio[0000−0002−3568−636X], Alberto Falcone[0000−0002−2660−1432], Angelo Furfaro[0000−0003−2537−8918], Alfredo Garro[0000−0003−0351−0869], Domenico Saccà[0000−0003−3584−5372]

DIMES - University of Calabria, 87036 – Rende (CS), Italy
{a.amelio, a.falcone, a.furfaro, a.garro, sacca}@dimes.unical.it

Abstract. Due to the lack of standardization, the classification of invoices’ billing entries for accounting purposes has not yet been fully automated, despite the diffusion of e-invoicing systems. Each accounting firm adopts its own conventions, also on the basis of the specific business domain of the invoice owner. Here, we describe a knowledge-based software platform devised to address this issue. The paper focuses on the evaluation of the classification algorithms most suitable to be employed in an adaptive meta-learning approach. The effectiveness of the selected algorithms is experimentally assessed by running them on a real dataset of invoice entries.

Keywords: Automatic Invoice Processing · Meta-Learning · Classification.

1 Introduction

Over the years, business processes and software systems supporting accounting have reached a high level of maturity and, at the same time, have allowed the producers of such systems to achieve an excellent and consolidated level of diffusion in the market. The ever increasing pervasiveness of new ICT technologies, mainly due to the diffusion of mobile devices and “smart”, Internet-enabled objects, has created, in all application domains, a strong need for innovation in IT products and services, which, in turn, has increased competitiveness among companies. Also in the specific domain of systems and

* Supported by project P.O.R.
CALABRIA FESR-FSE 2014-2020 - “Smart Electronic Invoices Accounting – SELINA” (CUP J28C1700016006)

Copyright © 2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy.

services supporting the processing of accounting documents, the demand for innovation has led to the definition of effective strategies to evolve the offer in order to maintain, and potentially increase, the customer portfolio.

In this context, one of the aspects most amenable to innovation is the classification of invoices’ billing entries, a process which is not yet fully automated. One of the main issues is the lack of a standard for the categorization of the information related to invoice entries and its association with the corresponding accounting categories. Often, each accounting firm adopts its own conventions, modifying and adapting them to its client portfolio, to its internal management processes and to other contingent needs.

Most of the scientific work related to the processing of accounting documents has been devoted to the extraction of information from scanned documents; e.g., in [6] a flexible form-reader system was devised for this purpose. The work described in [4] focuses on the feature extraction process for the supervised classification of scanned invoices obtained from printed documents. An approach based on incremental learning techniques, with experiments on the classification of a small dataset of invoices (a learning set of 324 documents, a testing set of 169 documents and 8 classes), has been proposed in [10].
This paper proposes a knowledge-based software platform devised to address the issue of the automatic classification of invoice entries, which tackles the lack of standardization by allowing the building of specific classification models tailored to the peculiarities and preferences of a given accounting firm. The most appropriate classification algorithms for this domain are evaluated, and their effectiveness is experimentally assessed by running them on a dataset of 910874 invoice entries. The architecture has been devised to support a meta-learning approach consisting in the integration of multiple classification algorithms in order to achieve better prediction performance.

The rest of the paper is organized as follows. Section 2 presents the architecture of the proposed classification platform. Section 3 describes the adopted methodology and the dataset used for its validation. Section 4 illustrates the experimental setting, whose results are shown in Section 5. Finally, Section 6 draws the conclusions.

2 Platform Architecture

Figure 1 shows the architecture of the proposed platform. It is characterized by a set of modules, each of which is in charge of specific functionalities supporting the classification process of invoice entries.

The Data Pre-Processing module performs, on the raw input data: (i) cleaning, (ii) numerical feature extraction, and (iii) normalisation. The clean data are used to feed both the Classification and Knowledge update modules. The first module is composed of a set of classification models, each specifically defined to take into account the details of a given user profile along with related product information (see Section 3) during the classification process (the predicted labels depend on both these pieces of information). The second module allows, starting from the Clean data and the User’s choice, the updating of the Knowledge base that is exploited by the Learning module to re-train the classification models according to the user profiles.

Fig. 1. The architecture of the platform.

The labels are predicted by the Classification module through the Meta Model, which combines the contributions of each classification model. The Classification module produces a ranked list of labels which is presented to the end-user who, in turn, either selects one of them or specifies a new one. In both cases, the resulting labels, along with the input data, are used by the Knowledge update module to update the Knowledge base. This feedback allows both the prediction performance of the classification process to be increased and the process to be adapted to concept drift issues [15].

3 Data and Methodology

The original dataset consists of a sample of 910874 anonymous invoices’ billing entries with 92 numerical and string attributes related to products sold in the period 2016-2018. Specifically, the first 8 attributes are related to the kind of invoice’s billing entry along with its occurring date; the following 23 attributes concern the product supplier; the following 2 concern the details of the payment; the next 20 attributes concern the recipient of the goods and the related transport details. The last 39 attributes are related to the product details, i.e. weight, description and the user (invoice owner) information. The group of the invoice entry, which has been defined by the dataset owner, represents the class attribute, for a total of 71 groups. The dataset keeps information for the following 7 user profiles: (i) Driver, (ii) Change of heading, (iii) Direct agent, (iv) Fair event, (v) Supplier, (vi) Internet and (vii) Office.

Starting from the original dataset, the following 2 string attributes have been considered for the analysis: (i) business activity type (e.g.
“cafe” and “hotel”), and (ii) description of the invoice’s billing entry.

3.1 Preprocessing

The dataset has several missing, invalid and inaccurate values, particularly in correspondence with the product information (e.g. negative weight values, non-homogeneous descriptions and units of measure). To address these issues, it was necessary to perform data cleaning and preparation procedures through the Data Pre-Processing module in order to make the data usable for the subsequent classification steps [11]. The data pre-processing task consists of three parts: (i) data cleaning, (ii) feature extraction, and (iii) normalization.

Data Cleaning. The defined data cleaning process, which aims to eliminate inaccurate and corrupt data, is composed of three steps which are performed on the 2 selected string attributes:

1. Removal of duplicate, redundant and invalid text;
2. Removal of words and sentences useless for classification purposes (e.g., “transport cost” or “closed on Tuesday”);
3. Identification, through regular expressions, of invalid text patterns, and further deletion of numbers, symbols, special characters and units of measurement to avoid unexpected behaviours during the classification phases (e.g. overfitting, underfitting).

Feature Extraction. After data cleaning, for each entry, a feature vector of 10 numerical values is built from the 2 string attribute values. Specifically, the business activity type is characterised by a single word, while the description of the invoice’s billing entry is converted to a sequence of words. All distinct words in the dataset form a vocabulary which associates a unique integer value with each word. The first element of the feature vector is the integer value in the vocabulary associated with the business activity type’s word. The other elements of the feature vector are the integer values in the vocabulary associated with the words of the description of the invoice’s billing entry.
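A minimal sketch of the cleaning and vocabulary-encoding steps described above can be written in pure Python as follows. The regular expressions, the helper names and the toy entries are illustrative assumptions, not the platform’s actual rules (the actual pipeline relies on the Keras Tokenizer, as discussed next):

```python
import re

VECTOR_SIZE = 10  # 1 (activity type) + up to 9 description words

def clean_text(text: str) -> str:
    """Steps 2-3: drop numbers, units of measurement, symbols and special characters."""
    text = text.lower()
    text = re.sub(r"\b\d+([.,]\d+)?\s*(kg|g|l|ml|pcs)?\b", " ", text)  # numbers + units
    text = re.sub(r"[^a-z\s]", " ", text)                              # symbols
    return re.sub(r"\s+", " ", text).strip()

def build_vocabulary(entries):
    """Associate a unique integer (starting at 1) with each distinct word."""
    vocab = {}
    for activity, description in entries:
        for word in [clean_text(activity)] + clean_text(description).split():
            if word and word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def to_feature_vector(activity, description, vocab):
    """First element: activity-type word id; then description word ids, zero-padded."""
    ids = [vocab.get(clean_text(activity), 0)]
    ids += [vocab.get(w, 0) for w in clean_text(description).split()]
    return (ids + [0] * VECTOR_SIZE)[:VECTOR_SIZE]

entries = [("cafe", "espresso coffee beans 25 kg"),
           ("hotel", "laundry detergent 5 l")]
vocab = build_vocabulary(entries)
vec = to_feature_vector("cafe", "espresso coffee beans 25 kg", vocab)
# vec -> [1, 2, 3, 4, 0, 0, 0, 0, 0, 0]
```

The word-to-integer mapping plays the role of the vocabulary; in the actual platform this mapping is produced by the Keras Tokenizer utility class, as noted below.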
Accordingly, the size of the feature vector is equal to 1 plus the size of the longest possible sequence of words composing a description of an invoice’s billing entry in the dataset, which is equal to 9. Due to the predefined format of the description of the invoice’s billing entry in the dataset, the feature vectors are consistent and comparable. The whole procedure has been performed through the Tokenizer utility class provided by the Keras library [12].

Normalisation. In each feature vector, missing numerical values for a given feature are replaced with the mean of the distribution of the remaining values for that feature. Then, the feature values are bounded to the range [0, 1] using the following min-max procedure:

x̂_{ij} = (x_{ij} − min_j) / (max_j − min_j),   (1)

where x_{ij} is the value of the j-th feature of the feature vector x_i, and min_j and max_j are the minimum and maximum values of the j-th feature, respectively. This step is essential for distance-based classifiers: it standardizes the range of the numerical features so that each feature contributes approximately proportionately to the final distance.

3.2 Classification Models

Although different algorithms have been employed and tested to define the set of classification models composing the Classification module, the subset that obtained the best performance on the feature vectors related to the different user profiles was selected. Specifically, the selected algorithms are:

- Bayes Net,
- Decision Tree,
- Random Forest,
- K-Nearest Neighbour,
- Deep Neural Network,
- Repeated Incremental Pruning to Produce Error Reduction.

3.3 Bayes Net

A Bayes Net (BN) is a probabilistic graphical model that represents a set of stochastic variables with their conditional dependencies through the use of a Directed Acyclic Graph (DAG). In the DAG, each node represents a feature of the dataset, whereas the conditional dependencies among the features are represented as edges [14].
The BN model can be used to classify an entry a of the dataset; specifically, a is predicted to belong to the class c that maximizes the posterior probability p(c|a).

3.4 Decision Tree

A Decision Tree (DT) is a classifier based on a tree structure that is learned from a training dataset. In the tree, leaf nodes represent the class labels, whereas the non-leaf nodes correspond to the feature decision points which split an entry a of the dataset according to its feature values. After evaluating all the decision points along a path, the entry a is predicted to belong to the class c associated with the leaf node at the end of that path [3].

3.5 Random Forest

A Random Forest (RF) generates a set of tree-based classifiers, each of which is built starting from a random subset of the training dataset. A tree-based classifier can be a decision tree as previously described. The classification of an entry a is done by combining the contributions of each tree-based classifier through a majority voting strategy [3].

3.6 K-Nearest Neighbour

A K-Nearest Neighbour (KNN) classifier finds the K nearest entries of the training dataset to a test entry a, based on a given distance function. Then, a is predicted to belong to the class c that is the most frequent class among the selected K nearest entries [2].

3.7 Deep Neural Network

A Deep Neural Network (DNN) can be defined as a weighted directed graph G = ⟨V, E⟩ where the nodes represent the artificial neurons V(G) = {v_1, v_2, ..., v_n}, whereas the weighted edges E(G) = {e_1, e_2, ..., e_m} ⊆ V × V represent the relationships between the neurons. A DNN is composed of a layer of input neurons, one or more intermediate layers of neurons and a layer of output neurons that provides the classification result. The model learns a non-linear transformation by tuning the layer weights so as to map the input entry to another space where it becomes linearly separable [2].
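As an illustration of the distance-based rule of Section 3.6, a minimal KNN classifier over normalized feature vectors can be sketched in pure Python as follows. The Euclidean distance and the toy training data are illustrative assumptions; the experiments reported later rely on the Weka implementation of KNN:

```python
import math
from collections import Counter

def knn_predict(train, test_vec, k=1):
    """Predict the class of test_vec as the majority class among its
    k nearest training entries, using the Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbours = sorted(train, key=lambda entry: dist(entry[0], test_vec))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy training set: (normalized feature vector, class label)
train = [([0.1, 0.2], "food"),
         ([0.9, 0.8], "cleaning"),
         ([0.2, 0.1], "food")]
label = knn_predict(train, [0.15, 0.15], k=1)
# label -> "food"
```

With k = 1 the rule reduces to copying the label of the single closest training entry, which is the configuration that performed best in the experiments described in Section 4.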
3.8 Repeated Incremental Pruning to Produce Error Reduction

Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is a rule-based classifier which learns rules by considering, in turn, each class label in the training dataset and inducing a set of rules that cover all the entries belonging to that class. After that, the algorithm considers the next class label and repeats the same steps, iterating until all class labels have been considered [7].

4 Experiments

The classification task consists in predicting the group of invoices’ billing entries related to the user profiles from the feature vectors in the dataset. The experiments involving BN, DT, RF, KNN and RIPPER have been performed in Weka [1], whereas the DNN has been implemented in Python through well-known deep learning libraries, i.e. Pandas, Scikit-learn and Keras, with TensorFlow as backend. For KNN, the experimented values of K ranged from 1 to 8. In the end, K was set to 1 since it provided the best results on the given dataset. The other parameters were set to the default Weka values.

The DNN has a 10-dimensional input (a 1D tensor, corresponding to the size of the feature vector), an Embedding layer that learns dense vector representations of the words, and a Flatten layer that reshapes the output of the Embedding layer so that it can be processed by the following Dense hidden layer. Also, a BatchNormalization layer has been introduced to normalize the activations of the neurons, followed by a Dropout layer, which randomly deactivates neurons to prevent overfitting issues [13]. Finally, a Dense output layer composed of 71 neurons (corresponding to the number of groups of invoices’ billing entries) provides the predicted class. Figure 2 shows the DNN architecture.

In order to evaluate the performance of the classifiers, accuracy, error and kappa statistics have been computed from the multiclass confusion matrix.
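The DNN layer stack described above can be sketched in Keras as follows. This is a minimal sketch under stated assumptions: the vocabulary size, embedding dimension, hidden-layer width, dropout rate, optimizer and loss are illustrative choices, as the paper does not report these hyperparameters:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Input, Embedding, Flatten, Dense,
                                     BatchNormalization, Dropout)

VOCAB_SIZE = 20000   # assumption: number of distinct words in the vocabulary
NUM_GROUPS = 71      # number of groups of invoices' billing entries

model = Sequential([
    Input(shape=(10,)),                       # 10-dimensional feature vector
    Embedding(VOCAB_SIZE, 32),                # dense word representations (dim assumed)
    Flatten(),                                # 10 x 32 -> 320
    Dense(128, activation="relu"),            # hidden Dense layer (width assumed)
    BatchNormalization(),                     # normalize neuron activations
    Dropout(0.5),                             # randomly deactivate neurons (rate assumed)
    Dense(NUM_GROUPS, activation="softmax"),  # one output neuron per group
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The softmax output yields a probability per group, so the predicted labels can be ranked as required by the Classification module of Section 2.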
The accuracy represents the percentage of correctly classified entries; the error quantifies the percentage of incorrectly classified entries; finally, the kappa statistic is based on a comparison between the observed and expected accuracy [11]. To make the evaluation independent of the choice of training and test datasets, k-fold cross validation (k = 10) has been used, which is a well-established technique in the literature to achieve statistically significant results [11].

5 Results

Table 1 reports the classification results obtained by BN, DT, RF, KNN, RIPPER and DNN in terms of accuracy, error and kappa statistics for one randomly selected test dataset related to the Supplier user profile. It is worth noting that all the presented classification models obtained an accuracy in predicting the group of invoice entries exceeding 95%. Specifically, DNN obtains the best result, with 99.20% accuracy, 0.80% error and 0.99 kappa statistic. Hence, it can be adopted as the classification model for the Supplier user profile. It is followed by RF, KNN, DT, RIPPER and BN.

Fig. 2. The architecture of the adopted DNN.

Table 1. Classification results for the user profile Supplier.

Classification model | Accuracy (%) | Error (%) | Kappa statistics
BN                   | 95.64        | 4.35      | 0.95
DT                   | 96.82        | 3.17      | 0.96
RF                   | 98.02        | 1.98      | 0.98
KNN                  | 97.60        | 2.40      | 0.97
RIPPER               | 96.58        | 3.42      | 0.96
DNN                  | 99.20        | 0.80      | 0.99

Concerning the best classification method, DNN, Figure 3 shows the trend of accuracy and error over the epochs on the training and test datasets. As depicted in Figure 3, the error decreases and the accuracy increases at each epoch. This is what is expected when using an optimization function based on gradient descent, which tries to minimize the error value at each epoch [13].
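For reference, the kappa statistic used above compares the observed accuracy with the accuracy expected by chance. Its computation from a multiclass confusion matrix can be sketched as follows; the 3-class matrix is made-up illustrative data, not taken from the paper’s experiments:

```python
def kappa_statistic(confusion):
    """Cohen's kappa from a square confusion matrix (rows: actual, cols: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed accuracy: fraction of entries on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Expected accuracy: chance agreement from row and column marginals.
    expected = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
                   for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)

# Illustrative 3-class confusion matrix
confusion = [[50, 2, 1],
             [3, 40, 2],
             [1, 1, 60]]
kappa = kappa_statistic(confusion)
```

A kappa close to 1 indicates agreement far above chance, which is why the 0.99 obtained by the DNN in Table 1 corroborates its accuracy figure.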
It is worth noting that the defined DNN is able to learn the key patterns characterizing the dataset already from the first epochs; this is visible from the trends of the accuracy and error obtained on the dataset, which rapidly increase and decrease, respectively, and then do not change significantly over the subsequent epochs.

Fig. 3. Accuracy and error obtained by DNN on invoices’ entries related to the Supplier user profile.

6 Conclusions and Future Works

This paper presented the architecture of a platform for the automatic classification of invoice entries for accounting purposes, and the evaluation of the performance of the classification algorithms that can be suitably employed in such a domain. The architecture has been designed to enable a meta-learning approach where issues such as adaptation to invoice owner profiles and concept drift can be effectively handled. Toward this goal, future work will involve the analysis of ensemble learning techniques [15, 16], based on new boosting and bagging strategies, for combining the classification results in the Meta Model, of different feature representations for modelling the invoice entries, and of active learning approaches. Another potential future research direction concerns the extension of the architecture so that it can work in distributed environments, e.g. through the use of Model-Driven approaches [5], and its evaluation by means of virtual environments [8, 9].

References

1. Weka 3: Data Mining Software in Java, https://www.cs.waikato.ac.nz/ml/weka/index.html
2. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46(3), 175–185 (1992)
3. Amelio, L., Amelio, A.: Classification methods in image analysis with a special focus on medical analytics. In: Machine Learning Paradigms, pp. 31–69. Springer (2019)
4. Bartoli, A., Davanzo, G., Medvet, E., Sorio, E.: Improving features extraction for supervised invoice classification.
In: Artificial Intelligence and Applications. ACTAPRESS (2010). https://doi.org/10.2316/P.2010.674-040
5. Bocciarelli, P., D’Ambrogio, A., Falcone, A., Garro, A., Giglio, A.: A model-driven approach to enable the distributed simulation of complex systems. In: Complex Systems Design & Management, Proceedings of the Sixth International Conference on Complex Systems Design & Management, CSD&M 2015, Paris, France, November 23-25, 2015, pp. 171–183. Springer International Publishing (Oct 2015). https://doi.org/10.1007/978-3-319-26109-6_13
6. Cesarini, F., Gori, M., Marinai, S., Soda, G.: INFORMys: a flexible invoice-like form-reader system. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(7), 730–745 (Jul 1998). https://doi.org/10.1109/34.689303
7. Cohen, W.W.: Fast effective rule induction. In: Machine Learning Proceedings 1995, pp. 115–123. Elsevier (1995)
8. Furfaro, A., Argento, L., Parise, A., Piccolo, A.: Using virtual environments for the assessment of cybersecurity issues in IoT scenarios. Simulation Modelling Practice and Theory 73, 43–54 (Apr 2017). https://doi.org/10.1016/j.simpat.2016.09.007
9. Furfaro, A., Piccolo, A., Parise, A., Argento, L., Saccà, D.: A cloud-based platform for the emulation of complex cybersecurity scenarios. Future Generation Computer Systems 89, 791–803 (Dec 2018). https://doi.org/10.1016/j.future.2018.07.025
10. Hamza, H., Belaid, Y., Belaid, A., Chaudhuri, B.B.: Incremental classification of invoice documents. In: Proc. of the 19th International Conference on Pattern Recognition. IEEE (Dec 2008). https://doi.org/10.1109/icpr.2008.4761832
11. Han, J., Pei, J., Kamber, M.: Data mining: concepts and techniques. Elsevier (2011)
12. Ketkar, N., et al.: Deep Learning with Python. Springer (2017)
13. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
14. Solares, C., Sanz, A.M.: Different Bayesian network models in the classification of remote sensing images.
In: Intelligent Data Engineering and Automated Learning - IDEAL 2007, 8th International Conference, Birmingham, UK, December 16-19, 2007, Proceedings, pp. 10–16. Springer Berlin Heidelberg (2007). https://doi.org/10.1007/978-3-540-77226-2_2
15. Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD’03. ACM Press (2003). https://doi.org/10.1145/956750.956778
16. Zhang, C., Ma, Y.: Ensemble machine learning: methods and applications. Springer (2012)