=Paper= {{Paper |id=Vol-2472/p10 |storemode=property |title=OINOS, an application suite for the performance evaluation of classifiers |pdfUrl=https://ceur-ws.org/Vol-2472/p10.pdf |volume=Vol-2472 |authors=Emanuele Paracone }} ==OINOS, an application suite for the performance evaluation of classifiers== https://ceur-ws.org/Vol-2472/p10.pdf
OINOS, an application suite for the performance evaluation of classifiers

Emanuele Paracone
Dept. of Civil Engineering and Computer Science
University of Rome "Tor Vergata"
Rome, Italy
emanuele.paracone@gmail.com


   Abstract—The last few years have been characterized by a great development of machine learning (ML) techniques, and their application has spread to many fields. The success of their use on a specific problem strongly depends on the approach adopted and on the dataset formatting, and not only on the type of ML algorithm employed. Tools that allow the user to evaluate different classification approaches on the same problem, and their efficacy with different ML algorithms, are therefore becoming crucial.
   In this paper we present OINOS, a suite written in Python and Bash aimed at evaluating the performance of different ML algorithms. This tool allows the user to face a classification problem with different classifiers and dataset formatting strategies, and to extract the related performance metrics. The tool is presented and then tested on the classification of two diagnostic classes from a public electroencephalography (EEG) database. The flexibility and ease of use of this tool allowed us to easily compare the performance of the different classifiers while varying the dataset formatting, and to determine the best approach, obtaining an accuracy of almost 75%.
   OINOS is an open source project, therefore its use and sharing are encouraged.

   Index Terms—Machine learning, Classification, EEG.

   ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                          I. INTRODUCTION

   In recent years we have witnessed the development of new machine learning (ML) techniques and the improvement of existing ones, and their application has expanded to many fields [1]–[6]. Contemporarily, the Python programming language has seen a surge in popularity across the sciences, and in particular in neuroscience [7], for reasons which include its readability, modularity, and the large number of libraries available. Python's versatility is today evident in its range of uses.
   With the aim of carrying out classification, regression and/or clustering on a specific problem, it is useful to evaluate the performance of different ML tools and of different dataset formatting strategies, in order to study their behaviour in different scenarios.
   In this work we present OINOS, a suite for the evaluation of classifier performance, composed of a set of modules for the comparison of ML algorithms with respect to different dataset partitioning strategies. OINOS is written in Python and Bash, and implemented as an application for the execution of multithreaded benchmarks.
   In order to give an example of application, we considered electroencephalography (EEG) data related to the problem of alcoholism prediction, i.e., the classification between patients suffering from alcoholism and healthy subjects, based on EEG time series of one second of brain activity. This dataset has been chosen for its high prediction complexity and because the data is publicly available (at https://kdd.ics.uci.edu/databases/eeg/eeg.html). In this problem the ML tools learn to glean the correlations among the fluctuations of the brain signals obtained from the different channels and their dependence on the subject's pathological state.
   The use of a custom dataset partitioning procedure allowed us to find satisfactory performance without the need to overload the data preprocessing. OINOS made it simple for us to find alternative approaches to train the classifiers.
   The analyzed data belongs to a test concerning 122 subjects, from each of whom a set of 120 trials has been collected. Each trial consists in the measurement of 1 s of EEG signals caught by 64 electrodes placed on the subject's scalp. During the trials, the subjects were alternately exposed to three kinds of stimuli: a single image, two matching images, or two non-matching images. Since subjects belong to the 2 categories alcoholic and non-alcoholic, and the stimuli to the 3 kinds single, matching and non-matching, the EEG data have been labeled through those 2 coordinates (e.g., if a trial has been caught from an alcoholic patient while he was looking at a non-matching couple of figures, the trial label will be alc-non-matching).

                          II. EXECUTION

   Here we describe the structure of the presented tool and its operation modes.
The algorithms and the logic underlying the classification processes of OINOS are implemented by the scikit-learn libraries [8]–[10].
The component modules are:
   1) main: the entry point of the suite. This block is responsible for the execution and the orchestration of the single modules;
   2) OINOS core: the main component. It implements the logic of the comparison among different ML algorithms;



   3) datalogger: the module for the output management and the experiment reports.

A. Use
   In order to start OINOS it is necessary to execute the starter present in the root directory of the project through the command: $ ./start.
In this way the program will return:

  ================
  = OINOS V1.0 =
  ================
  Select an option:
  1. learn from the 'Alcoholic' dataset from UCI Knowledge Discovery in Databases
  2. learn from the 'Wrist' dataset from NeuCube
  [1,2, quit]:

   From this menu it is possible to select on which dataset the prediction algorithms must be tested.

B. Alcoholic
   By selecting the first option, OINOS will acquire the datasets of the database of the UCI Knowledge Discovery in Databases. Before starting the execution, OINOS will ask the user:
   1) to specify the dataset among those available;
   2) to specify the destination path for the output;
   3) to specify which portion of the data (i.e., the ratio), with respect to the overall dataset, will be used for testing;
   4) to specify the number of executions of the same prediction test. This setting is important because, once the cardinality of the training and test sets is fixed (through the ratio), the related elements will be randomly selected; at each run the performance will therefore vary depending on the specific training set of each experiment, and it may be reasonable to study it in a statistical sense.
   At the end of this configuration phase the comparison between the prediction algorithms is executed.
      a) Dataset description: The datasets shown by the starter are named with a suffix that indicates the dataset cardinality. For example, if data 100 is selected by the user, the prediction will be executed on a sample of 100 elements. If a ratio of 0.2 is specified, the prediction will be distributed using 80 elements as training set and the remaining 20 as test set.
      b) Execution: During the execution, messages of four comparison categories are printed on the standard output; to each category corresponds a different classification to be presented to the prediction algorithms:
   1) alcoholic-control: the EEG time series of the dataset are classified as pertaining to alcoholic (alcohol) or healthy (i.e., control) patients;
   2) single-matching-non matching images: the EEG time series pertain to participants to which a single flickering image (single), two identical alternate images (matching) or two different alternate images (non matching) are shown;
   3) single-matching-nonmatching images for alcoholic and control: the intersection of the two previous classifications is considered (alcoholic patient watching a single image, alcoholic patient watching two identical images, etc.);
   4) alcoholic-control extended: the six classes of the previous step are considered and projected onto the two classes alcoholic and control.
The execution goes through these categories in four phases, showing the results of the test on the screen in terms of:
   • classification (ALC-CTRL, SGL-MATCH-NONMATCH, ALC-CTRL/SGL-MATCH-NONMATCH, ALC-CTRL EXT, respectively)
   • overall cardinality of the dataset (for example 100 for data 100)
   • cardinality of the test set (for example 20 for ratio = 0.2)
   • type of classifier under testing
   • accuracy of the classifier, computed as

         Acc = (TP + TN) / (TP + TN + FP + FN)          (1)

   • precision of the classifier, computed as

         Pr = TP / (TP + FP)                            (2)

   • recall of the classifier, computed as

         Rec = TP / (TP + FN)                           (3)

   • F1 score of the classifier, computed as

         F1 = 2 · (Pr · Rec) / (Pr + Rec)               (4)

     where TP stands for true positive, TN for true negative, FP for false positive and FN for false negative. A TP is an outcome where the model correctly predicts the positive class; similarly, a TN is an outcome where the model correctly predicts the negative class. A FP is an outcome where the model incorrectly predicts the positive class, and a FN is an outcome where the model incorrectly predicts the negative class.
      c) Output: When the execution is finished, the output will be available at the path specified during the configuration phase:
   • a Microsoft Excel file (.xlsx) with the report, as described above;
   • a figure with the comparison between the different accuracy values.

C. Unattended mode
   With this option it is possible to directly call the relative Python sources.
This allows the user to run the suite in unattended mode, which is useful for the implementation of custom procedures and benchmarks.
The related scripts are ./bin/oinos.py and ./bin/wrist.py



respectively; the switch -h enables the help, which returns the following information to the user:

  $ ./bin/oinos.py -h
  usage:
  $ python bin/main.py -d <dataset> -r <ratio> -o <output>
  example:
  $ ./bin/oinos.py -d data 100 -r 0.3 -v -o out
  ----------------------------------------------------
  $ ./bin/wrist.py -h
  usage:
  $ python bin/main.py -r <ratio> -o <output>
  example:
  $ ./bin/wrist.py -r 0.3 -v -o out

Fig. 1. Base approach: OINOS compares several runs for each classification algorithm; each run takes as learning base a subset of the original base, considering only the label about alcoholism and ignoring the stimuli; the prediction domain is alcoholic-control.

   This menu makes it possible to select the dataset on which the prediction algorithms have to be tested.
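The metrics of Eqs. (1)–(4) reported by OINOS can be computed directly from the confusion-matrix counts. The following is a minimal sketch of that computation, not code taken from the OINOS sources:

```python
def metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall and F1 score, as in Eqs. (1)-(4)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    pr = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pr * rec / (pr + rec)
    return acc, pr, rec, f1

# Toy confusion counts: 8 true positives, 8 true negatives, 2 of each error.
acc, pr, rec, f1 = metrics(tp=8, tn=8, fp=2, fn=2)
print(acc, pr, rec, f1)  # 0.8 0.8 0.8 0.8
```

Note that Pr and Rec are undefined when their denominators are zero; a production implementation would guard those cases.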
                   III. DATASET: ALCOHOLIC

A. Dataset interpretation
   This dataset comes from the Knowledge Discovery in Databases Archive of the University of California, Irvine [11], and is part of a bigger dataset on the detection of the genetic predisposition of human beings to alcoholism [9].
   In our case, the gathered data comes from an experiment conducted on 122 subjects, each of whom underwent 120 trials of the same task.
The task consists in a second of EEG activity recorded while the subject is asked to alternately watch:
   • a single image (case identified as single, i.e., SNGL)
   • two identical images (matching, i.e., MATCH)
   • two different images (non matching, i.e., NONMATCH)
   For each presented stimulus, ten trials of a second of activity, recorded by 64 electrodes, have been gathered. The electrodes were located on the head of the subject, to record fluctuations of postsynaptic activity [12], sampled at 256 Hz.
We implemented our comparisons between the classifiers by considering the four classifications described in Section II-B(b). In the next section we will show the strong points and the advantages of this approach.

B. Classification: a bottom-up approach
   Using the metadata of the experiment, the samples have been subdivided with respect to the type of stimulus given to the subject (SNGL, MATCH, NONMATCH) or on the basis of the type of subject (alcoholic, control). Different tests have been done to evaluate the performance of the classifiers considered, using different configurations (only subject type, only stimulus type, combined).
Unfortunately, not even the classifiers that achieved greater success during repeated runs were able to reach satisfying performance.
   Therefore we implemented a different method. In addition to the three types of prediction described above, we implemented a fourth one - alcoholic-control extended, i.e., alc-ctrl ext - able to project the classifications obtained by the combined configuration (six classes, one for each combination of pathology and stimulus) onto the two classes alcoholic and control; we therefore classified the data in the most stringent way, to then go back in abstraction and generalize the final solution.

Fig. 2. Extended approach: OINOS compares several runs for each classification algorithm; each run takes as learning base a subset of the original base, considering both the labels about alcoholism and stimuli; the prediction domain is made of all combinations between alcoholic-control and single-matching-nonmatching; a further phase merges the obtained predictions into the alcoholic extended and control extended sets.

   This new way of predicting the classes alcoholic and control has significantly improved the performance of the classifiers, as we illustrate below.
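The projection performed by the extended approach (six combined classes collapsed onto alcoholic and control) amounts to discarding the stimulus coordinate of each predicted label. A minimal sketch follows; the label strings are illustrative, modeled on the paper's alc-non-matching example, and are not necessarily the exact names used inside OINOS:

```python
# Collapse a six-class prediction (pathology x stimulus) onto the two
# classes alcoholic / control by keeping only the pathology coordinate.
def project(label: str) -> str:
    """'alc-single' -> 'alc', 'ctrl-non-matching' -> 'ctrl', etc."""
    return label.split("-", 1)[0]

six_class_preds = ["alc-single", "ctrl-non-matching", "alc-matching"]
print([project(p) for p in six_class_preds])  # ['alc', 'ctrl', 'alc']
```

The classifier is thus trained on the most specific labels, while accuracy is still measured on the two-class problem.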



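The ratio and repeated-execution settings described in Section II-B can be emulated with scikit-learn, the library underlying OINOS; the synthetic data and the choice of K Nearest Neighbors here are assumptions made for the sake of a short runnable example, not the OINOS benchmark itself:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))       # stand-in for 100 trials of EEG features
y = (X[:, 0] > 0).astype(int)       # toy binary target

ratio, n_runs = 0.2, 10             # 20 test elements per run, 10 executions
accs = []
for run in range(n_runs):
    # a fresh random split at every run, so performance varies per experiment
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=ratio,
                                          random_state=run)
    accs.append(KNeighborsClassifier().fit(Xtr, ytr).score(Xte, yte))

print(f"mean accuracy over {n_runs} runs: {np.mean(accs):.2f}")
```

Averaging the scores over the runs gives the statistical view of the performance motivated in Section II-B.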
C. Results
   We have conducted a benchmark of 100 consecutive runs to analyze the performance of the following classifiers:
   • K Nearest Neighbors
   • Linear SVM
   • RBF SVM
   • Linear SVC
   • Gaussian Process
   • Decision Tree (with max depth = 5)
   • Decision Tree (with max depth = 10)
   • Random Forest
   • Gradient Boosting Classifier
   • Neural Net
   • Ada Boost
   • Naive Bayes
   • Linear Discriminant Analysis
   • QDA
After the execution of the runs, an average of each one of the metrics has been computed. Although the performances are not very good, it has to be noted that the introduction of the alc-ctrl extended approach has significantly affected the performance of some of the classifiers.
   The two types of classification, alcoholic-control (in blue) and alcoholic-control extended (in orange), have been compared, underlining the benefits of the latter approach. The results are summarized in Figures 3, 4, 5 and 6.

Fig. 3. Accuracy results, summarized for the considered classifiers. In orange are shown the results obtained by adopting the bottom-up approach, whereas the blue bars show the results of the "normal" approach.

Fig. 4. Precision results, summarized for the considered classifiers. In orange are shown the results obtained by adopting the bottom-up approach, whereas the blue bars show the results of the "normal" approach.

Fig. 5. Recall results, summarized for the considered classifiers. In orange are shown the results obtained by adopting the bottom-up approach, whereas the blue bars show the results of the "normal" approach.

Fig. 6. F1 results, summarized for the considered classifiers. In orange are shown the results obtained by adopting the bottom-up approach, whereas the blue bars show the results of the "normal" approach.

   Among the different classifiers tested, it is worth highlighting the cases of Linear SVC and the Neural Network. Their classification for alcoholic-control was just above the average of the other classifiers, with accuracy and precision near 60%. Such performances considerably improved with the extended approach.
   The University of California Knowledge Discovery in Databases Archive (UCI KDD Archive) openly shares different datasets with the aim of making them usable for Machine Learning research (http://kdd.ics.uci.edu/). On the website, different datasets are available, indexed by typology and semantic area. The EEG data selected for our study are categorized in the section Time Series - EEG (http://kdd.ics.uci.edu/databases/eeg/eeg.data.html).

                          IV. CONCLUSION

   In this work we presented OINOS, a suite for the evaluation of classifier performance, composed of a set of modules for the comparison of several ML algorithms with respect to different datasets. We faced a classification problem based on neurophysiology data (i.e., EEG time series), to distinguish alcoholic from non-alcoholic subjects during the execution of a task. Through the performance evaluation of a set of classifiers we found the best configuration among the proposed classifiers and dataset formatting strategies. Despite the big cardinality of the dataset, the need for alternative approaches for




dataset formatting to facilitate the learning of the classifiers has emerged. The use of a custom procedure allowed us to find a way to improve the classification.
   To show how to use OINOS, here we have performed the evaluation of classifiers for an application related to the biomedical field. Nevertheless, such kinds of tools are of great help in many other fields [13], such as finance [14], face recognition [15], and communication systems [16], where they could be useful for evaluating the performance of recent communication algorithms (e.g., [17], [18]). Finally, since some ML strategies are based on neural networks, a future development could be that of expanding classical artificial neural networks (ANNs) with bio-inspired spiking neural networks (SNNs) [19]–[22], since such approaches are recently proving to be appropriate for the classification/prediction of spatio-temporal stream data [23]–[25], and of comparing their performance on classical problems which are traditionally faced with ANNs [26], [27].
   The work is an open source project, available at https://gitlab.com/knizontes/oinos.

                              REFERENCES

 [1] S. Angra and S. Ahuja, "Machine learning and its applications: A review," in 2017 International Conference on Big Data Analytics and Computational Intelligence (ICBDAC), 2017, pp. 57–60.
 [2] R. Malhotra, "A systematic review of machine learning techniques for software fault prediction," Applied Soft Computing, vol. 27, pp. 504–518, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1568494614005857
 [3] M. Matta, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, F. Silvestri, and S. Spanò, "Q-RTS: a real-time swarm intelligence based on multi-agent Q-learning," Electronics Letters, vol. 55, no. 10, pp. 589–591, 2019.
 [4] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re, and S. Spanò, "AW-SOM, an algorithm for high-speed learning in hardware self-organizing maps," IEEE Transactions on Circuits and Systems II: Express Briefs, pp. 1–1, 2019.
 [5] S. Coco, A. Laudani, F. Riganti Fulginei, and A. Salvini, "Team problem 22 approached by a hybrid artificial life method," COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 31, no. 3, pp. 816–826, 2012.
 [6] S. Coco, A. Laudani, F. R. Fulginei, and A. Salvini, "Bacterial chemotaxis shape optimization of electromagnetic devices," Inverse Problems in Science and Engineering, vol. 22, no. 6, pp. 910–923, 2014.
 [7] E. Muller, J. Bednar, M. Diesmann, M. Gewaltig, M. Hines, and A. Davison, "Python in neuroscience," Frontiers in Neuroinformatics, vol. 9, no. 11, pp. 62–76, 2015.
 [8] INRIA, "Scikit-learn." [Online]. Available: https://scikit-learn.org
 [9] ——, "Scikit eeg." [Online]. Available: http://kdd.ics.uci.edu/databases/eeg/eeg.data.html
[10] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, and V. Dubourg, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[11] S. D. Bay, D. F. Kibler, M. J. Pazzani, and P. Smyth, "The UCI KDD archive of large data sets for data mining research and experimentation," SIGKDD Explorations, vol. 2, no. 2, pp. 81–85, 2000.
[12] G. Susi, S. Ye-Chen, J. de Frutos Lucas, G. Niso, and F. Maestú, "Neurocognitive aging and functional connectivity using magnetoencephalography," in Oxford Research Encyclopedia of Psychology and Aging. Oxford: Oxford University Press, 2018.
[13] A. Soofi and A. Awan, "Classification techniques in machine learning: Applications and issues," Journal of Basic & Applied Sciences, vol. 13, 2017.
[14] B. Henrique, V. Amorim Sobreiro, and H. Kimura, "Literature review: Machine learning techniques applied to financial market prediction," Expert Systems with Applications, vol. 124, pp. 226–251, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S095741741930017X
[15] H. Filali, J. Riffi, A. M. Mahraz, and H. Tairi, "Multiple face detection based on machine learning," in 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), April 2018, pp. 1–8.
[16] O. Simeone, "A very brief introduction to machine learning with applications to communication systems," IEEE Transactions on Cognitive Communications and Networking, vol. 4, no. 4, pp. 648–664, Dec 2018.
[17] A. Detti, M. Orru, R. Paolillo, G. Rossi, P. Loreti, L. Bracciale, and N. Blefari Melazzi, "Application of information centric networking to NoSQL databases," in 2017 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN), 2017.
[18] A. Detti, L. Bracciale, P. Loreti, G. Rossi, and N. Blefari Melazzi, "A cluster-based scalable router for information centric networks," Computer Networks, vol. 142, 2018.
[19] G. Susi, L. Antón Toro, L. Canuet, M. E. López, F. Maestú, C. R. Mirasso, and E. Pereda, "A neuro-inspired system for online learning and recognition of parallel spike trains, based on spike latency, and heterosynaptic STDP," Frontiers in Neuroscience, vol. 12, p. 780, 2018.
[20] N. K. Kasabov, "NeuCube: A spiking neural network architecture for mapping, learning and understanding of spatio-temporal brain data," Neural Networks, vol. 52, pp. 62–76, 2014.
[21] G. Susi, A. Cristini, and M. Salerno, "Path multimodality in a feedforward SNN module, using LIF with latency model," Neural Network World, vol. 4, no. 26, 2016.
[22] S. Acciarito, G. C. Cardarilli, A. Cristini, L. D. Nunzio, R. Fazzolari, G. M. Khanal, M. Re, and G. Susi, "Hardware design of LIF with latency neuron model with memristive STDP synapses," Integration, the VLSI Journal, vol. 59, no. C, pp. 81–89, Sep. 2017. [Online]. Available: https://doi.org/10.1016/j.vlsi.2017.05.006
[23] N. Kasabov, N. M. Scott, E. Tu, S. Marks, N. Sengupta, E. Capecci, M. Othman, M. G. Doborjeh, N. Murli, R. Hartono, J. I. Espinosa-Ramos, L. Zhou, F. B. Alvi, G. Wang, D. Taylor, V. Feigin, S. Gulyaev, M. Mahmoud, Z.-G. Hou, and J. Yang, "Evolving spatio-temporal data machines based on the NeuCube neuromorphic framework: Design methodology and selected applications," Neural Networks, vol. 78, pp. 1–14, 2016.
[24] G. Lo Sciuto, G. Susi, G. Cammarata, and G. Capizzi, "A spiking neural network-based model for anaerobic digestion process," in 2016 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), 2016, pp. 996–1003.
[25] S. Brusca, G. Capizzi, G. Lo Sciuto, and G. Susi, "A new design methodology to predict wind farm energy production by means of a spiking neural network-based system," International Journal of Numerical Modelling: Electronic Networks, Devices and Fields, vol. 32, no. 4, p. e2267, 2019.
[26] A. Tealab, "Time series forecasting using artificial neural networks methodologies: A systematic review," Future Computing and Informatics Journal, vol. 3, no. 2, pp. 334–340, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2314728817300715
[27] G. Capizzi, G. Lo Sciuto, P. Monforte, and C. Napoli, "Cascade feed forward neural network-based model for air pollutants evaluation of single monitoring stations in urban areas," INTL Journal of Electronics and Communications, vol. 61, no. 4, pp. 327–332, 2015.



