=Paper=
{{Paper
|id=Vol-2472/p10
|storemode=property
|title=OINOS, an application suite for the performance evaluation of classifiers
|pdfUrl=https://ceur-ws.org/Vol-2472/p10.pdf
|volume=Vol-2472
|authors=Emanuele Paracone
}}
==OINOS, an application suite for the performance evaluation of classifiers==
https://ceur-ws.org/Vol-2472/p10.pdf
OINOS, an application suite for the performance evaluation of classifiers
Emanuele Paracone
Dept. of Civil Engineering and Computer Science
University of Rome “Tor Vergata”
Rome, Italy
emanuele.paracone@gmail.com
Abstract—The last few years have been characterized by a major development of machine learning (ML) techniques, whose application has spread to many fields. The success of their use on a specific problem depends strongly on the approach used and on the dataset formatting, not only on the type of ML algorithm employed. Tools that allow the user to evaluate different classification approaches on the same problem, and their efficacy with different ML algorithms, are therefore becoming crucial.
In this paper we present OINOS, a suite written in Python and Bash for evaluating the performance of different ML algorithms. The tool allows the user to tackle a classification problem with different classifiers and dataset formatting strategies, and to extract the related performance metrics. The tool is presented and then tested on the classification of two diagnostic classes from a public electroencephalography (EEG) database. The flexibility and ease of use of this tool allowed us to easily compare the performance of the different classifiers while varying the dataset formatting, and to determine the best approach, obtaining an accuracy of almost 75%.
OINOS is an open source project, therefore its use and sharing are encouraged.
Index Terms—Machine learning, Classification, EEG.
©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
I. INTRODUCTION
In recent years we have witnessed the development of new machine learning (ML) techniques and the improvement of existing ones, and their application has expanded to many fields [1]–[6]. At the same time, the Python programming language has seen a surge in popularity across the sciences, and in particular in neuroscience [7], for reasons that include its readability, its modularity, and the large number of available libraries. Python's versatility is evident today in its range of uses.
When carrying out classification, regression and/or clustering on a specific problem, it is useful to evaluate the performance of different ML tools and of different dataset formatting strategies, in order to study their behaviour across different scenarios.
In this work we present OINOS, a suite for the evaluation of classifier performance, composed of a set of modules for comparing ML algorithms under different dataset partitioning strategies. OINOS is written in Python and Bash, and is implemented as an application for running multithreaded benchmarks.
As an example application, we considered electroencephalography (EEG) data related to the problem of alcoholism prediction, i.e., the classification of patients suffering from alcoholism versus healthy patients, based on EEG time series covering one second of brain activity. This dataset was chosen for its high prediction complexity and because the data is publicly available (at https://kdd.ics.uci.edu/databases/eeg/eeg.html). In this problem the ML tools learn to glean the correlations among the fluctuations of the brain signals obtained from the different channels, and their dependence on the subject's pathological state.
The use of a custom dataset partitioning procedure allowed us to obtain satisfactory performance without heavy data preprocessing. OINOS made it simple for us to find alternative approaches to training the classifiers.
The analyzed data belongs to a test involving 122 subjects, from each of whom a set of 120 trials was collected. Each trial consists of the measurement of 1 s of EEG signals recorded from 64 electrodes placed on the subject's scalp. During the trials, the subjects were exposed to three kinds of stimuli: a single image, two matching images, or two non-matching images shown alternately. Since the subjects belong to the 2 categories alcoholic and non-alcoholic, and the stimuli to the 3 kinds single, matching and non-matching, the EEG data have been labeled through these 2 coordinates (e.g., if a trial was recorded from an alcoholic patient while he was looking at a non-matching pair of images, the trial label will be alc-non-matching).
II. EXECUTION
Here we describe the structure of the presented tool and its operation modes.
The algorithms and the logic underlying the classification processes of OINOS are implemented with the scikit-learn library [8]–[10].
The component modules are:
1) main: the entry point of the suite. This block is responsible for the execution and the orchestration of the single modules;
2) OINOS core: the main component. It implements the logic of comparison among different ML algorithms;
3) datalogger: the module for output management and experiment reports.
A. Use
To start OINOS, run the starter script in the root directory of the project with the command: $ ./start.
The program will then display:
================
= OINOS V1.0 =
================
Select an option:
1. learn from the 'Alcoholic' dataset from UCI Knowledge Discovery in Databases
2. learn from the 'Wrist' dataset from NeuCube
[1,2, quit]:
From this menu it is possible to select the dataset on which the prediction algorithms will be tested.
B. Alcoholic
By selecting the first option, OINOS will acquire the datasets from the UCI Knowledge Discovery in Databases archive. Before starting the execution, OINOS will ask the user:
1) to specify the dataset among those available;
2) to specify the destination path for the output;
3) to specify which portion of the data (i.e., the ratio) with respect to the overall dataset will be used for testing;
4) to specify the number of executions of the same prediction test. This setting matters because, once the cardinality of the training and test sets is fixed (through the ratio), their elements are randomly selected; the performance will therefore vary from run to run depending on the specific training set of each experiment, and it may be reasonable to study it in a statistical sense.
At the end of this configuration phase, the comparison between the prediction algorithms is executed.
a) Dataset description: The datasets shown by the starter are named with a suffix that indicates the dataset cardinality. For example, if data 100 is selected by the user, the prediction will be executed on a sample of 100 elements. If a ratio of 0.2 is specified, the data will be partitioned into 80 elements for the training set and the remaining 20 for the test set.
b) Execution: During the execution, messages for four comparison categories are printed on the standard output; each category corresponds to a different classification to be presented to the prediction algorithms:
1) alcoholic-control: the EEG time series of the dataset are classified as pertaining to alcoholic (alcohol) or healthy (i.e., control) patients;
2) single-matching-non matching images: the EEG time series pertain to participants who were shown a single flickering image (single), two identical alternating images (matching) or two different alternating images (non matching);
3) single-matching-nonmatching images for alcoholic and control: the intersection of the two previous classifications is considered (alcoholic patient watching a single image, alcoholic patient watching two identical images, etc.);
4) alcoholic-control extended: the six classes of the previous step are considered and projected onto the two classes alcoholic and control.
The execution goes through these categories in four phases, showing the results of the test on the screen in terms of:
• classification (ALC-CTRL, SGL-MATCH-NONMATCH, ALC-CTRL/SGL-MATCH-NONMATCH, ALC-CTRL EXT, respectively);
• overall cardinality of the dataset (for example 100 for data 100);
• cardinality of the test set (for example 20 for ratio = 0.2);
• type of classifier under testing;
• accuracy of the classifier, computed as <math>\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}</math> (1)
• precision of the classifier, computed as <math>\mathrm{Pr} = \frac{TP}{TP + FP}</math> (2)
• recall of the classifier, computed as <math>\mathrm{Rec} = \frac{TP}{TP + FN}</math> (3)
• F1 score of the classifier, computed as <math>\mathrm{F1} = 2 \cdot \frac{\mathrm{Pr} \cdot \mathrm{Rec}}{\mathrm{Pr} + \mathrm{Rec}}</math> (4)
where TP stands for true positive, TN for true negative, FP for false positive and FN for false negative. A TP is an outcome where the model correctly predicts the positive class; similarly, a TN is an outcome where the model correctly predicts the negative class. A FP is an outcome where the model incorrectly predicts the positive class, and a FN is an outcome where the model incorrectly predicts the negative class.
c) Output: When the execution is finished, the output will be available at the path specified during the configuration phase:
• a Microsoft Excel file (.xlsx) with the report, as described above;
• a figure comparing the different accuracy values.
C. Unattended mode
With this option it is possible to call the related Python sources directly. This allows the user to run the suite in unattended mode, which is useful for implementing custom procedures and benchmarks. The related scripts are ./bin/oinos.py and ./bin/wrist.py, respectively; the switch -h enables the help, which returns the following information to the user:
$ ./bin/oinos.py -h
usage:
$ python bin/main.py -d -r -o
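To make the ratio-based partitioning and the metrics of equations (1)–(4) concrete, here is a minimal, stdlib-only Python sketch. It is not OINOS's code: the helper names split_by_ratio and binary_metrics, the seed parameter and the toy labels are hypothetical, and class 1 is assumed to be the positive class (e.g., alcoholic). OINOS itself relies on scikit-learn for this logic.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(items: Sequence, ratio: float, seed: int) -> Tuple[List, List]:
    """Hypothetical helper: randomly split items into (train, test).

    The test set receives round(len(items) * ratio) elements, mirroring the
    'ratio' setting described in the text (100 elements, ratio 0.2 -> 80/20).
    Using a different seed per run mimics repeating the same prediction test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * ratio)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Compute Acc, Pr, Rec and F1 as in equations (1)-(4); 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    # Guard against zero denominators (no predicted or no actual positives).
    pr = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * rec / (pr + rec) if pr + rec else 0.0
    return {"acc": acc, "pr": pr, "rec": rec, "f1": f1}


if __name__ == "__main__":
    train, test = split_by_ratio(range(100), ratio=0.2, seed=0)
    print(len(train), len(test))  # 80 20
    # Toy labels: TP=3, FN=1, FP=2, TN=2 -> Acc=0.625, Pr=0.6, Rec=0.75
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 1, 1, 0]))
```

In practice, scikit-learn offers `train_test_split` (with `test_size`) and `accuracy_score`, `precision_score`, `recall_score` and `f1_score` for the same purpose. The hand-written version only makes the definitions explicit.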