The use of Maximum Completeness to Estimate Bias in
AI-based Recommendation Systems
Alessandro Simonetta1,*, Maria Cristina Paoletti1 and Alessio Venticinque2

1 Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
2 Department of Electrical and Information Engineering, University of Naples Federico II, Napoli, Italy


Abstract
The use of AI-based recommendation systems, built on data analysis with Machine Learning algorithms, is taking away people's full control over decision making. The presence of unbalanced and incomplete data can cause discrimination against religious, ethnic, and political minorities without this phenomenon being easily detectable. In this context, it becomes critically important to understand the potential risks associated with learning from such a dataset and the consequences this may have on the outcome of decisions made with Machine Learning algorithms. In this paper, we identify how to measure the group fairness of the predictions of a classification algorithm, identify the quality features of the dataset that influence the learning process, and, finally, evaluate the relationships between those quality features and the fairness measures.

Keywords
Fairness, Clustering, Machine Learning, Completeness, ISO/IEC 25012, Maximum Completeness, Metrics, Bias, Classification



SYSTEM 2022: 8th Scholar's Yearly Symposium of Technology, Engineering and Mathematics, Brunek, July 23, 2022
*Corresponding author. All the authors contributed equally.
Email: alessandro.simonetta@gmail.com (A. Simonetta); mariacristina.paoletti@gmail.com (M. C. Paoletti)
ORCID: 0000-0003-2002-9815 (A. Simonetta); 0000-0001-6850-1184 (M. C. Paoletti); 0000-0003-3286-3137 (A. Venticinque)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073).


1. Introduction

In 2019, the Economist [1] stated that data is an important resource, comparable to oil. Moreover, Forbes [2] defines data as the fuel of the information age, and Machine Learning (ML) as the engine that uses it. These technologies, together with the evolution of networks [3, 4], are providing opportunities to develop new applications.
Many companies and organizations are investing increasingly in decision-making processes centered on AI-based recommendation systems, to offer a variety of services ranging from marketing [5, 6] to fault diagnosis [7]. Tools that make use of these types of algorithms relieve people from making decisions that may be influenced by moods, biases, and subjective thoughts; they also ensure fairness and repeatability over time. Yet the reasons why an ML algorithm arrived at a certain result may not be transparent or easily understood by users. For this reason, techniques have been introduced such as Explainable AI, which allows analysts to understand how a given choice was reached, or Reinforcement Learning, which allows the decision-making process to be distributed across different levels in a counterbalancing way [8, 9, 10, 11].
These decision systems are based on a data-driven approach, and their results are strongly influenced by the quality of the information available (balance, completeness, ...). The correctness and authenticity of the data alone [12, 13, 14, 15, 16, 17] are not enough to guarantee their quality: poor quality in the data, or a low level of representativeness, can lead to biased learning.
A very striking example of discrimination is the outcome of the algorithm used in Florida to predict the risk of re-offense, brought to light by the nonprofit organization ProPublica [18]. The algorithm, which assigned each person a score indicating the likelihood of reoffending, was trained on an unbalanced dataset and, as a result, attributed greater recidivism to black people than to other ethnicities [19].
In this paper, we present a methodology that, starting from the training data, allows us to estimate the risk of unfair treatment in the prediction. To do this, it is necessary to identify how to measure the fairness of the predictions of a classification algorithm, to discover the quality characteristics of the dataset that influence the goodness of learning, and, last but not least, to study the relationships that exist between those quality characteristics and the fairness measures of the ML algorithm.
The study of the fairness of ML algorithms is a widely debated topic: in [20], [21] many performance and fairness indices are studied, such as the False Positive Rate and Equalized Odds. These metrics can be used to enhance different performance characteristics, which may be in contrast with each other, and each of them best fits a particular class of objectives [22], [23]. For example, there are indices that prefer accuracy and others that prefer precision [20]; the right trade-off must be found between the two, depending on the problem to be solved.



Indeed, in cancer detection we prefer recall over precision: it is better to flag a healthy patient as probably ill than to fail to screen one who actually is.
Other studies [22, 24, 25, 26] aim at identifying the relationship between the sensitive attributes and the target one. These show dependencies between the number of incorrect predictions (e.g., the ratio of predicted positives to real positives) and the features of the dataset. For example, if the algorithm errs advantageously with respect to a sensitive attribute, that is, by attributing more positive outcomes to its holders, the individuals belonging to this set are considered a privileged group. Conversely, if the algorithm errs negatively, associating more unfavorable outcomes than would normally be indicated, that group is considered unprivileged.
Regarding data quality aspects, we identified the international standards ISO/IEC 25012 [27] and ISO/IEC 25024 [28] as the models from which to draw the notion of completeness. This choice was also supported by the presence of studies [29], [30] that use these standards for dataset construction and for maintaining its quality over time. In particular, we identified the notion of maximum completeness [31] as satisfying the goal we had set for ourselves.
This paper starts from the state of the art in Section 2, comparing the advantages and disadvantages of different approaches to fairness and showing alternative synthesis solutions that are useful for identifying critical issues in the input data. In Section 3 we present our solution, developed from what has been proposed in the literature; in particular, we decline it into two different versions, defining their pros and cons. In Section 4 we discuss the experimental results. In Section 5 we point out the limitations of the present work and possible future developments. Finally, in Section 6, we present concluding remarks.
2. State of the Art

In general, a classification model is called fair if the mistakes are equally distributed across the different groups identified within the sensitive attribute. The input features X of the value space are mapped onto a target variable R according to a function f(X). Where the values of R are represented by classes or by a score, i.e., on a scale of N different values, they can be mapped back to a binary value by defining a threshold. In order to train these classifiers, example data are used in which the input features X are associated with a truth variable Y containing the real result. From the goodness of these examples derives the quality of the resulting classifier.
From now on, we will refer to A as the sensitive attribute, which can identify a minority. Although these variables are treated individually, an underprivileged group could be identified through a combination of them too. One work including metrics for assessing fairness is [22], where Disparate Impact and Demographic Parity (Statistical Parity) are introduced. The first uses the ratio between P(R = 1 | A = a_i) and P(R = 1 | A = a_j), while the second is the difference between the two probabilities. Demographic Parity is also present in [32], where it is referred to as Independence, as it indicates the degree of independence of the target variable R from the sensitive attribute. Another measure of fairness reported in [22] is Equalized Odds, which is satisfied if the prediction is conditionally independent of the sensitive attribute given the true value; it considers the difference between the true positive rate and the false positive rate. In our work we call the latter index Separation. Equal Opportunity [22] requires the true positive rates to be similar across groups. Other metrics for estimating fairness are Sufficiency [32], similar to Equal Opportunity but focused on true values rather than predicted values, and Overall Accuracy Equality [21], which tests the average error between predictions across groups.
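To make the definitions above concrete, the sketch below computes the Statistical Parity difference and the Disparate Impact ratio between two groups. It is a minimal Python illustration, not code from the paper, assuming binary predictions R stored alongside the sensitive attribute in a pandas DataFrame; the toy data are made up for the example:

    import pandas as pd

    def positive_rates(df: pd.DataFrame, sensitive: str, pred: str = "R") -> pd.Series:
        # P(R = 1 | A = a_i) for every value a_i of the sensitive attribute
        return df.groupby(sensitive)[pred].mean()

    def statistical_parity(df, sensitive, a_i, a_j, pred="R"):
        # Demographic Parity: difference between the two group probabilities
        rates = positive_rates(df, sensitive, pred)
        return abs(rates[a_i] - rates[a_j])

    def disparate_impact(df, sensitive, a_i, a_j, pred="R"):
        # Disparate Impact: ratio between the two group probabilities
        rates = positive_rates(df, sensitive, pred)
        return rates[a_i] / rates[a_j]

    # illustrative toy data
    df = pd.DataFrame({"A": ["x", "x", "y", "y", "y", "y"],
                       "R": [1, 0, 1, 1, 1, 0]})
    print(statistical_parity(df, "A", "x", "y"))  # |0.50 - 0.75| = 0.25
    print(disparate_impact(df, "A", "x", "y"))    # 0.50 / 0.75 ~ 0.67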
Determining which fairness metrics are best for finding the right configuration of an algorithm to use in a decision support system depends on the purpose for which the system is built and on the discrimination risks to which it may be exposed. Studies [23], [33] point out that it is not possible to maximize all metrics simultaneously, and therefore one must choose among the features that these measures tend to enhance, such as accuracy and recall. In [25] the authors present a framework for comparing indices and highlighting when the maximization of one conflicts with that of another. Their work goes beyond analyzing individual metrics, grouping them according to their characteristics (fairness of treatment, fairness of opportunity, interest groups, sensitive attributes, etc.) and their usefulness with respect to the target that the system has (e.g., support for film discovery on a streaming platform). These are clustered using a hierarchical algorithm applied to the correlation between metrics to identify similar ones. The results are then diagrammed in a simplified manner through the use of Principal Component Analysis (PCA) to reduce the state space to two dimensions. In particular, the authors conclude that through PCA it is possible to explain the relationships among different metrics by reducing the state space to a range of one to three components.
The idea of using balancing indices to predict the risk of discrimination can be found in [34]. In that work, for the first time, a measure of fairness is applied to the sensitive attribute itself and not to the comparison among groups. In the next section we will start from this topic and then propose two different solutions for calculating a synthetic index related to the sensitive attribute.






3. Methodology

The first formal criterion introduced in [32] requires that the sensitive attribute A be statistically independent of the predicted value R. Assuming we use a dataset with a field A having cardinality m, A = {a_1, ..., a_m}, the random variable A is independent of R if and only if for each i, j ∈ [1, m], with i ≠ j, we have:

    P(R = 1 | A = a_i) = P(R = 1 | A = a_j)    (1)

To understand how far two predictions deviate from the ideal case (zero difference), we can calculate the distance between the two probabilities:

    U(a_i, a_j) = |P(R = 1 | A = a_i) − P(R = 1 | A = a_j)|    (2)

To get a synthesis value of the non-independence between A and R [34], the arithmetic mean of the distances can be considered:

    U(a_1, ..., a_m) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} U(a_i, a_j)    (3)
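As a worked check of equations 2 and 3, the following sketch (a Python illustration of ours, not code from the paper) computes the mean pairwise distance over the conditional probabilities of Table 1 below:

    from itertools import combinations
    import pandas as pd

    def u_pair(p_i: float, p_j: float) -> float:
        # equation 2: distance between two conditional probabilities
        return abs(p_i - p_j)

    def u_mean(probs: pd.Series) -> float:
        # equation 3: arithmetic mean over the m(m-1)/2 unordered pairs
        pairs = list(combinations(probs, 2))
        return sum(u_pair(p, q) for p, q in pairs) / len(pairs)

    # P(R = 1 | A = a_i) for A = Race on Compas (Table 1)
    race = pd.Series({"Caucasian": 0.33, "Hispanic": 0.28, "Other": 0.20,
                      "Asian": 0.23, "African-American": 0.58,
                      "Native-American": 0.73})
    print(round(u_mean(race), 2))  # 0.25 on these rounded values;
                                   # the text reports P = 0.24 on the raw data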
Instead of using equation 2, some authors [22] apply a different notion of independence:

    ε_i = |P(R = 1 | A = a_i) − P(R = 1 | A ≠ a_i)|    (4)

What we have seen so far fails to explain whether there are groups, within the sensitive attribute, that undergo the same mode of treatment, nor does it reveal the presence of discrimination among groups; the present work was born from this reflection. To explain our idea, we prefer to use the example mentioned previously.
The sensitive attribute A = Race of the Compas dataset contains six different ethnicities, shown in the first column of Table 1.

Table 1
Probability for the Sensitive Attribute Race

A = a_i              P(R = 1 | A = a_i)    Centroid
Caucasian            0.33                  0.26
Hispanic             0.28                  0.26
Other                0.20                  0.26
Asian                0.23                  0.26
African-American     0.58                  0.65
Native-American      0.73                  0.65

The study [19] determined that African-Americans are the unprivileged group compared to the rest of the ethnic groups (Native-American, Caucasian, Asian, Mexican, Other).
The synthetic index described by equation 3 can show, on average, how much a sensitive attribute is at risk of discrimination, but it might underestimate the inequity. With reference to the Compas dataset, in Table 1 it can be observed that the values in the second column, P(R = 1 | A = a_i), cluster around 0.26 and 0.65. Figure 1 shows the values of Table 1 graphically, with evidence of the centroids identified through the K-Means algorithm. The difference between the value calculated using equation 3 (P = 0.24) and the value obtained by considering the centroids (P = 0.39) turns out to be 0.15 points. This demonstrates what was stated earlier with respect to the use of a central tendency index. Note, however, that equation 3 can still be used in the presence of more than two clusters.

Figure 1: Scatter plot: probability for A = Race and K-Means centroids

Figure 2: Probability U(a_i, a_j) with A = Race for the pairs in equation 2



In the following, we will illustrate alternative methods for calculating different synthetic fairness indices that allow for greater sensitivity to discriminatory situations. Next, we will try to identify the relationship that exists between the dataset completeness index and the identified fairness indices. This will allow us to anticipate the risks of bias arising from incomplete data.

3.1. Dataset

This section presents the list of the datasets used for the experimentation:

      • COMPAS Recidivism Dataset [19];
      • Recidivism in juvenile justice [35];
      • UCI Statlog German Credit [36];
      • default of credit card clients Data Set [37];
      • Adult Data Set [38];
      • Student Performance Data Set [39].

The sensitive attributes are listed in Table 2.
Table 2
Datasets and Sensitive Attributes

Dataset     Attribute                  Cardinality
Compas      Race                       6
            Sex                        2
            Age                        3
Juvenile    V3_nacionalitat            35
            V2_estranger               2
            V1_sexe                    2
            V5_edat_fet_agrupat        3
            V4_nacionalitat_agrupat    5
            V8_edat_fet                5
UCI         Sex                        2
            Education                  7
Income      Education                  16
            Race                       5
            Sex                        2
            Native country             41
Statlog     Status                     4
            Sex                        2
            foreignworker              2
Student     Sex                        2
            Age                        6
            Mother job                 5
            Father job                 5
            Mother Education           5
            Father Education           4

For each dataset we calculated the predicted value using a classification model; only the Compas and Recidivism in juvenile justice datasets already include this information, so for them we used the original values. Different classification models are present in the literature and are implemented in software libraries. We chose logistic regression [40], which produces categorical results and can therefore be trained to predict the membership of an item in a class.
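For the datasets lacking a predicted value, the classifier can be trained along these lines. This is a minimal scikit-learn sketch under our assumptions: the file name "dataset.csv" and the name of the truth column "Y" are illustrative placeholders, not artifacts of the paper:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("dataset.csv")                  # hypothetical input file
    X = pd.get_dummies(df.drop(columns=["Y"]))       # one-hot encode the features
    y = df["Y"]                                      # truth variable Y
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    df.loc[X_test.index, "R"] = clf.predict(X_test)  # predicted value R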
In order to evaluate the completeness of the dataset, we examined Maximum Completeness, introduced in [41]. Maximum Completeness is an index measuring the percentage degree of completeness of the dataset (Incomplete = 0, Complete = 1) with respect to one or more categorical attributes, where the expected situation is the one in which every class of the considered attributes has a number of replications equal to that of the predominant one. Assuming we wish to calculate the completeness of a dataset on a categorical attribute A:

    C_{MAX}(A) = \frac{N}{K_A \cdot M_{PC}}    (5)

where:

      • N is the total number of instances in the dataset;
      • K_A is the number of classes of attribute A;
      • M_{PC} is the maximum number of elements in a class of attribute A.

This index can also be calculated on multiple categorical attributes by taking as K_A the number of possible combinations of the chosen categorical variables and as M_{PC} the maximum number of items grouped over the attributes considered. For example, considering the Compas dataset and the attributes Race and Sex, to have maximum completeness C_{MAX}(Race, Sex) = 1 we would need, for every combination of race and sex, a number of items equal to 2,626, the count of male African-American items, which is the most numerous category. To achieve this, the number of records in the dataset would have to grow to 31,512 from the current N = 6,172 (C_{MAX}(Race, Sex) = 0.19).
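A possible implementation of equation 5 with pandas is sketched below. It is our sketch, with one stated assumption: the observed groupings are used for K_A, so combinations that never appear in the data would have to be counted separately if required:

    import pandas as pd

    def c_max(df: pd.DataFrame, attributes: list) -> float:
        # equation 5: C_MAX(A) = N / (K_A * M_PC)
        counts = df.groupby(attributes).size()  # frequency of each class/combination
        k_a = len(counts)                       # number of classes (or combinations)
        m_pc = counts.max()                     # size of the predominant class
        return len(df) / (k_a * m_pc)

    # sanity check against the Compas example above:
    # N = 6,172; K_A = 12 Race x Sex combinations; M_PC = 2,626
    print(6172 / (12 * 2626))  # ~0.196, i.e. the C_MAX(Race, Sex) = 0.19 above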
3.2. Clustering Method

Starting from the consideration that the conditional probabilities of the random variable R with respect to membership in a sensitive attribute, P(R = 1 | A = a_i), may determine affinity in treatment equivalence classes, we thought of using unsupervised clustering algorithms to identify possible clusters in those probabilities. Among the ML algorithms analyzed, we chose K-Means and DBSCAN [42]. K-Means is a clustering algorithm that tends to separate samples into K groups of equal variance, minimizing the within-cluster sum-of-squares criterion. To use this method, it is necessary to know a priori the number K of clusters into which to divide the samples. Once the centroids have been calculated, it is possible to use equation 2 or equation 3, depending on the value of K, to obtain the synthetic index.







Figure 3: K-Means method, C_MAX below 0.33 and above 0.66

Compared with [34], bundling multiple instances of the sensitive attribute into groups results in a lower number m; indeed, each term a_i is composed of all elements treated similarly.
The critical issue of correctly identifying the number K to be used for clustering led us to study other approaches and, subsequently, to experiment with DBSCAN. This algorithm sees clusters as areas of high density separated by areas of lower density. Its application needs only the parameter indicating the number of samples in the area forming a cluster and the one indicating the density required to form a cluster (eps). One of its limitations is that some elements turn out not to belong to any cluster; we decided to treat them as unitary clusters. Since the concept of a centroid does not exist for DBSCAN, the value of the fairness metric was calculated as follows:

      • clusters: probability as per equation 3, to calculate cluster fairness;
      • individual elements: average of the fairness indices of the individual instances of the sensitive attribute.

Applying the two clustering methods resulted in values that were higher on average than those calculated using only the arithmetic mean reported in equation 3. Furthermore, this made it possible to identify groups that were similar in treatment type.
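A sketch of the K-Means variant on the Table 1 probabilities follows (our illustration; the DBSCAN parameters at the end are arbitrary choices for this toy input, not values from the paper):

    import numpy as np
    from sklearn.cluster import DBSCAN, KMeans

    # P(R = 1 | A = a_i) for A = Race on Compas (Table 1)
    probs = np.array([[0.33], [0.28], [0.20], [0.23], [0.58], [0.73]])

    # K = 2 recovers the centroids of Figure 1
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(probs)
    c_low, c_high = sorted(c[0] for c in km.cluster_centers_)  # 0.26 and 0.655

    # with two clusters the index reduces to equation 2 on the centroids
    print(c_high - c_low)  # ~0.395 on these rounded inputs; the text reports 0.39

    # DBSCAN needs no K; label -1 marks points outside every cluster,
    # which we treat as unitary clusters
    labels = DBSCAN(eps=0.06, min_samples=2).fit_predict(probs)
    print(labels)  # [0 0 0 0 -1 -1]: one cluster plus two noise points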
3.3. MinMax Method

Although both clustering methodologies gave good results to work on, another approach was explored, starting from the definition found in [22], to calculate the value of the fairness metric by looking for the worst case. The process is described below with reference to the Demographic Parity metric of equation 4 (Independence), but it can be extended to all metrics without loss of generality. This description places emphasis on the fact that a_i is considered an unprivileged group and the set of all other elements a privileged group. The algorithm, for all values of the sensitive attribute A, calculates the result of equation 4 associated with each group, considering each time the element under observation as discriminated and all the others as privileged. Considering again the field Race: if the element we are calculating for is Asian, this is a_i and all the others constitute the other group in the equation. Once this process has been iterated over all values of the sensitive attribute, we select the highest and the lowest result. The former identifies the group for which the predicted variable R is most dependent on ethnicity, while the latter identifies the most independent one. The difference between these two values indicates how large the inequality of treatment between the privileged and the unprivileged group is, relative to the sensitive attribute considered.








Figure 4: DBSCAN method, C_MAX below 0.33 and above 0.66

Compared with the use of clustering, this methodology positions itself in the worst case by taking, as the index of treatment disequity, the largest difference between the ε_i values obtained by applying equation 4, which are reported in Table 3.

Table 3
Conditional Probabilities of the Sensitive Attribute and ε_i by Equation 4

A = a_i              P(R = 1 | A = a_i)    ε_i
Caucasian            0.33                  0.22
Hispanic             0.28                  0.18
Other                0.20                  0.11
Asian                0.23                  0.14
African-American     0.58                  0.51
Native-American      0.73                  0.64

where ε_i = |P(R = 1 | A = a_i) − P(R = 1 | A ≠ a_i)|.
4. Discussion

During the experimental phase, the methods presented in the previous paragraphs were applied to sensitive attributes belonging to six known datasets. The fairness metrics used during the testing phase are Independence, Separation, Sufficiency, and Overall Accuracy Equality, as defined in [32], [21]. Separation and Sufficiency were calculated by considering positive and negative predicted cases separately. The metrics resulting from this split are Separation TPR (True Positive Rate), Separation FPR (False Positive Rate), Sufficiency PPV (Positive Predictive Value), and Sufficiency NPV (Negative Predictive Value). Once the results were evaluated for the six chosen fairness metrics, we related them to the maximum completeness balancing index.
Each diagram consists of two box plots containing the values of the sensitive attributes divided as follows: Maximum Completeness values less than 0.33 (low risk, in yellow) and greater than 0.66 (high risk, in red). Intermediate values are not reported because for them it is more difficult to determine whether they are fair or not. We remark that the fairness metrics, as defined, take values in the range [0, 1] (Fair = 0, Unfair = 1).
Figure 3 shows the case where the six fairness metrics are calculated using the K-Means technique. Note that the tails of all box plots overlap, while for the body a clear separation remains for Separation TPR and Sufficiency PPV. The worst case is the Sufficiency NPV metric, where there is total overlap.
Figure 4 shows the results of applying the DBSCAN technique. In this case the values obtained are similar to K-Means, although there are worse results for Overall Accuracy Equality and Sufficiency NPV; those plots were not optimal in Figure 3 either.
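The diagrams described above can be produced along these lines. This is a minimal matplotlib sketch with illustrative numbers in place of the per-attribute results of the experiments:

    import matplotlib.pyplot as plt

    # (fairness value, C_MAX) per sensitive attribute; illustrative numbers only
    results = [(0.05, 0.20), (0.08, 0.25), (0.12, 0.31),
               (0.28, 0.75), (0.35, 0.70), (0.55, 0.81)]

    low = [f for f, c in results if c < 0.33]    # low risk (yellow)
    high = [f for f, c in results if c > 0.66]   # high risk (red)

    plt.boxplot([low, high], labels=["C_MAX < 0.33", "C_MAX > 0.66"])
    plt.ylabel("fairness metric (Fair = 0, Unfair = 1)")
    plt.show()  # one such plot per fairness metric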







Figure 5: MinMax method, C_MAX below 0.33 and above 0.66

Finally, in Figure 5 the MinMax method of equation 4 is applied instead of the clustering algorithms. In these diagrams one can see a lengthening of the box plots related to the risk cases and a sharper separation between the two box plots in all diagrams. In the Independence and Overall Accuracy Equality cases there is no longer any overlap between the tails. For Sufficiency PPV the high-risk cases tend toward one, and almost full separation between the values is achieved. For sensitive attributes that have a C_MAX greater than 0.66, the fairness metrics take values greater than 0.2.
In conclusion, the MinMax method yielded better results than clustering; conversely, methods such as K-Means and DBSCAN can help to better define treatment similarities among groups.


5. Current Limits and Future Works

The experimentation carried out has yielded encouraging results for the separation of high- and low-risk fairness forecasts with respect to C_MAX. Although the result obtained sometimes shows values in a range that is not very wide, improvements have already been identified that can be investigated in future work.
The first task concerns the choice of the clustering method and of the parameters that affect the number of clusters. Having few clusters brings us closer to the worst case, and it becomes easier to identify the correct meaning to give to each cluster, depending on its distance and shape. However, establishing a number of clusters that forces such a division could join groups that are not actually treated equally. Another point we will investigate is the possibility of changing the algorithm that identifies the synthetic value of the cluster for the metric under consideration. One solution we will explore is to integrate the calculation of the difference between maximum and minimum into a clustering algorithm that does not have a predefined number K of clusters, such as the already presented DBSCAN. Related to this algorithm, alternatives on how to treat elements that are not associated with any cluster will be explored. The question is: should these cases be discarded because they are outliers, or do they represent borderline cases, e.g., a highly discriminated minority?


6. Conclusion

The spread of ML algorithms for constructing decision systems makes the data used in their construction increasingly important. Imbalances or biases that may be present within the information can affect the results of such systems, causing discrimination toward certain groups.
The use of the fairness metrics that have been presented becomes important to predict the impact related to such biases and to act accordingly on both the algorithms and the input data, in line with the objectives.






In this work we tried to provide a methodology to identify clusters that are similar in treatment type and to calculate a synthetic index that can predict how much at risk the system is with respect to sensitive attributes. The experimentation carried out provided good results, both in identifying agglomerations that undergo similar treatment and in calculating a parameter that gives a conservative assessment of the metric.
The relationship between the maximum completeness index and the fairness indices calculated with the methods shown provides a guideline for recognizing high-risk and lower-risk sensitive attributes. This gives analysts the information needed to better configure classification algorithms.
The results of this work lay the groundwork for future developments aimed at improving the identification of groups within sensitive attributes and at researching alternative synthetic indices with greater precision.
References

[1] The Economist, The world's most valuable resource is no longer oil, but data, The Economist, USA (6 May 2019).
[2] B. Marr, The 5 biggest data science trends in 2022, Forbes, Oct 2021. URL: https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3 (accessed May 2022).
[3] R. Giuliano, The next generation network in 2030: Applications, services, and enabling technologies, in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021, pp. 294–298. doi:10.23919/EECSI53397.2021.9624241.
[4] C. Napoli, G. Pappalardo, E. Tramontana, A hybrid neuro-wavelet predictor for QoS control and stability, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8249 LNAI (2013) 527–538. doi:10.1007/978-3-319-03524-6_45.
[5] S. Verma, R. Sharma, S. Deb, D. Maitra, Artificial intelligence in marketing: Systematic review and future research direction, International Journal of Information Management Data Insights 1 (2021) 100002. doi:10.1016/j.jjimei.2020.100002.
[6] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, An advanced neural network based solution to enforce dispatch continuity in smart grids, Applied Soft Computing Journal 62 (2018) 768–775.
[7] T. Dhomad, A. Jaber, Bearing fault diagnosis using motor current signature analysis and the artificial neural network 10 (2020) 70–79.
[8] M. Matta, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, A. Nannarelli, M. Re, S. Spanò, A reinforcement learning-based QAM/PSK symbol synchronizer, IEEE Access 7 (2019) 124147–124157. doi:10.1109/ACCESS.2019.2938390.
[9] R. Brociek, G. Magistris, F. Cardia, F. Coppa, S. Russo, Contagion prevention of COVID-19 by means of touch detection for retail stores, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 89–94.
[10] L. Canese, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, S. Spanò, Multi-agent reinforcement learning: A review of challenges and applications, Applied Sciences 11 (2021). URL: https://www.mdpi.com/2076-3417/11/11/4948. doi:10.3390/app11114948.
[11] N. Brandizzi, S. Russo, R. Brociek, A. Wajda, First studies to apply the theory of mind theory to green and smart mobility by using Gaussian area clustering, in: CEUR Workshop Proceedings, volume 3118, CEUR-WS, 2021, pp. 71–76.
[12] F. Fallucchi, M. Gerardi, M. Petito, E. De Luca, Blockchain framework in digital government for the certification of authenticity, timestamping and data property, 2021. doi:10.24251/HICSS.2021.282.
[13] N. Brandizzi, V. Bianco, G. Castro, S. Russo, A. Wajda, Automatic RGB inference based on facial emotion recognition, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 66–74.
[14] B. Nowak, R. Nowicki, M. Woźniak, C. Napoli, Multi-class nearest neighbour classifier for incomplete data handling, in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), volume 9119, Springer Verlag, 2015, pp. 469–480. doi:10.1007/978-3-319-19324-3_42.
[15] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, R. Damaševičius, Is the colony of ants able to recognize graphic objects?, Communications in Computer and Information Science 538 (2015) 376–387. doi:10.1007/978-3-319-24770-0_33.
[16] S. Illari, S. Russo, R. Avanzato, C. Napoli, A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients, in: CEUR Workshop Proceedings, volume 2694, CEUR-WS, 2020, pp. 29–35.
[17] N. Dat, V. Ponzi, S. Russo, F. Vincelli, Supporting impaired people with a following robotic assistant by means of end-to-end visual target navigation and reinforcement learning approaches, in: CEUR Workshop Proceedings, volume 3118, CEUR-WS, 2021, pp. 51–63.



[18] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks., ProPublica, https://www.propublica.org/ (2016).
[19] J. Larson, S. Mattu, L. Kirchner, J. Angwin, COMPAS recidivism dataset, 2016. URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/ (accessed Oct 2021).
[20] J. Lee, Analysis of precision and accuracy in a simple model of machine learning, Journal of the Korean Physical Society 71 (2017) 866–870. doi:10.3938/jkps.71.866.
[21] A. Carey, X. Wu, The statistical fairness field guide: perspectives from social and formal sciences, AI and Ethics (2022) 1–23. doi:10.1007/s43681-022-00183-3.
[22] D. Pessach, E. Shmueli, Algorithmic fairness, 2020. URL: https://arxiv.org/abs/2001.09784. doi:10.48550/ARXIV.2001.09784.
[23] S. Prince, Bias and fairness in AI, 2019. URL: https://www.borealisai.com/en/blog/tutorial1-bias-and-fairness-ai/ (accessed Mar 2022).
[24] G. Capizzi, G. Lo Sciuto, C. Napoli, M. Woźniak, G. Susi, A spiking neural network-based long-term prediction system for biogas production, Neural Networks 129 (2020) 271–279.
[25] M. Miron, S. Tolan, E. Gómez, C. Castillo, Addressing multiple metrics of group fairness in data-driven decision making, 2020. URL: https://arxiv.org/abs/2003.04794. doi:10.48550/ARXIV.2003.04794.
[26] G. Capizzi, G. Lo Sciuto, C. Napoli, R. Shikler, M. Wozniak, Optimizing the organic solar cell manufacturing process by means of AFM measurements and neural networks, Energies 11 (2018).
[27] International Organization for Standardization, ISO/IEC 25012:2008 Software engineering — Software product Quality Requirements and Evaluation (SQuaRE) — Data quality model, 2008. URL: https://www.iso.org/standard/35736.html (accessed Jan 2021).
[28] International Organization for Standardization, ISO/IEC 25024:2015 Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Measurement of data quality, 2015. URL: https://www.iso.org/standard/35749.html (accessed Jan 2022).
[29] J. Calabrese, S. Esponda, P. M. Pesado, Framework for data quality evaluation based on ISO/IEC 25012 and ISO/IEC 25024, in: VIII Conference on Cloud Computing, Big Data & Emerging Topics, 2020. URL: http://sedici.unlp.edu.ar/handle/10915/104778.
[30] F. Gualo, M. Rodriguez, J. Verdugo, I. Caballero, M. Piattini, Data quality certification using ISO/IEC 25012: Industrial experiences, Journal of Systems and Software 176 (2021) 110938. URL: https://www.sciencedirect.com/science/article/pii/S0164121221000352. doi:10.1016/j.jss.2021.110938.
[31] A. Simonetta, A. Trenta, M. C. Paoletti, A. Vetrò, Metrics for identifying bias in datasets, SYSTEM (2021).
[32] S. Barocas, M. Hardt, A. Narayanan, Fairness and machine learning, 2020. URL: https://fairmlbook.org/ (accessed Sept 2021), chapter: Classification.
[33] K. Burkholder, K. Kwock, Y. Xu, J. Liu, C. Chen, S. Xie, Certification and trade-off of multiple fairness criteria in graph-based spam detection, Association for Computing Machinery, 2021, pp. 130–139. doi:10.1145/3459637.3482325.
[34] A. Vetrò, M. Torchiano, M. Mecati, A data quality approach to the identification of discrimination risk in automated decision making systems, Government Information Quarterly 38 (2021) 101619. URL: https://www.sciencedirect.com/science/article/pii/S0740624X21000551. doi:10.1016/j.giq.2021.101619.
[35] Department of Justice, Recidivism in juvenile justice, 2016. URL: https://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html (accessed Oct 2021).
[36] D. H. Hofmann, UCI Statlog German Credit, 1994. URL: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (accessed Oct 2021).
[37] I.-C. Yeh, Default of credit card clients data set, 2016. URL: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (accessed Oct 2021).
[38] R. Kohavi, B. Becker, Adult data set, 1996. URL: https://archive.ics.uci.edu/ml/datasets/adult (accessed Oct 2021).
[39] P. Cortez, Student performance data set, 2014. URL: https://archive.ics.uci.edu/ml/datasets/student+performance (accessed Oct 2021).
[40] scikit-learn developers, Logistic regression, 2022. URL: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html (accessed May 2022).
[41] A. Simonetta, A. Vetrò, M. C. Paoletti, M. Torchiano, Integrating SQuaRE data quality model with ISO 31000 risk management to measure and mitigate software bias, CEUR Workshop Proceedings (2021) 17–22.
[42] scikit-learn developers, Clustering, 2022. URL: https://scikit-learn.org/stable/modules/clustering.html (accessed May 2022).