<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The use of Maximum Completeness to Estimate Bias in AI-based Recommendation Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Simonetta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Cristina Paoletti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Venticinque</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical and Information Engineering, University of Naples Federico II</institution>
          ,
          <addr-line>Napoli</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Enterprise Engineering, University of Rome Tor Vergata</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>76</fpage>
      <lpage>84</lpage>
      <abstract>
        <p>The use of AI-based recommendation systems, built on data analysis with Machine Learning algorithms, is taking away people's full control over decision making. Unbalanced and incomplete data can cause discrimination against religious, ethnic, and political minorities without this phenomenon being easily detectable. In this context, it becomes critically important to understand the potential risks of learning from such a dataset and the consequences they may have on the outcome of decisions made with Machine Learning algorithms. In this paper, we tried to identify how to measure the group fairness of the predictions of a classification algorithm, to identify the quality features of the dataset that influence the learning process and, finally, to evaluate the relationships between these quality features and the fairness measures.</p>
      </abstract>
      <kwd-group>
        <kwd>Fairness</kwd>
        <kwd>Clustering</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Completeness</kwd>
        <kwd>ISO/IEC 25012</kwd>
        <kwd>Maximum Completeness</kwd>
        <kwd>Metrics</kwd>
        <kwd>Bias</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In 2019, The Economist [<xref ref-type="bibr" rid="ref1">1</xref>] stated that data is an important resource, comparable to oil. Moreover, Forbes [<xref ref-type="bibr" rid="ref2">2</xref>] defines data as the fuel of the information age, and Machine Learning (ML) as the engine that uses it. These technologies, together with the evolution of networks [<xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>], are providing opportunities to develop new applications. Many companies and organizations are increasingly investing in decision-making processes centered on AI-based recommendation systems, to offer a variety of services ranging from marketing [<xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>] to fault diagnosis [<xref ref-type="bibr" rid="ref7">7</xref>]. Tools that make use of these types of algorithms relieve people from making decisions that may be influenced by moods, biases and subjective thoughts, and they also ensure fairness and repeatability over time. However, the reasons why an ML algorithm arrived at a certain result may not be transparent or easily understood by users. For this reason, techniques have been introduced such as Explainable AI, which allows analysts to understand how a given choice was arrived at, or Reinforcement Learning, which allows the decision-making process to be distributed across different levels in a counterbalanced way [<xref ref-type="bibr" rid="ref10 ref11 ref8 ref9">8, 9, 10, 11</xref>].</p>
      <p>These decision systems are based on a data-driven approach, and their results are strongly influenced by the quality of the information available (balance, completeness, ...). The correctness and authenticity of the data alone [<xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref16 ref17">12, 13, 14, 15, 16, 17</xref>] are not enough to guarantee their quality. Thus, poor data quality, or a low level of representativeness, can lead to biased learning.</p>
      <p>A very striking example of discrimination is the algorithm used in Florida to predict the risk of re-offense, brought to light by the nonprofit organization ProPublica [<xref ref-type="bibr" rid="ref18">18</xref>]. The algorithm, which assigned each person a score indicating the likelihood of reoffending, was trained on an unbalanced dataset and, as a result, black people were predicted to show greater recidivism than other ethnicities [<xref ref-type="bibr" rid="ref19">19</xref>].</p>
      <p>In this paper, we present a methodology that, starting from the training data, allows us to estimate the risk of unfair treatment in the prediction. To do this, it is necessary to identify how to measure the fairness of the predictions of a classification algorithm, to discover the quality characteristics of the dataset that influence the goodness of learning and, last but not least, to study the relationships between these quality characteristics and the fairness measures of the ML algorithm.</p>
      <p>The fairness of ML algorithms is a widely debated topic in science. In [<xref ref-type="bibr" rid="ref20">20</xref>] and [<xref ref-type="bibr" rid="ref21">21</xref>], many performance and fairness indices are studied, such as False Positive Rate and Equalized Odds. These metrics can be used to enhance different performance characteristics, which may conflict with each other, and each of them best fits a particular class of objectives [<xref ref-type="bibr" rid="ref22">22</xref>], [<xref ref-type="bibr" rid="ref23">23</xref>]. For example, some indices favor accuracy and others favor precision [<xref ref-type="bibr" rid="ref20">20</xref>], and the right trade-off between the two must be found depending on the problem to be solved. In cancer detection, for instance, recall is preferred over precision: it is better to flag a healthy patient as probably ill than to miss one who actually is ill.</p>
      <p>Other studies [<xref ref-type="bibr" rid="ref22 ref24 ref25 ref26">22, 24, 25, 26</xref>] aim to identify the relationship between the sensitive attributes and the target. They show dependencies between the number of incorrect predictions (e.g., the ratio of predicted positives to real positives) and the features of the dataset. For example, if the algorithm errs in favor of the individuals identified by a sensitive attribute, that is, by attributing more positive outcomes to them, these individuals are considered a privileged group. Conversely, if the algorithm errs against them, associating more unfavorable outcomes than should normally be assigned, that group is considered unprivileged.</p>
      <p>Regarding data quality, we identified the international standards ISO/IEC 25012 [<xref ref-type="bibr" rid="ref27">27</xref>] and ISO/IEC 25024 [<xref ref-type="bibr" rid="ref28">28</xref>] as the models from which to draw the notion of completeness. This choice was also supported by studies [<xref ref-type="bibr" rid="ref29">29</xref>], [<xref ref-type="bibr" rid="ref30">30</xref>] that use these standards for dataset construction and for maintaining its quality over time. In particular, we identified the notion of maximum completeness [31] as suitable for the goal we had set ourselves.</p>
      <p>This paper starts from the state of the art (Section 2), comparing the advantages and disadvantages of different approaches to fairness and showing alternative synthesis solutions that are useful in identifying critical issues in the input data. In Section 3, we present our solution, developed from what has been proposed in the literature, in two different versions, and discuss their pros and cons. Section 4 discusses the experimental results, Section 5 points out the limitations of the present work and possible future developments and, finally, Section 6 presents concluding remarks.</p>
      <p content-type="footnote">SYSYEM 2022: 8th Scholar's Yearly Symposium of Technology, Engineering and Mathematics, Brunek, July 23, 2022. All the authors contributed equally. alessandro.simonetta@gmail.com (A. Simonetta); mariacristina.paoletti@gmail.com (M. C. Paoletti). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. State of the Art</title>
      <p>In general, a classification model is called fair if its mistakes are equally distributed across the different groups identified within the sensitive attribute. The input features X of the value space are mapped onto a target variable R according to a function f(X). When the values of R are classes or a score, i.e., one of N different values on a scale, they can be mapped back to a binary value by defining a threshold. To train these classifiers, example data are used in which the input features X are associated with a truth variable Y holding the real result; the quality of the resulting classifier derives from the goodness of these examples. From now on, we will refer to A as the sensitive attribute, which can identify a minority. Although these variables are treated individually, an underprivileged group could also be identified through a combination of them.</p>
      <p>One work including metrics for assessing fairness is [<xref ref-type="bibr" rid="ref22">22</xref>], where Disparate Impact and Demographic Parity (Statistical Parity) are introduced. The former uses the ratio between <inline-formula><tex-math>P(R = 1 \mid A = a)</tex-math></inline-formula> and <inline-formula><tex-math>P(R = 1 \mid A = b)</tex-math></inline-formula>, whereas the latter is the difference between the two probabilities. Demographic Parity is also present in [32], where it is referred to as Independence, as it indicates the degree of independence of the target variable R from the sensitive attribute. Another fairness measure reported in [<xref ref-type="bibr" rid="ref22">22</xref>] is Equalized Odds, which is satisfied if the prediction is conditionally independent of the sensitive attribute given the true value; it highlights between-group differences in both the true positive rate and the false positive rate. In our work we call this index Separation. Equal Opportunity [<xref ref-type="bibr" rid="ref22">22</xref>] requires the true positive rates to be similar across groups. Other metrics for estimating fairness are Sufficiency [32], which is similar to Equal Opportunity but focuses on true values rather than predicted values, and Overall Accuracy Equality [<xref ref-type="bibr" rid="ref21">21</xref>], which tests the average error of the predictions across groups.</p>
      <p>Determining which fairness metrics are best suited to find the right configuration of an algorithm in a decision support system depends on the purpose for which the system is built and on the discrimination risks it may be exposed to. Studies [<xref ref-type="bibr" rid="ref23">23</xref>], [33] point out that it is not possible to maximize all metrics simultaneously; therefore, one must choose among the features that these measures tend to enhance, such as accuracy and recall. In [<xref ref-type="bibr" rid="ref25">25</xref>] the authors present a framework for comparing indices and highlighting when the maximization of one conflicts with that of another. Their work goes beyond analyzing individual metrics and groups them according to their characteristics (fairness of treatment, fairness of opportunity, interest groups, sensitive attributes, etc.) and to their usefulness with respect to the target of the system (e.g., supporting film discovery on a streaming platform). The metrics are clustered using a hierarchical algorithm applied to the correlation between metrics, to identify similar ones. The results are then diagrammed in a simplified manner through Principal Component Analysis (PCA), which reduces the state space to two dimensions. In particular, the authors conclude that PCA makes it possible to explain the relationships among different metrics by reducing the state space to between one and three components.</p>
      <p>The idea of using balancing indices to predict the risk of discrimination can be found in [34]. In that work, a fairness measure is applied, for the first time, to the sensitive attribute as a whole rather than to the comparison between groups. In the next section we start from this idea and then propose two different solutions for calculating a synthetic index related to the sensitive attribute.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Methodology</title>
      <p>The first formal criterion introduced in [32] requires that the sensitive attribute A be statistically independent of the predicted value R. Assuming we use a dataset with a field A of cardinality m, <inline-formula><tex-math>A = \{a_1, \dots, a_m\}</tex-math></inline-formula>, the random variable A is independent of R if and only if, for each <inline-formula><tex-math>i, j \in [1, m]</tex-math></inline-formula> with <inline-formula><tex-math>i \neq j</tex-math></inline-formula>:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>P(R = 1 \mid A = a_i) = P(R = 1 \mid A = a_j)</tex-math>
      </disp-formula>
      <p>To understand how far two predictions deviate from the ideal case (zero difference), we can calculate the distance between the two probabilities:</p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math>U(a_i, a_j) = \left| P(R = 1 \mid A = a_i) - P(R = 1 \mid A = a_j) \right|</tex-math>
      </disp-formula>
      <p>To obtain a synthetic value of the non-independence between A and R [34], the arithmetic mean of the distances over all <inline-formula><tex-math>m(m-1)/2</tex-math></inline-formula> pairs can be considered:</p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math>U(a_1, \dots, a_m) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} U(a_i, a_j)</tex-math>
      </disp-formula>
      <sec id="sec-2-2">
        <p>Instead of equation (2), some authors [<xref ref-type="bibr" rid="ref22">22</xref>] apply a different notion of independence, which compares each value <inline-formula><tex-math>a_i</tex-math></inline-formula> with all the other values of A taken together:</p>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\left| P(R = 1 \mid A = a_i) - P(R = 1 \mid A \neq a_i) \right|</tex-math>
        </disp-formula>
        <p>What we have seen so far explains neither whether there are groups, within the sensitive attribute, that undergo the same mode of treatment, nor whether there is discrimination among groups; the present work was born from this reflection. To explain our idea, we use the example mentioned previously. The sensitive attribute A = Race of the Compas dataset contains six different ethnicities, shown in the first column of Table 1. […] the rest of the other ethnic groups (Native-American, Caucasian, Asian, Mexican, Other).</p>
        <p>The synthetic index described by equation (3) can show, on average, how much a sensitive attribute is at risk of discrimination, but it might underestimate the inequity. For the Compas dataset, Table 1 shows that the values in the second column, <inline-formula><tex-math>P(R = 1 \mid A = a_i)</tex-math></inline-formula>, cluster around 0.26 and 0.65. Figure 1 plots the values of Table 1, together with the centroids identified by the K-Means algorithm. The value calculated using equation (3) (0.24) and the value obtained by considering the centroids (0.39) differ by 0.15 points. This confirms what was stated above about the use of a central tendency index. Equation (3) can still be applied, to the centroids, when there are more than two clusters.</p>
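        <p>The gap between the plain mean of equation (3) and the centroid-based value can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions: the per-group probabilities P(R = 1 | A = a_i) are taken as already estimated, the four values are hypothetical (not the Compas figures of Table 1), the function names are our own, and a minimal one-dimensional Lloyd's algorithm stands in for a library K-Means. Because the mean of pairwise distances over two values is just their distance, the same function computes equation (2) on two centroids and equation (3) on more.</p>
        <preformat preformat-type="code">
```python
from itertools import combinations


def mean_pairwise_distance(probs):
    """Equation (3): mean of |p_i - p_j| over all m(m-1)/2 unordered pairs.

    With exactly two values it reduces to the single distance of equation (2).
    """
    pairs = list(combinations(probs, 2))
    return sum(abs(p - q) for p, q in pairs) / len(pairs)


def kmeans_1d(values, k, max_iter=100):
    """Minimal Lloyd's K-Means on scalar values (k >= 2); returns sorted centroids.

    Hypothetical stand-in for a library K-Means, not the implementation used in the paper.
    """
    ordered = sorted(values)
    # Deterministic start: k values spread evenly over the sorted data.
    step = (len(ordered) - 1) / (k - 1)
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # New centroid = cluster mean; an empty cluster keeps its old centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


# Hypothetical P(R = 1 | A = a_i) values for four groups (not the Compas data).
p_positive = {"group_a": 0.2, "group_b": 0.3, "group_c": 0.7, "group_d": 0.8}
probs = list(p_positive.values())

print(round(mean_pairwise_distance(probs), 3))  # 0.367
centroids = kmeans_1d(probs, k=2)
print(centroids, mean_pairwise_distance(centroids))  # [0.25, 0.75] 0.5
```
        </preformat>
        <p>With two well-separated groups of values, the centroid-based index (0.5) exceeds the plain mean of equation (3) (about 0.367), which mirrors the underestimation discussed above for the Compas data.</p>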
        <p>In the following, we illustrate alternative methods for calculating different synthetic fairness indices that allow for greater sensitivity to discriminatory situations. Next, we try to identify the relationship between the dataset completeness index and the identified fairness indices. This will allow us to anticipate the risks of bias arising from incomplete data.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.1. Dataset</title>
        <p>The datasets used for the experimentation are the following:</p>
        <list list-type="bullet">
          <list-item><p>COMPAS Recidivism Dataset [<xref ref-type="bibr" rid="ref19">19</xref>];</p></list-item>
          <list-item><p>Recidivism in juvenile justice [35];</p></list-item>
          <list-item><p>UCI Statlog German Credit [36];</p></list-item>
          <list-item><p>default of credit card clients Data Set [37];</p></list-item>
          <list-item><p>Adult Data Set [38];</p></list-item>
          <list-item><p>Student Performance Data Set [39].</p></list-item>
        </list>
        <p>The sensitive attributes are listed in Table 2.</p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Sensitive attributes considered for each dataset</p></caption>
          <table>
            <thead>
              <tr><th>Dataset</th><th>Sensitive attributes</th></tr>
            </thead>
            <tbody>
              <tr><td>Compas</td><td>Race, Sex, Age</td></tr>
              <tr><td>Juvenile</td><td>V3_nacionalitat, V2_estranger, V1_sexe, V5_edat_fet_agrupat, V4_nacionalitat_agrupat, V8_edat_fet</td></tr>
              <tr><td>UCI</td><td>Sex, Education</td></tr>
              <tr><td>Income</td><td>Education, Race, Sex, Native country</td></tr>
              <tr><td>Statlog</td><td>Status Sex, foreignworker</td></tr>
              <tr><td>Student</td><td>Sex, Age, Mother job, Father job, Mother Education, Father Education</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For each dataset, we calculated the predicted value using a classification model; only the Compas dataset and the Recidivism in juvenile justice dataset already contain this information, so for them we used the original predictions. Different classification models are available in the literature and implemented in software libraries. We chose logistic regression [40], which offers categorical results and can therefore be trained to predict the membership of an item in a class.</p>
        <p>To evaluate the completeness of the dataset, we examined the Maximum Completeness introduced in [41]. Maximum completeness is an index measuring the percentage degree of completeness of the dataset (Incomplete = 0, Complete = 1) with respect to one or more categorical attributes, where the expected value is the one in which every value of the attributes considered has a number of replications equal to that of the predominant one. Assuming we wish to calculate the completeness of a dataset on a categorical attribute A: […]</p>
        <p>This index can also be calculated on multiple categorical attributes, by taking as m the number of possible combinations of the chosen categorical variables and as the maximum number of items the size of the most numerous combination. For example, considering the Compas dataset and the attributes Race and Sex, to obtain a maximum completeness equal to 1 we would need, for every combination of race and sex, a number of items equal to 2,626, that is, the number of items of male African-American individuals, which is the most numerous category. To achieve this, the number of records in the dataset would have to increase from the current N = 6,172 to 31,512 (12 race and sex combinations times 2,626 items); the current maximum completeness for Race and Sex is 0.19.</p>
      </sec>
      <sec id="sec-2-4">
        <title>3.2. Clustering Method</title>
        <p>Starting from the consideration that the conditional probabilities of the random variable R with respect to membership in a sensitive attribute, <inline-formula><tex-math>P(R = 1 \mid A = a_i)</tex-math></inline-formula>, may reveal affinities in treatment (equivalence classes), we thought of using unsupervised clustering algorithms to identify possible clusters in these probabilities. Among the ML algorithms analyzed, we chose K-Means and DBSCAN [42]. K-Means is a clustering algorithm that tries to separate samples into K groups of equal variance, minimizing the within-cluster sum-of-squares criterion. To use this method, the number K of clusters into which the samples are divided must be known a priori. Once the centroids have been calculated, equation (2) (for K = 2) or equation (3) (for K > 2) can be applied to them to obtain the synthetic index. Compared with [34], bundling multiple instances of the sensitive attribute into groups results in a lower m, since each element now stands for a group of values that are treated similarly.</p>
        <p>The critical issue of correctly identifying the number K to be used for clustering led us to study other approaches and to experiment with DBSCAN. DBSCAN sees clusters as areas of high density separated by areas of lower density. It needs only two parameters: the minimum number of samples that must fall within a neighborhood to form a cluster, and the neighborhood radius (eps); together they set the density required to form a cluster.</p>
        <p>One of its limitations is that some elements turn out not to belong to any cluster; we decided to treat them as unitary clusters. Since the concept of a centroid does not exist for DBSCAN, the value of the fairness metric was calculated as follows:</p>
        <list list-type="bullet">
          <list-item><p>clusters: probability as per equation (3), used to calculate the cluster fairness;</p></list-item>
          <list-item><p>individual elements: average of the fairness indices of the individual instances of the sensitive attribute.</p></list-item>
        </list>
        <p>Applying the two clustering methods resulted in values that were, on average, higher than those calculated using only the arithmetic mean of equation (3). Furthermore, this made it possible to identify groups that were similar in treatment type.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.3. MinMax Method</title>
        <p>Although both clustering methodologies gave good results to work on, another approach was explored, starting from the definition found in [<xref ref-type="bibr" rid="ref22">22</xref>], to calculate the value of the fairness metric in the worst case. The process is described below with reference to the Demographic Parity metric of equation (4) (Independence), but without loss of generality it can be extended to all metrics. In this description, <inline-formula><tex-math>a_i</tex-math></inline-formula> is considered the unprivileged group and the set of all other elements the privileged group. For every value of the sensitive attribute A, the algorithm calculates the result of equation (4), considering in turn the element under observation as discriminated and all the others as privileged. Considering the field Race again, if the element under evaluation is Asian, this is <inline-formula><tex-math>a_i</tex-math></inline-formula> and all the others constitute the other group in the equation. Once this process has been iterated over all values of the sensitive attribute, the highest and the lowest results are selected. The former corresponds to the group for which the predicted variable R is most dependent on ethnicity, while the latter corresponds to the most independent one. The difference between these two values indicates how large the inequality of treatment between the privileged and the unprivileged group is, relative to the sensitive attribute considered.</p>
        <p>Compared with the use of clustering, this methodology addresses the worst case, by taking as the index of treatment inequity the largest difference between the values obtained by applying equation (4), reported in Table 3.</p>
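        <p>The MinMax procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes one binary prediction R per record, covers only the Independence metric of equation (4), and uses hypothetical function names and toy data. For each value g of A it computes the one-versus-rest gap, then returns the largest gap minus the smallest, together with the groups that attain them.</p>
        <preformat preformat-type="code">
```python
def one_vs_rest_gaps(groups, predictions):
    """Equation (4) for each value g of A: |P(R=1 | A=g) - P(R=1 | A != g)|.

    groups: sensitive-attribute value of each record (hypothetical layout);
    predictions: 0/1 predicted value R of each record.
    Assumes A takes at least two distinct values.
    """
    gaps = {}
    for g in sorted(set(groups)):
        inside = [r for a, r in zip(groups, predictions) if a == g]
        outside = [r for a, r in zip(groups, predictions) if a != g]
        gaps[g] = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
    return gaps


def minmax_index(groups, predictions):
    """Largest minus smallest one-vs-rest gap, plus the groups that attain them."""
    gaps = one_vs_rest_gaps(groups, predictions)
    most_dependent = max(gaps, key=gaps.get)
    most_independent = min(gaps, key=gaps.get)
    return gaps[most_dependent] - gaps[most_independent], most_dependent, most_independent


# Hypothetical records: sensitive value and binary prediction R for each person.
groups = ["x", "x", "y", "y", "z", "z", "z", "z"]
predictions = [1, 1, 0, 1, 0, 0, 0, 1]

print(one_vs_rest_gaps(groups, predictions))
index, worst, best = minmax_index(groups, predictions)
print(round(index, 3), worst, best)  # 0.667 x y
```
        </preformat>
        <p>In the toy data, group x is the one for which R depends most on the attribute and group y the most independent one. Other metrics, such as Separation, would replace the positive-prediction rate with the corresponding conditional rate.</p>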
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Discussion</title>
      <sec id="sec-3-1">
        <p>During the experimental phase, the methods presented in the previous sections were applied to sensitive attributes belonging to six well-known datasets. The fairness metrics used during testing are Independence, Separation, Sufficiency and Overall Accuracy Equality, as defined in [32], [<xref ref-type="bibr" rid="ref21">21</xref>]. Separation and Sufficiency were calculated by considering positive and negative cases separately. The metrics resulting from this split are Separation TPR (True Positive Rate), Separation FPR (False Positive Rate), Sufficiency PPV (Positive Predictive Value) and Sufficiency NPV (Negative Predictive Value). Once the results had been evaluated for the six chosen fairness metrics, we related them to the maximum completeness balancing index.</p>
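        <p>For readers who want to reproduce the split metrics listed above, the sketch below computes, for each value of a sensitive attribute, the rates on which Separation (TPR, FPR), Sufficiency (PPV, NPV) and Overall Accuracy Equality (accuracy) are based. It is a hypothetical helper with toy data; the max-minus-min spread across groups is only one simple illustrative summary and is not the clustering or MinMax aggregation used in the experiments.</p>
        <preformat preformat-type="code">
```python
def rate(num, den):
    """Ratio that returns None when the denominator is zero (undefined rate)."""
    return num / den if den else None


def group_rates(groups, y_true, y_pred):
    """Per-group TPR, FPR (Separation), PPV, NPV (Sufficiency) and accuracy (OAE)."""
    rates = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for a, t, p in zip(groups, y_true, y_pred) if a == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fn = sum(1 for t, p in rows if t == 1 and p == 0)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        tn = sum(1 for t, p in rows if t == 0 and p == 0)
        rates[g] = {
            "TPR": rate(tp, tp + fn),
            "FPR": rate(fp, fp + tn),
            "PPV": rate(tp, tp + fp),
            "NPV": rate(tn, tn + fn),
            "ACC": rate(tp + tn, len(rows)),
        }
    return rates


def spread(rates, metric):
    """Largest minus smallest value of one rate across groups (illustrative summary)."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Hypothetical true labels Y, predictions R and sensitive values A.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

rates = group_rates(groups, y_true, y_pred)
for metric in ("TPR", "FPR", "PPV", "NPV", "ACC"):
    print(metric, round(spread(rates, metric), 3))
# TPR 0.5 / FPR 0.167 / PPV 0.0 / NPV 0.5 / ACC 0.25
```
        </preformat>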
        <p>Each diagram consists of two box plots containing the values of the sensitive attributes, divided as follows: Maximum Completeness values less than 0.33 (low risk, in yellow) and greater than 0.66 (high risk, in red). Intermediate values are not reported, because for them it is more difficult to determine whether they are fair or not. Note that the fairness metrics, as defined, take values in the range [0, 1] (Fair = 0, Unfair = 1).</p>
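        <p>The Maximum Completeness values used to split the box plots can be computed with a sketch like the following. It assumes that the index is the number of records actually present divided by the number a maximally complete dataset would hold, i.e. N / (m * N_max) with m possible combinations and N_max items in the most numerous one; this is our reading of the definition in Section 3, chosen because it reproduces the Compas Race and Sex arithmetic quoted there. The record layout, the function name and the toy data are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from itertools import product


def max_completeness(records, attributes, domains):
    """Assumed form of maximum completeness: N / (m * N_max).

    N     -- records whose attribute values fall in the admissible domains,
    m     -- number of possible combinations of the chosen attributes,
    N_max -- number of records in the most numerous combination.
    Hypothetical helper; assumes at least one record falls in the domains.
    """
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    cells = list(product(*(domains[a] for a in attributes)))  # all m combinations
    n = sum(counts[c] for c in cells)
    n_max = max(counts[c] for c in cells)
    return n / (len(cells) * n_max)


# Hypothetical dataset with two binary attributes (4 possible combinations, one empty).
records = (
    [{"race": "r1", "sex": "M"}] * 4
    + [{"race": "r1", "sex": "F"}] * 2
    + [{"race": "r2", "sex": "M"}] * 2
)
domains = {"race": ["r1", "r2"], "sex": ["M", "F"]}
print(max_completeness(records, ["race", "sex"], domains))  # 0.5

# Compas Race x Sex arithmetic: 6,172 records, 12 combinations, 2,626 in the largest.
print(round(6172 / (12 * 2626), 3))  # 0.196
```
        </preformat>
        <p>Under this assumption the Compas example gives about 0.196, which the text reports as 0.19; a perfectly balanced dataset yields 1.</p>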
        <p>Figure 3 shows the case where the six fairness metrics are calculated using the K-Means technique. Note that the tails of all box plots overlap, while the boxes remain clearly separated for Separation TPR and Sufficiency PPV. The worst case is Sufficiency NPV, where the overlap is total.</p>
        <p>Figure 4 shows the results of applying the DBSCAN technique. In this case the values obtained are similar to those of K-Means, although the results are worse for Overall Accuracy Equality and Sufficiency NPV, which were not optimal in Figure 3 either.</p>
        <p>Finally, in Figure 5 the MinMax method of equation (4) is applied instead of the clustering algorithms. In these diagrams, the box plots of the risk cases become longer, and the separation between the two box plots is sharper in all diagrams. In the Independence and OAE cases, the tails no longer overlap. For Sufficiency PPV, the high-risk cases tend to 1, and an almost complete separation between the values is achieved. For sensitive attributes with a maximum completeness value greater than 0.66, the fairness metrics take values greater than 0.2. In conclusion, the MinMax method yielded better results than clustering; conversely, methods such as K-Means and DBSCAN can help to better define treatment similarities among groups.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Current Limits and Future Works</title>
      <p>The experimentation carried out has yielded encouraging results as regards the separation between high-risk and low-risk fairness forecasts with respect to maximum completeness. Although the results obtained sometimes lie within a rather narrow range, improvements have already been identified that can be investigated in future work. The first task is the choice of the clustering method and of the parameters that affect the number of clusters. Having few clusters brings us closer to the worst case, and it becomes easier to give the correct meaning to each cluster, depending on their distance and shape. However, fixing a number of clusters that forces such a division could merge groups that are not actually treated equally. Another point we will investigate is the possibility of changing the algorithm that identifies the synthetic value of the cluster for the metric under consideration. One solution we will explore is to integrate the calculation of the difference between maximum and minimum into a clustering algorithm that does not require a predefined number K of clusters, such as the already presented DBSCAN. For this algorithm, we will also explore alternative ways of treating elements that are not associated with any cluster. The question is: should these cases be discarded because they are outliers, or do they represent borderline cases, e.g., a highly discriminated minority?</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <sec id="sec-5-1">
        <p>The spread of ML algorithms for constructing decision systems makes the data used in their construction increasingly important. Imbalances or biases that may be present in the information can affect the results of such systems, causing discrimination toward certain groups. The fairness metrics presented here therefore become important for predicting the impact of such biases and for acting accordingly on both algorithms and input data, in line with the objectives.</p>
        <p>In this work we tried to provide a methodology to identify clusters that are similar in treatment type and to calculate a synthetic index that could predict how much the system is at risk with respect to sensitive attributes. The experimentation carried out provided good results both in identifying agglomerations that undergo similar treatment and in calculating a parameter that gives a conservative assessment of the metric.</p>
        <p>The relationship between the maximum completeness index and the fairness indices calculated with the methods shown provided a guideline for recognizing high-risk and lower-risk sensitive attributes. This will give analysts the information they need to better configure classification algorithms.</p>
        <p>The results of this work lay the groundwork for future developments aimed at improving the identification of groups within sensitive attributes and at researching alternative synthetic indices with greater precision.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>The</given-names>
            <surname>Economist</surname>
          </string-name>
          ,
          <article-title>The world's most valuable resource is no longer oil, but data</article-title>
          ,
          <source>The Economist</source>
          , USA (6th May
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Marr</surname>
          </string-name>
          ,
          <article-title>The 5 biggest data science trends in 2022</article-title>
          , Oct 2021. URL:
          <ext-link ext-link-type="uri" xlink:href="https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3">https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3</ext-link>
          (Accessed May
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Giuliano</surname>
          </string-name>
          ,
          <article-title>The next generation network in 2030: Applications, services, and enabling technologies</article-title>
          ,
          <source>in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>298</lpage>
          . doi:10.23919/EECSI53397.2021.9624241.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          ,
          <article-title>A hybrid neuro-wavelet predictor for QoS control and stability</article-title>
          ,
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          <volume>8249 LNAI</volume>
          (
          <year>2013</year>
          )
          <fpage>527</fpage>
          -
          <lpage>538</lpage>
          . doi:10.1007/978-3-319-03524-6_45.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Deb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maitra</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in marketing: Systematic review and future research direction</article-title>
          ,
          <source>International Journal of Information Management Data Insights</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <elocation-id>100002</elocation-id>
          . doi:10.1016/j.jjimei.2020.100002.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Lo</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          ,
          <article-title>An advanced neural network based solution to enforce dispatch continuity in smart grids</article-title>
          ,
          <source>Applied Soft Computing Journal</source>
          <volume>62</volume>
          (
          <year>2018</year>
          )
          <fpage>768</fpage>
          -
          <lpage>775</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dhomad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaber</surname>
          </string-name>
          ,
          <article-title>Bearing fault diagnosis using motor current signature analysis and the artificial neural network</article-title>
          ,
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>70</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Matta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Cardarilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fazzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giardino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nannarelli</surname>
          </string-name>
          , M. Re,
          <string-name>
            <given-names>S.</given-names>
            <surname>Spanò</surname>
          </string-name>
          ,
          <article-title>A reinforcement learning-based QAM/PSK symbol synchronizer</article-title>
          ,
          <source>IEEE Access</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>124147</fpage>
          -
          <lpage>124157</lpage>
          . doi:10.1109/ACCESS.2019.2938390.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cardia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Coppa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <article-title>Contagion prevention of COVID-19 by means of touch detection for retail stores</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3092</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Canese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Cardarilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fazzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Giardino</surname>
          </string-name>
          , M. Re,
          <string-name>
            <given-names>S.</given-names>
            <surname>Spanò</surname>
          </string-name>
          ,
          <article-title>Multi-agent reinforcement learning: A review of challenges and applications</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>11</volume>
          (
          <year>2021</year>
          ). URL: https://www.mdpi.com/2076-3417/11/11/4948. doi:10.3390/app11114948.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <article-title>First studies to apply the theory of mind theory to green and smart mobility by using Gaussian area clustering</article-title>
          , volume
          <volume>3118</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fallucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gerardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Petito</surname>
          </string-name>
          , E. De Luca,
          <article-title>Blockchain framework in digital government for the certification of authenticity, timestamping and data property</article-title>
          ,
          <year>2021</year>
          . doi:10.24251/HICSS.2021.282.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bianco</surname>
          </string-name>
          , G. Castro,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <article-title>Automatic RGB inference based on facial emotion recognition</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3092</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Multi-class nearest neighbour classifier for incomplete data handling</article-title>
          ,
          <source>in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)</source>
          , volume
          <volume>9119</volume>
          , Springer Verlag,
          <year>2015</year>
          , pp.
          <fpage>469</fpage>
          -
          <lpage>480</lpage>
          . doi:10.1007/978-3-319-19324-3_42.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , E. Tramontana,
          <string-name>
            <given-names>R.</given-names>
            <surname>Damaševičius</surname>
          </string-name>
          ,
          <article-title>Is the colony of ants able to recognize graphic objects?</article-title>
          ,
          <source>Communications in Computer and Information Science</source>
          <volume>538</volume>
          (
          <year>2015</year>
          )
          <fpage>376</fpage>
          -
          <lpage>387</lpage>
          . doi:10.1007/978-3-319-24770-0_33.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Illari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avanzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2694</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vincelli</surname>
          </string-name>
          ,
          <article-title>Supporting impaired people with a following robotic assistant by means of end-to-end visual target navigation and reinforcement learning approaches</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3118</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Angwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mattu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirchner</surname>
          </string-name>
          ,
          <article-title>Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks</article-title>
          , https://www.propublica.org/ (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mattu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirchner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Angwin</surname>
          </string-name>
          ,
          <article-title>COMPAS recidivism dataset</article-title>
          ,
          <year>2016</year>
          . URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/ (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Analysis of precision and accuracy in a simple model of machine learning</article-title>
          ,
          <source>Journal of the Korean Physical Society</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>866</fpage>
          -
          <lpage>870</lpage>
          . doi:10.3938/jkps.71.866.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Carey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>The statistical fairness field guide: perspectives from social and formal sciences</article-title>
          ,
          <source>AI and Ethics</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:10.1007/s43681-022-00183-3.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Pessach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shmueli</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2001.09784. doi:10.48550/ARXIV.2001.09784.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Prince</surname>
          </string-name>
          ,
          <article-title>Bias and fairness in AI</article-title>
          ,
          <year>2019</year>
          . URL: https://www.borealisai.com/en/blog/tutorial1-bias-and-fairness-ai/ (Accessed Mar, 2022).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Lo</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Susi</surname>
          </string-name>
          ,
          <article-title>A spiking neural network-based long-term prediction system for biogas production</article-title>
          ,
          <source>Neural Networks</source>
          <volume>129</volume>
          (
          <year>2020</year>
          )
          <fpage>271</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tolan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          ,
          <article-title>Addressing multiple metrics of group fairness in data-driven decision making</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2003.04794. doi:10.48550/ARXIV.2003.04794.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. Lo</given-names>
            <surname>Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shikler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wozniak</surname>
          </string-name>
          ,
          <article-title>Optimizing the organic solar cell manufacturing process by means of AFM measurements and neural networks</article-title>
          ,
          <source>Energies</source>
          <volume>11</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <collab>International Organization for Standardization</collab>
          ,
          <article-title>ISO/IEC 25012:2008 Software engineering - Software product Quality Requirements and Evaluation (SQuaRE) - Data quality model</article-title>
          ,
          <year>2008</year>
          . URL: https://www.iso.org/standard/35736.html (Accessed Jan, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <collab>International Organization for Standardization</collab>
          ,
          <article-title>ISO/IEC 25024:2015 Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality</article-title>
          ,
          <year>2015</year>
          . URL: https://www.iso.org/standard/35749.html (Accessed Jan, 2022).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Calabrese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Esponda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Pesado</surname>
          </string-name>
          ,
          <article-title>Framework for Data Quality Evaluation Based on ISO/IEC 25012 and ISO/IEC 25024</article-title>
          ,
          <source>in: VIII Conference on Cloud Computing, Big Data &amp; Emerging Topics</source>
          ,
          <year>2020</year>
          . URL: http://sedici.unlp.edu.ar/handle/10915/104778.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>F.</given-names>
            <surname>Gualo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Verdugo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Caballero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piattini</surname>
          </string-name>
          ,
          <article-title>Data quality certification using ISO/IEC 25012: Industrial experiences</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>176</volume>
          (
          <year>2021</year>
          )
          <elocation-id>110938</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S0164121221000352. doi:10.1016/j.jss.2021.110938.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Simonetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Paoletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vetrò</surname>
          </string-name>
          ,
          <article-title>Metrics for identifying bias in datasets</article-title>
          ,
          <source>SYSTEM</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>S.</given-names>
            <surname>Barocas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <source>Fairness and machine learning</source>
          ,
          <year>2020</year>
          . URL: https://fairmlbook.org/ (Accessed Sept, 2021), chapter: Classification.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>K.</given-names>
            <surname>Burkholder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kwock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Certification and trade-off of multiple fairness criteria in graph-based spam detection</article-title>
          ,
          <publisher-name>Association for Computing Machinery</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>130</fpage>
          -
          <lpage>139</lpage>
          . doi:10.1145/3459637.3482325.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vetrò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Torchiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mecati</surname>
          </string-name>
          ,
          <article-title>A data quality approach to the identification of discrimination risk in automated decision making systems</article-title>
          ,
          <source>Government Information Quarterly</source>
          <volume>38</volume>
          (
          <year>2021</year>
          )
          <elocation-id>101619</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S0740624X21000551. doi:10.1016/j.giq.2021.101619.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <collab>Department of Justice</collab>
          ,
          <article-title>Recidivism in juvenile justice</article-title>
          ,
          <year>2016</year>
          . URL: https://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <article-title>UCI Statlog German credit</article-title>
          ,
          <year>1994</year>
          . URL: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>I.-C.</given-names>
            <surname>Yeh</surname>
          </string-name>
          ,
          <article-title>Default of credit card clients data set</article-title>
          ,
          <year>2016</year>
          . URL: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kohavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <article-title>Adult data set</article-title>
          ,
          <year>1996</year>
          . URL: https://archive.ics.uci.edu/ml/datasets/adult (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cortez</surname>
          </string-name>
          ,
          <article-title>Student performance data set</article-title>
          ,
          <year>2014</year>
          . URL: https://archive.ics.uci.edu/ml/datasets/student+performance (Accessed Oct, 2021).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <collab>scikit-learn developers</collab>
          ,
          <article-title>Logistic regression</article-title>
          ,
          <year>2022</year>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html (Accessed May, 2022).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>A.</given-names>
            <surname>Simonetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vetrò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Paoletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Torchiano</surname>
          </string-name>
          ,
          <article-title>Integrating SQuaRE data quality model with ISO 31000 risk management to measure and mitigate software bias</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          (
          <year>2021</year>
          ) pp.
          <fpage>17</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <collab>scikit-learn developers</collab>
          ,
          <article-title>Algorithmic fairness</article-title>
          ,
          <year>2022</year>
          . URL: https://scikit-learn.org/stable/modules/clustering.html (Accessed May, 2022).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>