=Paper= {{Paper |id=Vol-3356/paper4 |storemode=property |title=Fairness Metrics and Maximum Completeness for the Prediction of Discrimination |pdfUrl=https://ceur-ws.org/Vol-3356/paper-04.pdf |volume=Vol-3356 |authors=Alessandro Simonetta,Tsuyoshi Nakajima,Maria Cristina Paoletti,Alessio Venticinque |dblpUrl=https://dblp.org/rec/conf/apsec/SimonettaNPV22 }} ==Fairness Metrics and Maximum Completeness for the Prediction of Discrimination== https://ceur-ws.org/Vol-3356/paper-04.pdf
Fairness Metrics and Maximum Completeness for the Prediction of Discrimination
Alessandro Simonetta¹, Tsuyoshi Nakajima², Maria Cristina Paoletti³ and Alessio Venticinque⁴

¹ Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
² Department of Computer Science and Engineering, Shibaura Institute of Technology, Tokyo, Japan
³ Rome, Italy
⁴ Naples, Italy


Abstract
Data has assumed increasing importance within the global economy, and its use is becoming more pervasive in multiple contexts. However, learning systems are exposed to various critical issues that can be addressed through ISO standards. Indeed, machine learning (ML) models may perpetuate societal prejudice simply because the same bias exists in the data. Based on these notions, we have built a model to identify similar treatment groups based on the type of classification errors made by ML algorithms. The article illustrates a way to calculate fairness indices on the protected attributes of the dataset. Finally, we consider the degree of relationship between maximum completeness and the fairness of forecasting algorithms through an inverse procedure of constructing a complete dataset. The use of mutual information provides an alternative method for calculating synthetic fairness indices and a useful basis for future research.

Keywords
fairness, machine learning, maximum completeness, treatment similarity, mutual information, entropy



1. Introduction

Data has become increasingly important within the global economy, and its use, which often occurs through sophisticated learning systems, is becoming more pervasive in many areas. The Economist [1] was one of the first to call data the oil of the modern age. With the rise of Artificial Intelligence (AI) algorithms in decision support, data quality has become ever more important, and Forbes [2] describes data as the fuel of ML algorithms. Consequently, a new business has emerged based on the collection and sale of data. Web giants such as Google offer free services and products with the goal of collecting information, often for advertising purposes. Many social platforms are free, and companies earn considerable sums from selling the information rather than from payment services. This has pushed these companies to use increasingly sophisticated technologies [3] and algorithms to collect information and integrate it with information from other data sources to maximize their insights. In addition, as presented in the documentary "The Social Dilemma" [4], information about users, including their contacts and interactions on platforms, is being used to influence their behavior without them realizing it, by providing the right ad hoc inputs.

While having multiple data sources makes it possible to analyze phenomena, we must consider that ML algorithms, as in [5] and [6], are affected by the completeness and redundancy of the information used to train them. The presence of bias within this information can cause discrimination against ethnic, gender, religious, racial, and cultural minorities. An emblematic example is the Compas dataset [7], where the algorithm used to predict inmates' recidivism unfairly disfavored African-American people.

In the last few years, attention to data quality and its use has increased, and, especially in Europe, legislators have become aware of the problem [8]. It is worth mentioning that the General Data Protection Regulation (GDPR) 2016/679 [9], defined to harmonize data privacy laws among European countries, also contains data quality notions such as accuracy, timeliness and security. The same can be found in the European regulation Solvency II [10], which states the need for insurance companies to have internal procedures and processes in place to ensure the appropriateness of their data.

As we mentioned in [11], we believe that a good solution to ensure the correct use of data and their quality according to regulation and ethical values is compliance with ISO standards: ISO/IEC 27000 [12], ISO 31000 [13] and ISO/IEC 25000 [14]. The introduction of maximum completeness as a dataset balance index, and its relation to fairness metrics, are emanations of the SQuaRE approach to measuring data quality and assessing its implications.

Woodstock'21: Symposium on the irreproducible science, June 07–11, 2021, Woodstock, NY
alessandro.simonetta@gmail.com (A. Simonetta); tsnaka@shibaura-it.ac.jp (T. Nakajima); mariacristina.paoletti@gmail.com (M. C. Paoletti)
Orcid 0000-0003-2002-9815 (A. Simonetta); 0000-0002-9721-4763 (T. Nakajima); 0000-0001-6850-1184 (M. C. Paoletti); 0000-0003-3286-3137 (A. Venticinque)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
2. The Present Situation

Although it is difficult to estimate the cost of poor data quality, a primary goal for organizations (public and private) that base their business on the digitization of processes, and on the operation of the organization itself, is to have trusted data [15]. Some experiences show that the application of the SQuaRE series is a solution for measuring and monitoring data quality over time. In Italy, the first indication for public administrations managing databases of national interest came in 2013, when the Agency for Digital Italy (AgID) identified the ISO/IEC 25012 standard as the data quality model to be adopted [16]. At that time, AgID also identified, among the 15 quality characteristics, those that must be used for databases of national interest (accuracy, consistency, completeness, and currentness). In the three-year plan for public administration information technology 2021-2023 [17], AgID confirms increasing data and metadata quality as a strategic goal (OB2.2).

In [18], three case studies of the data quality evaluation and certification process for repositories are reported. The different perspectives are analyzed to evaluate the impact of adopting ISO/IEC 25012, ISO/IEC 25024 and ISO/IEC 25040, and the benefits recognized by the three organizations before and after the process. The results show that applying the methodology helps the organizations achieve better long-term sustainability, improve their knowledge of the business, and drive better data quality initiatives in the future.

Among the environments in which the above ISO standards can be most useful are undoubtedly those where the information contains sensitive or safety data [19], such as the healthcare and legal domains. An example is the OpenEHR standard proposed in [20]. The issues that affect clinical records from the perspective of data quality are presented in [21]. In [22], the authors propose a generalized model for big data: a solution based on the application of ISO/IEC 25012 and ISO/IEC 25024. The study introduces three data quality dimensions: Contextual Consistency, Operational Consistency and Temporal Consistency. In [11], the authors show how using the SQuaRE series can ensure GDPR compliance. In [23], the study examines discrimination against nonwhite teachers on online English language teaching platforms.

One possible solution to the problem that bias in the data can propagate into the inferences of ML algorithms is the dataset labeling mechanism presented in [24]. In [25], the authors present a range of fair access and information presentation issues (quantity of data presented to the user and order of priority) that may affect the fairness of computing systems. Although these issues are related to the biases within the data, characteristics of recommender systems can introduce a greater degree of uncertainty. These are related to the permissions of the users who use them to access the information, or to the size of the data that can be processed by the algorithms. This makes it even more difficult to find countermeasures to avoid discrimination.

Finally, in [26] the authors show a methodology for identifying critical attributes that can lead to discrimination by classification-based learning systems.

3. Solution Proposed

When an ML-based recommendation system is used on a dataset where bias is present, the bias propagates within the model itself, replicating the guesswork and prejudices in the data. So we run the risk of thinking that we have applied an objective and neutral evaluation system, while we are actually using a biased system within an AI algorithm.

One of the purposes of this research is to verify that the system behaves in a non-discriminatory way toward certain groups. Using the different fairness measures in [27], it is possible to calculate their values with respect to two groups, identified by a protected attribute, to see if there are any disparities in treatment. For example, the formal criterion of Independence requires the sensitive attribute A to be statistically independent of the predicted value R, which can be checked for every pair of groups as:

$$P(R = 1 \mid A = a_i) = P(R = 1 \mid A = a_j) \quad \forall i, j : i \neq j \qquad (1)$$

To understand whether an attribute is a cause of discrimination in prediction outcomes, that is, whether there are homologous treatment groups, it is necessary to know the attribute's level of fairness. Ideally, therefore, it would be better to have a single measure that indicates how likely that attribute is to lead to discrimination.

In [26], the authors propose a method to compute several synthetic indices related to the fairness of the classification system. Two different methods are described in the article: the first performs clustering with the DBSCAN and K-means methods, while the second, MaxMin, searches for the worst case by dividing the protected attribute instances into privileged and unprivileged. Both methods allow grouping the elements of a protected attribute according to the type of treatment. In this way, the calculation of the synthetic index is based on a few influence classes, returning to the definition from which we started [27]. These two approaches were used to test for a link between the notion of maximum completeness and fairness indices. This would allow a priori identification of whether learning on a given dataset can lead to minority discrimination. At this point, alternative methods are proposed to identify homogeneous treatment groups with respect to the result obtained from a classification system. Algorithms may err toward some groups equally; for example, for African-Americans and Native Americans they may predict a degree of recidivism in excess of what happens in reality.

3.1. Identification of homogeneous treatment groups

To start, we need to calculate the fairness indices reported in [27] for the protected attributes of the dataset, considering the predictions of the classification algorithm and the actual (correct) outcome.

We referred to a classic case study for this kind of problem: the Compas dataset [7], where we observed a similar trend between groups. Table 1 shows the values of the 6 fairness indices for the protected attribute Race: Independence (Ind), Separation True Positive Rate (SepTPR), Separation False Positive Rate (SepFPR), Sufficiency Positive Predictive Value (SufPPV), Sufficiency Negative Predictive Value (SufNPV) and Overall Accuracy Equality (OAE).

Table 2 shows the correlation matrix according to Pearson's coefficient, that is, the degree of correlation between the indices measured for different ethnicities. Using a correlation threshold of 0.9, it is easy to detect the existence of two treatment groups (Table 3): G0 and G1.

Although this method works well for the case study, the issue becomes more complicated for categorical attributes with higher cardinality, as the number of relations increases. With reference to the Juvenile dataset [28], considering the V3_nacionalitat attribute representing the nationality of the students, it is possible to draw the phenomenon through a subway (metro) diagram (Fig. 1). In this graph, it is easier to check intersections between sets. For example, Group 0 and Group 1 have the element Colombia in common. The top histogram shows the number of elements participating in each intersection, while the left histogram shows the number of elements in each group.

Figure 1: Metro diagram of the nationalities in the Juvenile dataset, with group intersections
Figure 2: Scatterplot of races in the Compas dataset, mean of fairness metrics vs. maximum completeness

The result obtained with the Pearson coefficient threshold of 0.9 identifies twelve homogeneous treatment groups. To reduce their number, we kept, as representative of a set of groups, the group that contained the others under the inclusion relation. This reduced the twelve groups to four completely disjoint groups.
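The six indices of Table 1 are per-group rates computed from the true label Y and the prediction R. The sketch below shows one way to obtain them in plain Python; it is an illustration, not the authors' code, and the helper names are hypothetical. Two readings are our assumptions: Ind is the positive prediction rate P(R = 1 | A = a) of eq. (1), and SufNPV is P(Y = 1 | R = 0, A = a), the sufficiency condition for R = 0. The latter is the reading consistent with the Native American row of Table 1, where SepTPR = 100% (no false negatives) goes together with SufNPV = 0%.

```python
from collections import defaultdict


def _rate(num: int, den: int) -> float:
    """Return num/den, or NaN when the conditioning event is empty."""
    return num / den if den else float("nan")


def fairness_indices(y_true, y_pred, groups):
    """Per-group fairness rates (hypothetical helper).

    y_true, y_pred: sequences of 0/1 labels (Y) and predictions (R).
    groups: protected-attribute value A of each record.
    """
    # Confusion-matrix counts per group: [TP, FP, TN, FN]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for y, r, a in zip(y_true, y_pred, groups):
        if r == 1 and y == 1:
            counts[a][0] += 1
        elif r == 1:
            counts[a][1] += 1
        elif y == 0:
            counts[a][2] += 1
        else:
            counts[a][3] += 1

    table = {}
    for a, (tp, fp, tn, fn) in counts.items():
        n = tp + fp + tn + fn
        table[a] = {
            "Ind": _rate(tp + fp, n),      # P(R=1 | A=a), eq. (1)
            "SepTPR": _rate(tp, tp + fn),  # P(R=1 | Y=1, A=a)
            "SepFPR": _rate(fp, fp + tn),  # P(R=1 | Y=0, A=a)
            "SufPPV": _rate(tp, tp + fp),  # P(Y=1 | R=1, A=a)
            "SufNPV": _rate(fn, fn + tn),  # P(Y=1 | R=0, A=a), assumed reading
            "OAE": _rate(tp + tn, n),      # P(R=Y | A=a)
        }
    return table


def independence_gap(table):
    """Largest |P(R=1|A=a_i) - P(R=1|A=a_j)| over all pairs of groups."""
    ind = [row["Ind"] for row in table.values()]
    return max(ind) - min(ind)


if __name__ == "__main__":
    # Hypothetical toy group of 11 records: 5 TP, 3 FP, 3 TN, 0 FN
    y = [1] * 5 + [0] * 6
    r = [1] * 8 + [0] * 3
    print(fairness_indices(y, r, ["g"] * 11)["g"])
```

The toy group yields 8/11, 1.0, 0.5, 0.625, 0.0 and 8/11, which happen to reproduce the Native American percentages of Table 1; the counts are invented for illustration and say nothing about the real composition of the dataset.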
Table 1
Table of fairness measures

                                                                  Fairness Index
                         Race              Ind.     SepTPR      SepFPR    SufPPV     SufNPV        OAE
                       Caucasian          33,10%    50,36%      22,01%     59,48%     29,00%      67,19%
                        Hispanic          27,70%    41,80%      19,38%     56,03%     29,89%      66,21%
                         Other            20,41%    33,87%      12,79%     60,00%     30,04%      67,93%
                         Asian            22,58%    62,50%       8,70%     71,43%     12,50%      83,87%
                   African American       57,61%    71,52%      42,34%     64,95%     35,14%      64,91%
                   Native American        72,73%     100%         50%      62,50%      0,00%      72,73%
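Section 3.2 plots, for each race, the mean of the fairness indices against maximum completeness. Assuming that mean is the unweighted arithmetic mean of the six percentages in Table 1 (the text does not state the weighting), the abscissa values and the grand mean can be recomputed as below. $C_{MAX}$ is deliberately not computed, since its definition is given in [29], [30] rather than here; the variable names are hypothetical.

```python
from statistics import fmean

# Fairness indices of Table 1, in percent:
# Ind, SepTPR, SepFPR, SufPPV, SufNPV, OAE
TABLE1 = {
    "Caucasian":        [33.10, 50.36, 22.01, 59.48, 29.00, 67.19],
    "Hispanic":         [27.70, 41.80, 19.38, 56.03, 29.89, 66.21],
    "Other":            [20.41, 33.87, 12.79, 60.00, 30.04, 67.93],
    "Asian":            [22.58, 62.50,  8.70, 71.43, 12.50, 83.87],
    "African American": [57.61, 71.52, 42.34, 64.95, 35.14, 64.91],
    "Native American":  [72.73, 100.0, 50.00, 62.50,  0.00, 72.73],
}

# Assumption: unweighted arithmetic mean of the six indices per race.
means = {race: fmean(values) for race, values in TABLE1.items()}
# Grand mean over races (equal weight per race, again an assumption).
grand_mean = fmean(means.values())

for race, m in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{race:<17} {m:6.2f}")
print(f"{'grand mean':<17} {grand_mean:6.2f}")
```

Under this assumption the African American (about 56.1) and Native American (about 59.7) means sit well above the other four races (about 37.5 to 43.6), which is consistent with the G0/G1 split reported in Table 3.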



Table 2
Table of Pearson Correlation Coefficients

                                                                      Race
                       Race            African-A.   Native A.    Caucasian    Hispanic    Other      Asian
                 African-American           1        0,901         0,801       0,680      0,562      0,848
                 Native American         0,901          1          0,515       0,364      0,210      0,596
                     Caucasian           0,801       0,515           1         0,983      0,941      0,991
                      Hispanic           0,680       0,364         0,983         1        0,984      0,956
                       Other             0,562       0,210         0,941       0,984        1        0,900
                         Asian             0,848       0,596         0,991       0,956      0,900        1
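The grouping of Table 3 can be reproduced from the coefficients of Table 2 with a simple rule. The rule itself is our assumption, since the text states only the 0.9 threshold: each race seeds a candidate group containing every race whose coefficient with it is at least 0.9, and any candidate strictly contained in another is dropped, mirroring the inclusion-based reduction described for the Juvenile dataset. The names `PEARSON`, `corr` and `treatment_groups` are hypothetical.

```python
# Pearson coefficients from Table 2 (upper triangle; the matrix is symmetric).
PEARSON = {
    ("African American", "Native American"): 0.901,
    ("African American", "Caucasian"): 0.801,
    ("African American", "Hispanic"): 0.680,
    ("African American", "Other"): 0.562,
    ("African American", "Asian"): 0.848,
    ("Native American", "Caucasian"): 0.515,
    ("Native American", "Hispanic"): 0.364,
    ("Native American", "Other"): 0.210,
    ("Native American", "Asian"): 0.596,
    ("Caucasian", "Hispanic"): 0.983,
    ("Caucasian", "Other"): 0.941,
    ("Caucasian", "Asian"): 0.991,
    ("Hispanic", "Other"): 0.984,
    ("Hispanic", "Asian"): 0.956,
    ("Other", "Asian"): 0.900,
}
RACES = ["African American", "Native American", "Caucasian",
         "Hispanic", "Other", "Asian"]


def corr(a: str, b: str) -> float:
    """Symmetric lookup into the upper triangle; the diagonal is 1."""
    if a == b:
        return 1.0
    return PEARSON[(a, b)] if (a, b) in PEARSON else PEARSON[(b, a)]


def treatment_groups(items, threshold=0.9):
    """Group items correlated at >= threshold, keeping only maximal groups."""
    # One candidate per item: itself plus every item correlated at >= threshold.
    candidates = {
        frozenset(b for b in items if corr(a, b) >= threshold) for a in items
    }
    # Inclusion reduction: drop any candidate strictly contained in another.
    maximal = [g for g in candidates if not any(g < other for other in candidates)]
    return sorted(sorted(g) for g in maximal)


if __name__ == "__main__":
    for group in treatment_groups(RACES):
        print(group)
```

With the 0.9 threshold this returns the two groups of Table 3. The rule can still leave overlapping maximal groups when candidates only partially intersect, so disjointness, as obtained for the Juvenile dataset, is not guaranteed by the reduction step alone.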




Table 3
Groups Aggregation

                            Races
  Groups    1                   2                   3         4
  G0        African American    Native American
  G1        Caucasian           Hispanic            Other     Asian

3.2. Relationship between mean of fairness indexes and $C_{MAX}$

At this point, we studied whether there was a relationship between the composition of the groups made using Pearson's coefficient and maximum completeness, as shown in the studies [29] and [30]. For this purpose, we used a scatterplot in which each ethnicity is drawn according to the pair of values: mean of the fairness indices on the abscissa and maximum completeness on the ordinate (Fig. 2). Considering the positions of the different ethnic groups, on a scale whose limit is the highest value, we observe that they tend to cluster around the grand mean of the fairness indices, most noticeably for the privileged group. Items belonging to the same group tend to remain close with respect to the fairness index. These considerations hold less for the maximum completeness index ($C_{MAX}$).

After extending the analysis to the different attributes of the datasets already studied in [26] [31], $C_{MAX}$ seems to be a strongly characterizing parameter, more so than the other indices proposed in [32]. In fact, repeating the analysis on other protected attributes, such as V3_nacionalitat of the Juvenile dataset, the clustering of similarly treated elements within the scatterplot was found to be strongly related not only to the average of the fairness indices but also to $C_{MAX}$, as shown in Fig. 3, considering the groups with intersections shown in Fig. 1.

Figure 3: Scatterplot of nationalities in the Juvenile dataset, mean of fairness metrics vs. maximum completeness

3.3. Alternative synthetic indices

The presence of outliers in the values of the fairness indices related to a protected attribute could affect the evaluation of these parameters. For this reason, in this paper we propose a different way of calculating fairness indices. In this research, we calculate independence, separation, sufficiency and OAE using the notions of entropy and mutual information. The idea is to find a new representation of the synthetic index that allows more confident identification of whether a given protected attribute could lead to possible discrimination. Considering the condition of Independence between two groups $A = a_i$ and $A = a_j$:



$$|P(R = 1 \mid A = a_i) - P(R = 1 \mid A \neq a_i)| < \varepsilon \qquad (2)$$

This condition can be extended to all categories of the protected attribute and also expressed as orthogonality between the predicted value R and the group A through mutual information. Two variables are independent if and only if their mutual information is zero:

$$I(R, A) = 0 \qquad (3)$$

Recall that mutual information is calculated by the equation:

$$I(R, A) = H(R) + H(A) - H(A, R) \qquad (4)$$

where H(R) is the entropy of the predicted value R. Thus, the individual terms for calculating independence are:

$$H(R) = -\sum_{i=1}^{n} P(r_i)\log P(r_i) \qquad (5)$$

H(A) is the entropy associated with A and is calculated as:

$$H(A) = -\sum_{j=1}^{m} P(a_j)\log P(a_j) \qquad (6)$$

Finally, the third term, H(A, R) = H(R, A), is computed by:

$$H(R, A) = -\sum_{i=1}^{n}\sum_{j=1}^{m} P(r_i \cap a_j)\log P(r_i \cap a_j) \qquad (7)$$

The other indices can also be expressed through mutual information; in particular, referring to [33] and [26], Separation is calculated by:

$$I(R, A \mid Y) = H(R, Y) + H(A, Y) - H(R, Y, A) - H(Y) \qquad (8)$$

Sufficiency is expressed by the following equation:

$$I(Y, A \mid R) = H(Y, R) + H(A, R) - H(Y, R, A) - H(R) \qquad (9)$$

Finally, OAE is computed by:

$$H(R, A \mid Y = R) = H(R, Y = R) + H(A, Y = R) - H(R = Y, A \mid R = Y) \qquad (10)$$

Once the mutual information was calculated for the fairness metrics considered, we compared these values with those obtained by applying the methods presented in [26], namely MaxMin and clustering with DBSCAN. To achieve this, a normalization step was necessary, dividing each quantity by its maximum achievable value. In the present study, we compared the three methodologies (MaxMin, DBSCAN and mutual information) applied to the Compas dataset with respect to the fairness indices. Since the results show similar trends across the fairness measures, without loss of generality we report only the relationship between the Independence measure and maximum completeness. In Fig. 4, the red curve shows the dependence for the MaxMin methodology, the blue curve for DBSCAN, and the black curve for mutual information. The graph in Fig. 4 shows the trend of independence as maximum completeness varies. The dataset construction process initially selects a few tuples of the original dataset ($C_{MAX}$ = 0.324) and then inserts new tuples until the dataset reaches overall completeness ($C_{MAX}$ = 1), which corresponds to maximum independence.

Figure 4: Comparison of the Mutual Information, DBSCAN and MaxMin methods for Independence

The curve for the MaxMin method initially takes greater values than the other two methods, while this gap decreases as the number of inserted records increases. Thus, we can conclude from the present research that the independence measure is more sensitive to varying maximum completeness when the MaxMin method is used.
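Equations (3) to (9) reduce every index to sums and differences of empirical entropies, so a single joint-entropy helper is enough to sketch them. This is a minimal illustration under stated assumptions, not the authors' implementation: logarithms are base 2 (the text does not fix the base), probabilities are empirical frequencies, the normalization by the maximum achievable value is omitted, and for OAE we swap in the mutual information between A and the correctness indicator 1[R = Y], a standard way to express accuracy equality, instead of transcribing eq. (10). All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Empirical joint Shannon entropy H(X1, ..., Xk) in bits."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in Counter(rows).values())


def mutual_information(x, a):
    """Eq. (4): I(X, A) = H(X) + H(A) - H(X, A)."""
    return entropy(x) + entropy(a) - entropy(x, a)


def conditional_mutual_information(x, a, z):
    """Eqs. (8)-(9): I(X, A | Z) = H(X, Z) + H(A, Z) - H(X, Z, A) - H(Z)."""
    return entropy(x, z) + entropy(a, z) - entropy(x, z, a) - entropy(z)


def mi_fairness(y, r, a):
    """Entropy-based fairness indices; 0 means the criterion holds exactly."""
    correct = [int(yi == ri) for yi, ri in zip(y, r)]
    return {
        "Independence": mutual_information(r, a),                # I(R, A), eq. (3)
        "Separation": conditional_mutual_information(r, a, y),   # I(R, A | Y), eq. (8)
        "Sufficiency": conditional_mutual_information(y, a, r),  # I(Y, A | R), eq. (9)
        "OAE": mutual_information(correct, a),                   # assumption: I(1[R=Y], A)
    }


if __name__ == "__main__":
    # Hypothetical data: both groups have identical (Y, R) distributions.
    a = ["x", "x", "x", "x", "y", "y", "y", "y"]
    y = [1, 1, 0, 0, 1, 1, 0, 0]
    r = [1, 1, 0, 1, 1, 1, 0, 1]
    print(mi_fairness(y, r, a))
```

Because the two groups in the example share the same joint distribution of Y and R, all four indices come out as zero up to floating-point rounding; any dependence of R or Y on A pushes the corresponding value above zero, bounded by the entropies involved.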

3.4. Limitations and Future Work

This work identified homologous treatment groups using Pearson's coefficient, which detects the correlation between the fairness characteristics associated with different groups.

In the future, further research should investigate new similarity mechanisms based on ML and Deep Learning algorithms, considering other clustering methodologies that can avoid overlapping between groups.

A second line of research will aim to identify discrimination caused by belonging to more than one protected attribute, such as gender and race simultaneously.

Since we did not consider explainable AI algorithms, future work could be extended to frameworks that analyze how AI models make decisions (e.g., Watson OpenScale [34]).

4. Conclusion

The use of AI and ML in the decision-making process of many recommendation systems makes it possible to mitigate the risk of subjective classifications. While these systems are reliable forecasting tools, they do not always allow for an explanation of why such conclusions were reached. Thus, the presence of incomplete or unbalanced data, which can be measured through the SQuaRE series (completeness measures), can lead to biased results.

This work allowed us to calculate similar groups in terms of equivalence of treatment by applying Pearson's coefficient to synthetic indices related to protected attributes. In this way, we identified similarities that previously remained hidden in the search for possible discrimination.

The other achievement was that we were able to associate fairness measures with protected attributes as a whole, rather than with their individual values, using the concepts of mutual information and entropy. This approach laid the foundation for new experiments relating the response of these measures to changes in maximum completeness.

Finally, we compared the classical approaches [31] with the method using mutual information and entropy. In this way, we tested the response of fairness measures to maximum completeness and found confirmation of the premise of the work, namely, that non-quality in the data leads to unfair treatment when AI and ML are used in the decision-making process of recommender systems.

References

[1] The Economist, The world's most valuable resource is no longer oil, but data, The Economist, USA (6th May 2017).
[2] B. Marr, The 5 biggest data science trends in 2022, Oct 2021. URL: https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3.
[3] R. Giuliano, The next generation network in 2030: Applications, services, and enabling technologies, in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021, pp. 294–298. doi:10.23919/EECSI53397.2021.9624241.
[4] J. Orlowski, The social dilemma, Sep. 2020. URL: https://www.netflix.com/it/title/81254224.
[5] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, A. Ricci, S. Spanò, An fpga-based multi-agent reinforcement learning timing synchronizer, Computers and Electrical Engineering 99 (2022) 107749. doi:10.1016/j.compeleceng.2022.107749.
[6] G. C. Cardarilli, M. Re, L. Di Nunzio, A pseudo-softmax function for hardware-based high speed image classification, Scientific Reports (2021). doi:10.1038/s41598-021-94691-7.
[7] J. Larson, S. Mattu, L. Kirchner, J. Angwin, Compas recidivism dataset, 2016. URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
[8] Council of Europe, Recommendation CM/Rec(2020)1 of the Committee of Ministers to member States on the human rights impacts of algorithmic systems (2020).
[9] European Parliament and Council of Europe, Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec (general data protection regulation) (2016).
[10] Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of Insurance and Reinsurance (Solvency II), 2009.
[11] A. Simonetta, M. C. Paoletti, A. Venticinque, Using the SQuaRE series as a guarantee for GDPR compliance, Ceur-WS 3115 (2021). URL: http://ceur-ws.org/Vol-3114/paper-05.pdf.
[12] International organization for standardization, "iso/iec 27000:2018, information technology, security techniques, information security management systems, overview and vocabulary", 2018. URL: https://www.iso.org/standard/73906.html (accessed Nov, 2021).
[13] International organization for standardization, "iso 31000:2018(en) risk management — guidelines", 2018. URL: https://www.iso.org/iso-31000-risk-management.html (accessed Nov, 2021).
[14] International organization for standardization, "iso/iec 25000:2014, systems and software engineering — systems and software quality requirements

304–309. doi:10.1109/ICCSCE.2014.7072735.
[20] M. Sousa, D. Gonçalves-Ferreira, C. Pereira, G. Bacelar, S. Frade, O. Pestana, R. Cruz-Correia, openehr based systems and the general data protection regulation (gdpr), Studies in health technology and informatics 247 (2018) 91–95.
[21] V. C. Pezoulas, K. D. Kourou, F. Kalatzis, T. P. Exarchos, A. Venetsanopoulou, E. Zampeli, S. Gandolfo, F. Skopouli, S. De Vita, A. G. Tzioufas, D. I. Fotiadis, Medical data quality assessment: On the development of an automated framework for medical data curation, Computers in Biology and Medicine 107 (2019) 270–283. URL: https://www.sciencedirect.com/science/article/pii/S0010482519300733. doi:10.1016/j.compbiomed.2019.03.001.
[22] I. Caballero, M. Serrano, M. Piattini, A data quality in use model for big data, in: M. Indulska, S. Purao (Eds.), Advances in Conceptual Modeling, Springer International Publishing, Cham, 2014, pp. 65–74.
[23] N. M. Curran, Discrimination in the gig economy: the experiences of black online english teachers, Language and Education 0 (2021) 1–15. doi:10.1080/09500782.2021.1981928.
[24] E. Beretta, A. Vetro, B. Lepri, J. C. De Martin, Ethical and Socially-Aware Data Labels: 5th International Conference, SIMBig 2018, Lima, Peru, September 3–5, 2018, Proceedings, 2019, pp. 320–327. doi:10.1007/978-3-030-11680-4_30.
[25] M. D. Ekstrand, A. Das, R. Burke, F. Diaz, Fairness in information access systems, Foundations and
     and evaluation (square) — guide to square”, 2014.            Trends® in Information Retrieval 16 (2022) 1–177.
[15] F. Fallucchi, M. Gerardi, M. Petito, E. De Luca,             doi:10.1561/1500000079 .
     Blockchain framework in digital government for          [26] A. Simonetta, M. C. Paoletti, A. Venticinque, The
     the certification of authenticity, timestamping and          use of maximum completeness to estimate bias in
     data property, 2021. doi:10.24251/HICSS.2021.                ai based recommendation systems, SYSTEM 2022
     282 .                                                        (In Press).
[16] ???? URL: https://www.agid.gov.it/sites/default/        [27] S. Barocas, M. Hardt, A. Narayanan, Fairness and
     files/repository_files/circolari/dt_cs_n.68_-_               machine learning, 2020. URL: https://fairmlbook.
     2013dig_-regole_tecniche_basi_dati_critiche_art_             org/, chapter: Classification.
     2bis_dl_179\-2012_sito.pdf.                             [28] Department of Justice, Recidivism in juvenile jus-
[17] Three-year plan for information technology                   tice, 2016. URL: https://cejfe.gencat.cat/en/recerca/
     (piano triennale per l’informatica), 2022.                   opendata/jjuvenil/reincidencia-justicia-menors/
     URL:            https://www.agid.gov.it/it/agenzia/          index.html.
     piano-triennale(Access06-22).                           [29] A. Simonetta, M. C. Paoletti, Designing digital
[18] F. Gualo, M. Rodriguez, J. Verdugo, I. Caballero,            circuits in multi-valued logic,           International
     M. Piattini, Data quality certification using iso/iec        Journal on Advanced Science, Engineer-
     25012: Industrial experiences, Journal of Systems            ing and Information Technology 8 (2018)
     and Software 176 (2021) 110938. doi:https://doi.             1166–1172. URL: http://ijaseit.insightsociety.
     org/10.1016/j.jss.2021.110938 .                              org/index.php?option=com_content&view=
[19] A. A. Jaber, R. Bicker, The optimum selection of             article&id=9&Itemid=1&article_id=5966.
     wavelet transform parameters for the purpose of              doi:10.18517/ijaseit.8.4.5966 .
     fault detection in an industrial robot, in: 2014 IEEE   [30] A. Simonetta, A. Vetrò, M. C. Paoletti, M. Torchi-
     International Conference on Control System, Com-             ano, Integrating square data quality model with iso
     puting and Engineering (ICCSCE 2014), 2014, pp.              31000 risk management to measure and mitigate
     software bias, CEUR Workshop Proceedings (2021)
     pp. 17–22.
[31] A. Vetrò, M. Torchiano, M. Mecati, A data quality ap-
     proach to the identification of discrimination risk in
     automated decision making systems, Government
     Information Quarterly 38 (2021) 101619. doi:https:
     //doi.org/10.1016/j.giq.2021.101619 .
[32] A. Simonetta, M. C. Paoletti, M. Muratore, A new
     approach for designing of computer architectures
     using multi-value logic, International Journal on
     Advanced Science, Engineering and Information
     Technology 11 (2021) 1440–1446. doi:10.18517/
     ijaseit.11.4.15778 .
[33] D. Steinberg, A. Reid, S. O’Callaghan, F. Lattimore,
     L. McCalman, T. S. Caetano, Fast fair regression
     via efficient approximations of mutual information,
     CoRR abs/2002.06200 (2020). URL: https://arxiv.org/
     abs/2002.06200.
[34] IBM, Watson openscale, 2022. URL: https:
     //www.ibm.com/it-it/cloud/watson-openscale/
     drift(Access10-22).