=Paper= {{Paper |id=Vol-3356/paper4 |storemode=property |title=Fairness Metrics and Maximum Completeness for the Prediction of Discrimination |pdfUrl=https://ceur-ws.org/Vol-3356/paper-04.pdf |volume=Vol-3356 |authors=Alessandro Simonetta,Tsuyoshi Nakajima,Maria Cristina Paoletti,Alessio Venticinque |dblpUrl=https://dblp.org/rec/conf/apsec/SimonettaNPV22 }} ==Fairness Metrics and Maximum Completeness for the Prediction of Discrimination== https://ceur-ws.org/Vol-3356/paper-04.pdf

Fairness Metrics and Maximum Completeness for the
Prediction of Discrimination
Alessandro Simonetta¹, Tsuyoshi Nakajima², Maria Cristina Paoletti³ and Alessio Venticinque⁴
¹ Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
² Department of Computer Science and Engineering, Shibaura Institute of Technology, Tokyo, Japan
³ Rome, Italy
⁴ Naples, Italy

Abstract
Data has assumed increasing importance within the global economy, and its use is becoming more pervasive in multiple
contexts. However, learning systems are exposed to various critical issues that can be addressed through ISO standards.
Indeed, machine learning (ML) models may be exposed to the risk of perpetuating societal prejudice simply because the same
bias exists in the data. Based on these notions, we have built a model to identify similar treatment groups based on the type
of classification errors made by ML algorithms. A way to calculate fairness indices on the protected attributes of the dataset
will be illustrated in the article. Finally, we will consider the degree of relationship existing between maximal completeness
and fairness of forecasting algorithms through an inverse procedure of constructing a complete dataset. The use of mutual
information provided an alternative method for calculating synthetic fairness indices and a useful basis for future research.

Keywords
fairness, machine learning, maximum completeness, treatment similarity, mutual information, entropy

1. Introduction

Data has become increasingly important within the global economy, and its use, which often occurs through sophisticated learning systems, is becoming more pervasive in many areas.

The Economist [1] was one of the first to describe data as the oil of the modern age. With the rise of Artificial Intelligence (AI) algorithms in decision support, data quality has become ever more important, and Forbes [2] points to data as the fuel of ML algorithms. Consequently, a new business has emerged based on the collection and sale of data. Web giants such as Google offer free services and products with the aim of collecting information, often for advertising purposes. Many social platforms are free, and companies earn considerable sums from selling information rather than from paid services. This has pushed these companies to use increasingly sophisticated technologies [3] and algorithms to collect information and integrate it with information from other data sources to maximize their insights. In addition, as presented in the documentary "The Social Dilemma" [4], information about users, including contacts and interactions on platforms, is being used to influence their behavior without them realizing it, by providing the right ad hoc inputs.

While the availability of large amounts of data makes it possible to analyze phenomena, we must consider that ML algorithms, as in [5] and [6], are affected by the completeness and redundancy of the information used to train them. The presence of bias in that information can cause discrimination against ethnic, gender, religious, racial and cultural minorities, among others. An emblematic example is the Compas dataset [7], where the algorithm used to predict inmates' recidivism unfairly disfavored African-American people.

In the last few years, attention to data quality and its use has increased and, especially in Europe, legislators have become aware of the existence of the problem [8]. It is worth mentioning that the General Data Protection Regulation (GDPR) 2016/679 [9], defined to harmonize data privacy laws among European countries, also contains data quality notions such as accuracy, timeliness and security. The same can be found in the European regulation Solvency II [10], which states the need for insurance companies to have internal procedures and processes in place to ensure the appropriateness of the data.

As we mentioned in [11], we believe that a good solution to ensure the correct use of data and their quality, in accordance with regulations and ethical values, is compliance with ISO standards: ISO/IEC 27000 [12], ISO 31000 [13] and ISO/IEC 25000 [14]. The introduction of maximum completeness, as a dataset balance index, and its relation to fairness metrics derive from the SQuaRE approach to measuring data quality and assessing its implications.

Woodstock'21: Symposium on the irreproducible science, June 07–11, 2021, Woodstock, NY
Email: alessandro.simonetta@gmail.com (A. Simonetta); tsnaka@shibaura-it.ac.jp (T. Nakajima); mariacristina.paoletti@gmail.com (M. C. Paoletti)
ORCID: 0000-0003-2002-9815 (A. Simonetta); 0000-0002-9721-4763 (T. Nakajima); 0000-0001-6850-1184 (M. C. Paoletti); 0000-0003-3286-3137 (A. Venticinque)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
2. The Present Situation

Although it is difficult to estimate the cost of poor data quality, a primary goal for organizations (public and private) that base their business on the digitization of processes and on the operation of the organization itself is to have trusted data [15]. Some experiences show how the application of the SQuaRE series is a solution for measuring and monitoring data quality over time. In Italy, the first indication to public administrations managing databases of national interest came in 2013, when the Agency for Digital Italy (AgID) identified the ISO/IEC 25012 standard as the data quality model to be adopted [16].

In 2013, AgID also identified, among the 15 quality characteristics, those that must always be used for databases of national interest (accuracy, consistency, completeness, and newness). In the three-year plan for public administration information technology 2021-2023 [17], AgID confirms increasing data and metadata quality as a strategic goal (OB2.2).

In [18], three case studies of data quality evaluation and certification processes for repositories are reported. The different visions are analyzed to evaluate the impact of adopting ISO/IEC 25012, ISO/IEC 25024 and ISO/IEC 25040 and the benefit recognized in the three organizations before and after the process. The results show that applying the methodology helps organizations achieve better long-term sustainability, improves knowledge of the business and drives the organizations toward better data quality initiatives in the future.

Among the environments in which the above ISO standards can be most useful are undoubtedly those where the information contains sensitive or safety data [19], such as the healthcare and legal domains. An example is the proposed OpenEHR standard in [20]. The issues that affect clinical records from the perspective of data quality are presented in [21]. In [22] the authors propose a generalized model for big data: a solution based on the application of ISO/IEC 25012 and ISO/IEC 25024. The study introduces three data quality dimensions: Contextual Consistency, Operational Consistency and Temporal Consistency. In [11] the authors show how using the SQuaRE series can ensure GDPR compliance. In [23] the study examines discrimination against nonwhite teachers on online English language teaching platforms.

One possible solution to the problem that bias in the data can propagate into the inferences of ML algorithms is the dataset labeling mechanism presented in [24]. In [25] the authors present a range of fair access and information presentation issues (quantity of data presented to the user and order of priority) that may affect the fairness of computing systems. Although these issues are related to the biases within the data, characteristics of recommender systems can introduce a greater degree of uncertainty. These are related to the permissions of the users who use them to access the information, or to the size of the data that can be processed by the algorithms. This makes it even more difficult to find countermeasures to avoid discrimination.

Finally, in [26] the authors show a methodology for identifying critical attributes that can lead to discrimination by classification-based learning systems.

3. Solution Proposed

When using an ML-based recommendation system on a dataset where bias is present, the bias propagates within the model itself, replicating the guesswork and prejudices in the data. So, we run the risk of thinking that we applied an objective and neutral evaluation system, while we are in fact using a biased system within an AI algorithm.

One of the purposes of this research is to verify that the system behaves in a non-discriminatory way toward certain groups. By considering the different fairness measures in [27], it is possible to calculate their value with respect to two groups, identified by a protected attribute, to see if there are any disparities in treatment. For example, the formal criterion of Independence requires that the sensitive attribute A be statistically independent of the predicted value R, which can be expressed $\forall i, j : i \neq j$ as:

$$P(R = 1 \mid A = a_i) = P(R = 1 \mid A = a_j) \tag{1}$$

To understand whether an attribute is a cause of discrimination in prediction outcomes, that is, whether there are homologous treatment groups, it is necessary to know the attribute's level of fairness. Ideally, therefore, it would be better to have a single measure that gives an idea of how likely that attribute is to lead to discrimination.

In [26], the authors propose a method to compute several synthetic indices related to the fairness of the classification system. Two different methods are described in the article: the first performs clustering with the DBSCAN and K-means methods, while the second, MaxMin, searches for the worst case by dividing the protected attribute instances into privileged and unprivileged. Both methods allow grouping the elements of a protected attribute according to the type of treatment. In this way the calculation of the synthetic index is based on a few influence classes, returning to the definition from which we started [27]. These two approaches were used to test for a link between the notion of maximum completeness and fairness indices. This would allow a priori identification of
Figure 1: Metropolitan Diagram related to nationalities in Juvenile Dataset with groups intersections

whether learning on a given dataset can lead to minority discrimination. At this point, alternative methods are proposed to identify homogeneous treatment groups with respect to the result obtained from a classification system. Algorithms may err toward some groups in the same way; for example, for African-Americans and Native Americans they may predict a degree of recidivism in excess of what happens in reality.

3.1. Identification of homogeneous treatment groups

To start, we need to calculate the fairness indices reported in [27] for the protected attributes of the dataset, considering the predictions of the classification algorithm and the actual correct result.

We referred to a classic case study for this kind of problem: the Compas dataset [7], where we observed a similar trend between groups. Table 1 shows the values of the six fairness indices for the protected attribute Race: Independence (Ind), Separation True Positive Rate (SepTPR), Separation False Positive Rate (SepFPR), Sufficiency Positive Predictive Value (SufPPV), Sufficiency Negative Predictive Value (SufNPV) and Overall Accuracy Equality (OAE).

Table 2 shows the correlation matrix according to Pearson's coefficient and the existence of correlation between the indices measured for different ethnicities. Considering a correlation threshold of 0.9, it is easy to detect the existence of two treatment groups (Table 3): G0 and G1.

Although this method works well for the case study, the issue becomes more complicated when there are categorical attributes with higher cardinality, as the number of relations increases. With reference to the Juvenile dataset [28], considering the V3_nacionalitat attribute representing the nationality of the students, it is possible to draw the phenomenon through a subway diagram (Fig. 1). In this graph, it is easier to check intersections between sets. For example, Group 0 and Group 1 have the element Colombia in common. The top histogram shows the number of elements participating in the intersection, while the left histogram shows the number of elements in the group.

The result obtained with the Pearson coefficient threshold of 0.9 identifies twelve homogeneous treatment groups. In order to reduce their number, we kept as the representative of a set of groups the one that contained them in the inclusion relation. This reduced the twelve to four completely disjoint groups.

Figure 2: Scatterplot of races in Compas Dataset, mean of fairness metrics vs. maximum completeness
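As a rough illustration of the first step of Section 3.1, the six indices can be derived from per-group confusion counts. The sketch below is not the authors' implementation: the function name `fairness_indices`, the toy data, and in particular the reading of SufNPV as P(Y=1 | R=0, A=a) are assumptions made here for illustration.

```python
from collections import defaultdict


def fairness_indices(y_true, y_pred, groups):
    """Per-group fairness indices from binary labels (1 = positive outcome)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for y, r, a in zip(y_true, y_pred, groups):
        # "t"/"f": prediction correct or not; "p"/"n": predicted positive or negative.
        key = ("t" if y == r else "f") + ("p" if r == 1 else "n")
        counts[a][key] += 1

    def ratio(num, den):
        # An undefined ratio (empty denominator) is reported as None.
        return num / den if den else None

    result = {}
    for a, c in counts.items():
        tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
        n = tp + fp + fn + tn
        result[a] = {
            "Ind": ratio(tp + fp, n),      # P(R=1 | A=a)
            "SepTPR": ratio(tp, tp + fn),  # P(R=1 | Y=1, A=a)
            "SepFPR": ratio(fp, fp + tn),  # P(R=1 | Y=0, A=a)
            "SufPPV": ratio(tp, tp + fp),  # P(Y=1 | R=1, A=a)
            "SufNPV": ratio(fn, fn + tn),  # P(Y=1 | R=0, A=a), assumed convention
            "OAE": ratio(tp + tn, n),      # P(R=Y | A=a)
        }
    return result


# Hypothetical toy data: two groups "a" and "b" with four records each.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
res = fairness_indices(y_true, y_pred, groups)
```

Comparing the entries of `res` across groups then shows where treatment differs; on the toy data both groups have the same Independence value while their true positive rates differ.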
Table 1
Table of fairness measures

                              Fairness Index
Race               Ind.     SepTPR   SepFPR   SufPPV   SufNPV   OAE
Caucasian          33.10%   50.36%   22.01%   59.48%   29.00%   67.19%
Hispanic           27.70%   41.80%   19.38%   56.03%   29.89%   66.21%
Other              20.41%   33.87%   12.79%   60.00%   30.04%   67.93%
Asian              22.58%   62.50%    8.70%   71.43%   12.50%   83.87%
African-American   57.61%   71.52%   42.34%   64.95%   35.14%   64.91%
Native American    72.73%   100%     50%      62.50%    0.00%   72.73%
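The correlation step can be sketched by treating each race as a vector of its six index values from Table 1 and computing Pearson's coefficient for every pair. This is a minimal sketch under that assumption; the helper `pearson` and the variable names are introduced here, and the text does not state enough detail to guarantee that the output matches the published correlation matrix digit for digit.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Index values in percent, in the order Ind, SepTPR, SepFPR, SufPPV, SufNPV, OAE (Table 1).
table1 = {
    "Caucasian": [33.10, 50.36, 22.01, 59.48, 29.00, 67.19],
    "Hispanic": [27.70, 41.80, 19.38, 56.03, 29.89, 66.21],
    "Other": [20.41, 33.87, 12.79, 60.00, 30.04, 67.93],
    "Asian": [22.58, 62.50, 8.70, 71.43, 12.50, 83.87],
    "African-American": [57.61, 71.52, 42.34, 64.95, 35.14, 64.91],
    "Native American": [72.73, 100.0, 50.0, 62.50, 0.00, 72.73],
}

# Pairwise correlation between the index profiles of the races.
corr = {a: {b: pearson(table1[a], table1[b]) for b in table1} for a in table1}
```

Pairs whose coefficient reaches the chosen threshold are then candidates for the same treatment group.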

Table 2
Table of Pearson Correlation Coefficients

                                           Race
Race               African-A.   Native A.   Caucasian   Hispanic   Other   Asian
African-American   1            0.901       0.801       0.680      0.562   0.848
Native American    0.901        1           0.515       0.364      0.210   0.596
Caucasian          0.801        0.515       1           0.983      0.941   0.991
Hispanic           0.680        0.364       0.983       1          0.984   0.956
Other              0.562        0.210       0.941       0.984      1       0.900
Asian              0.848        0.596       0.991       0.956      0.900   1
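One plausible reading of the grouping step is the following: for each element, collect all elements whose correlation with it reaches the threshold, then keep only the candidate groups that are not contained in a larger one (the inclusion reduction described in Section 3.1). The sketch below applies that reading to the values of Table 2; the function `treatment_groups` and the exact handling of the threshold (at or above 0.9) are assumptions, not the authors' stated procedure.

```python
# Row and column order follows the column headers of Table 2.
labels = ["African-American", "Native American", "Caucasian", "Hispanic", "Other", "Asian"]
corr = [
    [1.000, 0.901, 0.801, 0.680, 0.562, 0.848],
    [0.901, 1.000, 0.515, 0.364, 0.210, 0.596],
    [0.801, 0.515, 1.000, 0.983, 0.941, 0.991],
    [0.680, 0.364, 0.983, 1.000, 0.984, 0.956],
    [0.562, 0.210, 0.941, 0.984, 1.000, 0.900],
    [0.848, 0.596, 0.991, 0.956, 0.900, 1.000],
]


def treatment_groups(corr, labels, threshold=0.9):
    """Candidate groups by thresholding, reduced to the maximal ones under inclusion."""
    # Candidate group of each element: everything correlated with it at or above the threshold.
    candidates = {
        frozenset(labels[j] for j, c in enumerate(row) if c >= threshold)
        for row in corr
    }
    # Drop every candidate that is strictly contained in another candidate.
    maximal = [g for g in candidates if not any(g < other for other in candidates)]
    return sorted(maximal, key=lambda g: sorted(g))


groups = treatment_groups(corr, labels)
```

With this input the procedure yields two groups, one containing African-American and Native American and one containing the remaining four races, in agreement with the two treatment groups G0 and G1. Unlike a partition, this construction can produce overlapping groups on attributes with more categories.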

Table 3
Groups Aggregation

                              Races
Groups   1                 2                  3       4
G0       Native American   African-American
G1       Caucasian         Hispanic           Other   Asian

3.2. Relationship between mean of fairness indexes and $C_{MAX}$

At this point, we studied whether there was a relationship between the composition of the groups obtained using Pearson's coefficient and the maximum completeness, as shown in the studies [29], [30]. For this purpose, we used a scatterplot in which each ethnicity is plotted according to a pair of values: the mean of the fairness indices, on the abscissa, and the maximum completeness, on the ordinate (Fig. 2). Considering the positioning of the different ethnic groups and a scale that reports the highest value as the limit of the diagram, we observe that they tend to cluster on average around the grand mean of the fairness attributes, most noticeably when we look at the privileged group. Items belonging to the same group tend to remain close relative to the fairness index. These considerations are less true for the maximum completeness index ($C_{MAX}$).

After extending the analysis to the different attributes of the datasets already present in [26] [31], $C_{MAX}$ seems to be a strongly characterizing parameter, more so than the other indices proposed in [32]. In fact, repeating the analysis on other protected attributes, such as V3_nacionalitat of the Juvenile dataset, the clustering of similarly treated elements within the scatterplot was found to be strongly related not only to the average of the fairness indices, but also to $C_{MAX}$, as shown in Fig. 3, considering the groups with intersections shown in Fig. 1.

3.3. Alternative synthetic indices

The presence of outliers in the values of fairness indices related to a protected attribute could impact the evaluation of these parameters. For this reason, in this paper we propose a different way of calculating fairness indices. In this research, we calculate independence, separation, sufficiency and OAE using the notions of entropy and mutual information. The idea is to find a new representation of the synthetic index that would allow more confident identification of whether a given protected attribute could lead to possible discrimination. Considering the condition of Independence between two groups $A = a_i$ and $A = a_j$:
Figure 3: Scatterplot of nationalities in Juvenile Dataset, mean of fairness metrics vs. maximum completeness

$$|P(R = 1 \mid A = a_i) - P(R = 1 \mid A \neq a_i)| < \varepsilon \tag{2}$$

This condition can be extended to all categories of the protected attribute and also expressed as orthogonality between the predicted value R and the group A through mutual information. Two variables are independent if their mutual information is zero:

$$I(R, A) = 0 \tag{3}$$

Recall that mutual information is calculated by the equation:

$$I(R, A) = H(R) + H(A) - H(A, R) \tag{4}$$

where H(R) is the entropy associated with R. The individual terms for calculating independence are:

$$H(R) = -\sum_{i=1}^{n} P(r_i) \log P(r_i) \tag{5}$$

H(A) is the entropy associated with A, calculated as:

$$H(A) = -\sum_{j=1}^{m} P(a_j) \log P(a_j) \tag{6}$$

Finally, the third term H(A, R) is computed by:

$$H(R, A) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(r_i \cap a_j) \log P(r_i \cap a_j) \tag{7}$$

The other indices can also be expressed through mutual information. In particular, referring to [33] and [26], Separation is calculated by:

$$I(R, A \mid Y) = H(R, Y) + H(A, Y) - H(R, Y, A) - H(Y) \tag{8}$$

Sufficiency is expressed by the following equation:

$$I(Y, A \mid R) = H(Y, R) + H(A, R) - H(Y, R, A) - H(R) \tag{9}$$

Finally, the OAE is computed by:

$$H(R, A \mid Y = R) = H(R, Y = R) + H(A, Y = R) - H(R = Y, A \mid R = Y) \tag{10}$$

Once the mutual information is calculated for the fairness metrics considered, we compared these values with those obtained by applying the methods presented in [26], namely MaxMin and clustering with DBSCAN. To achieve this, a normalization was necessary, obtained by dividing the different quantities by the maximum achievable value.

Figure 4: Comparison of Mutual Information, DBSCAN and MaxMin methods for Independence

In the present study, we performed the comparison of the three methodologies (MaxMin, DBSCAN and mutual information) applied on the Compas dataset with respect to fairness indices. Since the results show similar trends related to the fairness measures, without loss of generality, we
have reported only the relationships between the Independence measure and maximum completeness. In Fig. 4, the dependence curve for the MaxMin methodology is shown in red, that for DBSCAN in blue and that for mutual information in black. The graph in Fig. 4 shows the trend of independence as maximum completeness varies. The dataset construction process initially selects a few tuples of the original dataset ($C_{MAX} = 0.324$) and then inserts new tuples until the dataset reaches overall completeness ($C_{MAX} = 1$), which corresponds to maximum independence.

The curve related to the MaxMin method initially takes greater values than the other two methods, while the phenomenon decreases as the number of records entered increases. Thus, we can conclude from the present research that the independence measure is more sensitive to varying maximum completeness if the MaxMin method is used.

3.4. Limits and Future Work

This work identified homologous treatment groups using Pearson's coefficient, which detects the correlation between fairness characteristics associated with different groups.

In the future, further research should be done to investigate new similarity mechanisms based on ML and Deep Learning algorithms, considering other clustering methodologies that can avoid overlapping between groups.

A second line of research will aim to identify discrimination caused by belonging to more than one protected attribute, such as gender and race simultaneously.

Since we did not consider explainable AI algorithms, future work could be extended by considering frameworks that analyze how AI models make decisions (e.g. Watson OpenScale [34]).

4. Conclusion

The use of AI and ML in the decision-making process of many recommendation systems makes it possible to mitigate the risk of subjective classifications.

While these systems are reliable forecasting tools, they do not always allow for an explanation of why such conclusions were reached. Thus, the presence of incomplete or unbalanced data, which can be measured through the SQuaRE series (completeness measures), can lead to biased results.

This work made it possible to identify similar groups in terms of equivalence of treatment through the application of Pearson's coefficient to synthetic indices related to protected attributes. In this way, we identified similarities that previously remained hidden in the search for possible discrimination.

The other achievement was that we were able to associate fairness measures with protected attributes, independently of those of individual values, using the concepts of mutual information and entropy. This approach laid the foundation for new experimentation to relate the response of these measures to changes in maximum completeness.

Finally, we compared the classical approaches [31] with the method using mutual information and entropy. In this way, we tested the response of fairness measures against maximum completeness and found confirmation of the premises of the work, namely, that poor quality in the data leads to unfair treatment if AI and ML are used in the decision-making process of recommender systems.

References

[1] The Economist, The world's most valuable resource is no longer oil, but data, The Economist, USA (6th May 2017).
[2] B. Marr, The 5 biggest data science trends in 2022, Oct 2021. URL: https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3.
[3] R. Giuliano, The next generation network in 2030: Applications, services, and enabling technologies, in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021, pp. 294–298. doi:10.23919/EECSI53397.2021.9624241.
[4] J. Orlowski, The social dilemma, Sep. 2020. URL: https://www.netflix.com/it/title/81254224.
[5] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, A. Ricci, S. Spanò, An fpga-based multi-agent reinforcement learning timing synchronizer, Computers and Electrical Engineering 99 (2022) 107749. doi:https://doi.org/10.1016/j.compeleceng.2022.107749.
[6] G. C. Cardarilli, M. Re, L. Di Nunzio, A pseudo-softmax function for hardware-based high speed image classification, Scientific Reports (2021). doi:10.1038/s41598-021-94691-7.
[7] J. Larson, S. Mattu, L. Kirchner, J. Angwin, Compas recidivism dataset, 2016. URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.
[8] Council of Europe, Recommendation CM/Rec(2020)1 of the Committee of Ministers to member States on the human rights impacts of algorithmic systems (2020).
[9] European Parliament and Council of Europe, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (2016).
[10] Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of Insurance and Reinsurance (Solvency II), 2009.
[11] A. Simonetta, M. C. Paoletti, A. Venticinque, Using the SQuaRE series as a guarantee for GDPR compliance, Ceur-WS 3115 (2021). URL: http://ceur-ws.org/Vol-3114/paper-05.pdf.
[12] International Organization for Standardization, ISO/IEC 27000:2018, Information technology, security techniques, information security management systems, overview and vocabulary, 2018. URL: https://www.iso.org/standard/73906.html (accessed Nov, 2021).
[13] International Organization for Standardization, ISO 31000:2018(en), Risk management - Guidelines, 2018. URL: https://www.iso.org/iso-31000-risk-management.html (accessed Nov, 2021).
[14] International Organization for Standardization, ISO/IEC 25000:2014, Systems and software engineering - Systems and software quality requirements and evaluation (SQuaRE) - Guide to SQuaRE, 2014.
[15] F. Fallucchi, M. Gerardi, M. Petito, E. De Luca, Blockchain framework in digital government for the certification of authenticity, timestamping and data property, 2021. doi:10.24251/HICSS.2021.282.
[16] URL: https://www.agid.gov.it/sites/default/files/repository_files/circolari/dt_cs_n.68_-_2013dig_-regole_tecniche_basi_dati_critiche_art_2bis_dl_179\-2012_sito.pdf.
[17] Three-year plan for information technology (piano triennale per l'informatica), 2022. URL: https://www.agid.gov.it/it/agenzia/piano-triennale (Access 06-22).
[18] F. Gualo, M. Rodriguez, J. Verdugo, I. Caballero, M. Piattini, Data quality certification using iso/iec 25012: Industrial experiences, Journal of Systems and Software 176 (2021) 110938. doi:https://doi.org/10.1016/j.jss.2021.110938.
[19] A. A. Jaber, R. Bicker, The optimum selection of wavelet transform parameters for the purpose of fault detection in an industrial robot, in: 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014), 2014, pp. 304–309. doi:10.1109/ICCSCE.2014.7072735.
[20] M. Sousa, D. Gonçalves-Ferreira, C. Pereira, G. Bacelar, S. Frade, O. Pestana, R. Cruz-Correia, openehr based systems and the general data protection regulation (gdpr), Studies in health technology and informatics 247 (2018) 91–95.
[21] V. C. Pezoulas, K. D. Kourou, F. Kalatzis, T. P. Exarchos, A. Venetsanopoulou, E. Zampeli, S. Gandolfo, F. Skopouli, S. De Vita, A. G. Tzioufas, D. I. Fotiadis, Medical data quality assessment: On the development of an automated framework for medical data curation, Computers in Biology and Medicine 107 (2019) 270–283. URL: https://www.sciencedirect.com/science/article/pii/S0010482519300733. doi:https://doi.org/10.1016/j.compbiomed.2019.03.001.
[22] I. Caballero, M. Serrano, M. Piattini, A data quality in use model for big data, in: M. Indulska, S. Purao (Eds.), Advances in Conceptual Modeling, Springer International Publishing, Cham, 2014, pp. 65–74.
[23] N. M. Curran, Discrimination in the gig economy: the experiences of black online english teachers, Language and Education 0 (2021) 1–15. doi:10.1080/09500782.2021.1981928.
[24] E. Beretta, A. Vetro, B. Lepri, J. C. De Martin, Ethical and Socially-Aware Data Labels: 5th International Conference, SIMBig 2018, Lima, Peru, September 3–5, 2018, Proceedings, 2019, pp. 320–327. doi:10.1007/978-3-030-11680-4_30.
[25] M. D. Ekstrand, A. Das, R. Burke, F. Diaz, Fairness in information access systems, Foundations and Trends® in Information Retrieval 16 (2022) 1–177. doi:10.1561/1500000079.
[26] A. Simonetta, M. C. Paoletti, A. Venticinque, The use of maximum completeness to estimate bias in ai based recommendation systems, SYSTEM 2022 (In Press).
[27] S. Barocas, M. Hardt, A. Narayanan, Fairness and machine learning, 2020. URL: https://fairmlbook.org/, chapter: Classification.
[28] Department of Justice, Recidivism in juvenile justice, 2016. URL: https://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html.
[29] A. Simonetta, M. C. Paoletti, Designing digital circuits in multi-valued logic, International Journal on Advanced Science, Engineering and Information Technology 8 (2018) 1166–1172. URL: http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=5966. doi:10.18517/ijaseit.8.4.5966.
[30] A. Simonetta, A. Vetrò, M. C. Paoletti, M. Torchiano, Integrating square data quality model with iso 31000 risk management to measure and mitigate software bias, CEUR Workshop Proceedings (2021) pp. 17–22.
[31] A. Vetrò, M. Torchiano, M. Mecati, A data quality approach to the identification of discrimination risk in automated decision making systems, Government Information Quarterly 38 (2021) 101619. doi:https://doi.org/10.1016/j.giq.2021.101619.
[32] A. Simonetta, M. C. Paoletti, M. Muratore, A new approach for designing of computer architectures using multi-value logic, International Journal on Advanced Science, Engineering and Information Technology 11 (2021) 1440–1446. doi:10.18517/ijaseit.11.4.15778.
[33] D. Steinberg, A. Reid, S. O'Callaghan, F. Lattimore, L. McCalman, T. S. Caetano, Fast fair regression via efficient approximations of mutual information, CoRR abs/2002.06200 (2020). URL: https://arxiv.org/abs/2002.06200.
[34] IBM, Watson openscale, 2022. URL: https://www.ibm.com/it-it/cloud/watson-openscale/drift (Access 10-22).