=Paper=
{{Paper
|id=Vol-3356/paper4
|storemode=property
|title=Fairness Metrics and Maximum Completeness for the Prediction of Discrimination
|pdfUrl=https://ceur-ws.org/Vol-3356/paper-04.pdf
|volume=Vol-3356
|authors=Alessandro Simonetta,Tsuyoshi Nakajima,Maria Cristina Paoletti,Alessio Venticinque
|dblpUrl=https://dblp.org/rec/conf/apsec/SimonettaNPV22
}}
==Fairness Metrics and Maximum Completeness for the Prediction of Discrimination==
Alessandro Simonetta<sup>1</sup>, Tsuyoshi Nakajima<sup>2</sup>, Maria Cristina Paoletti<sup>3</sup> and Alessio Venticinque<sup>4</sup>

<sup>1</sup> Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy

<sup>2</sup> Department of Computer Science and Engineering, Shibaura Institute of Technology, Tokyo, Japan

<sup>3</sup> Rome, Italy

<sup>4</sup> Naples, Italy

Woodstock'21: Symposium on the irreproducible science, June 07–11, 2021, Woodstock, NY

Email: alessandro.simonetta@gmail.com (A. Simonetta); tsnaka@shibaura-it.ac.jp (T. Nakajima); mariacristina.paoletti@gmail.com (M. C. Paoletti)

ORCID: 0000-0003-2002-9815 (A. Simonetta); 0000-0002-9721-4763 (T. Nakajima); 0000-0001-6850-1184 (M. C. Paoletti); 0000-0003-3286-3137 (A. Venticinque)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

===Abstract===
Data has assumed increasing importance within the global economy, and its use is becoming more pervasive in multiple contexts. However, learning systems are exposed to various critical issues that can be addressed through ISO standards. Indeed, machine learning (ML) models may be exposed to the risk of perpetuating societal prejudice simply because the same bias exists in the data. Based on these notions, we have built a model to identify similar treatment groups based on the type of classification errors made by ML algorithms. A way to calculate fairness indices on the protected attributes of the dataset is illustrated in the article. Finally, we consider the degree of relationship existing between maximum completeness and the fairness of forecasting algorithms through an inverse procedure of constructing a complete dataset. The use of mutual information provided an alternative method for calculating synthetic fairness indices and a useful basis for future research.

===Keywords===
fairness, machine learning, maximum completeness, treatment similarity, mutual information, entropy

===1. Introduction===
Data has become increasingly important within the global economy, and its use, which often occurs through sophisticated learning systems, is becoming more pervasive in many areas.

The Economist [1] was one of the first to define data as the oil of the modern age. With the rise of Artificial Intelligence (AI) algorithms in decision support, data quality has become increasingly important, and Forbes [2] points to data as the fuel of ML algorithms. Consequently, a new business has emerged based on the collection and sale of data. Web giants such as Google offer free services and products with the aim of collecting information, often for advertising purposes. Many social platforms are free, and companies earn considerable sums from selling the information rather than from paid services. This has pushed these companies to use increasingly sophisticated technologies [3] and algorithms to collect information and integrate it with information from other data sources to maximize their insights. In addition, as presented in the documentary "The Social Dilemma" [4], information about users, including their contacts and interactions on platforms, is being used to influence their behavior without them realizing it, by providing the right ad hoc inputs.

While the availability of large amounts of data makes it possible to analyze phenomena, we must consider that ML algorithms, as in [5] and [6], are affected by the completeness and redundancy of the information used to train them. The presence of bias within the data can cause discrimination against ethnic, gender, religious, racial and cultural minorities, among others. An emblematic example is the one related to the Compas dataset [7], where the algorithm used to predict inmates' recidivism unfairly disadvantaged African-American people.

In the last few years, attention to data quality and its use has increased and, especially in Europe, legislators have become aware of the existence of the problem [8]. It is worth mentioning that the General Data Protection Regulation (GDPR) 2016/679 [9], defined to harmonize the data privacy laws among the European countries, also contains data quality notions such as accuracy, timeliness and security. The same can be found in the European regulation Solvency II [10], which states the need for insurance companies to have internal procedures and processes in place to ensure the appropriateness of data.

As we mentioned in [11], we believe that a good solution to ensure the correct use of data and their quality, according to regulation and ethical values, is compliance with ISO standards: ISO/IEC 27000 [12], ISO 31000 [13] and ISO/IEC 25000 [14]. The introduction of maximum completeness as a dataset balance index, and its relation to fairness metrics, derive from the SQuaRE approach to measuring data quality and assessing its implications.

===2. The Present Situation===
Although it is difficult to estimate the cost of the absence of quality in data, a primary goal for organizations (public and private) that base their business on the digitization of processes and on the operation of the organization itself is to have trusted data [15]. Some experiences show how the application of the SQuaRE series is a solution for measuring and monitoring data quality over time. In Italy, the first indication for public administrations managing databases of national interest came in 2013, when the Agency for Digital Italy (AgID) identified the ISO/IEC 25012 standard as the data quality model to be adopted [16]. On that occasion, AgID also identified, among the 15 quality characteristics, those that must be used for databases of national interest: accuracy, consistency, completeness and currentness. In the three-year plan for public administration information technology 2021-2023 [17], AgID confirms increasing data and metadata quality as a strategic goal (OB2.2).

Three case studies of data quality evaluation and certification of repositories are reported in [18]. The different visions are analyzed to evaluate the impact of adopting ISO/IEC 25012, ISO/IEC 25024 and ISO/IEC 25040, and the benefit recognized in the three organizations before and after the process. The results show that applying the methodology helps an organization to achieve better long-term sustainability, improves knowledge of the business and drives the organization toward better data quality initiatives in the future.

Among the environments in which the above ISO standards can be most useful are undoubtedly those where the information contains sensitive or safety data [19], such as the healthcare and legal domains. An example is the OpenEHR standard proposed in [20]. The issues that affect clinical records from the perspective of data quality are presented in [21]. In [22] the authors propose a generalized model for big data: a solution based on the application of ISO/IEC 25012 and ISO/IEC 25024. The study introduces three data quality dimensions: Contextual Consistency, Operational Consistency and Temporal Consistency. In [11] the authors show how using the SQuaRE series can ensure GDPR compliance. The study in [23] examines discrimination against nonwhite teachers on online English language teaching platforms.

One possible solution to the problem of bias in the data propagating into the inferences of ML algorithms is the dataset labeling mechanism presented in [24]. In [25] the authors present a range of fair access and information presentation issues (quantity of data presented to the user and order of priority) that may affect the fairness of computing systems. Although these issues are related to the biases within the data, characteristics of recommender systems can introduce a greater degree of uncertainty. These are related to the permissions of the users who use them to access the information, or to the size of the data that can be processed by the algorithms. This makes it even more difficult to find countermeasures to avoid discrimination.

Finally, in [26] the authors show a methodology for identifying critical attributes that can lead to discrimination by classification-based learning systems.

===3. Solution Proposed===
When an ML-based recommendation system is used on a dataset where bias is present, the bias propagates within the model itself, replicating the guesswork and prejudices in the data. We therefore run the risk of believing that we have applied an objective and neutral evaluation system, while we are in fact using a biased system embedded in an AI algorithm.

One of the purposes of this research is to verify that the system behaves in a non-discriminatory way toward certain groups. By considering the different fairness measures in [27], it is possible to calculate their value with respect to two groups, identified by a protected attribute, to see if there are any disparities in treatment. For example, the formal criterion of Independence requires the sensitive attribute A to be statistically independent of the predicted value R, which can be expressed as:

<math>P(R = 1 \mid A = a_i) = P(R = 1 \mid A = a_j) \quad \forall i, j : i \neq j</math> (1)

To understand whether an attribute is a cause of discrimination in prediction outcomes, that is, whether there are homologous treatment groups, it is necessary to know the attribute's level of fairness. Ideally, therefore, it would be better to have a single measure that gives an idea of how likely that attribute is to lead to discrimination.

In [26], the authors propose a method to compute several synthetic indices related to the fairness of the classification system. Two different methods are described in the article: the first performs clustering with the DBSCAN and K-means methods, while the second, MaxMin, searches for the worst case by dividing the protected attribute instances into privileged and unprivileged. Both methods allow grouping the elements of a protected attribute according to the type of treatment. In this way, the calculation of the synthetic index is based on a few influence classes, returning to the definition from which we started [27]. These two approaches were used to test for a link between the notion of maximum completeness and fairness indices. This would allow a priori identification of whether learning on a given dataset can lead to discrimination against minorities. At this point, alternative methods are proposed to identify homogeneous treatment groups with respect to the result obtained from a classification system. Algorithms may err toward some groups equally; for example, for African-Americans and Native Americans they may predict a degree of recidivism in excess of what happens in reality.

====3.1. Identification of homogeneous treatment groups====
To start, we need to calculate the fairness indices reported in [27] for the protected attributes of the dataset, considering the predictions of the classification algorithm and the actual (correct) outcome.

We referred to a classic case study for this kind of problem: the Compas dataset [7], where we observed a similar trend between groups. Table 1 shows the values of the six fairness indices for the protected attribute Race: Independence (Ind), Separation True Positive Rate (SepTPR), Separation False Positive Rate (SepFPR), Sufficiency Positive Predictive Value (SufPPV), Sufficiency Negative Predictive Value (SufNPV) and Overall Accuracy Equality (OAE).

{| class="wikitable"
|+ Table 1: Fairness measures for the protected attribute Race (Compas dataset)
! Race !! Ind !! SepTPR !! SepFPR !! SufPPV !! SufNPV !! OAE
|-
| Caucasian || 33.10% || 50.36% || 22.01% || 59.48% || 29.00% || 67.19%
|-
| Hispanic || 27.70% || 41.80% || 19.38% || 56.03% || 29.89% || 66.21%
|-
| Other || 20.41% || 33.87% || 12.79% || 60.00% || 30.04% || 67.93%
|-
| Asian || 22.58% || 62.50% || 8.70% || 71.43% || 12.50% || 83.87%
|-
| African-American || 57.61% || 71.52% || 42.34% || 64.95% || 35.14% || 64.91%
|-
| Native American || 72.73% || 100% || 50% || 62.50% || 0.00% || 72.73%
|}

Table 2 shows the correlation matrix according to Pearson's coefficient, that is, the correlation between the indices measured for different ethnicities. Using a correlation threshold of 0.9, it is easy to detect the existence of two treatment groups (Table 3): G0 and G1.

{| class="wikitable"
|+ Table 2: Pearson correlation coefficients between races
! Race !! African-American !! Native American !! Caucasian !! Hispanic !! Other !! Asian
|-
! African-American
| 1 || 0.901 || 0.801 || 0.680 || 0.562 || 0.848
|-
! Native American
| 0.901 || 1 || 0.515 || 0.364 || 0.210 || 0.596
|-
! Caucasian
| 0.801 || 0.515 || 1 || 0.983 || 0.941 || 0.991
|-
! Hispanic
| 0.680 || 0.364 || 0.983 || 1 || 0.984 || 0.956
|-
! Other
| 0.562 || 0.210 || 0.941 || 0.984 || 1 || 0.900
|-
! Asian
| 0.848 || 0.596 || 0.991 || 0.956 || 0.900 || 1
|}

{| class="wikitable"
|+ Table 3: Groups aggregation
! Group !! Races
|-
| G0 || Native American, African-American
|-
| G1 || Caucasian, Hispanic, Other, Asian
|}

Although this method works well for the case study, the issue becomes more complicated for categorical attributes with higher cardinality, as the number of relations increases. With reference to the Juvenile dataset [28], considering the V3_nacionalitat attribute representing the nationality of the students, it is possible to draw the phenomenon through a subway diagram (Fig. 1). In this graph, it is easier to check intersections between sets. For example, Group 0 and Group 1 have the element Colombia in common. The top histogram shows the number of elements participating in the intersection, while the left histogram shows the number of elements in the group.

Figure 1: Metro (subway) diagram of the nationalities in the Juvenile dataset, with group intersections

The result obtained with a Pearson coefficient threshold of 0.9 identifies twelve homogeneous treatment groups. In order to reduce their number, we kept, as the representative of a set of groups, the one that contained them in the inclusion relation. This reduced the twelve groups to four completely disjoint groups.

====3.2. Relationship between mean of fairness indexes and <math>C_{MAX}</math>====
At this point, we studied whether there was a relationship between the composition of the groups obtained using Pearson's coefficient and the maximum completeness, as shown in the studies [29], [30]. For this purpose, we used a scatterplot in which each ethnicity was drawn in relation to a pair of values: the mean of the fairness indices on the abscissa and the maximum completeness on the ordinate (Fig. 2). Considering the positioning of the different ethnic groups, and a scale that reports the highest value as the limit of the diagram, we observe that they tend to cluster on average around the grand mean of the fairness attributes, most noticeably when we look at the privileged group. Items belonging to the same group tend to remain close with respect to the fairness index. These considerations are less true for the maximum completeness index (<math>C_{MAX}</math>).

Figure 2: Scatterplot of races in the Compas dataset, mean of fairness metrics vs. maximum completeness

After extending the analysis to the different attributes of the datasets already present in [26] and [31], <math>C_{MAX}</math> seems to be a strongly characterizing parameter, more so than the other indices proposed in [32]. In fact, repeating the analysis on other protected attributes, such as V3_nacionalitat of the Juvenile dataset, the clustering of similarly treated elements within the scatterplot was found to be strongly related not only to the average of the fairness indices, but also to <math>C_{MAX}</math>, as shown in Fig. 3 for the groups with intersections shown in Fig. 1.

Figure 3: Scatterplot of nationalities in the Juvenile dataset, mean of fairness metrics vs. maximum completeness

====3.3. Alternative synthetic indices====
The presence of outliers in the values of the fairness indices related to a protected attribute could affect the evaluation of these parameters. For this reason, in this paper we propose a different way of calculating fairness indices. In this research, we calculate independence, separation, sufficiency and OAE using the notions of entropy and mutual information. The idea is to find a new representation of the synthetic index that would allow more confident identification of whether a given protected attribute could lead to possible discrimination. Consider the condition of Independence between two groups <math>A = a_i</math> and <math>A = a_j</math>:

<math>|P(R = 1 \mid A = a_i) - P(R = 1 \mid A \neq a_i)| < \varepsilon</math> (2)

This condition can be extended to all categories of the protected attribute and can also be expressed as orthogonality between the predicted value R and the group A through mutual information. Two variables are independent if their mutual information is zero:

<math>I(R, A) = 0</math> (3)

Recall that mutual information is calculated by the equation:

<math>I(R, A) = H(R) + H(A) - H(A, R)</math> (4)

where H(R) is the entropy associated with R. Thus, the individual terms for calculating independence are:

<math>H(R) = -\sum_{i=1}^{n} P(r_i) \log P(r_i)</math> (5)

H(A) is the entropy associated with A, and it is calculated as:

<math>H(A) = -\sum_{j=1}^{m} P(a_j) \log P(a_j)</math> (6)

Finally, the third term H(R, A) is computed by:

<math>H(R, A) = -\sum_{i=1}^{n} \sum_{j=1}^{m} P(r_i \cap a_j) \log P(r_i \cap a_j)</math> (7)

The other indices can also be expressed through mutual information. In particular, referring to [33] and [26], Separation is calculated by:

<math>I(R, A \mid Y) = H(R, Y) + H(A, Y) - H(R, Y, A) - H(Y)</math> (8)

Sufficiency is expressed by the following equation:

<math>I(Y, A \mid R) = H(Y, R) + H(A, R) - H(Y, R, A) - H(R)</math> (9)

Finally, the OAE is computed by:

<math>H(R, A \mid Y = R) = H(R, Y = R) + H(A, Y = R) - H(R = Y, A \mid R = Y)</math> (10)

Once the mutual information had been calculated for the fairness metrics considered, we compared these values with those obtained by applying the methods presented in [26], namely MaxMin and clustering with DBSCAN. To achieve this, a normalization step was necessary, obtained by dividing the different quantities by the maximum achievable value. In the present study, we compared the three methodologies (MaxMin, DBSCAN and mutual information) applied to the Compas dataset with respect to the fairness indices. Since the results show similar trends for all the fairness measures, without loss of generality we report only the relationship between the Independence measure and maximum completeness. In Fig. 4, the dependence curve for the MaxMin methodology is shown in red, the one for DBSCAN in blue and the one for mutual information in black. The graph shows the trend of independence as maximum completeness varies. The dataset construction process initially selects a few tuples of the original dataset (<math>C_{MAX} = 0.324</math>) and then inserts new tuples until the dataset reaches overall completeness (<math>C_{MAX} = 1</math>), which corresponds to maximum independence.

Figure 4: Comparison of the mutual information, DBSCAN and MaxMin methods for Independence

The curve for the MaxMin method initially takes greater values than the other two methods, while the difference decreases as the number of records entered increases. Thus, we can conclude from the present research that the independence measure is more sensitive to varying maximum completeness if the MaxMin method is used.

====3.4. Limit and Future Works====
This work identified homologous treatment groups using Pearson's coefficient, which detects the correlation between fairness characteristics associated with different groups.

In the future, further research should investigate new similarity mechanisms based on ML and Deep Learning algorithms, considering other clustering methodologies that can avoid overlapping between groups.

A second line of research will aim to identify discrimination caused by belonging to more than one protected attribute, such as gender and race simultaneously.

Since we did not consider explainable AI algorithms, future work could be extended by considering frameworks that analyze how AI models make decisions (e.g. Watson OpenScale [34]).

===4. Conclusion===
The use of AI and ML in the decision-making process of many recommendation systems makes it possible to mitigate the risk of subjective classifications.

While these systems are reliable forecasting tools, they do not always allow for an explanation of why such conclusions were reached. Thus, the presence of incomplete or unbalanced data, which can be measured through the SQuaRE series (completeness measures), can lead to biased results.

This work made it possible to identify groups that are similar in terms of equivalence of treatment by applying Pearson's coefficient to synthetic indices related to protected attributes. In this way, we identified similarities that had previously remained hidden when searching for possible discrimination.

The other achievement was that we were able to associate fairness measures with protected attributes, independently of those of individual values, using the concepts of mutual information and entropy. This approach laid the foundation for new experimentation to relate the response of these measures to changes in maximum completeness.

Finally, we compared the classical approaches [31] with the method using mutual information and entropy. In this way, we tested the response of fairness measures against maximum completeness and found confirmation of the premise of the work, namely, that poor data quality leads to unfair treatment if AI and ML are used in the decision-making process of recommender systems.

===References===
[1] The Economist, The world's most valuable resource is no longer oil, but data, The Economist, USA (6th May 2017).

[2] B. Marr, The 5 biggest data science trends in 2022, Oct 2021. URL: https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3.

[3] R. Giuliano, The next generation network in 2030: Applications, services, and enabling technologies, in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021, pp. 294–298. doi:10.23919/EECSI53397.2021.9624241.

[4] J. Orlowski, The social dilemma, Sep. 2020. URL: https://www.netflix.com/it/title/81254224.

[5] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, A. Ricci, S. Spanò, An FPGA-based multi-agent reinforcement learning timing synchronizer, Computers and Electrical Engineering 99 (2022) 107749. doi:10.1016/j.compeleceng.2022.107749.

[6] G. C. Cardarilli, M. Re, L. Di Nunzio, A pseudo-softmax function for hardware-based high speed image classification, Scientific Reports (2021). doi:10.1038/s41598-021-94691-7.

[7] J. Larson, S. Mattu, L. Kirchner, J. Angwin, Compas recidivism dataset, 2016. URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.

[8] Council of Europe, Recommendation CM/Rec(2020)1 of the Committee of Ministers to member States on the human rights impacts of algorithmic systems (2020).

[9] European Parliament and Council of Europe, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (2016).

[10] Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of Insurance and Reinsurance (Solvency II), 2009.

[11] A. Simonetta, M. C. Paoletti, A. Venticinque, Using the SQuaRE series as a guarantee for GDPR compliance, Ceur-WS 3115 (2021). URL: http://ceur-ws.org/Vol-3114/paper-05.pdf.

[12] International Organization for Standardization, ISO/IEC 27000:2018, Information technology, security techniques, information security management systems, overview and vocabulary, 2018. URL: https://www.iso.org/standard/73906.html (accessed Nov. 2021).

[13] International Organization for Standardization, ISO 31000:2018(en), Risk management - Guidelines, 2018. URL: https://www.iso.org/iso-31000-risk-management.html (accessed Nov. 2021).

[14] International Organization for Standardization, ISO/IEC 25000:2014, Systems and software engineering - Systems and software quality requirements and evaluation (SQuaRE) - Guide to SQuaRE, 2014.

[15] F. Fallucchi, M. Gerardi, M. Petito, E. De Luca, Blockchain framework in digital government for the certification of authenticity, timestamping and data property, 2021. doi:10.24251/HICSS.2021.282.

[16] URL: https://www.agid.gov.it/sites/default/files/repository_files/circolari/dt_cs_n.68_-_2013dig_-regole_tecniche_basi_dati_critiche_art_2bis_dl_179\-2012_sito.pdf.

[17] Three-year plan for information technology (piano triennale per l'informatica), 2022. URL: https://www.agid.gov.it/it/agenzia/piano-triennale (accessed 06-22).

[18] F. Gualo, M. Rodriguez, J. Verdugo, I. Caballero, M. Piattini, Data quality certification using ISO/IEC 25012: Industrial experiences, Journal of Systems and Software 176 (2021) 110938. doi:10.1016/j.jss.2021.110938.

[19] A. A. Jaber, R. Bicker, The optimum selection of wavelet transform parameters for the purpose of fault detection in an industrial robot, in: 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014), 2014, pp. 304–309. doi:10.1109/ICCSCE.2014.7072735.

[20] M. Sousa, D. Gonçalves-Ferreira, C. Pereira, G. Bacelar, S. Frade, O. Pestana, R. Cruz-Correia, openEHR based systems and the General Data Protection Regulation (GDPR), Studies in Health Technology and Informatics 247 (2018) 91–95.

[21] V. C. Pezoulas, K. D. Kourou, F. Kalatzis, T. P. Exarchos, A. Venetsanopoulou, E. Zampeli, S. Gandolfo, F. Skopouli, S. De Vita, A. G. Tzioufas, D. I. Fotiadis, Medical data quality assessment: On the development of an automated framework for medical data curation, Computers in Biology and Medicine 107 (2019) 270–283. URL: https://www.sciencedirect.com/science/article/pii/S0010482519300733. doi:10.1016/j.compbiomed.2019.03.001.

[22] I. Caballero, M. Serrano, M. Piattini, A data quality in use model for big data, in: M. Indulska, S. Purao (Eds.), Advances in Conceptual Modeling, Springer International Publishing, Cham, 2014, pp. 65–74.

[23] N. M. Curran, Discrimination in the gig economy: the experiences of black online English teachers, Language and Education 0 (2021) 1–15. doi:10.1080/09500782.2021.1981928.

[24] E. Beretta, A. Vetro, B. Lepri, J. C. De Martin, Ethical and Socially-Aware Data Labels: 5th International Conference, SIMBig 2018, Lima, Peru, September 3–5, 2018, Proceedings, 2019, pp. 320–327. doi:10.1007/978-3-030-11680-4_30.

[25] M. D. Ekstrand, A. Das, R. Burke, F. Diaz, Fairness in information access systems, Foundations and Trends® in Information Retrieval 16 (2022) 1–177. doi:10.1561/1500000079.

[26] A. Simonetta, M. C. Paoletti, A. Venticinque, The use of maximum completeness to estimate bias in AI based recommendation systems, SYSTEM 2022 (In Press).

[27] S. Barocas, M. Hardt, A. Narayanan, Fairness and machine learning, 2020. URL: https://fairmlbook.org/, chapter: Classification.

[28] Department of Justice, Recidivism in juvenile justice, 2016. URL: https://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html.

[29] A. Simonetta, M. C. Paoletti, Designing digital circuits in multi-valued logic, International Journal on Advanced Science, Engineering and Information Technology 8 (2018) 1166–1172. URL: http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=5966. doi:10.18517/ijaseit.8.4.5966.

[30] A. Simonetta, A. Vetrò, M. C. Paoletti, M. Torchiano, Integrating SQuaRE data quality model with ISO 31000 risk management to measure and mitigate software bias, CEUR Workshop Proceedings (2021) pp. 17–22.

[31] A. Vetrò, M. Torchiano, M. Mecati, A data quality approach to the identification of discrimination risk in automated decision making systems, Government Information Quarterly 38 (2021) 101619. doi:10.1016/j.giq.2021.101619.

[32] A. Simonetta, M. C. Paoletti, M. Muratore, A new approach for designing of computer architectures using multi-value logic, International Journal on Advanced Science, Engineering and Information Technology 11 (2021) 1440–1446. doi:10.18517/ijaseit.11.4.15778.

[33] D. Steinberg, A. Reid, S. O'Callaghan, F. Lattimore, L. McCalman, T. S. Caetano, Fast fair regression via efficient approximations of mutual information, CoRR abs/2002.06200 (2020). URL: https://arxiv.org/abs/2002.06200.

[34] IBM, Watson OpenScale, 2022. URL: https://www.ibm.com/it-it/cloud/watson-openscale/drift (accessed 10-22).
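
Illustrative sketch for Section 3.1 (fairness indices per group). The paper computes six indices per value of a protected attribute but does not list formulas for them. The Python sketch below assumes the usual confusion-matrix definitions (in particular, SufNPV is taken here as P(Y=0 | R=0), which may differ from the convention behind Table 1). The function name `fairness_indices` and the toy data are hypothetical and are not taken from the paper or from the Compas dataset.

```python
from collections import defaultdict


def fairness_indices(y_true, y_pred, groups):
    """Per-group fairness indices from binary labels Y and predictions R.

    Assumed definitions (not confirmed by the paper): standard
    confusion-matrix rates computed separately for each group value.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, r, a in zip(y_true, y_pred, groups):
        # "t"/"f": prediction correct or not; "p"/"n": predicted class
        key = ("t" if y == r else "f") + ("p" if r == 1 else "n")
        counts[a][key] += 1

    def ratio(num, den):
        # None marks an undefined rate (empty denominator)
        return num / den if den else None

    result = {}
    for a, c in counts.items():
        n = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        result[a] = {
            "Ind": ratio(c["tp"] + c["fp"], n),            # P(R=1 | A=a)
            "SepTPR": ratio(c["tp"], c["tp"] + c["fn"]),   # P(R=1 | Y=1, A=a)
            "SepFPR": ratio(c["fp"], c["fp"] + c["tn"]),   # P(R=1 | Y=0, A=a)
            "SufPPV": ratio(c["tp"], c["tp"] + c["fp"]),   # P(Y=1 | R=1, A=a)
            "SufNPV": ratio(c["tn"], c["tn"] + c["fn"]),   # P(Y=0 | R=0, A=a)
            "OAE": ratio(c["tp"] + c["tn"], n),            # P(R=Y | A=a)
        }
    return result


# Toy data (hypothetical): two groups of four records each
Y_TRUE = [1, 1, 0, 0, 1, 0, 0, 0]
Y_PRED = [1, 0, 1, 0, 1, 1, 1, 0]
GROUPS = ["a", "a", "a", "a", "b", "b", "b", "b"]
INDICES = fairness_indices(Y_TRUE, Y_PRED, GROUPS)
```

Comparing `INDICES["a"]["Ind"]` with `INDICES["b"]["Ind"]` is the check expressed by Eq. (1): the two values would be equal under Independence.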
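
Illustrative sketch for Section 3.1 (grouping by Pearson threshold). The paper states that a correlation threshold of 0.9 on Table 2 yields the two groups of Table 3, without spelling out the grouping procedure. The sketch below assumes that groups are the connected components of the graph linking two races whenever their coefficient is at least the threshold. The function `treatment_groups` is a hypothetical name. The matrix is copied from Table 2, reading its last row as Asian so that rows follow the column order.

```python
RACES = ["African-American", "Native American", "Caucasian",
         "Hispanic", "Other", "Asian"]

# Pearson coefficients from Table 2 (rows and columns in RACES order)
CORR = [
    [1.000, 0.901, 0.801, 0.680, 0.562, 0.848],
    [0.901, 1.000, 0.515, 0.364, 0.210, 0.596],
    [0.801, 0.515, 1.000, 0.983, 0.941, 0.991],
    [0.680, 0.364, 0.983, 1.000, 0.984, 0.956],
    [0.562, 0.210, 0.941, 0.984, 1.000, 0.900],
    [0.848, 0.596, 0.991, 0.956, 0.900, 1.000],
]


def treatment_groups(names, corr, threshold=0.9):
    """Connected components of the graph with an edge where corr >= threshold.

    Assumption: this grouping rule is one plausible reading of the paper,
    not a procedure the paper states explicitly.
    """
    unvisited = set(range(len(names)))
    groups = []
    for start in range(len(names)):
        if start not in unvisited:
            continue
        unvisited.discard(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            # sorted() copies the set, so discarding inside the loop is safe
            for j in sorted(unvisited):
                if corr[i][j] >= threshold:
                    unvisited.discard(j)
                    stack.append(j)
        groups.append([names[i] for i in sorted(component)])
    return groups


GROUPS_09 = treatment_groups(RACES, CORR, threshold=0.9)
```

On the Table 2 values this yields one group with African-American and Native American and one with the remaining four races, in line with Table 3. For attributes with overlapping groups, such as V3_nacionalitat, connected components would merge overlapping sets, so the inclusion-based reduction described in the text would need a different rule.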
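
Illustrative sketch for Section 3.3 (mutual information indices). Equations (4), (8) and (9) express Independence, Separation and Sufficiency through entropies. The sketch below estimates those entropies from empirical frequencies, using the standard Shannon definition with a leading minus sign and base-2 logarithms (the base is an assumption, since the paper does not state one). The function names and toy sequences are hypothetical, and the normalization step mentioned in the text is not included.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more aligned sequences."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(r, a):
    # Eq. (4): I(R, A) = H(R) + H(A) - H(A, R)
    return entropy(r) + entropy(a) - entropy(a, r)


def conditional_mutual_information(x, a, z):
    # Eq. (8)/(9) pattern: I(X, A | Z) = H(X, Z) + H(A, Z) - H(X, Z, A) - H(Z)
    return entropy(x, z) + entropy(a, z) - entropy(x, z, a) - entropy(z)


# Toy data (hypothetical): predictions R and two candidate protected attributes
R = [0, 0, 1, 1]
A_INDEPENDENT = ["g0", "g1", "g0", "g1"]   # R is balanced within each group
A_DEPENDENT = ["g0", "g0", "g1", "g1"]     # R is fully determined by the group

MI_INDEPENDENT = mutual_information(R, A_INDEPENDENT)
MI_DEPENDENT = mutual_information(R, A_DEPENDENT)
```

With true outcomes `y`, Separation in Eq. (8) would correspond to `conditional_mutual_information(r, a, y)` and Sufficiency in Eq. (9) to `conditional_mutual_information(y, a, r)`; values close to zero indicate that the criterion is approximately met.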