The Use of Maximum Completeness to Estimate Bias in AI-based Recommendation Systems

Alessandro Simonetta (1,*), Maria Cristina Paoletti (1) and Alessio Venticinque (2)
(1) Department of Enterprise Engineering, University of Rome Tor Vergata, Rome, Italy
(2) Department of Electrical and Information Engineering, University of Naples Federico II, Napoli, Italy

Abstract
The use of AI-based recommendation systems, built on data analysis with Machine Learning algorithms, is taking away people's full control over decision making. The presence of unbalanced and incomplete data can cause discrimination against religious, ethnic, and political minorities without this phenomenon being easily detectable. In this context, it becomes critically important to understand the potential risks associated with learning from such a dataset and the consequences they may have on the outcome of decisions made with Machine Learning algorithms. In this paper, we try to identify how to measure the group fairness of the predictions of a classification algorithm, to identify the quality features of the dataset that influence the learning process, and finally, to evaluate the relationships between the quality features and the fairness measures.

Keywords
Fairness, Clustering, Machine Learning, Completeness, ISO/IEC 25012, Maximum Completeness, Metrics, Bias, Classification

1. Introduction

In 2019, the Economist [1] stated that data is an important resource comparable to oil. Moreover, Forbes [2] defines data as the fuel of the information age, and Machine Learning (ML) as the engine that uses it. These technologies, together with the evolution of networks [3, 4], are providing opportunities to develop new applications.

Many companies and organizations are investing, increasingly, in decision-making processes centered on AI-based recommendation systems to offer a variety of services ranging from marketing [5, 6] to fault diagnosis [7]. Tools that make use of these types of algorithms relieve people from making decisions that may be influenced by moods, biases and subjective thoughts, and they also ensure fairness and repeatability at different times. The reasons why an ML algorithm arrived at a certain result, however, may not be transparent or easily understood by users. For this reason, techniques have been introduced such as Explainable AI, which allows analysts to understand how a given choice was arrived at, or Reinforcement Learning, which allows the decision-making process to be distributed across different levels in a counterbalanced way [8, 9, 10, 11].

These decision systems are based on a data-driven approach, and their results are strongly influenced by the quality of the information available (balance, completeness, ...). The correctness and authenticity of the data alone [12, 13, 14, 15, 16, 17] are not enough to guarantee their quality. Thus, the presence of poor quality in the data, or a low level of representativeness, can lead to biased learning.

A very striking example of discrimination is the result of the algorithm used in Florida to predict the risk of re-offense, brought to light by the nonprofit organization ProPublica [18]. The algorithm, which assigned each person a score indicating the likelihood of reoffending, was trained on an unbalanced dataset and, as a result, black people showed greater recidivism than other ethnicities [19].

In this paper, we present a methodology that, starting from the training data, allows us to estimate the risk of unfair treatment in the prediction. To do this, it is necessary to identify how to measure the fairness of the predictions of a classification algorithm, to discover the quality characteristics of the dataset that influence the goodness of learning and, last but not least, to study the relationships that exist between the quality characteristics and the fairness measures of the ML algorithm.
The study of the fairness of ML algorithms is a widely debated topic in science. In [20], [21] many performance and fairness indices are studied, such as the False Positive rate and Equalized Odds. These metrics can be used to enhance different performance characteristics, which may be in contrast with each other, and each of them best fits a particular class of objectives [22], [23]. For example, there are indices that favor accuracy and others that favor precision [20]; the right trade-off must be found between the two, depending on the problem to be solved. Indeed, in cancer detection we prefer recall over precision: better to flag a healthy patient as probably ill than to miss one who actually is.

Other studies [22, 24, 25, 26] aim at identifying the relationship between the sensitive attributes and the target one. These show dependencies between the number of incorrect predictions (e.g., the ratio of predicted positives to real positives) and the features of the dataset. For example, if the algorithm errs advantageously with respect to a sensitive attribute, that is, by attributing more positive outcomes to its members, the individuals belonging to this set are considered a privileged group. Conversely, if the algorithm errs negatively, associating more unfavorable outcomes than would normally be warranted, that group is considered unprivileged.

Regarding data quality aspects, we identified the international standards ISO/IEC 25012 [27] and ISO/IEC 25024 [28] as the models from which to draw the notion of completeness. This choice was also supported by the presence of studies [29], [30] that use these standards for dataset construction and for maintaining its quality over time. In particular, we identified the notion of maximum completeness [31] as satisfying the goal we had set for ourselves.

This paper starts from the state of the art, Section 2, comparing the advantages and disadvantages of different approaches to fairness, and shows alternative synthesis solutions that are useful in identifying critical issues in the input data. In Section 3, we present our solution, developed from what has been proposed in the literature, and in particular we decline it into two different versions, defining their pros and cons. Section 4 discusses the experimental results. In Section 5, we point out the limitations of the present work and the possible future developments. Finally, in Section 6, we present concluding remarks.
2. State of Art

In general, a classification model is called fair if its mistakes are equally distributed across the different groups identified within the sensitive attribute. The input features X of the value space are mapped onto a target variable R according to a function f(X). Where the values of R are represented by classes or by a score, i.e., a range on a scale of N different values, they can be mapped back to a binary value by defining a threshold. In order to train these classifiers, example data are used in which the input features X are associated with a truth variable Y holding the real result. From the goodness of these examples derives the quality of the resulting classifier.

From now on, we will refer to A as the sensitive attribute, which can identify a minority. Although these variables are treated individually, an underprivileged group could also be identified through a combination of them.

One work including metrics for assessing fairness is [22], where Disparate Impact and Demographic Parity (Statistical Parity) are introduced. The first uses the ratio between P(R = 1 | A = a_i) and P(R = 1 | A = a_j), while the second is the difference between the two probabilities. Demographic Parity is also present in [32], where it is referred to as Independence, as it indicates the degree of independence of the target variable R from the sensitive attribute. Another measure of fairness reported in [22] is Equalized Odds, which is satisfied if the prediction is conditionally independent of the sensitive attribute, given the true value; it highlights the difference between the true positive rate and the false positive rate. In our work we call the latter index Separation. Equal Opportunity [22] requires the true positive rates to be similar across groups. Other metrics for estimating fairness are Sufficiency [32], similar to Equal Opportunity but focused on true values rather than predicted values, and Overall Accuracy Equality [21], which tests the average error between predictions across groups.
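As a concrete illustration of how the group-conditional quantities behind these definitions can be obtained in practice, the following minimal Python sketch computes, for every value of a sensitive attribute, the rates underlying Independence, Separation (TPR/FPR), Sufficiency (PPV) and Overall Accuracy Equality. The column names, the helper name and the data layout are assumptions made for this example, not part of the cited works.

```python
import numpy as np
import pandas as pd

def group_rates(df: pd.DataFrame, sensitive: str, y_true: str = "y", y_pred: str = "r") -> pd.DataFrame:
    """Per-group rates behind Independence, Separation and Sufficiency.

    df contains the ground truth (y_true), the binary prediction (y_pred)
    and the sensitive attribute; one row is returned per attribute value.
    """
    rows = []
    for value, g in df.groupby(sensitive):
        tp = int(((g[y_pred] == 1) & (g[y_true] == 1)).sum())
        fp = int(((g[y_pred] == 1) & (g[y_true] == 0)).sum())
        fn = int(((g[y_pred] == 0) & (g[y_true] == 1)).sum())
        tn = int(((g[y_pred] == 0) & (g[y_true] == 0)).sum())
        rows.append({
            sensitive: value,
            "p_pos": (tp + fp) / len(g),                      # Independence: P(R=1 | A=a_i)
            "tpr": tp / (tp + fn) if (tp + fn) else np.nan,   # Separation, true positive rate
            "fpr": fp / (fp + tn) if (fp + tn) else np.nan,   # Separation, false positive rate
            "ppv": tp / (tp + fp) if (tp + fp) else np.nan,   # Sufficiency, positive predictive value
            "acc": (tp + tn) / len(g),                        # Overall Accuracy Equality
        })
    return pd.DataFrame(rows)
```

Comparing any of these columns across groups, for instance as absolute differences, yields the corresponding group-fairness measure.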
Determining which fairness metrics are best for finding the right configuration of an algorithm to use in a decision support system depends on the purpose for which the system is built and on the discrimination risks it may be exposed to. Studies [23], [33] point out that it is not possible to maximize all metrics simultaneously, and therefore one must choose among the features that these measures tend to enhance, such as accuracy and recall. In [25] the authors present a framework for comparing indices and highlighting when the maximization of one conflicts with that of another. Their work goes beyond analyzing individual metrics and groups them according to their characteristics (fairness of treatment, fairness of opportunity, interest groups, sensitive attributes, etc.) and their usefulness with respect to the target that the system has (e.g., support for film discovery on a streaming platform). These are clustered using a hierarchical algorithm applied to the correlation between metrics in order to identify similar ones. The results are then diagrammed in a simplified manner through the use of Principal Component Analysis (PCA) to reduce the state space to two dimensions. In particular, the authors conclude that through the use of PCA it is possible to explain the relationships among the different metrics by reducing the state space to a range of one to three components.

The idea of using balancing indices to predict the risk of discrimination can be found in [34]. In this work, for the first time, a measure of fairness is applied to the sensitive attribute itself and not to the comparison among groups. In the next section we will start from this topic and then propose two different solutions for calculating a synthetic index related to the sensitive attribute.

3. Methodology

The first formal criterion introduced in [32] requires that the sensitive attribute A be statistically independent of the predicted value R. Assuming we use a dataset with a field A having cardinality m, A = {a_1, ..., a_m}, the random variable A is independent of R if and only if for each i, j in [1, m], with i != j, we have that:

    P(R = 1 \mid A = a_i) = P(R = 1 \mid A = a_j)    (1)

To understand how far the two predictions deviate from the ideal case (zero difference), we can calculate the distance between the two probabilities:

    U(a_i, a_j) = \lvert P(R = 1 \mid A = a_i) - P(R = 1 \mid A = a_j) \rvert    (2)

To get a synthesis value of the non-independence between A and R [34], the arithmetic mean of the distances can be considered:

    U(a_1, \ldots, a_m) = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} U(a_i, a_j)    (3)

Instead of using equation 2, some authors [22] apply a different notion of independence:

    \varepsilon_i = \lvert P(R = 1 \mid A = a_i) - P(R = 1 \mid A \neq a_i) \rvert    (4)
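The quantities of equations 2-4 translate directly into a few lines of code. The sketch below assumes a pandas DataFrame with a binary prediction column; the column names and helper names are placeholders of ours, not the paper's implementation.

```python
from itertools import combinations
import pandas as pd

def conditional_probabilities(df: pd.DataFrame, sensitive: str, pred: str = "r") -> pd.Series:
    """P(R = 1 | A = a_i) for every value a_i of the sensitive attribute."""
    return df.groupby(sensitive)[pred].mean()

def mean_pairwise_distance(probs: pd.Series) -> float:
    """Synthetic index of equation 3: mean of |p_i - p_j| over all unordered pairs."""
    pairs = list(combinations(probs.values, 2))
    return sum(abs(p_i - p_j) for p_i, p_j in pairs) / len(pairs)

def one_vs_rest_distances(df: pd.DataFrame, sensitive: str, pred: str = "r") -> pd.Series:
    """Epsilon_i of equation 4: |P(R=1 | A=a_i) - P(R=1 | A!=a_i)| for each group."""
    eps = {}
    for value in df[sensitive].unique():
        p_in = df.loc[df[sensitive] == value, pred].mean()
        p_out = df.loc[df[sensitive] != value, pred].mean()
        eps[value] = abs(p_in - p_out)
    return pd.Series(eps)
```

Note that averaging over the m(m-1)/2 unordered pairs is exactly the 2/(m(m-1)) factor of equation 3.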
What we have seen so far fails to explain whether there are groups, within the sensitive attribute, that undergo the same mode of treatment, nor the presence of discrimination among groups; the present work was born from this reflection. To explain our idea we prefer to use the example mentioned previously. The sensitive attribute A = Race of the Compas dataset contains six different ethnicities, shown in the first column of Table 1. The study [19] determined that African-Americans are the unprivileged group compared to the rest of the other ethnic groups (Native-American, Caucasian, Asian, Mexican, Other).

Table 1
Probability for the Sensitive Attribute Race

A = a_i            P(R = 1 | A = a_i)   Centroid
Caucasian          0.33                 0.26
Hispanic           0.28                 0.26
Other              0.20                 0.26
Asian              0.23                 0.26
African-American   0.58                 0.65
Native-American    0.73                 0.65

[Figure 1: Scatter plot of the probabilities P(R = 1 | A = a_i) for A = Race, with the K-Means centroids.]

[Figure 2: Probability U(a_i, a_j) for the pairs of values of A = Race, as per equation 2.]

The synthetic index described by equation 3 can show on average how much a sensitive attribute is at risk of discrimination, but it might underestimate the inequity. With reference to the Compas dataset, in Table 1 it can be observed that the values in the second column, P(R = 1 | A = a_i), cluster around 0.26 and 0.65. Figure 1 shows graphically the values of Table 1, with evidence of the centroids identified through the K-Means algorithm. The difference between the value calculated using equation 3 (P = 0.24) and the value obtained by considering the centroids (P = 0.39) turns out to be 0.15 points. This demonstrates what was stated earlier with respect to the use of a central tendency index. However, equation 3 can be used in the presence of more than two clusters.

In the following, we illustrate alternative methods for calculating different synthetic fairness indices that allow for greater sensitivity to discriminatory situations. Next, we try to identify the relationship that exists between the dataset completeness index and the identified fairness indices. This will allow us to anticipate the risks of bias arising from incomplete data.

3.1. Dataset

This section lists the datasets used for the experimentation:

- COMPAS Recidivism Dataset [19];
- Recidivism in juvenile justice [35];
- UCI Statlog German Credit [36];
- Default of credit card clients Data Set [37];
- Adult Data Set [38];
- Student Performance Data Set [39].

The sensitive attributes are listed in Table 2.

Table 2
Datasets and Sensitive Attributes

Dataset    Attribute                 Cardinality
Compas     Race                      6
           Sex                       2
           Age                       3
Juvenile   V3_nacionalitat           35
           V2_estranger              2
           V1_sexe                   2
           V5_edat_fet_agrupat       3
           V4_nacionalitat_agrupat   5
           V8_edat_fet               5
UCI        Sex                       2
           Education                 7
Income     Education                 16
           Race                      5
           Sex                       2
           Native country            41
Statlog    Status                    4
           Sex                       2
           foreignworker             2
Student    Sex                       2
           Age                       6
           Mother job                5
           Father job                5
           Mother Education          5
           Father Education          4

For each dataset we calculated the predicted value using a classification model; only the Compas dataset and Recidivism in juvenile justice already include this information, so for them we used the original value. Different models for classification are present in the literature and are implemented in software libraries. We chose logistic regression [40], which produces categorical results and can therefore be trained to predict the membership of an item in a class.

In order to evaluate the completeness of the dataset we examined Maximum Completeness, introduced in [41]. Maximum completeness is an index measuring the percentage degree of completeness of the dataset (Incomplete = 0, Complete = 1) with respect to one or more categorical attributes, where the expected value is the one in which the attributes considered have a number of replications equal to that of the predominant class. Assuming we wish to calculate the completeness of a dataset on a categorical attribute A:

    C_{MAX}(A) = \frac{N}{K_A \cdot M_{PC}}    (5)

Where:

- N is the total number of instances in the dataset;
- K_A is the number of classes of attribute A;
- M_{PC} is the maximum number of elements in a class of attribute A.

This index can be calculated on multiple categorical attributes by considering as K_A the number of possible combinations of the chosen categorical variables and as M_{PC} the maximum number of items grouped over the attributes considered. For example, considering the Compas dataset and the attributes Race and Sex, to have maximum completeness C_MAX(Race, Sex) = 1 we would need a number of items for every combination of race and sex equal to 2,626, which is the number of items of male sex and African-American ethnicity, the most numerous category. For this to happen, the number of records in the dataset would have to grow to 31,512 from the current N = 6,172 (C_MAX(Race, Sex) = 0.19).
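A minimal implementation of equation 5 is sketched below. The function name, and the assumption that K_A counts all possible combinations of the selected attributes (the product of their cardinalities), are ours.

```python
import pandas as pd

def maximum_completeness(df: pd.DataFrame, attributes: list[str]) -> float:
    """C_MAX of equation 5 for one or more categorical attributes."""
    k_a = 1
    for col in attributes:                      # K_A: number of possible class combinations
        k_a *= df[col].nunique()
    m_pc = df.groupby(attributes).size().max()  # M_PC: size of the predominant combination
    return len(df) / (k_a * m_pc)

# Shape of the call on the Compas data discussed above (values from the paper):
# maximum_completeness(compas, ["Race", "Sex"])  ->  6172 / (12 * 2626), about 0.19
```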
3.2. Clustering Method

Starting from the consideration that the conditional probabilities of the random variable R with respect to membership in a sensitive attribute, P(R = 1 | A = a_i), may determine affinity in treatment equivalence classes, we thought of using unsupervised clustering algorithms to identify possible clusters in the probabilities. Among the ML algorithms that were analyzed, we chose K-Means and DBSCAN [42]. K-Means is a clustering algorithm that tends to separate samples into K groups of equal variance, minimizing the within-cluster sum-of-squares criterion. To use this method, it is necessary to know a priori the number K of clusters into which to divide the samples. Once the centroids have been calculated, it is possible to use equation 2 or equation 3, depending on the value of K, to obtain the synthetic index. Compared with [34], bundling multiple instances of the sensitive attribute into groups results in a lower m; indeed, the term a_i is then composed of all the elements treated similarly.

The critical issue of correctly identifying the number K to be used for clustering led us to study other approaches and subsequently to experiment with DBSCAN. This algorithm sees clusters as areas of high density separated by areas of lower density. Its application needs only the parameter indicating the number of samples that must be found in an area to form a cluster and the one indicating the density required to form a cluster (eps). One of its limitations is that some elements turn out not to belong to any cluster; it was decided to treat them as unitary clusters. Since the concept of a centroid does not exist for DBSCAN, the value of the fairness metric was calculated as follows:

- clusters: probability as per equation 3, to calculate the cluster fairness;
- individual elements: average of the fairness indices of the individual instances of the sensitive attribute.

Applying the two clustering methods resulted in values that were higher on average than those calculated using only the arithmetic mean reported in equation 3. Furthermore, this made it possible to identify groups that were similar in treatment type.
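The following sketch shows one way the group probabilities could be clustered with scikit-learn, turning DBSCAN noise points into unitary clusters as described above. The parameter values and helper name are illustrative assumptions, not the paper's code.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

def cluster_group_probabilities(probs, method="kmeans", k=2, eps=0.05, min_samples=2):
    """Cluster the P(R=1 | A=a_i) values; probs holds one probability per group.

    Returns a cluster label for every group. With DBSCAN, noise points
    (label -1) are re-labelled as unitary clusters of their own.
    """
    x = np.asarray(probs, dtype=float).reshape(-1, 1)
    if method == "kmeans":
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(x)
    else:
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(x)
        next_label = labels.max() + 1
        for i, lab in enumerate(labels):
            if lab == -1:                       # noise point -> its own unitary cluster
                labels[i] = next_label
                next_label += 1
    return labels
```

The per-cluster probabilities (the K-Means centroids, or the cluster means for DBSCAN) can then be plugged into equation 2 or 3 in place of the raw group probabilities to obtain the synthetic index.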
3.3. MinMax Method

Although both clustering methodologies gave good results to work with, another approach was explored, starting from the definition found in [22], to calculate the value of the fairness metric by looking for the worst case. The process is described below with reference to the Demographic Parity metric of equation 4 (Independence), but it can be extended to all the others without loss of generality. This description places emphasis on the fact that a_i is considered an unprivileged group and the set of all other elements a privileged group. The algorithm, for all values of the sensitive attribute A, calculates the result of equation 4 associated with each group by considering, in turn, the element under observation as discriminated and all the others as privileged. Considering again the field Race, if the element we are calculating for is Asian, this is a_i and all the others constitute the other group in the equation. Once this process has been iterated for all values of the sensitive attribute, we select the highest and the lowest result. The former identifies the group for which the predicted variable R is most dependent on ethnicity, while the latter identifies the most independent one. The difference between these two values indicates how large the inequality of treatment between the privileged and the unprivileged group is, relative to the sensitive attribute considered. Compared with the use of clustering, this methodology positions itself at the worst case, by taking as the index of treatment inequity the largest difference between the epsilon_i values obtained by applying equation 4 and reported in Table 3.

Table 3
Conditional Probabilities of the Sensitive Attribute and epsilon_i by equation 4

A = a_i            P(R = 1 | A = a_i)   epsilon_i
Caucasian          0.33                 0.22
Hispanic           0.28                 0.18
Other              0.20                 0.11
Asian              0.23                 0.14
African-American   0.58                 0.51
Native-American    0.73                 0.64

epsilon_i = |P(R = 1 | A = a_i) - P(R = 1 | A != a_i)|
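A sketch of the MinMax computation is given below, reusing the one_vs_rest_distances helper defined in the earlier sketch; the function name is ours, not the paper's.

```python
import pandas as pd

def minmax_index(df: pd.DataFrame, sensitive: str, pred: str = "r") -> float:
    """Worst-case disparity: max(eps_i) - min(eps_i) over the groups of A.

    Each group is considered in turn as the unprivileged one (equation 4),
    and the spread between the most and least dependent group is returned.
    """
    eps = one_vs_rest_distances(df, sensitive, pred)
    return float(eps.max() - eps.min())

# With the Table 3 values for Race, eps_i ranges from 0.11 (Other) to
# 0.64 (Native-American), giving an index of about 0.53.
```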
4. Discussion

During the experimental phase, the methods presented in the previous paragraphs were applied to sensitive attributes belonging to six known datasets. The fairness metrics used during the testing phase are: Independence, Separation, Sufficiency and Overall Accuracy Equality, as defined in [32], [21]. Separation and Sufficiency were calculated by considering the positive and the negative predicted cases separately. The metrics resulting from this split are: Separation TPR (True Positive Rate), Separation FPR (False Positive Rate), Sufficiency PPV (Positive Predictive Value) and Sufficiency NPV (Negative Predictive Value).

Once the results were evaluated for the six chosen fairness metrics, we related them to the maximum completeness balancing index. Each diagram consists of two box plots containing the values of the sensitive attributes divided in this way: Maximum Completeness values less than 0.33 (low risk, in yellow) and greater than 0.66 (high risk, in red). Intermediate values are not reported because it is more difficult to determine whether they are fair or not. We remark that the fairness metrics, as defined, take values in the range [0, 1] (Fair = 0, Unfair = 1).
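The comparison just described can be reproduced with a short script that splits the per-attribute results by their Maximum Completeness band and draws the two box plots. The layout of the results table is an assumption of this sketch, not the paper's code.

```python
import matplotlib.pyplot as plt
import pandas as pd

def risk_band_boxplot(results: pd.DataFrame, metric: str) -> None:
    """results: one row per sensitive attribute, with a 'c_max' column and
    one column per fairness metric (values in [0, 1], Fair = 0)."""
    low = results.loc[results["c_max"] < 0.33, metric].dropna()
    high = results.loc[results["c_max"] > 0.66, metric].dropna()
    plt.boxplot([low, high])
    plt.xticks([1, 2], ["C_MAX < 0.33 (low risk)", "C_MAX > 0.66 (high risk)"])
    plt.ylabel(metric)
    plt.title(metric + " vs Maximum Completeness band")
    plt.show()
```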
[Figure 3: K-Means method, box plots of the six fairness metrics for C_MAX less than 0.33 and greater than 0.66.]

Figure 3 shows the case where the six fairness metrics are calculated using the K-Means technique. Note that all the boxplot tails overlap, while for the body a clear separation remains for Separation TPR and Sufficiency PPV. The worst case is the Sufficiency NPV metric, where there is total overlap.

[Figure 4: DBSCAN method, box plots of the six fairness metrics for C_MAX less than 0.33 and greater than 0.66.]

Figure 4 shows the results of applying the DBSCAN technique. In this case the values obtained are similar to those of K-Means, although there are worse results for Overall Accuracy Equality and Sufficiency NPV. These plots were not optimal in Figure 3 either.

[Figure 5: MinMax method, box plots of the six fairness metrics for C_MAX less than 0.33 and greater than 0.66.]

Finally, in Figure 5 the MinMax method of equation 4 is applied instead of the clustering algorithms. In these diagrams, one can see a lengthening of the box plots related to the risk cases and a sharper separation between the two box plots in all diagrams. In the Independence and OAE cases, there is no more overlap between the tails. In the Sufficiency PPV case the high-risk values tend to one, and an almost full separation between the values is achieved. For sensitive attributes that have a C_MAX greater than 0.66, there are values of the fairness metrics greater than 0.2.

In conclusion, the MinMax method yielded better results compared to clustering; conversely, using methods such as K-Means and DBSCAN can help to better define treatment similarities among groups.

5. Current Limits and Future Works

The experimentation carried out has yielded encouraging results regarding the separation of high- and low-risk fairness forecasts with respect to C_MAX. Although the result obtained shows values that are sometimes in a range that is not very wide, improvements have already been identified that can be investigated in future work.

The first task is the choice of the clustering method and of the parameters that affect the number of clusters. Having few clusters brings us closer to the worst case, and it becomes easier to identify the correct meaning to give to each cluster, depending on their distance and their shape. However, establishing a number of clusters that forces such a division could join groups that are not actually treated equally. Another point we will investigate is the possibility of changing the algorithm used to identify the synthetic value of the cluster for the metric under consideration. One solution we will explore is to integrate the calculation of the difference between maximum and minimum into a clustering algorithm that does not have a predefined number K of clusters, such as the already presented DBSCAN. Related to this algorithm, alternatives on how to treat elements that are not associated with any cluster will be explored. The question is: should these cases be discarded because they are outliers, or do they represent borderline cases, e.g., a highly discriminated minority?

6. Conclusion

The spread of ML algorithms for constructing decision systems makes the data used in their construction increasingly important. Imbalances or biases that may be present within the information can affect the results of such systems, causing discrimination toward certain groups. The use of the fairness metrics that have been presented becomes important to predict the impact related to such biases and to act accordingly on both the algorithms and the input data, in line with the objectives.

In this work we tried to provide a methodology to identify clusters that are similar by treatment type and to calculate a synthetic index that can predict how at risk the system is with respect to the sensitive attributes. The experimentation carried out provided good results, both in terms of identifying agglomerations that undergo similar treatment and in calculating a parameter that gives a conservative assessment of the metric.

The relationship between the maximum completeness index and the fairness indices calculated with the presented methods provided a guideline for recognizing high-risk and lower-risk sensitive attributes. This will give analysts the information needed to better configure classification algorithms.

The results of this work lay the groundwork for future developments aimed at improving the identification of groups within sensitive attributes and at researching alternative synthetic indices with greater precision.

References

[1] The Economist, The world's most valuable resource is no longer oil, but data, The Economist, USA, 6 May 2019.
[2] B. Marr, The 5 biggest data science trends in 2022, Oct 2021. URL: https://www.forbes.com/sites/bernardmarr/2021/10/04/the-5-biggest-data-science-trends-in-2022/?sh=22f5fc1d40d3 (Accessed May 2022).
[3] R. Giuliano, The next generation network in 2030: Applications, services, and enabling technologies, in: 2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021, pp. 294-298. doi:10.23919/EECSI53397.2021.9624241.
[4] C. Napoli, G. Pappalardo, E. Tramontana, A hybrid neuro-wavelet predictor for qos control and stability, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8249 LNAI (2013) 527-538. doi:10.1007/978-3-319-03524-6_45.
[5] S. Verma, R. Sharma, S. Deb, D. Maitra, Artificial intelligence in marketing: Systematic review and future research direction, International Journal of Information Management Data Insights 1 (2021) 100002. doi:10.1016/j.jjimei.2020.100002.
[6] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, An advanced neural network based solution to enforce dispatch continuity in smart grids, Applied Soft Computing Journal 62 (2018) 768-775.
[7] T. Dhomad, A. Jaber, Bearing fault diagnosis using motor current signature analysis and the artificial neural network 10 (2020) 70-79.
[8] M. Matta, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, A. Nannarelli, M. Re, S. SpanΓ², A reinforcement learning-based qam/psk symbol synchronizer, IEEE Access 7 (2019) 124147-124157. doi:10.1109/ACCESS.2019.2938390.
[9] R. Brociek, G. Magistris, F. Cardia, F. Coppa, S. Russo, Contagion prevention of covid-19 by means of touch detection for retail stores, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 89-94.
[10] L. Canese, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Re, S. SpanΓ², Multi-agent reinforcement learning: A review of challenges and applications, Applied Sciences 11 (2021). URL: https://www.mdpi.com/2076-3417/11/11/4948. doi:10.3390/app11114948.
[11] N. Brandizzi, S. Russo, R. Brociek, A. Wajda, First studies to apply the theory of mind theory to green and smart mobility by using gaussian area clustering, in: CEUR Workshop Proceedings, volume 3118, CEUR-WS, 2021, pp. 71-76.
[12] F. Fallucchi, M. Gerardi, M. Petito, E. De Luca, Blockchain framework in digital government for the certification of authenticity, timestamping and data property, 2021. doi:10.24251/HICSS.2021.282.
[13] N. Brandizzi, V. Bianco, G. Castro, S. Russo, A. Wajda, Automatic rgb inference based on facial emotion recognition, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 66-74.
[14] B. Nowak, R. Nowicki, M. WoΕΊniak, C. Napoli, Multi-class nearest neighbour classifier for incomplete data handling, in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), volume 9119, Springer Verlag, 2015, pp. 469-480. doi:10.1007/978-3-319-19324-3_42.
[15] D. PoΕ‚ap, M. WoΕΊniak, C. Napoli, E. Tramontana, R. DamaΕ‘evičius, Is the colony of ants able to recognize graphic objects?, Communications in Computer and Information Science 538 (2015) 376-387. doi:10.1007/978-3-319-24770-0_33.
[16] S. Illari, S. Russo, R. Avanzato, C. Napoli, A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients, in: CEUR Workshop Proceedings, volume 2694, CEUR-WS, 2020, pp. 29-35.
[17] N. Dat, V. Ponzi, S. Russo, F. Vincelli, Supporting impaired people with a following robotic assistant by means of end-to-end visual target navigation and reinforcement learning approaches, in: CEUR Workshop Proceedings, volume 3118, CEUR-WS, 2021, pp. 51-63.
[18] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks, https://www.propublica.org/ (2016).
[19] J. Larson, S. Mattu, L. Kirchner, J. Angwin, Compas recidivism dataset, 2016. URL: https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm/ (Accessed Oct 2021).
[20] J. Lee, Analysis of precision and accuracy in a simple model of machine learning, Journal of the Korean Physical Society (2017) 866-870. doi:10.3938/jkps.71.866.
[21] A. Carey, X. Wu, The statistical fairness field guide: perspectives from social and formal sciences, AI and Ethics (2022) 1-23. doi:10.1007/s43681-022-00183-3.
[22] D. Pessach, E. Shmueli, Algorithmic fairness, 2020. URL: https://arxiv.org/abs/2001.09784. doi:10.48550/ARXIV.2001.09784.
[23] S. Prince, Bias and fairness in ai, 2019. URL: https://www.borealisai.com/en/blog/tutorial1-bias-and-fairness-ai/ (Accessed Mar 2022).
[24] G. Capizzi, G. Lo Sciuto, C. Napoli, M. WoΕΊniak, G. Susi, A spiking neural network-based long-term prediction system for biogas production, Neural Networks 129 (2020) 271-279.
[25] M. Miron, S. Tolan, E. GΓ³mez, C. Castillo, Addressing multiple metrics of group fairness in data-driven decision making, 2020. URL: https://arxiv.org/abs/2003.04794. doi:10.48550/ARXIV.2003.04794.
[26] G. Capizzi, G. Lo Sciuto, C. Napoli, R. Shikler, M. Wozniak, Optimizing the organic solar cell manufacturing process by means of afm measurements and neural networks, Energies 11 (2018).
[27] International Organization for Standardization, ISO/IEC 25012:2008 Software engineering β€” Software product Quality Requirements and Evaluation (SQuaRE) β€” Data quality model, 2008. URL: https://www.iso.org/standard/35736.html (Accessed Jan 2021).
[28] International Organization for Standardization, ISO/IEC 25024:2015 Systems and software engineering β€” Systems and software Quality Requirements and Evaluation (SQuaRE) β€” Measurement of data quality, 2015. URL: https://www.iso.org/standard/35749.html (Accessed Jan 2022).
[29] J. Calabrese, S. Esponda, P. M. Pesado, Framework for Data Quality Evaluation Based on ISO/IEC 25012 and ISO/IEC 25024, in: VIII Conference on Cloud Computing, Big Data & Emerging Topics, 2020. URL: http://sedici.unlp.edu.ar/handle/10915/104778.
[30] F. Gualo, M. Rodriguez, J. Verdugo, I. Caballero, M. Piattini, Data quality certification using iso/iec 25012: Industrial experiences, Journal of Systems and Software 176 (2021) 110938. URL: https://www.sciencedirect.com/science/article/pii/S0164121221000352. doi:10.1016/j.jss.2021.110938.
[31] A. Simonetta, A. Trenta, M. C. Paoletti, A. VetrΓ², Metrics for identifying bias in datasets, SYSTEM (2021).
[32] S. Barocas, M. Hardt, A. Narayanan, Fairness and machine learning, 2020. URL: https://fairmlbook.org/ (Accessed Sept 2021), chapter: Classification.
[33] K. Burkholder, K. Kwock, Y. Xu, J. Liu, C. Chen, S. Xie, Certification and trade-off of multiple fairness criteria in graph-based spam detection, Association for Computing Machinery, 2021, pp. 130-139. doi:10.1145/3459637.3482325.
[34] A. VetrΓ², M. Torchiano, M. Mecati, A data quality approach to the identification of discrimination risk in automated decision making systems, Government Information Quarterly 38 (2021) 101619. URL: https://www.sciencedirect.com/science/article/pii/S0740624X21000551. doi:10.1016/j.giq.2021.101619.
[35] Department of Justice, Recidivism in juvenile justice, 2016. URL: https://cejfe.gencat.cat/en/recerca/opendata/jjuvenil/reincidencia-justicia-menors/index.html (Accessed Oct 2021).
[36] D. H. Hofmann, UCI Statlog German Credit, 1994. URL: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (Accessed Oct 2021).
[37] I.-C. Yeh, Default of credit card clients data set, 2016. URL: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (Accessed Oct 2021).
[38] R. Kohavi, B. Becker, Adult data set, 1996. URL: https://archive.ics.uci.edu/ml/datasets/adult (Accessed Oct 2021).
[39] P. Cortez, Student performance data set, 2014. URL: https://archive.ics.uci.edu/ml/datasets/student+performance (Accessed Oct 2021).
[40] scikit-learn developers, Logistic regression, 2022. URL: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html (Accessed May 2022).
[41] A. Simonetta, A. VetrΓ², M. C. Paoletti, M. Torchiano, Integrating square data quality model with iso 31000 risk management to measure and mitigate software bias, CEUR Workshop Proceedings (2021) 17-22.
[42] scikit-learn developers, Clustering, 2022. URL: https://scikit-learn.org/stable/modules/clustering.html (Accessed May 2022).