Fairness and Bias in Learning Systems: a Generative Perspective

Serge Dolgikh
National Aviation University, 1 Lubomyra Huzara Ave, Kyiv, 03058, Ukraine

Abstract
In this work in progress we approach the definition and analysis of fairness and bias in learning systems from the perspective of unsupervised generative learning. Based on the generative structure of informative low-dimensional representations that, as demonstrated previously, can be obtained with different types and architectures of unsupervised generative models, certain types of bias analysis can be performed without massive prior (True Standard) data. As demonstrated with examples, these methods can provide additional angles and valuable insights into the analysis of bias and fairness of learning systems.

Keywords
Learning systems, bias, unsupervised learning, generative learning, clustering

1. Introduction

Whereas AI technology has been developing at an outstanding pace, finding applications in many areas and domains of research, industry and societal functions, the progress has not been entirely and unconditionally positive. One direction of questioning is understanding the reasoning of AI systems and the ability to provide explanations or an audit of their decisions ("black box" vs. explainable AI, [1,2]). Another, closely related direction is the development of a conceptual and ontological framework describing the fairness, trustworthiness and bias of AI systems. Many studies have described examples of bias in different functional applications of AI, including criminal justice, health care, human resources, social networks and others [3,4]. It has been noted that the problems of explainable and trusted AI are closely interrelated: understanding the reasons why learned systems make certain decisions can be a key factor in determining whether they can be trusted; on the other hand, it is not easy to imagine a mechanism or a process for a confident and reliable determination of a trusted AI without some insight into the reasons for its decisions.

In this work we pursue a perspective on these essential and topical questions that does not involve prior trusted data for such a determination. This approach avoids the "chicken and egg", or bootstrap, problem in the determination of trustworthiness (the origin of the trusted system that produced the prior decisions), while suggesting sound and practical methods of analysis of fairness and bias based on the generative structure of the input data. In our view, methods of unsupervised generative concept learning that are being actively developed [5] can provide a basis for such analysis. In essence, methods of generative concept learning, where successful, can establish a structure of characteristic patterns, types or natural concepts in the input distribution by stimulating learning models to improve the quality of generation from informative latent distributions, often of significantly reduced complexity. Unlike methods of conventional supervised learning, these approaches do not depend on specific assumptions about the distribution of the data or on massive sets of prior data, and can be used in a general process with data of different types, origins and domains of application. Assuming that the generative structure of the data of interest has been obtained, an analysis of the distributions of decisions produced by the audited AI system across characteristic natural classes of input data can provide valuable insights about the system, including the possibility of bias.
2. Bias and Trustworthiness: Approaches and Definitions

We consider the black box interpretation of AI (and, more generally, of a learning system, LS), whereby a functional, or trained, LS L produces decisions on a set of inputs S, for example, sensory inputs from the environment:

d(x) = D(x, L), x ∈ S    (1)

where D(x, L) is the decision function of the learning system; the justification or explanation for a particular decision d(x), however, is not necessarily known to an external observer. On a subset of inputs, presumably a representative sample of the input distribution, the system produces a set of decisions D(S).

In one approach, suppose there exists a True Standard ("etalon", TS) set of outputs that represents the correct decisions for the given inputs with sufficient confidence. Then characteristics of the trained system such as accuracy and error can be defined with standard measures by comparing the decisions produced by the system with those in the standard set:

A, E(L) : { D(S), TS }    (2)

where A, E(L) are the accuracy and error of the learning system over the representative set of inputs and, presumably, the input space S.

The definition of bias, on the other hand, is not as straightforward. To begin, some observations, though trivial, need to be made.

1. Bias can be defined at the system level, for example, on a subset of decisions D(SX, L), and not on an individual decision; the same decision on the same input can be produced by both an unbiased and a biased system.
2. It can be argued that with a black box system, biased correct decisions are indistinguishable from unbiased ones. For example, it is common to see a trained AI system biased toward acceptance or rejection; such a bias can be easily detected with an adequate etalon set. However, if one considers only the subset of correct decisions, no conclusion about the bias of the system can be made. As a consequence of this observation, bias analysis of black box learning systems, where additional context of decisions (explanations) is not available, has to be limited to the subset of incorrect decisions: Sw : D(x) ≠ TS(x), x ∈ Sw.
3. Bias is not equivalent to wrong decisions, i.e., errors. As already mentioned, there is no reason to expect that correct decisions cannot be produced by biased systems (correct decisions made for "wrong reasons"); likewise, unbiased systems can produce incorrect decisions (errors).

Based on these observations, bias in a learning system can be defined as a systematic deviation of the decisions produced by the system from the correct (etalon) decisions, correlated with a certain set of factors (bias factors). The relationship between accuracy, fairness and bias is illustrated in Figure 1.

Figure 1: Bias, Fairness and Accuracy. Errors include bias but are not limited to it (imperfections, failure to learn); as well, biased systems can produce correct decisions.
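As a concrete illustration of Eqs. (1)-(2) and of the definition above, the following minimal Python sketch computes accuracy and error against a TS set and then breaks the error rate down by the values of a candidate bias factor; a systematic spread of error rates across factor values is the kind of correlated deviation the definition refers to. The function and variable names are illustrative and not from the paper.

```python
import numpy as np

def accuracy_error(decisions, ts):
    """Accuracy A and error E of a learning system against a True Standard set (Eq. 2)."""
    d, t = np.asarray(decisions), np.asarray(ts)
    acc = float(np.mean(d == t))
    return acc, 1.0 - acc

def error_by_factor(decisions, ts, factor):
    """Error rate per value of a candidate bias factor.

    The incorrect decisions form the subset S_w of observation 2; a large spread of
    error rates across factor values signals a deviation correlated with the factor.
    """
    d, t, f = map(np.asarray, (decisions, ts, factor))
    wrong = d != t                       # indicator of membership in S_w
    return {v: float(np.mean(wrong[f == v])) for v in np.unique(f)}
```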
Consequently, the determination of bias can be more challenging than that of correctness, which can be measured with standard metrics of accuracy, at least in the domain of learning with known TS decisions such as conventional supervised learning.

2.1. Challenges and Approaches in Measuring Bias and Trustworthiness

In approaching the question of how trustworthiness and bias can be measured and evaluated for realistic learning systems, the following challenges can be encountered:

1. True Standard (TS) decisions may not be available for all or a significant part of the inputs, or may be insufficient to make confident judgements.
2. Can TS decisions themselves be trusted, i.e., assured to be free of bias? This second question can be far from trivial, as has been discussed in numerous studies.

These questions relate to the "conventional" approach of determining bias based on prior trusted True Standard decisions. However, such decisions are not available in all cases and domains, which brings another perspective:

3. Is a definition of bias possible that is not based on previously known TS decisions?

This question parallels the dichotomy of supervised versus unsupervised learning, where successful learning can depend on prior sets of successful decisions (the conceptual bootstrap problem). Thus, in the exploration of bias in learning systems one can outline two broad directions:

1. Analysis of fairness and bias based on available True Standard decisions.
2. Approaches to the evaluation of bias and trustworthiness without resorting to TS decisions, which may not be available for a specific task or problem area.

In the rest of this work we focus our attention on the second problem area.

2.2. Non-Standard Bias Analysis

In scenarios where standard decisions for the evaluation of bias and trustworthiness are not available, alternative approaches need to be developed. Clearly, evaluation of the correctness of LS decisions is not possible without some measure or criteria imposed externally: the same decision can be correct or the opposite if different sets of criteria are applied. Thus, the first essential input to these methods is the set of correctness criteria. Secondly, it will be assumed that the data used to create (for example, train) the learning system is a correct representation of its sensory environment. Of course, this does not guarantee that the environment itself is correct, that is, a representative and fair reflection of some desired purpose or objective; such scenarios fall beyond the scope of this study.

Based on these assumptions, the problem of detecting bias without relying on the availability of TS decisions (non-standard bias analysis) can be formulated as follows: determine the probability of a systematic deviation of system decisions from the input criteria, correlated with one or several bias factors. For a representative subset of decisions D of a functional (trained) LS L on the subset of inputs S, and a small set of criteria C, determine the probability pB(L) of L being biased; secondly, attempt to identify the bias factors fB(L) correlated with the biased decisions DB ⊂ D:

D, C → pB(L), fB(L)    (3)

"Small" here means that the cardinality of the criteria set has to be much lower than that of a reasonable set of standard decisions: Card(C) ≪ Card(TS).
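To make the interface of formulation (3) concrete, the sketch below estimates pB(L) as the fraction of decisions violating a single externally imposed criterion and ranks candidate bias factors by the correlation of their values with the violations. This is only an illustration under the stated assumptions, not the paper's method; the names nonstandard_bias_analysis, factors and criterion are hypothetical.

```python
import numpy as np

def nonstandard_bias_analysis(decisions, factors, criterion):
    """Sketch of formulation (3): D, C -> pB(L), fB(L), for a single criterion.

    decisions: decision values produced by the black box system L on the input set S
    factors:   dict mapping a factor name to an array of per-input factor values
    criterion: callable returning True when a decision satisfies the objective
    """
    d = np.asarray(decisions, dtype=float)
    violates = np.array([not criterion(x) for x in d], dtype=float)   # candidate D_B
    p_bias = float(violates.mean())                                   # crude estimate of pB(L)
    # rank candidate bias factors fB(L) by correlation with the violations
    ranked = sorted(
        ((name, abs(float(np.corrcoef(violates, np.asarray(v, dtype=float))[0, 1])))
         for name, v in factors.items()),
        key=lambda item: item[1], reverse=True)
    return p_bias, ranked
```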
In the remaining sections we attempt to illustrate how this program can be realized, based on the ability of certain learning systems to create informative generative representations of sensory data that do not require massive prior sets of standard decisions to derive certain essential information from observed distributions. A distinct feature of such systems is the ability to learn from the incentive to improve perceptions, or generations, of observable inputs in a process of self-supervised learning [6] that, as a number of results have demonstrated, can lead to the emergence of a characteristic conceptual structure in the resulting representations of sensory data [7,8].

2.3. Generative Representations and Non-Standard Bias Analysis

As has been reported in a number of results, models of unsupervised generative learning can produce informative representations of complex sensory data of different types and origins, with a clear conceptual structure and a significant reduction of dimensionality [7,8]. An example of two-dimensional generative representations of a dataset of images of basic geometric shapes is given in Figure 2.

Figure 2: Conceptual representations, set of geometrical shapes.

In the illustration, two-dimensional latent representations of a set of images of basic geometric shapes (circles, triangles and empty backgrounds) were plotted in the latent coordinates of three independently trained models of unsupervised generative learning [9]. Though without prior knowledge one cannot draw any conclusions about the semantics of the input data from which the representations were obtained, it is clear from the distribution of the encoded data that it contained at least three distinct types, patterns or concepts.

Essentially, successful generative learning makes it possible to identify the characteristic structure of arbitrary data by factorizing the latent distribution into characteristic regions (natural concepts); an example of such factorization can be observed in the figure above. An essential advantage of these methods is that they are entirely unsupervised, that is, they do not require any prior knowledge of the data, and as such comply with the objective of non-standard bias analysis as defined earlier. Such unsupervised decomposition of the data into characteristic latent structures, if and where successful, can provide an additional perspective for bias analysis, allowing one to bypass the dependency on massive standard decision sets.

Indeed, let us consider a structure of latent clusters KS = { Kn } in the representative input set S, identified with a certain level of confidence γ, and the decisions produced on it by a black box learning model, D(S). With the latent structure KS, the decision set can be decomposed into distributions over the identified clusters: DS = { D(x), x ∈ Kn }. In contrast to decomposition by observable parameters, which in real complex data can be of very high dimensionality, the advantages of generative decomposition are: 1) the significantly lower dimensionality of the latent generative space, and 2) a generative factorization that represents characteristic, or natural, types, classes or patterns in the input data. A comparison of the distributions of decisions DS across different natural clusters can then provide an additional and independent perspective for the analysis of possible bias. For an illustration, let us consider the set of geometrical shapes above, presuming that it describes some observable data on which decisions are produced by a black box learned system L.
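The decomposition just described can be sketched as follows, assuming scikit-learn is available. PCA is used here only as a lightweight stand-in for the unsupervised generative encoder of [7,8] (e.g., an autoencoder or VAE), KMeans stands in for the identification of latent clusters, and all names and parameter values are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA   # stand-in for a generative encoder
from sklearn.cluster import KMeans      # stand-in for latent cluster identification

def generative_decomposition(X, decisions, latent_dim=2, n_concepts=3, seed=0):
    """Project inputs X to a low-dimensional latent space, identify characteristic
    clusters KS = { Kn } (natural concepts), and decompose the decision set into
    per-cluster distributions DS = { D(x), x in Kn }."""
    latent = PCA(n_components=latent_dim, random_state=seed).fit_transform(X)
    concepts = KMeans(n_clusters=n_concepts, n_init=10, random_state=seed).fit_predict(latent)
    decisions = np.asarray(decisions, dtype=float)
    per_cluster = {k: decisions[concepts == k] for k in range(n_concepts)}
    return latent, concepts, per_cluster
```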
As the inputs to the bias analysis, one would have the set of decisions DS produced by L on a representative set of inputs S, in some format; suppose for simplicity a Boolean value or a real number representing a probability. A common approach in conventional bias analysis would be to seek correlations between the decisions and the observable parameters and to examine such correlations for potential bias. A large number of observable parameters (i.e., high dimensionality of the data samples in the set) can present significant challenges for such an approach, as can more complex correlations with multiple input parameters that may not be easily detected.

To illustrate the application of generative methods in this example, suppose unsupervised generative models produced a consistent decomposition of the dataset into characteristic clusters KS, with the distribution of decisions over the identified clusters DS: D = (KS, DS). An example of such a distribution for two learning systems, denoted "F" and "B", of similar overall accuracy is shown in Table 1. The level of trustworthiness or bias of each system is not known at this stage of the bias analysis.

Table 1
Generative decomposition of decisions (mean decision), shapes dataset

System    Concept 1    Concept 2    Concept 3    Dataset
"F"       0.18         0.28         0.16         0.18
"B"       0.28         0.17         0.11         0.20

Once the distributions of decisions by characteristic clusters in the input data are obtained, they can be examined for possible bias. Of many possibilities, we outline two.

In the first case, suppose significant differences are observed between the distributions of decisions in the clusters, as illustrated in Table 1 (inter-concept decision disparity). As discussed earlier, the test of fairness depends on the defined criteria of correctness; let us suppose in this case the hypothesis, or objective, of the test of fairness is defined as: "no significant differences in decisions observed between identifiable groups of subjects". From the results of the generative bias analysis above, obtained without any TS decisions, one can observe such differences in both models (model "F": concept (cluster) 2; model "B": concepts 1 and 3), so additional analysis is necessary. Next, one can examine representative samples of the clusters, which can also be obtained from the generative analysis (Figure 2), and investigate whether the deviation from the average decision is "justified", that is, can be explained for these samples based on the objective. Suppose the additional analysis produced this outcome:

Concept 1: "No" (the disparity of the group from the set mean is not justified)
Concept 2: "Yes" (the deviation from the mean is justified or explainable)
Concept 3: "No"

Based on this analysis, one can conclude that model "F" satisfied the generative test of fairness, whereas system "B" failed it (by producing decisions incompatible with the objective). Moreover, the factors of bias can in this case be identified with the latent coordinates of the regions of the diverging clusters.

In the second case, suppose a learning system has developed a spurious bias with one or several of the input parameters (a so-called "training shortcut"). This would produce outlier points with pronounced deviations from the cluster means in some clusters, with observable parameters associated with the bias condition (intra-concept decision disparity). The outlier set can then be examined for justified deviation as in the preceding case, resulting in confirmation or rejection of the bias hypothesis.
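Both checks can be sketched on top of the per-cluster decomposition from the previous sketch; the threshold and the z cut-off below are illustrative choices, not values from the paper. Applied to the Table 1 values, the inter-concept check flags concept 2 for system "F" and concepts 1 and 3 for system "B", which are then submitted to the "justified deviation" review described above.

```python
import numpy as np

def inter_concept_disparity(per_cluster, threshold=0.05):
    """Compare per-concept mean decisions with the dataset mean (as in Table 1) and
    flag concepts whose deviation exceeds the chosen threshold."""
    all_decisions = np.concatenate(list(per_cluster.values()))
    dataset_mean = float(all_decisions.mean())
    flagged = {k: float(d.mean()) for k, d in per_cluster.items()
               if abs(float(d.mean()) - dataset_mean) > threshold}
    return dataset_mean, flagged        # flagged concepts go to the "justified?" review

def intra_concept_outliers(per_cluster, z=2.0):
    """Flag decisions far from their own concept mean (possible "training shortcut"),
    using a simple z-score cut-off within each concept."""
    outliers = {}
    for k, d in per_cluster.items():
        spread = float(d.std()) or 1.0              # guard against zero spread
        outliers[k] = np.where(np.abs(d - d.mean()) > z * spread)[0]
    return outliers
```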
Again, correlation analysis of the observable parameters with the outlier samples may indicate the bias factors that were developed in the training process.

3. Conclusions

As discussed in this work, unsupervised generative analysis of observable data, and the structure of natural types or concepts it can produce, can provide an additional perspective and additional inputs for the analysis of bias and fairness of black box learning systems. An analysis based on a structure of natural types that can be identified with entirely unsupervised methods can bypass the requirement for massive prior True Standard decisions common to conventional methods of machine intelligence, while providing a basis for a confident determination of possible bias in the discussed scenarios of inter- and intra-cluster disparity of decisions. Due to the high versatility of models and architectures of generative learning, including deep neural networks, the method can have a broad range of applicability across problems and data of different types.

It is important to remember, however, that this is only an additional approach to the analysis of possible bias and does not, and cannot, claim to provide a final determination. Generally, at least in the defined context, it can be challenging to guarantee the absence of bias, as that would be equivalent to a negative proof of the absence of correlation of an arbitrary set of decisions with any factor. Nevertheless, unsupervised generative analysis can offer valuable insights in this increasingly relevant domain of applications of Artificial Intelligence.

4. References

[1] Longo L., Goebel R., Lecue F., Kieseberg P., Holzinger A.: Explainable Artificial Intelligence: concepts, applications, research challenges and visions. CD-MAKE 2020, LNCS 12279, 1–16 (2020).
[2] Schwartz R., Vassilev A., Green K., Perine L., Bart A.: Towards a standard for identifying and managing bias in Artificial Intelligence. National Institute of Standards and Technology, USA, Special Publication 1270, https://doi.org/10.6028/NIST.SP.1270 (2022).
[3] Bogen M.: All the ways hiring algorithms can introduce bias. Harvard Business Review (2019).
[4] Gianfrancesco M.A., Tamang S., Yazdany J., Schmajuk G.: Potential biases in Machine Learning algorithms using electronic health record data. JAMA Internal Medicine 178(11), 1544 (2018).
[5] Bengio Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009).
[6] Jing L., Tian Y.: Self-supervised visual feature learning with deep neural networks: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(11), 4037–4058 (2021).
[7] Higgins I., Matthey L., Glorot X., Pal A., et al.: Early visual concept learning with unsupervised deep learning. arXiv:1606.05579 [cs.LG] (2016).
[8] Dolgikh S.: Low-dimensional representations in generative self-learning models. In: Proc. 20th International Conference Information Technologies – Applications and Theory (ITAT 2020), Slovakia, CEUR-WS.org 2718, 239–245 (2020).
[9] Dolgikh S.: Topology of conceptual representations in unsupervised generative models. In: Proc. 26th International Conference Information Society and University Studies (IVUS 2021), Kaunas, Lithuania, CEUR-WS.org 2915, 150–157 (2021).