<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Bias: Does an Algorithm Have to Be Transparent in Order to Be Fair?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>William Seymour</string-name>
          <email>william.seymour@cs.ox.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oxford</institution>
          ,
          <addr-line>Oxford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The most commonly cited solution to problems surrounding algorithmic fairness is increased transparency. But how do we reconcile this point of view with the state of the art? Many of the most effective modern machine learning methods (such as neural networks) can have millions of variables, defying human understanding. This paper decomposes the quest for transparency and examines two of the options available using technical examples. By considering some of the current uses of machine learning and using human decision making as a null hypothesis, I suggest that pursuing transparent outcomes is the way forward, with the quest for transparent algorithms being a lost cause.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recent investigations into the fairness of algorithms have intensified the call
for machine learning methods that are transparent. Unless an algorithm is
transparent, so the argument goes, how are we to know whether it is fair? But
this approach comes with a problem: many machine learning methods are useful
precisely because they work in a way which is alien to conscious human reasoning.
Thus, we place ourselves in the position of having to choose between a more
limited (and potentially less effective) set of algorithms that work in ways that
we can understand, and those which are better suited to the task at hand but
cannot easily be explained. To clarify, this paper is concerned with the use of
transparency as a tool for auditing and communicating decisions, rather than
debate over the higher level 'transparency ideal', or harmful/obstructive uses of
transparency as described by [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>This paper will discuss the arguments for and against transparency as a
design requirement of machine learning algorithms. Firstly, I will break down
what we mean when we talk about fairness and transparency, before considering
arguments and examples from both sides of the discussion. I will cover two
different black box techniques that provide interpretable explanations of
algorithmic decisions (local explanations and statistical analysis), as well as
some of the problems associated with each of them. The techniques listed
are by no means exhaustive and are meant to represent different styles that can
be used to generate explanations. To conclude, there will be a discussion of the
role that transparency might play in the future of machine learning.</p>
    </sec>
    <sec id="sec-2">
      <title>What Do We Mean by Transparency?</title>
      <p>Since transparency in this context is rooted in fairness, perhaps a better
starting point would be to ask what we mean by fairness. Although this is a
dauntingly complex question in itself, most people would consider approaches that
'treat similar people in similar ways' to be fair. Concerns about fairness often
coalesce along lines of protected characteristics (such as race and gender), as
these are where the most glaring problems tend to be found. These characteristics
are often expected to be excluded from the decision making process even if they
are statistically related to its outcome.</p>
      <p>
        But problems arise when a philosophical definition of fairness is translated into
a set of statistical rules against which an algorithm is to be compared. There are
multiple orthogonal axes along which one can judge an algorithm, and the
best fit will vary based on the context in which the algorithm is used. Examples
include predictive parity, error rate balance, and statistical parity, to name a few
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To further muddy the waters, it is possible to draw a distinction between
process fairness (the actual process of making a decision) and outcome fairness
(the perceived fairness of a decision itself) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is possible for an algorithm with
low process fairness (e.g. including race as a factor in decision making) to exhibit
high outcome fairness (e.g. ensuring similar levels of false positives across racial
groups).
      </p>
      <p>As for the term transparency, I use it to refer to information available
about an algorithm that details part of its decision making process, or information
about the decisions it makes, and which can be interpreted by a human being.
Depending on the context, this could be a data scientist, policy maker, or even a
member of the public. Interpretability is a key requirement here, ensuring that
published data do actually aid our understanding of algorithmic processes.</p>
      <p>As we are concerned about investigating fairness, it makes sense to think of two
types of transparency corresponding to those for fairness: process transparency
(how much we understand about the internal state of an algorithm) and outcome
transparency (how much we understand about the decisions, and patterns in
decisions, made by an algorithm). This distinction is important, as while there
exist tools that can achieve some level of outcome transparency for all algorithms,
only certain types of algorithm exhibit process transparency.</p>
    </sec>
    <sec id="sec-3">
      <title>Method I: Local Explanations</title>
      <p>The first method we consider is a black box method of explaining individual
decisions. Local explanations work by sampling decisions from the problem
domain, weighted by proximity to the instance being explained. These samples
are then used to construct a new model that accurately reflects the local decision
boundary of the algorithm. For non-trivial algorithms, the local model will be a
bad fit for other inputs, as global decision boundaries will be of a higher dimension
than the local one (see Figure 1).</p>
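      <p>The procedure above can be sketched in a few lines, in the spirit of LIME [5].
The black box model, kernel width, and sampling scheme here are invented for
illustration; real implementations sample in an interpretable representation and
use the model's predicted probabilities rather than hard labels.</p>

```python
import numpy as np

def black_box(X):
    # Stand-in for an opaque classifier with a nonlinear global boundary.
    return (X[:, 0] ** 2 + X[:, 1] > 1.0).astype(float)

def local_surrogate(instance, n_samples=500, kernel_width=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sample inputs from the neighbourhood of the instance being explained.
    X = instance + rng.normal(scale=kernel_width, size=(n_samples, instance.size))
    y = black_box(X)
    # 2. Weight each sample by its proximity to the instance (Gaussian kernel).
    weights = np.exp(-((X - instance) ** 2).sum(axis=1) / kernel_width ** 2)
    # 3. Fit a weighted linear model; its coefficients are the explanation.
    Xb = np.hstack([X, np.ones((n_samples, 1))])      # add an intercept column
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(Xb * sw, y[:, None] * sw, rcond=None)
    return coef[:-1, 0]                               # per-feature local weights

print(local_surrogate(np.array([0.9, 0.2])))
```

      <p>The returned coefficients play the role of per-feature weights in a local
explanation; in this invented setup the first feature dominates because the global
boundary is steepest in that direction near the chosen instance, even though the
fitted linear model would be a bad fit elsewhere.</p>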
      <p>An example of this would be online content moderation. If a user has submitted
a post which is deemed by an algorithm to be too toxic, we might want to explain
to them which parts of their message caused the algorithm to reject it. For the
input sentence "idiots. backward thinking people. nationalists. not accepting
facts. susceptible to lies", a local explanation might reveal that the words
"idiots" and "nationalists" are the greatest factors contributing to the message
being flagged as toxic. This is not to say that all messages containing the word
"nationalists" are toxic, but that the word is considered problematic in this
context.</p>
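      <p>One simple way to generate this kind of word-level explanation is to re-score
the message with each word removed and attribute to each word the resulting drop in
toxicity. The scorer below is an invented stand-in (a word list with made-up
weights), not the real moderation model:</p>

```python
# Invented word weights standing in for a black box toxicity model.
TOXIC_WORDS = {"idiots": 0.6, "nationalists": 0.3, "lies": 0.1}

def toxicity(text):
    # Hypothetical scorer; in practice this would be a call to the model.
    return min(1.0, sum(TOXIC_WORDS.get(w.strip(".,"), 0.0) for w in text.split()))

def word_attributions(text):
    words = text.split()
    base = toxicity(text)
    # Attribute to each word the drop in score caused by removing it.
    return {w: base - toxicity(" ".join(words[:i] + words[i + 1:]))
            for i, w in enumerate(words)}

message = "idiots. backward thinking people. nationalists. not accepting facts."
attributions = word_attributions(message)
print(max(attributions, key=attributions.get))   # the word contributing most
```

      <p>Like the explanation described above, this attributes the decision chiefly
to "idiots" and "nationalists" without ever inspecting the scorer's internals.</p>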
      <p>Here we have produced an interpretable explanation without knowing anything
about how the algorithm operates: we can say that local explanations provide
evidence for outcome fairness. By looking at these explanations for decisions a
system makes, we have enough information to conclude that a decision was unfair
because it violates our definition of fairness as described above. This is a good
start to our goal of auditing for fairness.</p>
    </sec>
    <sec id="sec-4">
      <title>Moving From Local to Global</title>
      <p>Local explanations do a good job of informing users of the main factors behind
the decisions they are subject to, but they fall short of providing assurance that
the system as a whole operates fairly. In order for this to happen, one needs to be
able to create a mental model of the system which is functionally close enough
to the original that one can predict what it will do (or at least believe that
its reasoning will be of sufficient quality). Because local explanations consider
only facets of the current decision, they do not reveal much about the wider
reasoning that pervades an algorithm. While of great use to an individual who
is concerned about a decision affecting them, they are much less useful
to an auditor who is seeking assurance that the algorithm as a whole is fair. A
handful of randomly chosen samples being satisfactory does not give sufficient
assurance that all answers will satisfy a set of fairness criteria. This highlights
the distinction drawn earlier between local and global fairness guarantees.
(The example message in the previous section is taken from the list of examples
on the Google Perspective API home page at https://www.perspectiveapi.com/.)</p>
      <p>Perhaps then, explanations for audits need to operate at a higher level
than local explanations. But then we encounter the problem that the high
dimensionality of non-trivial models means that global explanations must be
simplified to the point of absurdity in order to be intelligible. If explanations can
be thought of as "a three way trade off between the quality of the approximation
vs. the ease of understanding the function and the size of the domain for which
the approximation is valid" [6], then do we risk going so far towards the scale
end of the spectrum that we must abandon our hopes of arriving at an answer
which is also understandable and accurate?</p>
    </sec>
    <sec id="sec-5">
      <title>Method II: Statistical Analysis</title>
      <p>Given these problems it is perhaps questionable whether any scheme which
only considers individual decisions can ever be sufficient to determine if an
algorithm is fair or not. When considering higher level explanations of algorithms,
we find that statistical analysis can offer us the reassurance (or otherwise) that
we desire about an algorithm, taking into account trends across entire groups of
users rather than being limited to individual circumstances.</p>
      <p>Statistical analysis is another black box method, and often takes the form
of calculating information about particular groups of users and how they are
dealt with by the algorithm. By comparing accuracies and error rates between
groups it is possible to identify systemic mistreatment. Explaining these findings
is often simple, given most people's intuitive understanding of accuracy and false
positives/negatives (see Figure 2).</p>
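      <p>Such an audit can be sketched as a short black box routine that sees only
group membership, predictions, and outcomes. The record format and the example
data below are invented for illustration:</p>

```python
def audit(records):
    """records: (group, prediction, label) triples; returns per-group statistics."""
    stats = {}
    for g, p, y in records:
        s = stats.setdefault(g, {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
        # Classify each decision as a true/false positive/negative.
        key = ("t" if p == y else "f") + ("p" if p == 1 else "n")
        s[key] += 1
    report = {}
    for g, s in stats.items():
        n = sum(s.values())
        report[g] = {
            "accuracy": (s["tp"] + s["tn"]) / n,
            # False positive rate: wrongly flagged among true negatives.
            "fpr": s["fp"] / (s["fp"] + s["tn"]) if s["fp"] + s["tn"] else 0.0,
            # False negative rate: wrongly cleared among true positives.
            "fnr": s["fn"] / (s["fn"] + s["tp"]) if s["fn"] + s["tp"] else 0.0,
        }
    return report

data = [("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1),
        ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 1, 1)]
print(audit(data))
```

      <p>In the hypothetical data, both groups see the same accuracy while the second
group's false positive rate is substantially higher: exactly the kind of systemic
pattern that individual explanations would miss.</p>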
    </sec>
    <sec id="sec-6">
      <title>Lies, Damned Lies, and Statistics</title>
      <p>One trap that exists when performing statistical analysis is that, due to the
aforementioned multitude of ways one can express statistical fairness, it is almost
always possible to present evidence of both compliance and noncompliance. This is
because many types of statistical fairness are inherently incompatible with each
other: altering the classifier to increase fairness along one axis will always decrease
it along another.</p>
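      <p>This incompatibility can be made concrete with the identity shown in [3],
which ties a group's false positive rate to its base rate, the classifier's positive
predictive value, and its true positive rate. The numbers below are invented: they
show that when base rates differ, a classifier with equal predictive parity and
equal true positive rates across groups cannot also balance false positive rates.</p>

```python
def implied_fpr(base_rate, ppv, tpr):
    # Identity from [3]: FPR = p/(1-p) * (1-PPV)/PPV * TPR, with p the base rate.
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr

# Hypothetical classifier achieving the same PPV and TPR for both groups...
ppv, tpr = 0.8, 0.6
# ...applied to groups whose true outcome occurs at different rates.
fpr_high = implied_fpr(0.3, ppv, tpr)   # base rate 30%
fpr_low = implied_fpr(0.1, ppv, tpr)    # base rate 10%
print(fpr_high, fpr_low)
```

      <p>With these invented numbers the higher-prevalence group is forced to bear a
false positive rate nearly four times that of the other group, even though the
classifier treats both groups identically on the other two measures.</p>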
      <p>
        In the wake of Machine Bias [7], ProPublica and Northpointe argued that
the COMPAS algorithm was unfair and fair, respectively. Both parties were
technically correct. These explanations are thus only valid when paired with
background knowledge in data science and ethics, and may not be suitable for
presentation to the general public: doing so could lead to a reduction in trust of
machine learning techniques, especially if the presented facts are used to support
previously held beliefs which are incorrect [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Another issue is that all of the methods that provide interpretable decisions
inevitably present reasoning that correlates with a decision making algorithm
but is not causally related to its output. In these cases, if the algorithm's internals
are indeed intractable, then it will remain impossible to ever prove a causal
link between the explanation system and the algorithm itself. This is not an
insurmountable problem (by its nature, all machine learning deals in correlations),
but it needs to be understood that using black box analysis techniques is not
enough to guarantee that a system is fair unless the entire problem domain is
exhaustively searched. For any model big enough to require auditing this will be
impossible.</p>
    </sec>
    <sec id="sec-7">
      <title>Discussion</title>
      <p>The point that becomes clear as we look at the realities surrounding transparency
in machine learning is that exclusively pursuing understandable and/or open
source algorithms is infeasible. When reviewing even a moderately-sized code
base, it quickly becomes apparent that issues of transparency and interpretability
cannot be resolved simply by making computer code available [8]. With a caveat
for certain contexts, we need to be able to deal with algorithms that are not
inherently transparent.</p>
      <p>Put another way, industry players are incentivised to use the machine learning
techniques that are best for profits, a decision which almost always favours
efficacy over interpretability. Given this, we need to consider techniques that can
be applied to methods where the raw form of the model defies our understanding,
such as neural networks.</p>
      <p>The position I advocate for here is not that we should give up completely on
pursuing transparency, but that we need to be clearer about what we are seeking.
By failing to differentiate between process and outcome transparency we run the
risk of intractable algorithms being used as an excuse for opaque and potentially
unfair decision making.</p>
      <p>At the same time, it is important to understand the epistemological
implications that come from using correlation-based methods to provide transparency.
However, this is already something that is being dealt with when it comes to
algorithmic decisions themselves. If the rest of the community can tackle the
misguided use of algorithmic 'evidence', then it is surely also possible to do the
same with transparency.</p>
      <p>Ultimately it is up to us to decide in each case whether the correlation-focused
evidence we can generate about an algorithm is sufficient to draw conclusions
about its fairness or unfairness. It is helpful to frame the question in the context of
the alternative, which is human-led decision making. It is no secret that decisions
made by people can occasionally be opaque and prone to bias [8], and using this
human baseline as a null hypothesis reminds us that the goal of our quest for
transparency should be for machines to exceed our own capabilities, not to attain
perfection.</p>
      <p>A realistic approach would be to use both types of technique (white and
black box) in tandem, analysing the inner workings of simpler components where
possible and utilising second-hand explanations and analysis otherwise. We should
remember that transparency can appear as a panacea for ethical issues arising
from new technologies, and that the case of machine learning is unlikely to be
any different [9]. That it is difficult to analyse the inner workings of particular
techniques will not slow or prevent their uptake, and it is increasingly clear that
there is a public and regulatory appetite for more accountable machine learning
systems. Therefore, going forward we need to be focussed on the attainable if we
are to effectively hold algorithm developers and their algorithms to account.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Ananny</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Crawford</surname>, <given-names>K.</given-names></string-name>:
          <article-title>Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability</article-title>.
          <source>New Media &amp; Society</source>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Flyverbom</surname>, <given-names>M.</given-names></string-name>:
          <article-title>Transparency: Mediation and the management of visibilities</article-title>.
          <source>International Journal of Communication</source>
          <volume>10</volume>(<issue>1</issue>)
          (<year>2016</year>)
          <fpage>110</fpage>-<lpage>122</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Chouldechova</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Fair prediction with disparate impact: A study of bias in recidivism prediction instruments</article-title>.
          <source>Big Data</source>
          <volume>5</volume>(<issue>2</issue>)
          (<year>2017</year>)
          <fpage>153</fpage>-<lpage>163</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Grgic-Hlaca</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Zafar</surname>, <given-names>M.B.</given-names></string-name>,
          <string-name><surname>Gummadi</surname>, <given-names>K.P.</given-names></string-name>,
          <string-name><surname>Weller</surname>, <given-names>A.</given-names></string-name>:
          <article-title>The case for process fairness in learning: Feature selection for fair decision making</article-title>.
          In: <source>NIPS Symposium on Machine Learning and the Law</source>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Ribeiro</surname>, <given-names>M.T.</given-names></string-name>,
          <string-name><surname>Singh</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Guestrin</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Why should I trust you?: Explaining the predictions of any classifier</article-title>.
          In: <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>, ACM
          (<year>2016</year>)
          <fpage>1135</fpage>-<lpage>1144</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Wachter</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Mittelstadt</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Russell</surname>, <given-names>C.</given-names></string-name>:
          <article-title>Counterfactual explanations without opening the black box: Automated decisions and the GDPR</article-title>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Larson</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Mattu</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Kirchner</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Angwin</surname>, <given-names>J.</given-names></string-name>:
          <article-title>How we analyzed the COMPAS recidivism algorithm</article-title>.
          <source>ProPublica</source>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><surname>The Royal Society</surname></string-name>:
          <article-title>Machine learning</article-title>.
          Technical report, The Royal Society
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><surname>Mittelstadt</surname>, <given-names>B.D.</given-names></string-name>,
          <string-name><surname>Allo</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Taddeo</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Wachter</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Floridi</surname>, <given-names>L.</given-names></string-name>:
          <article-title>The ethics of algorithms: Mapping the debate</article-title>.
          <source>Big Data &amp; Society</source>
          <volume>3</volume>(<issue>2</issue>)
          (<year>2016</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>