<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Simson</string-name>
          <email>jan.simson@lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Pfisterer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Kern</string-name>
          <email>christoph.kern@lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EWAF'23: European Workshop on Algorithmic Fairness</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Statistics, Ludwig-Maximilians-Universität München</institution>
          ,
          <addr-line>Ludwigstr. 33, 80539 München</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A vast number of systems in Europe and beyond currently use algorithmic decision making (ADM) to (partially) automate decisions that were previously made by humans. When designed well, these systems promise more accurate and more efficient decisions while saving large amounts of resources and freeing up human time. When ADM systems are not designed well, however, they can lead to unfair algorithms which discriminate against parts of the population under the guise of objectivity and legitimacy. Many examples of both fair and helpful as well as discriminatory algorithms exist in the wild to date; which group a system falls into typically depends on the decisions made during its design. It is therefore clearly important to properly understand the decisions that go into the design of ADM systems and how these decisions affect the fairness of the resulting system. To study this, we introduce the method of multiverse analysis for algorithmic fairness.</p>
      </abstract>
      <kwd-group>
        <kwd>multiverse analysis</kwd>
        <kwd>algorithmic fairness</kwd>
        <kwd>automated decision making</kwd>
        <kwd>robustness</kwd>
        <kwd>reliable machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Extended Abstract</title>
      <p>
Across the world, more and more decisions are being made with the support of algorithms, so-called
algorithmic decision making (ADM). Examples of such systems can be found in finance,
the labour market, the criminal justice system and beyond. While these systems are very promising
when designed well, raising hopes of more accurate, just and fair decisions, their impact can
be quite the opposite when designed badly. Ample examples exist of unfair ADM systems
discriminating against people in the wild, with the Dutch childcare benefits scandal being an especially
prominent and recent example [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        While these fairness problems often stem from biases in the underlying data, gathering
perfectly fair data is usually not an option, so the only way to ensure that an algorithm
does not reinforce these biases is through the design of the ADM system. With the promise and peril
of ADM systems depending so much on their proper design, it is clearly important to
understand the decisions that go into their design and how these decisions affect algorithmic
fairness. To enable this, we introduce the method of multiverse analysis for algorithmic fairness.
Multiverse analyses were introduced in psychology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to improve reproducibility and to combat
p-hacking and cherry-picking of results. This makes them particularly well suited to assessing
how sensitive the fairness of an ADM system is to the decisions made during its design.
      </p>
      <p>In the proposed adaptation of multiverse analysis for ADM, one starts by making explicit
the many implicit decisions, also referred to as researcher degrees of freedom, made during the
design of an ADM system. One difference between the present analysis and a classic
multiverse analysis is that we evaluate machine learning systems in the end, whereas
classical multiverse analyses typically culminate in a null hypothesis significance test
(NHST). While many of the decision points apply to any machine learning system (e.g. choice
of algorithm, how to preprocess certain variables, cross-validation splits), many of them are
also domain specific (e.g. coding of certain variables, how to set classification thresholds,
how fairness is operationalized). While we vary certain decisions related to the training of
machine learning models, our focus is not on hyperparameter selection or optimization. In
particular, we focus on decisions made during the pre-processing of data and in the translation
of predictions into possible decisions. Using all possible unique combinations of these decisions,
we create a grid of possible universes of decisions. For each of these universes, we compute the
fairness of the ADM system and collect it as a data point. The resulting dataset of
decisions and resulting fairness is treated as the source data for a further analysis in which we
evaluate how individual decisions relate back to fairness.</p>
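      <p>The grid of universes described above can be sketched in a few lines of Python. The decision points and option values below are hypothetical placeholders for illustration, not the study's actual specification; the idea is simply the Cartesian product of all decision options.</p>

```python
from itertools import product

# Hypothetical decision points of an ADM pipeline; the real analysis
# would use the decisions identified for the system under study.
decision_space = {
    "imputation": ["mean", "mode", "drop_missing"],
    "age_coding": ["continuous", "binned"],
    "threshold_strategy": ["fixed_0.5", "top_k", "per_group"],
    "fairness_metric": ["demographic_parity", "equalized_odds"],
}

# The grid of universes: every unique combination of decisions.
universes = [
    dict(zip(decision_space.keys(), combo))
    for combo in product(*decision_space.values())
]

# Each universe would then be evaluated once, yielding one fairness
# data point per combination of decisions.
print(len(universes))  # 3 * 2 * 3 * 2 = 36 universes
```

Every entry in `universes` fully specifies one pipeline configuration, so evaluating the fairness of each entry produces the dataset that the downstream analysis treats as its source data.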
      <p>
        Existing articles in the literature have focused on specific pre-processing or modeling decisions
in isolation, such as the influence of different imputation methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or of model architecture
and hyperparameters [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on fairness in different contexts. Multiverse analyses have also been
used to try to model the performance distribution across hyperparameter space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], though not yet fairness.
Besides multiverse analysis, a closely related approach emerged around the same time
in specification curve analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but multiverse analysis appears to be the more common
approach in the literature to date.
      </p>
      <p>
        Here we present a generalizable approach for using multiverse analysis to estimate the effect
of decisions made during the design of an ADM system on its algorithmic fairness. We demonstrate
the feasibility of this approach in a case study predicting public health coverage in US
census data. We use the ACSPublicCoverage benchmark problem [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] of predicting public health
insurance coverage, as other well-established benchmark datasets have been shown to have non-trivial
quality issues [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ].
      </p>
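      <p>To illustrate the per-universe evaluation step, the following minimal sketch shows how a single, seemingly small design decision, the classification threshold, can change a fairness metric. The scores and group labels are made up for illustration; the actual case study derives them from the ACSPublicCoverage data.</p>

```python
def demographic_parity_difference(scores, groups, threshold):
    """Absolute gap in positive-prediction rates between groups
    after applying a classification threshold to predicted scores."""
    rates = {}
    for g in set(groups):
        preds = [s >= threshold for s, grp in zip(scores, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

# Toy predicted probabilities for members of two groups "a" and "b".
scores = [0.42, 0.48, 0.55, 0.70, 0.35, 0.41, 0.60, 0.80]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# The same scores under two plausible threshold decisions: the fairness
# of the resulting system differs between these two universes.
for threshold in (0.5, 0.45):
    gap = demographic_parity_difference(scores, groups, threshold)
    print(threshold, round(gap, 2))  # prints 0.5 0.0, then 0.45 0.25
```

Under a threshold of 0.5 both groups receive positive predictions at the same rate, while the slightly different threshold of 0.45 opens a 25 percentage-point gap, which is exactly the kind of decision sensitivity the multiverse analysis is designed to surface systematically.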
      <p>We will present preliminary results from the case study, demonstrating how plausible and
seemingly small design decisions in an ADM system can have substantial effects on its
algorithmic fairness. We welcome discussion of other use cases and possible case studies,
especially within the European context.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Amnesty International</collab>
          ,
          <article-title>Xenophobic Machines</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2021</year>
          . URL: https://www.amnesty.org/en/wp-content/uploads/2021/10/EUR3546862021ENGLISH.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Steegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tuerlinckx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Vanpaemel</surname>
          </string-name>
          ,
          <article-title>Increasing transparency through a multiverse analysis</article-title>
          ,
          <source>Perspectives on Psychological Science</source>
          <volume>11</volume>
          (
          <year>2016</year>
          )
          <fpage>702</fpage>
          -
          <lpage>712</lpage>
          . URL: https://doi.org/10.1177/1745691616658637. doi:10.1177/1745691616658637.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Caton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malisetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <article-title>Impact of imputation strategies on fairness in machine learning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>74</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1613/jair.1.13197. doi:10.1613/jair.1.13197.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthanker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dooley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldblum</surname>
          </string-name>
          ,
          <article-title>On the importance of architectures and hyperparameters for fairness in face recognition</article-title>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2210.09943.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. P.</given-names>
            <surname>Kampman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <article-title>Modeling the machine learning multiverse</article-title>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2206.05985. doi:10.48550/arXiv.2206.05985.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>U.</given-names>
            <surname>Simonsohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Simmons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <article-title>Specification curve analysis</article-title>
          ,
          <source>Nature Human Behaviour</source>
          <volume>4</volume>
          (
          <year>2020</year>
          )
          <fpage>1208</fpage>
          -
          <lpage>1214</lpage>
          . URL: https://www.nature.com/articles/s41562-020-0912-z. doi:10.1038/s41562-020-0912-z.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Retiring Adult: New datasets for fair machine learning</article-title>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2108.04884. doi:10.48550/arXiv.2108.04884.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fabris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness datasets: the story so far</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1007/s10618-022-00854-z. doi:10.1007/s10618-022-00854-z.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zottola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brubach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Desmarais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Horowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatasubramanian</surname>
          </string-name>
          ,
          <article-title>It's COMPASlicated: The messy relationship between RAI datasets and algorithmic fairness benchmarks</article-title>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2106.05498.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>