<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Simson</string-name>
          <email>jan.simson@lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Pfisterer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Kern</string-name>
          <email>christoph.kern@lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EWAF'23: European Workshop on Algorithmic Fairness</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Statistics, Ludwig-Maximilians-Universität München</institution>
          ,
          <addr-line>Ludwigstr. 33, 80539 München</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A vast number of systems in Europe and beyond currently use algorithmic decision making (ADM) to (partially) automate decisions that were previously made by humans. When designed well, these systems promise more accurate and more efficient decisions while saving large amounts of resources and freeing up human time. When ADM systems are not designed well, however, they can lead to unfair algorithms which discriminate against parts of the population under the guise of objectivity and legitimacy. Many examples of both fair and helpful as well as discriminatory algorithms exist in the wild to date; which group a system falls into typically depends on the decisions made during its design. It is therefore clearly important to properly understand the decisions that go into the design of ADM systems and how these decisions affect the fairness of the resulting system. To study this, we introduce the method of multiverse analysis for algorithmic fairness.</p>
      </abstract>
      <kwd-group>
        <kwd>multiverse analysis</kwd>
        <kwd>algorithmic fairness</kwd>
        <kwd>automated decision making</kwd>
        <kwd>robustness</kwd>
        <kwd>reliable machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Extended Abstract</title>
      <p>
Across the world, more and more decisions are being made with the support of algorithms, so-called
algorithmic decision making (ADM). Examples of such systems can be found in finance,
the labour market, the criminal justice system and beyond. While these systems are very promising
when designed well, raising hopes of more accurate, just and fair decisions, their impact can
be quite the opposite when designed badly. Ample examples exist of unfair ADM systems
discriminating against people in the wild, with the Dutch childcare benefits scandal being an especially
prominent and recent example [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        While these fairness problems often stem from biases in the underlying data, gathering
perfectly fair data is usually not an option, so the only way to ensure that an algorithm
does not reinforce these biases is through the design of the ADM system. With the promise and peril
of ADM systems depending so much on their proper design, it is clearly important to
understand the decisions that go into their design and how these decisions affect algorithmic
fairness. To enable this, we introduce the method of multiverse analysis for algorithmic fairness.
Multiverse analyses were introduced in psychology [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to improve reproducibility and to combat
p-hacking and cherry-picking of results. This makes them particularly well suited to assessing
how sensitive the fairness of an ADM system is to the decisions made during its design.
      </p>
      <p>In the proposed adaptation of multiverse analysis for ADM, one starts by making explicit
the many implicit decisions, also referred to as researcher degrees of freedom, made during the
design of an ADM system. One difference between the present analysis and a classic
multiverse analysis is that we evaluate machine learning systems in the end, whereas
classical multiverse analyses typically culminate in a null hypothesis significance test
(NHST). While many of the decision points apply to any machine learning system (e.g. choice
of algorithm, how to preprocess certain variables, cross-validation splits), many of them are
also domain specific (e.g. coding of certain variables, how to set classification thresholds,
how fairness is operationalized). While we vary certain decisions related to the training of
machine learning models, our focus is not on hyperparameter selection or optimization. In
particular, we focus on decisions made during the pre-processing of data and in the translation
of predictions into possible decisions. Using all possible unique combinations of these decisions,
we create a grid of possible universes of decisions. For each of these universes, we compute the
fairness of the ADM system and collect it as a data point. The resulting dataset of
decisions and resulting fairness is treated as the source data for a further analysis in which we
evaluate how individual decisions relate back to fairness.</p>
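      <p>The grid of universes described above can be sketched in a few lines of Python. The decision points and option values below are hypothetical placeholders for illustration, not the study's actual specification; the idea is simply the Cartesian product of all decision options.</p>

```python
from itertools import product

# Hypothetical decision points of an ADM pipeline; the real analysis
# would use the decisions identified for the system under study.
decision_space = {
    "imputation": ["mean", "mode", "drop_missing"],
    "age_coding": ["continuous", "binned"],
    "threshold_strategy": ["fixed_0.5", "top_k", "per_group"],
    "fairness_metric": ["demographic_parity", "equalized_odds"],
}

# The grid of universes: every unique combination of decisions.
universes = [
    dict(zip(decision_space.keys(), combo))
    for combo in product(*decision_space.values())
]

# Each universe would then be evaluated once, yielding one fairness
# data point per combination of decisions.
print(len(universes))  # 3 * 2 * 3 * 2 = 36 universes
```

Every entry in `universes` fully specifies one pipeline configuration, so evaluating the fairness of each entry produces the dataset that the downstream analysis treats as its source data.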
      <p>
        Existing articles in the literature have focused on specific pre-processing or modeling decisions
in isolation, such as the influence of different imputation methods [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or of model architecture
and hyperparameters [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on fairness in different contexts. Multiverse analyses have also been
used to try to model the performance distribution across hyperparameter space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], though not yet fairness.
Besides multiverse analysis, a closely related approach emerged around the same time
in specification curve analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], but multiverse analysis appears to be the more common
approach in the literature to date.
      </p>
      <p>
        Here we present a generalizable approach for using multiverse analysis to estimate the effect
of decisions made during the design of an ADM system on its algorithmic fairness. We demonstrate
the feasibility of this approach in a case study predicting public health coverage in US
census data. We use the ACSPublicCoverage benchmark problem [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] of predicting public health
insurance coverage, as other well-established benchmark datasets have been shown to have non-trivial
quality issues [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ].
      </p>
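      <p>To illustrate the per-universe evaluation step, the following minimal sketch shows how a single, seemingly small design decision, the classification threshold, can change a fairness metric. The scores and group labels are made up for illustration; the actual case study derives them from the ACSPublicCoverage data.</p>

```python
def demographic_parity_difference(scores, groups, threshold):
    """Absolute gap in positive-prediction rates between groups
    after applying a classification threshold to predicted scores."""
    rates = {}
    for g in set(groups):
        preds = [s >= threshold for s, grp in zip(scores, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

# Toy predicted probabilities for members of two groups "a" and "b".
scores = [0.42, 0.48, 0.55, 0.70, 0.35, 0.41, 0.60, 0.80]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

# The same scores under two plausible threshold decisions: the fairness
# of the resulting system differs between these two universes.
for threshold in (0.5, 0.45):
    gap = demographic_parity_difference(scores, groups, threshold)
    print(threshold, round(gap, 2))  # prints 0.5 0.0, then 0.45 0.25
```

Under a threshold of 0.5 both groups receive positive predictions at the same rate, while the slightly different threshold of 0.45 opens a 25 percentage-point gap, which is exactly the kind of decision sensitivity the multiverse analysis is designed to surface systematically.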
      <p>We will present preliminary results from the case study, demonstrating how plausible and
seemingly small design decisions in an ADM system can have substantial effects on its
algorithmic fairness. We welcome discussion of other use cases and possible case studies,
especially within the European context.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Amnesty International</collab>
          ,
          <article-title>Xenophobic Machines</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2021</year>
          . URL: https://www.amnesty.org/en/wp-content/uploads/2021/10/EUR3546862021ENGLISH.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Steegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tuerlinckx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Vanpaemel</surname>
          </string-name>
          ,
          <article-title>Increasing transparency through a multiverse analysis</article-title>
          ,
          <source>Perspectives on Psychological Science</source>
          <volume>11</volume>
          (
          <year>2016</year>
          )
          <fpage>702</fpage>
          -
          <lpage>712</lpage>
          . URL: https://doi.org/10.1177/1745691616658637. doi:10.1177/1745691616658637.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Caton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malisetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <article-title>Impact of imputation strategies on fairness in machine learning</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>74</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1613/jair.1.13197. doi:10.1613/jair.1.13197.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sukthanker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dooley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldblum</surname>
          </string-name>
          ,
          <article-title>On the importance of architectures and hyperparameters for fairness in face recognition</article-title>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2210.09943.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. P.</given-names>
            <surname>Kampman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <article-title>Modeling the machine learning multiverse</article-title>
          (
          <year>2022</year>
          ). URL: https://arxiv.org/abs/2206.05985. doi:10.48550/arXiv.2206.05985.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>U.</given-names>
            <surname>Simonsohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Simmons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <article-title>Specification curve analysis</article-title>
          ,
          <source>Nature Human Behaviour</source>
          <volume>4</volume>
          (
          <year>2020</year>
          )
          <fpage>1208</fpage>
          -
          <lpage>1214</lpage>
          . URL: https://www.nature.com/articles/s41562-020-0912-z. doi:10.1038/s41562-020-0912-z.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Retiring Adult: New datasets for fair machine learning</article-title>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2108.04884. doi:10.48550/arXiv.2108.04884.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fabris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Messina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Silvello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness datasets: the story so far</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1007/s10618-022-00854-z. doi:10.1007/s10618-022-00854-z.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zottola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brubach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Desmarais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Horowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Venkatasubramanian</surname>
          </string-name>
          ,
          <article-title>It's COMPASlicated: The messy relationship between RAI datasets and algorithmic fairness benchmarks</article-title>
          (
          <year>2022</year>
          ). doi:10.48550/arXiv.2106.05498.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>