<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Winterthur, Switzerland</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>When Small Decisions Have Big Impact: Fairness Implications of Algorithmic Profiling Schemes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Kern</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben L. Bach</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hannah Mautner</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frauke Kreuter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Statistics, LMU Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Joint Program in Survey Methodology, University of Maryland</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Mannheim Centre for European Social Research, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>dmTECH</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Algorithmic profiling is increasingly used in the public sector with the hope of allocating limited public resources more effectively and objectively. One example is the prediction-based profiling of job seekers to guide the allocation of support measures by public employment services. However, empirical evaluations of unintended discrimination and fairness concerns are rare in this context. We systematically compare and evaluate statistical models for predicting job seekers' risk of becoming long-term unemployed with respect to subgroup prediction performance, fairness metrics, and vulnerabilities to data analysis decisions, using large-scale German administrative data. We show that despite achieving high prediction performance on average, profiling models can be considerably less accurate for vulnerable social subgroups and that different classification policies can have very different fairness implications.</p>
      </abstract>
      <kwd-group>
        <kwd>Algorithmic Fairness</kwd>
        <kwd>Modeling Decisions</kwd>
        <kwd>Statistical Profiling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        The field of fairness in machine learning (fairML) has made considerable progress in proposing
fairness notions and metrics to assess biases of prediction models [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Because the development
of fairML methodology often centers on a limited number of (U.S.-based) benchmark
data sets [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], however, its systematic application in real-world scenarios lags behind. This is
particularly the case for high-stakes ADM applications in the public sector, as agencies may
not disclose detailed documentation of their profiling models and data access is restricted.
Nonetheless, ADM approaches such as the AMAS model for classifying job seekers in Austria [4]
have received considerable public attention due to concerns about algorithmic biases. Following
preliminary work on fairness implications of algorithmic profiling of job seekers [5, 6], we set out
to conduct a systematic fairness evaluation of profiling models using real-world administrative
data with labor market histories of over 300,000 German job seekers.
      </p>
      <p>Facing limited resources, many public employment services (PES) apply profiling to efficiently
prevent long-term unemployment (LTU) [7, 8]. Profiling is used at entry into unemployment
so that a PES caseworker can intervene early on and, e.g., support individuals at risk of
LTU in resuming work through targeted support programs. Implementing an algorithmic
profiling system to target job seekers in practice, however, involves a number of critical design
decisions. Questions that need to be answered include, for example: What type of prediction
method should be applied? Which type of information should be used for model training? How
should resources be allocated based on a prediction model’s outputs? Eventually, such decisions
can substantially affect the extent to which different societal groups are targeted by support
programs. This especially includes the risk of perpetuating discrimination against historically
disadvantaged groups, as debated in the context of the AMAS model [4].</p>
      <p>Against this background, in this study we compare and evaluate algorithmic profiling models for predicting
job seekers’ risk of becoming long-term unemployed with respect to (subgroup) prediction
performance, fairness metrics, and vulnerabilities to data analysis decisions.
Focusing on Germany as a use case, we evaluate profiling models by utilizing administrative data
on job seekers’ employment histories that are routinely collected by German public employment
services. Our contribution to the literature on algorithmic profiling and fairness in profiling is
twofold: (1) We conduct a systematic fairness audit of different prediction models and report
on the implications of implementing algorithmic profiling of job seekers in a European use case
under realistic conditions. (2) We evaluate fairness implications of data analysis decisions such
as using different classification thresholds and training data histories. This analysis shows how
modeling decisions along the prediction pipeline can have group-specific downstream effects,
with a focus on the eventual allocation of support measures.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods and Results</title>
      <p>We use regression and tree ensemble techniques to build profiling models. For each technique,
we train multiple sets of prediction models that differ in the time frame and features that are
used for model training. For each model, three classification policies for prioritizing job seekers
are implemented that focus on very high, high, and medium predicted risks of LTU. Next to
comparing the profiling models with respect to group-specific prediction performance, we
study fairness implications of the models’ classifications based on (conditional) statistical parity
difference, false negative rate difference, and consistency in two evaluation data sets.</p>
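      <p>The two classification-based metrics named above can be sketched as a minimal computation on toy data; the arrays and the group coding (1 = foreign-born, 0 = German) below are hypothetical illustrations, not the study’s actual evaluation code.</p>

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Share predicted as LTU in the protected group minus that share in the reference group."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def false_negative_rate_difference(y_true, y_pred, group):
    """FNR gap: share of true LTU episodes the model misses, protected minus reference."""
    def fnr(t, p):
        true_ltu = t == 1
        return (p[true_ltu] == 0).mean()  # fraction of true LTU episodes not detected
    return fnr(y_true[group == 1], y_pred[group == 1]) - fnr(y_true[group == 0], y_pred[group == 0])

# toy data: 1 = (predicted) LTU episode; group 1 = foreign-born (hypothetical coding)
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(statistical_parity_difference(y_pred, group))           # → 0.0 (equal predicted-LTU shares)
print(false_negative_rate_difference(y_true, y_pred, group))  # → 0.5 (more missed LTU in group 1)
```

Note that the toy example has zero parity difference yet a large false negative rate gap, which is why the study reports both metrics rather than either one alone.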
      <p>We focus on four groups of job seekers: female, non-German (i.e., foreign-born), female
non-German, and male non-German individuals. Numerous studies have shown that women
and individuals with a migration background are disproportionately affected by unemployment
and have lower job prospects [for Germany, see 9, 10, 11]. There is consistent experimental
evidence that part of these differences can be attributed to statistical (stereotyping based on
assumed group averages) and taste-based (prejudice against minority groups) discrimination in
hiring decisions [12]. Our fairness evaluation therefore aims to study whether discrimination
against these groups would be learned and eventually perpetuated or mitigated under a given
algorithmic profiling scheme.</p>
      <p>Our results show that applying a standard machine learning pipeline to administrative labor
market data can have detrimental consequences for the individuals that would be affected
by the models’ predictions. While our profiling models achieve good overall performance
scores that are comparable with results reported in other countries, strong differences in
prediction performance across groups emerge. While the models perform similarly well for
female and male job seekers, predictions are less accurate for foreign-born job seekers. This is particularly
troubling given the history of discrimination on the labor market based on ethnicity. The drop
in performance is consistent across model types, feature sets, and training histories and is clearly
visible for both evaluation data sets.</p>
      <p>In light of group-specific prediction error, choosing between different classification
thresholds has considerable fairness implications. Focusing on statistical parity, we observe
group differences in the proportions of unemployment episodes that are predicted as LTU that
exceed true differences in base rates and are highly sensitive to the classification threshold.
Foreign-born (non-German) job seekers may have a higher or lower chance of being eligible
for support measures than German job seekers, depending on whether high or medium risk
individuals would be targeted by PES. Turning to false negative rates, it becomes evident that the
observed parity differences can in part be attributed to systematic prediction error. Compared
to German job seekers, true LTU episodes of foreign-born job seekers are often not correctly
detected by the profiling models under high risk classification policies. The opposite holds true
under a medium risk policy.</p>
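      <p>The threshold sensitivity described above can be illustrated with simulated risk scores; the score distributions, cutoffs, and group labels below are purely hypothetical and serve only to show how the statistical parity difference moves with the classification policy.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical predicted LTU risk scores for two groups (not fitted to any real data)
scores_ref  = rng.beta(2, 5, 1000)  # stand-in for German job seekers
scores_prot = rng.beta(3, 5, 1000)  # stand-in for foreign-born job seekers

# three classification policies: episodes at or above the cutoff are flagged as LTU risks
for threshold, policy in [(0.7, "very high risk"), (0.5, "high risk"), (0.3, "medium risk")]:
    share_ref  = (scores_ref  >= threshold).mean()
    share_prot = (scores_prot >= threshold).mean()
    print(f"{policy:>14}: statistical parity difference = {share_prot - share_ref:+.3f}")
```

The size of the gap (and, for other score distributions, even its sign) changes with the cutoff, which is why the choice between high and medium risk policies matters for who becomes eligible for support.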
      <p>We highlight that different thresholds not only imply different precision-recall trade-offs,
but also different amplifications of group-specific biases. That is, the allocation of resources based
on predictions may not only be differently (in)efficient, but also discriminatory against social
groups to different degrees. As structural differences on the labor market are (over)incorporated
into profiling models, their predictions can be used to either mitigate or reinforce group
differences, depending on the choice of the intervention regime. Against this background, awareness
of the learned group-specific patterns and errors is essential for guiding informed discussions
between developers, policy makers, and PES stakeholders.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>Christoph Kern’s and Ruben Bach’s work was supported by the Baden-Württemberg Stiftung
(grant “FairADM – Fairness in Algorithmic Decision Making” to Ruben Bach, Christoph Kern
and Frauke Kreuter). Hannah Mautner worked on the project while she was a graduate student
at the Institute for Employment Research (IAB) in Nuremberg, Germany.</p>
    </sec>
    <sec id="sec-4">
      <title>References</title>
      <p>[4] J. Holl, G. Kernbeiß, M. Wagner-Pinter, Das AMS-Arbeitsmarktchancen-Modell, 2018. https://ams-forschungsnetzwerk.at/downloadpub/arbeitsmarktchancen_methode_%20dokumentation.pdf.</p>
      <p>[5] S. Desiere, L. Struyven, Using artificial intelligence to classify jobseekers: The accuracy-equity trade-off, Journal of Social Policy 50 (2021) 367–385. doi:10.1017/S0047279420000203.</p>
      <p>[6] D. Allhutter, F. Cech, F. Fischer, G. Grill, A. Mager, Algorithmic profiling of job seekers in Austria: How austerity politics are made effective, Frontiers in Big Data 3 (2020). URL: https://www.frontiersin.org/article/10.3389/fdata.2020.00005/full. doi:10.3389/fdata.2020.00005.</p>
      <p>[7] A. Loxha, M. Morgandi, Profiling the unemployed: A review of OECD experiences and implications for emerging economies, Social Protection and Labor Discussion Paper SP 1424 (2014).</p>
      <p>[8] J. Körtner, G. Bonoli, Predictive algorithms in the delivery of public employment services, https://osf.io/j7r8y/download, 2021. Accessed December 27, 2022.</p>
      <p>[9] I. Kogan, New immigrants—old disadvantage patterns? Labour market integration of recent immigrants into Germany, International Migration 49 (2011) 91–117.</p>
      <p>[10] M. Arntz, R. A. Wilke, Unemployment duration in Germany: Individual and regional determinants of local job finding, migration and subsidized employment, Regional Studies 43 (2009) 43–61.</p>
      <p>[11] M. Jacob, C. Kleinert, Marriage, gender, and class: The effects of partner resources on unemployment exit in Germany, Social Forces 92 (2014) 839–871.</p>
      <p>[12] D. Neumark, Experimental research on labor market discrimination, Journal of Economic Literature 56 (2018) 799–866. doi:10.1257/jel.20161309.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] S. Mitchell, E. Potash, S. Barocas, A. D'Amour, K. Lum, Algorithmic fairness: Choices, assumptions, and definitions, <source>Annual Review of Statistics and Its Application</source> <volume>8</volume> (<year>2021</year>) <fpage>141</fpage>-<lpage>163</lpage>. doi:10.1146/annurev-statistics-042720-125902.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, <article-title>A survey on bias and fairness in machine learning</article-title>, <source>ACM Comput. Surv.</source> <volume>54</volume> (<year>2021</year>). doi:10.1145/3457607.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Fabris, S. Messina, G. Silvello, G. A. Susto, <article-title>Algorithmic fairness datasets: the story so far</article-title>, <source>Data Mining and Knowledge Discovery</source> <volume>36</volume> (<year>2022</year>) <fpage>2074</fpage>-<lpage>2152</lpage>. doi:10.1007/s10618-022-00854-z.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>