<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lessons Learned from Problem Gambling Classification: Indirect Discrimination and Algorithmic Fairness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Percy</string-name>
          <email>chris@cspres.co.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artur d'Avila Garcez</string-name>
          <email>a.garcez@city.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simo Dragicevic</string-name>
          <email>Simo.Dragicevic@playtech.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sanjoy Sarkar</string-name>
          <email>Sanjoy.Sarkar@playtech.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Playtech Plc</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
        <p>Problem gambling is a public health issue with approximately 300,000 individuals suffering harm in England and 1.5 million at risk. Many gambling operators rely on Machine Learning (ML) algorithms to identify online players at risk. Models are typically gender-blind (gender not included as an input), reflecting the sensitivity of protected characteristic data. However, some stakeholders worry that gender continues to influence the model via other variables (indirect identification) and worry about differential model performance by gender (algorithmic fairness). In this paper, we investigate these concerns using real-world data from 22,500 players across two gambling operators. We propose a method for testing the indirect identification of a protected variable. We identify near-zero levels of indirect identification of gender. Regarding algorithmic fairness, a slight pro-female bias is found in the first ML model and a moderate pro-female bias in found in the second ML model. The challenge is to mitigate such bias without the intrusion of compulsory gender data collection. We propose a new approach which uses gender data for training only, constructing separate models for each gender and combining trained models into an ensemble that does not require gender data once deployed. Since harm identification adopts a precautionary principle, if any one model indicates potential harm, the player is flagged as at risk. This approach is shown to reduce the difference per gender in the True Positive Rate (TPR) of the models from 7.2% points to 4.0% points. This is shown to be better than what can be achieved by simply altering the models' classification thresholds. Both the indirect identification and the algorithmic fairness approaches are part of a wider framework and taxonomy being proposed towards the ethical use of Artificial Intelligence (AI) in the gambling industry.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Problem gambling is a public health issue with
approximately 300,000 individuals suffering harm in England and
1.5 million at risk. Many gambling operators rely on
Machine Learning (ML) algorithms to identify online players
at risk. Models are typically gender-blind (gender not
included as an input), reflecting the sensitivity of protected
variables. However, protected variables may continue to
influence an ML model outcome by proxy (via other variables)
in ways that can make the identification of bias even harder,
and make the bias correction towards algorithmic fairness
impossible.</p>
      <p>The authors would like to thank Charmaine Hogan and
Lauren Iannarone at Playtech for their constructive challenge and
feedback.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted
under Creative Commons License Attribution 4.0 International
(CC BY 4.0).</p>
      <p>In this paper, we report the lessons learned from work by
Playtech plc, a provider of B2B and B2C gambling software
services, investigating the role of gender in Playtech’s
gambling harm identification algorithms. The paper connects
cross-sector concerns around algorithmic fairness with the
specific public health work on problem gambling mitigation.
In online gambling, there is a need to balance harm
protection against e-consumers’ desires and rights to privacy, with
a default that minimises the use of sensitive data. As such,
online gambling is a relevant and challenging test case for
exploring issues around algorithmic bias. The main
contributions of this paper are:</p>
      <list list-type="bullet">
        <list-item>
          <p>A technique for identifying when indirect discrimination
exists in ML models where a potentially problematic
variable has been dropped (‘model-matched indirect
identification’).</p>
        </list-item>
        <list-item>
          <p>An approach for incorporating insight based on protected
variables into the classification algorithms without
requiring such data to be collected compulsorily from
individuals (‘blind-separate models’).</p>
        </list-item>
        <list-item>
          <p>The evaluation of the above technique and approach
using gender in a real-world use case in the gambling
sector, which reveals near-zero indirect identification and yet
the potential to improve algorithmic fairness, as defined
by a reduction in the difference between the model’s true
positive rates for men and women.</p>
        </list-item>
      </list>
      <p>We see this work as part of a broader project, both within
the gambling sector and on the ethics of AI more
generally. In the gambling sector, this paper offers the basis for
the formation of a sector-wide working group to study
algorithmic fairness and to consider how to address multiple
objectives, such as overall true positive rates versus true
negative rates, model performance disparities by key customer
groups, model aggregation policies, parsimony and
transferability.</p>
      <p>On the ethics of AI more generally, a framework and
taxonomy are being developed, which include the concerns of
algorithmic fairness and bias identification and other use
cases to improve fairness. Our approach recognises that the
diversity and complexity of ML models, data sets, use
cases and stakeholders’ priorities are such that no single
technique can be recommended universally, but that certain
principles and a framework should nonetheless be developed
and applied across the gambling industry, given the
importance of the issues being discussed1.</p>
      <p>The remainder of the paper is organised as follows: in
Section 2, we position the paper in the context of the related
work; in Section 3, we describe the proposed technique and
fairness approach within the problem gambling use case; in
Section 4, we evaluate the influence of gender on indirect
identification and options to enhance algorithmic fairness;
in Section 5 we conclude and discuss directions for future
work.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Algorithms for problem gambling harm reduction</title>
        <p>
          Problem gambling is a public health issue, with an estimated
300,000 individuals in England self-reporting as experiencing
harm
          <xref ref-type="bibr" rid="ref8">(0.7% prevalence, Gambling Commission, 2018)</xref>
          and
a further 1.5 million thought to be at risk. Sector
organisations take a range of steps to mitigate, identify and intervene
to reduce gambling-related harm, including the development
of Machine Learning algorithms to identify players at risk
of harm. Some examples of early interventions that
operators might take, having identified someone above a certain
threshold of risk or possible harm, include tailored
responsible gambling messages or reduced marketing activity.
        </p>
        <p>
          Playtech plc has an in-house suite of ML algorithms
trained to identify players with similar characteristics to
players who have self-identified as experiencing harm; see
          <xref ref-type="bibr" rid="ref15">Percy et al (2016</xref>
          ) for background on these supervised
ML models. Several of Playtech’s operations fall under
the purview of the European Union’s General Data
Protection Regulation (GDPR; officially adopted in April 2016).
Adopting a precautionary interpretation of GDPR principles
of data minimisation and protections for special category
data, the default decision on algorithms implemented by
Playtech was not to incorporate the player’s gender,
achieving typical cross-validation AUROC rates of 95%+ on
balanced data sets by using behaviour and transaction data
alone.
        </p>
        <p>
          However, the adoption of gender-blind algorithms is
under review. Regulatory advice from the UK Gambling
Commission in 2018 suggests that demographic data can be used
as part of satisfying regulatory requirements2. There is also
1We seek to respond to calls by AI researchers, such as
          <xref ref-type="bibr" rid="ref9">Goodman (2016)</xref>
          of the Oxford Internet Institute, to develop frameworks
for so-called ‘algorithm audits’, and sector bodies, such as the EC’s
Advisory Committee on Equal Opportunities for Women and Men
(2020) which recommends monitoring of algorithms for
discrimination and further work, calling for work to develop and share good
practices. The UK Government’s CDEI further describes a lack of
clear regulatory standards and quality assurance (e.g. around
algorithmic bias) as one of the five key trust-related barriers holding
back AI
          <xref ref-type="bibr" rid="ref2">(CDEI, 2020:4)</xref>
          .
        </p>
        <p>
          2https://www.gamblingcommission.gov.uk/for-gamblingincreasing awareness of the role of gender in problem
gambling. A UK gambling charity has reported that the rate of
problem gambling amongst women increased by a third in
the preceding five years to 2019, a faster rate of increase
compared to men in the same period (15%)3. Therefore, as
the industry becomes increasingly reliant on ML algorithms
to detect problematic play, the changes in demographic
profiles of problem gamblers raises questions on the suitability
of historic gender-blind data sets, their role in training
models, and their potential impact on model efficacy. This can
be seen as part of a broader trend arguing that the traditional
gender-blind approach in gambling research is inappropriate
(Baggio et al, 2018) and implicitly male-biased to the
detriment of female gamblers
          <xref ref-type="bibr" rid="ref12 ref20">(McCarthy et al, 2019; Venne et al,
2019)</xref>
          .
        </p>
        <p>
          The research reported in this paper was initiated to
address two possible stakeholder concerns that point towards
opposite modelling responses. The first is that gender
remains an (unwanted) influence on the gender-blind model
via its indirect associations with other variables (indirect
discrimination). Here, the goal is to remove as much of this
influence as possible. The second concern is whether there is
a missed opportunity for using gender data in a way that
enhances algorithmic performance and fairness by identifying
and mitigating differences in model performance by gender
group (algorithmic fairness). The first concern is motivated
directly by an awareness of the sensitivity of gender data,
both to consumers and in legislation (GDPR). The second
concern reflects an awareness of average structural
differences between men and women that may be relevant for
predicting gambling risk. For instance, research has related
testosterone levels to risk-taking and pathological gambling
          <xref ref-type="bibr" rid="ref17">(Stenstrom and Saad, 2011)</xref>
          , identified gendered behavioural
patterns in gambling problems
          <xref ref-type="bibr" rid="ref22">(Wong et al, 2013)</xref>
          , and
observed gendered patterns in the types of online behaviour
that can be addictive
          <xref ref-type="bibr" rid="ref18">(Su et al, 2020)</xref>
          .
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Bias in AI algorithms and mitigation</title>
        <p>Concerns about bias in AI algorithms in relation to
sociodemographic traits have now become widespread. The
second of Google’s seven principles for AI is to avoid creating
or reinforcing unfair bias4. Organisations and researchers are
responding to this concern in different ways, which can be
grouped based on whether they seek to intervene at the input
level, at the model level or at the output level.</p>
        <p>
          At the input level, one approach adopted, already
discussed, is to exclude the variable corresponding to the
socio-demographic trait in question. For instance, Goldman Sachs
in its operation of Apple Card deliberately avoids collecting
and using data on sensitive characteristics such as gender,
race or age, using this approach to defend against concerns
of gender bias5. However, this practice typically proves an
insufficient defence in the face of evidence of gender bias
in the outcomes - New York’s Department of Financial
Services opened an investigation into Apple Card in late 2019
given different credit limits provided to men and women
despite apparently similar financial circumstances6. In another
example, analysis by
          <xref ref-type="bibr" rid="ref14">Obermeyer et al (2019</xref>
          ) revealed that
the use, in a widely-used commercial algorithm, of health
costs as a proxy for healthcare needs resulted in anti-Black
racial bias; the authors recommend removing health costs
as an input variable as a proxy for needs. Another approach
is to increase the availability and diversity of training data
relating to the input variable in question, which was part
of Microsoft’s 2018 strategy for reducing the error rate
discrepancy between men and women and between lighter skin
tones and darker skin tones in its image classification tool
Face API7.
        </p>
        <p>3www.telegraph.co.uk/news/2020/01/15/female-gambling-addicts-growing-faster-men-amid-rise-online (accessed August 2020)</p>
        <p>4https://ai.google/principles/ (accessed August 2020; published 2018 at https://www.blog.google/technology/ai/ai-principles/)</p>
        <p>
          At the model level, algorithms or their parameters can be
adjusted to reduce the extent to which a model draws on
certain patterns in the input data. One example of this is the
gender-debiasing techniques developed for word embedding
solutions
          <xref ref-type="bibr" rid="ref1">(Bolukbasi et al, 2016)</xref>
          , noting that the authors
describe a mixture of adjusting inputs and model-level
adjustments.
        </p>
        <p>
          At the output level,
          <xref ref-type="bibr" rid="ref13">Moerel (2018)</xref>
          describes LinkedIn’s
recruitment tool as a way of enforcing quotas using the
rankings produced by an algorithm in order to match a
predefined desirable ratio. The tool can subdivide candidates by
gender, rank each candidate within each gender using its
algorithm and then put forward an equal number of men and
women to the hiring manager for consideration.
        </p>
        <p>
          Some of the techniques above have come under
challenge. For instance, the simplistic approach of dropping
socio-demographic input variables (blinding an algorithm)
has been criticised for inadvertently distracting from
fairness by reducing visibility of the issue, by ignoring
possible proxy variables for socio-demographic traits and by
ignoring opportunities to implement other solutions - see, e.g.
the analysis of US College admissions by
          <xref ref-type="bibr" rid="ref10">Kleinberg et al
(2018</xref>
          ) which argues for data-led proactive intervention at
the output level.
        </p>
        <p>
          Focusing particularly on identifying output-level bias,
new tools are being developed to identify whether ML
algorithms are biased in terms of having systematically worse
performance (e.g. lower accuracy) for particular groups.
Facebook announced the testing of an internal tool to do this
in 2018, Fairness Flow, which was discussed further in its
July 2020 Civil Rights Report as part of efforts to tackle
algorithmic discrimination8. Google’s open-source What-If
Tool in TensorBoard launched in 2018 to help ML
developers visualise differences in classification from key
variables, identify borderline cases for particular classifications
and explore the impact of counterfactuals as part of
assessing whether an inappropriate social bias might have been
absorbed from the training data or otherwise reflected in
the model9. The use of counterfactuals for explainable AI,
e.g.
          <xref ref-type="bibr" rid="ref21">White and Garcez (2020)</xref>
          , has become increasingly
associated with the goals of fairness in ML. Various other
methods addressing fairness have been proposed recently,
each adopting its own measure of fairness.
Notably,
          <xref ref-type="bibr" rid="ref5">Dwork et al (2012</xref>
          ) introduce a framework for fair
classification based on a task-specific metric, maximising
utility subject to a fairness constraint. Agarwal et al (2018)
propose a cost-sensitive classifier, also in an attempt to model a
specific loss function subject to fairness constraints; results
are evaluated empirically on a variety of data sets. Choi et
al (2019) focus on a specific family of classifiers, naive
Bayes, and introduce the notion of a discrimination pattern
alongside an algorithm for mining discrimination patterns in
a naive Bayes classifier; the approach is iterative and seeks
to eliminate such patterns until a fair model is obtained. An
overview of the various notions of fairness can be found
in
          <xref ref-type="bibr" rid="ref23">Zemel et al (2013</xref>
          ) and
          <xref ref-type="bibr" rid="ref5">Dwork et al (2012</xref>
          ). More
comprehensive surveys are available in
          <xref ref-type="bibr" rid="ref6">Friedler et al (2019</xref>
          ) and
          <xref ref-type="bibr" rid="ref7">Mehrabi et al (2019</xref>
          ).
        </p>
        <p>5Reported by Kevin Peachey for BBC News on 18
November 2019, Sexist and biased? How credit firms make decisions.
https://www.bbc.co.uk/news/business-50432634</p>
        <p>611 Nov 2019, Apple’s sexist credit card investigated by
US regulator. BBC News.
https://www.bbc.co.uk/news/business-50365609</p>
        <p>7https://blogs.microsoft.com/ai/gender-skin-tone-facial-recognition-improvement/</p>
        <p>8https://about.fb.com/wp-content/uploads/2020/07/Civil-Rights-Audit-Final-Report.pdf</p>
        <p>9https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Problem Gambling Use Case</title>
      <sec id="sec-3-1">
        <title>Data available</title>
        <p>We work with two real-world data sets used to train the
Random Forest harm prediction algorithms currently deployed
by Playtech. The two gambling operators have different
brands (one Bingo-focused and one Slot-Machine-focused),
which will help demonstrate the diversity of circumstances
even in a narrow ML domain.</p>
        <p>The binary classification algorithms use data on whether
players voluntarily self-excluded from the
gambling platform during the analysis period, as an
approximate proxy for experiencing harm. Only regular players who
have been active on the platform for at least 1-2 months are
included in the training data sets, given the focus of the
algorithm on regular players. The open source Weka tool was
used to replicate a model comparable to the deployed model
with the same (approx. 40) behavioural input variables. A
Random Forest model was trained on the raw, unbalanced
training data, resulting in accuracy (at the max-accuracy
point of the ROC curve) within 1%pt of the
deployed models for both trained models. The two trained models (one for each
operator) that are generated by this process are referred to in
this paper as the baseline models.</p>
        <p>These training data sets are enriched for the purpose of
this study with gender data voluntarily supplied by
players during the sign-up process. The gender variable can take
three values: male (M), female (F) or unspecified/undeclared
(U). Caveats remain with the quality of the available gender
data, including possible gender bias in this voluntary
supply of self-identification data, as well as possible
simplifications and distortions in having only two explicit categories
for gender for players to select. Table 1 includes the
summary descriptive data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Bias definition</title>
        <p>We identify three initial areas of analysis where metrics can
usefully be analysed by gender: the gender balance in the
training data, the self-exclusion rates in the training data,
and the performance of the models for separate genders. All
three sets of metrics are identified for reporting purposes, but
only the model performance data is proposed as a metric for
assessing potentially problematic bias. Despite
government-commissioned population surveys providing more detail by
gender, population-wide surveys cannot be related to the
gender ratios in an individual operator platform, as customer
bases attracted by a particular brand are not representative of
the overall gambling community.</p>
        <p>Focusing on model performance, we are interested in a
model that performs similarly well for each gender in terms
of true positives (as a proxy for spotting those who are likely
to be at risk) and true negatives (reducing any disruption or
false alerts for players unlikely to be at risk). Since
negative examples dominate in all populations and given the
precautionary emphasis on identifying possible harm, the True
Positive Rate (TPR) is the chosen primary performance
metric for comparisons used in this paper. Given that we would
not expect exact equality of performance by gender even in a
perfectly fair algorithm, we also identify a tolerance
threshold by which model performance might be identified as
insufficient in that it should prompt action. For the exploratory
purposes of this paper, we use a 2%pt difference in TPR
performance among gender categories as such a threshold,
noting that such a threshold must ultimately be informed by
stakeholder consensus.</p>
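        <p>As a concrete illustration of this check, the following is a
minimal Python sketch (not the deployed Weka pipeline) of the
per-gender TPR comparison and the 2%pt tolerance test described
above; the array names and label encodings are illustrative
assumptions.</p>
        <preformat>
# Per-gender true positive rate (TPR) and the tolerance check
# described in the text. y_true / y_pred are binary arrays
# (1 = self-excluded / flagged at risk); gender holds "M"/"F"/"U"
# labels. All names are illustrative.
import numpy as np

def tpr(y_true, y_pred):
    positives = y_true == 1
    return (y_pred[positives] == 1).mean()

def tpr_gap(y_true, y_pred, gender, threshold=0.02):
    rates = {g: tpr(y_true[gender == g], y_pred[gender == g])
             for g in np.unique(gender)}
    gap = max(rates.values()) - min(rates.values())
    return rates, gap, gap > threshold  # True if the gap prompts action
        </preformat>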
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <sec id="sec-4-1">
        <title>Assessment of the influence of gender on the model (indirect discrimination)</title>
        <p>Since gender is not included in the original algorithm there
is no potential for direct use of gender in the model.
However, gender may still be indirectly identified in the model
via the correlations between other input variables and
gender or other patterns in the data.</p>
        <p>The standard initial investigation of relationships between
variables is the correlation coefficient. For Operator 2, 8 of
the 40 input variables have a correlation coefficient
statistically significant at the 5% Bonferroni-adjusted level or
better for male-reported gender, and 5 of the 40 input variables
have the same correlation in the case of female-reported
gender. However, this pattern is near-trivial by nature, in that the
statistical significance reflects the large sample size rather
than the meaningfulness of the co-variance. The r-squared
from a linear regression using all statistically significant
variables reveals that such variables only explain 0.6% of
the linear variation in the male-reported gender dummy
variable (RMSE of 0.30, RMSE across five-fold cross-validation
varies from 0.30 to 0.31) and 1.1% of the linear variation in
the female-reported gender dummy variable (RMSE of 0.48,
RMSE of 0.48 in each of the five folds too). For Operator
1, there are no such statistically significant variables for the
male-reported gender dummy and only two for the
femalereported gender dummy, accounting for 2.2% of the linear
variation (RMSE of 0.40, varying from 0.39 to 0.41 across
five folds).</p>
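        <p>For transparency, a minimal Python sketch of this linear probe
is given below; it assumes X_sig holds only the statistically
significant input columns and gender_dummy is a 0/1 indicator, and
the scikit-learn implementation is an illustrative stand-in for the
analysis pipeline used here.</p>
        <preformat>
# Linear probe: regress a gender dummy on the significant input
# variables and report R^2 plus five-fold cross-validated RMSE,
# mirroring the quantities quoted above. Names are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def linear_gender_probe(X_sig, gender_dummy):
    r2 = LinearRegression().fit(X_sig, gender_dummy).score(X_sig, gender_dummy)
    neg_mse = cross_val_score(LinearRegression(), X_sig, gender_dummy,
                              scoring="neg_mean_squared_error", cv=5)
    rmse_folds = np.sqrt(-neg_mse)  # RMSE for each of the five folds
    return r2, rmse_folds
        </preformat>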
        <p>The correlation coefficient only identifies linear
relationships, whereas the model in question is a Random Forest
of depth 10 which is likely to identify non-linear patterns.
While many common polynomial relationships in real-world
gambling data might still be hinted at in a significant linear
relationship, other relevant patterns would not. For instance,
the Random Forest models used in this research were seen
to identify relationships based on common values after the
decimal point in a data set in which linear variation with
gender had been removed by decomposition. Deducting the
average value of a particular variable for each gender
artificially results in zero linear correlation, but can result in such
gender-driven patterns in the values after the decimal point,
which remain exploitable by a Random Forest model.</p>
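        <p>A toy demonstration of this effect, on entirely synthetic
numbers rather than operator data, is sketched below: deducting the
per-gender average leaves zero linear correlation with gender, yet a
decision tree recovers gender from the values after the decimal
point.</p>
        <preformat>
# Synthetic illustration of decimal-point leakage: men bet in 10p
# increments, women in 25p increments (an invented habit). After
# per-gender mean-centring, the linear correlation with gender is ~0,
# but a tree separates the groups from the lattice of values.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 10_000
gender = rng.integers(0, 2, n)  # 0 = male, 1 = female
stake = np.where(gender == 0,
                 rng.integers(1, 201, n) * 0.10,
                 rng.integers(1, 81, n) * 0.25)
means = np.array([stake[gender == g].mean() for g in (0, 1)])
centred = stake - means[gender]  # the decomposition described above
print(np.corrcoef(centred, gender)[0, 1])  # ~0: no linear signal left
X_tr, X_te, y_tr, y_te = train_test_split(
    centred.reshape(-1, 1), gender, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print(tree.score(X_te, y_te))  # close to 1: gender recovered anyway
        </preformat>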
        <p>Instead of linear correlation coefficients as a generic
technique, we propose identifying the maximum possible level
of indirect identification in a model-dependent manner,
using a model with the same structure and parametrization as
the original baseline model, an approach we call
‘model-matched indirect identification’.</p>
      <p>If the model were linear, with no interaction terms,
bivariate linear correlations would reflect the model structure and
would be appropriate to capture possible indirect
identification. Since the baseline model is instead a Random Forest,
we train a new model using the same ML
method (Random Forests) with the same model parameter
selection as the baseline model and the same set of predictor
variables, but this time using gender as the target
classification variable. The target variable from the baseline model,
self-exclusion, does not appear in this new model.</p>
        <p>The accuracy of this new model is seen as a bound on
how well gender can be indirectly identified in the baseline
model, since the new model is optimised to predict gender
explicitly now, whereas the prediction of gender would only
have been an indirect goal10 of the baseline model, which is
optimised to predict self-exclusion only. By using the same
parametrization, we seek to find a maximum bound for the
given use case (i.e. model + data) and to avoid the
ambiguity of a possibly exponential variety of implicit interaction
terms.</p>
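      <p>A minimal sketch of this procedure is given below,
re-implemented with scikit-learn rather than the Weka tooling used in
the study; the hyper-parameter values shown are illustrative
stand-ins for the baseline model’s own parametrization.</p>
      <preformat>
# 'Model-matched indirect identification': retrain the baseline model
# family with gender as the target and read off the out-of-bag (OOB)
# error as a bound on indirect identification. X holds the ~40
# behavioural inputs; gender is integer-encoded. Names illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def model_matched_indirect_identification(X, gender, baseline_params):
    probe = RandomForestClassifier(oob_score=True, **baseline_params)
    probe.fit(X, gender)  # gender, not self-exclusion, is the target
    oob_error = 1.0 - probe.oob_score_
    # Reference point: always guessing the majority gender class.
    majority_error = 1.0 - np.bincount(gender).max() / len(gender)
    return oob_error, majority_error

# Usage with hypothetical baseline hyper-parameters:
# oob, naive = model_matched_indirect_identification(
#     X, gender, {"n_estimators": 100, "max_depth": 10})
# An OOB error close to the naive error suggests near-zero indirect
# identification of gender by this model family on this data.
      </preformat>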
      <p>The gender-classification models produced by this
approach have an out-of-bag (OOB) error11 for Operator 1 of
0.5205 and for Operator 2 of 0.4637. Collectively, this
suggests that there is little indirect identification of gender
beyond a random guess based on the majority class.</p>
      <p>10Motivated only insofar as implicitly predicting gender midway
through the model may later help to predict self-exclusion.</p>
      <p>11Metric generated internally by Weka’s Random Forest
algorithm. This is equivalent to a validation set performance
measured for a fold from cross-validation, in that the RF algorithm
deliberately excludes a set of observations in the construction of each
tree. The OOB error measures the classification error rate for such
excluded observations, taking the majority classification for each
observation that has been excluded from various trees.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Assessment of performance bias in baseline models (algorithmic fairness)</title>
        <p>Table 1 reveals that male players outweigh female players
on the slots-focused brand (1.6x prevalence), while male
players are outweighed by female players on the bingo-focused brand (3.5x
prevalence). In both cases, undeclared gender is the most common
group. Men tend to see higher levels of self-exclusion than
women.</p>
        <p>For Operator 1, there is little clear distinction in model
performance by gender. The model is slightly better, based
on TPR, at identifying women at risk than men, but stays
within the 2%pt tolerance threshold. However, for Operator
2 there is markedly higher model performance among
female players than male players, with much higher TPR
(+7.2%pts) and slightly higher overall accuracy (+0.8%pts).
This gender delta by TPR is higher than the specified 2%pt
threshold, prompting an exercise to see how it might be
mitigated, as follows.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Options to enhance algorithmic fairness</title>
        <p>In the online gambling use case, similar to other e-retail use
cases, there is a strong sector preference for not compelling
users to share sensitive data in order to use the services, both
recognising the potential intrusiveness of such questions and
the ease with which they can be inaccurately answered by
those who would prefer not to be asked. For this reason,
gender is a voluntary data point shared by players.</p>
        <p>We test two mitigation methods for Operator 2 that do not
require compulsory gender data: first, the inclusion of
gender as an additional input variable (allowing Unspecified (U)
to be one of its values); second, an ensemble
method which is gender-blind at deployment and which
uses multiple gender-separated models in the ensemble,
aggregated to form an overall view on a player’s risk.
Naturally, if accurate gender data were assumed available for all
players, other methods exist for reducing performance bias,
provided stakeholders tolerate modelling structural
differences by gender. For instance, separate classification
thresholds could be set for men and women (potentially as part of
gender-separated models) thus weighting false positives
differently by gender, or output quotas could be set such that
the top X highest-scoring male players and top Y
highestscoring female players are classified as at risk to meet a
benchmark quota (potentially balanced against a decision
rule that does not allow the quota to apply below a certain
classification probability or clash with the above mentioned
precautionary approach).</p>
        <p>The first option above proved ineffective. Gender has little
impact on the model. The original 0.1412 OOB error
worsens marginally to 0.1440 with gender included. Male
gender ranks 39 out of 42 input variables in terms of feature
frequency in the Random Forest model, and female gender
ranks 38 out of 42. The gender delta on TPR improves to
4.9%pts (reduced from 7.2%pts) but only with a worse TPR
performance among women, with no improvement among
men.</p>
        <p>In the second option, the blind-separate model, we train three
separate models for confirmed male players, confirmed
female players and gender-unspecified or undisclosed players.
If any one model identifies a player as a likely self-excluder,
the player is predicted to be at possible risk, reflecting the
precautionary approach applied across many problem
gambling identification strategies. As such, the overall
classification approach is gendered but does not draw on gender
as an explicit input variable once deployed. In this way,
opt-in privacy of customers is preserved without a loss of
access by customers to the best performing algorithms. A
loss of access might happen if, for instance, one model were
trained with gender data, while another (less accurate) model
were trained without gender data, with the latter model used
whenever a customer chooses not to share gender data. In
our approach, gender data is only required for a sample of
the players, which might be developed from voluntarily
provided data (as done here) or via an ad-hoc collection for the
sole purpose of such a model. This approach reduces the
gender disparity in TPR, but at the cost of the true
negative rate (TNR) and accuracy. Male TPR increases from
46.5% to 54.7% and reduces the gender delta from 7.2%pts
to 4.0%pts. It is important to note that this improvement in
TPR and reduction in delta cannot be achieved by simply
altering the classification thresholds in the baseline model: the
delta increases to 7.3%pts in the baseline model if its
classification threshold is adjusted until the male TPR matches the
male TPR from the blind-separate model. This provides
confidence that the blind-separate model, in its use of
gender insights, is providing additional value in the identification of
players at possible risk of harm.</p>
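        <p>The aggregation logic of the blind-separate approach can be
summarised in the short Python sketch below; the class and parameter
names are illustrative assumptions, not Playtech’s deployed code.</p>
        <preformat>
# 'Blind-separate models': gender data is used at training time only;
# deployment is gender-blind. Precautionary aggregation: if ANY
# gender-specific model predicts risk, the player is flagged.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class BlindSeparateEnsemble:
    def __init__(self, **rf_params):
        self.rf_params = rf_params
        self.models = {}

    def fit(self, X, y, gender):
        # One model per gender group (e.g. M, F, U); gender is
        # required here only, e.g. from voluntarily supplied data.
        for g in np.unique(gender):
            mask = gender == g
            self.models[g] = RandomForestClassifier(
                **self.rf_params).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Every model scores every player; no gender input needed.
        votes = np.stack([m.predict(X) for m in self.models.values()])
        return votes.max(axis=0)  # any positive vote flags the player
        </preformat>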
        <p>Nonetheless, as mentioned, this reduction in gender
disparity by TPR comes at the cost of accuracy and TNR. The
OOB error is higher for men, who form the smaller
sample of the two confirmed genders (0.1452 vs 0.1361 for
women). TNR decreases from 96.7% to 95.3% for women,
from 98.1% to 91.9% for men and from 97.2% to 94.7% for
unspecified gender.</p>
        <p>
          This may be an acceptable loss of performance in
exchange for reduced gender disparity, given the gambling
industry focus on the precautionary principle, but would
require exploration with sector stakeholders. It is also possible
that a larger exercise may result in model choices that entail
other forms of compromise: one disadvantage of the
blind-separate model is that it reduces the sample size available
for training in each gender group. Improvements might be
expected with larger training data sets and the application of
data set balancing techniques (in Playtech’s deployed
algorithms, the SMOTE technique is used to generate balanced
data and it is not used here; see
          <xref ref-type="bibr" rid="ref15">Percy et al (2016</xref>
          ) for
details).
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Lessons Learned, Conclusion and Future</title>
    </sec>
    <sec id="sec-6">
      <title>Work</title>
      <p>The purpose of this paper has been to report work by
Playtech, a provider of B2B and B2C gambling services, to
investigate the role of gender in its gambling harm
identification algorithms. We identify five key lessons learned to
date as part of an ongoing project to improve practice:</p>
      <list list-type="bullet">
        <list-item>
          <p>The diversity of ML use cases, data sets and stakeholder
priorities is such that there is no single stance on which
algorithmic fairness criterion should be prioritised or how it should
be enhanced. For the two models in the gambling harm
identification use cases, we have found negligible levels
of indirect gender identification in gender-blind models.
Focusing on gender disparities in true positive rates, we
found a meaningful disparity in one model but not the
other. The same technique that reduced gender disparity
on the target model would have increased it on the other
model in the other data set, so we should not assume
consistency from one context to another.</p>
        </list-item>
        <list-item>
          <p>Analysis of bias requires investing resources in the
definition and defence of unbiased benchmarks and the
specification of a tolerance threshold. Since bias can exist
either above or below any given benchmark, random
variation makes it impossible to achieve an exact ongoing fit.
The margin of error which can be tolerated depends on
what stakeholders find material and worth the apportionment
of resources, and on the level of variation in the values of a
protected variable as measured over time and over different
data set samples.</p>
        </list-item>
        <list-item>
          <p>Exercises to improve algorithmic fairness need to be
incorporated into overall business priorities, most likely
engaging appropriately balanced stakeholder groups, rather
than treated as a separable analytical exercise. This is both
because judgement calls need to be made as part of the
exercise and because adjusting practice based on insight
may require the balancing of multiple objectives, some of
which may be competing objectives.</p>
        </list-item>
        <list-item>
          <p>Indirect discrimination needs to be analysed as a feature
of a specific model rather than a feature of the data set. For
instance, a target variable such as gender may be mapped
in diverse ways against other variables in the data set
depending on the complexity of the model (e.g. linear,
polynomial, interaction-dependent, integer/decimal structure,
etc.). Indirect discrimination is driven by whether your
model can exploit a particular pattern, rather than by other
patterns that might exist.</p>
        </list-item>
        <list-item>
          <p>Any analysis of fairness is inevitably limited, both
because of changing expectations and the potential breadth
of the topic. As such, it is important to treat it as a process
rather than a one-off exercise and to recognise the limits
in any one exercise. In this initial exploratory work, for
instance, it is unclear what biases a voluntary provision
of gender data might introduce. Gender bias is also likely
to exist elsewhere in the technical and cultural institutions
surrounding gambling, the self-identification of problem
gambling, and the socio-economic system in which
gambling is embedded; it is unclear how such biases might
influence training data and the resulting AI algorithms.
More specifically, this exploratory analysis has focused
on a sample of regular players and two operators, and
it may not be reflective of early-stage players or players
with other operators.</p>
        </list-item>
      </list>
      <p>Regarding future work in the gambling sector, our
next step is convening a working group to apply this approach at a
larger scale and to discuss compromises among competing
objectives. Such a group might comprise domain experts
(e.g. ML experts and data scientists, legal counsel, experts
in the target variable, experts in the use case), managers
and external representatives who provide challenge and
validity as part of the overall exercise, ensuring
representation of individuals from different groups in the target
sociodemographic variables. In doing so, the objective is to
improve safer gambling outcomes across all cohorts and the
scope can be expanded to include the design and evaluation
of industry level interventions as well as risk identification
algorithms.</p>
      <p>On the ethics of AI more generally, we shall develop a
general framework out of our approach to investigating
algorithmic fairness in other use cases in the sector, supported by
a taxonomy of the diverse techniques available to improve
fairness. We invite comment, engagement and challenge on
this paper as part of the broader project to improve practice
and to develop relevant and industry-specific AI principles.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref24">
        <mixed-citation>Advisory Committee on Equal Opportunities for Women and Men for the European Commission. (2020). Opinion on Artificial Intelligence - opportunities and challenges for gender equality (published 18 March 2020).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>Baggio, S., Gainsbury, S., Starcevic, V., Richard, J., Beck, F., Billieux, J. (2018). Gender differences in gambling preferences and problem gambling: a network-level analysis. International Gambling Studies, 18:3, 512-525.</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bolukbasi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saligrama</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalai</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings</article-title>
          .
          <source>Available via arXiv:1607.06520v1 [cs.CL] 21 Jul</source>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>CDEI.</surname>
          </string-name>
          (
          <year>2020</year>
          ).
          <source>AI Barometer Report: June</source>
          <year>2020</year>
          .
          <article-title>London: Centre for Data Ethics and Innovation</article-title>
          , UK.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Choi</surname>
          </string-name>
          , YooJung; Farnadi, Golnoosh; Babaki, Behrouz and Broeck, Guy Van den.
          <article-title>Learning Fair Naive Bayes Classifiers by Discovering and Eliminating Discrimination Patterns</article-title>
          .
          <source>In Proc. AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2020</year>
          , New York, NY,
          <year>February 2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Dragicevic</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Percy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Understanding the Risk Profile of Gambling Behaviour through Machine Learning Predictive Modelling and Explanation</article-title>
          .
          <source>KR2ML</source>
          <year>2019</year>
          , Workshop at 33rd NeurIPS Conference, Vancouver, Canada,
          <year>December 2019</year>
          (available via https://kr2ml.github.io/2019/papers/).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Dwork</surname>
          </string-name>
          , Cynthia; Hardt, Moritz; Pitassi, Toniann; Reingold, Omer and Zemel, Richard. Fairness through awareness, Innovations in Theoretical Computer Science Conference, ITCS2012, MIT CSAIL,
          <string-name>
            <surname>Cambridge</surname>
            <given-names>MA</given-names>
          </string-name>
          , January
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Friedler</surname>
          </string-name>
          ,
          <string-name>
            <surname>Sorelle</surname>
            <given-names>A</given-names>
          </string-name>
          ; Choudhary, Sonam; Scheidegger, Carlos; Hamilton,
          <string-name>
            <surname>Evan</surname>
            <given-names>P</given-names>
          </string-name>
          ;
          <article-title>Venkatasubramanian, Suresh and Roth, Derek. A Comparative Study of Fairness-Enhancing Interventions in Machine Learning</article-title>
          ,
          <source>In Proc. 2019 ACM Conference on Fairness, Accountability and Transparency</source>
          , Atlanta,
          <string-name>
            <surname>GA</surname>
          </string-name>
          ,
          <year>January 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Mehrabi</surname>
          </string-name>
          , Ninareh; Morstatter, Fred; Saxena, Nripsuta; Lerman, Kristina and Galstyan,
          <source>Aram. A Survey on Bias and Fairness in Machine Learning</source>
          , KR2ML Workshop at NeurIPS'19 Conference, Vancouver, Canada,
          <year>December 2019</year>
          (available via https://kr2ml.github.io/2019/papers/).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Gambling</given-names>
            <surname>Commission</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Participation in gambling and rates of problem gambling - England 2016: Statistical report</article-title>
          . Birmingham,
          <string-name>
            <surname>GC</surname>
          </string-name>
          , UK.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <source>A Step Towards Accountable Algorithms? Algorithmic Discrimination and the European Union General Data Protection. 29th Conference on Neural Information Processing Systems (NIPS</source>
          <year>2016</year>
          ), Barcelona, Spain,
          <year>December 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mullainathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rambachan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Advances in big data research in economics: Algorithmic fairness</article-title>
          .
          <source>AEA Papers and Proceedings</source>
          <year>2018</year>
          ,
          <volume>108</volume>
          :
          <fpage>22</fpage>
          -
          <lpage>27</lpage>
          https://doi.org/10.1257/pandp.20181018,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          .
          <source>Advances in Neural Information Processing Systems 30 (NIPS</source>
          <year>2017</year>
          ), Long Beach, CA,
          <year>December 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>McCarthy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellringer</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          et al. (
          <year>2019</year>
          ).
          <article-title>Women and gambling-related harm: a narrative literature review and implications for research, policy, and practice</article-title>
          .
          <source>BMC Harm Reduction Journal</source>
          ,
          <fpage>16</fpage>
          -
          <lpage>18</lpage>
          2019.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Moerel</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Algorithms can reduce discrimination, but only with proper data</article-title>
          .
          <source>Publ. 16 Nov 2018 by IAPP</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Obermeyer</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Powers</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogeli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mullainathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Dissecting racial bias in an algorithm used to manage the health of populations</article-title>
          .
          <source>Science 25 Oct</source>
          <year>2019</year>
          :
          <fpage>447</fpage>
          -
          <lpage>453</lpage>
          . https://science.sciencemag.org/content/366/6464/447.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Percy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , Franc¸a,
          <string-name>
            <surname>M.</surname>
          </string-name>
          , Dragicˇevic´,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          (
          <year>2016</year>
          ):
          <article-title>Predicting online gambling self-exclusion: an analysis of the performance of supervised machine learning models</article-title>
          ,
          <source>International Gambling Studies</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Sarkar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weyde</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slabaugh</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragicevic</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Percy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Accuracy and interpretability trade-offs in machine learning applied to safer gambling</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          , 1773. Dec. Available via http://ceur-ws.
          <source>org/</source>
          Vol-1773
          <source>/CoCoNIPS</source>
          <year>2016</year>
          paper10.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Stenstrom</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saad</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Testosterone, financial risk-taking, and pathological gambling</article-title>
          .
          <source>Journal of Neuroscience</source>
          , Psychology, and Economics,
          <volume>4</volume>
          (
          <issue>4</issue>
          ),
          <fpage>254</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Su</surname>
            , W., Han,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potenza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Do men become addicted to internet gaming and women to social media? A meta-analysis examining gender-related differences in specific internet addiction</article-title>
          .
          <source>Computers in Human Behavior</source>
          , Volume
          <volume>113</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Suresh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guttag</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>A Framework for Understanding Unintended Consequences of Machine Learning</article-title>
          . Available via arXiv:
          <year>1901</year>
          .
          <article-title>10002v3 [cs</article-title>
          .LG],
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Venne</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volberg</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Gender and Gambling Behaviors: A Comprehensive Analysis of (Dis)Similarities</article-title>
          .
          <string-name>
            <surname>Int J Ment Health</surname>
            <given-names>Addiction</given-names>
          </string-name>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>White</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcez</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Measurable Counterfactual Local Explanations for Any Classifier</article-title>
          .
          <source>In Proc. 24th European Conference on Artificial Intelligence, ECAI</source>
          <year>2020</year>
          , Santiago de Compostela, Spain,
          <year>Aug 2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zane</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saw</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Examining gender differences for gambling engagement and gambling problems among emerging adults</article-title>
          .
          <source>Journal of gambling studies</source>
          ,
          <volume>29</volume>
          (
          <issue>2</issue>
          ),
          <fpage>171</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Zemel</surname>
          </string-name>
          , Richard; Wu, Yu; Swersky, Kevin; Pitassi, Toniann and Dwork,
          <source>Cynthia. Learning Fair Representations, 30th International Conference on Machine Learning</source>
          ,
          <string-name>
            <surname>ICML</surname>
          </string-name>
          <year>2013</year>
          ,
          <article-title>Atlanta</article-title>
          ,
          <string-name>
            <surname>GA</surname>
          </string-name>
          ,
          <year>June 2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>