<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <abstract>
        <p>More than ever before, data, information, algorithms and systems have the potential to influence and shape our experiences and views. With increased access to digital media and the ubiquity of data and data-driven processes in all areas of life, awareness and understanding of topics such as algorithmic accountability, transparency, governance and bias are becoming increasingly important. Recent cases in the news and media have highlighted the wider societal effects of data and algorithms and the need to pay them closer attention. The BIAS workshop brought together researchers from different disciplines who are interested in analysing, understanding and tackling bias within their discipline, arising from the data, algorithms and methods they use. The workshop attracted 14 submissions, including research papers and extended abstracts. After a peer-review process in which each submission received three independent reviews, six papers were accepted and are included in these proceedings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The papers cover a wide range of research topics: from conceptual discussions
of algorithmic transparency and fairness to empirical research and case studies.</p>
      <p>William Seymour (page 2) discusses the relationship between the fairness of
an algorithm and its transparency and the important distinction between process
transparency and output transparency. For most effective machine learning
algorithms, process transparency is out of reach because their inner workings
are beyond conscious human reasoning. Seymour argues that a viable alternative
is to analyse the transparency of an algorithm's outputs. He also presents
two example methods – local explanations and statistical analysis – that could
help in understanding the fairness of those outputs.</p>
      <p>Alan Rubel, Clinton Castro and Adam Pham (page 9) address the notions of
agency and autonomy with regard to algorithmic systems. While debates about
biases in algorithmic systems often emphasise potential and actual harms, the
authors argue that our concerns about algorithms should not be limited to such
issues. Moving the debate forward beyond interest in algorithmic harms, they
argue that the “moral salience” of algorithmic systems cannot be understood
without also addressing their impacts on human agency, autonomy and respect
for personhood.</p>
      <p>Claude Draude, Goda Klumbyte and Pat Treusch (page 14) explore the
potential for theoretical frameworks from gender studies – including Haraway’s
“situated knowledges” and Harding’s “standpoint theory” – to inform a better
understanding of how bias emerges within information systems. With a
particular focus on issues of androcentrism, over/underestimation of gender differences
and the stereotyping of gender traits in the workings of information systems,
their paper considers how feminist insights might help to account for and
prevent bias in information system design.</p>
      <p>Christopher Hube, Robert Jäschke and Besnik Fetahu (page 19) present a
method for identifying language bias within textual corpora using word
embeddings based on word2vec. The method has two stages. First, seed words
indicating bias are extracted from Conservapedia, a dataset that includes
opinionated political articles. Second, word2vec is used to identify further
bias words related to those in the seed list. The approach iterates, growing
a list of bias words that could be used to form feature vectors for
tasks such as supervised learning.</p>
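      <p>The iterative expansion of a seed list can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors stand in for trained word2vec embeddings, and the names <monospace>EMBEDDINGS</monospace>, <monospace>expand_seed_words</monospace>, the cosine-similarity threshold and the round limit are all assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
import math

# Hypothetical toy vectors standing in for trained word2vec embeddings.
EMBEDDINGS = {
    "radical": (0.9, 0.1),
    "extremist": (0.85, 0.2),
    "fanatic": (0.8, 0.3),
    "table": (0.1, 0.9),
    "chair": (0.15, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_seed_words(seeds, embeddings, threshold=0.95, max_rounds=5):
    """Grow a seed list with words whose vectors lie close to a known bias word.

    The threshold and round limit are illustrative assumptions, not values
    taken from the paper.
    """
    bias_words = set(seeds)
    for _ in range(max_rounds):
        # Collect new words first so that each round only uses the
        # bias words known at the start of that round.
        candidates = set()
        for word, vec in embeddings.items():
            if word in bias_words:
                continue
            if any(cosine(vec, embeddings[b]) >= threshold for b in bias_words):
                candidates.add(word)
        if not candidates:
            break  # nothing new was found, so the list has stabilised
        bias_words |= candidates
    return bias_words
```
</preformat>
      <p>With the toy vectors above, the seed "radical" and a threshold of 0.98, the first round adds "extremist" and the second round adds "fanatic", which is close to "extremist" but not close enough to the original seed. The unrelated words "table" and "chair" are never added.</p>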
      <p>Vasileios Iosifidis and Eirini Ntoutsi (page 24) describe techniques for data
augmentation (SMOTE and oversampling) to deal with cases of class imbalance
where under-represented groups can affect data-driven methods, such as
supervised learning. Their experiments on the Census Income and German Credit
datasets show that the classes can be more equally represented using data
augmentation without affecting overall classification performance. This is
particularly important when dealing with biases in datasets around certain attributes,
such as gender and race, where the methods proposed in the paper can reduce
classification errors for groups that may be discriminated against.</p>
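      <p>The core idea behind SMOTE-style augmentation can be sketched as follows. This is a simplified illustration rather than the procedure used in the paper: the function name <monospace>smote_like</monospace>, its parameters and the toy points are assumptions. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours.</p>
      <preformat preformat-type="code">
```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority-class points by interpolation.

    A simplified, SMOTE-style sketch: `minority` is a list of numeric tuples
    with at least two entries; `k` is the number of nearest neighbours
    considered. All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate between the base sample and the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```
</preformat>
      <p>Appending the returned points to the training data raises the share of the under-represented class. Plain random oversampling would instead duplicate existing minority samples without interpolation.</p>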
      <p>Serena Oosterloo and Gerwin van Schie (page 30) walk us through a crime
prediction system currently being used in the Netherlands, from a critical data
studies perspective. Their paper illustrates various sources of inaccuracies in the
system, including those that cannot be helped – because the necessary attribute
cannot be measured with great precision in the offline world – as well as those
that result from human biases (e.g., the choices made during the process of
classifying a crime and the parties involved).</p>
      <p>The workshop was opened by a keynote from Ansgar Koene (University of
Nottingham; page 1), who discussed socio-technical causes of bias in algorithms and
systems and the role of policies and ethical standards.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>