<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Joint Proceedings of the ACM IUI 2022 Workshops</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Supporting Responsible Data and Algorithmic Practices in The News Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dilruba Showkat</string-name>
          <email>showkat.d@northeastern.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Proceedings of the ACM IUI 2022 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northeastern University</institution>
          ,
          <addr-line>Boston, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
<p>The journalism discipline has become more data- and algorithm-driven than ever before. While the need for transparent algorithmic practices in journalism is widely known, less is known about how to go about it in practice. As a result, journalists often face challenges associated with Replicability and Reproducibility (R&amp;R) tasks, both within the team and when checking others' data work. Journalists can be supported in practicing transparency by providing explicit information about their sources and methodologies – by being responsible dataset and algorithm users both within and outside of the organization. In this work, as a case study, I present the very first responsible dataset and responsible algorithm practices specifically crafted for the domain of journalism, as a step towards motivating and supporting transparent algorithmic practices using a question-driven documentation technique. The outcome of this study is open to critique, adoption, adaptation, and future exploration.</p>
      </abstract>
      <kwd-group>
<kwd>responsible journalism</kwd>
        <kwd>transparent journalism</kwd>
        <kwd>replicability and reproducibility</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Algorithms are widely used in a variety of application domains, ranging from the public and private sector, healthcare, and automated hiring systems to the criminal justice system. Sometimes these algorithms inherit, reproduce, or even enhance biases against marginalized populations, causing a lack of users' trust in these systems [1, 2, 3, 4]. Moreover, "models are opinions embedded in mathematics" [4, p.27]; they enable us to focus on only the outcome, predictor variables, and validation data while avoiding anything that promotes an understanding of situations or context [3]. This is problematic; as a result, there is a growing interest in the design of transparent algorithmic systems to make algorithmic decision making and its context more accessible. In a similar vein, there is an increasing focus on producing replicable and reproducible work in Machine Learning (ML) research, data science, and the healthcare domain, among others [5, 6, 7, 8, 9]. Reproducibility also plays a critical role in journalism (e.g., provenance) [10].</p>
      <p>Likewise, the demand for transparent journalism has existed for a long time [10, 11], where journalists are expected to describe what data sources they have used, revealing subjects and data analysis methodology, for verification and reproducibility purposes. "The essence of journalism is the discipline of verification" [10, p.79].</p>
      <p>There are several limitations which often make it impractical to implement transparent journalism in practice, for instance, misuse of transparent technology through gaming or manipulation [12], information overload, and others (e.g., cost, presentation). Furthermore, fact-checking tools (such as PolitiFact) [13, 14] are not informative enough to support journalists' replication tasks in terms of data and algorithmic analysis. Replicability and Reproducibility (R&amp;R) also play a significant role in journalism, to make sure that journalistic processes are free from biases [15, 16] and the data they put out in the world is accurate – since "journalism's first obligation is to the truth" [17, 10]. There is limited research in this space that supports reproducibility tasks within the journalism team. Thus, in this research, building on prior work, I will provide a set of question-driven documentation guidelines to support responsible dataset and algorithm use within a journalism team. This work provides implications for making news story related information (with caution) also available to the public, and implications for related technology design interventions in the journalism context.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Related Work</title>
      <sec id="sec-1b-1">
        <title>2.1. Data and Algorithmic Practices in Journalism</title>
        <p>As data becomes readily available, news organizations are increasingly becoming more data-driven than ever before [13, 18]. Journalists work with a variety of datasets and data types (e.g., text, tables, numerical, categorical data [19, 20]) in news storytelling. They use public datasets such as Medicare or Housing datasets [19], and they also collect data from other sources, such as through interviews, surveys, and public websites, using various tools and APIs (e.g., ArcGIS). Using public record requests is also very common among them [10, 21]. While some stories are based on a single dataset, others are based on multiple datasets. Journalists can chase original stories, or they might also choose to build off of others' work.</p>
        <p>When it comes to algorithms, journalists apply a wide range of algorithms, from simple statistical tests (e.g., ANOVA, t-test) to advanced Machine Learning (ML) algorithms (e.g., regression, classification, unsupervised ML), for data analysis [19, 20]. Furthermore, recent work also shows that news is often co-produced using automated tools (e.g., those that use natural language generation and large models [22]) alongside humans [23, 24], especially to improve efficiency and production. Needless to mention, "transparency is key. Being able to explain how the stories are created is relevant both in-house and audiences" [23, p.7]. Previous work also suggests that journalists rely on outlier detection for story idea generation [25, 26, 27, 28]; others also report applying simple spreadsheet manipulation for similar tasks. Regardless of the data analysis methodology or algorithms applied, journalists often perform verification; that is, checking others' data work (verifying either a teammate's work and/or data/charts published by other news organizations) to ensure correctness [29, 19].</p>
        <p>Verification enables the journalists to ensure that the data they put out in the world is accurate [10, 30]. Verification often depends on the journalists remembering things, such as the operation (e.g., min, sum) performed (asking a teammate), and the journalists have no way of doing it in a way that is reproducible (for others, or even for themselves after a while) without clear documentation [6, 31]. Verification is also challenging even within a team, since people often forget to document methods or how they have arrived at a particular result. Similar to the field of data science (and other related areas [32, 33, 34, 35]), the lack of proper documentation is a common problem in code replication tasks for journalists [20].</p>
        <p>Even though journalists make wide use of data and algorithms in their day-to-day news production, very limited work has explored documentation techniques to make journalists' data and algorithmic practices transparent [20, 19, 36]. Previous work also showed a close resemblance (also through systematic empirical investigation [28, 27]) between data science and data journalism [37, 20, 38, 39] work practices [34]. Therefore, this work will take inspiration from various explainable methods available in data science and other related areas to propose a responsible data and algorithm practice for journalism – one that will improve effective team communication and support transparent journalism.</p>
      </sec>
      <sec id="sec-1b-2">
        <title>2.2. Current Trend Towards Explanation</title>
        <p>Previous study shows that explaining the how/what/why aspects of Facebook newsfeed algorithms enhances users' awareness of how the system works in the context of social media applications [40]. Prior research also examined explainability in specific domains [41, 42]. For example, Liao et al. [42] applied a question-driven method to facilitate explainable AI user experience (XAI UX) in the (adverse) healthcare domain. Existing work also studied fairness and transparency in recommender systems [43]. Others explored the socio-organizational context of explainability [44]. Needless to mention the vast amount of technical work that exists to support technical expert users' (e.g., ML engineers, data scientists) explainability needs for explaining black box and white box models [45, 46, 47, 48]. As evidence suggests, less attention has been paid to supporting transparency and explainability needs in the journalism context. Although journalists apply a wide range of data and algorithmic tools in news production [28], applying existing explainable AI (XAI) techniques (even though sophisticated, e.g., [45]) may require prior knowledge of ML or may not be easily adaptable [49] to the news storytelling context. Additionally, in the case of diverse teams with differing technical skills (ML users vs. SQL users), using these tools across the team might require extra learning support. As a result, perhaps a less technical approach can be suitable for R&amp;R needs in journalism.</p>
        <p>Inspired by related work both in data science (documentation based approaches) [50, 51, 42, 44, 52, 1, 53] and transparent data journalism [12, 10, 54, 20, 37, 19, 13], this work will apply a qualitative question-driven documentation technique [42] to support algorithmic transparency in journalism. This approach will require journalists to provide specific data and algorithmic details about news stories by specifying what/why/how/who/when information at different levels (e.g., individual, organizational, team) of the journalistic decision making process to provide richer context [1, 55, 56].</p>
      </sec>
    </sec>
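    <p>The guideline proposed in the following section is presented as a figure-based prototype rather than as software. As a purely illustrative sketch (my own addition, not part of the proposed guideline), the question-driven record could also be encoded as structured data, so that a newsroom can check mechanically which major categories of a dataset remain undocumented before publication. The category names below follow the guideline's major categories; the field contents and helper names are invented for illustration:</p>

```python
# Hypothetical sketch (not from the paper): encode the question-driven
# dataset documentation record as a dictionary keyed by major category,
# so completeness can be checked mechanically.

DATASET_CATEGORIES = [
    "Motivation", "Composition", "Collection",
    "Preprocessing", "Uses", "Distribution", "Maintenance",
]

def missing_categories(record):
    """Return the major categories this dataset record has not answered."""
    return [c for c in DATASET_CATEGORIES if not record.get(c)]

# Example record for a hypothetical housing-data story.
record = {
    "Motivation": "WHY was the dataset used: to examine eviction rates.",
    "Composition": "WHAT it contains: 12,400 rows, csv, 3% missing values.",
    "Collection": "HOW it was obtained: public record request, 2021-06.",
    "Preprocessing": "HOW it was cleaned: duplicates removed manually.",
}

print(missing_categories(record))  # categories still to document
```

    <p>Any such encoding would, of course, need to be evaluated with journalists before adoption; it is offered only to make the documentation idea concrete.</p>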
    <sec id="sec-2">
      <title>3. Transparent Data and Algorithmic Practices for Journalism</title>
      <p>This work will facilitate journalists to properly document and contextualize data and algorithmic decision making in news storytelling, to support the practice of algorithmic transparency. This was achieved through an extensive review of relevant prior work in journalism, data science, and other related areas.</p>
      <p>At a very high methodological level, first, factors relevant to dataset/algorithm use in the journalism domain were categorized using content analysis [57] after synthesizing across prior work (similar to the factors listed in [12]); second, those factors were translated into question-driven explanations (e.g., How, What, Why) following prior work such as [42] and others [40, 44]. Specific details of the methods and processes are provided below.</p>
      <sec id="sec-3-1">
        <title>3.1. Responsible Dataset Practice</title>
        <p>Following is a description of the methods that were used to derive the very first question-driven responsible dataset guidelines for journalism (see Figure 1 for detail).</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Methods for Responsible Dataset Guideline</title>
          <p>The proposed responsible dataset use guideline was heavily inspired by and built upon previous work described in [50, 58, 51, 12, 10, 42], and adapted specifically to be used in journalism. More precisely, Bender and Friedman [51] proposed data statements for text data (though they can be applied more broadly) to alleviate bias and exclusion against certain groups of people in Natural Language Processing (NLP) technology. Gebru et al. [50] also developed datasheets for datasets – a documentation practice to enable accountability and transparency among dataset creators and consumers in the ML community. Diakopoulos and Koliska [12] proposed several factors important for achieving algorithmic transparency in the news media. In this work, I have adapted, refined, and integrated these data documentation practices for journalism. The final prototype is shown in Figure 1, and the specific feature selection criteria are described below.</p>
          <p>Major Categories: Consistent with prior work described in Gebru et al. [50], journalists are required to document information for each of the major categories (blue text in Figure 1): Motivation, Composition, Collection, Preprocessing, Uses, Distribution, and Maintenance. To ensure exhaustiveness and a thorough characterization of the datasets (e.g., campaign finance, crime investigation [15]), factors in each of the aforementioned categories were further updated based on work in transparent journalism [12, 10, 54, 59] and data science [51, 42, 44, 60], due to their relatedness in data work practices [37, 19, 38, 20]. Furthermore, factors related to data reported in Diakopoulos and Koliska [12] are carefully incorporated in each of the categories where they logically make sense. For simplicity, I show the factors for only two major categories:</p>
          <p>• Composition included the following factors: attribute/feature definitions and descriptions, labeled/unlabeled data, data format (e.g., mp4, csv), sample size, missing data/completeness, data category (e.g., healthcare), data language (e.g., en-us), train/test split, raw vs. cleaned data, errors/redundancy, and descriptions of sensitive/anonymous/ground-truth data.</p>
          <p>• Preprocessing included the following factors: which data was discarded and why; tools used, or whether the work was done manually; manual/automated labeling and its process; annotator/curator demographics (race, class, gender); data transformation; and bias handling.</p>
          <p>These factors were then carefully converted into explanation questions. Dataset characterization questions for each of these categories were directly incorporated into the guideline.</p>
          <p>Different Journalism Roles and Demographics: Following the work of Bender and Friedman [51], the proposed guideline also requires journalists to document important demographic features (e.g., age, gender, class) for different journalism roles (e.g., data annotator, speaker, data curator, data collector, scripter, editor, data analyst, presenter, director) to provide transparency against inadvertent biases. These role definitions are informed by and combined from prior research [51, 20, 61] to cover a broad range of journalism roles. Some of these roles may have overlapping (data analysis) functions and responsibilities across different organizations [28].</p>
          <p>Dataset Explanation in the News Storytelling: In the proposed guideline, journalists should provide context for any dataset used by documenting Who, What, When, Why, and How related questions [40, 44, 1]. For example, journalists were asked to provide the context associated with a particular dataset in the Motivation category. Together with demographic information across different journalism roles and subjects, this makes it easy to demystify "WHY" a certain dataset was used in a story. It can also provide an indication of any pre-existing biases that have gone unnoticed. The individual/organization/team "WHO" worked on the story can be found by combining information from the Motivation, Collection, Preprocessing, Uses, and Maintenance categories. The "WHAT" aspect, or feature descriptions and other related information for any dataset, is covered in the Composition, Collection, and Preprocessing categories. Similarly, "WHEN" information is tracked through Maintenance. The "HOW" aspect of a dataset is included in the Collection and Preprocessing categories. Please note that the How, What, Why, Who, and When aspects of a dataset in these categories may not be exclusive; however, they provide all the factors necessary (to the best of my knowledge) for responsible R&amp;R dataset practice. These pieces of information collectively provide sufficient context and insights from individual and organizational decision making perspectives in news storytelling [44, 1].</p>
          <p>The proposed responsible dataset prototype (Figure 1) is dataset-type (e.g., healthcare, finance, housing) agnostic, meaning that journalists could describe any dataset type used in a story with the help of this prototype. Journalists must also conform to the privacy and anonymity of their news sources, such as anonymous data sources [10, 59]. It is also important to note that all personal demographics should be published only after receiving user consent [54]. Describing and characterizing algorithmic information together with datasets will further facilitate journalists' verification [10] and R&amp;R needs [6, 51, 62], discussed below.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Responsible Algorithmic Use Practice</title>
        <p>The methods used for developing the responsible algorithm use guideline, which supports journalists' verification needs, are provided below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Methods for Responsible Algorithm Use Guideline</title>
          <p>Previously, Mitchell et al. [52] proposed model cards for explaining Machine Learning (ML) models. The initial prototype of responsible algorithm use for journalism was designed by taking inspiration from this and other similar works described in [12, 42, 10, 20, 54, 52]. The final responsible algorithm use guideline is shown in Figure 2.</p>
          <p>Major Categories: The information for responsible algorithm/model use was organized in the following categories (blue text in Figure 2): Model/Algorithm Used, Parameters/Features, Tools/Editor, Programming Language and Code, Hardware, Verification, and Story Narrative Related. These categories were carefully assembled and informed by previous research in such a way that they cover all the algorithm-related details needed for the journalist replication task without being redundant [12, 54, 10].</p>
          <p>Algorithmic Explanation in the News Storytelling: Explanation regarding algorithm use is required to be documented in the aforementioned categories. For example, "WHAT" model/algorithm was used should be documented in the Model/Algorithm Used category; "WHAT" parameters were chosen and "WHY" should be documented in the Parameters/Features category; and "WHO" wrote the code, including code/data verification related information, should be described in the Programming Language and Code and Verification categories. These features cover the specific information needed to allow journalists to replicate data analysis done by others (or even by themselves, for later reference), so that journalists (and their teams) do not have to reproduce code blindly when checking existing data work. Factors related to the news story were included in Story Narrative Related, informed by the work of Kovach and Rosenstiel [10]; they consist of specific story-related facts such as quotes, names, and date-time information. All these factors collectively enable journalists to verify facts and numbers when an error goes unnoticed after publication, through thorough and careful documentation throughout the lifecycle of a story [28, 20].</p>
          <p>In the above paragraphs, I described the methods for the responsible dataset and algorithm use guidelines in the context of journalism. Responsible dataset practice has the ability to prevent or reveal unforeseen biases (e.g., pre-existing, emergent) in journalistic data work practices. Journalists (with caution and if they are willing) can take a certain level of accountability in their dataset use and attain users' trust through responsible dataset and algorithmic practices (with caution, by revealing what they know and how they know it).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion and Future Work</title>
      <p>As journalists become more reliant on data and algorithms, it is important that they become responsible dataset and algorithm users. Therefore, this work proposed a question-driven responsible dataset and algorithm documentation guideline to support journalists' replicability and reproducibility (R&amp;R) needs – as a way to facilitate transparent algorithmic practices in the news media. The proposed guideline requires journalists to document and/or summarize dataset- and algorithm-related information by answering several key questions regarding news storytelling. The questions were derived from and informed by relevant prior work in transparent journalism [12, 10, 19, 13, 20, 54] and data science, among others [50, 42, 32, 34, 33, 51, 58, 44, 52, 38]. The proposed responsible documentation guideline is specifically crafted for journalism (or journalists' team-internal use), but perhaps, with caution or upon request, it can be made available to citizens as well.</p>
      <p>There are several ways this work can be extended in the future. First, this work should be evaluated with journalists and other stakeholders to understand diverse (critical) user information needs [63] (e.g., what information is safe to reveal, and to whom). Second, the set of factors reported in the initial guideline, though meant to be exhaustive, is likely not, because in real-world journalism practice things might change due to various factors outside of data and algorithms (e.g., resource/timing constraints, legal issues, profit vs. nonprofit); as a result, new questions might emerge and add up. Lastly, as newsrooms are increasingly adopting automated news production, how the proposed method will scale is an open line of inquiry.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Acknowledgments</title>
      <p>I would like to thank the anonymous reviewers for their valuable comments and feedback.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[10] B. Kovach, T. Rosenstiel, The elements of journalism: What newspeople should know and the public should expect, Three Rivers Press (CA), 2014.</p>
      <p>[11] T. Aitamurto, M. Ananny, C. W. Anderson, L. Birnbaum, N. Diakopoulos, M. Hanson, J. Hullman, N. Ritchie, HCI for accurate, impartial and transparent journalism: Challenges and solutions, in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–8.</p>
      <p>[12] N. Diakopoulos, M. Koliska, Algorithmic transparency in the news media, Digital Journalism 5 (2017) 809–828.</p>
      <p>[13] K. McBride, T. Rosenstiel, The new ethics of journalism: Principles for the 21st century, CQ Press, 2013.</p>
      <p>[14] PolitiFact, The Poynter Institute, 2022. URL: https://www.politifact.com/, accessed: 2022-1-1.</p>
      <p>[15] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, 2016. URL: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, accessed: 2020-07-08.</p>
      <p>[16] S. U. Noble, Algorithms of oppression: How search engines reinforce racism, NYU Press, 2018.</p>
      <p>[17] I. Shapiro, Evaluating journalism: Towards an assessment framework for the practice of journalism, Journalism Practice 4 (2010) 143–162.</p>
      <p>[18] N. H. Riche, C. Hurter, N. Diakopoulos, S. Carpendale, Data-driven storytelling, CRC Press, 2018.</p>
      <p>[19] J. Gray, L. Chambers, L. Bounegru, The data journalism handbook: How journalists can use data to improve the news, O'Reilly Media, Inc., 2012.</p>
      <p>[20] F. Chevalier, M. Tory, B. Lee, J. van Wijk, G. Santucci, M. Dörk, J. Hullman, From analysis to communication: Supporting the lifecycle of a story, in: Data-Driven Storytelling, AK Peters/CRC Press, 2018, pp. 169–202.</p>
      <p>[21] H. De Burgh, Investigative journalism, Routledge, 2008.</p>
      <p>[22] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, arXiv preprint arXiv:2005.14165 (2020).</p>
      <p>[23] C.-G. Lindén, H. Tuulonen, A. Bäck, N. Diakopoulos, M. Granroth-Wilding, L. Haapanen, L. Leppänen, M. Melin, T. Moring, M. Munezero, et al., News automation: The rewards, risks and realities of 'machine journalism' (2019).</p>
      <p>[24] N. Diakopoulos, Automating the news, Harvard University Press, 2019.</p>
      <p>[25] M. Broussard, Artificial intelligence for investigative reporting: Using an expert system to enhance journalists' ability to discover original public affairs stories, Digital Journalism 3 (2015) 814–831.</p>
      <p>[26] A. Jain, B. Sharma, P. Choudhary, R. Sangave, W. Yang, Data-driven investigative journalism for connectas dataset, arXiv preprint arXiv:1804.08675 (2018).</p>
      <p>[27] D. Showkat, E. P. S. Baumer, Outliers: More than numbers? (2020).</p>
      <p>[28] D. Showkat, E. P. S. Baumer, Where do stories come from? Examining the exploration process in investigative data journalism, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–31.</p>
      <p>[29] B. Lee, M. Brehmer, P. Isenberg, E. K. Choe, R. Langner, R. Dachselt, Data visualization on mobile devices, in: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–8.</p>
      <p>[30] D. G. Johnson, N. Diakopoulos, What to do about deepfakes, Communications of the ACM 64 (2021) 33–35.</p>
      <p>[31] D. Showkat, Determining newcomers barrier in software development: An IT industry based investigation, in: Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2018, pp. 165–168.</p>
      <p>[32] I. Steinmacher, T. Conte, M. A. Gerosa, D. Redmiles, Social barriers faced by newcomers placing their first contribution in open source software projects, in: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work &amp; Social Computing, 2015, pp. 1379–1392.</p>
      <p>[33] A. X. Zhang, M. Muller, D. Wang, How do data science workers collaborate? Roles, workflows, and tools, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–23.</p>
      <p>[34] M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan, T. Erickson, How data science workers work with data: Discovery, capture, curation, design, creation, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–15.</p>
      <p>[35] A. Y. Wang, D. Wang, J. Drozdal, X. Liu, S. Park, S. Oney, C. Brooks, What makes a well-documented notebook? A case study of data scientists' documentation practices in Kaggle, in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.</p>
      <p>[36] N. Diakopoulos, Algorithmic accountability reporting: On the investigation of black boxes (2014).</p>
      <p>[37] K. Kirkpatrick, Putting the data science into journalism, 2015.</p>
      <p>[38] P. Guo, Data science workflow: Overview and challenges, October 30, 2013. URL: https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext, accessed: 2020-08-22.</p>
      <p>[39] P. Bradshaw, The inverted pyramid of data journalism, July 07, 2011. URL: https://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/, accessed: 2020-05-22.</p>
      <p>[40] E. Rader, K. Cotter, J. Cho, Explanations as mechanisms for supporting algorithmic transparency, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, 2018, p. 103.</p>
      <p>[41] T. Kulesza, M. Burnett, W.-K. Wong, S. Stumpf, Principles of explanatory debugging to personalize interactive machine learning, in: Proceedings of the 20th International Conference on Intelligent User Interfaces, 2015, pp. 126–137.</p>
      <p>[42] Q. V. Liao, M. Pribić, J. Han, S. Miller, D. Sow, Question-driven design process for explainable AI user experiences, arXiv preprint arXiv:2104.03483 (2021).</p>
      <p>[43] N. Sonboli, J. J. Smith, F. C. Berenfus, R. Burke, C. Fiesler, Fairness and transparency in recommendation: The users' perspective, arXiv preprint arXiv:2103.08786 (2021).</p>
      <p>[44] U. Ehsan, Q. V. Liao, M. Muller, M. O. Riedl, J. D. Weisz, Expanding explainability: Towards social transparency in AI systems, arXiv preprint arXiv:2101.04719 (2021).</p>
      <p>[45] H. Nori, S. Jenkins, P. Koch, R. Caruana, InterpretML: A unified framework for machine learning interpretability, arXiv preprint arXiv:1909.09223 (2019).</p>
      <p>[46] Microsoft, What is responsible machine learning? (preview), 2020. URL: https://docs.microsoft.com/en-us/azure/machine-learning/concept-responsible-ml, accessed: 2020-11-12.</p>
      <p>[47] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Computing Surveys (CSUR) 51 (2018) 1–42.</p>
      <p>[48] A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, M. Kankanhalli, Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–18.</p>
      <p>[49] H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, J. Wortman Vaughan, Interpreting interpretability: Understanding data scientists' use of interpretability tools for machine learning, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.</p>
      <p>[50] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, K. Crawford, Datasheets for datasets, arXiv preprint arXiv:1803.09010 (2018).</p>
      <p>[51] E. M. Bender, B. Friedman, Data statements for natural language processing: Toward mitigating system bias and enabling better science, Transactions of the Association for Computational Linguistics 6 (2018) 587–604.</p>
      <p>[52] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, Model cards for model reporting, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 220–229.</p>
      <p>[53] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp. 610–623.</p>
      <p>[54] N. Diakopoulos, Transparency, in: The Oxford Handbook of Ethics of AI, 2020.</p>
      <p>[55] C. D'Ignazio, L. F. Klein, Seven intersectional feminist principles for equitable and actionable COVID-19 data, Big Data &amp; Society 7 (2020) 2053951720942544.</p>
      <p>[56] M. Kogan, A. Halfaker, S. Guha, C. Aragon, M. Muller, S. Geiger, Mapping out human-centered data science: Methods, approaches, and best practices, in: Companion of the 2020 ACM International Conference on Supporting Group Work, 2020, pp. 151–156.</p>
      <p>[57] M. Vaismoradi, J. Jones, H. Turunen, S. Snelgrove, Theme development in qualitative content analysis and thematic analysis (2016).</p>
      <p>[58] E. M. Bender, A typology of ethical risks in language technology with an eye towards where transparent documentation can help, in: Future of Artificial Intelligence: Language, Ethics, Technology Workshop, 2019.</p>
      <p>[59] N. Diakopoulos, Ethics in data-driven visual storytelling, in: Data-Driven Storytelling, AK Peters/CRC Press, 2018, pp. 233–248.</p>
      <p>[60] S. Holland, A. Hosny, S. Newman, J. Joseph, K. Chmielinski, The dataset nutrition label: A framework to drive higher data quality standards, arXiv preprint arXiv:1805.03677 (2018).</p>
      <p>[61] B. Lee, N. H. Riche, P. Isenberg, S. Carpendale, More than telling a story: Transforming data into visually shared stories, IEEE Computer Graphics and Applications 35 (2015) 84–90.</p>
      <p>[62] M. Broussard, Big data in practice: Enabling computational journalism through code-sharing and reproducible research methods, Digital Journalism 4 (2016) 266–279.</p>
      <p>[63] J. Kemper, D. Kolkman, Transparent to whom? No algorithmic accountability without a critical audience, Information, Communication &amp; Society 22 (2019) 2081–2096.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] C. D'Ignazio, L. F. Klein, Data feminism, MIT Press, 2020.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Christin, Algorithms in practice: Comparing web journalism and criminal justice, Big Data &amp; Society 4 (2017) 2053951717718855.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] V. Eubanks, Automating inequality: How high-tech tools profile, police, and punish the poor, St. Martin's Press, 2018.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] C. O'Neil, Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, 2016.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] E. Raff, A step toward quantifying independently reproducible machine learning research, Advances in Neural Information Processing Systems 32 (2019) 5485–5495.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Feinberg, W. Sutherland, S. B. Nelson, M. H. Jarrahi, A. Rajasekar, The new reality of reproducibility: The role of data work in scientific research, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–22.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Haibe-Kains, G. A. Adam, A. Hosny, F. Khodakarami, L. Waldron, B. Wang, C. McIntosh, A. Goldenberg, A. Kundaje, C. S. Greene, et al., Transparency and reproducibility in artificial intelligence, Nature 586 (2020) E14–E16.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Chattopadhyay, I. Prasad, A. Z. Henley, et al., What's wrong with computational notebooks? Pain points, needs, and design opportunities, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–12.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. A. Tullis, B. Kar, Where is the provenance? Ethical replicability and reproducibility in GIScience and its critical applications, Annals of the American Association of Geographers 111 (2021) 1318–1328.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] B. Kovach, T. Rosenstiel, The elements of journalism.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>