=Paper= {{Paper |id=Vol-3124/paper18 |storemode=property |title=Supporting Responsible Data and Algorithmic Practices in The News Media |pdfUrl=https://ceur-ws.org/Vol-3124/paper18.pdf |volume=Vol-3124 |authors=Dilruba Showkat |dblpUrl=https://dblp.org/rec/conf/iui/Showkat22 }}
Supporting Responsible Data and Algorithmic Practices
in The News Media
Dilruba Showkat
Northeastern University, Boston, MA, USA


                                         Abstract
The journalism discipline has become more data- and algorithm-driven than ever before. While the need for transparent algorithmic practices in journalism is widely known, less is known about how to achieve it in practice. As a result, journalists often face challenges with Replicability and Reproducibility (R&R) tasks, both within the team and when checking others' data work. Journalists can be supported in practicing transparency by providing explicit information about sources and methodologies, that is, by being responsible dataset and algorithm users both within and outside of the organization. In this work, as a case study, I present the first responsible dataset and responsible algorithm practices specifically crafted for the domain of journalism, as a step towards motivating and supporting transparent algorithmic practices using a question-driven documentation technique. The outcome of this study is open to critique, adoption, adaptation, and future exploration.

                                         Keywords
                                         responsible journalism, transparent journalism, replicability and reproducibility


1. Introduction

Algorithms are widely used in a variety of application domains, ranging from the public and private sectors, healthcare, and automated hiring systems to the criminal justice system. Sometimes these algorithms inherit, reproduce, or even amplify biases against marginalized populations, causing a lack of user trust in these systems [1, 2, 3, 4]. Moreover, "models are opinions embedded in mathematics" [4, p.27]: they enable us to focus only on the outcome, predictor variables, and validation data while avoiding anything that promotes an understanding of situations or context [3]. This is problematic; as a result, there is growing interest in designing transparent algorithmic systems that make algorithmic decision making and its context more accessible. In a similar vein, there is an increasing focus on producing replicable and reproducible work in Machine Learning (ML) research, data science, and the healthcare domain, among others [5, 6, 7, 8, 9]. Reproducibility also plays a critical role in journalism (e.g., provenance) [10].

Likewise, the demand for transparent journalism has existed for a long time [10, 11]: journalists are expected to describe what data sources they have used, revealing subjects and data analysis methodology, for verification and reproducibility purposes. "The essence of journalism is the discipline of verification" [10, p.79]. Several limitations often make it impractical to implement transparent journalism in practice, for instance, misuse of transparency technology through gaming or manipulation [12], information overload, and others (e.g., cost, presentation). Furthermore, fact-checking tools (such as PolitiFact) [13, 14] are not informative enough to support journalists' replication tasks in terms of data and algorithmic analysis. Replicability and Reproducibility (R&R) also play a significant role in journalism in ensuring that journalistic processes are free from biases [15, 16] and that the data they put out in the world is accurate, since "journalism's first obligation is to the truth" [17, 10]. There is limited research in this space that supports reproducibility tasks within a journalism team. Thus, in this research, building on prior work, I provide a set of question-driven documentation guidelines to support responsible dataset and algorithm use within a journalism team. This work provides implications for making news-story-related information (with caution) also available to the public, and implications for related technology design interventions in the journalism context.

Joint Proceedings of the ACM IUI 2022 Workshops, March 2022, Helsinki, Finland
showkat.d@northeastern.edu (D. Showkat)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org

2. Related Work

2.1. Data and Algorithmic Practices in Journalism

As data becomes readily available, news organizations are increasingly becoming more data-driven than ever
before [13, 18]. Journalists work with a variety of datasets and data types (e.g., text, tables, numerical, and categorical data [19, 20]) in news storytelling. They use public datasets such as Medicare or housing datasets [19], and they also collect data from other sources, such as through interviews, surveys, and public websites, using various tools and APIs (e.g., ArcGIS). Using public record requests is also very common among them [10, 21]. While some stories are based on a single dataset, others are based on multiple datasets. Journalists can chase original stories, or they might choose to build off of others' work.

When it comes to algorithms, journalists apply a wide range of them, from simple statistical tests (e.g., ANOVA, t-tests) to advanced Machine Learning (ML) algorithms (e.g., regression, classification, unsupervised ML) for data analysis [19, 20]. Furthermore, recent work also shows that news is often co-produced using automated tools (e.g., ones that use natural language generation and large models [22]) alongside humans [23, 24], especially to improve efficiency and production. Needless to say, "transparency is key. Being able to explain how the stories are created is relevant both in-house and audiences" [23, p.7]. Previous work also suggests that journalists rely on outlier detection for story idea generation [25, 26, 27, 28]; others report applying simple spreadsheet manipulation for similar tasks. Regardless of the data analysis methodology or algorithms applied, journalists often perform verification, that is, checking others' data work (verifying either a teammate's work and/or data/charts published by other news organizations) to ensure correctness [29, 19].

Verification enables journalists to ensure that the data they put out in the world is accurate [10, 30]. Verification often depends on journalists remembering things, such as the operation (e.g., min, sum) performed (asking a teammate), and journalists have no way of doing this in a way that is reproducible (for others, or even for themselves after a while) without clear documentation [6, 31]. Verification is also challenging even within a team, since people often forget to document methods or how they arrived at a particular result. Similar to the field of data science (and other related areas [32, 33, 34, 35]), the lack of proper documentation is a common problem in code replication tasks for journalists [20].

Even though journalists make wide use of data and algorithms in their day-to-day news production, very limited work has explored documentation techniques to make journalists' data and algorithmic practices transparent [20, 19, 36]. Previous work also showed a close resemblance (also through systematic empirical investigation [28, 27]) between data science and data journalism [37, 20, 38, 39] work practices [34]. Therefore, this work takes inspiration from various explainable methods available in data science and other related areas to propose a responsible data and algorithm practice for journalism that will improve team communication and support transparent journalism.

2.2. Current Trend Towards Explanation

Previous studies show that explaining the how/what/why aspects of Facebook newsfeed algorithms enhances users' awareness of how the system works in the context of social media applications [40]. Prior research has also examined explainability in specific domains [41, 42]. For example, Liao et al. [42] applied a question-driven method to facilitate explainable AI user experience (XAI UX) in the (adverse) healthcare domain. Existing work has also studied fairness and transparency in recommender systems [43]. Others have explored the socio-organizational context of explainability [44]. Needless to say, a vast amount of technical work exists to support technical expert users' (e.g., ML engineers, data scientists) explainability needs for explaining black-box and white-box models [45, 46, 47, 48]. As the evidence suggests, less attention has been paid to supporting transparency and explainability needs in the journalism context. Although journalists apply a wide range of data and algorithmic tools in news production [28], applying existing explainable AI (XAI) techniques (even though sophisticated, e.g., [45]) may require prior knowledge of ML or may not be easily adaptable [49] to the news storytelling context. Additionally, in the case of diverse teams with differing technical skills (ML users vs. SQL users), using these tools across the team might require extra learning support. As a result, a less technical approach may be suitable for R&R needs in journalism.

Inspired by related work both in data science (documentation-based approaches) [50, 51, 42, 44, 52, 1, 53] and transparent data journalism [12, 10, 54, 20, 37, 19, 13], this work applies a qualitative question-driven documentation technique [42] to support algorithmic transparency in journalism. This approach requires journalists to provide specific data and algorithmic details about news stories by specifying what/why/how/who/when information at different levels (e.g., individual, organizational, team) of the journalistic decision-making process to provide richer context [1, 55, 56].
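The what/why/how/who/when documentation technique can be pictured as a lightweight provenance log kept alongside a story's analysis. The sketch below is purely illustrative and mine, not an artifact from any cited work: the `StoryLog` helper, its field names, and the example entries are all hypothetical. It shows how recording each step (such as the min-vs-sum operation a teammate would otherwise have to recall from memory) could make the analysis replicable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class StepRecord:
    """One documented step in a story's data analysis (hypothetical schema)."""
    what: str  # operation performed, e.g. "sum of payments per provider"
    why: str   # reason this step was taken
    how: str   # tool or code used
    who: str   # journalist or role responsible
    # "when" is filled in automatically at documentation time
    when: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

class StoryLog:
    """Append-only log a teammate can read to replicate the analysis."""
    def __init__(self, story: str):
        self.story = story
        self.steps: List[StepRecord] = []

    def record(self, **kwargs) -> StepRecord:
        step = StepRecord(**kwargs)
        self.steps.append(step)
        return step

# Hypothetical usage for a dataset mentioned in the text:
log = StoryLog("Medicare spending story")
log.record(
    what="aggregated payments per provider (sum, not min)",
    why="the story compares total spending across states",
    how="pandas groupby on the cleaned CSV",
    who="data analyst",
)
```

Because every record answers the same five questions, a reviewer can check each step without asking the original author, which is exactly the failure mode described above.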
3. Transparent Data and Algorithmic Practices for Journalism

This work facilitates journalists in properly documenting and contextualizing data and algorithmic decision making in news storytelling, to support the practice of algorithmic transparency. This was achieved through an extensive review of relevant prior work in journalism, data science, and other related areas.

At a high methodological level, first, factors relevant to dataset/algorithm use in the journalism domain were categorized using content analysis [57] after synthesizing across prior work (similar to the factors listed in [12]); second, those factors were translated into question-driven explanations (e.g., How, What, Why) following prior work such as [42] and others [40, 44]. Specific details of the methods and processes are provided below.

3.1. Responsible Dataset Practice

The following is a description of the methods that were used to derive the first question-driven responsible dataset guidelines for journalism (see Figure 1 for detail).

3.1.1. Methods for Responsible Dataset Guideline

The proposed responsible dataset use guideline was heavily inspired by and built upon previous work described in [50, 58, 51, 12, 10, 42], and adapted specifically for use in journalism. More precisely, Bender and Friedman [51] proposed data statements for text data (though they can be applied more broadly) to alleviate bias and exclusion against certain groups of people in Natural Language Processing (NLP) technology. Gebru et al. [50] developed datasheets for datasets, a documentation practice to enable accountability and transparency among dataset creators and consumers in the ML community. Diakopoulos and Koliska [12] proposed several factors important for achieving algorithmic transparency in the news media. In this work, I have adapted, refined, and integrated these data documentation practices for journalism. The final prototype is shown in Figure 1, and the specific feature selection criteria are described below.

Major Categories: Consistent with prior work described in Gebru et al. [50], journalists are required to document information for each of the major categories (blue text in Figure 1): Motivation, Composition, Collection, Preprocessing, Uses, Distribution, and Maintenance. To ensure exhaustiveness and thorough characterization of the datasets (e.g., campaign finance, crime investigation [15]), factors in each of the aforementioned categories are further updated based on work in transparent journalism [12, 10, 54, 59] and data science [51, 42, 44, 60], due to their relatedness in data work practices [37, 19, 38, 20]. Furthermore, factors related to data reported in Diakopoulos and Koliska [12] are carefully incorporated into each of the categories where they logically make sense. For simplicity, I show the factors for only two major categories, as follows:

    • Composition includes the following factors: attribute/feature definitions and descriptions, labeled/unlabeled data, data format (e.g., mp4, csv), sample size, missing data/completeness, data category (e.g., healthcare), data language (e.g., en-us), train/test split, raw vs. cleaned data, errors/redundancy, and descriptions of sensitive/anonymous/ground truth data.

    • Preprocessing includes the following factors: which data was discarded, and why? Which tools were used, or was the work done manually? Manual/automated labeling and process, annotator/curator demographics (race, class, gender), data transformation, and bias handling.

These factors were then carefully converted into explanation questions. Dataset characterization questions for each of these categories were directly incorporated into the guideline.

Different Journalism Roles and Demographics: Following the work of Bender and Friedman [51], the proposed guideline also requires journalists to document important demographic features (e.g., age, gender, class) for different journalism roles (e.g., data annotator, speaker, data curator, data collector, scripter, editor, data analyst, presenter, director) to provide transparency against inadvertent biases. These roles' definitions are informed by and combined from prior research in [51, 20, 61] to cover a broad range of journalism roles. Some of these roles may have overlapping (data analysis) functions and responsibilities across different organizations [28].

Dataset Explanation in News Storytelling: In the proposed guideline, journalists should provide context for any dataset used by documenting answers to Who, What, When, Why, and How [40, 44, 1] questions. For example, journalists are asked to provide the context associated with a particular dataset in the Motivation category. Together with demographic information across different journalism roles and subjects, this makes it easy to demystify "WHY" a certain dataset was used in a story. This can also provide an indication of any pre-existing biases that have gone unnoticed. "WHO" worked on the story, whether an individual, organization, or team, can be found by combining information from the Motivation, Collection, Preprocessing, Uses, and Maintenance categories. The "WHAT" aspect, that is, feature descriptions and other related information for any dataset, is covered in the Composition, Collection, and Preprocessing categories. Similarly, "WHEN" information is tracked through Maintenance. The "HOW" aspect of a dataset is included in the Collection and Preprocessing categories. Please note that the How, What, Why, Who, and When aspects of a dataset in these categories may not be mutually exclusive; however, they provide all the factors necessary (to the best of my knowledge) for responsible R&R dataset practice. These pieces of information collectively provide sufficient context and insights from individual and organizational decision-making perspectives in news storytelling [44, 1].

The proposed responsible dataset prototype (Figure 1) is dataset-type (e.g., healthcare, finance, housing) agnostic, meaning that journalists can describe any dataset type used in a story with the help of this prototype. Journalists must also conform to the privacy and anonymity of their news sources, such as anonymous data sources [10, 59]. It is also important to note that all personal demographics should be published only after receiving user consent [54]. Describing and characterizing algorithmic information together with datasets will further facilitate journalists' verification [10] and R&R needs [6, 51, 62], as discussed below.

Figure 1: Responsible Dataset Use Guideline Questions for journalism.

3.2. Responsible Algorithmic Use Practice

The methods used for developing the responsible algorithm use guideline to support journalists' verification needs are provided below.

3.2.1. Methods for Responsible Algorithm Use Guideline

Previously, Mitchell et al. [52] proposed model cards for explaining Machine Learning (ML) models. The initial prototype of responsible algorithm use for journalism is designed by taking inspiration from this and other similar works described in [12, 42, 10, 20, 54, 52]. The final responsible algorithm use guideline is shown in Figure 2.

Major Categories: The information for responsible algorithm/model use was organized into the following categories (blue text in Figure 2): Model/Algorithm Used, Parameters/Features, Tools/Editor, Programming Language and Code, Hardware, Verification, and Story Narrative Related. These categories were carefully assembled and informed by previous research in such a way that they cover all the algorithm-related details needed for journalists' replication tasks without being redundant [12, 54, 10].

Algorithmic Explanation in News Storytelling: Explanations regarding algorithm use are required to be documented in the aforementioned categories. For example, "WHAT" model/algorithm was used should be documented in the Model/Algorithm Used category; "WHAT" parameters were chosen and "WHY" should be documented in the Parameters/Features category; and "WHO" wrote the code, including code/data verification-related information, should be described in the Programming Language and Code and Verification categories. These features cover the specific information that allows journalists to replicate data analysis done by others (or by themselves, for later reference), ensuring that journalists (and their teams) do not have to reproduce code blindly when checking existing data work. Factors related to the news story itself are included in Story Narrative Related, informed by the work of Kovach and Rosenstiel [10], and consist of specific story-related facts such as quotes, names, and date-time information. All these factors collectively enable journalists to verify facts and numbers when an error goes unnoticed after publication, through thorough and careful documentation throughout the lifecycle of a story [28, 20].

In the above paragraphs, I described the methods for the responsible dataset and algorithm use guidelines in the context of journalism. Responsible dataset practice has the ability to prevent or reveal unforeseen biases (e.g., pre-existing, emergent) in journalistic data work practices. Journalists (with caution, and if they are willing) can take a certain level of accountability for their dataset use and attain users' trust through responsible dataset and algorithmic practices (with caution, by revealing what they know and how they know it).

4. Conclusion and Future Work

As journalists become more reliant on data and algorithms, it is important that they become responsible dataset and algorithm users. Therefore, this work proposed a question-driven responsible dataset and algorithm documentation guideline to support journalists' replicability and reproducibility (R&R) needs, as a way to facilitate transparent algorithmic practices in the news media. The proposed guideline requires journalists to document and/or summarize dataset- and algorithm-related information by answering several key questions regarding news storytelling. The questions were derived from and informed by relevant prior work in transparent journalism [12, 10, 19, 13, 20, 54] and data science, among others [50, 42, 32, 34, 33, 51, 58, 44, 52, 38]. The proposed responsible documentation guideline is specifically crafted for journalism (or journalist teams' internal use), but perhaps, with caution or upon request, it can be made available to citizens as well.

There are several ways this work can be extended in the future. First, this work should be evaluated with journalists and other stakeholders to understand diverse (critical) user information needs [63] (e.g., what information is safe to reveal, and to whom). Second, the set of factors reported in the initial guideline was meant to be exhaustive, but it likely is not, because in real-world journalism practice things might change due to various factors outside of data and algorithms (e.g., resource/timing constraints, legal issues, profit vs. non-profit); as a result, new questions might emerge and be added. Lastly, as newsrooms increasingly adopt automated news production, how the proposed method will scale is an open line of inquiry.

5. Acknowledgments

I would like to thank the anonymous reviewers for their valuable comments and feedback.
Figure 2: Responsible Algorithm Use Guideline to facilitate verification and reproducibility in News Storytelling.
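The two guidelines can also be read as machine-checkable checklists. The rendering below is my own illustrative sketch, not part of the proposed prototype: the category names follow the guidelines described above, but the sample questions are examples rather than the full question sets, and the `unanswered` helper is hypothetical.

```python
# Hypothetical machine-readable rendering of the two guidelines.
# Category names follow the text; the sample questions are illustrative only.
RESPONSIBLE_DATASET_GUIDELINE = {
    "Motivation": ["Why was this dataset chosen for the story?"],
    "Composition": [
        "What are the attributes/features and their definitions?",
        "What is the sample size, and how much data is missing?",
        "Is any data sensitive, anonymized, or ground truth?",
    ],
    "Collection": ["How and from whom was the data collected?"],
    "Preprocessing": [
        "Which data was discarded, and why?",
        "Who labeled the data, and what are the annotators' demographics?",
    ],
    "Uses": ["What has this dataset been used for before?"],
    "Distribution": ["With whom may the dataset be shared?"],
    "Maintenance": ["When was the dataset last updated, and by whom?"],
}

RESPONSIBLE_ALGORITHM_GUIDELINE = {
    "Model/Algorithm Used": ["What model or statistical test was applied?"],
    "Parameters/Features": ["What parameters were chosen, and why?"],
    "Tools/Editor": ["Which tools or editors were used?"],
    "Programming Language and Code": ["Who wrote the code, and where is it kept?"],
    "Hardware": ["On what hardware was the analysis run?"],
    "Verification": ["Who checked the data work, and how?"],
    "Story Narrative Related": ["Which quotes, names, and dates does the analysis support?"],
}

def unanswered(guideline: dict, answers: dict) -> list:
    """Return the guideline questions a story's documentation has not yet answered."""
    return [q for qs in guideline.values() for q in qs if q not in answers]

# A partially documented story immediately surfaces its open questions:
answers = {"What model or statistical test was applied?": "ANOVA"}
open_questions = unanswered(RESPONSIBLE_ALGORITHM_GUIDELINE, answers)
```

A structure like this would let a team flag undocumented categories automatically before publication, rather than discovering gaps during verification.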



References

[1] C. D'Ignazio, L. F. Klein, Data feminism, MIT Press, 2020.
[2] A. Christin, Algorithms in practice: Comparing web journalism and criminal justice, Big Data & Society 4 (2017) 2053951717718855.
[3] V. Eubanks, Automating inequality: How high-tech tools profile, police, and punish the poor, St. Martin's Press, 2018.
[4] C. O'Neil, Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, 2016.
[5] E. Raff, A step toward quantifying independently reproducible machine learning research, Advances in Neural Information Processing Systems 32 (2019) 5485–5495.
[6] M. Feinberg, W. Sutherland, S. B. Nelson, M. H. Jarrahi, A. Rajasekar, The new reality of reproducibility: The role of data work in scientific research, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–22.
[7] B. Haibe-Kains, G. A. Adam, A. Hosny, F. Khodakarami, L. Waldron, B. Wang, C. McIntosh, A. Goldenberg, A. Kundaje, C. S. Greene, et al., Transparency and reproducibility in artificial intelligence, Nature 586 (2020) E14–E16.
[8] S. Chattopadhyay, I. Prasad, A. Z. Henley, A. Sarma, T. Barik, What's wrong with computational notebooks? Pain points, needs, and design opportunities, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–12.
[9] J. A. Tullis, B. Kar, Where is the provenance? Ethical replicability and reproducibility in GIScience and its critical applications, Annals of the American Association of Geographers 111 (2021) 1318–1328.
[10] B. Kovach, T. Rosenstiel, The elements of journalism: What newspeople should know and the public should expect, Three Rivers Press (CA), 2014.
[11] T. Aitamurto, M. Ananny, C. W. Anderson, L. Birnbaum, N. Diakopoulos, M. Hanson, J. Hullman, N. Ritchie, HCI for accurate, impartial and transparent journalism: Challenges and solutions, in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–8.
[12] N. Diakopoulos, M. Koliska, Algorithmic transparency in the news media, Digital Journalism 5 (2017) 809–828.
[13] K. McBride, T. Rosenstiel, The new ethics of journalism: Principles for the 21st century, CQ Press, 2013.
[14] PolitiFact: The Poynter Institute, 2022. URL: https://www.politifact.com/, accessed: 2022-1-1.
[15] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, 2016. URL: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, accessed: 2020-07-08.
[16] S. U. Noble, Algorithms of oppression: How search engines reinforce racism, NYU Press, 2018.
[17] I. Shapiro, Evaluating journalism: Towards an assessment framework for the practice of journalism, Journalism Practice 4 (2010) 143–162.
[18] N. H. Riche, C. Hurter, N. Diakopoulos, S. Carpendale, Data-driven storytelling, CRC Press, 2018.
[19] J. Gray, L. Chambers, L. Bounegru, The data journalism handbook: How journalists can use data to improve the news, O'Reilly Media, Inc., 2012.
[20] F. Chevalier, M. Tory, B. Lee, J. van Wijk, G. Santucci, M. Dörk, J. Hullman, From analysis to communication: Supporting the lifecycle of a story, in: Data-Driven Storytelling, AK Peters/CRC Press, 2018, pp. 169–202.
[21] H. De Burgh, Investigative journalism, Routledge, 2008.
[22] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, arXiv preprint arXiv:2005.14165 (2020).
[23] C.-G. Lindén, H. Tuulonen, A. Bäck, N. Diakopoulos, M. Granroth-Wilding, L. Haapanen, L. Leppänen, M. Melin, T. Moring, M. Munezero, et al., News automation: The rewards, risks and realities of 'machine journalism' (2019).
[24] N. Diakopoulos, Automating the news, Harvard University Press, 2019.
[25] M. Broussard, Artificial intelligence for investigative reporting: Using an expert system to enhance journalists' ability to discover original public affairs stories, Digital Journalism 3 (2015) 814–831.
[26] A. Jain, B. Sharma, P. Choudhary, R. Sangave, W. Yang, Data-driven investigative journalism for connectas dataset, arXiv preprint arXiv:1804.08675 (2018).
[27] D. Showkat, E. P. S. Baumer, Outliers: More than numbers? (2020).
[28] D. Showkat, E. P. S. Baumer, Where do stories come from? Examining the exploration process in investigative data journalism, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–31.
[29] B. Lee, M. Brehmer, P. Isenberg, E. K. Choe, R. Langner, R. Dachselt, Data visualization on mobile devices, in: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–8.
[30] D. G. Johnson, N. Diakopoulos, What to do about deepfakes, Communications of the ACM 64 (2021) 33–35.
[31] D. Showkat, Determining newcomers' barriers in software development: An IT industry based investigation, in: Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2018, pp. 165–168.
[32] I. Steinmacher, T. Conte, M. A. Gerosa, D. Redmiles, Social barriers faced by newcomers placing their first contribution in open source software projects, in: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, 2015, pp. 1379–1392.
[33] A. X. Zhang, M. Muller, D. Wang, How do data science workers collaborate? Roles, workflows, and tools, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–23.
[34] M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan, T. Erickson, How data science workers work with data: Discovery, capture, curation, design, creation, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–15.
[35] A. Y. Wang, D. Wang, J. Drozdal, X. Liu, S. Park, S. Oney, C. Brooks, What makes a well-documented notebook? A case study of data scientists' documentation practices in Kaggle, in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.
[36] N. Diakopoulos, Algorithmic accountability reporting: On the investigation of black boxes (2014).
[37] K. Kirkpatrick, Putting the data science into journalism, 2015.
[38] P. Guo, Data science workflow: Overview
puting systems, 2018, pp. 1–18.
[49] H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, J. Wortman Vaughan, Interpreting interpretability: Understanding data scientists' use of
     and challenges, OCTOBER 30, 2013. URL:                   interpretability tools for machine learning, in:
     https://cacm.acm.org/blogs/blog-cacm/                    Proceedings of the 2020 CHI Conference on Hu-
     169199-data-science-workflow-overview-and-challenges/    man Factors in Computing Systems, 2020, pp. 1–
     fulltext, accessed: 2020-08-22.                          14.
[39] P. Bradshaw, The inverted pyramid of [50] T. Gebru, J. Morgenstern, B. Vecchione, J. W.
     data journalism, JULY 07, 2011. URL:                     Vaughan, H. Wallach, H. Daumé III, K. Craw-
     https://onlinejournalismblog.com/2011/07/                ford, Datasheets for datasets, arXiv preprint
     07/the-inverted-pyramid-of-data-journalism/,             arXiv:1803.09010 (2018).
     accessed: 2020-05-22.                               [51] E. M. Bender, B. Friedman, Data statements for
[40] E. Rader, K. Cotter, J. Cho, Explanations as mech-       natural language processing: Toward mitigating
     anisms for supporting algorithmic transparency,          system bias and enabling better science, Transac-
     in: Proceedings of the 2018 CHI Conference on            tions of the Association for Computational Lin-
     Human Factors in Computing Systems, ACM,                 guistics 6 (2018) 587–604.
     2018, p. 103.                                       [52] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes,
[41] T. Kulesza, M. Burnett, W.-K. Wong, S. Stumpf,           L. Vasserman, B. Hutchinson, E. Spitzer, I. D.
     Principles of explanatory debugging to person-           Raji, T. Gebru, Model cards for model report-
     alize interactive machine learning, in: Proceed-         ing, in: Proceedings of the conference on fair-
     ings of the 20th international conference on in-         ness, accountability, and transparency, 2019, pp.
     telligent user interfaces, 2015, pp. 126–137.            220–229.
[42] Q. V. Liao, M. Pribić, J. Han, S. Miller, D. Sow, [53] E. M. Bender, T. Gebru, A. McMillan-Major,
     Question-driven design process for explain-              S. Shmitchell, On the dangers of stochastic par-
     able ai user experiences,           arXiv preprint       rots: Can language models be too big?, in: Pro-
     arXiv:2104.03483 (2021).                                 ceedings of the 2021 ACM Conference on Fair-
[43] N. Sonboli, J. J. Smith, F. C. Berenfus, R. Burke,       ness, Accountability, and Transparency, 2021, pp.
     C. Fiesler, Fairness and transparency in rec-            610–623.
     ommendation: The users’ perspective, arXiv [54] N. Diakopoulos, Transparency, in: The Oxford
     preprint arXiv:2103.08786 (2021).                        Handbook of Ethics of AI, 2020.
[44] U. Ehsan, Q. V. Liao, M. Muller, M. O. Riedl, J. D. [55] C. D’Ignazio, L. F. Klein, Seven intersectional
     Weisz, Expanding explainability: Towards so-             feminist principles for equitable and actionable
     cial transparency in ai systems, arXiv preprint          covid-19 data, Big data & society 7 (2020)
     arXiv:2101.04719 (2021).                                 2053951720942544.
[45] H. Nori, S. Jenkins, P. Koch, R. Caruana, [56] M. Kogan, A. Halfaker, S. Guha, C. Aragon,
     Interpretml: A unified framework for ma-                 M. Muller, S. Geiger, Mapping out human-
     chine learning interpretability, arXiv preprint          centered data science: Methods, approaches, and
     arXiv:1909.09223 (2019).                                 best practices, in: Companion of the 2020 ACM
[46] M. 2020,         What is responsible ma-                 International Conference on Supporting Group
     chine learning?           (preview), 2020. URL:          Work, 2020, pp. 151–156.
     https://docs.microsoft.com/en-us/azure/             [57] M. Vaismoradi, J. Jones, H. Turunen, S. Snelgrove,
     machine-learning/concept-responsible-ml,                 Theme development in qualitative content anal-
     accessed: 2020-11-12.                                    ysis and thematic analysis (2016).
[47] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, [58] E. M. Bender, A typology of ethical risks in
     F. Giannotti, D. Pedreschi, A survey of methods          language technology with an eye towards where
     for explaining black box models, ACM comput-             transparent documentation can help, in: Future
     ing surveys (CSUR) 51 (2018) 1–42.                       of Artificial Intelligence: Language, Ethics, Tech-
[48] A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim,              nology Workshop, 2019.
     M. Kankanhalli, Trends and trajectories for ex- [59] N. Diakopoulos, Ethics in data-driven visual sto-
     plainable, accountable and intelligible systems:         rytelling, in: Data-Driven Storytelling, AK Pe-
     An hci research agenda, in: Proceedings of the           ters/CRC Press, 2018, pp. 233–248.
     2018 CHI conference on human factors in com- [60] S. Holland, A. Hosny, S. Newman, J. Joseph,
     K. Chmielinski, The dataset nutrition label: A
     framework to drive higher data quality stan-
     dards, arXiv preprint arXiv:1805.03677 (2018).
[61] B. Lee, N. H. Riche, P. Isenberg, S. Carpendale,
     More than telling a story: Transforming data into
     visually shared stories, IEEE computer graphics
     and applications 35 (2015) 84–90.
[62] M. Broussard, Big data in practice: Enabling com-
     putational journalism through code-sharing and
     reproducible research methods, Digital Journal-
     ism 4 (2016) 266–279.
[63] J. Kemper, D. Kolkman, Transparent to whom?
     no algorithmic accountability without a critical
     audience, Information, Communication & Soci-
     ety 22 (2019) 2081–2096.