<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Joint Proceedings of the ACM IUI 2022 Workshops</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Supporting Responsible Data and Algorithmic Practices in The News Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dilruba Showkat</string-name>
          <email>showkat.d@northeastern.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Proceedings of the ACM IUI 2022 Workshops</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Northeastern University</institution>
          ,
          <addr-line>Boston, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
<p>The journalism discipline has become more data- and algorithm-driven than ever before. While the need for transparent algorithmic practices in journalism is widely known, less is known about how to go about it in practice. As a result, journalists often face challenges associated with Replicability and Reproducibility (R&amp;R) tasks, both within the team and when checking others' data work. Journalists can be supported in practicing transparency by providing explicit information about their sources and methodologies – by being responsible dataset and algorithm users both within and outside of the organization. In this work, as a case study, I present the very first responsible dataset and responsible algorithm practices specifically crafted for the domain of journalism, as a step towards motivating and supporting transparent algorithmic practices using a question-driven documentation technique. The outcome of this study is open to critique, adoption, adaptation, and future exploration.</p>
      </abstract>
      <kwd-group>
<kwd>responsible journalism</kwd>
        <kwd>transparent journalism</kwd>
        <kwd>replicability and reproducibility</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Algorithms are widely used in a variety of application domains, ranging from the public and private sector, healthcare, and automated hiring systems to the criminal justice system. Sometimes these algorithms inherit, reproduce, or even enhance biases against marginalized populations, causing a lack of users' trust in these systems [1, 2, 3, 4]. Moreover, "models are opinions embedded in mathematics" [4, p.27]; they enable us to focus on only the outcome, predictor variables, and validation data while avoiding anything that promotes an understanding of situations or context [3]. This is problematic; as a result, there is a growing interest in the design of transparent algorithmic systems to make algorithmic decision making and its context more accessible. In a similar vein, there is an increasing focus on producing replicable and reproducible work in Machine Learning (ML) research, data science, and the healthcare domain, among others [5, 6, 7, 8, 9]. Reproducibility also plays a critical role in journalism (e.g., provenance) [10].</p>
      <p>Likewise, the demand for transparent journalism has existed for a long time [10, 11], where journalists are expected to describe what data sources they have used, revealing subjects and data analysis methodology, for verification and reproducibility purposes. "The essence of journalism is the discipline of verification" [10, p.79].</p>
      <p>There are several limitations which often make it impractical to implement transparent journalism in practice, for instance, misuse of transparent technology through gaming or manipulation [12], information overload, and others (e.g., cost, presentation). Furthermore, fact-checking tools (such as PolitiFact) [13, 14] are not informative enough to support journalists' replication tasks in terms of data and algorithmic analysis. Replicability and Reproducibility (R&amp;R) also play a significant role in journalism, to make sure that journalistic processes are free from biases [15, 16] and the data they put out in the world is accurate – since "journalism's first obligation is to the truth" [17, 10]. There is limited research in this space that supports reproducibility tasks within the journalism team. Thus, in this research, building on prior work, I will provide a set of question-driven documentation guidelines to support responsible dataset and algorithm use within a journalism team. This work provides implications for making news story related information (with caution) also available to the public, and implications for related technology design interventions in the journalism context.</p>
    </sec>
    <sec id="sec-1b">
      <title>2. Related Work</title>
      <sec id="sec-1b-1">
        <title>2.1. Data and Algorithmic Practices in Journalism</title>
        <p>As data becomes readily available, news organizations are increasingly becoming more data-driven than ever before [13, 18]. Journalists work with a variety of datasets and data types (e.g., text, tables, numerical, categorical data [19, 20]) in news storytelling. They use public datasets such as Medicare or Housing datasets [19], and they also collect data from other sources, such as through interviews, surveys, and public websites, using various tools and APIs (e.g., ArcGIS). Using public record requests is also very common among them [10, 21]. While some stories are based on a single dataset, others are based on multiple datasets. Journalists can chase original stories, or they might also choose to build off of others' work.</p>
        <p>When it comes to algorithms, journalists apply a wide range of algorithms, from simple statistical tests (e.g., ANOVA, t-test) to advanced Machine Learning (ML) algorithms (e.g., regression, classification, unsupervised ML), for data analysis [19, 20]. Furthermore, recent work also shows that news is often co-produced using automated tools (e.g., those that use natural language generation and large models [22]) alongside humans [23, 24], especially to improve efficiency and production. Needless to mention, "transparency is key. Being able to explain how the stories are created is relevant both in-house and audiences" [23, p.7]. Previous work also suggests that journalists rely on outlier detection for story idea generation [25, 26, 27, 28]; others also report applying simple spreadsheet manipulation for similar tasks. Regardless of the data analysis methodology or algorithms applied, journalists often perform verification; that is, checking others' data work (verifying either a teammate's work and/or data/charts published by other news organizations) to ensure correctness [29, 19].</p>
        <p>Verification enables the journalists to ensure that the data they put out in the world is accurate [10, 30]. Verification often depends on the journalists remembering things, such as the operation (e.g., min, sum) performed (asking a teammate), and the journalists have no way of doing it in a way that is reproducible (for others, or even for themselves after a while) without clear documentation [6, 31]. Verification is also challenging even within a team, since people often forget to document methods or how they have arrived at a particular result. Similar to the field of data science (and other related areas [32, 33, 34, 35]), the lack of proper documentation is a common problem in code replication tasks for journalists [20].</p>
        <p>Even though journalists make wide use of data and algorithms in their day-to-day news production, very limited work has explored documentation techniques to make journalists' data and algorithmic practices transparent [20, 19, 36]. Previous work also showed a close resemblance (also through systematic empirical investigation [28, 27]) between data science and data journalism [37, 20, 38, 39] work practices [34]. Therefore, this work will take inspiration from various explainable methods available in data science and other related areas to propose a responsible data and algorithm practice for journalism – one that will improve effective team communication and support transparent journalism.</p>
      </sec>
      <sec id="sec-1b-2">
        <title>2.2. Current Trend Towards Explanation</title>
        <p>Previous study shows that explaining the how/what/why aspects of Facebook newsfeed algorithms enhances users' awareness of how the system works in the context of social media applications [40]. Prior research also examined explainability in specific domains [41, 42]. For example, Liao et al. [42] applied a question-driven method to facilitate explainable AI user experience (XAI UX) in the (adverse) healthcare domain. Existing work also studied fairness and transparency in recommender systems [43]. Others explored the socio-organizational context of explainability [44]. Needless to mention the vast amount of technical work that exists to support technical expert users' (e.g., ML engineers, data scientists) explainability needs for explaining black box and white box models [45, 46, 47, 48]. As evidence suggests, less attention has been paid to supporting transparency and explainability needs in the journalism context. Although journalists apply a wide range of data and algorithmic tools in news production [28], applying existing explainable AI (XAI) techniques (even though sophisticated, e.g., [45]) may require prior knowledge of ML or may not be easily adaptable [49] to the news storytelling context. Additionally, in the case of diverse teams with differing technical skills (ML users vs. SQL users), using these tools across the team might require extra learning support. As a result, perhaps a less technical approach can be suitable for R&amp;R needs in journalism.</p>
        <p>Inspired by related work both in data science (documentation based approaches) [50, 51, 42, 44, 52, 1, 53] and transparent data journalism [12, 10, 54, 20, 37, 19, 13], this work will apply a qualitative question-driven documentation technique [42] to support algorithmic transparency in journalism. This approach will require journalists to provide specific data and algorithmic details about news stories by specifying what/why/how/who/when information at different levels (e.g., individual, organizational, team) of the journalistic decision making process to provide richer context [1, 55, 56].</p>
      </sec>
    </sec>
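    <p>The guideline proposed in the following section is presented as a figure-based prototype rather than as software. As a purely illustrative sketch (my own addition, not part of the proposed guideline), the question-driven record could also be encoded as structured data, so that a newsroom can check mechanically which major categories of a dataset remain undocumented before publication. The category names below follow the guideline's major categories; the field contents and helper names are invented for illustration:</p>

```python
# Hypothetical sketch (not from the paper): encode the question-driven
# dataset documentation record as a dictionary keyed by major category,
# so completeness can be checked mechanically.

DATASET_CATEGORIES = [
    "Motivation", "Composition", "Collection",
    "Preprocessing", "Uses", "Distribution", "Maintenance",
]

def missing_categories(record):
    """Return the major categories this dataset record has not answered."""
    return [c for c in DATASET_CATEGORIES if not record.get(c)]

# Example record for a hypothetical housing-data story.
record = {
    "Motivation": "WHY was the dataset used: to examine eviction rates.",
    "Composition": "WHAT it contains: 12,400 rows, csv, 3% missing values.",
    "Collection": "HOW it was obtained: public record request, 2021-06.",
    "Preprocessing": "HOW it was cleaned: duplicates removed manually.",
}

print(missing_categories(record))  # categories still to document
```

    <p>Any such encoding would, of course, need to be evaluated with journalists before adoption; it is offered only to make the documentation idea concrete.</p>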
    <sec id="sec-2">
      <title>3. Transparent Data and Algorithmic Practices for Journalism</title>
      <p>This work will facilitate journalists to properly document and contextualize data and algorithmic decision making in news storytelling, to support the practice of algorithmic transparency. This was achieved through an extensive review of relevant prior work in journalism, data science, and other related areas.</p>
      <p>At a very high methodological level, first, factors relevant to dataset/algorithm use in the journalism domain were categorized using content analysis [57] after synthesizing across prior work (similar to the factors listed in [12]); second, those factors were translated into question-driven explanations (e.g., How, What, Why) following prior work such as [42] and others [40, 44]. Specific details of the methods and processes are provided below.</p>
      <sec id="sec-3-1">
        <title>3.1. Responsible Dataset Practice</title>
        <p>Following is a description of the methods that were used to derive the very first question-driven responsible dataset guidelines for journalism (see Figure 1 for detail).</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Methods for Responsible Dataset Guideline</title>
          <p>The proposed responsible dataset use guideline was heavily inspired by and built upon previous work described in [50, 58, 51, 12, 10, 42], and adapted specifically to be used in journalism. More precisely, Bender and Friedman [51] proposed data statements for text data (though they can be applied more broadly) to alleviate bias and exclusion against certain groups of people in Natural Language Processing (NLP) technology. Gebru et al. [50] also developed datasheets for datasets – a documentation practice to enable accountability and transparency among dataset creators and consumers in the ML community. Diakopoulos and Koliska [12] proposed several factors important for achieving algorithmic transparency in the news media. In this work, I have adapted, refined, and integrated these data documentation practices for journalism. The final prototype is shown in Figure 1, and the specific feature selection criteria are described below.</p>
          <p>Major Categories: Consistent with prior work described in Gebru et al. [50], journalists are required to document information for each of the major categories (blue text in Figure 1): Motivation, Composition, Collection, Preprocessing, Uses, Distribution, and Maintenance. To ensure exhaustiveness and a thorough characterization of the datasets (e.g., campaign finance, crime investigation [15]), factors in each of the aforementioned categories were further updated based on work in transparent journalism [12, 10, 54, 59] and data science [51, 42, 44, 60], due to their relatedness in data work practices [37, 19, 38, 20]. Furthermore, factors related to data reported in Diakopoulos and Koliska [12] are carefully incorporated in each of the categories where they logically make sense. For simplicity, I show the factors for only two major categories:</p>
          <p>• Composition included the following factors: attribute/feature definitions and descriptions, labeled/unlabeled data, data format (e.g., mp4, csv), sample size, missing data/completeness, data category (e.g., healthcare), data language (e.g., en-us), train/test split, raw vs. cleaned data, errors/redundancy, and descriptions of sensitive/anonymous/ground-truth data.</p>
          <p>• Preprocessing included the following factors: which data was discarded and why; tools used, or whether the work was done manually; manual/automated labeling and its process; annotator/curator demographics (race, class, gender); data transformation; and bias handling.</p>
          <p>These factors were then carefully converted into explanation questions. Dataset characterization questions for each of these categories were directly incorporated into the guideline.</p>
          <p>Different Journalism Roles and Demographics: Following the work of Bender and Friedman [51], the proposed guideline also requires journalists to document important demographic features (e.g., age, gender, class) for different journalism roles (e.g., data annotator, speaker, data curator, data collector, scripter, editor, data analyst, presenter, director) to provide transparency against inadvertent biases. These role definitions are informed by and combined from prior research [51, 20, 61] to cover a broad range of journalism roles. Some of these roles may have overlapping (data analysis) functions and responsibilities across different organizations [28].</p>
          <p>Dataset Explanation in the News Storytelling: In the proposed guideline, journalists should provide context for any dataset used by documenting Who, What, When, Why, and How related questions [40, 44, 1]. For example, journalists were asked to provide the context associated with a particular dataset in the Motivation category. Together with demographic information across different journalism roles and subjects, this makes it easy to demystify "WHY" a certain dataset was used in a story. It can also provide an indication of any pre-existing biases that have gone unnoticed. The individual/organization/team "WHO" worked on the story can be found by combining information from the Motivation, Collection, Preprocessing, Uses, and Maintenance categories. The "WHAT" aspect, or feature descriptions and other related information for any dataset, is covered in the Composition, Collection, and Preprocessing categories. Similarly, "WHEN" information is tracked through Maintenance. The "HOW" aspect of a dataset is included in the Collection and Preprocessing categories. Please note that the How, What, Why, Who, and When aspects of a dataset in these categories may not be exclusive; however, they provide all the factors necessary (to the best of my knowledge) for responsible R&amp;R dataset practice. These pieces of information collectively provide sufficient context and insights from individual and organizational decision making perspectives in news storytelling [44, 1].</p>
          <p>The proposed responsible dataset prototype (Figure 1) is dataset-type (e.g., healthcare, finance, housing) agnostic, meaning that journalists could describe any dataset type used in a story with the help of this prototype. Journalists must also conform to the privacy and anonymity of their news sources, such as anonymous data sources [10, 59]. It is also important to note that all personal demographics should be published only after receiving user consent [54]. Describing and characterizing algorithmic information together with datasets will further facilitate journalists' verification [10] and R&amp;R needs [6, 51, 62], discussed below.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Responsible Algorithmic Use Practice</title>
        <p>The methods used for developing the responsible algorithm use guideline, which supports journalists' verification needs, are provided below.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Methods for Responsible Algorithm Use Guideline</title>
          <p>Previously, Mitchell et al. [52] proposed model cards for explaining Machine Learning (ML) models. The initial prototype of responsible algorithm use for journalism was designed by taking inspiration from this and other similar works described in [12, 42, 10, 20, 54, 52]. The final responsible algorithm use guideline is shown in Figure 2.</p>
          <p>Major Categories: The information for responsible algorithm/model use was organized in the following categories (blue text in Figure 2): Model/Algorithm Used, Parameters/Features, Tools/Editor, Programming Language and Code, Hardware, Verification, and Story Narrative Related. These categories were carefully assembled and informed by previous research in such a way that they cover all the algorithm-related details needed for the journalist replication task without being redundant [12, 54, 10].</p>
          <p>Algorithmic Explanation in the News Storytelling: Explanation regarding algorithm use is required to be documented in the aforementioned categories. For example, "WHAT" model/algorithm was used should be documented in the Model/Algorithm Used category; "WHAT" parameters were chosen and "WHY" should be documented in the Parameters/Features category; and "WHO" wrote the code, including code/data verification related information, should be described in the Programming Language and Code and Verification categories. These features cover the specific information needed to allow journalists to replicate data analysis done by others (or even by themselves, for later reference), so that journalists (and their teams) do not have to reproduce code blindly when checking existing data work. Factors related to the news story were included in Story Narrative Related, informed by the work of Kovach and Rosenstiel [10]; they consist of specific story-related facts such as quotes, names, and date-time information. All these factors collectively enable journalists to verify facts and numbers when an error goes unnoticed after publication, through thorough and careful documentation throughout the lifecycle of a story [28, 20].</p>
          <p>In the above paragraphs, I described the methods for the responsible dataset and algorithm use guidelines in the context of journalism. Responsible dataset practice has the ability to prevent or reveal unforeseen biases (e.g., pre-existing, emergent) in journalistic data work practices. Journalists (with caution and if they are willing) can take a certain level of accountability in their dataset use and attain users' trust through responsible dataset and algorithmic practices (with caution, by revealing what they know and how they know it).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion and Future Work</title>
      <p>As journalists become more reliant on data and algorithms, it is important that they become responsible dataset and algorithm users. Therefore, this work proposed a question-driven responsible dataset and algorithm documentation guideline to support journalists' replicability and reproducibility (R&amp;R) needs – as a way to facilitate transparent algorithmic practices in the news media. The proposed guideline requires journalists to document and/or summarize dataset- and algorithm-related information by answering several key questions regarding news storytelling. The questions were derived from and informed by relevant prior work in transparent journalism [12, 10, 19, 13, 20, 54] and data science, among others [50, 42, 32, 34, 33, 51, 58, 44, 52, 38]. The proposed responsible documentation guideline is specifically crafted for journalism (or journalists' team-internal use), but perhaps, with caution or upon request, it can be made available to citizens as well.</p>
      <p>There are several ways this work can be extended in the future. First, this work should be evaluated with journalists and other stakeholders to understand diverse (critical) user information needs [63] (e.g., what information is safe to reveal, and to whom). Second, the set of factors reported in the initial guideline, though meant to be exhaustive, is likely not, because in real-world journalism practice things might change due to various factors outside of data and algorithms (e.g., resource/timing constraints, legal issues, profit vs. nonprofit); as a result, new questions might emerge and add up. Lastly, as newsrooms are increasingly adopting automated news production, how the proposed method will scale is an open line of inquiry.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Acknowledgments</title>
      <p>I would like to thank the anonymous reviewers for their valuable comments and feedback.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[10] B. Kovach, T. Rosenstiel, The elements of journalism: What newspeople should know and the public should expect, Three Rivers Press (CA), 2014.</p>
      <p>[11] T. Aitamurto, M. Ananny, C. W. Anderson, L. Birnbaum, N. Diakopoulos, M. Hanson, J. Hullman, N. Ritchie, HCI for accurate, impartial and transparent journalism: Challenges and solutions, in: Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–8.</p>
      <p>[12] N. Diakopoulos, M. Koliska, Algorithmic transparency in the news media, Digital Journalism 5 (2017) 809–828.</p>
      <p>[13] K. McBride, T. Rosenstiel, The new ethics of journalism: Principles for the 21st century, CQ Press, 2013.</p>
      <p>[14] PolitiFact, The Poynter Institute, 2022. URL: https://www.politifact.com/, accessed: 2022-1-1.</p>
      <p>[15] J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias, 2016. URL: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing, accessed: 2020-07-08.</p>
      <p>[16] S. U. Noble, Algorithms of oppression: How search engines reinforce racism, NYU Press, 2018.</p>
      <p>[17] I. Shapiro, Evaluating journalism: Towards an assessment framework for the practice of journalism, Journalism Practice 4 (2010) 143–162.</p>
      <p>[18] N. H. Riche, C. Hurter, N. Diakopoulos, S. Carpendale, Data-driven storytelling, CRC Press, 2018.</p>
      <p>[19] J. Gray, L. Chambers, L. Bounegru, The data journalism handbook: How journalists can use data to improve the news, O'Reilly Media, Inc., 2012.</p>
      <p>[20] F. Chevalier, M. Tory, B. Lee, J. van Wijk, G. Santucci, M. Dörk, J. Hullman, From analysis to communication: Supporting the lifecycle of a story, in: Data-Driven Storytelling, AK Peters/CRC Press, 2018, pp. 169–202.</p>
      <p>[21] H. De Burgh, Investigative journalism, Routledge, 2008.</p>
      <p>[22] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, arXiv preprint arXiv:2005.14165 (2020).</p>
      <p>[23] C.-G. Lindén, H. Tuulonen, A. Bäck, N. Diakopoulos, M. Granroth-Wilding, L. Haapanen, L. Leppänen, M. Melin, T. Moring, M. Munezero, et al., News automation: The rewards, risks and realities of 'machine journalism' (2019).</p>
      <p>[24] N. Diakopoulos, Automating the news, Harvard University Press, 2019.</p>
      <p>[25] M. Broussard, Artificial intelligence for investigative reporting: Using an expert system to enhance journalists' ability to discover original public affairs stories, Digital Journalism 3 (2015) 814–831.</p>
      <p>[26] A. Jain, B. Sharma, P. Choudhary, R. Sangave, W. Yang, Data-driven investigative journalism for connectas dataset, arXiv preprint arXiv:1804.08675 (2018).</p>
      <p>[27] D. Showkat, E. P. S. Baumer, Outliers: More than numbers? (2020).</p>
      <p>[28] D. Showkat, E. P. S. Baumer, Where do stories come from? Examining the exploration process in investigative data journalism, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–31.</p>
      <p>[29] B. Lee, M. Brehmer, P. Isenberg, E. K. Choe, R. Langner, R. Dachselt, Data visualization on mobile devices, in: Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–8.</p>
      <p>[30] D. G. Johnson, N. Diakopoulos, What to do about deepfakes, Communications of the ACM 64 (2021) 33–35.</p>
      <p>[31] D. Showkat, Determining newcomers barrier in software development: An IT industry based investigation, in: Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing, 2018, pp. 165–168.</p>
      <p>[32] I. Steinmacher, T. Conte, M. A. Gerosa, D. Redmiles, Social barriers faced by newcomers placing their first contribution in open source software projects, in: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work &amp; Social Computing, 2015, pp. 1379–1392.</p>
      <p>[33] A. X. Zhang, M. Muller, D. Wang, How do data science workers collaborate? Roles, workflows, and tools, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–23.</p>
      <p>[34] M. Muller, I. Lange, D. Wang, D. Piorkowski, J. Tsay, Q. V. Liao, C. Dugan, T. Erickson, How data science workers work with data: Discovery, capture, curation, design, creation, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–15.</p>
      <p>[35] A. Y. Wang, D. Wang, J. Drozdal, X. Liu, S. Park, S. Oney, C. Brooks, What makes a well-documented notebook? A case study of data scientists' documentation practices in Kaggle, in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.</p>
      <p>[36] N. Diakopoulos, Algorithmic accountability reporting: On the investigation of black boxes (2014).</p>
      <p>[37] K. Kirkpatrick, Putting the data science into journalism, 2015.</p>
      <p>[38] P. Guo, Data science workflow: Overview and challenges, October 30, 2013. URL: https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext, accessed: 2020-08-22.</p>
      <p>[39] P. Bradshaw, The inverted pyramid of data journalism, July 07, 2011. URL: https://onlinejournalismblog.com/2011/07/07/the-inverted-pyramid-of-data-journalism/, accessed: 2020-05-22.</p>
      <p>[40] E. Rader, K. Cotter, J. Cho, Explanations as mechanisms for supporting algorithmic transparency, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, 2018, p. 103.</p>
      <p>[41] T. Kulesza, M. Burnett, W.-K. Wong, S. Stumpf, Principles of explanatory debugging to personalize interactive machine learning, in: Proceedings of the 20th International Conference on Intelligent User Interfaces, 2015, pp. 126–137.</p>
      <p>[42] Q. V. Liao, M. Pribić, J. Han, S. Miller, D. Sow, Question-driven design process for explainable AI user experiences, arXiv preprint arXiv:2104.03483 (2021).</p>
      <p>[43] N. Sonboli, J. J. Smith, F. C. Berenfus, R. Burke, C. Fiesler, Fairness and transparency in recommendation: The users' perspective, arXiv preprint arXiv:2103.08786 (2021).</p>
      <p>[44] U. Ehsan, Q. V. Liao, M. Muller, M. O. Riedl, J. D. Weisz, Expanding explainability: Towards social transparency in AI systems, arXiv preprint arXiv:2101.04719 (2021).</p>
      <p>[45] H. Nori, S. Jenkins, P. Koch, R. Caruana, InterpretML: A unified framework for machine learning interpretability, arXiv preprint arXiv:1909.09223 (2019).</p>
      <p>[46] Microsoft, What is responsible machine learning? (preview), 2020. URL: https://docs.microsoft.com/en-us/azure/machine-learning/concept-responsible-ml, accessed: 2020-11-12.</p>
      <p>[47] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Computing Surveys (CSUR) 51 (2018) 1–42.</p>
      <p>[48] A. Abdul, J. Vermeulen, D. Wang, B. Y. Lim, M. Kankanhalli, Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018, pp. 1–18.</p>
      <p>[49] H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, J. Wortman Vaughan, Interpreting interpretability: Understanding data scientists' use of interpretability tools for machine learning, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–14.</p>
      <p>[50] T. Gebru, J. Morgenstern, B. Vecchione, J. W. Vaughan, H. Wallach, H. Daumé III, K. Crawford, Datasheets for datasets, arXiv preprint arXiv:1803.09010 (2018).</p>
      <p>[51] E. M. Bender, B. Friedman, Data statements for natural language processing: Toward mitigating system bias and enabling better science, Transactions of the Association for Computational Linguistics 6 (2018) 587–604.</p>
      <p>[52] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, Model cards for model reporting, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019, pp. 220–229.</p>
      <p>[53] E. M. Bender, T. Gebru, A. McMillan-Major, S. Shmitchell, On the dangers of stochastic parrots: Can language models be too big?, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp. 610–623.</p>
      <p>[54] N. Diakopoulos, Transparency, in: The Oxford Handbook of Ethics of AI, 2020.</p>
      <p>[55] C. D'Ignazio, L. F. Klein, Seven intersectional feminist principles for equitable and actionable COVID-19 data, Big Data &amp; Society 7 (2020) 2053951720942544.</p>
      <p>[56] M. Kogan, A. Halfaker, S. Guha, C. Aragon, M. Muller, S. Geiger, Mapping out human-centered data science: Methods, approaches, and best practices, in: Companion of the 2020 ACM International Conference on Supporting Group Work, 2020, pp. 151–156.</p>
      <p>[57] M. Vaismoradi, J. Jones, H. Turunen, S. Snelgrove, Theme development in qualitative content analysis and thematic analysis (2016).</p>
      <p>[58] E. M. Bender, A typology of ethical risks in language technology with an eye towards where transparent documentation can help, in: Future of Artificial Intelligence: Language, Ethics, Technology Workshop, 2019.</p>
      <p>[59] N. Diakopoulos, Ethics in data-driven visual storytelling, in: Data-Driven Storytelling, AK Peters/CRC Press, 2018, pp. 233–248.</p>
      <p>[60] S. Holland, A. Hosny, S. Newman, J. Joseph, K. Chmielinski, The dataset nutrition label: A framework to drive higher data quality standards, arXiv preprint arXiv:1805.03677 (2018).</p>
      <p>[61] B. Lee, N. H. Riche, P. Isenberg, S. Carpendale, More than telling a story: Transforming data into visually shared stories, IEEE Computer Graphics and Applications 35 (2015) 84–90.</p>
      <p>[62] M. Broussard, Big data in practice: Enabling computational journalism through code-sharing and reproducible research methods, Digital Journalism 4 (2016) 266–279.</p>
      <p>[63] J. Kemper, D. Kolkman, Transparent to whom? No algorithmic accountability without a critical audience, Information, Communication &amp; Society 22 (2019) 2081–2096.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] C. D'Ignazio, L. F. Klein, Data feminism, MIT Press, 2020.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Christin, Algorithms in practice: Comparing web journalism and criminal justice, Big Data &amp; Society 4 (2017) 2053951717718855.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] V. Eubanks, Automating inequality: How high-tech tools profile, police, and punish the poor, St. Martin's Press, 2018.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] C. O'Neil, Weapons of math destruction: How big data increases inequality and threatens democracy, Crown, 2016.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] E. Raff, A step toward quantifying independently reproducible machine learning research, Advances in Neural Information Processing Systems 32 (2019) 5485–5495.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] M. Feinberg, W. Sutherland, S. B. Nelson, M. H. Jarrahi, A. Rajasekar, The new reality of reproducibility: The role of data work in scientific research, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1–22.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Haibe-Kains, G. A. Adam, A. Hosny, F. Khodakarami, L. Waldron, B. Wang, C. McIntosh, A. Goldenberg, A. Kundaje, C. S. Greene, et al., Transparency and reproducibility in artificial intelligence, Nature 586 (2020) E14–E16.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] S. Chattopadhyay, I. Prasad, A. Z. Henley, et al., What's wrong with computational notebooks? Pain points, needs, and design opportunities, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–12.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. A. Tullis, B. Kar, Where is the provenance? Ethical replicability and reproducibility in GIScience and its critical applications, Annals of the American Association of Geographers 111 (2021) 1318–1328.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] B. Kovach, T. Rosenstiel, The elements of journalism.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>