Lessons for Supporting Data Science from the Everyday Automation Experience of Spell-Checkers

Kevin Crowston
Syracuse University School of Information Studies
Syracuse, NY 13244, USA
crowston@syr.edu

Abstract
We apply two theoretical frameworks to analyze spell-checkers as a form of automation and apply the lessons learned to analyze opportunities to support data science. The analysis distinguishes between automation of analysis to suggest actions and automation of implementation of actions. Having the automation work in the same space as users (e.g., editing the same document) supports stigmergic coordination between the two, but attention is needed to ensure that the contributions can be combined and have a recognizable form that indicates their purpose.

Author Keywords
automation, spell-checking

CCS Concepts
•Social and professional topics → Automation; •Human-centered computing → Interaction design theory, concepts and paradigms; •Applied computing → Word processors;

Workshop proceedings Automation Experience across Domains
In conjunction with CHI'20, April 26th, 2020, Honolulu, HI, USA
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Website: http://everyday-automation.tech-experience.at

Introduction
A form of automation (i.e., the capability of a system to perform some tasks without human involvement) experienced by many people daily is the spell-checker, which has evolved from a stand-alone application providing suggested corrections [3, 6] to an integral component of word processors or even a ubiquitous component of a user interface framework [4]. As a user types, automated spell-checkers flag unknown words as likely errors, offer suggested replacements (see Fig. 1) or even make replacements without human involvement (see Fig. 2). In this position statement, we analyze the nature of automation provided by spell-checkers to derive lessons for ubiquitous automation in other settings, specifically, data science.

Figure 1: A spelling mistake identified by the Microsoft Word spell-checker and a proposed replacement

Figure 2: A spelling mistake automatically corrected by the Microsoft Word spell-checker*
* Note: animation works in Adobe Reader but not in some other PDF readers.

Theory
We apply two frameworks for our analysis. First, we apply a simple framework developed in Ref [1]. This framework decomposes information processing tasks into four steps: 1) information acquisition; 2) information analysis; 3) decision and action selection; and 4) action implementation. By considering if each step can be partly or fully automated (meaning that the particular step can be done by a system without human intervention), the framework identifies four levels of automation:

0. No automation
1. Decision support: steps 1 and 2 are automated but in step 3, the system recommends possible actions from which the human chooses one to implement
2. Blended decision making: all steps are automated but only for a subset of decisions
3. Complete automation

Second, the workshop call identifies four key aspects of ubiquitous automated systems: intelligibility, interventions, interplay and integrity. In this position statement, we focus on the first two: how can a human tell what the system is doing and intervene if desired? To analyze these issues, we apply theorizing about stigmergic coordination, meaning coordination through a shared work product rather than through separate communication. Ref [2] identifies three socio-technical affordances needed to support stigmergic coordination, namely visibility and combinability of work of recognizable genres. Visibility means that work done by one contributor is visible to others. Combinability means that different contributions can be made to fit together, as has been observed to be important for open source software development [5]. Genre means that the contributed work has socially-recognized regularities of form and purpose that enable others to know how they should work with it. The analysis in Ref [2] focuses on supporting coordination between members of a work team but these features may also support coordination between a system and a user.

Results
Applying the first framework, spell-checking systems initially were decision support systems (level 1), flagging unrecognized words and giving a list of possible replacements when requested. Currently, many support blended decision making (level 2), automatically fixing (or at least changing) some detected errors while deferring others to the user. However, given the variability of typing errors, it seems unlikely that spell-checking will ever be completely automated.

Considering next questions of intelligibility, a spell-checker's suggestions in current systems are visible because the system is integrated with the work it is meant to support so that the intervention happens in the same space as the work. In other words, the interaction between the system and the user is stigmergically coordinated. The users' typing in a document triggers the actions of the spell-checker and the spell-checker offers suggestions to the user or takes actions independently in the same interface, thus making the actions visible. Interestingly, spell-checkers don't show certainty of their suggestions, though it might be implicit in the ordering of suggestions. For spell-checking, the other two affordances needed for stigmergic coordination, combinability and genre of contributions, are non-issues, as words are easily combined and have a clear form and purpose.

Finally, considering opportunities for intervention, a user can intervene in the work of the spell-checker by interacting with it in the document. Most spell-checkers can be customized by correcting the corrections made or adding to the dictionary. However, further tuning is not possible, e.g., being able to tune how confident the system should be of a correction before it is automatically implemented.

Discussion
We next consider how the observations about spell-checking might be transferred to a more complex task. We will consider in particular the task of data analysis, i.e., writing a data-science-analysis script. A spell-checker for a data analysis could be exactly the same as for word processing, e.g., correcting a misspelled function or variable name or incorrect arguments. More interestingly, an automated system could check the data analysis at a higher level. A system could assess data quality, e.g., spotting outliers or problems with missing data, suggesting transformations to correct skew or, more ambitiously, noticing bias in the data. It could create additional data columns, e.g., breaking up complex data into components or finding related datasets and joining them. Finally, a system could suggest additional actions for an analysis, e.g., suggesting useful visualizations or modelling approaches given what it knows about the data or diagnostics for a user-selected analysis. If the assumptions of a test are violated, it could suggest an alternative, e.g., a non-parametric test instead of a parametric one.

Our analysis of spell-checkers suggests some design implications for such a system. First, there are different levels of functionality: at the lowest level of automation, the system would simply flag issues and suggest possibilities to the user while at a higher level, it would automatically execute some actions (e.g., automatically checking test assumptions). And as before, completely automated analysis seems unlikely.

Second, intelligibility would be increased by having the system work in the same space as the users to support stigmergic coordination, e.g., in the same notebook if the analyst is using a notebook. Spell-checking words would work the same way as in a word processor, while interventions in the process could be done by creating a note on a notebook cell with suggested changes or creating additional cells, e.g., the cells to run and interpret diagnostics for an analysis or to create a visualization. The system could communicate intent or certainty by adding comments to the code. Finally, if the system intervenes by providing code to run, the user could edit the code if not appropriate.

Third, the work on stigmergic coordination suggests two additional affordances needed to support stigmergic coordination, in addition to visibility. The first is combinability, meaning that the work done by different contributors can be easily fitted together. In the case of data science, a notebook provides a mechanism for combinability, as different contributors can add different cells. To make cells function smoothly together does require some additional work, e.g., identifying which variables hold the necessary data.

The second factor is genre, meaning socially recognized regularities of form and purpose. For a user to be able to use suggestions made by an automated system, they need to be able to recognize what those contributions do and how to use them. Applied to data science analyses, the theory suggests that there is a need for the user to be able to recognize the purpose of a suggested analysis. Such recognition could be explicitly supported, e.g., by commenting in the code.

Conclusion
The analysis offers two general takeaways for future design. First, automation can happen at different levels and in different ways. We distinguish in particular between automation of analysis to suggest actions and automation of implementation of actions. Second, having the system work in the same space as the users supports stigmergic coordination between the two. However, additional affordances, namely combinability and genre, are necessary to support this mode of coordination.

REFERENCES
[1] Kevin Crowston and Francesco Bolici. 2019. Impacts of machine learning on work. In Hawai'i International Conference on System Sciences (HICSS–52). http://hdl.handle.net/10125/60031
[2] Kevin Crowston, Jeff S. Saltz, Amira Rezgui, Yatish Hegde, and Sangseok You. 2019. Socio-Technical Affordances for Stigmergic Coordination Implemented in MIDST, a Tool for Data-Science Teams. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article 117 (Nov. 2019), 25 pages. DOI: http://dx.doi.org/10.1145/3359219
[3] Fred J. Damerau. 1964. A technique for computer detection and correction of spelling errors. Commun. ACM 7, 3 (1964), 171–176. DOI: http://dx.doi.org/10.1145/363958.363994
[4] Ivor Durham, David A. Lamb, and James B. Saxe. 1983. Spelling correction in user interfaces. Commun. ACM 26, 10 (1983), 764–773. DOI: http://dx.doi.org/10.1145/358413.358426
[5] James Howison and Kevin Crowston. 2014. Collaboration through superposition: How the IT artifact as an object of collaboration affords technical interdependence without organizational interdependence. MIS Quarterly 38, 1 (2014), 29–50. DOI: http://dx.doi.org/10.25300/MISQ/2014/38.1.02
[6] James L. Peterson. 1980. Computer programs for detecting and correcting spelling errors. Commun. ACM 23, 12 (1980), 676–687. DOI: http://dx.doi.org/10.1145/359038.359041