<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Lessons for Supporting Data Science from the Everyday Automation Experience of Spell-Checkers</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Kevin</forename><surname>Crowston</surname></persName>
							<email>crowston@syr.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Syracuse University School of Information Studies</orgName>
								<address>
									<settlement>Syracuse</settlement>
									<postCode>13244</postCode>
									<region>NY</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Lessons for Supporting Data Science from the Everyday Automation Experience of Spell-Checkers</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">94B7F62080C6E712D1D308273E80E1FA</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T06:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>automation</term>
					<term>spell-checking</term>
					<term>Social and professional topics → Automation</term>
					<term>Human-centered computing → Interaction design theory, concepts and paradigms</term>
					<term>Applied computing → Word processors</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We apply two theoretical frameworks to analyze spell-checkers as a form of automation and apply the lessons learned to analyze opportunities to support data science. The analysis distinguishes between automation of analysis to suggest actions and automation of implementation of actions. Having the automation work in the same space as users (e.g., editing the same document) supports stigmergic coordination between the two, but attention is needed to ensure that the contributions can be combined and have a recognizable form that indicates their purpose.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>A form of automation (i.e., the capability of a system to perform some tasks without human involvement) that many people experience daily is the spell-checker, which has evolved from a stand-alone application providing suggested corrections <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b5">6]</ref> to an integral component of word processors or even a ubiquitous component of a user interface framework <ref type="bibr" target="#b3">[4]</ref>. As a user types, automated spell-checkers flag unknown words as likely errors, offer suggested replacements (see Fig. <ref type="figure" target="#fig_0">1</ref>), or even make replacements without human involvement (see Fig. <ref type="figure" target="#fig_1">2</ref>). In this position statement, we analyze the nature of the automation provided by spell-checkers to derive lessons for ubiquitous automation in other settings, specifically data science.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theory</head><p>We apply two frameworks for our analysis. First, we apply a simple framework developed in Ref. <ref type="bibr" target="#b0">[1]</ref>. This framework decomposes information processing tasks into four steps: 1) information acquisition; 2) information analysis; 3) decision and action selection; and 4) action implementation. By considering whether each step can be partly or fully automated (meaning that the particular step can be done by a system without human intervention), the framework identifies four levels of automation: 0. No automation 1. Decision support: steps 1 and 2 are automated but in step 3, the system recommends possible actions from which the human chooses one to implement  <ref type="bibr" target="#b4">[5]</ref>. Genre means that the contributed work has socially recognized regularities of form and purpose that enable others to know how they should work with it. The analysis in Ref. <ref type="bibr" target="#b1">[2]</ref> focuses on supporting coordination between members of a work team, but these features may also support coordination between a system and a user.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>Applying the first framework, spell-checking systems initially were decision support systems (level 1), flagging unrecognized words and giving a list of possible replacements when requested. Currently, many support blended decision making (level 2), automatically fixing (or at least changing) some detected errors while deferring others to the user. However, given the variability of typing errors, it seems unlikely that spell-checking will ever be completely automated.</p><p>Considering next questions of intelligibility, a spell-checker's suggestions in current systems are visible because the system is integrated with the work it is meant to support, so that the intervention happens in the same space as the work.</p><p>In other words, the interaction between the system and the user is stigmergically coordinated. The user's typing in a document triggers the actions of the spell-checker, and the spell-checker offers suggestions to the user or takes actions independently in the same interface, thus making the actions visible. Interestingly, spell-checkers do not show the certainty of their suggestions, though it might be implicit in the ordering of suggestions. For spell-checking, the other two affordances needed for stigmergic coordination, combinability and genre of contributions, are non-issues, as words are easily combined and have a clear form and purpose.</p><p>Finally, considering opportunities for intervention, a user can intervene in the work of the spell-checker by interacting with it in the document. Most spell-checkers can be customized by correcting the corrections made or adding to the dictionary. However, further tuning is not possible, e.g., being able to tune how confident the system should be of a correction before it is automatically implemented.</p></div>
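<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distinction between decision support (level 1) and blended decision making (level 2) can be illustrated with a minimal Python sketch. It is not the implementation of any actual spell-checker: the toy word list, the function name review_word and the auto_threshold parameter are assumptions made for illustration, and the similarity score from Python's difflib module merely stands in for whatever confidence measure a real system might use. The adjustable threshold illustrates the kind of tuning that, as noted above, current spell-checkers do not offer.</p>
<p><code lang="python"><![CDATA[
```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical toy dictionary; a real spell-checker would use a full word list.
DICTIONARY = ["automation", "checker", "document", "spelling", "the", "user"]


def review_word(word, auto_threshold=0.85):
    """Return (action, suggestions) for one typed word.

    action is "accept" (known word), "replace" (level 2: the system acts
    on its own), "suggest" (level 1: the user chooses), or "flag"
    (unknown word with no candidate replacement).
    """
    lower = word.lower()
    if lower in DICTIONARY:
        return "accept", []
    # Candidates come back ordered from most to least similar, so the
    # ordering is the only certainty signal visible in the list.
    candidates = get_close_matches(lower, DICTIONARY, n=3, cutoff=0.6)
    if not candidates:
        return "flag", []
    # Assumption: string similarity stands in for the system's confidence.
    confidence = SequenceMatcher(None, lower, candidates[0]).ratio()
    if confidence >= auto_threshold:
        return "replace", candidates[:1]
    return "suggest", candidates
```
]]></code></p>
<p>Under these assumptions, review_word("documnet") returns the action "replace" with "document" as the replacement, whereas raising auto_threshold to 0.95 returns "suggest" and leaves the choice to the user.</p></div>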
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>We next consider how the observations about spell-checking might be transferred to a more complex task. We will consider in particular the task of data analysis, i.e., writing a data-science-analysis script. A spell-checker for data analysis could be exactly the same as for word processing, e.g., correcting a misspelled function or variable name or incorrect arguments. More interestingly, an automated system could check the data analysis at a higher level. A system could assess data quality, e.g., spotting outliers or problems with missing data, suggesting transformations to correct skew or, more ambitiously, noticing bias in the data. It could create additional data columns, e.g., breaking up complex data into components or finding related datasets and joining them. Finally, a system could suggest additional actions for an analysis, e.g., suggesting useful visualizations or modelling approaches given what it knows about the data or diagnostics for a user-selected analysis. If the assumptions of a test are violated, it could suggest an alternative, e.g., a non-parametric test instead of a parametric one.</p><p>Our analysis of spell-checkers suggests some design implications for such a system. First, there are different levels of functionality: at the lowest level of automation, the system would simply flag issues and suggest possibilities to the user, while at a higher level, it would automatically execute some actions (e.g., automatically checking test assumptions). As before, completely automated analysis seems unlikely.</p><p>Second, intelligibility would be increased by having the system work in the same space as the users to support stigmergic coordination, e.g., in the same notebook if the analyst is using a notebook.
Spell-checking words would work the same way as in a word processor, while interventions in the process could be done by creating a note on a notebook cell with suggested changes or creating additional cells, e.g., the cells to run and interpret diagnostics for an analysis or to create a visualization. The system could communicate intent or certainty by adding comments to the code. Finally, if the system intervenes by providing code to run, the user could edit the code if it is not appropriate.</p><p>Third, the work on stigmergic coordination suggests two additional affordances needed to support stigmergic coordination, in addition to visibility. The first is combinability, meaning that the work done by different contributors can be easily fitted together. In the case of data science, a notebook provides a mechanism for combinability, as different contributors can add different cells. Making cells function smoothly together does require some additional work, e.g., identifying which variables hold the necessary data.</p><p>The second factor is genre, meaning socially recognized regularities of form and purpose. For a user to be able to use suggestions made by an automated system, they need to be able to recognize what those contributions do and how to use them. Applied to data science analyses, the theory suggests that there is a need for the user to be able to recognize the purpose of a suggested analysis. Such recognition could be explicitly supported, e.g., by commenting in the code.</p></div>
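<div xmlns="http://www.tei-c.org/ns/1.0"><p>These design implications can be sketched for a single data-quality check. The following Python sketch is illustrative only: the function name suggest_outlier_cell, the quartile-based rule with k = 1.5, and the variable df in the generated cell (assumed to be a pandas DataFrame already present in the analyst's notebook) are assumptions, not features of any existing tool. The function returns the text of a suggested cell rather than executing anything, so the contribution would appear in the same space as the analyst's work, state its purpose and rule in comments, and remain editable.</p>
<p><code lang="python"><![CDATA[
```python
import statistics


def suggest_outlier_cell(column_name, values, k=1.5):
    """Return source text for a suggested notebook cell, or None.

    Values outside the range [Q1 - k * IQR, Q3 + k * IQR] are flagged.
    What to do about them is left to the analyst.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    outliers = [v for v in values if v > high or low > v]
    if not outliers:
        return None  # nothing to flag, so no cell is suggested
    # The comments in the generated cell communicate intent and the rule
    # used, so the analyst can recognize what the suggestion is for.
    return "\n".join([
        f"# Suggested check (automated): possible outliers in '{column_name}'.",
        f"# Rule: values outside [{low:g}, {high:g}] (quartiles -/+ {k:g} * IQR).",
        f"# Flagged values: {outliers}",
        "# Edit or delete this cell if the rule does not suit your data.",
        # Assumption: df is a pandas DataFrame in the analyst's notebook.
        f"flagged = df[~df['{column_name}'].between({low:g}, {high:g})]",
    ])
```
]]></code></p>
<p>For example, calling suggest_outlier_cell("age", [10, 12, 11, 13, 12, 11, 95]) yields a commented cell that flags the value 95, while a column with no values outside the range yields no suggestion at all.</p></div>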
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>The analysis offers two general takeaways for future design. First, automation can happen at different levels and in different ways. We distinguish in particular between automation of analysis to suggest actions and automation of implementation of actions. Second, having the system work in the same space as the users supports stigmergic coordination between the two. However, additional affordances, namely combinability and genre, are necessary to support this mode of coordination.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A spelling mistake identified by the Microsoft Word spell-checker and a proposed replacement</figDesc><graphic coords="2,35.46,144.01,129.60,60.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: A spelling mistake automatically corrected by the Microsoft Word spell-checker</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Impacts of machine learning on work</title>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Crowston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Bolici</surname></persName>
		</author>
		<ptr target="http://hdl.handle.net/10125/60031" />
	</analytic>
	<monogr>
		<title level="m">Hawai&apos;i International Conference on System Sciences (HICSS-52)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Socio-Technical Affordances for Stigmergic Coordination Implemented in MIDST, a Tool for Data-Science Teams</title>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Crowston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><forename type="middle">S</forename><surname>Saltz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amira</forename><surname>Rezgui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yatish</forename><surname>Hegde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sangseok</forename><surname>You</surname></persName>
		</author>
		<idno type="DOI">10.1145/3359219</idno>
		<ptr target="http://dx.doi.org/10.1145/3359219" />
	</analytic>
	<monogr>
		<title level="j">Proc. ACM Hum.-Comput. Interact.</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">CSCW</biblScope>
			<date type="published" when="2019-11">Nov. 2019</date>
		</imprint>
	</monogr>
	<note>Article 117, 25 pages</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A technique for computer detection and correction of spelling errors</title>
		<author>
			<persName><forename type="first">Fred</forename><forename type="middle">J</forename><surname>Damerau</surname></persName>
		</author>
		<idno type="DOI">10.1145/363958.363994</idno>
		<ptr target="http://dx.doi.org/10.1145/363958.363994" />
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="171" to="176" />
			<date type="published" when="1964">1964</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Spelling correction in user interfaces</title>
		<author>
			<persName><forename type="first">Ivor</forename><surname>Durham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">A</forename><surname>Lamb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><forename type="middle">B</forename><surname>Saxe</surname></persName>
		</author>
		<idno type="DOI">10.1145/358413.358426</idno>
		<ptr target="http://dx.doi.org/10.1145/358413.358426" />
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="764" to="773" />
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Collaboration through superposition: How the IT artifact as an object of collaboration affords technical interdependence without organizational interdependence</title>
		<author>
			<persName><forename type="first">James</forename><surname>Howison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Crowston</surname></persName>
		</author>
		<idno type="DOI">10.25300/MISQ/2014/38.1.02</idno>
		<ptr target="http://dx.doi.org/10.25300/MISQ/2014/38.1.02" />
	</analytic>
	<monogr>
		<title level="j">MIS Quarterly</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="29" to="50" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Computer programs for detecting and correcting spelling errors</title>
		<author>
			<persName><forename type="first">James</forename><forename type="middle">L</forename><surname>Peterson</surname></persName>
		</author>
		<idno type="DOI">10.1145/359038.359041</idno>
		<ptr target="http://dx.doi.org/10.1145/359038.359041" />
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="676" to="687" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
