=Paper=
{{Paper
|id=Vol-323/paper-1
|storemode=property
|title=ScratchTalk: A Natural Language framework for Social Computation
|pdfUrl=https://ceur-ws.org/Vol-323/paper01.pdf
|volume=Vol-323
|dblpUrl=https://dblp.org/rec/conf/iui/Eslick08
}}
==ScratchTalk: A Natural Language framework for Social Computation==
ScratchTalk and Social Computation:
Towards a natural language scripting model
Ian Eslick
MIT Media Lab
20 Ames St. E15-320R
Cambridge, MA 02139 USA
eslick@media.mit.edu
ABSTRACT New functionality can be created by three classes of people:
Natural Language interfaces have been proposed as a solu- a developer, another member of the user community, or the
tion to make application functionality more accessible to end end user. The next section describes the properties of these
users. Practical implementation of such systems has been three classes and where features come from, arguing that to-
hampered by both semantic and knowledge-engineering chal- day’s scripting models are fundamentally too expensive and
lenges. This paper introduces Social Computation, a theo- insufficient to cover a large space of functionality. Increased
retical model targeting both challenges. The model employs coverage of potential functionality can increase the desir-
natural language in a planning framework to enable commu- ability or utility of a given application.
nities of end-users to collaborate, explicitly and implicitly,
in the development of new functionality. The model intrinsi- Social Computation aims to change the cost-economics of
cally supports the acquisition of user goals and natural lan- scripting by dramatically altering the user interaction model
guage semantics in the context of the application domain. and the representation of procedural behavior. This model
ScratchTalk is a prototype of the natural language planning borrows concepts from goal-oriented programming, plan-
component of the model applied to the Scratch programming ning, interpreted languages, collective intelligence, and so-
environment. This serves to ground the framework and pro- cial networks.
vide insights into the space of applications likely to be ad-
dressable by the Social Computation model. SOCIAL COMPUTATION
What users can’t do
Author Keywords Program design requires that a programmer maintain a men-
Natural Language, Goal Oriented User Interfaces, Collective tal dictionary of possible steps and determine how to se-
Intelligence, Social Networking quence those steps so as to bring about desired side effects
while avoiding undesirable ones. Programmers have to ac-
commodate environmental variability, model hidden state that
ACM Classification Keywords
is not easily inspected, and express each step with perfect
H5.2. User Interfaces
precision; all of which increase cognitive burden. When
completed, these programs are brittle and correspondingly
INTRODUCTION difficult to modify.
Today, customization of application behavior is accomplished
through combinations of scripting languages, plugins, and These skills are not easily learned. Competence requires
macros. Unfortunately, the learning curve necessary to use significant training and experience, limiting the populations
any of these approaches is too steep for most users. This that can contribute to the development of new functionality.
problem stems, in part, from the cognitive barriers that all of Attempts to ameliorate this situation in commercially avail-
these models require of users: understanding precise seman- able languages have had little success [8].
tics, modeling hidden state, and navigating a large design
space [4]. Solutions to this conundrum typically trade expressive power
for simplicity so that users are better able to model the im-
User interface research on end user programming aims to plications of a statement. For example, visual programming
reduce the cognitive burden on end users while improving using flowcharts make application dataflow explicit, yet ex-
their access to the capabilities of the underlying system. As clude message passing and other forms of interdependence.
the tasks carried out by the user grow in complexity, the These models usually fall short of addressing the space of
user must discover and memorize complex sequences of ac- user goals, leading to frustration and low adoption rates.
tions and understand how to avoid negative effects. End-user
scripting attempts to enable the user to abstract and automate This outlook is not promising. The very notion of program-
those sequences to save both labor and reduce the cognitive ming appears anathema to most end-users. However, hu-
burden . It is this process of abstraction, the means by which mans can express their desires to experts and through a pro-
a user codifies combinations of primitives, that is the primary cess of iteration and clarification, experts are generally able
interest of this paper. to satisfy user goals. The following section proposes that as-
© 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes.
Re-publication of material from this volume requires permission by the copyright owners.
pects of this interaction can be captured in a computational
framework, facilitating a larger range of end-user goals.
What users can do
The research community should be asking what users are
naturally good at and looking for opportunities to leverage
those skills. Following is a list of some natural user skills:
• Asking. Word of mouth has always played a significant
role in problem solving. The Internet today extends that
natural process via blogs, mailing lists and other discus-
sion forums. For the non-expert, this is often the only Figure 1. The long tail of software
source of answers.
• Searching. Most users of the internet today are accus- mortar store a linear foot of shelf-space has a significant vari-
tomed to searching for answers to their problem via key- able cost that has to be amortized over sales, requiring phys-
words or browsing. This can lead to the resources above, ical bookstores to only carry books that have high volume.
but can also be used with a program’s help system (which
are rarely helpful) or as an index to program functions. The cost economics of application platforms today is akin to
the brick and mortar bookstores of yesterday. As with Ama-
• Inspecting. In an environment where the behaviors or side zon, finding a way to serve the long tail of feature demand
effects are transparent, users can be quite good at recog- can dramatically improve the reach of software applications
nizing errors. This is true in a domain such as animation as well as the usefulness of those platforms to end users.
domain, but is less true in the spreadsheet domain where
hidden state increases dramatically with increasing com- The obvious proposition is to find ways to reduce the costs of
plexity [9]. feature development. Open Source has proven to be a way to
extend the reach of an application down the long tail of soft-
• Critiquing. A review of online discussion lists for plugins, ware features, but Open Source is limited by a different kind
extensions and scripts indicate that there are many more of economics. Even in Open Source, the cost of a feature
users than developers and that users are extremely prolific remains high in terms of skilled labor, but using volunteer
in expressing their discontent with specific failures or spe- time instead of paid time. However volunteer developers are
cific missing features. likely to release and maintain code when there is a suffi-
cient incentive. Social incentives can be quite powerful, but
• Tweaking. A smaller set of users (although still larger tend to require a certain critical mass of peers or end users.
than those who program) can take example scripts and Smaller open source projects are likely to have lower qual-
perform minor changes when the relationships of the script ity, limiting the population of users that can take advantage
to the domain model is reasonably easy to comprehend. of them. So Open Source serves a modestly larger popu-
(i.e. Mail filter rules). lation than commercial software, mostly programmers and
power-users, and is unable to effectively support or leverage
Users are good are pattern matching, but bad at pattern speci- the vast majority of users.
fication: users can’t always describe exactly what they want,
but they know it when they see it. Users are bad at design If the Open Source model doesn’t significantly change the
but very good at complaining. How can we exploit these economics to capture the long tail of feature needs, what
observations? can?
The Long-Tail of Software Features Empowering End Users
The cost of developing a software feature is extremely high; The key idea behind the Social Computation framework is
thus developers are forced to pick the features that will have to find ways to enable the untapped end user population to
the greatest impact on sales and satisfying existing users. If contribute new features suited to their specific goals. This
we chose a standard indexing method, such as menus, for should be accomplished in a robust and sustainable way,
making features available, then the index is likely to be ex- while surmounting the problems of precision, hidden state,
tremely underpopulated with respect to all reasonable user and design.
goals.
In this discussion features are considered to be pieces of
Anderson [1] coined the term “long tail” to describe a re- code, or a description of a plan that can be interpreted, that
lated commercial phenomenon wherein a change in cost eco- is directly associated with a user’s goal. Since user goals are
nomics made selling small-volume products highly profitable. often under specified, a plan or code fragment needs to cover
The motivating observation was that Amazon.com had more a range of cases without users having to intervene directly.
total revenue from low-volume books than from high-volume
books. This is because the cost of selling low and high vol- We can characterize the user population with a similar curve
ume books is identical for Amazon, whereas for a brick and as that applied to the software feature space. There are very
Figure 2. Distribution of user skills
few people who can write and maintain robust software. There
are more people who can modify or add incrementally to ex-
isting code and an even larger group that can add various
kinds of content like skins, templates, graphics, etc. How- Figure 3. A Model for Social Computation
ever most of the population is incapable of directly provid-
ing value to other users. However, perhaps it is possible to
do so indirectly. As shown in the graphic, there are two cycles taking place
in any given user’s application. The user is interacting with
A truly transformational impact on the cost-economics of their system which uses a knowledge base to interpret a user
new features becomes possible if can extract value from in- speech act. The user observes the effect and makes any nec-
dividual contributions in a way that enhances the next per- essary corrections. Simultaneously, each statement made by
son’s feature development. To make progress towards this the user that is correctly interpreted, is shared with other
end, I propose a key set of assumptions: users who may be making the same statement.
• Represent scripts as hierarchical plans consisting of goals With such a community there are a large set of techniques
and subgoals. that can be used to sidestep many of the AI-hard problems
that come to mind such as domain representation, planning,
• Goals are specified by user-accessible natural language natural language understanding, and design space exploration.
phrases that enable the system to map to subgoals. A sample of such techniques include:
• Plans execute in a “soft fail” environment; users are ac-
customed to failures but are able to repair failures in a 1. Collaborative filtering. User critique provides immedi-
multitude of ways. ate feedback on the reliability of a given goal interpreta-
tion. Even the most naive of users can say, “that didn’t
• The user interface reifies side effects that the user is likely work” and then select an alternative version. Human to
to care about. human interaction (such as an expert helping a neophyte)
regarding a specific goal can provide direct, explicit feed-
• The user uses a restricted set of natural language or the back on plan success or failure.
user interface to critique plan failure; more sophisticated
users can make suggestions and experts can directly ma- 2. Sampling. A new plan interpretation of a goal can be
nipulate the machinery used by less-sophisticated users. tried out on a small subset of users to investigate whether
it improves reliability or acceptance rates in the same vein
• Plans are incrementally modified to fix the problem iden- as the compiler techniques described in [3].
tified by the critique.
3. Supervised structure learning. The spotty reliability of
These assumptions eliminates two of the three barriers to a plan is an opportunity to learn about missing precondi-
end-user programming described earlier. Goals are not pre- tions for a plan step. When the state variable is unrep-
cisely described, but the related plan should provide increas- resented by the system (i.e. no condition is found that is
ing robustness to different environmental variation as well as correlated to the plans reliability), users can be sampled
handle the bookkeeping of hidden state that isn’t expressed to provide inputs as to why something works in one case
in the UI. To address the design problem we need to think and not another, hypothesizing a variable over which cor-
creatively about the design problem itself. relations can be observed.
Design requires an exploration of the space of possible data 4. Collective mapping. Users can specify a goal as a de-
transformations, ordering and condition tests. This may hap- sired action or state and observe the resulting behavior.
pen in the mind of a developer, via a formal search algo- The user may change the goal, choose a different plan to
rithm, or via an exploratory implementation. The key as- fix a failed goal, or make an explicit correction to an exist-
sumption in Social Computing is that there is a community ing plan. Each of these actions provides an implicit data
of users with similar goals. point about whether a plan meets a goal, allowing users
to choose the preferred method even when there is not a
clear model of state.
5. Preference discovery. Using the same data as in col-
lective mapping above, the system can learn the order in
which to present alternative interpretations to users.
6. Preference clustering. The behavior intended by a given
natural language utterance may vary from user to user.
When a user falls into a cluster of users with similar pat-
terns of goal to plan selection or use of natural language
terms, that user can share preference orderings with their
affinity group rather than all users.
One of the largest advantages of this framework is the abil-
ity to view meaning as a mapping function and not a rep-
resentational challenge. Most of the problems in traditional Figure 4. The Scratch User Interface
AI programs stem from the desire to explicitly represent the
meaning of a goal or a state. Finding a plan becomes a search
through a space defined by this formal representation, with Scratch is a Logo-like world and user interface written in
all the convenient theoretical properties of correctness and Squeak, a modern implementation of the Smalltalk language.
complexity we are used to. Instead, interpretation is a pro- The Scratch UI provides a visual, lego-block style view of
cess of finding a “near match” which sufficiently satisfies the Squeak language constructs which are used to control a set
intent of the user, allowing them to make forward progress sprites that interact on a 2D canvas. The language consists
towards a goal. of primitives for orienting, moving, and transforming sprites
as well as detecting contact and other events. The language
This model is likely to be grossly insufficient for large scale also provides a small set of conditional and iterative con-
systems requiring complex software architecture, new user structs to combine these primitives. There is no support for
interface techniques, and new underling representations. How- function calls or a call stack. The runtime model consists of
ever, these are the very elements that are common to many a set of concurrently executing scripts, with asynchronous
underlying features that developers should be putting their messaging and synchronized shared variables among them.
time and attention. Each script is initiated independently in response to a mes-
sage, key command, mouse event, or a system-wide initia-
SCRATCHTALK tion event (greenflag).
The ScratchTalk prototype emerged from an exploratory de-
sign and implementation intended to uncover the major de- EXAMPLE SCENARIO PART I
sign tradeoffs, representational issues, and user interface re- To simplify the potential space of dialog and the represen-
quirements involved in implementing the Social Computa- tation of both time and causality, I focus specifically on the
tion model. At present, the system does not support a com- using Scratch to create story-like animations.
munity model. The intent of this work was to investigate
constraints on the natural language dialog and to identify The example animation used to drive the construction of the
specific mechanisms by which one user can create domain prototype is a simple “dog chases cat” scenario. The user
knowledge, plans, or language mappings that would be use- expresses the steps of an animation in natural language, the
ful to other users. natural language is converted into an intermediate represen-
tation, and the representation is in turn converted to a Scratch
ScratchTalk is an interface to the Scratch programming en- program that, when executed in the Scratch UI, the user can
vironment [6]. Scratch which was designed to be easy for critique or extend.
children to learn. This simplicity emerges from its simple
world state, small set of primitives, and a highly transparent
A Simple Scenario
graphical user interface. Scratch has many desirable proper-
ties: A ScratchTalk user might start an animation description with
the first visible action:
• Fail-soft. Mistakes are not catastrophic
"A cat wanders randomly
• Interactive. Mistakes are easy to fix. around the screen."
• Visual. It easy to visualize state and behavior. In response, ScratchTalk instantiates a sprite called ’cat’ and
creates a ’wandering’ event with an associated ’wander’ ac-
• Transparent. Behavior and state are not hidden. tion. A script that executes the wander action is synthesized
and a program sent to the UI for execution. The user sees a
• Simple. It is designed for non-programmer children. cat making random movements in random directions on the
screen. A second kind of extension is modifying an existing action.
In the case where the user has provided a sprite costume (or
"When the cat comes near the dog, icon) that depicts the dead cat, the system will not automati-
the dog chases the cat." cally switch to this graphic.
This next statement results in the system inferring that there "When the cat dies, switch to the
is a new actor called ’dog’ which waits for a ’near’ event to ’dead’ costume."
occur (distance between dog and cat sprites is less than some
value) and at that time starts a chase event. The animation Rather than creating a new event, this speech act will mod-
now shows the cat wandering around the screen, if the cat ify an existing event. The die action currently stops the runs
comes near the dog, the dog will keep moving towards the away action and results in the cat saying “Ackthppht”. The
cat while the cat wanders around. action for “switch” binds to the “cat” by default as no ex-
plicit subject is provided. While the exact mechanism is
The user finishes the initial scene with: complicated1 , these two events (say and switch) can be com-
bined by concatenation of the two events. ScratchTalk’s rep-
"If the dog catches the cat, resentation of actions allows this to be done for any two
the cat dies!" events. The editing of actions is a central feature of the sys-
tem and described in-depth in the section on planning below.
This time, when the dog touches the cat during the anima-
tion, the cat stops wandering and performs a ’die’ script. The IMPLEMENTATION
default script in the knowledge base is to mimic the comic
strip character Bill the Cat: stop moving and say “Ackthp-
pht”.
This simple example contains within it a surprising amount
of complexity with implications for both program and lan-
guage semantics.
Extending the Scenario
There are several problems with the initial program. When
the dog chases the cat, the cat has no reciprocal response,
as common sense would lead one to expect. The cat simply
stops when he dies and should probably change its look to
better capture the action.
The user can fix errors or add to the existing program as
follows:
Figure 5. ScratchTalk Architecture
"When the dog chases the cat,
the cat runs away." ScratchTalk is written in Common Lisp and communicates
with a Scratch instance over a network socket. All of the
This phrase triggers a template in the system for “when logic described in this paper was implemented in the Lisp
, ” that searches for an event matching . In this program. The socket simply allows the lisp program to up-
case it first tries matching the phrase directly, assuming a load programs to Scratch as well as the commands: start,
surface variation of it was used to create the event. Fail- stop and clear.
ing this the system can look for near matches by identifying
the verb class and looking for events that match the related The core of ScratchTalk consists of natural language pipeline
classes, such as pursue in this case. that operates roughly as follows:
The “near” event that initiates the “chase” will now also ini- 1. Parse speech act => parse tree + VerbNet annotation
tiate a new event, “runs away”. This event is then linked
to the other two events in a manner to be described in the 2. Compound sentence rules (i.e. when X then Y) =>
following section. primitive subtrees
3. Dialog modeling => filters system commands, validates
Dialog and natural language processing in ScratchTalk ben- reference scope
efit from the highly contextual nature of the interactions.
The scope of the reference into the program representation 4. Reference resolution (NPs, VPs, arguments) =>
is highly likely to be local, thus there are a finite number of SVOO style predicates
possible matches . Moreover, there is a reasonable be- 1
Event merging is an optimization that occurs at program genera-
lief from the conversational contract that is sufficient tion time so the full temporal structure of the graph is maintained
to uniquely identify the event of interest. during dialog with the user.
5. Goal resolution => choose a plan to implement goal in (progn
speech act from a library of plans (do-repeat 10
((forward 2)
6. Modify program representation => insert, delete, reorder, (wait 0.02)))))
or modify objects, properties, events, or actions according (:constraints (type-of $agent actor)
to chosen plan (member $away (away))))
7. Synthesize Scratch code and send to Scratch to be exe-
cuted (defaction run ($agent $away $np)
(run-away $agent $np)
In short, the system extracts goals from user speech acts, (:constraints (type-of $agent actor)
uses those to modify a high-level representation of the ani- (type-of $np actor)
mation, then generates low level code to animate that repre- (member $away (away))))
sentation.
For example, the second definition of run above requires that
the $np argument resolves to an actor object.
Parsing and Reference Resolution
To quickly acquire a reasonably broad coverage of natural
Representing Scratch Animations
language input with which to perform these mappings, the
SwiRL [11] semantic tagging framework was chosen. The ScratchTalk uses an intermediate representation to capture
system is simple to use and produces sufficiently robust re- the essential phenomenology users are expected to reference.
sults for the purposes of the prototype. This representation is compiled into a concrete Scratch pro-
gram which can then be run within the Scratch UI. For ex-
SwiRL uses Charniak’s parser [2] to generate a parse tree, ample the Scratch script below, taken from the cat and dog
then annotates constituents with role labels such as actor, example, is initiated by an asynchronous message, “wander”
object, path, auxiliary arguments, etc. For simplicity, we and causes the associated sprite to perform a random-walk
do not adapt the system to develop new mappings for failed towards the bottom right corner of the screen.
parses using user input. Instead we limit input to simple
sentences on which SwiRL maps verbs in a meaningful way ((when-receive "wander")
by asking the user to rephrase. (set-var wandering 1)
(do-until (= (read-var wandering) 0)
The system first applies a pattern matcher over the parse tree ((glide-seconds-to-xy 0.2
to handle a variety of compound statements indicating con- (+ (xpos) (random -25 35))
ditionality and conjunctions or disjunctions such as and, or, (+ (ypos) (random -30 25))))))
if, while, when, etc. A matching template drives the rest of
the resolution process and performs any coordinate between ScratchTalk’s intermediate representation is made up of four
the constituents, such as creating temporal, causal or condi- components: actors, events, actions and a narrative graph.
tional links between events in the animation narrative. For While these are somewhat tuned to the Scratch world model,
example: they are intended to be generic constructs that can be applied
to other application domains.
(define-linear-template if-then
(or (if $cond |,| then $consequent) Actors
(if $cond |,| $consequent) An actor in ScratchTalk is a simple object representation of a
(if $cond then $consequent)) Scratch sprite. It associates a string name with a set of prop-
((link initiates erties such as sprite size, initial location, costume dictionary
(primary-event (alternative sprite graphics), sound dictionary, etc. Compiled
(resolve-event $cond)) actions result in a set of scripts being associated with the ac-
(primary-event tor object, which is then serialized and sent to the Scratch
(resolve-event $consequent))))) UI.
Once the system has successfully identified a primitive sen- Events
tence, it translates the argument phrases to objects in the An event captures either a point or an interval in time and
system or to action modifiers. It then looks up action that is associated with an actor that is the subject of the event.
matches the VerbNet type2 as well as the arguments. Soft Events serve as anchors for temporal relations and typically
matching allows the system to degrade gracefully by drop- have an associated action that tells the compiler how to gen-
ping unknown or unneeded modifiers, looking for related erate code capturing the actor behavior associated with that
VerbNet verbs that do have matching actions. event. Because Scratch is a dynamic environment, events
can be linked causally (A → B) or an event can be associ-
(defaction run ($agent $away) ated with a conditional action indicating the activation of an
(do-until () event.
2
SwiRL generates Propbank annotations which ScratchTalk con-
verts to VerbNet tags using semlink[5]. Actions
Actions are plan templates that act like generic functions
with multiple dispatch. The semantic head names an action
class, and the arguments determine which method of that ac-
tion is selected and associated with the event identified by
that phrase. An action description such as
(defaction chase ($agent $subject)
(do-forever
(progn
(point-towards $subject) Figure 6. A narrative graph of the cat and dog example
(do-repeat %segment-steps
(progn (forward %segment-step-size)
(wait %segment-step-time))))) narrative and expressed as messages or semaphore variables
(defaults (%segment-steps 5) such that scripts start and stop in accordance with the con-
(%segment-step-size 2) straints in that graph. The fully linked output of the chase
(%segment-step-time 0.1))) script above becomes:
expands to the Scratch script ((when-receive "chase")
(do-until (touching cat)
((do-forever ((point-towards cat)
((point-towards "cat") (do-repeat 5
(do-repeat 5 ((forward 2)
((forward 2) (wait 0.01)))))
(wait 0.01)))))) (broadcast "die"))
which when linked with catch The narrative graph structure is quite general and, with some
extension, should work for any process that can be as an
(defaction catch ($agent $subject) event-driven, control-flow graph.
(wait-until (touching $patient))
EXAMPLE SCENARIO PART II
in the context of the narrative graph (see figure 6) becomes This simple example glosses over some significant issues of
precision, such as where the cat and dog should start on the
((do-until (touching "cat") screen, how big should they be, and how fast should they
((point-towards "cat") execute these actions? Is “near” 100 pixels or 200 pixels?
(do-repeat 5 These are exactly the kinds of things a programmer must
((forward 2) think about that the user shouldn’t have to, unless the default
(wait 0.01)))))) assumptions violate their expectation.
As we will see later, there is a more powerful method for The following sequences illustrate how the user and system
parameterizing actions, action editing, which is integral to can override defaults, create new defaults and augment an
constructing the system’s lexicon of word and phrase mean- existing animation model3 .
ings.
By default, Scratch places all sprites in the center of the
The Narrative Graph screen. In this case the dog is near the cat, the dog is touch-
A ScratchTalk animation is tied together by a graph of nar- ing the cat and thus the whole animation terminates imme-
rative relations among events. This functions loosely as a diately. It is easy to fix this by changing the default start
control flow graph for the animation. An event can initi- location.
ate another event (a touch of a wall initiates a bounce off
that wall) or it can terminate another event (when the dog "The cat should start in the upper left
catches the cat, the dog stops chasing the cat). There is also corner and the dog should start in the
a concurrent relation which allows a set of actions to be ini- lower right corner."
tiated as a group. In this case the compiler will generate a
common message so that any initiating or terminating event The conjunction template for “and” breaks this down into
that is applied to one event is applied to all events labeled as two phrases, each consisting of a “should start” action verb.
concurrent. These actions modify properties of the subject, rather than
inserting or editing an event. A primitive script computes
When the final program is synthesized, the events and ac- 3
tions contained in the graph are converted into a set scripts While the representational apparatus exists for the following ex-
amples, the system does not yet provide the end-to-end mapping
each associated with a specific sprite. A set of optimizations from natural language to final code due to limitations in the seman-
is first applied, such as combining ’chase’ and ’catch’ above. tic labeling system (the analysis section contains a short discussion
Temporal and conditional constraints are extracted from the of this problem).
the coordinates of a sprite of the size of cat and dog in the formulate in the blocks world. In this way, HACKER not
locations described so they start in the appropriate location. only becomes better at satisfying goals, but also better at
In the Social Computation model, placement primitives can constructing new programs.
be provided by more expert users. Ordinary users can then
modify the application of those procedures to specific cir- There are some crucial differences between the blocks world
cumstances. and the Scratch world. ScratchTalk doesn’t have a clean state
model, and thus no means of formally classifying bugs in
Another default behavior of the system is that the default terms of that state. The system can have, however, a formal
wander script has no directional bias. The cat just keeps model of programming knowledge as well as different patch
wandering randomly around the start location. types.
"The cat should wander towards The goal is to use the HACKER system model, but to rely on
the bottom left corner" users to incrementally specify domain knowledge and bug
types4 . Domain knowledge and recognition of mistakes and
This speech act requires the system to perform a very so- problems are exactly what many humans can do that our ma-
phisticated change to a pre-existing script. The step parame- chines can’t.
ters of the loop that implements the random steps need to be
modified to properly add the bias. This can be performed • Recognizing errors. The transparency property of the do-
by having some general knowledge about scripts, param- main is important because it enables users to see errors
eters, coordinates, and motion. Most of the design work when they happen. Highly transparent systems tend to ex-
in ScratchTalk was ensuring that the representations above hibit temporal locality of cause and observed effect, sim-
could support operations such as this. plifying user critique.
Scripts often are missing default parameters; for example • Critique and patching. Given a kind of critique, the sys-
how much should the step calculation be biased? This ex- tem can apply a patch and attempt to repeat the prior be-
ample shows that the database of defaults is as important to havior. The properties of interactivity and fail-soft are
the interactivity of the system as are the representations and critical to being able to backup and retry in the presence
behavioral descriptions. However, each person who creates of failures.
or modifies a specific defaults value can empower other users
in the network. • Explanations and repair. By expanding an action plan, the
system can generate explanations that can prime users to
provide repair suggestions using language the system can
PLANNING IN SCRATCHTALK understand.
Actions are essentially plans that can be tied to a goal. In
the example above, goals were captured as imperative com-
Editing Events and Narrative Relations
mands stated in natural language. The most interesting fea-
ture explored by ScratchTalk is the means by which actions As shown in the original example, users can update the pro-
are extended to accommodate new problems and new situa- gram model by clarifying what happens to or around a spe-
tions. cific event. The system currently supports the following edit-
ing operations for events and narratives:
Plans as Programs • Inserting and removing events. The simplest editing op-
I employ a model of planning introduced by HACKER [12]. eration is to do what the dialog does naturally, which is to
HACKER was an automated planning framework for a sim- have a prior event for the same sprite initiate the next even
ple block stacking domain. Plans to satisfy goals were pro- and the next even terminate the prior event. Concurrency
cedures. These procedures were incrementally modified to of actions for the same object is not assumed by default.
work around obstacles discovered in achieving a goal. In ef- In fact such concurrency is typically implemented by edit-
fect, programs are incrementally “hacked” together using a ing actions.
general representation of domain knowledge, programming
techniques, bug types, and patch types. When a goal can- • Temporal reordering. The user can change the order and
not be achieved in a single primitive step, it is treated as a casual relationships among events by adding or deleting
bug. The bug is classified, which results in a patch attempt temporal relations. This is currently not directly available
being made. The patch attempt may use domain knowledge to the user, but the same machinery is used by the dialog
or general programming knowledge, such as “if failed pre- system when adding new events.
condition, first satisfy precondition”. Finally, a critic is gen-
erated that recognizes when a bug type has been created by • Editing event action slots. An action can be overridden
the program and patches it automatically without actually by an edit to create a new, derived action. This happens
running the program. whenever the user augments an event as illustrated in the
examples.
By incrementally training the system on goals and states of 4
With better access to state than in ScratchTalk, it should be pos-
increasing complexity, the system constructs a rich lexicon sible to create critics which effectively model the domain without
of robust plans that solve many of the goals that one can having to construct a complete model of the domain.
• Creating and revoking concurrency (see figure 6 for an Domain Issues
example of the concurrency relation). The properties of the Scratch animation domain all contributed
to the development of the program, however there are sev-
eral properties lacking that I would want to see in the next
Action Editing
experimental domain:
Action editing can be quite sophisticated. To make a cer-
tain kind of side effect on a program’s behavior, we have to • Access to more internal state. Scratch provides an impov-
map the user’s expression “chase faster” to a command that erished set of sensing primitives. A sprite cannot know
edits action scripts. The simplest editing operation might the location of another sprite, only it’s direction and dis-
be “increment statement X in pattern Y ” to match a spe- tance5 .
cific statement in a procedure. We can make a pattern more
complicated so that it can match more flexibly, similarly to • Synthesize and run. Generating code, then visualizing the
HACKER. For example; “Increment the step-distance of the program as a whole makes the interaction quite stilted. Far
inner-most motion loop in this procedure.” This latter form better to directly execute from the intermediate represen-
is what the system uses currently, although the number of tation and use continuous planning techniques.
such patterns remains small.
• Interpreter, not compiler. If always running interpreted,
The editing operations available and used in the current sys- the system can learn when a certain state will cause a plan
tem include: to fail and switch to another. ScratchTalk is forced to pre-
compile all alternative plans into conditional code state-
• concatenating statement sequences ments or parallel scripts, greatly increasing the complex-
ity for learning.
• insert a statement to the front of a loop body • Direct manipulation. The Scratch UI was hard to use be-
cause of an inability to get inside the UI to allow users to
• insert a statement at the end of a loop body perform direct manipulations that directly teach the sys-
tem. A user-friendly system would provide multi-modal
• remove a statement from a loop body inputs, using direct manipulation where it makes sense
and using language when the UI metaphors become awk-
• add a loop exit condition ward.
• override a default Plan Representations
One of the more challenging aspects of ScratchTalk is the ac-
• modify a parameter tion representation. A great deal of the time on this project
was spent iterating through different action representations
• force an alternate action selection and the current implementation is far from satisfactory. The
chief difficulty lies in handling plan variations under a spe-
Most of these operations are already in use by the event cific goal, or different methods for performing the same plan
linker to merge event bodies and tie together all the actions step. This is further complicated by action editing. Some
using the variables and message passing. The problem of questions immediately arise. Do we edit a fully expanded
using natural language to evoke these editing operations is version of a top level action, or do we edit specific sub-
one of mapping properties or changes (faster, spin around, actions in the hierarchy? If we are editing the hierarchy,
etc) to code sections that can serve as hooks for the editing what information do we have about sub-actions (sub-goals)
primitives. An expert user can add these mappings automat- to determine the reference for an editing action?
ically, or the machine can learn such mappings by general-
izing from small numbers of examples and recording excep- This leads to an interesting design space. The following are
tions in the case of over generalization. some key parameters of this space.
• Dispatch. Is action dispatch a static process (easy for ac-
ANALYSIS AND FUTURE WORK tion expansion) or a dynamic dispatch (allowing for late
The central innovation in ScratchTalk is enabling users to binding based on world state). Static dispatch pushes com-
use natural language to incrementally extend plans through plexity into conditional code, dynamic dispatch pushes
critique and repair of existing plans. These individual con- complexity into the action editing rules.
tributions can be aggregated from a number of users to ex-
pand on the knowledge base available to a given user. These • Hierarchy. How do we maintain information about a spe-
contributions come in the form of new actions, default val- cific action hierarchy? If we want to edit a parameter
ues, action editing scripts, and new language mappings to or default for a sub-action, do we annotate the top level
these components. The work described here is insufficient action or create an instance of the sub-plan and annotate
to prove the full Social Computation model, but takes some that? How would that interact with dynamic dispatch?
important steps in that direction. During the development of 5
There are no function abstractions in Scratch, so the polar to
the application several problems were identified that inform Cartesian coordinate conversion makes the code very difficult to
future work. read.
• Expansion/Synthesis. Do we expand code as we construct http://scratch.mit.edu/
it and edit the primitives, or do we maintain the hierarchy
and synthesize as late as possible? I prefer the latter, but A Quicktime movie of this example can be viewed at
again it adds action editing rule complexity.
http://web.media.mit.edu/
• Annotations. How do we keep track of what a parame- ˜eslick/scratchtalk-demo1.mov
ters or default will be at final code synthesis time? Do
we perform code expansion on every edit to validate all REFERENCES
constraints, or do we wait until code synthesis time? 1. C. Anderson. The long tail. natpe.org, Jan 2006.
Learning 2. E. Charniak. A maximum-entropy-inspired parser.
The chasing and running away example affords an inter- Proceedings of NAACL, Jan 2000.
esting observation about the potential for acquiring general 3. A. DeHon, J. Brown, I. Eslick, J. Harris, L. Karbiner,
knowledge from users of this system. In general there are and T. F. Knight. Global cooperative computing.
a large class of verbs with subject and object arguments for Workshop on Wide-Area Collaboration and
which a user would expect a reciprocal action. (For example: Cooperative Computing, Second International WWW
chase & run, attack & defend, threaten & fear, etc). Conference: Mosaic and the Web, 1994.
Given several examples of chase being associated with run 4. A. Ko, B. Myers, and H. Aung. Six learning barriers in
away, the system can form the abductive hypothesis that any- end-user programming systems. IEEE Symposium on
thing chased runs away and add the reciprocal event by de- Visual Languages and Human-Centric Computing
fault in the future. The response of the user to this kind of (VLHCC’04), Jan 2004.
assumption helps determine the reliability of a given default.
5. E. Loper, S. ting Yi, and M. Palmer. Combining lexical
resources: Mapping between propbank and verbnet. In
Complexity Proceedings of the 7th International Workshop on
Each layer of the action system requires default knowledge Computational Linguistics, 2007.
of various kinds and an ability to resolve natural language
references to the appropriate subsystem, hierarchy layer, and 6. J. Maloney, L. Burd, Y. Kafai, N. andd Silverman
editing routine. It is possible that the complexity of this Brian Rusk, and M. Resnick. Scratch: A sneak preview.
architecture will severely limit the scaling of this system. Proceedings of the Second International Conference on
The promise of Social Computation is that the collective Creating, Connecting and Collaborating through
contributions will compensate for the complexity by rely- Computing, Jan 2004.
ing on human contributors to create, tune, and filter a rich
library of heuristics that work often enough to keep users 7. M. Minsky. The emotion machine: Commonsense
satisfied. This philosophical approach to developing com- thinking, artificial intelligence, and the future of the
putational functionality is heavily inspired by the views of human mind. Simon and Schuster, Nov 2006.
Minsky [7] and Singh [10]. 8. B. Myers, J. Pane, and A. Ko. Natural programming
languages and environments. Communications of the
CONCLUSION ACM, 47(9):47–52, Jan 2004.
This paper introduces ScratchTalk, a prototype system en-
9. R. Panko. What we know about spreadsheet errors.
abling end-users to specify and modify programmatic behav-
Journal of End User Computing, 10(2):15–21, Jan
ior via natural language. This prototype system illustrates a
1998.
representational apparatus amenable to the constraints of the
Social Computation scripting model. 10. P. Singh. Em-One: an architecture for reflective
commonsense reasoning. PhD thesis, Massachusetts
The current prototype allows users to describe simple anima- Institute of Technology, 2005.
tions in the Scratch programming environment using natural
language. ScratchTalk introduces three key ideas: the con- 11. M. Surdeanu and J. Turmo. Semantic role labeling
cept of end user programming through plan elaboration, the using complete syntactic analysis. Proceedings
central role of user critique and repair in that elaboration, Proceedings of CoNLL, Jun 2005.
and the use of natural language to reference plans, critiques,
12. G. J. Sussman. A Computational Model of Skill
and repairs.
Acquisition. PhD thesis, Masachusetts Institute of
A version of the generated cat and dog example can be down- Technology, 1973.
loaded from
http://web.media.mit.edu/
˜eslick/example.sb
and run on the Scratch UI available from