=Paper= {{Paper |id=Vol-323/paper-1 |storemode=property |title=ScratchTalk: A Natural Language framework for Social Computation |pdfUrl=https://ceur-ws.org/Vol-323/paper01.pdf |volume=Vol-323 |dblpUrl=https://dblp.org/rec/conf/iui/Eslick08 }} ==ScratchTalk: A Natural Language framework for Social Computation== https://ceur-ws.org/Vol-323/paper01.pdf
                   ScratchTalk and Social Computation:
                Towards a natural language scripting model
                                                       Ian Eslick
                                                     MIT Media Lab
                                                 20 Ames St. E15-320R
                                               Cambridge, MA 02139 USA
                                                 eslick@media.mit.edu

ABSTRACT                                                          New functionality can be created by three classes of people:
Natural Language interfaces have been proposed as a solu-         a developer, another member of the user community, or the
tion to make application functionality more accessible to end     end user. The next section describes the properties of these
users. Practical implementation of such systems has been          three classes and where features come from, arguing that to-
hampered by both semantic and knowledge-engineering chal-         day’s scripting models are fundamentally too expensive and
lenges. This paper introduces Social Computation, a theo-         insufficient to cover a large space of functionality. Increased
retical model targeting both challenges. The model employs        coverage of potential functionality can increase the desir-
natural language in a planning framework to enable commu-         ability or utility of a given application.
nities of end-users to collaborate, explicitly and implicitly,
in the development of new functionality. The model intrinsi-      Social Computation aims to change the cost-economics of
cally supports the acquisition of user goals and natural lan-     scripting by dramatically altering the user interaction model
guage semantics in the context of the application domain.         and the representation of procedural behavior. This model
ScratchTalk is a prototype of the natural language planning       borrows concepts from goal-oriented programming, plan-
component of the model applied to the Scratch programming         ning, interpreted languages, collective intelligence, and so-
environment. This serves to ground the framework and pro-         cial networks.
vide insights into the space of applications likely to be ad-
dressable by the Social Computation model.                        SOCIAL COMPUTATION
                                                                  What users can’t do
Author Keywords                                                   Program design requires that a programmer maintain a men-
Natural Language, Goal Oriented User Interfaces, Collective       tal dictionary of possible steps and determine how to se-
Intelligence, Social Networking                                   quence those steps so as to bring about desired side effects
                                                                  while avoiding undesirable ones. Programmers have to ac-
                                                                  commodate environmental variability, model hidden state that
ACM Classification Keywords
                                                                  is not easily inspected, and express each step with perfect
H5.2. User Interfaces
                                                                  precision; all of which increase cognitive burden. When
                                                                  completed, these programs are brittle and correspondingly
INTRODUCTION                                                      difficult to modify.
Today, customization of application behavior is accomplished
through combinations of scripting languages, plugins, and         These skills are not easily learned. Competence requires
macros. Unfortunately, the learning curve necessary to use        significant training and experience, limiting the populations
any of these approaches is too steep for most users. This         that can contribute to the development of new functionality.
problem stems, in part, from the cognitive barriers that all of   Attempts to ameliorate this situation in commercially avail-
these models require of users: understanding precise seman-       able languages have had little success [8].
tics, modeling hidden state, and navigating a large design
space [4].                                                        Solutions to this conundrum typically trade expressive power
                                                                  for simplicity so that users are better able to model the im-
User interface research on end user programming aims to           plications of a statement. For example, visual programming
reduce the cognitive burden on end users while improving          using flowcharts make application dataflow explicit, yet ex-
their access to the capabilities of the underlying system. As     clude message passing and other forms of interdependence.
the tasks carried out by the user grow in complexity, the         These models usually fall short of addressing the space of
user must discover and memorize complex sequences of ac-          user goals, leading to frustration and low adoption rates.
tions and understand how to avoid negative effects. End-user
scripting attempts to enable the user to abstract and automate    This outlook is not promising. The very notion of program-
those sequences to save both labor and reduce the cognitive       ming appears anathema to most end-users. However, hu-
burden . It is this process of abstraction, the means by which    mans can express their desires to experts and through a pro-
a user codifies combinations of primitives, that is the primary   cess of iteration and clarification, experts are generally able
interest of this paper.                                           to satisfy user goals. The following section proposes that as-



       © 2008 for the individual papers by the papers' authors. Copying permitted for private and academic purposes.
                  Re-publication of material from this volume requires permission by the copyright owners.
pects of this interaction can be captured in a computational
framework, facilitating a larger range of end-user goals.

What users can do
The research community should be asking what users are
naturally good at and looking for opportunities to leverage
those skills. Following is a list of some natural user skills:

• Asking. Word of mouth has always played a significant
  role in problem solving. The Internet today extends that
  natural process via blogs, mailing lists and other discus-
  sion forums. For the non-expert, this is often the only                         Figure 1. The long tail of software
  source of answers.
• Searching. Most users of the internet today are accus-          mortar store a linear foot of shelf-space has a significant vari-
  tomed to searching for answers to their problem via key-        able cost that has to be amortized over sales, requiring phys-
  words or browsing. This can lead to the resources above,        ical bookstores to only carry books that have high volume.
  but can also be used with a program’s help system (which
  are rarely helpful) or as an index to program functions.        The cost economics of application platforms today is akin to
                                                                  the brick and mortar bookstores of yesterday. As with Ama-
• Inspecting. In an environment where the behaviors or side       zon, finding a way to serve the long tail of feature demand
  effects are transparent, users can be quite good at recog-      can dramatically improve the reach of software applications
  nizing errors. This is true in a domain such as animation       as well as the usefulness of those platforms to end users.
  domain, but is less true in the spreadsheet domain where
  hidden state increases dramatically with increasing com-        The obvious proposition is to find ways to reduce the costs of
  plexity [9].                                                    feature development. Open Source has proven to be a way to
                                                                  extend the reach of an application down the long tail of soft-
• Critiquing. A review of online discussion lists for plugins,    ware features, but Open Source is limited by a different kind
  extensions and scripts indicate that there are many more        of economics. Even in Open Source, the cost of a feature
  users than developers and that users are extremely prolific     remains high in terms of skilled labor, but using volunteer
  in expressing their discontent with specific failures or spe-   time instead of paid time. However volunteer developers are
  cific missing features.                                         likely to release and maintain code when there is a suffi-
                                                                  cient incentive. Social incentives can be quite powerful, but
• Tweaking. A smaller set of users (although still larger         tend to require a certain critical mass of peers or end users.
  than those who program) can take example scripts and            Smaller open source projects are likely to have lower qual-
  perform minor changes when the relationships of the script      ity, limiting the population of users that can take advantage
  to the domain model is reasonably easy to comprehend.           of them. So Open Source serves a modestly larger popu-
  (i.e. Mail filter rules).                                       lation than commercial software, mostly programmers and
                                                                  power-users, and is unable to effectively support or leverage
Users are good are pattern matching, but bad at pattern speci-    the vast majority of users.
fication: users can’t always describe exactly what they want,
but they know it when they see it. Users are bad at design        If the Open Source model doesn’t significantly change the
but very good at complaining. How can we exploit these            economics to capture the long tail of feature needs, what
observations?                                                     can?

The Long-Tail of Software Features                                Empowering End Users
The cost of developing a software feature is extremely high;      The key idea behind the Social Computation framework is
thus developers are forced to pick the features that will have    to find ways to enable the untapped end user population to
the greatest impact on sales and satisfying existing users. If    contribute new features suited to their specific goals. This
we chose a standard indexing method, such as menus, for           should be accomplished in a robust and sustainable way,
making features available, then the index is likely to be ex-     while surmounting the problems of precision, hidden state,
tremely underpopulated with respect to all reasonable user        and design.
goals.
                                                                  In this discussion features are considered to be pieces of
Anderson [1] coined the term “long tail” to describe a re-        code, or a description of a plan that can be interpreted, that
lated commercial phenomenon wherein a change in cost eco-         is directly associated with a user’s goal. Since user goals are
nomics made selling small-volume products highly profitable.      often under specified, a plan or code fragment needs to cover
The motivating observation was that Amazon.com had more           a range of cases without users having to intervene directly.
total revenue from low-volume books than from high-volume
books. This is because the cost of selling low and high vol-      We can characterize the user population with a similar curve
ume books is identical for Amazon, whereas for a brick and        as that applied to the software feature space. There are very
                Figure 2. Distribution of user skills


few people who can write and maintain robust software. There
are more people who can modify or add incrementally to ex-
isting code and an even larger group that can add various
kinds of content like skins, templates, graphics, etc. How-                      Figure 3. A Model for Social Computation
ever most of the population is incapable of directly provid-
ing value to other users. However, perhaps it is possible to
do so indirectly.                                                   As shown in the graphic, there are two cycles taking place
                                                                    in any given user’s application. The user is interacting with
A truly transformational impact on the cost-economics of            their system which uses a knowledge base to interpret a user
new features becomes possible if can extract value from in-         speech act. The user observes the effect and makes any nec-
dividual contributions in a way that enhances the next per-         essary corrections. Simultaneously, each statement made by
son’s feature development. To make progress towards this            the user that is correctly interpreted, is shared with other
end, I propose a key set of assumptions:                            users who may be making the same statement.

• Represent scripts as hierarchical plans consisting of goals       With such a community there are a large set of techniques
  and subgoals.                                                     that can be used to sidestep many of the AI-hard problems
                                                                    that come to mind such as domain representation, planning,
• Goals are specified by user-accessible natural language           natural language understanding, and design space exploration.
  phrases that enable the system to map to subgoals.                A sample of such techniques include:
• Plans execute in a “soft fail” environment; users are ac-
  customed to failures but are able to repair failures in a         1. Collaborative filtering. User critique provides immedi-
  multitude of ways.                                                   ate feedback on the reliability of a given goal interpreta-
                                                                       tion. Even the most naive of users can say, “that didn’t
• The user interface reifies side effects that the user is likely      work” and then select an alternative version. Human to
  to care about.                                                       human interaction (such as an expert helping a neophyte)
                                                                       regarding a specific goal can provide direct, explicit feed-
• The user uses a restricted set of natural language or the            back on plan success or failure.
  user interface to critique plan failure; more sophisticated
  users can make suggestions and experts can directly ma-           2. Sampling. A new plan interpretation of a goal can be
  nipulate the machinery used by less-sophisticated users.             tried out on a small subset of users to investigate whether
                                                                       it improves reliability or acceptance rates in the same vein
• Plans are incrementally modified to fix the problem iden-            as the compiler techniques described in [3].
  tified by the critique.
                                                                    3. Supervised structure learning. The spotty reliability of
These assumptions eliminates two of the three barriers to              a plan is an opportunity to learn about missing precondi-
end-user programming described earlier. Goals are not pre-             tions for a plan step. When the state variable is unrep-
cisely described, but the related plan should provide increas-         resented by the system (i.e. no condition is found that is
ing robustness to different environmental variation as well as         correlated to the plans reliability), users can be sampled
handle the bookkeeping of hidden state that isn’t expressed            to provide inputs as to why something works in one case
in the UI. To address the design problem we need to think              and not another, hypothesizing a variable over which cor-
creatively about the design problem itself.                            relations can be observed.
Design requires an exploration of the space of possible data        4. Collective mapping. Users can specify a goal as a de-
transformations, ordering and condition tests. This may hap-           sired action or state and observe the resulting behavior.
pen in the mind of a developer, via a formal search algo-              The user may change the goal, choose a different plan to
rithm, or via an exploratory implementation. The key as-               fix a failed goal, or make an explicit correction to an exist-
sumption in Social Computing is that there is a community              ing plan. Each of these actions provides an implicit data
of users with similar goals.                                           point about whether a plan meets a goal, allowing users
   to choose the preferred method even when there is not a
   clear model of state.

5. Preference discovery. Using the same data as in col-
   lective mapping above, the system can learn the order in
   which to present alternative interpretations to users.

6. Preference clustering. The behavior intended by a given
   natural language utterance may vary from user to user.
   When a user falls into a cluster of users with similar pat-
   terns of goal to plan selection or use of natural language
   terms, that user can share preference orderings with their
   affinity group rather than all users.

One of the largest advantages of this framework is the abil-
ity to view meaning as a mapping function and not a rep-
resentational challenge. Most of the problems in traditional                     Figure 4. The Scratch User Interface
AI programs stem from the desire to explicitly represent the
meaning of a goal or a state. Finding a plan becomes a search
through a space defined by this formal representation, with       Scratch is a Logo-like world and user interface written in
all the convenient theoretical properties of correctness and      Squeak, a modern implementation of the Smalltalk language.
complexity we are used to. Instead, interpretation is a pro-      The Scratch UI provides a visual, lego-block style view of
cess of finding a “near match” which sufficiently satisfies the   Squeak language constructs which are used to control a set
intent of the user, allowing them to make forward progress        sprites that interact on a 2D canvas. The language consists
towards a goal.                                                   of primitives for orienting, moving, and transforming sprites
                                                                  as well as detecting contact and other events. The language
This model is likely to be grossly insufficient for large scale   also provides a small set of conditional and iterative con-
systems requiring complex software architecture, new user         structs to combine these primitives. There is no support for
interface techniques, and new underling representations. How-     function calls or a call stack. The runtime model consists of
ever, these are the very elements that are common to many         a set of concurrently executing scripts, with asynchronous
underlying features that developers should be putting their       messaging and synchronized shared variables among them.
time and attention.                                               Each script is initiated independently in response to a mes-
                                                                  sage, key command, mouse event, or a system-wide initia-
SCRATCHTALK                                                       tion event (greenflag).
The ScratchTalk prototype emerged from an exploratory de-
sign and implementation intended to uncover the major de-         EXAMPLE SCENARIO PART I
sign tradeoffs, representational issues, and user interface re-   To simplify the potential space of dialog and the represen-
quirements involved in implementing the Social Computa-           tation of both time and causality, I focus specifically on the
tion model. At present, the system does not support a com-        using Scratch to create story-like animations.
munity model. The intent of this work was to investigate
constraints on the natural language dialog and to identify        The example animation used to drive the construction of the
specific mechanisms by which one user can create domain           prototype is a simple “dog chases cat” scenario. The user
knowledge, plans, or language mappings that would be use-         expresses the steps of an animation in natural language, the
ful to other users.                                               natural language is converted into an intermediate represen-
                                                                  tation, and the representation is in turn converted to a Scratch
ScratchTalk is an interface to the Scratch programming en-        program that, when executed in the Scratch UI, the user can
vironment [6]. Scratch which was designed to be easy for          critique or extend.
children to learn. This simplicity emerges from its simple
world state, small set of primitives, and a highly transparent
                                                                  A Simple Scenario
graphical user interface. Scratch has many desirable proper-
ties:                                                             A ScratchTalk user might start an animation description with
                                                                  the first visible action:
• Fail-soft. Mistakes are not catastrophic
                                                                  "A cat wanders randomly
• Interactive. Mistakes are easy to fix.                           around the screen."

• Visual. It easy to visualize state and behavior.                In response, ScratchTalk instantiates a sprite called ’cat’ and
                                                                  creates a ’wandering’ event with an associated ’wander’ ac-
• Transparent. Behavior and state are not hidden.                 tion. A script that executes the wander action is synthesized
                                                                  and a program sent to the UI for execution. The user sees a
• Simple. It is designed for non-programmer children.             cat making random movements in random directions on the
screen.                                                           A second kind of extension is modifying an existing action.
                                                                  In the case where the user has provided a sprite costume (or
"When the cat comes near the dog,                                 icon) that depicts the dead cat, the system will not automati-
 the dog chases the cat."                                         cally switch to this graphic.

This next statement results in the system inferring that there    "When the cat dies, switch to the
is a new actor called ’dog’ which waits for a ’near’ event to      ’dead’ costume."
occur (distance between dog and cat sprites is less than some
value) and at that time starts a chase event. The animation       Rather than creating a new event, this speech act will mod-
now shows the cat wandering around the screen, if the cat         ify an existing event. The die action currently stops the runs
comes near the dog, the dog will keep moving towards the          away action and results in the cat saying “Ackthppht”. The
cat while the cat wanders around.                                 action for “switch” binds to the “cat” by default as no ex-
                                                                  plicit subject is provided. While the exact mechanism is
The user finishes the initial scene with:                         complicated1 , these two events (say and switch) can be com-
                                                                  bined by concatenation of the two events. ScratchTalk’s rep-
"If the dog catches the cat,                                      resentation of actions allows this to be done for any two
 the cat dies!"                                                   events. The editing of actions is a central feature of the sys-
                                                                  tem and described in-depth in the section on planning below.
This time, when the dog touches the cat during the anima-
tion, the cat stops wandering and performs a ’die’ script. The    IMPLEMENTATION
default script in the knowledge base is to mimic the comic
strip character Bill the Cat: stop moving and say “Ackthp-
pht”.

This simple example contains within it a surprising amount
of complexity with implications for both program and lan-
guage semantics.

Extending the Scenario
There are several problems with the initial program. When
the dog chases the cat, the cat has no reciprocal response,
as common sense would lead one to expect. The cat simply
stops when he dies and should probably change its look to
better capture the action.

The user can fix errors or add to the existing program as
follows:
                                                                                  Figure 5. ScratchTalk Architecture
"When the dog chases the cat,
 the cat runs away."                                              ScratchTalk is written in Common Lisp and communicates
                                                                  with a Scratch instance over a network socket. All of the
This phrase triggers a template in the system for “when        logic described in this paper was implemented in the Lisp
, ” that searches for an event matching . In this           program. The socket simply allows the lisp program to up-
case it first tries matching the phrase directly, assuming a      load programs to Scratch as well as the commands: start,
surface variation of it was used to create the event. Fail-       stop and clear.
ing this the system can look for near matches by identifying
the verb class and looking for events that match the related      The core of ScratchTalk consists of natural language pipeline
classes, such as pursue in this case.                             that operates roughly as follows:

The “near” event that initiates the “chase” will now also ini-    1. Parse speech act => parse tree + VerbNet annotation
tiate a new event, “runs away”. This event is then linked
to the other two events in a manner to be described in the        2. Compound sentence rules (i.e. when X then Y) =>
following section.                                                   primitive subtrees
                                                                  3. Dialog modeling => filters system commands, validates
Dialog and natural language processing in ScratchTalk ben-           reference scope
efit from the highly contextual nature of the interactions.
The scope of the reference into the program representation        4. Reference resolution (NPs, VPs, arguments) =>
is highly likely to be local, thus there are a finite number of      SVOO style predicates
possible matches . Moreover, there is a reasonable be-         1
                                                                    Event merging is an optimization that occurs at program genera-
lief from the conversational contract that  is sufficient      tion time so the full temporal structure of the graph is maintained
to uniquely identify the event of interest.                       during dialog with the user.
5. Goal resolution => choose a plan to implement goal in             (progn
   speech act from a library of plans                                  (do-repeat 10
                                                                 ((forward 2)
6. Modify program representation => insert, delete, reorder,      (wait 0.02)))))
   or modify objects, properties, events, or actions according     (:constraints (type-of $agent actor)
   to chosen plan                                                (member $away (away))))
7. Synthesize Scratch code and send to Scratch to be exe-
   cuted                                                         (defaction run ($agent $away $np)
                                                                   (run-away $agent $np)
In short, the system extracts goals from user speech acts,         (:constraints (type-of $agent actor)
uses those to modify a high-level representation of the ani-     (type-of $np actor)
mation, then generates low level code to animate that repre-     (member $away (away))))
sentation.
                                                                 For example, the second definition of run above requires that
                                                                 the $np argument resolves to an actor object.
Parsing and Reference Resolution
To quickly acquire a reasonably broad coverage of natural
                                                                 Representing Scratch Animations
language input with which to perform these mappings, the
SwiRL [11] semantic tagging framework was chosen. The            ScratchTalk uses an intermediate representation to capture
system is simple to use and produces sufficiently robust re-     the essential phenomenology users are expected to reference.
sults for the purposes of the prototype.                         This representation is compiled into a concrete Scratch pro-
                                                                 gram which can then be run within the Scratch UI. For ex-
SwiRL uses Charniak’s parser [2] to generate a parse tree,       ample the Scratch script below, taken from the cat and dog
then annotates constituents with role labels such as actor,      example, is initiated by an asynchronous message, “wander”
object, path, auxiliary arguments, etc. For simplicity, we       and causes the associated sprite to perform a random-walk
do not adapt the system to develop new mappings for failed       towards the bottom right corner of the screen.
parses using user input. Instead we limit input to simple
sentences on which SwiRL maps verbs in a meaningful way          ((when-receive "wander")
by asking the user to rephrase.                                   (set-var wandering 1)
                                                                  (do-until (= (read-var wandering) 0)
The system first applies a pattern matcher over the parse tree      ((glide-seconds-to-xy 0.2
to handle a variety of compound statements indicating con-             (+ (xpos) (random -25 35))
ditionality and conjunctions or disjunctions such as and, or,          (+ (ypos) (random -30 25))))))
if, while, when, etc. A matching template drives the rest of
the resolution process and performs any coordinate between       ScratchTalk’s intermediate representation is made up of four
the constituents, such as creating temporal, causal or condi-    components: actors, events, actions and a narrative graph.
tional links between events in the animation narrative. For      While these are somewhat tuned to the Scratch world model,
example:                                                         they are intended to be generic constructs that can be applied
                                                                 to other application domains.
(define-linear-template if-then
    (or (if $cond |,| then $consequent)                          Actors
(if $cond |,| $consequent)                                       An actor in ScratchTalk is a simple object representation of a
(if $cond then $consequent))                                     Scratch sprite. It associates a string name with a set of prop-
   ((link initiates                                              erties such as sprite size, initial location, costume dictionary
   (primary-event                                                (alternative sprite graphics), sound dictionary, etc. Compiled
    (resolve-event $cond))                                       actions result in a set of scripts being associated with the ac-
   (primary-event                                                tor object, which is then serialized and sent to the Scratch
    (resolve-event $consequent)))))                              UI.

Once the system has successfully identified a primitive sen-     Events
tence, it translates the argument phrases to objects in the      An event captures either a point or an interval in time and
system or to action modifiers. It then looks up action that      is associated with an actor that is the subject of the event.
matches the VerbNet type2 as well as the arguments. Soft         Events serve as anchors for temporal relations and typically
matching allows the system to degrade gracefully by drop-        have an associated action that tells the compiler how to gen-
ping unknown or unneeded modifiers, looking for related          erate code capturing the actor behavior associated with that
VerbNet verbs that do have matching actions.                     event. Because Scratch is a dynamic environment, events
                                                                 can be linked causally (A → B) or an event can be associ-
(defaction run ($agent $away)                                    ated with a conditional action indicating the activation of an
  (do-until ()                                                   event.
2
  SwiRL generates Propbank annotations which ScratchTalk con-
verts to VerbNet tags using semlink[5].                          Actions
Actions are plan templates that act like generic functions
with multiple dispatch. The semantic head names an action
class, and the arguments determine which method of that ac-
tion is selected and associated with the event identified by
that phrase. An action description such as

(defaction chase ($agent $subject)
  (do-forever
     (progn
       (point-towards $subject)                                           Figure 6. A narrative graph of the cat and dog example
       (do-repeat %segment-steps
 (progn (forward %segment-step-size)
(wait %segment-step-time)))))                                      narrative and expressed as messages or semaphore variables
  (defaults (%segment-steps 5)                                     such that scripts start and stop in accordance with the con-
            (%segment-step-size 2)                                 straints in that graph. The fully linked output of the chase
            (%segment-step-time 0.1)))                             script above becomes:

expands to the Scratch script                                      ((when-receive "chase")
                                                                    (do-until (touching cat)
((do-forever                                                          ((point-towards cat)
   ((point-towards "cat")                                              (do-repeat 5
    (do-repeat 5                                                         ((forward 2)
     ((forward 2)                                                         (wait 0.01)))))
      (wait 0.01))))))                                              (broadcast "die"))

which when linked with catch                                       The narrative graph structure is quite general and, with some
                                                                   extension, should work for any process that can be as an
(defaction catch ($agent $subject)                                 event-driven, control-flow graph.
  (wait-until (touching $patient))
                                                                   EXAMPLE SCENARIO PART II
in the context of the narrative graph (see figure 6) becomes       This simple example glosses over some significant issues of
                                                                   precision, such as where the cat and dog should start on the
((do-until (touching "cat")                                        screen, how big should they be, and how fast should they
  ((point-towards "cat")                                           execute these actions? Is “near” 100 pixels or 200 pixels?
   (do-repeat 5                                                    These are exactly the kinds of things a programmer must
    ((forward 2)                                                   think about that the user shouldn’t have to, unless the default
     (wait 0.01))))))                                              assumptions violate their expectation.
As we will see later, there is a more powerful method for          The following sequences illustrate how the user and system
parameterizing actions, action editing, which is integral to       can override defaults, create new defaults and augment an
constructing the system’s lexicon of word and phrase mean-         existing animation model3 .
ings.
                                                                   By default, Scratch places all sprites in the center of the
The Narrative Graph                                                screen. In this case the dog is near the cat, the dog is touch-
A ScratchTalk animation is tied together by a graph of nar-        ing the cat and thus the whole animation terminates imme-
rative relations among events. This functions loosely as a         diately. It is easy to fix this by changing the default start
control flow graph for the animation. An event can initi-          location.
ate another event (a touch of a wall initiates a bounce off
that wall) or it can terminate another event (when the dog         "The cat should start in the upper left
catches the cat, the dog stops chasing the cat). There is also      corner and the dog should start in the
a concurrent relation which allows a set of actions to be ini-      lower right corner."
tiated as a group. In this case the compiler will generate a
common message so that any initiating or terminating event         The conjunction template for “and” breaks this down into
that is applied to one event is applied to all events labeled as   two phrases, each consisting of a “should start” action verb.
concurrent.                                                        These actions modify properties of the subject, rather than
                                                                   inserting or editing an event. A primitive script computes
When the final program is synthesized, the events and ac-          3
tions contained in the graph are converted into a set scripts        While the representational apparatus exists for the following ex-
                                                                   amples, the system does not yet provide the end-to-end mapping
each associated with a specific sprite. A set of optimizations     from natural language to final code due to limitations in the seman-
is first applied, such as combining ’chase’ and ’catch’ above.     tic labeling system (the analysis section contains a short discussion
Temporal and conditional constraints are extracted from the        of this problem).
the coordinates of a sprite of the size of cat and dog in the       formulate in the blocks world. In this way, HACKER not
locations described so they start in the appropriate location.      only becomes better at satisfying goals, but also better at
In the Social Computation model, placement primitives can           constructing new programs.
be provided by more expert users. Ordinary users can then
modify the application of those procedures to specific cir-         There are some crucial differences between the blocks world
cumstances.                                                         and the Scratch world. ScratchTalk doesn’t have a clean state
                                                                    model, and thus no means of formally classifying bugs in
Another default behavior of the system is that the default          terms of that state. The system can have, however, a formal
wander script has no directional bias. The cat just keeps           model of programming knowledge as well as different patch
wandering randomly around the start location.                       types.

"The cat should wander towards                                      The goal is to use the HACKER system model, but to rely on
 the bottom left corner"                                            users to incrementally specify domain knowledge and bug
                                                                    types4 . Domain knowledge and recognition of mistakes and
This speech act requires the system to perform a very so-           problems are exactly what many humans can do that our ma-
phisticated change to a pre-existing script. The step parame-       chines can’t.
ters of the loop that implements the random steps need to be
modified to properly add the bias. This can be performed            • Recognizing errors. The transparency property of the do-
by having some general knowledge about scripts, param-                main is important because it enables users to see errors
eters, coordinates, and motion. Most of the design work               when they happen. Highly transparent systems tend to ex-
in ScratchTalk was ensuring that the representations above            hibit temporal locality of cause and observed effect, sim-
could support operations such as this.                                plifying user critique.

Scripts often are missing default parameters; for example           • Critique and patching. Given a kind of critique, the sys-
how much should the step calculation be biased? This ex-              tem can apply a patch and attempt to repeat the prior be-
ample shows that the database of defaults is as important to          havior. The properties of interactivity and fail-soft are
the interactivity of the system as are the representations and        critical to being able to backup and retry in the presence
behavioral descriptions. However, each person who creates             of failures.
or modifies a specific defaults value can empower other users
in the network.                                                     • Explanations and repair. By expanding an action plan, the
                                                                      system can generate explanations that can prime users to
                                                                      provide repair suggestions using language the system can
PLANNING IN SCRATCHTALK                                               understand.
Actions are essentially plans that can be tied to a goal. In
the example above, goals were captured as imperative com-
                                                                    Editing Events and Narrative Relations
mands stated in natural language. The most interesting fea-
ture explored by ScratchTalk is the means by which actions          As shown in the original example, users can update the pro-
are extended to accommodate new problems and new situa-             gram model by clarifying what happens to or around a spe-
tions.                                                              cific event. The system currently supports the following edit-
                                                                    ing operations for events and narratives:
Plans as Programs                                                   • Inserting and removing events. The simplest editing op-
I employ a model of planning introduced by HACKER [12].               eration is to do what the dialog does naturally, which is to
HACKER was an automated planning framework for a sim-                 have a prior event for the same sprite initiate the next even
ple block stacking domain. Plans to satisfy goals were pro-           and the next even terminate the prior event. Concurrency
cedures. These procedures were incrementally modified to              of actions for the same object is not assumed by default.
work around obstacles discovered in achieving a goal. In ef-          In fact such concurrency is typically implemented by edit-
fect, programs are incrementally “hacked” together using a            ing actions.
general representation of domain knowledge, programming
techniques, bug types, and patch types. When a goal can-            • Temporal reordering. The user can change the order and
not be achieved in a single primitive step, it is treated as a        casual relationships among events by adding or deleting
bug. The bug is classified, which results in a patch attempt          temporal relations. This is currently not directly available
being made. The patch attempt may use domain knowledge                to the user, but the same machinery is used by the dialog
or general programming knowledge, such as “if failed pre-             system when adding new events.
condition, first satisfy precondition”. Finally, a critic is gen-
erated that recognizes when a bug type has been created by          • Editing event action slots. An action can be overridden
the program and patches it automatically without actually             by an edit to create a new, derived action. This happens
running the program.                                                  whenever the user augments an event as illustrated in the
                                                                      examples.
By incrementally training the system on goals and states of         4
                                                                     With better access to state than in ScratchTalk, it should be pos-
increasing complexity, the system constructs a rich lexicon         sible to create critics which effectively model the domain without
of robust plans that solve many of the goals that one can           having to construct a complete model of the domain.
• Creating and revoking concurrency (see figure 6 for an        Domain Issues
  example of the concurrency relation).                         The properties of the Scratch animation domain all contributed
                                                                to the development of the program, however there are sev-
                                                                eral properties lacking that I would want to see in the next
Action Editing
                                                                experimental domain:
Action editing can be quite sophisticated. To make a cer-
tain kind of side effect on a program’s behavior, we have to    • Access to more internal state. Scratch provides an impov-
map the user’s expression “chase faster” to a command that        erished set of sensing primitives. A sprite cannot know
edits action scripts. The simplest editing operation might        the location of another sprite, only it’s direction and dis-
be “increment statement X in pattern Y ” to match a spe-          tance5 .
cific statement in a procedure. We can make a pattern more
complicated so that it can match more flexibly, similarly to    • Synthesize and run. Generating code, then visualizing the
HACKER. For example; “Increment the step-distance of the          program as a whole makes the interaction quite stilted. Far
inner-most motion loop in this procedure.” This latter form       better to directly execute from the intermediate represen-
is what the system uses currently, although the number of         tation and use continuous planning techniques.
such patterns remains small.
                                                                • Interpreter, not compiler. If always running interpreted,
The editing operations available and used in the current sys-     the system can learn when a certain state will cause a plan
tem include:                                                      to fail and switch to another. ScratchTalk is forced to pre-
                                                                  compile all alternative plans into conditional code state-
• concatenating statement sequences                               ments or parallel scripts, greatly increasing the complex-
                                                                  ity for learning.
• insert a statement to the front of a loop body                • Direct manipulation. The Scratch UI was hard to use be-
                                                                  cause of an inability to get inside the UI to allow users to
• insert a statement at the end of a loop body                    perform direct manipulations that directly teach the sys-
                                                                  tem. A user-friendly system would provide multi-modal
• remove a statement from a loop body                             inputs, using direct manipulation where it makes sense
                                                                  and using language when the UI metaphors become awk-
• add a loop exit condition                                       ward.

• override a default                                            Plan Representations
                                                                One of the more challenging aspects of ScratchTalk is the ac-
• modify a parameter                                            tion representation. A great deal of the time on this project
                                                                was spent iterating through different action representations
• force an alternate action selection                           and the current implementation is far from satisfactory. The
                                                                chief difficulty lies in handling plan variations under a spe-
Most of these operations are already in use by the event        cific goal, or different methods for performing the same plan
linker to merge event bodies and tie together all the actions   step. This is further complicated by action editing. Some
using the variables and message passing. The problem of         questions immediately arise. Do we edit a fully expanded
using natural language to evoke these editing operations is     version of a top level action, or do we edit specific sub-
one of mapping properties or changes (faster, spin around,      actions in the hierarchy? If we are editing the hierarchy,
etc) to code sections that can serve as hooks for the editing   what information do we have about sub-actions (sub-goals)
primitives. An expert user can add these mappings automat-      to determine the reference for an editing action?
ically, or the machine can learn such mappings by general-
izing from small numbers of examples and recording excep-       This leads to an interesting design space. The following are
tions in the case of over generalization.                       some key parameters of this space.

                                                                • Dispatch. Is action dispatch a static process (easy for ac-
ANALYSIS AND FUTURE WORK                                          tion expansion) or a dynamic dispatch (allowing for late
The central innovation in ScratchTalk is enabling users to        binding based on world state). Static dispatch pushes com-
use natural language to incrementally extend plans through        plexity into conditional code, dynamic dispatch pushes
critique and repair of existing plans. These individual con-      complexity into the action editing rules.
tributions can be aggregated from a number of users to ex-
pand on the knowledge base available to a given user. These     • Hierarchy. How do we maintain information about a spe-
contributions come in the form of new actions, default val-       cific action hierarchy? If we want to edit a parameter
ues, action editing scripts, and new language mappings to         or default for a sub-action, do we annotate the top level
these components. The work described here is insufficient         action or create an instance of the sub-plan and annotate
to prove the full Social Computation model, but takes some        that? How would that interact with dynamic dispatch?
important steps in that direction. During the development of    5
                                                                 There are no function abstractions in Scratch, so the polar to
the application several problems were identified that inform    Cartesian coordinate conversion makes the code very difficult to
future work.                                                    read.
• Expansion/Synthesis. Do we expand code as we construct         http://scratch.mit.edu/
  it and edit the primitives, or do we maintain the hierarchy
  and synthesize as late as possible? I prefer the latter, but   A Quicktime movie of this example can be viewed at
  again it adds action editing rule complexity.
                                                                 http://web.media.mit.edu/
• Annotations. How do we keep track of what a parame-            ˜eslick/scratchtalk-demo1.mov
  ters or default will be at final code synthesis time? Do
  we perform code expansion on every edit to validate all        REFERENCES
  constraints, or do we wait until code synthesis time?           1. C. Anderson. The long tail. natpe.org, Jan 2006.

Learning                                                          2. E. Charniak. A maximum-entropy-inspired parser.
The chasing and running away example affords an inter-               Proceedings of NAACL, Jan 2000.
esting observation about the potential for acquiring general      3. A. DeHon, J. Brown, I. Eslick, J. Harris, L. Karbiner,
knowledge from users of this system. In general there are            and T. F. Knight. Global cooperative computing.
a large class of verbs with subject and object arguments for         Workshop on Wide-Area Collaboration and
which a user would expect a reciprocal action. (For example:         Cooperative Computing, Second International WWW
chase & run, attack & defend, threaten & fear, etc).                 Conference: Mosaic and the Web, 1994.
Given several examples of chase being associated with run         4. A. Ko, B. Myers, and H. Aung. Six learning barriers in
away, the system can form the abductive hypothesis that any-         end-user programming systems. IEEE Symposium on
thing chased runs away and add the reciprocal event by de-           Visual Languages and Human-Centric Computing
fault in the future. The response of the user to this kind of        (VLHCC’04), Jan 2004.
assumption helps determine the reliability of a given default.
                                                                  5. E. Loper, S. ting Yi, and M. Palmer. Combining lexical
                                                                     resources: Mapping between propbank and verbnet. In
Complexity                                                           Proceedings of the 7th International Workshop on
Each layer of the action system requires default knowledge           Computational Linguistics, 2007.
of various kinds and an ability to resolve natural language
references to the appropriate subsystem, hierarchy layer, and     6. J. Maloney, L. Burd, Y. Kafai, N. andd Silverman
editing routine. It is possible that the complexity of this          Brian Rusk, and M. Resnick. Scratch: A sneak preview.
architecture will severely limit the scaling of this system.         Proceedings of the Second International Conference on
The promise of Social Computation is that the collective             Creating, Connecting and Collaborating through
contributions will compensate for the complexity by rely-            Computing, Jan 2004.
ing on human contributors to create, tune, and filter a rich
library of heuristics that work often enough to keep users        7. M. Minsky. The emotion machine: Commonsense
satisfied. This philosophical approach to developing com-            thinking, artificial intelligence, and the future of the
putational functionality is heavily inspired by the views of         human mind. Simon and Schuster, Nov 2006.
Minsky [7] and Singh [10].                                        8. B. Myers, J. Pane, and A. Ko. Natural programming
                                                                     languages and environments. Communications of the
CONCLUSION                                                           ACM, 47(9):47–52, Jan 2004.
This paper introduces ScratchTalk, a prototype system en-
                                                                  9. R. Panko. What we know about spreadsheet errors.
abling end-users to specify and modify programmatic behav-
                                                                     Journal of End User Computing, 10(2):15–21, Jan
ior via natural language. This prototype system illustrates a
                                                                     1998.
representational apparatus amenable to the constraints of the
Social Computation scripting model.                              10. P. Singh. Em-One: an architecture for reflective
                                                                     commonsense reasoning. PhD thesis, Massachusetts
The current prototype allows users to describe simple anima-         Institute of Technology, 2005.
tions in the Scratch programming environment using natural
language. ScratchTalk introduces three key ideas: the con-       11. M. Surdeanu and J. Turmo. Semantic role labeling
cept of end user programming through plan elaboration, the           using complete syntactic analysis. Proceedings
central role of user critique and repair in that elaboration,        Proceedings of CoNLL, Jun 2005.
and the use of natural language to reference plans, critiques,
                                                                 12. G. J. Sussman. A Computational Model of Skill
and repairs.
                                                                     Acquisition. PhD thesis, Masachusetts Institute of
A version of the generated cat and dog example can be down-          Technology, 1973.
loaded from

http://web.media.mit.edu/
˜eslick/example.sb

and run on the Scratch UI available from