

                  An Eyes and Hands Model for Cognitive
                 Architectures to Interact with User Interfaces
                              Farnaz Tehranchi                                                 Frank E. Ritter
           Department of Computer Science and Engineering                     College of Information Sciences and Technology
             Penn State, University Park, PA 16802 USA                          Penn State, University Park, PA 16802 USA
                      farnaz.tehranchi@psu.edu                                              frank.ritter@psu.edu

ABSTRACT
   We propose a cognitive model that interacts with interfaces. The
main objective of cognitive science is to understand the nature of the
human mind and to develop models that predict and explain human
behavior. These models are useful to Human-Computer Interaction (HCI)
for predicting task performance and times, assisting users, finding
error patterns, and acting as surrogate users. In the future these
models will be able to watch users, correct the discrepancy between
model and user, better predict human performance for interactive
design, and also be useful for AI interface agents. To be fully
integrated into HCI design, these models need to interact with
interfaces. The two main requirements for a cognitive model to interact
with an interface are (a) the ability to access the information on the
screen, and (b) the ability to pass commands. To hook models to
interfaces in a general way, we work within a cognitive architecture.
Cognitive architectures are computational frameworks for executing
theories of cognition—they are essentially programming languages
designed for modeling. Prominent examples of these architectures are
Soar [1] and ACT-R [2]. ACT-R models can access the world by
interacting directly with the Emacs text editor [3]. We present an
initial model of eyes and hands within the ACT-R cognitive architecture
that can interact with Emacs.

KEYWORDS
   Cognitive Model; Cognitive Architecture; Human-Computer Interface;
Interaction.

1   INTRODUCTION
   HCI has used cognitive models of users successfully in different
ways: for examining the efficacy of different designs to predict task
performance times, helping to create and choose better designs, and
saving overall system cost by providing feedback to designers. In the
future these models will be able to watch users, have more realistic
input, correct themselves, and predict human performance.
   To be useful in HCI research, these models need to be capable of
interacting with interfaces. The two main requirements for a cognitive
model to interact with a task environment are (a) the ability to pass
commands, and (b) the ability to access the information on the
screen—the cognitive model encodes the screen's objects and can thus be
used as an agent architecture. If cognitive models can interact with
user interfaces, then the models will be easier to develop and apply.
This approach can be called a Cognitive Model Interface Management
System (CMIMS) [4], which is an extension of the concept of a User
Interface Management System (UIMS) [5].
   Cognitive architectures are infrastructures for cognitive science
theory and provide computational frameworks to execute cognitive
theories. They are programming languages specifically designed for
modeling, such as Soar [1, 6] or ACT-R [2]. A user model is thus a
combination of task knowledge and a cognitive architecture with its
fixed mechanisms for applying that knowledge to generate behavior.
   The aim of this research is to develop a cognitive model and provide
ACT-R models access to the world by enabling ACT-R to interact directly
with the Emacs text editor. This approach has been called Esegman, for
Emacs Segman [7], and is related to another tool, Segman, for hooking
user models and agents to interfaces [8]. ACT-R can communicate with a
task environment by instrumenting a graphical library [9], in this case
the Emacs text editor. Emacs is an interactive text editor that works
with spreadsheets [10] and includes extensions to read email and browse
the web.
   A brief review of the Esegman approach is given in Section 2.
Sections 3 and 4 explain the ACT-R structure and our eyes and hands
model. Finally, concluding remarks and future research are discussed in
Sections 5 and 6, respectively.

                 Figure 1: Cognitive model structure.

2   THE ESEGMAN APPROACH
   Emacs Segmentation/Manipulation (Esegman) provides a connection
between the cognitive architecture and the world. Both ACT-R and Emacs
are written in Lisp, which allows us to extend them and design a bridge
between them. This bridge will enable ACT-R to communicate with an
interactive environment. ACT-R cognitive models run in a loop with
three components: the task environment, the model, and the results (see
Figure 1). The cognitive model receives its view of the world as input
from the visual (and auditory) module and outputs actions through the
motor/manual module. The cognitive architecture within this process
understands the external world as shown in Figure 1.

   Figure 2: Emacs Eyes and Hands output approach, showing how model
processes (left) are implemented in the Emacs text editor (right).
[Diagram: the ACT-R Common Lisp process sends "Model pressed key",
"Model moved mouse to #(X Y)", "Model clicked the mouse", and "Model
wrote command" events; an Elisp process in Emacs parses the buffer,
generates a list of events, and handles them via Insert,
Mouse-1/Set-mouse-point, and Read and Eval.]
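
   To make the key-press and command paths of Figure 2 concrete, the
sketch below shows what minimal Emacs Lisp handlers for the "Handle
press key → Insert" and "Handle command → Read and Eval" steps could
look like. The function and variable names are our own illustrative
assumptions, not the actual Esegman code.

   ;; Sketch only: the Emacs side of the output direction in Figure 2.
   (defvar esegman-target-buffer "*esegman-task*"
     "Buffer that receives the model's actions.")

   (defun esegman-handle-press-key (key)
     "Insert KEY (a string) into the target buffer, as if typed."
     (with-current-buffer (get-buffer-create esegman-target-buffer)
       (insert key)))

   (defun esegman-handle-command (command-string)
     "Read and evaluate COMMAND-STRING sent by the model."
     (eval (read command-string)))

   ;; Example: the model types "s" and then asks where point is.
   ;; (esegman-handle-press-key "s")
   ;; (esegman-handle-command "(point)")

On the ACT-R side, the corresponding events ("Model pressed key",
"Model wrote command") would call such handlers with the key or command
string produced by the model's actions.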
   The modeling process can start with collecting data from the real
world. This data gathering has been performed using the Dismal
spreadsheet [10]. Dismal is an augmented spreadsheet developed for
Emacs that was designed to gather and analyze behavioral data. Users
can interact with the spreadsheet through keystrokes, menus, and
function calls. Participants have performed a set of spreadsheet tasks
in Dismal [11], and their performance over 14 subtasks, about 20 min of
work, was recorded using the Recording User Input (RUI) software
[12, 13]. As will be discussed in the next section, we thus have an
existing dataset of human behavior on a complex task to model [11] and
also a model available [14]. A comparison has been done, but not at a
fine-grained level, and without modeling interaction, errors, and error
correction, because the model did not have access to the spreadsheet,
which is complex to model without duplicating a spreadsheet. This model
was developed in the High-level behaviour representation language
(Herbal), an open-source system that supports two cognitive
architectures and one agent architecture through a set of common
cognitive modeling tasks [15]. Herbal compiles into declarative
knowledge and procedural knowledge, and its output has a syntax similar
to ACT-R.
   In particular, for our connection between the architecture and the
world, we will use Emacs functions to take the interaction commands
from the model (ACT-R will be a sub-process of the Emacs process) and
insert them and their effects into the target Emacs window, and
similarly to take the state of Emacs and make it available to the
model. Figure 2 diagrams most of the components of this approach.
Therefore, the model will be able to execute commands and be aware of
their results. For example, after collecting the commands from ACT-R
and receiving a request to 'look' and move the eye, the cursor in the
Emacs window will move. This approach provides the output direction in
Figure 1. However, for the input direction of Figure 1 we have to feed
ACT-R with information about the world by inserting contents into its
visual module.
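
   The "Handle mouse event" path of Figure 2 can be sketched similarly.
The function name and the row/column convention below are illustrative
assumptions; a fuller implementation might synthesize a genuine Mouse-1
event rather than simply setting point.

   ;; Sketch only: move point to a given row and column, mirroring the
   ;; Set-mouse-point step in Figure 2.
   (defun esegman-handle-mouse-event (col row)
     "Move point to column COL on line ROW of the current buffer."
     (goto-char (point-min))
     (forward-line (1- row))       ; ROW is 1-based in this sketch
     (move-to-column col)
     ;; In this sketch a click is treated as simply setting point and
     ;; returning the resulting buffer position.
     (point))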
3   THE ACT-R ARCHITECTURE AND MODEL
   The two basic types of knowledge in ACT-R are declarative and
procedural. Declarative knowledge (Chunks) contains facts, images,
locations, and sounds. Procedural knowledge (production rules)
represents behavioral aspects of performance with goals, operators,
methods, and selection rules [16]. All tasks in the model [14] we will
build on include a combination of declarative and procedural knowledge.
   Figure 3 shows a schematic of ACT-R's architecture and the default
modules of ACT-R [17]. The ACT-R modules communicate through buffers,
each of which may contain a memory Chunk. The default set of modules
can be partitioned into the Goal, Imaginal, Declarative, Vision,
Motor/Manual, Speech/Vocal, and Aural modules. The model presented in
the next section exercises the Manual, Vision, Goal, and Vocal modules.
Table 1 shows the corresponding buffers.

                  Figure 3: The ACT-R 6 Architecture.

   Table 1: Initial Production Rules of the Eyes and Hands Model.
   Production rule     Buffers used
   Start               Goal, Manual, Visual-Location
   Rule1               Goal, Visual-Location, Visual
   Rule2               Goal, Manual, Vocal
   Rule3               Goal, Manual
   Rule4               Goal, Vocal

   Buffers in ACT-R are the interfaces between modules. Each buffer is
connected to a specific module and has a unique name. A buffer is used
to relay requests for actions to its module, and a module responds to
queries through its buffer. In response to a request, the module will
usually generate one or more events to perform some actions or place a
Chunk into the buffer. A module has access to any buffer at any time,
but can only manipulate its own buffer [18].
   ACT-R runs a pattern-matching process that finds production rules
whose conditions—the left-hand side (LHS) of the production rules—match
the current state of the system. When the conditions match, the actions
of the production rule—the right-hand side (RHS)—are fired. Productions
can make a request to an ACT-R buffer, and that buffer's module will
then perform whatever function it provides and may place a Chunk into
the buffer (summarized in Table 1 for our model). These production
rules can be tested by comparing the performance of the model doing the
task—the time to perform the task and the accuracy on the task—with the
results of people doing the same task. Furthermore, we can use
neurological data (fMRI) as measures from cognitive psychology.
   Herbal/ACT-R generates an agent in ACT-R from a Dismal spreadsheet
model represented hierarchically as a sequence of tasks and the
relationships among these tasks [14]. The task representation is based
upon hierarchical task analysis and GOMS. GOMS (Goals, Operators,
Methods, and Selection rules) is a high-level language in which HCI
tasks can be articulated in a hierarchical form that decomposes complex
tasks into simpler ones [19]. In this GOMS-like cognitive user model
the production rules are created from the hierarchical tree structure.
The model retrieves the next node/sub-task from the declarative memory
elements according to a depth-first search to complete the task. In
this model, a novice model has to carry out more memory retrievals; in
contrast, an expert model uses fewer declarative memory elements. In
the next section we present an initial model of eyes and hands within
the ACT-R cognitive architecture that interacts with Emacs and
completes the output direction of Figure 1.
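
   To illustrate the depth-first retrieval just described, the
following is a minimal sketch—not the Herbal-generated code—of how a
hierarchical sub-task could be stored as a declarative Chunk and
requested through the retrieval buffer. The chunk-type and slot names
are our own assumptions.

   ;; Sketch only: a sub-task node in declarative memory and one step
   ;; of a depth-first descent (request the first child of the current
   ;; node).
   (chunk-type task-node name parent position)
   (chunk-type task-goal current)

   (add-dm
    (do-dismal isa task-node name do-dismal position 1)
    (open-file isa task-node name open-file parent do-dismal position 1)
    (save-file isa task-node name save-file parent do-dismal position 2))

   (p retrieve-next-subtask
      =goal>
         isa task-goal
         current =node
      ?retrieval>
         state free
    ==>
      +retrieval>
         isa task-node
         parent =node
         position 1)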


4   THE EMACS EYES AND HANDS MODEL
   In this section we briefly describe a small model used to test and
exercise the Esegman architectural additions. In this model,
declarative memory Chunks are created initially as the basic building
blocks. We define two Chunk types, one for the target window location
and one for the Goal buffer to track the steps of the model. This is a
static model without learning, and it uses the GOMS framework for HCI.
The model proceeds as follows. After setting the parameters that
control the general operation of the system, we define the Chunk types
for the model, declaring the configuration of slots that will be used
in Chunks. ACT-R has the ability to process visual objects on a
computer screen. In this regard, two separate buffers are defined for
locations and objects. A visual object has multiple features, such as
color, shape, height, and width, whereas a visual location represents
only the location of that object; thus in ACT-R the location can be
retrieved separately [20].
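
   As a minimal sketch of the two Chunk types just described, the
declarations might look like the following. The goal Chunk type matches
the production shown later in Figure 5; the location Chunk type and its
slot names are our own assumption rather than the model's actual code.

   ;; Sketch only: a Chunk type for tracking the model's step and one
   ;; for the target window location, plus the initial goal.
   (chunk-type goal step)
   (chunk-type target-location screen-x screen-y)

   (add-dm
    (initial-goal isa goal step 0)
    (target isa target-location screen-x 120 screen-y 45))

   (goal-focus initial-goal)       ; start the model at step 0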
   An ACT-R model was constructed in a way that is able to interact
with the same experiment software as the human participants. The manual
module provides the functionality to interact with the Emacs window,
such as moving the cursor, clicking, monitoring hand movements, and
using a mouse and a keyboard. However, it requires a hook to facilitate
these functionalities.
   Figure 4 shows the production history according to Table 1. The
critical cycle in ACT-R occurs when the buffers hold representations
determined by the external world and internal modules, patterns in
these buffers are recognized, a production fires, and the buffers are
then updated for another cycle [21]. The assumption in ACT-R is that
this cycle takes about 50 ms to complete—this estimate of 50 ms as the
minimum cycle time for cognition has emerged in a number of cognitive
architectures, including Soar [1].

     Figure 4: Production history in different critical cycles.

   The first production rule (Start) makes a request to the
visual-location buffer, and the visual-location module places a
location Chunk into the visual-location buffer. Also in the Start rule,
the manual module presses a keystroke. The model then moves its visual
attention, using the retrieved visual location, to attend to the visual
object in Rule1. Figure 5 shows a sample production, the equivalent of
Rule1.

   (p rule1
      =goal>
         isa goal
         step 0
      =visual-location>
      ?visual>
         state free
     ==>
      +visual>
         isa move-attention
         screen-pos =visual-location
      +goal>
         isa goal
         step 1)

               Figure 5: Sample ACT-R production.

   This production describes operations on three buffers: goal,
visual-location, and visual. Each of these buffers has a set of slots,
prefixed by '=', whose values can be constrained in a standard way. On
the left-hand side of the production, the goal buffer is constrained to
hold a goal Chunk with the given slot value (here, step 0), and such a
Chunk must be found in memory. Also, the visual buffer state must be
free and there must be a Chunk in the visual-location buffer. On the
right-hand side, the step slot of the goal buffer is updated (to 1) and
the visual module moves attention to the address in the visual-location
buffer. In Rule2 the model moves the mouse cursor to the visual object
that it attended to in the previous step, and it clicks on that object
in Rule3. Rule4 uses the vocal buffer to speak the commands. Table 1
summarizes all the modules and buffers that are called by the
corresponding rules.
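
   A hedged sketch of what the mouse movement and click in Rule2 and
Rule3 could look like is shown below, using the manual module's
standard move-cursor and click-mouse requests. The slot tests and step
values are illustrative, not the model's actual code, and Table 1 lists
the Vocal buffer for Rule2 as well; only the mouse-related requests are
shown here.

   ;; Sketch only: move the mouse to the attended object (as in Rule2)
   ;; and then click it (as in Rule3).
   (p rule2-sketch
      =goal>
         isa goal
         step 1
      =visual>
      ?manual>
         state free
    ==>
      +manual>
         isa move-cursor
         object =visual
      +goal>
         isa goal
         step 2)

   (p rule3-sketch
      =goal>
         isa goal
         step 2
      ?manual>
         state free
    ==>
      +manual>
         isa click-mouse
      +goal>
         isa goal
         step 3)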
   Figure 4 also shows the different time costs of the production
rules. The time columns correspond to when the procedural module
attempted to find a production to fire (the matched production). Figure
4 demonstrates the use of serial modules in parallel [22], because of
the parallel threads of serial processing in each module. For instance,
Rule2 had to wait for the completion of the Start rule, because only at
0.35 s was the action on which Rule2 matches completed—the manual state
became free again. But Rule2 used three modules, Goal, Manual, and
Vocal, which can all receive requests to their buffers in parallel
(three threads). The serial bottleneck of ACT-R is that each buffer is
limited to a single declarative unit of knowledge, a Chunk; thus only a
single memory can be retrieved at a time.
   The ACT-R model can be mapped explicitly onto the physical
implementation of the mind, the human brain (Figure 6), and the
activation of the brain areas used by the running model predicts
activation of the brain in humans solving the same problem. Figure 6
was produced by the ACT-R environment, and the brain areas associated
with buffers are displayed. Figure 6 shows a prefrontal region that
tracks the operations in the goal buffer, a motor region that tracks
operations in the manual buffer, and a vision region that tracks
operations in a visual buffer that holds the problem representation.

     Figure 6: 3D-brain viewer, showing what parts of the brain
           are active at a point in the model execution.

   The model will explain how these components of the mind work
together to produce rational cognition. In our model four parts of the
brain were active, and the total amount of active time is presented in
Figure 4.

5   CONCLUSION
   The CMIMS research area can be exploited to help in the development
of cognitive models and agents, and to support them as users that
interact with interfaces. It will also help improve the testing of user
interfaces, making this process more approachable and easy enough that
it can be done and does get done. This approach of using models to test
interfaces, and even systems as they are built, has been called for by
the National Research Council [23].
   By using the Eyes and Hands model in place of a user, questions
about user interface designs—such as evaluating designs, or changing
the interface and examining the effects on task performance—can be
answered more easily. The Eyes and Hands model, as a cognitive modeling
approach, can demonstrate the advantages of CMIMSs in HCI and start to
realize the use of models in system design [24].
   In this work, rather than using methods that simply produce the most
accurate output, we use an approach that produces more human-like
output, and soon input. We present a preliminary implementation of this
proposed direction. Table 2 summarizes our work. We were able to
complete the output direction in Figure 1 and augmented the model to
perform the full sequence of tasks in Figure 2. We will build upon Paik
et al.'s [14] model of the Dismal task; it performs the task and
learns, but does not interact with a task simulation.

                   Table 2: Esegman Capabilities.
   Implemented                    To be implemented
   Press key                      Move fovea (focus of visual attention)
   Move cursor                    Parse fovea area
   Click mouse                    Track/find cursor
   Execute arbitrary commands     Find icon

6   FURTHER RESEARCH AND LIMITATIONS
   To have a comprehensive cognitive model we need to add the input
direction in Figure 1; this remains to be fully implemented in our
model. For the input direction of Figure 1, we will have to feed ACT-R
models with information about the world from Emacs by inserting
contents into the architecture's visual module.
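
   A sketch of what the Emacs side of this input direction could look
like is shown below; the function name and the packaging of characters
as (character, row, column) triples are our own assumptions about how
buffer contents might be collected before being inserted into the
visual module, not Esegman's implemented code.

   ;; Sketch only: walk the characters visible in the selected window
   ;; and collect (CHAR ROW COL) triples for the visual module.
   (defun esegman-parse-visible-buffer ()
     "Return a list of (CHAR ROW COL) for the visible part of the buffer."
     (let (items)
       (save-excursion
         (goto-char (window-start))
         (while (< (point) (window-end))
           (let ((ch (char-after)))
             (unless (memq ch '(?\s ?\n))   ; skip spaces and newlines
               (push (list (char-to-string ch)
                           (line-number-at-pos)
                           (current-column))
                     items)))
           (forward-char 1)))
       (nreverse items)))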
In the future these models will be able to watch users, correct the
discrepancy between model and user, and better predict human
performance for design. These models will be useful to HCI by
predicting task performance and times, assisting users, finding error
patterns, and by acting as surrogate users. They will also be useful
for building interface agents such as Digital Elves [25].
   The Esegman model can interact with surrogate interfaces through
Emacs for a wide range of systems—Emacs includes not only a text editor
but also a web browser, a calculator, a spreadsheet tool, and even an
implementation of the natural language system Eliza. Thus, we can
explore a wide range of behaviors with this approach.
   When we extend Esegman to include motor-control mistakes, we will
also have to include knowledge to recognize and correct these mistakes.
This will require modifying the architecture to create human-like
mistakes and knowing how to correct them. The model will also have to
make mistakes in correction, and in noticing mistakes. These are
relatively new types of behavior and knowledge in models, so we expect
to learn a lot.
   Therefore, adding error analysis to the design will merge our model
with larger models. However, ACT-R deliberately includes
limitations—only a single memory can be retrieved at a time, only a
single object can be encoded from the visual field, and only a single
production is selected to fire on each cycle—that can make this
approach challenging.

ACKNOWLEDGMENT
This work was funded partially by ONR (N00014-11-1-0275 &
N00014-15-1-2275). David Reitter provided useful comments on Emacs and
Aquamacs (the Emacs version for the Mac). We wish to thank Jong Kim,
who provided the idea for Esegman, and Dan Bothell for his assistance.

REFERENCES
[1]  A. Newell, Unified Theories of Cognition. Cambridge, MA: Harvard
     University Press, 1990.
[2]  J. R. Anderson, How can the human mind exist in the physical
     universe? New York, NY: Oxford University Press, 2007.
[3]  F. Tehranchi and F. E. Ritter, "Connecting cognitive models to
     interact with human-computer interfaces," in Proceedings of ICCM
     2016 - 14th International Conference on Cognitive Modeling,
     University Park, PA: Penn State, 2016.
[4]  F. E. Ritter, G. D. Baxter, G. Jones, and R. M. Young, "User
     interface evaluation: How cognitive models can help," in
     Human-computer interaction in the new millennium, J. Carroll, Ed.
     Reading, MA: Addison-Wesley, 2001, pp. 125-147.
[5]  B. A. Myers, "User interface software tools," ACM Transactions on
     Computer-Human Interaction, vol. 2, pp. 64-103, 1995.
[6]  J. E. Laird and A. Nuxoll, Soar Design Dogma, 2003. Available at
     http://ai.eecs.umich.edu/soar/sitemaker/docs/misc/dogma.pdf
[7]  J. W. Kim, F. E. Ritter, and R. J. Koubek, "ESEGMAN: A substrate
     for ACT-R architecture and an Emacs Lisp application," in
     Proceedings of ICCM 2006 - Seventh International Conference on
     Cognitive Modeling, Trieste, Italy, 2006, p. 375.
[8]  R. St. Amant, T. E. Horton, and F. E. Ritter, "Model-based
     evaluation of expert cell phone menu interaction," ACM
     Transactions on Computer-Human Interaction, vol. 14, 24 pages,
     2007.
[9]  F. E. Ritter, G. D. Baxter, G. Jones, and R. M. Young, "Supporting
     cognitive models as users," ACM Transactions on Computer-Human
     Interaction, vol. 7, pp. 141-173, 2000.
[10] F. E. Ritter and A. B. Wood, "Dismal: A spreadsheet for sequential
     data analysis and HCI experimentation," Behavior Research Methods,
     vol. 37, pp. 71-81, 2005.
[11] J. W. Kim and F. E. Ritter, "Learning, forgetting, and relearning
     for keystroke- and mouse-driven tasks: Relearning is important,"
     Human-Computer Interaction, vol. 30, pp. 1-33, 2015.
[12] U. Kukreja, W. E. Stevenson, and F. E. Ritter, "RUI—Recording User
     Input from interfaces under Windows and Mac OS X," Behavior
     Research Methods, vol. 38, pp. 656-659, 2006.
[13] J. H. Morgan, C.-Y. Cheng, C. Pike, and F. E. Ritter, "A design,
     tests, and considerations for improving keystroke and mouse
     loggers," Interacting with Computers, vol. 25, pp. 242-258, 2013.
[14] J. Paik, J. W. Kim, F. E. Ritter, and D. Reitter, "Predicting user
     performance and learning in human-computer interaction with the
     Herbal compiler," ACM Transactions on Computer-Human Interaction,
     vol. 22, Article 25, 2015.
[15] J. Paik, J. W. Kim, and F. E. Ritter, "A preliminary ACT-R
     compiler in Herbal," in Proceedings of ICCM 2009 - Ninth
     International Conference on Cognitive Modeling, Manchester,
     England, 2009, pp. 466-467.
[16] J. W. Kim, R. J. Koubek, and F. E. Ritter, "Investigation of
     procedural skills degradation from different modalities," in
     Proceedings of the 8th International Conference on Cognitive
     Modeling, Oxford, UK, 2007, pp. 255-260.
[17] F. E. Ritter, M. J. Schoelles, L. C. Klein, and S. E. Kase,
     "Modeling the range of performance on the serial subtraction
     task," in Proceedings of the 8th International Conference on
     Cognitive Modeling, Taylor & Francis/Psychology Press, 2007,
     pp. 299-304.
[18] D. Bothell, ACT-R 7 Reference Manual. Available at
     act-r.psy.cmu.edu/wordpress/wp-content/themes/ACT-R/actr7/reference-manual.pdf
[19] B. E. John and D. E. Kieras, "The GOMS family of user interface
     analysis techniques: Comparison and contrast," ACM Transactions on
     Computer-Human Interaction, vol. 3, pp. 320-351, 1996.
[20] D. Peebles and C. Jones, "A model of object location memory," in
     Proceedings of the 36th Annual Conference of the Cognitive Science
     Society, Austin, TX, 2014.
[21] J. R. Anderson, D. Bothell, M. D. Byrne, S. Douglass, C. Lebiere,
     and Y. Qin, "An integrated theory of the mind," Psychological
     Review, vol. 111, pp. 1036-1060, 2004.
[22] M. D. Byrne and J. R. Anderson, "Serial modules in parallel: The
     psychological refractory period and perfect time-sharing,"
     Psychological Review, vol. 108, pp. 847-869, 2001.
[23] R. W. Pew, "Some history of human performance modeling," in
     Integrated models of cognitive systems, W. Gray, Ed. New York, NY:
     Oxford University Press, 2007, pp. 29-44.
[24] R. W. Pew and A. S. Mavor, Eds., Human-system integration in the
     system development process: A new look. Washington, DC: National
     Academy Press, 2007.
     http://books.nap.edu/catalog.php?record_id=11893, checked March
     2012.
[25] M. Tambe, W. L. Johnson, R. M. Jones, F. Koss, J. E. Laird, P. S.
     Rosenbloom, et al., "Intelligent agents for interactive simulation
     environments," AI Magazine, vol. 16, pp. 15-40, 1995.