  Realizing Seamless Interaction: a Cognitive Agent
  Architecture for Virtual and Smart Environments
                           Youngho Lee, Hedda R. Schmidtke, Youngjung Suh, and Woontack Woo
                                                    GIST, U-VR Lab.
                                      Email: {ylee,schmidtk,ysuh,wwoo}@gist.ac.kr


   Abstract: We propose a cognitively motivated, vertically layered two-pass
agent architecture for realizing responsiveness, reactivity, and
pro-activeness of smart objects, smart environments, virtual characters, and
virtual place controllers, that is, controllers for lighting and weather
conditions. Being cognitively motivated, our approach aims to respect the
cognitive demands of a human being, in order to deliver an adequate
human-computer interface to ubiquitous and virtual environments. The
vertically layered two-pass architecture makes it possible to realize
reactions within a certain time frame, as necessary for generating the
responsiveness and reactivity required for natural interaction, while at the
same time providing a layer for further processing of information for
creating pro-active, intelligent responses.

                              I. INTRODUCTION

   A key problem for realizing ubiquitous virtual reality is seamless
interaction between objects in real and virtual space. Seamless interaction
is interaction in which a user accesses and customizes objects seamlessly and
experiences their responses in both real and virtual environments. To achieve
this goal, we need to consider physical synchronization, contextual
synchronization, and user-awareness. If the real and virtual objects are not
registered, users in a real environment have difficulty manipulating them.
Contextual information such as time, location, and lighting conditions should
be matched in both real and virtual environments. Information about the user
affects the responses of objects in real and virtual environments.
   Agent architectures have been used successfully for modelling virtual
characters [3]. Our motivation for constructing four types of applications
(virtual characters, smart objects, virtual place control¹, and smart spaces)
in the same framework was that in each case the user’s cognitive demands
determine three crucial time frames. Direct responsiveness is bound to the
time frame of visual continuity. A mouse cursor, for instance, should move
without any perceivable delay. Likewise, the movement of virtual characters
or the visualization of gestures has to stay within this time frame. As the
threshold for imperceptible delay, we chose the 40ms used in animated films,
which corresponds to generating 25 frames per second. Immediate reaction to a
user’s commands, on the other hand, can take more than 40ms. However, in
human-human interaction, we usually require some response from a dialog
partner within the time frame of a second, even if it is just a nod or ‘hmm’
to signal that more time is needed. Pro-activity is needed when a computer
system schedules events, or when a human being constructs a piece of work.
This may require any amount of time: five seconds, a minute, or a day.
   In ubiquitous computing, two main types of tasks have to be handled:
Context Integration is the task of deriving aggregated and abstracted
information about a situation from sets of singular data, particularly
sensory data and facts; Context Management is the task of invoking the
services and actions appropriate in a context with appropriate contextual
information.
   Architectures that fulfil the above time requirements have been
investigated in the domain of artificial intelligence and within the
agent-oriented programming paradigm. Wooldridge summarizes several advantages
in comparison to object-oriented programming, which are crucial for realizing
ubiquitous computing applications [1, p.10f]. First, objects have control
only over their state, not over their behaviour, since their methods are
called from the outside, i.e. by other objects. Agents, in contrast, act in
response to requests and can thus, for instance, include refined strategies
for avoiding security issues. The second point Wooldridge discusses is the
higher flexibility and autonomy of agents. Agents can be reactive,
pro-active, and even social for use in multi-agent systems. Only the first
two of these properties are currently covered in our architecture; social
knowledge and strategies can, however, be encoded in our framework as special
types of knowledge and strategies. The third point compared in [1] is that
agents, in contrast to objects, are always executed concurrently, i.e. each
agent in a multi-agent system has its own threads.

   This research was supported by the CTI development project, MCT and KOCCA
in Korea, and by the UCN Project, the MIC 21C Frontier R&D Program in Korea.
   ¹ Under this notion, we summarize all kinds of controllers for the virtual
environment, such as control of lighting and weather conditions.

                    II. A COGNITIVE AGENT ARCHITECTURE

   We propose a vertically layered two-pass architecture with three layers,
as shown in Fig. 1. The main benefit of using a two-pass architecture is that
we can get immediate reactions to sensory input while still performing
further processing and accumulation of knowledge.
   • A virtual character might run from point A to point B in order to
     retrieve a certain object at B. While running, it needs to perform
     low-level actuator control and be prepared to react immediately to
     unforeseen situations, such as obstacles on its path. At the same time,
     it might already be planning how to obtain the object.
   • A smart calendar with a camera and a display might track the hand of a
     user for gesture recognition and directly
     visualize each movement, while still accumulating the gesture. In the
     background, it might be executing complex functions, such as
     scheduling.
The agent architecture thus helps developers of virtual characters, virtual
place controllers, smart objects, and smart spaces to provide natural
responsive interaction, reactive behaviour, and pro-active capabilities.

[Figure 1: diagram of the three layers, labelled responsive (≤40ms), reactive
(≤1sec), and pro-active (>1sec). Upper row: percepts, aggregation,
preliminary context, identification, integrated context, reasoning. Middle
row: numerical data, identified situation, situation CKB, currently valid
facts, world CKB. Lower row: actuator action, reaction interpreter, context
condition script, local action compiler, context condition script, planning,
goals. Libraries: reaction lib, behaviour lib, causality CKB.]

Fig. 1. A vertically layered two-pass agent architecture for virtual
characters, virtual place control, smart objects, and smart spaces. Although
the architecture is, by the definition in [2], a vertically layered
architecture, the layers are drawn horizontally here, with the bottom layer
(input and output via sensors and actuators, respectively) on the left.

   In our layered architecture, context integration is a step-wise process of
abstraction (shown in the upper row of Fig. 1). Contextual information can be
integrated from low-level sensory information, to object-based and
situation-based knowledge, and finally to high-level, situation-independent
logic-based knowledge:
   1) On the level of responsiveness, the intelligence of smart sensors can
      be implemented: an aggregation component would be able, for instance,
      to identify positions in successive frames for object tracking, or to
      aggregate basic coordinate data and sensor values into object regions
      with clustering techniques.
   2) On the reactive level, basic situation and object recognition can be
      implemented: identification includes recognizing gestures from hand
      movements, i.e. matching shapes and other sensory input to objects and
      states of objects stored in the object-oriented knowledge base, or
      recognizing situations from a particular layout of objects or a
      particular succession of events.
   3) On the pro-active level, facts about the world are extracted from the
      constant flow of situations and objects detected at the lower level:
      reasoning is required so that an agent can keep track of changes
      occurring in the world, generate representations of the current state
      of the world at varying levels of detail, and identify courses of
      events and their possible outcomes to avert disasters and reach goals.
   In order to invoke services and actions appropriate in a context with
appropriate contextual information, not only the external, sensed context has
to be handled but also internal contextual information, such as the status of
the system with respect to fulfilling a user’s more long-term goals.
Accordingly, context management is performed as step-wise refinement and
realization of plans (shown in the lower row of Fig. 1): from generating
local action plans in response to a user’s overall goals, to generating
concrete instantiations of those actions within a local context, to
translating these actions into actuator responses appropriate to a
dynamically changing world.
   1) On the pro-active level, the goals of an agent, facts about the world,
      and rules of causality are employed to generate plans, i.e. short
      parameterized sequences of commands: planning is required so that an
      agent can solve complex problems and achieve goals involving objects
      and situations that do not currently exist.
   2) On the reactive level, condition-based action triggering can be
      implemented: local action planning means linking the parameters in an
      action plan to objects and situations of a context and triggering the
      next action in a plan based on identified parameters and preferred
      behaviours.
   3) On the level of responsiveness, the intelligence of smart actuators can
      be implemented: a reaction interpreter would be able to generate an
      involuntary actuator response directly from perceptual input, to
      generate a voluntary actuator response by interpreting a local action,
      and to interpret the local action with respect to continuously updated
      coordinates retrieved from perception, such as tracking data.
   Three types of representations are transmitted in the system. Numerical
data are produced by sensors and consumed by actuators. Context
Representations reflect knowledge about the current state of a local
environment: what is happening here and now? Context Conditions are rules
encoded as condition-action pairs. The granularity of context parameters is
not fixed, that is, ‘now’ (the current state) can last a fraction of a
second, or it can mean today, this week, or this year. Likewise, ‘here’ can
mean this room, this city, or this country.

                              III. CONCLUSIONS

   We proposed a cognitively motivated, vertically layered two-pass agent
architecture for realizing responsiveness, reactivity, and pro-activeness of
smart objects, smart environments, virtual characters, and virtual place
controllers. Being cognitively motivated, our approach aims to respect the
cognitive demands of a human being, in order to deliver an adequate
human-computer interface to ubiquitous and virtual environments. The
vertically layered two-pass architecture makes it possible to realize
reactions within a certain time frame, as necessary for generating the
responsiveness and reactivity required for natural interaction, while at the
same time providing a layer for further processing of information for
creating pro-active, intelligent responses.

                                 REFERENCES

[1] M. Wooldridge. Intelligent agents. In G. Weiss, editor, Multiagent
    Systems. MIT Press, 1999.
[2] M. Wooldridge. Intelligent agents: The key concepts. In V. Marík,
    O. Stepánková, H. Krautwurmova, and M. Luck, editors,
    Multi-Agent-Systems and Applications, volume 2322 of LNCS, pages 3–43.
    Springer, 2001.
[3] S.-Y. Yoon. Affective Synthetic Characters. PhD thesis, Massachusetts
    Institute of Technology, 2000.
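
   The two passes through the three layers described in Section II can be
sketched in a few lines of Python, using the smart calendar as the running
example. This is only an illustrative sketch under our own assumptions, not
the implementation of the architecture: the class SmartCalendar, the tables
CAUSALITY and GESTURE_GOALS, the 0.5 swipe threshold, and all command names
are hypothetical. For brevity the layers are called one after another in a
single step; in a system that respects the time frames of Section I, the
responsive layer would have to answer within 40ms while the slower layers
run concurrently.

```python
from dataclasses import dataclass, field

# Hypothetical causality knowledge: which action brings about which effect.
CAUSALITY = {"show_next_week": "next_week_visible"}
# Hypothetical gesture semantics: which identified situation signals which goal.
GESTURE_GOALS = {"swipe_right": "next_week_visible"}


@dataclass
class SmartCalendar:
    """Toy agent with a responsive, a reactive, and a pro-active layer."""
    track: list = field(default_factory=list)  # hand positions accumulated so far
    goals: list = field(default_factory=list)  # goals not yet turned into plans

    # ---- upward pass: context integration ----
    def aggregate(self, samples):
        """Responsive layer: reduce raw (x, y) samples to one hand position."""
        xs, ys = zip(*samples)
        return (sum(xs) / len(xs), sum(ys) / len(ys))

    def identify(self, position):
        """Reactive layer: match the accumulated track against a known gesture."""
        self.track.append(position)
        if self.track[-1][0] - self.track[0][0] > 0.5:  # assumed swipe threshold
            self.track.clear()
            return "swipe_right"
        return None

    def reason(self, situation):
        """Pro-active layer: derive a user goal from an identified situation."""
        goal = GESTURE_GOALS.get(situation)
        if goal is not None:
            self.goals.append(goal)

    # ---- downward pass: context management ----
    def plan(self):
        """Pro-active layer: pick actions whose effect achieves a pending goal."""
        plan = [action for goal in self.goals
                for action, effect in CAUSALITY.items() if effect == goal]
        self.goals.clear()
        return plan

    def local_actions(self, plan, position):
        """Reactive layer: bind each planned action to the current context."""
        return [(action, position) for action in plan]

    def react(self, position, actions):
        """Responsive layer: the cursor always follows the hand (involuntary
        response); planned actions, if any, are appended (voluntary response)."""
        return [("move_cursor", position)] + actions

    def step(self, samples):
        position = self.aggregate(samples)
        self.reason(self.identify(position))
        return self.react(position, self.local_actions(self.plan(), position))


agent = SmartCalendar()
print(agent.step([(0.0, 0.0), (0.25, 0.0)]))  # cursor command only
print(agent.step([(0.75, 0.0), (1.0, 0.0)]))  # cursor command plus planned action
```

   In the first step only the involuntary cursor update is emitted, because
no gesture has been identified yet; in the second step the hand has moved far
enough for the assumed swipe rule to fire, so the planned action is issued
alongside the cursor update.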