Realizing Seamless Interaction: a Cognitive Agent Architecture for Virtual and Smart Environments

Youngho Lee, Hedda R. Schmidtke, Youngjung Suh, and Woontack Woo
GIST, U-VR Lab. Email: {ylee,schmidtk,ysuh,wwoo}@gist.ac.kr

This research was supported by the CTI development project, MCT and KOCCA in Korea, and by the UCN Project, the MIC 21C Frontier R&D Program in Korea.

Abstract: We propose a cognitively motivated, vertically layered two-pass agent architecture for realizing responsiveness, reactivity, and pro-activeness of smart objects, smart environments, virtual characters, and virtual place controllers, that is, controllers for lighting and weather conditions. Being cognitively motivated, our approach aims to respect the cognitive demands of a human being, in order to deliver an adequate human-computer interface to ubiquitous and virtual environments. The vertically layered two-pass architecture makes it possible to react within a certain time frame, as necessary for the responsiveness and reactivity required for natural interaction, while at the same time providing a layer for further processing of information for creating pro-active, intelligent responses.

I. INTRODUCTION

A key problem for realizing ubiquitous virtual reality is seamless interaction between objects in real and virtual space. Seamless interaction is interaction in which a user accesses and customizes objects seamlessly and experiences their responses in both real and virtual environments. To achieve this goal, we need to consider physical synchronization, contextual synchronization, and user-awareness. If the real and virtual objects are not registered, users in a real environment have difficulty manipulating them. Contextual information such as time, location, and lighting condition should be matched in both real and virtual environments. Information about the user affects the responses of objects in real and virtual environments.

Agent architectures have been used successfully for modelling virtual characters [3]. Our motivation for constructing four types of applications (virtual characters, smart objects, virtual place control¹, and smart spaces) in the same framework was that in each case the user's cognitive demands determine three crucial time frames. Direct responsiveness is bound to the time frame of visual continuity. A mouse cursor, for instance, should move without any perceivable delay. Likewise, the movement of virtual characters or the visualization of gestures has to stay within this time frame. As the threshold for imperceptible delay we chose the 40 ms used in animated films, which corresponds to generating 25 frames per second. Immediate reaction to a user's commands, on the other hand, can take more than 40 ms. However, in human-human interaction, we usually require some response from a dialog partner within the time frame of a second, even if it is just a nod or 'hmm' signalling the need for a larger time frame. Pro-activity is needed when a computer system schedules events, or when a human being constructs a piece of work. This may require any amount of time: five seconds, a minute, or a day.

¹ Under this notion, we summarize all kinds of controllers for the virtual environment, such as control of lighting and weather conditions.

In ubiquitous computing two main types of tasks have to be handled: Context Integration is the task of deriving aggregated and abstracted information about a situation from sets of singular data, particularly sensory data and facts; Context Management is the task of invoking services or actions appropriate in a context with appropriate contextual information.

Architectures that fulfil the above time requirements have been investigated in the domain of artificial intelligence and within the agent-oriented programming paradigm. Wooldridge summarizes several advantages in comparison to object-oriented programming, which are crucial for realizing ubiquitous computing applications [1, p.10f]. Objects have control only over their state, but not over their behaviour, as methods are called from the outside, i.e. by other objects. Agents, in contrast, act according to some request, and can thus, for instance, include refined strategies for avoiding security issues. The second point Wooldridge discusses is the higher flexibility and autonomy of agents. Agents can be reactive, pro-active, and even social for use in multi-agent systems. Only the first two points are currently covered in our architecture. Social knowledge and strategies can be encoded in our framework as special types of knowledge and strategies. The third point compared in [1] is that agents, in contrast to objects, are always executed concurrently, i.e. each agent in a multi-agent system has its own threads.

II. A COGNITIVE AGENT ARCHITECTURE

We propose a vertically layered two-pass architecture with three layers, as shown in Fig. 1. The main benefit of using a two-pass architecture is that we can get immediate reactions to sensory input while still performing further processing and accumulation of knowledge.

• A virtual character might run from point A to point B in order to retrieve a certain object at B. While running, it will need to do low-level actuator control and be prepared for immediate reaction to unforeseen situations, such as obstacles on its path. At the same time, it might already try to make a plan for how to obtain the object.
• A smart calendar with a camera and a display might track the hand of a user for gesture recognition and directly visualize each movement, while still accumulating the gesture. In the background, it might be executing complex functions, such as scheduling.

The agent architecture thus helps developers of virtual characters, virtual place controllers, smart objects, and smart spaces to provide natural responsive interaction, reactive behaviour, and pro-active capabilities.

[Figure 1: three layers drawn from left to right, labelled responsive (≤40 ms), reactive (≤1 sec), and pro-active (>1 sec). The upper row shows context integration (aggregation, identification, reasoning); the lower row shows context management (planning, local action planning, reaction interpreter).]

Fig. 1. A vertically layered two-pass agent architecture for virtual characters, virtual place control, smart objects, and smart spaces. Although by definition according to [2] the architecture is a vertically layered architecture, the layers are here drawn horizontally, with the bottom layer (input and output via sensors and actuators, respectively) on the left.

In our layered architecture, context integration is a step-wise process of abstraction (shown in the upper row of Fig. 1). Contextual information can be integrated from low-level sensory information, to object-based and situation-based knowledge, and finally to high-level, situation-independent logic-based knowledge:

1) On the level of responsiveness, the intelligence of smart sensors can be implemented: an aggregation component would be able, for instance, to identify positions in successive frames for object tracking, or to aggregate basic coordinate data and sensor values with clustering techniques into object regions.
2) On the reactive level, basic situation and object recognition can be implemented: identification includes recognizing gestures from hand movements, i.e. matching shapes and other sensory input to objects and states of objects stored in the object-oriented knowledge-base, or recognizing situations from a particular layout of objects or a particular succession of events.
3) On the pro-active level, facts about the world are extracted from the constant flow of situations and objects detected at the lower level: reasoning is required, so that an agent can keep track of changes occurring in the world, generate representations of the current state of the world at varying levels of detail, and identify courses of events and their possible outcomes to avert disasters and reach goals.

In order to invoke services and actions appropriate in a context with appropriate contextual information, not only the external, sensed context has to be handled but also internal contextual information, such as the status of the system with respect to fulfilling a user's more long-term goals. Accordingly, context management is performed as step-wise refinement and realization of plans (shown in the lower row of Fig. 1): from generating local action plans in response to a user's overall goals, to generating concrete instantiations of those actions within a local context, to translation of these actions to actuator responses appropriate to a dynamically changing world.

1) On the pro-active level, goals of an agent, facts about the world, and rules of causality are employed to generate plans, i.e. short parameterized sequences of commands: planning is required, so that an agent can solve complex problems and achieve goals involving objects and situations that do not currently exist.
2) On the reactive level, condition-based action triggering can be implemented: local action planning means linking parameters in an action plan to objects and situations of a context and triggering the next action in a plan based on identified parameters and preferred behaviours.
3) On the level of responsiveness, the intelligence of smart actuators can be implemented: a reaction interpreter would be able to generate involuntary actuator response directly from perceptual input, to generate voluntary actuator response by interpreting a local action, and to interpret the local action with respect to continuously updated coordinates retrieved from perception, such as tracking data.

Three types of representations are transmitted in the system. Numerical data are produced by sensors and consumed by actuators. Context Representations reflect knowledge about the current state of a local environment: what is happening here and now? Context Conditions are rules encoded as condition-action pairs. The granularity of context parameters is not fixed: now (the current state) can have the duration of a fraction of a second, or it can mean today, this week, or this year. Likewise, here can mean this room, this city, or this country.

III. CONCLUSIONS

We proposed a cognitively motivated, vertically layered two-pass agent architecture for realizing responsiveness, reactivity, and pro-activeness of smart objects, smart environments, virtual characters, and virtual place controllers. Being cognitively motivated, our approach aims to respect the cognitive demands of a human being, in order to deliver an adequate human-computer interface to ubiquitous and virtual environments. The vertically layered two-pass architecture makes it possible to react within a certain time frame, as necessary for the responsiveness and reactivity required for natural interaction, while at the same time providing a layer for further processing of information for creating pro-active, intelligent responses.

REFERENCES

[1] M. Wooldridge. Intelligent agents. In G. Weiss, editor, Multiagent Systems. MIT Press, 1999.
[2] M. Wooldridge. Intelligent agents: The key concepts. In V. Mařík, O. Štěpánková, H. Krautwurmová, and M. Luck, editors, Multi-Agent-Systems and Applications, volume 2322 of LNCS, pages 3–43. Springer, 2001.
[3] S.-Y. Yoon. Affective Synthetic Characters. PhD thesis, Massachusetts Institute of Technology, 2000.
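
The data flow of the two passes in Section II can be illustrated with a minimal sketch, loosely following the virtual character example. This is not the authors' implementation: the class `TwoPassAgent`, its method names, the situation labels, the command tuples, and the 1.0 distance threshold are all hypothetical, and the sketch runs the layers sequentially without enforcing the 40 ms and 1 s time frames or running them concurrently.

```python
from dataclasses import dataclass, field
from math import dist

# Time frames named in the paper, in seconds (None = unbounded).
# They are recorded here for reference only; this sketch does not enforce them.
TIME_FRAMES = {"responsive": 0.040, "reactive": 1.0, "pro-active": None}


@dataclass
class TwoPassAgent:
    goal: str                                  # object the agent should retrieve
    facts: set = field(default_factory=set)    # facts accumulated by reasoning

    # Upward pass: context integration
    def aggregate(self, samples):
        """Responsive level: reduce raw (x, y) samples to one tracked position."""
        n = len(samples)
        return (sum(x for x, _ in samples) / n, sum(y for _, y in samples) / n)

    def identify(self, position, obstacle):
        """Reactive level: recognize a situation (threshold 1.0 is an assumption)."""
        return "obstacle_near" if dist(position, obstacle) < 1.0 else "path_clear"

    def reason(self, situation):
        """Pro-active level: accumulate facts about the world."""
        self.facts.add(situation)

    # Downward pass: context management
    def plan(self):
        """Pro-active level: goal and facts yield a short parameterized command list."""
        steps = [("move_to", "B"), ("pick_up", self.goal)]
        if "obstacle_near" in self.facts:
            steps.insert(0, ("avoid", "obstacle"))
        return steps

    def local_action(self, plan, situation):
        """Reactive level: trigger the next plan step that fits the situation."""
        for name, arg in plan:
            if name == "avoid" and situation != "obstacle_near":
                continue  # condition of this step is not met right now
            return (name, arg)
        return ("idle", None)

    def react(self, action, position):
        """Responsive level: bind the action to the latest tracked coordinates."""
        name, arg = action
        return {"command": name, "target": arg, "at": position}

    def step(self, samples, obstacle):
        """One cycle: upward pass, then downward pass."""
        position = self.aggregate(samples)
        situation = self.identify(position, obstacle)
        self.reason(situation)
        action = self.local_action(self.plan(), situation)
        return self.react(action, position)
```

Calling `TwoPassAgent(goal="cup").step([(0.0, 0.0), (2.0, 2.0)], obstacle=(1.5, 1.0))` aggregates the samples to the position `(1.0, 1.0)`, identifies a nearby obstacle, and yields an `avoid` command bound to that position. A real system would presumably run the three layers in separate threads so that the responsive layer is never blocked by planning.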