       Artificial Actor Behaviours for Interactive Videos:
            Handling AI Requirements in ELEVATE

                  Paolo Busetta, Maurizio Astegher, Daniele Dellagiacoma,
                   Stefano Gabrielli, Matteo Longato and Matteo Pedrotti
                            DeltaLab Research and Development Division
                                  Delta Informatica SpA, Trento, Italy
                       {paolo.busetta, maurizio.astegher, daniele.dellagiacoma,
               stefano.gabrielli, matteo.longato, matteo.pedrotti}@deltainformatica.eu




                                                       Abstract
                       Animating artificial characters on a virtual stage requires an exacting
                       description of every detail of body movement, positioning in space,
                       and the actions to be performed, in particular when plausible be-
                       haviours are requested. Behavioural AI, i.e. character animation
                       machinery that automatically adapts to context, helps reduce this ef-
                       fort but is difficult to develop. This paper discusses a production
                       process for interactive training videos that allows three stakeholders
                       – an instructional designer defining an exercise; a director shooting
                       video clips from virtual reality with artificial characters; and, a devel-
                       opment team creating models of behaviour – to efficiently cooperate by
                       means of nested requirement / implementation / testing cycles. Good
                       requirements and appropriate AI technologies (typically a mixture of
                       goal-oriented programming and statistical models) allow the creation
                       of behavioural models that can be used for many exercises, thus greatly
                       reducing costs compared to ad hoc scripting.




1    Introduction
A brief description of the desired behaviour is usually more than enough for actors on a stage; human intelligence,
with its understanding of the context and of the impersonated characters, fills the gaps. By contrast, directing an
artificial character on a virtual stage requires an exacting description of all details, in particular when plausible
behaviours are requested according to a very precise script that cannot simply be forced onto a statistical model
or learned somehow from existing data collections. The resulting implementation cost is one of the reasons that
limit the application of virtual reality (VR) in training, even if VR could allow human participants to improve
their skills in dangerous and rare situations, e.g. emergency management with victims and crowds involved.
   Models of procedural behaviour such as behaviour trees [CÖ17] and goal-oriented programming [Ork03] that
adapt to the context in which artificial characters act (called behavioural AI from now on for brevity) could

potentially alleviate this situation if reused many times. To this end, good requirements, capturing the right
abstractions that allow reuse in multiple situations and domains, are essential.
   This paper introduces ELEVATE, a tool suite for the production of interactive videos for online training, and
its virtual stage for video shooting, called EVE. We discuss how behavioural AI models are incrementally added
to EVE, thus progressively enriching the tool suite itself, by means of a process that facilitates the dialogue
among domain experts, content creators and platform developers.
   The next section presents the ELEVATE application domain and its main components. Section 3 briefly reviews
how AI is currently applied to the production of videos and in video games. Section 4 discusses the process
supported by ELEVATE to allow domain experts to describe their requirements and drive the development team
in building behavioural models. A few final remarks conclude this paper.

2     Context: Interactive Videos as Exercises for Learning Management Systems
With the availability of high-speed Internet and the pressure for continuous training, online e-learning has become
very popular, especially in professional fields requiring mandatory skill certifications and facing continuous flows
of business, legal and technical novelties. A number of so-called Learning Management Systems (LMS), both
proprietary and open-source (e.g. Moodle1), are available to deliver courses created with content-generation
tools, whose current market leader is Articulate2.
   In self-paced courses, with no interaction (not even indirect) with human trainers, videos are commonly
used to deliver theory and examples, complementing textual contents. A variety of multimedia interactive
quizzes and tests allow students to check their understanding and possibly undergo an examination; they range
from multiple-choice questionnaires to game-like interactions and simple simulations. Virtual reality (VR) is a
promising technology for these purposes, with supporting devices readily available thanks to their popularity in
video-gaming. However, even today, development costs and limited user acceptance outside of the youngest
audiences mean that VR is applied almost exclusively to the most complex training, and under human supervision.
   A cheaper alternative to full VR, but typically leading to higher levels of engagement than traditional linear
videos and questionnaires, is the use of interactive videos in which the trainees have to take decisions in order
to progress through different sections of a filmed story. The story may even have different endings, some positive
and some negative according to the choices taken, so it can also be used to test competences. The scripting of an
interactive story for training purposes, the shooting and editing of videos or, equivalently, the preparation of
animated multimedia material are non-trivial tasks that require specialised tools and an instructional designer
with both technical and educational skills.

2.1     ELEVATE: Script Editor and Virtual Stage
The ELEVATE research project has produced a tool suite for the creation of interactive videos to be used as
exercises in online courses. Differently from other tools, where an instructional designer acts also as multimedia
creator, ELEVATE clearly distinguishes two different user roles: the exercise designer, who is typically a
domain expert or an instructional designer and defines training objectives and how to verify them, and the
director, who produces content according to the requests of the exercise designer.
   The first ELEVATE tool of interest for this paper is a graphical story editor for the exercise designer, used
to define the structure of an exercise. The latter is a directed, cyclic graph whose nodes (“fragments”) are
placeholders for video clips; the edges outgoing from a node (“choices”) represent options that are offered to a
trainee at that point in the story, possibly according to contextual parameters. The destination of an edge is
the point from which the story will resume if that edge is followed, i.e. if the trainee takes that choice. Thus, a
path through a story graph represents a view of the video or, equivalently, a sequence of choices.
   Before designing a story structure, the exercise designer needs to pinpoint the goals of an exercise and the
criteria used to judge achievement or failure. Goals, input parameters, trainee decisions and other run-time
context elements are represented with a simplified logical language (“tags”).
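   As an illustration of how fragments, choices and tags fit together, the following minimal sketch shows a story
graph whose choices are guarded and enriched by tags. The class and field names are hypothetical and only
indicative; they are not ELEVATE's actual data model.

```python
# Illustrative sketch of a story graph with tag-guarded choices.
# Class and field names are hypothetical, not ELEVATE's actual data model.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class Choice:
    label: str                    # option shown to the trainee
    target: str                   # id of the fragment the story resumes from
    required_tags: Set[str] = field(default_factory=set)  # offered only if all hold
    sets_tags: Set[str] = field(default_factory=set)      # context updated when taken

@dataclass
class Fragment:
    fragment_id: str
    clip: Optional[str] = None    # media library reference, linked later
    choices: List[Choice] = field(default_factory=list)

@dataclass
class Exercise:
    goal_tags: Set[str]           # tags that, once all set, mean the goal is achieved
    fragments: Dict[str, Fragment] = field(default_factory=dict)

    def available_choices(self, fragment_id: str, context: Set[str]) -> List[Choice]:
        """Choices offered to the trainee at this fragment, given the context tags."""
        return [c for c in self.fragments[fragment_id].choices
                if c.required_tags <= context]
```

In this reading, a path through the graph is a sequence of choices, and the tags set along the way determine
whether the exercise goal is eventually judged as achieved.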
   Once goals have been defined and the exercise structure designed, the exercise designer must link fragments
to video clips so that the complete interactive video can be generated. The clip for a fragment may be loaded
from an existing media library or recorded ad hoc. In the latter case, the exercise designer notifies the director through
a so-called video recording request, which links back to the exercise and to the specific fragment to be satisfied.
    1 https://moodle.org/
    2 https://articulate.com/
The director is in charge of deciding on secondary or under-specified aspects (e.g. camera perspective, objects
in the scene) and of recording.
    Videos can be recorded either externally, e.g. by shooting in real-life settings or exported from a multimedia
authoring system, or by means of another tool of the suite called EVE (ELEVATE Video Editor). This is a VR
system developed with the popular Unity game engine3. EVE offers the director a virtual environment in which
to create situations that would be difficult to replicate in the real world for many reasons, e.g. a dangerous
fire emergency involving a crowd and many objects, or privacy restrictions on real-life shooting. EVE allows the
director to load a 3D scenario and populate it with a variety of artificial actors (so-called NPCs, Non-Player
Characters) and graphical assets. The director specifies a simple script for the NPCs to follow in order to satisfy
a video recording request. Typically, a script is a sequence of goals to be achieved by the underlying goal-oriented
multi-agent framework that animates the NPCs. Once the script is ready, the director records the scene, letting
the NPCs behave autonomously in the virtual environment as if they were actors on a real stage. EVE allows
the director to save the state of the virtual stage at any time (“checkpointing”), from which to restart to satisfy
the video recording request of the following fragment.
    Once ready, the recorded video is uploaded to the media library by the director and the video recording
request closed. The exercise designer either approves the video or rejects it, reopening the recording request with
comments explaining the reason for the refusal. Conversely, the director can provide feedback to the exercise
designer before recording; furthermore, if a request cannot be satisfied for any reason related to the exercise, e.g.
the instructions on the fragment are not clear, the request is put into an “error” state. It is left to the exercise
designer to change the request as appropriate and resubmit it to the director.
    The video recording request cycle just described is similar to those of tickets in popular issue trackers, e.g.
Bugzilla. This cycle may be extended in various directions. For instance, feedback from the director may lead the
exercise designer to rethink the exercise and readjust the story graph. Conversely, the director may discover that
the available capabilities are inadequate; most importantly for the following discussion, behaviours of NPCs may
be missing or inappropriate and consequently trigger an inner cycle of development involving an AI developer.
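   To make the base cycle explicit, it can be summarised as a small state machine. The sketch below uses
illustrative state and action names that do not necessarily match ELEVATE's tracker, and it omits the inner AI
development cycle just mentioned:

```python
# Sketch of the video recording request cycle as a state machine.
# States and actions are illustrative; the inner AI development cycle is omitted.
from enum import Enum, auto

class RequestState(Enum):
    SUBMITTED = auto()   # created by the exercise designer for a fragment
    CLOSED = auto()      # video recorded and uploaded by the director
    APPROVED = auto()    # exercise designer accepts the clip
    REOPENED = auto()    # exercise designer rejects the clip, with comments
    ERROR = auto()       # director cannot satisfy the request (e.g. unclear instructions)

# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (RequestState.SUBMITTED, "record"): RequestState.CLOSED,     # director
    (RequestState.SUBMITTED, "flag_error"): RequestState.ERROR,  # director
    (RequestState.CLOSED, "approve"): RequestState.APPROVED,     # exercise designer
    (RequestState.CLOSED, "reject"): RequestState.REOPENED,      # exercise designer
    (RequestState.REOPENED, "record"): RequestState.CLOSED,      # director, new take
    (RequestState.ERROR, "resubmit"): RequestState.SUBMITTED,    # exercise designer
}

def next_state(state: RequestState, action: str) -> RequestState:
    return TRANSITIONS[(state, action)]
```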
    When all video clips for an exercise have been loaded and approved, the exercise designer generates the
complete interactive video, which is composed of a standard multimedia file and of ELEVATE-specific metadata
containing structure, goals and other parameters as mentioned above. An ELEVATE player, compatible with
any Web-based LMS supporting the SCORM standard (part of the IEEE 1484 series for eLearning Technologies),
allows a trainee to do the exercise and keeps track of their progress.

3     AI Applied to Video Production and Video Gaming
AI has only recently seen real adoption and improvement in film making. As the technology progresses
and the use of machine learning-based approaches such as neural networks grows in industry, many roles
have become (or are becoming) candidates for automation. One example is script-writing AI [Dal18], which gives
a glimpse of what AI could do in the future (even if today the results are still rather awkward and limited).
Another example is the camera-manning role [Dal18] which aids the director in shooting complex sequences that
may involve camera-carrying drones and may simplify the post-production work of mixing and editing. However,
so far the complexity of the visual storytelling (including the ability to convey emotions and narrate a story
relying on images rather than precise text describing every detail) has prevented the full automation of creation
and understanding of complex editing or compositions. In recent times, AI is being trained and exploited to
further assist directors, with research topics such as shot labelling and camera movement recognition [Som19].
Furthermore, AI is being trained to assist 3D artists, learning how 3D modelling and 3D animations are made
[Fou19, Smi16].
   Of particular relevance for ELEVATE, AI is commonly used in video games and, by extension, in serious
games and VR environments based on game engine technologies. It is necessary to clarify that, in video gaming,
traditionally the term AI does not refer to machine learning or symbolic reasoning but to anything, including
well-known algorithms and design patterns, that allows game developers to simulate real-life behaviour; the final
objective, of course, is to give the player the impression of dealing with intelligent entities. Until recently, game
AI would not rely on any complex machinery, because such machinery requires more resources than those available on a
player’s computer. Because of these limitations, often the AI used in video games is nothing more than a set
of (sometimes complex) behaviour trees [CÖ17] driving the actions of NPCs, meaning that no real reasoning is
involved and NPCs show little or no sensitivity to the evolution of the situation [SML19]. As a consequence,
    3 https://unity.com/
creating a plausible, complex behaviour becomes challenging, with the lurking risk of sacrificing the player
experience because of the poor AI [SML19]. To overcome this problem, many developers have chosen to give
the AI some kind of advantage with respect to the human player: for instance, to simplify its behaviour tree,
an AI agent can directly access some non-observable information (such as the player position in the game at
every instant, even when in hiding) rather than purely relying on a realistic perception of the situation from the
perspective of the NPC. This approach, commonly called “cheating AI”, can be perceived as an unfair advantage
when overdone, e.g. by making it impossible to apply any effective counter-tactics when the user is competing
against NPCs, eventually leading to user frustration [Sco02].
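   As a generic illustration of this kind of machinery (a sketch, not code from any particular game nor from
EVE), a minimal behaviour tree in which one condition node "cheats" by reading the player's true position from
the engine might look as follows:

```python
# Generic sketch of a tiny behaviour tree whose condition node "cheats" by reading
# the player's exact position from the engine's world state instead of simulating
# the NPC's perception. The world model and node names are illustrative only.
from typing import Callable, Dict, List

SUCCESS, FAILURE = "success", "failure"
Node = Callable[[Dict], str]

class Sequence:
    """Ticks children in order; fails as soon as one fails."""
    def __init__(self, children: List[Node]):
        self.children = children
    def __call__(self, world: Dict) -> str:
        for child in self.children:
            if child(world) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Ticks children in order; succeeds as soon as one succeeds."""
    def __init__(self, children: List[Node]):
        self.children = children
    def __call__(self, world: Dict) -> str:
        for child in self.children:
            if child(world) == SUCCESS:
                return SUCCESS
        return FAILURE

def player_visible(world: Dict) -> str:
    # The "cheat": no line-of-sight test, just look up the player's true position.
    return SUCCESS if world.get("player_pos") is not None else FAILURE

def chase_player(world: Dict) -> str:
    world["npc_target"] = world["player_pos"]
    return SUCCESS

def patrol(world: Dict) -> str:
    world["npc_target"] = world["patrol_waypoint"]
    return SUCCESS

# Chase the player when the (cheated) check succeeds, otherwise keep patrolling.
guard_ai = Selector([Sequence([player_visible, chase_player]), patrol])

world_state = {"player_pos": (4, 2), "patrol_waypoint": (0, 0)}
guard_ai(world_state)   # sets world_state["npc_target"] to the player's position
```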
   Some more modern AI patterns and approaches are being adopted with the evolution of consumer hardware
and of the industry itself, which is subject to strong competitive pressure and thus prone to sacrificing realism
to meet deadlines. A well-studied example is the GOAP (Goal-Oriented Action Planning) system [Ork03] which,
even though it was developed in the early 2000s, is still not widely used because of its higher resource needs
compared to standard behaviour trees [Ave18].
   We discussed in previous works [DGB+ 15] how to define semantics for a VR so that NPCs (or, more precisely,
the software agents that control the NPC behaviours; typically, one agent per NPC with a classical multi-agent
approach) can autonomously work out context-specific details. These include, for instance, the classification of
locations and objects. In [BD15] we presented a technique for goal-oriented model development based on the
BDI model of agency, in principle similar to the GOAP system.
   Some types of game AI deal with aspects other than NPC behaviours, for instance, story generation, procedural
environment generation and so on, but their discussion is outside the scope of this paper.

4     Designing Controllable AI Models for Virtual Actors
This section focuses on the process supported by EVE for recording a video requested by an instructional
designer and, in turn, how this may generate requirements for model development. We illustrate this process
with an example taken from emergency training, where firefighters are involved in rescuing the victim of a car
accident; the trainee has to choose which procedure has to be applied by the emergency team.

4.1   AI in EVE
The purpose of behavioural AI in EVE is to reduce the workload on the director by providing reusable “intel-
ligent” models that minimise the number of ad hoc animations and scripts that are normally required for the
development of a so-called “game level”, i.e. of a scene and its actors in a complex video game. The director
transforms high-level requirements from the exercise designer (typically expressed in natural language) into
scenario settings, directives for the existing behavioural AI and, when necessary, requests for additions to or
improvements of the latter, which are delegated to an AI development team.
   The fact that there is no game logic in EVE (indeed, there is no player during video recording) is practically
irrelevant in terms of the inherent complexity of the graphical and behavioural models of the NPCs / virtual
actors. This complexity varies widely, of course. On one extreme, shooting a static scene, e.g. an introduction
to a story that allows the trainee to familiarise with a certain location such as a street corner with sparse road
traffic, is a matter of choosing a 3D environment, populating it with objects that at most execute loops of
stereotyped animations, deciding on the camera position – that is, basic editing activities that require minimal
skills by the director and little “intelligence” in the game engine and its models. On the opposite extreme, a very
animated scene with characters performing complex and coordinated routines, e.g. two fire-fighters that extract
the screaming victim of an accident out of a car on fire, requires a substantial amount of work that, if done purely
by means of a 3D editor or a basic game development environment such as Unity, could be time-consuming if
not challenging even for an expert multimedia specialist.
   EVE is engineered in such a way that 3D scenarios, 3D object models with their basic procedural animations,
and behavioural models written in a goal-oriented fashion can be imported from libraries rather than being
built-in. The director chooses what fits her purposes and writes relatively simple scripts that submit goals to
NPCs in a sequence as well as some minimal event-based coordination (following a pattern such as “give goal G1
to character A1 when event E1 has happened” – e.g. “give goal VICTIM V1 EXTRACTED to FIREFIGHTER
F1 when DOOR D1 OPEN”).
   As mentioned, behavioural models are created by a specialised AI team according to the needs of the director
and exploit all available geometrical data and semantic annotations to “intelligently” adapt to different contexts.
Models for EVE can freely “cheat” without worrying about the user experience because agents are not competing
against a player. So, unlike the standard GOAP definition of world knowledge [Ork04], EVE agents can access
every piece of information they need to reach their goals, simplifying the models; further, the manipulation of
information shared by all agents and maintained by the game engine allows for complex coordinated behaviour.
   That notwithstanding, previous experience with model development has shown that it is critical to reduce
misunderstandings by the modeller / developer to prevent both frustration in VR designers and disappointment
of final users. The engineers developing models for a new domain of application of EVE (e.g. fire-fighting
in buildings is different from open street accidents) cannot know all models required to satisfy all exercises in
advance, so they need to work closely with the director until a sufficiently rich model library is available to let
her work without further support. To this end, a short development cycle with user involvement, such as the
video recording request management in ELEVATE described below, is an important enabling precondition.

4.2   The EVE Requirement-to-Model Cycle for AI
The interaction between the director and the EVE model developers follows a classic requirement specification
/ development / testing / user acceptance cycle, currently supported by a standard issue tracker. Internally, the
EVE model development team may adopt any software engineering methodology they deem appropriate. We
illustrate this process with an example of video refinement causing the creation of new models:

 1. The exercise designer submits the video recording request for a fragment described as “the firefighters extract
    the victim from the car then one checks her conditions while the other puts out the fire”.

 2. Since this is not the first shot of the exercise, the director starts from the final checkpoint of the fragment
    that leads to this one, otherwise she would have to set the scene up from scratch. As mentioned, it includes
    two firefighters (F1 and F2) and a victim in a burning car. The director writes the script, whose pseudo-code
    is: F1 is given goal “driver door opened”; when opened, F2 is given “victim moved to the ground”; when
    the victim is on the ground, F1 is given “extinguisher opened towards fire” and in parallel F2 is given “three
    agitated arm movements directed to victim performed” (a sketch of this script is given after this list). The
    video is recorded, stopping when the director considers that it is long enough, then uploaded to the media
    library and the video recording request closed.

 3. The exercise designer rejects the video, reopening the request, because the animation corresponding to
    checking the health condition is not plausible: the firefighter must recline above the victim and check
    breathing and heartbeat in this order, finding no problem.

 4. The director analyses the designer’s comment and, in turn, requires the model development team to create
    three different models for the firefighter satisfying these goals: reclined to a free side of character C; breathing
    of character C checked; heart beat of C checked. This is done by creating three entries in the issue tracking
    system of the AI development team; each issue includes, as accompanying documents, a read-only link to the
    video recording requests, thus all exchanges between exercise designer and director as well as the rejected
    video are available to the developers. Conversely, the director adds links to the created issues to the video
    recording request itself, as comments for the exercise designer; while this step is not strictly necessary, it
    gives the designer an opportunity to be involved, in addition to justifying a delay in production.

 5. Developers produce the required models and upload them in the model library. User acceptance is performed
    by the director, who invokes the new models from her script and evaluates the results. The model development
    cycle is repeated until the director is satisfied.

 6. The video production cycle restarts from step 2 above and is repeated until the exercise designer approves
    the result.
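   As an illustration, the script sketched in step 2 could be written roughly as follows, reusing the hypothetical
Stage API from the sketch in Section 4.1; goal and event names mirror the pseudo-code above and are not actual
EVE identifiers:

```python
# Sketch of the step-2 script, reusing the hypothetical Stage API sketched in Section 4.1.
# Goal and event names mirror the pseudo-code above; they are not EVE identifiers.
stage.submit_goal("F1", "DRIVER_DOOR_OPENED")

stage.on_event("DRIVER_DOOR_OPENED",
               lambda: stage.submit_goal("F2", "VICTIM_MOVED_TO_GROUND"))

def when_victim_on_ground() -> None:
    # Both goals are submitted together, so F1 and F2 act in parallel.
    stage.submit_goal("F1", "EXTINGUISHER_OPENED_TOWARDS_FIRE")
    stage.submit_goal("F2", "THREE_AGITATED_ARM_MOVEMENTS_DIRECTED_TO_VICTIM")

stage.on_event("VICTIM_ON_GROUND", when_victim_on_ground)
```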

   A detailed discussion of model creation by the development team would be outside of the scope of this paper.
In short, with the goal-oriented agent framework currently adopted for EVE (derived from [BRCG17, BCR16]),
it consists of the definition of a set of accepted goals, with their parameters, and the development of context-
sensitive plans to achieve them, plus the semantic annotations required on the environment and its objects
to understand the context [DGB+ 15, BD15]. The process described above would equally apply to other AI
technologies or even simple ad hoc scripting, with unavoidable limitations on the reusability of models.
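   Very roughly, and only as an indication of shape rather than of the actual framework API, a behavioural
model in such a goal-oriented setting pairs declared goal types with context-sensitive plans:

```python
# Indicative sketch of a behavioural model: a declared goal type plus
# context-sensitive plans to achieve it. Names are illustrative, not the
# actual goal-oriented framework used by EVE.
from dataclasses import dataclass
from typing import Callable, Dict, List

Beliefs = Dict[str, object]   # the agent's view of the (semantically annotated) environment

@dataclass
class Plan:
    name: str
    context: Callable[[Beliefs], bool]   # is this plan applicable in this context?
    body: Callable[[Beliefs], None]      # animations / sub-goals to execute

@dataclass
class GoalModel:
    goal_type: str                       # e.g. "BREATHING_OF_CHARACTER_CHECKED"
    parameters: List[str]                # e.g. ["character"]
    plans: List[Plan]

    def select_plan(self, beliefs: Beliefs) -> Plan:
        """BDI-style selection: pick the first plan whose context condition holds."""
        for plan in self.plans:
            if plan.context(beliefs):
                return plan
        raise RuntimeError(f"no applicable plan for goal {self.goal_type}")

# Example in the spirit of step 4: checking breathing is only applicable once the
# firefighter is reclined next to the victim (a fact derived from scene annotations).
check_breathing = GoalModel(
    goal_type="BREATHING_OF_CHARACTER_CHECKED",
    parameters=["character"],
    plans=[
        Plan(name="check_while_reclined",
             context=lambda b: bool(b.get("reclined_next_to_victim")),
             body=lambda b: b.setdefault("animations", []).append("check_breathing")),
    ],
)
```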
4.3   Discussion
In the EVE video production process, the behavioural model development team receives clear inputs concerning
execution context and desired behaviours as well as a specification of the acceptance tests (step 5): these are
nothing more than the fragment to be recorded and the overall exercise structure, plus the breakdown into goals
defined by the director in step 4. These inputs reduce the need to create other typical RE artefacts, such as functional
specifications and test cases.
   The nested cycle presented above differs from typical serious game, simulation and multimedia development
processes because exercise designer, director, and developers work separately and at different times rather than
as a single team focusing on the next product release. Developers aim to create a framework that enables
the first two to generate an arbitrary number of videos by themselves. Further, these roles may belong to different
organisations. Indeed, in the current ELEVATE use cases, designer and director are part of the training office of
a large public department, while the development team is a private company. The business objective of the latter
is to make its products as reusable as possible, thus targeting a community of users creating training material
for different audiences and possibly a marketplace of models.
   ELEVATE’s feedback-oriented iterative cycle presents some analogies with those supported by tools for
requirements collection directly from usage context, for instance the ConTexter proposed by Wehrmaker et
al. [WGS12] and the Mobile Scenario Presenter (MSP) of Seyff et al. [SGGM07]. These systems focus on
software-intensive systems in real-life settings, and capture feedback from the user while using a system. In
the case of EVE, which specifically focuses on virtual reality applications, context is established by the users
themselves (exercise designer and director) and captured by the video recording request.

5     Conclusion
We have introduced the ELEVATE tool suite for interactive video production. One of its components, EVE
(ELEVATE Video Editor), uses AI to animate virtual actors on a virtual stage. In particular, EVE uses behavioural
AI, i.e. symbolic reasoning or some form of machine-learned logic implementing context-sensitive procedures,
as a better alternative to hard-wiring simple algorithms or case-by-case scripting as done by most commercial
entertainment and serious games. The objective is to reduce the costs of production of videos by supporting a
high level of adaptability of its virtual actors to different situations.
   While the idea of reusable AI models, shared by a community, is sound in principle, our own experience is that
good models are very difficult to develop because it is hard to capture truly context-independent requirements in a
systematic way, over time and with the continuous involvement of domain experts. The presented approach aims
at alleviating issues by streamlining the process of requirement capture and implementation and by separating
content conception and production from platform development, thus enabling incremental improvements over
time that can be shared by a community of users.
   All base components of EVE have been developed and used by previous projects or are taken off-the-shelf.
However, the complete interactive environment is still under development at the time of writing. Experimentation
on the full EVE is expected to happen by Spring 2020.

Acknowledgements
This work is part of the ELEVATE research project, which is funded by Provincia Autonoma di Trento, L.P.
6/1999.

References
[Ave18]     D. Aversa. Choosing between behavior tree and GOAP (planning). https://www.davideaversa.it/
            2018/02/choosing-behavior-tree-goap-planning/, 2018. [Online; accessed 19-December-2019].

[BCR16]     P. Busetta, P. Calanca, and M. Robol. Applying BDI to serious games: The PRESTO ex-
            perience.    Technical Report DISI-16-011, Version 2.0, University of Trento, Italy, 2016.
            http://hdl.handle.net/11572/154697.

[BD15]      P. Busetta and M. Dragoni. Composing cognitive agents from behavioural models in PRESTO. In 16th
            Workshop “From Objects to Agents”, Naples, Italy, June 2015.
[BRCG17] P. Busetta, M. Robol, P. Calanca, and P. Giorgini. PRESTO Script: scripting for serious games. In AI
         & Games Symposium at AISB, University of Bath (UK), 2017.

[CÖ17]    M. Colledanchise and P. Ögren.     Behavior trees in robotics and AI: An introduction.      CoRR,
           abs/1709.00084, 2017.
[Dal18]    D. Dallas. Filmmaking, artificial intelligence & machine learning. https://creativefuture.co/
           artificial-intelligence-automation-film-video-machine-learning-editing, 2018. [Online;
           accessed 19-December-2019].

[DGB+15] M. Dragoni, C. Ghidini, P. Busetta, M. Fruet, and M. Pedrotti. Using ontologies for modeling
          virtual reality scenarios. In 12th European Semantic Web Conference, ESWC 2015, LNCS, volume
          9088, pages 575–590. Springer, 2015.
[Fou19]    Myelin Foundry. Deep learning can democratize animation and vfx. https://becominghuman.ai/
           deep-learning-can-democratize-animation-and-vfx-43bf51f06090, 2019. [Online; accessed
           19-December-2019].
[Ork03]    J. Orkin. Applying goal-oriented action planning to games. AI game programming wisdom, 2:217–228,
           2003.
[Ork04]    J. Orkin. Symbolic representation of game world state: Toward real-time planning in games. In
           Proceedings of the AAAI Workshop on Challenges in Game Artificial Intelligence, volume 5, pages
           26–30, 2004.
[Sco02]    B. Scott. The illusion of intelligence. In S. Rabin (ed.), AI Game Programming Wisdom, pages 16–20.
           Charles River Media, 2002.
[SGGM07] N. Seyff, F. Graf, P. Grünbacher, and N. Maiden. The mobile scenario presenter: A tool for in
         situ requirements discovery with scenarios. In 15th IEEE International Requirements Engineering
         Conference (RE), pages 365–366. IEEE, 2007.
[Smi16]    J. R. Smith. IBM Research takes Watson to Hollywood with the first “cognitive movie trailer”. https:
           //www.ibm.com/blogs/think/2016/08/cognitive-movie-trailer/, 2016. [Online; accessed 19-
           December-2019].

[SML19]    L. E. Shummon Maass and A. Luc.        Artificial intelligence in video games. https:
           //towardsdatascience.com/artificial-intelligence-in-video-games-3e2566d59c22, 2019.
           [Online; accessed 19-December-2019].
[Som19]    R. Somani. AI for filmmaking. https://rsomani95.github.io/ai-film-1.html, 2019. [Online;
           accessed 19-December-2019].

[WGS12]    T. Wehrmaker, S. Gärtner, and K. Schneider. ConTexter feedback system. In 34th International
           Conference on Software Engineering (ICSE), pages 1459–1460. IEEE, 2012.