<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Event-Schematic, Cooperative, Cognitive Architecture Plays Super Mario</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabian Schrodt, Yves Röhm, Martin V. Butz</string-name>
          <email>martin.butz@uni-tuebingen.de</email>
          <email>tobias-fabian.schrodt@uni-tuebingen.de</email>
          <email>yves.roehm@student.uni-tuebingen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Eberhard Karls University of Tübingen</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>10</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We apply the cognitive architecture SEMLINCS to model multi-agent cooperation in a Super Mario game environment. SEMLINCS is a predictive, self-motivated control architecture that learns conceptual, event-oriented schema rules. We show how the developing, general schema rules yield cooperative behavior, taking into account individual beliefs and environmental context. The implemented agents are able to recognize other agents as individual actors, to learn about their respective abilities from observation, and to consider them in their plans. As a consequence, they are able to simulate changes in their context-dependent scope of action with respect to their own interactions with the environment, interactions of other agents with the environment, as well as interactions between agents, yielding coordinated multi-agent plans. The plans are communicated between the agents and establish a common ground to initiate cooperation. In sum, our results show how cooperative behavior can be planned and coordinated, developing from sensorimotor experience and predictive, event-based structures.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Most approaches to intelligent, autonomous game
agents are robust, but their behavior is typically scripted,
predictable, and hardly flexible. Current game agents are still
rather limited in their speech and learning capabilities, as well
as in their ability to act believably in a self-motivated manner.
While novel artificially intelligent agents have been developed
over the past decades, the level of intelligence, the interaction
capabilities, and the behavioral versatility of these agents are
still far from optimal [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Besides the lack of truly intelligent game agents, however,
the main motivation for this work comes from cognitive
science and artificial intelligence. Over the past two decades, two
major trends have established themselves in cognitive science.
First, cognition is embodied, or grounded, in the
sensory-, motor-, and body-mediated experiences that humans and
other adaptive animals gather in their environment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Second,
brains are predictive encoding systems, which have evolved
to be able to anticipate incoming sensory information, thus
learning predominantly from the differences between predicted
and actual sensory information [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]–[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Combined with the
principle of free-energy-based inference, neural learning, as
well as active epistemic and motivation-driven inference, a
unified brain principle has been proposed [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Concurrently,
it has been emphasized that event signals may be processed
in a unique manner by our brains. The event segmentation
theory [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] suggests that humans learn to segment the
continuous sensorimotor stream into event codes, which are
also closely related to the common coding framework and
the theory of event coding [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Already in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] it was
proposed that such event codes are very well-suited to be
integrated into event schema-based rules, which are closely
related to production rules [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and rules generated by
anticipatory behavior control mechanisms [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. As acknowledged
from a cognitive robotics perspective, event-based knowledge
structures are also well suited to be embedded into a linguistic,
grammatical system [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]–[
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        We apply the principles of predictive coding and active
inference and integrate them into a highly modularized,
cognitive system architecture. We call the architecture SEMLINCS,
which is a loose acronym for SEMantic, SEnsory-Motor,
SElfMotivated, Learning, INtelligent Cognitive System [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The
architecture is motivated by a recent proposition towards a
unified subsymbolic computational theory of cognition [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ],
which puts forward how production rule-like systems (such
as SOAR or ACT-R) may be grounded in sensorimotor
experiences by means of predictive encodings and
free-energy-based inference. The theory also emphasizes how
active-inference-based, goal-directed behavior may yield a fully
autonomous, self-motivated, goal-oriented behavioral system
and how conceptual predictive structures may be learned by
focusing generalization and segmentation mechanisms on the
detection of events and event transitions.
      </p>
      <p>SEMLINCS is essentially a predictive control architecture
that learns event schema rules and interacts with its world
in a self-motivated, goal- and information-driven manner. It
specifies a continuously unfolding cognitive control process
that incorporates (i) a self-motivated behavioral system, (ii)
event-oriented learning of probabilistic event schema rules,
(iii) hierarchical, goal-oriented, probabilistic reasoning,
planning, and decision making, (iv) speech comprehension and
generation mechanisms, and (v) interactions thereof.</p>
      <p>
        Here, our focus lies on studying artificial, cognitive game
agents. Consequently, we offer an implementation of
SEMLINCS to control game agents in a Super Mario game
environment (videos: https://www.youtube.com/watch?v=AplG6KnOr2Q,
https://www.youtube.com/watch?v=ltPj3RlN4Nw,
https://www.youtube.com/watch?v=GzDt1t iMU8). Seeing that the
game is in fact rather complex,
the implementation of SEMLINCS faces a diverse collection
of tasks. The implemented cognitive game agents are capable
of completing Super Mario levels autonomously or
cooperatively, solving a variety of deductive problems and interaction
tasks. Our implementation focuses on learning and applying
schematic rules that enable artificial agents to cause
behaviorally relevant intrinsic and extrinsic effects, such as
collecting, creating, or destroying objects in the simulated world,
carrying other agents, or changing an agent’s internal state,
such as the health level. Signals of persistent surprise in these
domains can be registered [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which triggers
event schema learning [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and which is closely related to
the reafference principle [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. As a result, production-rule-like,
sensorimotor-grounded event schemas develop from signals
of surprise and form predictive models that can be applied
for planning. SEMLINCS thus offers a next step towards
complete cognitive systems, which include learning techniques
and which build a hierarchical, conceptualized model of their
environment in order to interact with it in a self-motivated,
self-maintenance-oriented manner.
      </p>
      <p>
        A significant aspect when considering multi-agent
architectures inspired by human cognition is cooperation and
communication: Unique aspects of human cognition are characterized
by social skills like empathy, understanding the perspective of
others, building common ground by communication, and
engaging in joint activities [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. As a step towards these abilities,
we show that the developing event-oriented, schematic
knowledge structures enable the implemented SEMLINCS agents to
cooperatively achieve joint goals. Thus, our implementation
shows how sensorimotor grounded event codes can enable
and thus bootstrap cooperative interactions between artificial
agents. SEMLINCS is designed such that the developing
knowledge structures and the motivational system can be
coupled with a natural language processing component. In our
implementation, agents are able to learn from voice inputs
of an instructor, follow instructed goals and motivations, and
communicate their gathered plans and beliefs to the instructor.
Moreover, they can propose potential joint action plans to
other game agents and discuss them.
      </p>
      <p>In the following, we provide a general overview of the
modular structure of SEMLINCS in application to the
Super Mario game environment. Moreover, we outline key
aspects for coordinated cooperation in our implementation. We
evaluate the system in selected multi-agent deduction tasks,
focusing on learning, semantic grounding, and conceptual
reasoning with respect to agent-individual abilities, beliefs, and
environmental context. The final discussion puts forward the
insights gained from our modeling effort, highlights important
design choices, as well as current limitations and possible
system enhancements.</p>
    </sec>
    <sec id="sec-1b">
      <title>II. SEMLINCS IN APPLICATION TO SUPER MARIO</title>
      <p>
        Here we give a brief overview of the main characteristics
of SEMLINCS in application to the Super Mario game
environment. A detailed description is available in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The
implementation consists of five interacting modules, as seen
in Figure 1.
      </p>
      <p>[Figure 1: Overview of the SEMLINCS modules: (i) the motivational system (intrinsic drives), (ii) schematic planning (event anticipation, interaction plan), (iii) sensorimotor planning (A*, selected goal event), (iv) schematic knowledge and learning (condition + action → event; event observation and event prediction), and (v) the speech system (voice in / out).]</p>
      <p>
        The motivational system (i) specifies drives that
activate goal-effects that are believed to bring the system
towards homeostasis. The drives comprise an urge to
collect coins, make progress in the level, interact with novel
objects, and maintain a specific health level. Goal-effects
selected by the motivational system are then processed by an
event-anticipatory schematic planning module (ii) that infers
a sequence of abstract, environmental interactions that are
believed to cause the effects in the current context. The
interaction sequence is then planned in terms of actual motor
commands by the sensorimotor planning module (iii), which
infers a sequence of keystrokes that will result in the desired
interactions. Both the schematic and sensorimotor forward
models used for planning are also used to generate forward
simulations of the currently expected behavioral consequences.
These forward simulations are continuously compared with
the actual observations by the event-schematic knowledge and
learning module (iv), where significant differences are
registered as event transitions that cause the formation of
procedural, context-dependent, event-schematic rules. The principle is
closely related to Jeffrey Zacks and Barbara Tversky’s event
segmentation theory [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and the reafference principle
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. After a desired goal effect has been achieved, the respective
drive that caused the goal is lowered, and a new goal is
selected, completing an action cycle. The speech system (v)
provides a natural user interface to all of these processes, and
additionally enables verbal communication between agents. In
the following, we focus on the steps most relevant for our
implementation of coordinated joint actions: Event-schematic
knowledge and planning.
      </p>
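<p>The action cycle described above (goal selection by the motivational system, schematic planning, sensorimotor planning, prediction-observation comparison, and drive reduction) can be condensed into a short sketch. All interfaces and names below are illustrative assumptions for exposition, not the actual SEMLINCS code.

```python
# Sketch of one SEMLINCS action cycle; the module interfaces are
# assumptions for illustration, not the original implementation.
def run_cycle(drives, schematic_planner, motor_planner, world):
    # (i) Motivational system: the most urgent drive selects a goal effect.
    drive = max(drives, key=lambda d: d["urgency"])
    goal = drive["goal_effect"]
    # (ii) Schematic planning: an abstract interaction sequence that is
    # believed to cause the goal effect in the current context.
    interactions = schematic_planner(goal, world)
    surprises = []
    for interaction in interactions:
        # (iii) Sensorimotor planning: keystrokes realizing the interaction,
        # together with the forward model's predicted outcome.
        keys, predicted = motor_planner(interaction, world)
        observed = world["step"](keys)
        # (iv) Event-schematic learning: a significant difference between
        # prediction and observation is registered as an event boundary.
        if observed != predicted:
            surprises.append((interaction, predicted, observed))
    # The achieved goal effect lowers the respective drive, completing
    # the cycle; (v) the speech system can hook into every step.
    drive["urgency"] = 0.0
    return surprises
```

The returned surprises are exactly the material from which event schema rules would be formed in subsequent cycles.</p>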
      <sec id="sec-1-1">
        <title>A. Event-Schematic Knowledge and Planning</title>
        <p>An event can be defined as a certain type of interaction that
ends with the completion of that interaction. An event
boundary marks the end of such an event by co-encoding the
encountered extrinsic and intrinsic changes or effects. Since the
possible interactions with the environment are context-dependent in
nature, we describe an event-schematic rule as a conditional,
probabilistic mapping from interactions to encountered event
boundaries. Production-rule-like schemas can be learned by
means of Bayesian statistics under assumptions that apply in
the Mario environment: Object interactions immediately result
in specific effects, such that temporal dependencies can be
neglected. Furthermore, the effects always occur locally, such
that spatial relations can be neglected. Thus, in the Mario
world, interactions can be restricted to directional collisions,
which may result in particular, immediate effects, given a
specific, local context.</p>
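<p>Under the stated Mario assumptions (immediate, local effects), learning such conditional, probabilistic mappings reduces to counting event boundaries. The class below is a minimal sketch of this idea with a Laplace-smoothed Bayesian estimate; the representation and names are our own, not the paper's code.

```python
from collections import defaultdict

# Sketch: learn event-schematic rules by counting observed event
# boundaries; a rule maps a (condition, interaction) pair to an
# effect probability. Representation and smoothing are assumptions.
class SchemaLearner:
    def __init__(self):
        self.effect_counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def observe(self, condition, interaction, effect):
        # Called once per registered event boundary.
        key = (condition, interaction)
        self.effect_counts[key][effect] += 1
        self.totals[key] += 1

    def probability(self, condition, interaction, effect):
        # Laplace-smoothed estimate of P(effect | condition, interaction).
        key = (condition, interaction)
        return (self.effect_counts[key][effect] + 1.0) / (self.totals[key] + 2.0)
```

Because temporal and spatial dependencies are neglected, each observation is a self-contained (condition, interaction, effect) triple.</p>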
        <p>In the SEMLINCS implementation, event boundary
detection is implemented by detecting significant sensory changes
that the agent does not predict by means of its sensorimotor
forward model. Amongst others, these include changes in an
agent’s health level or the number of collected coins, the
destruction or creation of an object, or the action of lifting
or dropping an object or another agent.</p>
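<p>Event boundary detection can then be sketched as a plain comparison between the sensorimotor forward model's prediction and the actual observation; the dictionary encoding and feature names are illustrative assumptions.

```python
# Sketch: report every observed change that the sensorimotor forward
# model did not predict; each mismatch marks an event boundary.
def detect_event_boundaries(predicted, observed):
    boundaries = []
    for feature in observed:
        if observed[feature] != predicted.get(feature):
            boundaries.append((feature, predicted.get(feature), observed[feature]))
    return boundaries
```
</p>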
        <p>The context for the applicability of a schematic rule,
however, is determined by different factors: It includes a
procedural precondition for an interaction, which specifies in
our current implementation the identity of actor and target as
well as the intrinsic state of the actor (i.e. its health level). On
the other hand, an environmental context precondition limits
the applicable rules to the current scope of an action. That
is, the target of a schema rule must be available and the
interaction with the target must be expected to lead to the
desired effect given the current situation. While the compliance
with procedural constraints can be determined easily, the
reachability of objects has to be ascertained by an intelligent
heuristic, which we describe in the following.</p>
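<p>The two kinds of preconditions can be sketched as a single applicability test, assuming a simple record encoding of rules, agents, and the simulated scope of action (our own naming):

```python
# Sketch: a schema rule applies if its procedural preconditions hold
# (actor identity and intrinsic state) and the interaction with the
# target lies within the simulated scope of action.
def is_applicable(rule, agent, scope_of_action):
    # Procedural precondition: actor identity and intrinsic state.
    if rule["actor"] != agent["name"]:
        return False
    if rule.get("health") is not None and rule["health"] != agent["health"]:
        return False
    # Environmental precondition: the target must be reachable, i.e. the
    # interaction must be attainable in the current scope of action.
    return (rule["interaction"], rule["target"]) in scope_of_action
```
</p>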
      </sec>
      <sec id="sec-1-2">
        <title>B. Simulating the Scope of Action</title>
        <p>The scope of action in a simulated scene is determined by
a recursive search based on sensorimotor forward simulations.
The search starts at the observed scene or environmental
context and then simulates a number of simplified movement
primitives in parallel. Each of the simulations results in a
number of collisions (or interactions), as well as a new, simulated
scene. Sufficiently different scenes are then expanded in the
same manner, until the scope of action is sufficiently explored.
As a result, it encompasses the reachable positions as well
as attainable interactions in a local context as provided by
the sensorimotor forward simulation, neglecting, however, the
effects that may result from the interactions.</p>
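<p>The recursive search can be sketched as follows; the simulator interface and the visited-set test (standing in for "sufficiently different scenes") are illustrative assumptions.

```python
# Sketch: explore the scope of action by forward-simulating simplified
# movement primitives and expanding sufficiently different scenes.
def explore_scope(start_scene, primitives, simulate, max_depth=3):
    reachable = set()                 # attainable interactions
    frontier = [(start_scene, 0)]
    visited = {start_scene}
    while frontier:
        scene, depth = frontier.pop()
        if depth == max_depth:
            continue
        for primitive in primitives:
            next_scene, interactions = simulate(scene, primitive)
            reachable.update(interactions)
            # Expand only scenes not seen before (a stand-in for the
            # "sufficiently different" criterion in the text).
            if next_scene not in visited:
                visited.add(next_scene)
                frontier.append((next_scene, depth + 1))
    return reachable
```

Note that, as in the text, the effects of the collected interactions are deliberately ignored at this stage.</p>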
        <p>The simulation of changes in the scope of action is
accomplished using the abstract, schematic forward simulation
of the local environment. In the current implementation, the
schematic forward model is applied by a stochastic, effect
probability based Dijkstra search. In contrast to the
sensorimotor forward model, it neglects the actual motor commands
but integrates the estimated, attainable interactions in the local
context as provided by the recursive, sensorimotor search.
When specific interactions relevant to the scope of action are
simulated (for example the destruction of a block) the scope
of action is updated.</p>
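<p>The stochastic, effect-probability-based search can be sketched as a shortest-path problem: taking edge costs of -log(P) turns the most probable chain of schematic effects into a Dijkstra search. The rule encoding as (state, interaction, next state, probability) tuples is our illustrative assumption.

```python
import heapq
import math

# Sketch: effect-probability-based Dijkstra over schematic states;
# minimizing the sum of -log(P) maximizes the plan's overall probability.
def most_probable_plan(rules, start, goal):
    queue = [(0.0, start, [])]        # (cost, state, interaction plan)
    best = {start: 0.0}
    while queue:
        cost, state, plan = heapq.heappop(queue)
        if state == goal:
            return plan, math.exp(-cost)
        for src, interaction, dst, p in rules:
            if src == state and p > 0.0:
                new_cost = cost - math.log(p)
                if best.get(dst, float("inf")) > new_cost:
                    best[dst] = new_cost
                    heapq.heappush(queue, (new_cost, dst, plan + [interaction]))
    return None, 0.0                  # goal effect not reachable
```
</p>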
        <p>In the first example shown in Figure 2, an agent aims at
collecting a specific item (the coin on the top right). However,
this item is blocked by destructible objects (the golden boxes
to the right of the agent). Assume that the agent has already
learned that it can destroy and collect the respective objects. In
the initial situation (top left picture), however, the learned rule
about how to collect the coin is not applicable. The schematic
planning module thus first simulates the destruction of one
of the blocking objects, and then updates the simulated scope
of action. When there is more than one destructible object in
the current scene, it furthermore has to identify the correct
object for destruction, that is, degeneralize the schematic rule
with respect to the context (in the example, both objects are
suitable). Next, the agent realizes that the desired item can be
collected, given that one of the blocks was destroyed, resulting
in a schematic action plan.</p>
      </sec>
      <sec id="sec-1-3">
        <title>C. From Schematic Planning to Coordinated Cooperation</title>
        <p>Schema structures gathered from sensorimotor experiences
can be embedded into hierarchical, context-based planning.
Human cognition, however, is highly interactive and social. To
enable our architecture to act in multi-agent scenarios, it has to
(i) recognize other agents as individual actors, (ii) observe and
learn about their actions and abilities, (iii) consider them as
actors in its own plans, (iv) consider them as possible interaction
targets, and (v) communicate emerging plans. Since agents
may have different knowledge and scopes of action, this can
already result in simple cooperative behavior, for example, if
the destruction of a specific block is needed but in the scope
of action of another agent only.</p>
        <p>To yield a greater variety of cooperative scenarios, we
additionally equip the agents with individual abilities. Specifically,
agents are equipped with different jumping heights or the
unique ability to destroy specific blocks. As shown in Figure
2, the agents may then expand their scope of action when
considering interactions with other agents during schematic
planning. As a consequence, depending on the situation, agents
may be required to include other agents in their plans, as
will be shown in the experiments.</p>
        <p>While these principles are sufficient to model cooperative
planning, additional mechanisms are needed to account for
the coordination and communication of plans. In our
implementation, all schematic plans are strictly sequential, meaning
that only one interaction by one agent is targeted at a time,
eliminating the need for a time-dependent execution of plans.
The communication of plans is done via the speech system
by communicating (grammatical tags corresponding to) the
planned, abstract, schematic interaction sequences from the
planning agent to possibly involved agents. Neither the
concrete, contextualized interaction sequence, nor corresponding
sensorimotor plans are communicated. As a consequence,
the addressed agent has to infer the concrete instances of
targeted objects that the planning agent is talking about. To do
so, the agent performs contextual replanning to comprehend
the proposed plan using its own knowledge – essentially
mentally reenacting it. Given that the involved agent has
learned a different set of knowledge than the planning agent,
it is likely to end up with a different plan and a different
overall probability of success. In our current implementation,
an involved agent accepts a proposed plan when it does not
have another solution for the targeted goal that is more likely
successful than the proposed plan given its knowledge. If
the involved agent arrives at a different plan, it makes a counterproposal
that is always accepted by the initial planning agent.
The process of negotiation is shown in Figure 3.</p>
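<p>The accept-or-counterpropose rule described above can be sketched in a few lines; the function signature and success probabilities are illustrative assumptions.

```python
# Sketch of the negotiation step: the involved agent replans the proposal
# with its own knowledge and accepts unless its own plan is more likely
# to succeed, in which case it counterproposes (and the initial planning
# agent always accepts the counterproposal).
def negotiate(proposal, proposal_success, replan):
    own_plan, own_success = replan(proposal)   # contextual replanning
    if own_success > proposal_success:
        return own_plan                        # counterproposal
    return proposal                            # proposal accepted
```
</p>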
        <p>[Figure 3: The negotiation process. An agent makes a plan to reach a goal event. If the plan does not include another agent, it is accepted and sensorimotor planning starts. Otherwise, the plan is proposed to the involved agent, which performs contextual replanning (application of own knowledge, schema degeneralization, plan probability comparison) and either accepts the plan or makes a counterproposal; in both cases, the agents then start sensorimotor planning.]</p>
        <p>Videos showcasing the evaluation scenarios are available
online (Scenario 1: https://youtu.be/0zle8L6H- 4; Scenario 2:
https://youtu.be/WzOg WcNDik). An additional scenario showing the negotiation
process is also available (https://youtu.be/7RV4QCwDK8U), but it is not
included in this paper because it is not the main focus here.</p>
      </sec>
      <sec id="sec-1-4">
        <title>A. Toad Transports Mario</title>
        <p>The first scenario is shown in Figure 5. In the initial scene
(top left picture), the agent ‘Mario’ stands on the left, below
an object named ‘simple block’ while the agent ‘Toad’ stands
close to Mario to the right side. Neither Mario nor Toad has
gathered schematic knowledge about their environment so far.
Mario is instructed to jump and learns that if he is in his ‘large’
health state and collides with a simple block from the bottom,
the block will be destroyed. Next, he is ordered to jump to
the right – essentially onto the top of Toad – resulting in Toad
carrying Mario and the learning of the option to ‘mount’ Toad
and thus be carried around. As Mario is instructed to jump to
the right again, he also learns how to dismount Toad. Figure 4
shows a graph of Mario’s schematic knowledge at this point.</p>
        <p>[Figure 4: Mario's schematic knowledge graph at this point, comprising three rules: (1) precondition Health: Large, actor Mario, target simple block, interaction collision from below with simple block, P = 1.0, effect DESTRUCTION of simple block; (2) actor Mario, target Toad, interaction collision from above with Toad, P = 0.6, effect MOUNT the agent Toad; (3) interaction collision from left with Toad, P = 0.6, effect DISMOUNT the agent Toad.]</p>
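<p>The three rules from Figure 4 can equivalently be written down as plain records; the field names below are our own encoding, and the empty preconditions of the mount and dismount rules are an assumption where the figure lists none.

```python
# Mario's schematic knowledge after the training phase (cf. Figure 4),
# written as plain records; field names are our own encoding.
mario_schemas = [
    {"precondition": {"health": "large"},
     "actor": "Mario", "target": "SimpleBlock",
     "interaction": "collision_from_below",
     "probability": 1.0, "effect": "DESTRUCTION of simple block"},
    {"precondition": {},               # assumed empty
     "actor": "Mario", "target": "Toad",
     "interaction": "collision_from_above",
     "probability": 0.6, "effect": "MOUNT the agent Toad"},
    {"precondition": {},               # assumed empty
     "actor": "Mario", "target": "Toad",
     "interaction": "collision_from_left",
     "probability": 0.6, "effect": "DISMOUNT the agent Toad"},
]
```
</p>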
        <p>Equipped with this knowledge, Mario is ordered by voice
input to ‘destroy a simple block’. This sets as goal effect the
destruction of a simple block object which activates planning
in the schematic knowledge space. As can be seen in Figure 5,
the only simple block is located at the top right in the current
context. In this implemented scenario, Toad is able to jump
higher than Mario, such that he can jump to the elevation,
while Mario is not able to do so. Thus, a direct interaction
with the simple block is not possible for Mario as it is not in
Mario’s current scope of action.</p>
        <p>The schematic planning is thus forced to consider other
previously experienced interactions in the context of the current
situation. We assume that all agents have full knowledge about
the sensorimotor abilities of the others. Thus, inferring that it
will expand his scope of action, Mario simulates jumping on
the back of Toad, followed by Toad transporting Mario to the
elevated location on the right. Because the combined height of
Mario and Toad is too tall to pass through the narrow passage
where the simple block is located, a dismount interaction is
simulated subsequently. Finally, Mario is able to destroy the
simple block since it is now in his scope of action.</p>
        <p>This interaction plan is then negotiated between the two
agents before they start sensorimotor planning. As Toad
observed Mario and thus learned the same knowledge entries, he
infers the same schematic plan, considers the proposal useful, and accepts it.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>III. EVALUATION</title>
      <p>Scenario 1: https://youtu.be/0zle8L6H- 4; Scenario 2: https://youtu.be/WzOg WcNDik; Additional Scenario: https://youtu.be/7RV4QCwDK8U</p>
      <p>We evaluated the resulting cooperative capabilities of
SEMLINCS by creating exemplar scenarios in the Super Mario
world, which illustrate the cooperative abilities of the agents.
We show two particular, illustrative evaluations. However,
we have evaluated SEMLINCS in various, similar scenarios
and have observed the unfolding of similarly well-coordinated behavior.</p>
      <p>After the agreement, both agents plan
their part of the interaction sequence in terms of keystrokes
(top right picture) and wait for the other agent to execute its
part when necessary. The resulting execution of the plan is
shown in the following pictures: Mario mounting Toad; Toad
transporting Mario to the elevated ground; Mario dismounting
Toad and finally Mario moving to the simple block and
destroying it.</p>
      <sec id="sec-2-1">
        <title>B. Mario Clears a Path for Toad</title>
        <p>In the second scenario, shown in Figure 6, Toad is at first
instructed to collect the coin object, while Mario is ordered
to destroy the simple block (see top left picture). We assume
that Toad is not able to destroy a simple block by himself,
and that he does not generalize from observation that he could do so as well. Toad is
instructed to increase his number of coins (top right picture).
Although he knows that a collision with a coin will yield the
desired effect, there is no coin inside his scope of action, since
the only coin in the scene is blocked by a simple block. Thus,
the schematic planning module anticipates a destruction of
the simple block by Mario (bottom left picture), expanding
Toad’s scope of action. After that, Toad is able to collect the
coin (bottom right picture).</p>
        <p>Both shown scenarios demonstrate how SEMLINCS agents
are able to learn about each other, include each other in their
action plans by recognizing individual scopes of action in an
environmental context, and coordinate the joint execution of
the plans. Communicating cooperative goals to the
participating agents establishes a common ground, consisting of the
final goal an agent wants to achieve as well as the interactions
it plans to execute while pursuing the final goal.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>IV. CONCLUSION</title>
      <p>
        Humans are able to understand other agents as individual,
intentional agents, who have their own knowledge, beliefs,
perspectives, abilities, motivations, intentions, and thus their
own mind [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]–[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Furthermore, we are able to cooperate
with others highly flexibly and context-dependently, which
requires coordination. This coordination can be supported by
communication, helping to establish a common ground about
a joint interaction goal.
      </p>
      <p>In the presented work, we showed how social cooperative
skills can be realized in artificial agents. To do so, we equipped
the agents with different behavioral skills, such that particular
goals could only be reached with the help of another agent.
To coordinate a required joint action, SEMLINCS had to
enable agents to learn about the capabilities of other agents by
observing other agent-environment interactions and to assign
the learned event schema rules to particular agents. Moreover,
our implementation shows how procedural rules can be applied
to a local, environmental context, and how sensorimotor and
more abstract schematic forward simulations can be
distinguished in this process, and applied to build an effective,
hierarchical planning structure. Besides the computational insights
into the necessary system enhancements, our implementation
opens new opportunities for future developments towards even
more social, cooperative, artificial cognitive systems.</p>
      <p>
        First of all, currently the agents always cooperate. A
conditional cooperation could be based on the creation of an
incentive for an agent to share its reward with the participating
partner agent. Indeed, it has been shown that a sense of fairness,
in terms of sharing rewards when team play was necessary, is
a uniquely human ability [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. While a sense of fairness is a
motivation to share when help was provided – or also possibly
when future help is expected, that is, expecting that the partner
will return the favor – a longer-term motivation can create
social bonds by monitoring social interactions with partners
over time and preferring interactions and cooperations with
those partners that have shared rewards in the past in a fair
manner. Clearly, many factors determine whether one is willing to
cooperate, including social factors, game theory factors, and
related aspects – all of which take the expected own effort
into account, the expected effort of the cooperating other(s),
as well as the expected personal gain and the gain for the
others.
      </p>
      <p>
        It also needs to be noted that currently action plans are
executed in a strict, sequential manner. In the real world,
however, joint actions are typically executed concurrently,
such as when preparing dinner together [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Thus, in the
near future we will face the challenge of allowing the parallel
execution of cooperative interactions, which will make the
timing much more critical in parts.
      </p>
      <p>
        Although our agents already communicate plans on an
abstract, schematic level, at the moment all sequential steps of the plans need
to be fully verbalized in order to coordinate a joint action. An alternative would be to simply utter the
goal and ask for help, thus expecting the other agent to help
under consideration of the known behavioral abilities of the
individual agent. Therefore, more elaborate theories of mind
would need to be taken into consideration [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. For example,
in the first scenario mentioned above, Toad may realize that
he needs to transport Mario to the higher ground on the
right to enable Mario to destroy the box up there, because
Mario cannot reach this area. Humans are clearly able to
utter or even only manually signal a current goal and still
come up with a joint plan, without verbally communicating
the plan in detail. While verbal communication certainly helps
in the coordination process, obvious interactions can also
unfold successfully without communication (e.g., letting another
pedestrian pass, or handing over an object that is out of reach of another
person who apparently needs it). Although the Mario world
is rather simple, cooperative interactions of this kind could
actually be enabled when enhancing the current SEMLINCS
architecture with the option to simulate potential goals of the
other agent and plans on how to reach them, thus offering a
helping hand wherever it seems necessary.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mateas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Preuss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Spronck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , “
          <article-title>Artificial and Computational Intelligence in Games (Dagstuhl Seminar 12191)</article-title>
          ,”
          <source>Dagstuhl Reports</source>
          , vol.
          <volume>2</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>70</lpage>
          ,
          <year>2012</year>
          . [Online]. Available: http://drops.dagstuhl.de/opus/volltexte/2012/3651
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Togelius</surname>
          </string-name>
          , “
          <article-title>A panorama of artificial and computational intelligence in games</article-title>
          ,”
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          , vol. PP, no.
          <issue>99</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. W.</given-names>
            <surname>Barsalou</surname>
          </string-name>
          , “
          <article-title>Grounded cognition</article-title>
          ,”
          <source>Annual Review of Psychology</source>
          , vol.
          <volume>59</volume>
          , pp.
          <fpage>617</fpage>
          -
          <lpage>645</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          ,
          <article-title>Vorhersage und Erkenntnis: Die Funktion von Antizipationen in der menschlichen Verhaltenssteuerung und Wahrnehmung [Anticipation and cognition: The function of anticipations in human behavioral control and perception]</article-title>
          . Göttingen, Germany: Hogrefe,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sigaud</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gérard</surname>
          </string-name>
          , “
          <article-title>Internal models and anticipations in adaptive learning systems</article-title>
          ,” in
          <source>Anticipatory Behavior in Adaptive Learning Systems: Foundations, Theories, and Systems</source>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sigaud</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gérard</surname>
          </string-name>
          , Eds. Berlin Heidelberg: Springer-Verlag,
          <year>2003</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          , “
          <article-title>How and why the brain lays the foundations for a conscious self</article-title>
          ,”
          <source>Constructivist Foundations</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Friston</surname>
          </string-name>
          , “
          <article-title>Learning and inference in the brain</article-title>
          ,”
          <source>Neural Networks</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1325</fpage>
          -
          <lpage>1352</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Friston</surname>
          </string-name>
          , “
          <article-title>The free-energy principle: a rough guide to the brain?</article-title>
          ”
          <source>Trends in Cognitive Sciences</source>
          , vol.
          <volume>13</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>293</fpage>
          -
          <lpage>301</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Clark</surname>
          </string-name>
          , “
          <article-title>Whatever next? Predictive brains, situated agents, and the future of cognitive science</article-title>
          ,”
          <source>Behavioral and Brain Sciences</source>
          , vol.
          <volume>36</volume>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>253</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zacks</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Tversky</surname>
          </string-name>
          , “
          <article-title>Event structure in perception and conception</article-title>
          ,”
          <source>Psychological Bulletin</source>
          , vol.
          <volume>127</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Zacks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Speer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Swallow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Braver</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          , “
          <article-title>Event perception: A mind-brain perspective</article-title>
          ,”
          <source>Psychological Bulletin</source>
          , vol.
          <volume>133</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>293</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hommel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Müsseler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Aschersleben</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Prinz</surname>
          </string-name>
          , “
          <article-title>The theory of event coding (TEC): A framework for perception and action planning</article-title>
          ,”
          <source>Behavioral and Brain Sciences</source>
          , vol.
          <volume>24</volume>
          , pp.
          <fpage>849</fpage>
          -
          <lpage>878</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Prinz</surname>
          </string-name>
          , “
          <article-title>A common coding approach to perception and action</article-title>
          ,” in
          <source>Relationships between perception and action</source>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Prinz</surname>
          </string-name>
          , Eds. Berlin Heidelberg: Springer-Verlag,
          <year>1990</year>
          , pp.
          <fpage>167</fpage>
          -
          <lpage>201</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Newell</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <article-title>Human problem solving</article-title>
          . Englewood Cliffs, NJ: Prentice-Hall,
          <year>1972</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoffmann</surname>
          </string-name>
          , “
          <article-title>Anticipations control behavior: Animal behavior in an anticipatory learning classifier system</article-title>
          ,”
          <source>Adaptive Behavior</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>96</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Dominey</surname>
          </string-name>
          , “
          <article-title>Recurrent temporal networks and language acquisition: from corticostriatal neurophysiology to reservoir computing</article-title>
          ,”
          <source>Frontiers in Psychology</source>
          , vol.
          <volume>4</volume>
          , p.
          <fpage>500</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Pastra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Aloimonos</surname>
          </string-name>
          , “
          <article-title>The minimalist grammar of action</article-title>
          ,”
          <source>Philosophical Transactions of the Royal Society B: Biological Sciences</source>
          , vol.
          <volume>367</volume>
          , pp.
          <fpage>103</fpage>
          -
          <lpage>117</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Wörgötter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agostini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Krüger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shylo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Porr</surname>
          </string-name>
          , “
          <article-title>Cognitive agents - a procedural perspective relying on the predictability of object-action-complexes (OACs)</article-title>
          ,”
          <source>Robotics and Autonomous Systems</source>
          , vol.
          <volume>57</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>420</fpage>
          -
          <lpage>432</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schrodt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kneissler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ehrenfeld</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          , “
          <article-title>Mario becomes cognitive</article-title>
          ,”
          <source>TOPICS in Cognitive Science</source>
          , in press.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          , “
          <article-title>Towards a unified sub-symbolic computational theory of cognition</article-title>
          ,”
          <source>Frontiers in Psychology</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>925</issue>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Butz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Swarup</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          , “
          <article-title>Effective online detection of task-independent landmarks</article-title>
          ,” in
          <source>Online Proceedings for the ICML'04 Workshop on Predictive Representations of World Knowledge</source>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          , Eds. online,
          <year>2004</year>
          , p.
          <fpage>10</fpage>
          . [Online]. Available: http://homepage.mac.com/rssutton/ICMLWorkshop.html
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>E.</given-names>
            <surname>von Holst</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Mittelstaedt</surname>
          </string-name>
          , “
          <article-title>Das Reafferenzprinzip (Wechselwirkungen zwischen Zentralnervensystem und Peripherie) [The reafference principle (Interactions between the central nervous system and the periphery)]</article-title>
          ,”
          <source>Naturwissenschaften</source>
          , vol.
          <volume>37</volume>
          , pp.
          <fpage>464</fpage>
          -
          <lpage>476</lpage>
          ,
          <year>1950</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomasello</surname>
          </string-name>
          ,
          <article-title>A Natural History of Human Thinking</article-title>
          . Harvard University Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Buckner</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Carroll</surname>
          </string-name>
          , “
          <article-title>Self-projection and the brain</article-title>
          ,”
          <source>Trends in Cognitive Sciences</source>
          , vol.
          <volume>11</volume>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>57</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sebanz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bekkering</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Knoblich</surname>
          </string-name>
          , “
          <article-title>Joint action: Bodies and minds moving together</article-title>
          ,”
          <source>Trends in Cognitive Sciences</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>76</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomasello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carpenter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Call</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Behne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Moll</surname>
          </string-name>
          , “
          <article-title>Understanding and sharing intentions: The origins of cultural cognition</article-title>
          ,”
          <source>Behavioral and Brain Sciences</source>
          , vol.
          <volume>28</volume>
          , pp.
          <fpage>675</fpage>
          -
          <lpage>691</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hamann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Warneken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tomasello</surname>
          </string-name>
          , “
          <article-title>Collaboration encourages equal sharing in children but not in chimpanzees</article-title>
          ,
          <source>” Nature</source>
          , vol.
          <volume>476</volume>
          , no.
          <issue>7360</issue>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>331</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>C.</given-names>
            <surname>Frith</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Frith</surname>
          </string-name>
          , “
          <article-title>Theory of mind</article-title>
          ,”
          <source>Current Biology</source>
          , vol.
          <volume>15</volume>
          , no.
          <issue>17</issue>
          , pp.
          <fpage>R644</fpage>
          -
          <lpage>R645</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>