<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Conceptualization and Implementation of a Reinforcement Learning Approach Using a Case-Based Reasoning Agent in a FPS Scenario</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcel Kolbe</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Reuss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakob Michael Schoenborn</string-name>
          <email>schoenborng@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Klaus-Dieter Althoff</string-name>
          <email>kalthoff@dfki.uni-kl.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>German Research Center for Artificial Intelligence (DFKI)</institution>
          <addr-line>Trippstadter Str. 122, 67663 Kaiserslautern</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Hildesheim Samelsonplatz 1 31141 Hildesheim</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes an approach that combines case-based reasoning (CBR) and reinforcement learning (RL) in the context of a first person shooter (FPS) game in the deathmatch game mode. Based on an engine written in C# and Unity and a simple rule-based agent, we propose an FPS agent that uses a combination of case-based reasoning and reinforcement learning to improve the overall performance. The reward function is based on learned sequences of performed small plans and considers the current win chance in a given situation. We describe the implementation of the reinforcement learning algorithm and the evaluation performed using different starting case bases.</p>
      </abstract>
      <kwd-group>
        <kwd>Case-Based Reasoning</kwd>
        <kwd>Reinforcement Learning</kwd>
<kwd>First Person Shooter</kwd>
        <kwd>Multi-Agent System</kwd>
        <kwd>Planning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and motivation</title>
<p>To prove the functionality of an artificial intelligence, multi-agent systems have become a popular application area. We take a look at the first person shooter (FPS) domain, a sub-genre of action video games. In a typical FPS game, two teams of (typically human) players try to overcome the opposing team, either by eliminating each member of the opposing team or by successfully completing another objective, such as planting a bomb at a certain place or preventing the opposing team from doing so. While both teams are actively playing at the same time, most FPS games are limited by a round time of approximately five minutes and a limited map size. Each individual FPS game differentiates itself from other games by certain unique characteristics, for example, the extent of realism. However, an FPS game can be generalized as follows: the human player takes over the sight of an agent, as if he or she were placed inside the game. To emphasize this, environmental sounds, for instance, the sound of a step when moving through the terrain, are implemented to increase the immersion into the game. An agent starts with a certain level of health points (HP, usually 100) and with a basic pistol equipped, having a limited amount of ammunition. The goal is to find and eliminate the opposing agent. Each round, credits are earned based on the outcome of the last round to buy better weapons or supplies like armor and health packs to increase the current health points. This domain is not limited to one-versus-one fights but is usually applied to five-versus-five combat. This increases the complexity considerably, since factors like communication and planning between the agents have to be considered: while the game is running, the current state changes every millisecond, the enemies' positions are usually unknown, and predictions of the enemies' next possible steps have to be created. Since it is a very advantageous situation to see the enemy before the enemy sees you, one of the most common tactics to gain this advantage is to "camp", that is, waiting and hiding for an indefinite amount of time (if not limited by a certain objective, e.g., planting a bomb within a limited time frame) at a point of interest until the enemy passes by. However, especially in the currently most played FPS games like Fortnite or PlayerUnknown's Battlegrounds, the so-called "Battle Royale" mode has gained increasing popularity. This mode is usually a free-for-all mode, meaning everyone fights for himself against every other agent (although most of these games also incorporate a variant where teaming is allowed). In this mode, after a certain amount of time, the size of the level shrinks until no place is left to hide and the last two surviving agents have to engage each other. The longer one can avoid fights and look for equipment, the better. Thus, an AI agent would need to plan accordingly when a rather defensive behavior has to be switched to a comparably aggressive behavior. Since the spawn locations and spawn timings of this equipment differ each round, the player is able to experience a new situation in each game. This is where our research adds on. (Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).)</p>
<p>While CBR and RL have been used in different game domains (see Section 2), the use of CBR and RL in an FPS game in the deathmatch game mode, either one-on-one or team-based, is new and involves different strategies, tactics and challenges than game modes like conquest or capture the flag. Another goal of our research is to apply the CBR and RL approach to FPS in the deathmatch game mode and to evaluate the progress of our agents during the games.</p>
      <p>
As we will later see in more detail in section 3.1, one might think of one round of an FPS game as one case which can be stored and retrieved. We went to a more detailed level and used the current perception of the agent as the situation and mapped a corresponding action as the solution to that situation. In our first version, we used 17 attributes, such as currentAmmunition and distanceToEnemy, among many others, and 15 initial cases on an FPS game designed in Unity [
        <xref ref-type="bibr" rid="ref9">9</xref>
].
However, the initial problem space was not large enough, leading to a unique overall dominant tactic of always looking to pick up the weapon upgrade and therefore nullifying the benefits of CBR. While we increased the complexity of the level itself, we ask the research question whether we can improve the retrieval of the best cases by using reinforcement learning, to prevent the agent from being stuck in a corner, not changing his situation and thus not retrieving another case to get himself out of the corner.
      </p>
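<p>The situation-to-action mapping described above can be illustrated with a toy retrieval over numeric attribute vectors. The attribute names currentAmmunition, distanceToEnemy and ownHealth follow the paper; the similarity measure, the example cases and their values are our own invention for illustration and are not the prototype's actual myCBR model.</p>

```java
import java.util.List;

// Toy CBR retrieval: a case maps a situation (numeric attribute vector)
// to an action; retrieval returns the action of the most similar case.
public class ToyRetrieval {
    record Case(double[] situation, String action) {}

    // similarity = 1 / (1 + Euclidean distance); invented for illustration
    static double sim(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return 1.0 / (1.0 + Math.sqrt(d));
    }

    static String retrieve(List<Case> caseBase, double[] query) {
        Case best = null;
        double bestSim = -1;
        for (Case c : caseBase) {
            double s = sim(c.situation(), query);
            if (s > bestSim) { bestSim = s; best = c; }
        }
        return best.action();
    }

    public static void main(String[] args) {
        // attribute order: [currentAmmunition, distanceToEnemy, ownHealth]
        List<Case> cb = List.of(
            new Case(new double[]{30, 5, 90}, "MoveTo;Shoot"),
            new Case(new double[]{0, 40, 15}, "CollectItem<health>"));
        // low ammunition, far from enemy, low health: the healing case wins
        System.out.println(retrieve(cb, new double[]{2, 35, 20}));
    }
}
```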
      <p>
For our own testing purposes, we used an FPS game developed by Jannis Hillmann as part of his master's thesis, using a combination of Unity, C# and Java [
        <xref ref-type="bibr" rid="ref5">5</xref>
]. Fig. 1 shows an example of a CBR agent (purple) fighting a rule-based AI (orange). This platform is generally intended to also support other domains, for example, real-time strategy games and economic simulations, to name a few.
      </p>
      <p>
To enable another learning component besides case-based reasoning for our agent, we propose to use reinforcement learning (RL). Using the definition from Sutton, RL enables the derivation of actions from situations. Situations are mapped to a goal such that actions are connected with certain rewards or punishments (negative rewards) [
        <xref ref-type="bibr" rid="ref10">10</xref>
]. In each situation, the agent has to perceive the current state of the environment he is situated in. One of the advantages of reinforcement learning is that not only the current situation but also near-future situations can be considered. Combining the lessons learned from failed experiences with the positive experiences from properly rewarded situations leads to the most important aspect of RL and consequently to the creation of an intelligent plan within a reasonable time frame.
      </p>
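<p>Sutton's mapping of situations to actions via rewards can be sketched with a minimal tabular Q-learning update. This is a generic illustration of the RL principle, not the paper's own reward function: the state and action strings, the learning rate and the discount factor are illustrative assumptions.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Minimal tabular Q-learning sketch: states and actions are strings,
// Q(s, a) is updated from an observed reward and the best next-state value.
public class QTable {
    private final Map<String, Double> q = new HashMap<>();
    private final double alpha = 0.1;  // learning rate (illustrative)
    private final double gamma = 0.9;  // discount factor (illustrative)

    private String key(String state, String action) { return state + "|" + action; }

    public double get(String state, String action) {
        return q.getOrDefault(key(state, action), 0.0);
    }

    // Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    public void update(String s, String a, double reward, String sNext, String[] actions) {
        double best = 0.0;
        for (String an : actions) best = Math.max(best, get(sNext, an));
        double old = get(s, a);
        q.put(key(s, a), old + alpha * (reward + gamma * best - old));
    }

    public static void main(String[] args) {
        QTable t = new QTable();
        String[] actions = {"MoveTo;Shoot", "CollectItem<health>"};
        // negative reward: the agent died while attacking at critical health
        t.update("ownHealth=critical,enemyVisible", "MoveTo;Shoot", -1.0,
                 "dead", actions);
        System.out.println(t.get("ownHealth=critical,enemyVisible", "MoveTo;Shoot"));
    }
}
```

<p>Repeating such updates over many lifespans lowers the value of actions that led to deaths and raises the value of properly rewarded ones, which is the near-future consideration described above.</p>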
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
RL has been used in gaming AI before, mainly in the context of real-time strategy (RTS) games. One approach was developed by Wender and Watson for StarCraft II [
        <xref ref-type="bibr" rid="ref14">14</xref>
]. StarCraft II, as an RTS game, has almost the same requirements and conditions as an FPS game, e.g., real-time decision making with incomplete information, but with more focus on macro- and micromanagement. Thus, the domains are to some extent interchangeable, and approaches can be transferred with the necessary adjustments, since the focus of the two genres differs. Using reinforcement learning in combination with case-based reasoning, we want to show that a learning agent can consistently defeat a rather static rule-based agent after enough experience has been collected, and thus that the approach is beneficial. According to Wender and Watson, the use of different forms of CBR and RL is one of the most common ways of working in AI research, regardless of whether the methods work together or act separately; the two approaches support each other's respective problem areas [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
      </p>
      <p>
There are many other approaches that combine CBR and RL in RTS games.
The approach of [
        <xref ref-type="bibr" rid="ref13">13</xref>
] combines goal-driven autonomy with RL to coordinate the learning efforts of multiple agents in an RTS game by using a shared reward function. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
] a generalized reward function is used to apply several RL algorithms to an RTS game, while [
        <xref ref-type="bibr" rid="ref12">12</xref>
] proposes two RL algorithms based on neural networks to optimize the build order in StarCraft II. In recent years, a new approach called deep RL has been developed and used in several research works such as [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. All these approaches use CBR and RL to improve the micro- and/or macromanagement in RTS games, but RL can also be used in turn-based games [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and in FPS games. The work of Auslander and his colleagues [
        <xref ref-type="bibr" rid="ref2">2</xref>
] deals with the use of RL in the context of the first person shooter Unreal Tournament, developed with the Unreal Engine. Their work is based on the previous successes of various knowledge-based learning approaches that have relied on the mechanics and dynamics of games, e.g., StarCraft or Backgammon. Instead of learning "one-on-one" confrontations, the work describes group-oriented behaviors and strategies based on the combination of RL and the CBR approach for the game Unreal Tournament. This is a challenge for Unreal Tournament, as different game modes with different goals are supported: on the one hand a team-based deathmatch, on the other hand a game type called conquest, in which the killing of opponents is secondary and the ultimate goal is to defend a specific point on the map. According to [
        <xref ref-type="bibr" rid="ref2">2</xref>
], the game mode conquest is particularly interesting for the development of cooperative strategies in the context of a team-oriented approach. Furthermore, it offers a good opportunity to observe the learning behavior of the team.
      </p>
      <p>
Another work in the context of an FPS game is presented by [
        <xref ref-type="bibr" rid="ref8">8</xref>
] for the Minecraft game. The goal of this game is not to fight against each other, as in Unreal Tournament or in our approach, but to harvest resources and build structures. The approach combines hierarchical task networks and RL algorithms to adapt the plans of agents acting in the dynamic world. A deep RL approach for the FPS game Doom was developed by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Architecture and structural dependencies</title>
<p>Every agent and the environment are built in Unity 3D and thus implemented as a C# project. We use three different agents: Player Agent, Planning Agent and Communication Agent. The Player Agent uses an inherited method called update(), which updates the agent's perception, i.e., the information the agent receives through its sensors or proposed plans of the Planning Agent. As the update() method triggers multiple times per second, this rate has to be evaluated considering the fairness towards a human enemy, who can only process a limited amount of perceptions. With each update() cycle, the agent sends a request to the Communication Agent. This agent is connected with the myCBR component, which uses Java as a programming language; thus, a corresponding interface communicating via TCP/IP has been established. Each time the Communication Agent receives a request, it formulates a request to the CBR system to retrieve the most similar case to the current situation. Once the most similar case has been retrieved, the proposed solution (which usually results in a proposed action) is sent to the Planning Agent. This agent evaluates the proposed action from the Communication Agent and forms a plan, which is sent to the Player Agent and followed until a new plan is proposed. As a simple example, the Player Agent perceives that he is low on health (e.g., &lt; 20 % hit points (HP)), so this perception is transferred to the Communication Agent. This agent retrieves as the most similar case that picking up a piece of pizza regains most of the lost hit points. Thus, the Planning Agent formulates a plan to find and pick up a piece of pizza. This plan is followed until the pizza has been picked up, until the agent dies, or until another plan gains a higher priority (e.g., self-defence or perceiving another, better healing item). The structure built up as described can be seen in Fig. 2:</p>
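<p>The request flow between the three agents can be sketched as follows. The interfaces, message strings and the stub CBR system here are our own simplification; in the actual prototype, the Communication Agent talks to the Java-based myCBR component over TCP/IP rather than through an in-process interface.</p>

```java
// Simplified three-agent pipeline: the Player Agent's perception is sent to
// the Communication Agent, which queries the CBR system for the most similar
// case; the proposed action is then turned into a plan by the Planning Agent.
public class AgentPipeline {
    interface CbrSystem { String retrieveMostSimilar(String perception); }

    static class CommunicationAgent {
        private final CbrSystem cbr; // stands in for the TCP/IP link to myCBR
        CommunicationAgent(CbrSystem cbr) { this.cbr = cbr; }
        String request(String perception) { return cbr.retrieveMostSimilar(perception); }
    }

    static class PlanningAgent {
        // forms a plan from a proposed action; real plan formation is richer
        String formPlan(String proposedAction) { return "PLAN:" + proposedAction; }
    }

    public static void main(String[] args) {
        // stub CBR system: critical health retrieves the "pick up pizza" case
        CbrSystem stub = p -> p.contains("ownHealth=critical")
                ? "CollectItem<pizza>" : "MoveTo;Shoot";
        CommunicationAgent comm = new CommunicationAgent(stub);
        PlanningAgent planner = new PlanningAgent();
        String action = comm.request("ownHealth=critical;enemyVisible=false");
        System.out.println(planner.formPlan(action)); // PLAN:CollectItem<pizza>
    }
}
```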
<p><bold>Using reinforcement learning in addition to case-based reasoning.</bold> Whenever the first prototype of our CBR agent was able to defeat the enemy, the agent tried to identify the next action, e.g., collecting items. However, based on the retrieval from a flat hierarchical case base (and thus searching through the whole case base), the agent frequently got stuck on invalid cases, e.g., "MoveTo;Shoot" while no enemy is alive, or "CollectItem&lt;ammunition&gt;" while the weapon is reloaded and full of ammunition. While the agent was elaborating over these invalid actions, the enemy re-spawned, and killing the enemy regained the highest relevance. As a result of this behavior, we added a reinforcement learning component to evaluate positive and negative agent behaviors. As a side note, it might also be feasible to evaluate the retrieval process and to take a deeper look at the cases themselves. To prune the considered case base, we take a deeper look at the most valuable attributes when deciding which action to take next:</p>
      <p>1. isEnemyVisible &amp; isEnemyAlive: Boolean. Whenever the enemy is visible, cases with combat actions should be preferred over planning cases. If it is known that the enemy is not alive, the procurement of items takes much higher precedence.</p>
      <p>2. distanceToEnemy: Symbol with the values "near, middle, far, unknown". The distance to the enemy is unknown if the enemy is invisible (although one might be able to extrapolate the estimated distance based on the last time the enemy was seen). Since accuracy has not yet been modeled in the prototype, this attribute only takes on increasing relevance if the enemy and the agent both want to pick up the same item at the same time.</p>
      <p>3. ownHealth: Symbol with the values "critical, few, middle, much, full". This is the most important attribute. The hit points of an agent are perceived as an integer, but we introduce categories of HP to simplify the agent's behavior reasoning. Every plan should evaluate this value when considered for execution.</p>
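<p>The mapping from the perceived integer hit points to the symbolic ownHealth categories could look like the sketch below. The five category names are taken from the paper; the numeric thresholds are our own assumption, since the paper does not specify them.</p>

```java
// Maps an integer HP value (0-100) to the symbolic ownHealth categories
// used in the case representation. Thresholds are illustrative assumptions.
public class HealthCategory {
    static String ownHealth(int hp) {
        if (hp < 10)  return "critical";
        if (hp < 35)  return "few";
        if (hp < 65)  return "middle";
        if (hp < 100) return "much";
        return "full";
    }

    public static void main(String[] args) {
        System.out.println(ownHealth(18));  // few
        System.out.println(ownHealth(100)); // full
    }
}
```

<p>Such a coarse discretization keeps the case base small: two perceptions with 17 and 19 HP map to the same symbolic situation and therefore retrieve the same case.</p>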
<p>Example: After killing the enemy, the agent is left with 50 % HP. A health container is 7 yards away, while a weapon upgrade is 9 yards away (in the same direction). The agent plans to pick up the health container and re-evaluates the new situation first, e.g., checking the visibility of the enemy, before picking up the weapon upgrade. To evaluate the current situation, we define the following probability to win, win_sit ∈ [0, 1], based on the current situation:</p>
      <p>win_sit = (Health_CBR − Health_AI) · w_Health + (Weapon_CBR − Weapon_AI) · w_Weapon + (Ammunition_CBR − Ammunition_AI) · w_Ammunition</p>
      <sec id="sec-3-3">
        <title>Evaluating the current win chance</title>
<p>The corresponding weights w_i with Σ w_i = 1 are implemented statically for testing purposes; in future work, these parameters should ultimately be learned and optimized during the revise step of the CBR cycle (e.g., it might be acceptable to follow an aggressive behavior despite having low HP when holding a very good weapon). We categorize the result of the calculation into three areas; the sizes of the areas shall be learned during the games but are initialized with the following values:</p>
        <p>– Rejection area: [0, 0.4]. The agent perceives that he is in a disadvantageous situation (e.g., because of low health or a less effective weapon). Usually, cases with passive behavior, such as searching for and collecting health containers, are selected.</p>
        <p>– Risk area: (0.4, 0.6). This is the most flexible area of the calculation. Should certain behaviors fail multiple times, the corresponding thresholds for dynamic case manipulation are adjusted according to the perceived result (reinforcement learning).</p>
        <p>– Acceptance area: [0.6, 1]. The agent perceives that he is in an advantageous situation and thus performs aggressive behavior, e.g., actively searching for the enemy.</p>
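<p>The win-chance calculation and the three areas can be sketched as follows. The weight values, the normalization of the inputs to [0, 1] and the shift of the weighted differences into [0, 1] are our own assumptions for illustration; the paper only fixes the formula's structure and the area boundaries.</p>

```java
// Computes win_sit from the differences between the CBR agent's and the
// opponent's health, weapon and ammunition values, and classifies the result
// into the rejection / risk / acceptance areas.
public class WinChance {
    // illustrative static weights summing to 1
    static final double W_HEALTH = 0.5, W_WEAPON = 0.3, W_AMMUNITION = 0.2;

    // inputs are assumed normalized to [0, 1]; the result is kept in [0, 1]
    static double winSit(double healthCbr, double healthAi,
                         double weaponCbr, double weaponAi,
                         double ammoCbr, double ammoAi) {
        double v = (healthCbr - healthAi) * W_HEALTH
                 + (weaponCbr - weaponAi) * W_WEAPON
                 + (ammoCbr - ammoAi) * W_AMMUNITION;
        // shift the difference range [-1, 1] into [0, 1] (assumption)
        return Math.max(0.0, Math.min(1.0, 0.5 + v / 2.0));
    }

    static String area(double winSit) {
        if (winSit <= 0.4) return "rejection";
        if (winSit < 0.6)  return "risk";
        return "acceptance";
    }

    public static void main(String[] args) {
        // clear health advantage and slightly more ammunition
        double w = winSit(0.9, 0.3, 0.5, 0.5, 0.6, 0.4);
        System.out.println(area(w)); // acceptance
    }
}
```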
<p>To evaluate the risk area using reinforcement learning, cases need to be adjustable at run-time. To implement this, we use sequences. A sequence temporarily saves every case retrieved from the CBR component during the lifespan of an agent. Whenever the agent dies, the corresponding sequence is terminated. Fig. 3 depicts one exemplary sequence where a case has been added during each step; the sequence then contains the corresponding cases. These sequences could be simplified by combining reoccurring cases, but are kept this way for illustrative purposes.</p>
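<p>A minimal sketch of such a sequence, assuming cases are identified by their action strings (the class shape and method names are our own; the paper only specifies that retrieved cases are recorded per lifespan and the sequence ends on death):</p>

```java
import java.util.ArrayList;
import java.util.List;

// Records the cases retrieved during one lifespan of the agent; the sequence
// is closed when the agent dies and can then be evaluated by the RL component.
public class CaseSequence {
    private final List<String> cases = new ArrayList<>();
    private boolean terminated = false;

    void add(String retrievedCase) {
        if (!terminated) cases.add(retrievedCase);
    }

    // called whenever the agent dies; returns the finished sequence
    List<String> terminate() {
        terminated = true;
        return List.copyOf(cases);
    }

    public static void main(String[] args) {
        CaseSequence seq = new CaseSequence();
        seq.add("CollectItem<weapon>");
        seq.add("MoveTo;Shoot");
        seq.add("MoveTo;Shoot"); // reoccurring cases could be combined
        System.out.println(seq.terminate());
    }
}
```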
        <p>
These sequences are an important part of the next topic: logging. Following the reinforcement learning idea, we want to move through a sequence (which, again, represents the whole lifespan of an agent) and evaluate the experiences that have been made. We want to save positive experiences in our case base by setting a higher weight on the initializing cases of a sequence than on the later, redundant "MoveTo;Shoot" cases, which do not necessarily contain a lot of valuable information. The same approach is used for deleting cases. To prevent an overeager deletion of cases, we only delete cases if the size of the case base is larger than 50 and the win rate of the current situation has dropped below 10 %.
As stated before, the initial situation was the prototype developed by Hillmann, who implemented a simple rule-based and a case-based agent [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ][
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The evaluation of the prototype was rather disappointing for the CBR agent: the agent lost heavily in comparison to the simple rule-based agent. There were five tests of 15 minutes each, calculating the kill-death ratio by granting +1 score whenever one agent killed the opposing agent and reducing the score by 1 whenever one was killed by the opposing agent. The CBR agent ended on average with a kill-death ratio of -16, with decreasing tendency. Observations of these matches have shown that the rule-based agent had more possession time of the machine gun, which leads to kill-streaks of five to six kills, while the CBR agent was rather hesitant to prioritize picking up the machine gun due to false similarity assessments. In addition, the limited complexity of the modeled domain also seems unfavorable for the CBR agent. Nevertheless, we wanted to show the results of our observations by adding reinforcement learning to the CBR agent:
        </p>
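<p>The guarded deletion rule described above (only delete when the case base holds more than 50 cases and the win rate of the situation has dropped below 10 %) can be sketched as a simple predicate; the method and parameter names are our own:</p>

```java
// Sketch of the guarded case deletion: a case is only removed when the case
// base is large enough and the case's situation has become clearly losing.
public class CaseDeletion {
    static boolean shouldDelete(int caseBaseSize, double situationWinRate) {
        return caseBaseSize > 50 && situationWinRate < 0.10;
    }

    public static void main(String[] args) {
        System.out.println(shouldDelete(83, 0.05)); // true: large base, losing case
        System.out.println(shouldDelete(40, 0.05)); // false: case base too small
    }
}
```

<p>Both conditions must hold, which prevents the overeager deletion mentioned above: a small case base is never pruned, and a large one only loses cases whose situations almost always end in a death.</p>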
        <p>
As Auslander et al. stated, RL approaches need time to gain a significant influence on the environment [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This led us to the decision to increase the testing time by 45 minutes and to test the reuse of the case base. Since our proposed reinforcement learning approach is based heavily on sequences (and thus on the underlying case bases), we used four different initial case bases and compared them:
        </p>
<p>– Prompt Cases: 12 cases which have been reduced to their core attributes to support a more efficient retrieval. This should lead to a faster creation of unique cases.</p>
        <p>– Kolbe Default: 10 cases handpicked by M. Kolbe, focusing on the attributes mentioned in Section 3.1.</p>
        <p>– Bartels Default: 15 cases which J.-J. Bartels chose for a similar research topic. These cases have been adjusted to fit the attribute structure of M. Kolbe.</p>
        <p>– Combination: 17 cases combined from Kolbe Default and Bartels Default. However, only suitable cases have been selected, so that there are not, for example, four initial, redundant "MoveTo;Shoot" cases.</p>
<p>Figure 4 presents the results of the first run. During the first minutes (i.e., the first kills), the performance of the case bases is similar. This result is to be expected, since both agents start the game with a simple pistol and no items are available; thus, they kill one another in turn according to the respective health advantages. Once items do spawn, the results begin to differ, with the Prompt Cases leading with a 1.25 kill/death ratio. However, after approximately 40 kills, each case base seems to settle at a certain level.</p>
<p>This becomes more apparent when looking at the later stages of the run (Fig. 5). The Prompt Cases show a promising result at the beginning of these stages, but cannot hold that level and were eliminated ten times in a row. However, this death streak could no longer be identified during later stages of the run, since the agent learned, via reinforcement learning, to pick up health packs instead of running straight into the enemy.</p>
        <p>[Fig. 5: Kill/death ratio (approx. 0.55 to 0.65) over the amount of kills (170 to 215).]</p>
<p>Since Bartels Default and the Combination show similar results to the Prompt Cases, we take a look at the resulting case bases after the test run and how they have evolved:</p>
        <p>– Prompt Cases: 83 cases after 60 minutes</p>
        <p>– Kolbe Default: 56 cases after 60 minutes</p>
        <p>– Bartels Default: 81 cases after 60 minutes</p>
        <p>We can see that Kolbe Default achieves results similar to its contenders with an approximately 30 % smaller case base. This is due to deleting cases with negative outcomes, so that the case base only contains relevant and positive experiences. The second test run, using the resulting case bases from the first run, was executed in the same way, leading to the following results, ordered descending:</p>
        <p>– Prompt Cases: K/D ratio of 0.55, 50 cases</p>
        <p>– Combination: K/D ratio of 0.54, 120 cases</p>
        <p>– Bartels Default: K/D ratio of 0.49, 50 cases</p>
<p>The results show again that the systematically created Prompt Cases using reinforcement learning deliver the best results. However, the overall result still needs improvement to reach a K/D ratio of at least 1 and thus actually win a game.</p>
        <p>
The results show that the integration of an RL approach into the system described in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] leads to an improvement in the performance of the CBR agent. On the other hand, it can be seen that the K/D ratio is still not good enough to win the game against the simple rule-based agent; the brute-force play style of the rule-based agent still outperforms the planning play style of the CBR agent. The main reason for this seems to be the low complexity of the current level and game design. With increasing complexity of the level and the support of more strategies and tactics, like an ambush or keeping a certain distance during combat, the CBR approach should be able to outperform the simple rules.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
<p>In this work, we tried to show that the addition of reinforcement learning can be an overall improvement for a case-based reasoning agent in a relatively simple first person shooter scenario. While the domain modeled here does not hold much complexity (which can be a disadvantage for CBR), the RL-CBR agent was still able to learn from the experiences it made. This is supported by the results of our evaluation, as the agent made progress towards a better kill-death ratio. Nevertheless, there is still room for optimization. With the addition of RL, the learning progress of an agent could be shown and looks promising for further investigations, especially when the domain becomes more complex, for example, by using the Unreal Tournament engine, which holds its own challenges.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goodwin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Granmo</surname>
          </string-name>
          , O.:
          <article-title>\Deep RTS: A Game Environment for Deep Reinforcement Learning in Real-Time Strategy Games"</article-title>
          .
          <source>In: IEEE Conference on Computational Intelligence and Games (CIG)</source>
          ,
Maastricht
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Auslander</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee-Urban</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogg</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
and Muñoz-Avila, H.:
          <article-title>\Recognizing the Enemy: Combining Reinforcement Learning with Strategy Selection using Case-Based Reasoning"</article-title>
          .
          <source>In: European Conference on Case-Based Reasoning</source>
          , Trier, Springer,
          <year>2008</year>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dobrovsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Borghoff</surname>
          </string-name>
          , U. M., and
          <string-name>
            <surname>Hofmann</surname>
          </string-name>
          , M.:
          <article-title>\Improving Adaptive Gameplay in Serious Games Through Interactive Deep Reinforcement Learning"</article-title>
          ,
          <source>In: Cognitive Infocommunications, Theory and Applications</source>
          , Springer International Publishing,
          <year>2019</year>
          , pp.
          <fpage>411</fpage>
          -
          <lpage>432</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Foerster</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardelli</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farquhar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afouras</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torr</surname>
            ,
            <given-names>P. H. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Whiteson</surname>
          </string-name>
          , S.: \
          <article-title>Stabilising Experience Replay for Deep Multi-agent Reinforcement Learning"</article-title>
          ,
          <source>In: Proceedings of the 34th International Conference on Machine Learning</source>
          , Volume
          <volume>70</volume>
          ,
          <year>2017</year>
          , JMLR.org, pp
          <fpage>1146</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hillmann</surname>
          </string-name>
          , J.: \
<article-title>Konzeption und Entwicklung eines Prototypen für ein lernfähiges Multi-Agenten-System mittels des fallbasierten Schließens im Szenario einer First-Person-Perspektive" (Conception and development of a prototype for a multi-agent system with learning capabilities using case-based reasoning in a first-person perspective scenario)</article-title>
          . Hildesheim, University of Hildesheim,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Chaplot</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          :
          <article-title>"Playing FPS games with deep reinforcement learning"</article-title>
          ,
          <source>In: Thirty-First AAAI Conference on Arti cial Intelligence</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ewalds</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartunov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vezhnevets</surname>
            ,
            <given-names>A. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makhzani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Kuttler, H.,
          <string-name>
            <surname>Agapiou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schrittwieser</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaffney</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petersen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Hasselt</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lillicrap</surname>
            ,
            <given-names>T. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calderone</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brunasso</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ekermo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Repp</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tsing</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>"StarCraft II: A New Challenge for Reinforcement Learning"</article-title>
          ,
          <source>In: CoRR</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Parashar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheneman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          :
          <article-title>\Adaptive Agents in Minecraft: A Hybrid Paradigm for Combining Domain Knowledge with Reinforcement Learning"</article-title>
          ,
          <source>In: Autonomous Agents and Multiagent Systems</source>
          , Springer International Publishing,
          <year>2017</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Reuss</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hillmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viefhaus</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Althoff</surname>
            ,
            <given-names>K.-D.</given-names>
          </string-name>
          :
          <article-title>"Case-Based Action Planning in a First Person Scenario Game"</article-title>
          . In: Rainer Gemulla, Simone Ponzetto, Christian Bizer, Margret Keuper, Heiner Stuckenschmidt (Publ.):
          <source>LWDA 2018 - Lernen, Wissen, Daten, Analysen - Workshop Proceedings</source>
          . GI-Workshop-Tage "Lernen, Wissen, Daten, Analysen" (LWDA-2018), August 22-24, Mannheim, Germany, University of Mannheim, 8/
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          :
          <article-title>"Introduction: The Challenge of Reinforcement Learning"</article-title>
          . In: The International Series in Engineering and Computer Science (SECS, Volume
          <volume>173</volume>
          ) - Machine Learning, Boston, Kluwer Academic Publishers,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sethy</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Padmanabhan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>"Real Time Strategy Games: A Reinforcement Learning Approach"</article-title>
          .
          <source>In: Procedia Computer Science</source>
          , Volume
          <volume>54</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>"Reinforcement Learning for Build-Order Production in StarCraft II"</article-title>
          .
          <source>In: Eighth International Conference on Information Science and Technology (ICIST)</source>
          , Cordoba,
          <year>2018</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jaidee</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz-Avila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Aha</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          :
          <article-title>"Case-Based Goal-Driven Coordination of Multiple Learning Agents"</article-title>
          .
          <source>In: Case-Based Reasoning Research and Development</source>
          , Springer, Proceedings,
          <year>2013</year>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wender</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>"Combining Case-Based Reasoning and Reinforcement Learning for Unit Navigation in Real-Time Strategy Game AI"</article-title>
          .
          <source>In: International Conference on Case-Based Reasoning (ICCBR)</source>
          , Cork, Ireland, Springer, Proceedings,
          <year>2014</year>
          , pp.
          <fpage>511</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wender</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>"Using reinforcement learning for city site selection in the turn-based strategy game Civilization IV"</article-title>
          .
          <source>In: IEEE Symposium On Computational Intelligence and Games</source>
          , Perth, WA,
          <year>2008</year>
          , pp.
          <fpage>372</fpage>
          -
          <lpage>377</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>