<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Generating Inspiration for Multi-Agent Simulation Design by Q-Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Robert</forename><surname>Junges</surname></persName>
							<email>robert.junges@oru.se</email>
							<affiliation key="aff0">
								<orgName type="department">Modeling and Simulation Research Center</orgName>
								<orgName type="institution">Örebro University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Franziska</forename><surname>Klügl</surname></persName>
							<email>franziska.klugl@oru.se</email>
							<affiliation key="aff1">
								<orgName type="department">Modeling and Simulation Research Center</orgName>
								<orgName type="institution">Örebro University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Generating Inspiration for Multi-Agent Simulation Design by Q-Learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">81254FE6B834ACF135F8841790A4E83A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:15+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>One major challenge in developing multi-agent simulations is to find an agent design that is able to generate the intended overall phenomenon or dynamics, but does not contain unnecessary details. In this paper we suggest using agent learning to support the development of an agent model: the modeler defines the environmental model and the agent interfaces. Using rewards that capture the intended agent behavior, Reinforcement Learning techniques can be used to learn the rules that optimally govern the agent behavior. However, to be really useful in a modeling and simulation context, the outcome of the learning must be reviewable and understandable by a human modeler. We propose to use additional forms of learning as a post-processing step to support the analysis of the learnt model. We test our ideas using a simple evacuation simulation scenario.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. MOTIVATION</head><p>Methodological questions are more and more in the focus of research on agent-based simulation, as the challenges in developing a good multi-agent simulation model are numerous. The central issue concerns what behaviors the agents should exhibit so that the intended outcome is generated. What particular detail must be included, and what part of the modeled behavior is not necessary? How should the parameters involved be set? If it is not fully clear from the beginning what this local behavior should be (even if the behavior of the original agents can be easily observed), the development may degenerate into a painful trial-and-error procedure. The modeler may add or remove behavioral elements, try different parameter values and test the overall outcome again and again. Such a procedure might be feasible for an experienced modeler who knows the critical starting points for modifications and is capable of using complex calibration tools for multi-agent simulation such as described in <ref type="bibr" target="#b0">[1]</ref>, but this cannot be assumed for less experienced modelers.</p><p>In this contribution we suggest solving this search for the appropriate agent-level behavior by using agent learning. The vision is the following procedure: the modeler starts by developing an environmental model as a part of the overall model, then determines what the agent might be able to perceive and to manipulate, and finally describes the intended outcome by a reward function that evaluates the agents' performance. The agents then use a learning mechanism to determine a behavior program that generates the intended overall outcome in the given environment. This strategy might also be described as a variant of an environment-driven strategy for developing multiagent simulations <ref type="bibr" target="#b1">[2]</ref>.</p><p>A major issue in this overall procedure is the selection of the particular learning agent architecture. An initial analysis of different learning techniques applicable to this problem has already been described in <ref type="bibr" target="#b2">[3]</ref>. There, Learning Classifier Systems (LCS), Feed Forward Neural Networks (FFNN) and Reinforcement Learning (Q-Learning) were evaluated with regard to learning performance and the resulting behavior representation, using the same evacuation scenario as in the following. In this contribution we further investigate Reinforcement Learning for its suitability in such a learning-driven model development process, focussing on the interpretability of the state-action mapping produced. We are not focussing on mere optimization performance, but on softer factors that define the usability of Q-Learning in the model development setting: the completeness, the complexity and the generalization capabilities of the behavior learnt.</p><p>In the next section we review existing approaches for learning agent architectures in simulation models. This is followed by a more detailed treatment of the learning-driven methodology and a presentation of the reinforcement learning architecture. In sections IV and V we describe the testbed used and the experiments conducted with it, and discuss the results. The paper ends with a conclusion and an outlook on future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. LEARNING AGENTS AND SIMULATION</head><p>Adaptive agents and multi-agent learning have been one of the major focuses within distributed artificial intelligence since its very beginning <ref type="bibr" target="#b3">[4]</ref>. Many different forms of learning have been shown to be successful when working with agents and multiagent systems. Obviously, we cannot cover all techniques for agent learning in this paper; the following paragraph gives a few general pointers and then takes a short glance at directly related work on agent learning in simulation settings. In general, our contribution is special concerning the objective of our comparison: not mere learning performance, but suitability for usage in a modeling support context.</p><p>Reinforcement learning <ref type="bibr" target="#b4">[5]</ref>, learning automata <ref type="bibr" target="#b5">[6]</ref>, and evolutionary and neural forms of learning are recurrent examples of learning techniques applied in multi-agent scenarios. Besides that, techniques inspired by biological evolution have been applied for agents in the area of Artificial Life <ref type="bibr" target="#b6">[7]</ref>, <ref type="bibr" target="#b7">[8]</ref>, where evolutionary elements can be found together with multiagent approaches. An example of a simulation of a concrete scenario is <ref type="bibr" target="#b8">[9]</ref>, in which simulated ant agents were controlled by a neural network that was designed by a genetic algorithm. Another experiment, with an approach similar to a Learning Classifier System (LCS), can be found in <ref type="bibr" target="#b9">[10]</ref>, where a rule set was used and modified by a genetic algorithm.</p><p>Although there is a wealth of publications dealing with the performance of particular learning techniques, especially reinforcement learning approaches, few works focus on the usability of the resulting behavioral model. An early example can be found in <ref type="bibr" target="#b10">[11]</ref>, where an evolutionary algorithm is applied to behavior learning of an individual agent in multi-agent robots. Another example, from <ref type="bibr" target="#b11">[12]</ref>, describes a general approach for automatically programming a behavior-based robot. Using the Q-Learning algorithm, new behaviors are learned by trial and error based on a performance feedback function as reinforcement. In <ref type="bibr" target="#b12">[13]</ref>, also using reinforcement learning, agents share their experiences and the most frequently simulated behaviors are adopted as a group behavior strategy. <ref type="bibr" target="#b13">[14]</ref> compares reinforcement learning and neural networks as learning techniques in an exploration scenario for mobile robots. The authors conclude that both learning techniques are able to learn the individual behaviors, sometimes outperforming a hand-coded program, and that behavior-based architectures speed up reinforcement learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. AGENT LEARNING ARCHITECTURES FOR MODEL DESIGN</head><p>The basic idea behind a learning-driven design methodology consists in transferring the agent behavior design and test activity from the human modeler to the simulation system. Especially in complex models, a high number of details can be manipulated. This can make a manual modeling, debugging and tuning process cumbersome, especially when knowledge about the original system, or experience for implicitly bridging the micro-macro gap, is missing. Using agents that learn at least parts or initial versions of their behavior might be a good way of supporting the modeler in finding an appropriate low-level behavior model. Such a learning-based approach can also be part of adopting a Living Design <ref type="bibr" target="#b14">[15]</ref>-like methodology for multi-agent simulation models. Nevertheless, the first question on the way to such a learning-driven methodology concerns the selection of the appropriate learning technique: for this form of application, for a particular domain, or maybe just for a particular model. In this paper we focus on the suitability of a well-known learning technique, Q-Learning, for such a modeling approach. Before focussing on this particular learning architecture, we discuss what we have identified as requirements for the applicability of a learning technique to our problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Requirements for Learning Agent Architectures</head><p>Not all agent learning architectures are equally apt for usage in the modeling support context. There are a number of properties that an appropriate learning technique should exhibit to indicate a successful application.</p><p>1) Feasibility: The learning mechanism should be able to cope with the level of complexity that is required for a valid environmental model. Thus, it should not be necessary to simplify or even reformulate the problem just to be able to apply the learning mechanism. That means the theoretical prerequisites for applying the learning technology must be known and fulfilled by the environmental model in combination with the reward function. The learning architecture must be able to find a good-enough solution; 2) Interpretability and Model Accessibility: The mechanism should produce behavior models that can be understood and interpreted by a human modeler. The architecture shall not be a black box with a behavior that the human modeler has to trust, but must be accessible for detailed analysis of the processes involved in the overall agent system; 3) Plausibility: The mechanism in the learning architecture should be well-established and well-understood. The motivation is that its usage shall not impose additional complexity on the modeler, for example in setting a number of configuration parameters. How the learning architecture works shall be explainable to and by the modeler. There is a variety of possible learning agent architectures that might be suitable for the aim presented here and the requirements identified, as discussed in section II. We selected Q-Learning, a Reinforcement Learning technique, as described in the next paragraph.</p><p>1) Q-Learning: Q-Learning <ref type="bibr" target="#b15">[16]</ref> is a well-known reinforcement learning technique. 
It works by developing an action-value function that gives the expected utility of taking a specific action in a specific state. The agents keep track of the experienced situation-action pairs by managing the so-called Q-table, which consists of situation descriptions, the actions taken and the corresponding reward prediction, called the Q-value.</p><p>Q-Learning is able to compare the expected utility of the available actions without requiring a model of the environment. Nevertheless, the use of the Q-Learning algorithm is constrained to a finite number of possible states and actions. As a reinforcement learning algorithm, it is also based on modeling the overall problem as a Markov Decision Process. Thus, it needs sufficient information about the current state of the agent to be able to assign a discriminating reward. Although there are a number of extensions that improve the convergence speed of Q-Learning <ref type="bibr" target="#b4">[5]</ref>, we include the standard Q-Learning algorithm in our experiment due to its simplicity.</p><p>We suppose that Q-Learning meets the requirements for this application by providing both sufficient performance (if applicable) and adaptability, while also giving interpretability of the result. This interpretability is achieved by its rule-based structure (represented by the state-action mapping) with a clear evaluation of those rules by means of the Q-value. The processing of this mapping, weighted by the provided utility value, could be used as a bias for the interpretation of the rules and as an input for the behavior modeling.</p></div>
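To make the mechanism concrete, the tabular Q-Learning scheme described above can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation (which was realized in SeSAm's behavior language); the class and method names are our own.

```python
from collections import defaultdict
import random

class QLearner:
    """Minimal tabular Q-Learning agent (illustrative sketch)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q-table: (state, action) -> Q-value
        self.actions = actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability

    def choose(self, state):
        """Epsilon-greedy action selection over the Q-table."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-Learning update:
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Reading out the action with the highest Q-value per state then yields exactly the situation-action rules discussed later in the paper.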
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. TESTBED</head><p>The scenario we use for evaluating the learning architecture approach is the same as in <ref type="bibr" target="#b16">[17]</ref>, where we already describe the integration of XCS-based agents into the agent-based modeling and simulation platform SeSAm. This pedestrian evacuation scenario is a typical application domain for multiagent simulation (see <ref type="bibr" target="#b17">[18]</ref> for a real-world application). Although the employed scenario may be oversimplified, we expected that its relative simplicity would enable us to evaluate the potential of the learning technique as well as to identify the challenges involved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Environmental Model</head><p>The main objective of the simulation concerned the emergence of collision-free exiting behavior. Therefore, the reward and the interfaces to the environment were mainly shaped to support this. In contrast to <ref type="bibr" target="#b16">[17]</ref>, we did not test a large variety of configurations, as the goal of this research was not to find an optimal one, but a more modeling-oriented evaluation of the architecture.</p><p>The basic scenario consists of a room (40x60m) surrounded by walls, with one exit and a varying number of column-type obstacles (with a diameter of 3.5m). A number of pedestrians have to leave this room as fast as possible without hurting themselves in collisions. We assume that each pedestrian agent is represented by a circle of 50cm diameter and moves with a speed of 1.5m/sec. One time-step in the discrete simulation corresponds to 0.5sec. Space is continuous. We tested this scenario using 1, 5, 10 and 20 agents, and the number of obstacles was set to 10. At the beginning of a test run, all agents were located at random positions in the upper half of the room.</p><p>All experiments alternated between explore and exploit phases. During the explore phase, the agents randomly execute an action. In exploitation trials, the best action was selected in each step. Every trial consists of 100 iteration steps. Every experiment took 1000 explore-exploit cycles.</p><p>Reward was given to the agent a immediately after executing an action at time-step t. 
It was computed in the following way: reward(a, t) = reward_exit(a, t) + reward_dist(a, t) + feedback_collision(a, t) + feedback_damage(a, t), with reward_exit(a, t) = 1000 if agent a has reached the exit at time t, and 0 otherwise; reward_dist(a, t) = β × (d(exit, a, t−1) − d(exit, a, t)) with β = 5, where d(exit, a, t) denotes the distance between agent a and the exit at time-step t; feedback_collision(a, t) was set to 100 if a collision-free actual movement had been made, to 0 if no movement happened, and to −100 if a collision occurred; feedback_damage(a, t) was set to −1000 if a collision with a column obstacle occurred, and 0 otherwise. Together, the different components of the feedback function stress goal-directed, collision-free movements. The reward is goal-directed because the agents are positively rewarded every time an action results in reaching the exit or in a state closer to the exit. Complementarily, it is collision-free oriented because the agents are positively rewarded for moving without collisions and negatively rewarded every time an action results in a collision.</p></div>
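The composition of the reward components above can be sketched as follows. The function signature and the boolean event flags are illustrative assumptions; the distances are assumed to be Euclidean distances between the agent and the exit before and after the move.

```python
BETA = 5  # weight of the distance-based reward component

def reward(reached_exit, dist_prev, dist_now, moved, collided, hit_column):
    """Combined reward for one executed action, following the paper's
    four components: reward_exit + reward_dist + feedback_collision
    + feedback_damage."""
    r = 0.0
    # reward_exit: large bonus for reaching the exit
    if reached_exit:
        r += 1000
    # reward_dist: positive when the agent got closer to the exit
    r += BETA * (dist_prev - dist_now)
    # feedback_collision: +100 collision-free move, 0 no move, -100 collision
    if collided:
        r -= 100
    elif moved:
        r += 100
    # feedback_damage: severe penalty for hitting a column obstacle
    if hit_column:
        r -= 1000
    return r
```

For example, a collision-free step that reduces the distance to the exit by one meter yields 5 + 100 = 105.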
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Agent Interfaces</head><p>As agent interfaces, the perceived situation and the set of possible actions have to be defined. Similar to <ref type="bibr" target="#b16">[17]</ref>, the perception of an agent is based on its basic orientation, respectively its movement direction. The overall perceivable area is divided into 5 sectors, with a distinction between areas at two different distances, as depicted in figure <ref type="figure" target="#fig_0">1</ref>. For every area two binary perception categories were used: the first encoded whether the exit was perceivable in this area and the second encoded whether an obstacle was present, where an obstacle can be everything with which a collision should be avoided: walls, columns or other pedestrians. The action set is shaped to support the collision-avoidance behavior. We assume that the agents are by default oriented towards the exit. Thus, the action set consists of A = {Move_Left, Move_SlightlyLeft, Move_Straight, Move_SlightlyRight, Move_Right, Noop, Stepback}. For any of these actions, the agent turns by the given direction (e.g. +36 degrees for Move_SlightlyRight), makes an atomic step and orients itself towards the exit again. The combination of this action set and the perceptions of the agents represents an intentional simplification of the problem: we implicitly represent the orientation task in the actions in order to have an MDP. This simplification allows concentrating the learning on the collision avoidance, facilitating the learning process.</p></div>
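A possible encoding of these interfaces is sketched below. The paper only specifies the +36 degree turn for Move_SlightlyRight; the remaining turn angles are extrapolated assumptions, as is the packing of the 10 perceived areas (5 sectors x 2 distances) into a 20-bit state identifier.

```python
from typing import NamedTuple, Tuple

# Action set; the agent re-orients towards the exit after each action.
ACTIONS = ["MoveLeft", "MoveSlightlyLeft", "MoveStraight",
           "MoveSlightlyRight", "MoveRight", "Noop", "Stepback"]

# Turn angles in degrees relative to the exit direction. Only the +36
# value for MoveSlightlyRight is given in the paper; the rest are
# assumptions for illustration.
TURN = {"MoveLeft": -72, "MoveSlightlyLeft": -36, "MoveStraight": 0,
        "MoveSlightlyRight": 36, "MoveRight": 72,
        "Noop": 0, "Stepback": 180}

class Area(NamedTuple):
    exit_visible: bool   # is the exit perceivable in this area?
    obstacle: bool       # wall, column or other pedestrian present?

def encode_state(areas: Tuple[Area, ...]) -> int:
    """Pack the 10 areas (5 sectors x 2 distances) into a 20-bit state id,
    two bits per area."""
    assert len(areas) == 10
    state = 0
    for area in areas:
        state = (state << 2) | (area.exit_visible << 1) | area.obstacle
    return state
```

With 20 binary perception bits and 7 actions, the Q-table has at most 2^20 x 7 entries, although only a small fraction of these states can actually occur in the scenario.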
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Architecture Configuration</head><p>The testbed was implemented in the visual modeling and simulation platform SeSAm (www.simsesam.de). Q-Learning could be implemented by means of the standard high-level behavior language in SeSAm.</p><p>As it was not our objective to find the optimal configuration for the tested architecture in the given scenario, we do not discuss the effects of different parameter settings on the learning outcome. Naturally, we tested a number of configurations in order to find a reasonable one. The same holds for the overall scenario configuration, including different numbers of obstacles, sizes of scenarios and the particular numbers in the reward function.</p><p>In the context of this paper, we assume an initial Q-value of 0 for all untested state-action pairs. We set the learning rate to 0.5 and the discount factor to 0. This means that the agents' actions are selected based on recent experiences, without taking future rewards into consideration (only the best action for the current state is considered). This is another intentional simplification of the problem, as the agents do not need to maximize future rewards.</p></div>
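With a discount factor of 0, the Q-Learning update degenerates to an exponential moving average of the immediate reward; together with the alternation of explore and exploit trials, this can be sketched as follows. The environment object and its reset/step protocol are illustrative assumptions, not SeSAm API.

```python
import random

ALPHA = 0.5  # learning rate used in the paper; the discount factor is 0

def update(q, state, action, reward):
    """Q update with discount factor 0: Q(s,a) <- Q(s,a) + alpha*(r - Q(s,a)).
    No future reward is propagated."""
    key = (state, action)
    q[key] = q.get(key, 0.0) + ALPHA * (reward - q.get(key, 0.0))

def run_trial(q, env, actions, explore, steps=100):
    """One trial of 100 steps: random actions when exploring,
    greedy Q-table lookup when exploiting."""
    state = env.reset()
    for _ in range(steps):
        action = (random.choice(actions) if explore
                  else max(actions, key=lambda a: q.get((state, a), 0.0)))
        next_state, reward, done = env.step(action)
        update(q, state, action, reward)
        state = next_state
        if done:  # agent reached the exit
            break
```

An experiment then simply alternates `run_trial(..., explore=True)` and `run_trial(..., explore=False)` for 1000 cycles.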
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTS AND RESULTS</head><p>In this section we analyze the results of the simulations, first with respect to learning performance, showing that the learning technique is actually applicable to the test scenario; then we focus on the analysis of what the agents actually learned.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Performance Evaluation</head><p>The metric used for evaluating learning performance is the number of collisions. The time to reach the exit does not vary significantly, as a collision does not influence the behavior directly, but only indirectly via the reward the agent receives. Collisions, with other pedestrians or obstacles, do not impose any effect on future movement; they only count as negative rewards. Obviously, in the early stages the agents do not have enough experience to learn from, and therefore a higher number of collisions is expected.</p><p>Table <ref type="table" target="#tab_0">I</ref> presents the mean number of collisions for each tested situation. The values are aggregated only after the first 50 explore-exploit cycles, to avoid including warm-up data. The mean and deviation over the results of the different exploit cycles are given. Although the runs were repeated, we did not give means and standard deviations over different runs, as the number of repetitions is currently too low. Clearly, the number of collisions increases with the number of agents and obstacles. Figure <ref type="figure" target="#fig_2">2</ref> illustrates the adaptation speed by depicting the number of collisions over time for an exemplary run with 5 agents and 10 obstacles. We can see that the number of collisions decreases quickly in the beginning, and that the behavioral knowledge then converges: after 50 cycles, there is no further improvement.</p><p>Figure <ref type="figure" target="#fig_3">3</ref> gives a further illustration of the learning process. Alternating between explore and exploit trials plays an important role in the performance outcome. The agents must explore the possible action set in order to maximize their experience in terms of the route to be chosen. At the end we can see the emergence of a collision-avoidance behavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Behavior Learning Outcome</head><p>In this section we are interested in analyzing the rules learned by the Q-Learning process, in terms of the complexity of the resulting rule structure and its potential use as a source of inspiration in a modeling process.</p><p>In the following analysis we examine two simulation scenarios: 1 agent and 10 obstacles, and 5 agents and 10 obstacles. In both cases we consider the outcome of one agent from an exemplary simulation.</p><p>1) Raw Q-Learning Rules: The rules generated by the learning process can be determined by taking, for every situation, the action with the highest Q-value, as is done in the exploit phases. Depending on the situation, there might be no action with a positive Q-value. Rules with a Q-value of zero represent situation-action pairs that have not been tested during the simulation. Figure <ref type="figure" target="#fig_4">4</ref> depicts two out of the 12 rules with the highest Q-value in the 1-agent scenario. Figures <ref type="figure">5 and 6</ref> show the distribution of the reward prediction, i.e. the Q-value, over the complete rule set for the single agent, respectively for a randomly selected exemplary agent from a simulation with 5 agents. One can see that there are only a few rules with a high Q-value.</p><p>It is obvious that the Q-value alone cannot be a selection criterion for rules forming a behavior model, as the rules with the highest Q-value naturally contain situations where the agent directly perceives the exit. It is also possible to see that the agent in this case has a majority of rules with a Q-value of 0, which means that a lot of state-action mappings have not been tested. This is not the case for the simulation with 1 agent and 10 obstacles, as seen in figure <ref type="figure">5</ref>, where the majority of rules have been tested. The agent has explored more, resulting in a more elaborate representation of the behavior. 
This difference is caused by the fact that the simulation with only 1 agent presents a smaller set of possible states to be tackled, due to the simplicity of the interactions with only static obstacles.</p><p>Another important aspect of the agents' experience is that, since the agents are randomly positioned in the scenario at the beginning of each trial, the rules are not biased by a fixed starting position; the rule set is therefore more elaborate than it would be if the agents had to learn only one best way to the exit.</p><p>The agent from the simulation with only one agent has a positive rule set (consisting of rules with positive, non-zero Q-values) of 229 rules, while the agent from the simulation with 5 agents has 1507 positive rules (Fig. <ref type="figure">5</ref>: Q-Learning value distribution for an exemplary agent from a simulation with 1 agent and 10 obstacles; Fig. <ref type="figure">6</ref>: Q-Learning value distribution for an exemplary agent from a simulation with 5 agents and 10 obstacles). This can be seen as an effect of the interaction with other agents, which generates different situations to be visited: especially close to the exit, the situation becomes more dense and the agents must avoid collisions while still reaching the exit.</p><p>Figures <ref type="figure" target="#fig_5">7 and 8</ref> show the distribution of these final rules over the possible actions, for the cases with 1 and 5 agents respectively. We can see the effect of the initial random positioning in each trial: we have a balanced distribution of the rules determining going to the left or right, which makes sense, since the agent must learn to find its way out of the scenario no matter where it has started. The majority of the rules indicate the Move_Straight action. This comes from the fact that the agent is reoriented towards the exit after the execution of any action. 
Unless the agent needs to avoid a collision, Move_Straight is the best action to choose.</p><p>We can identify the collision-avoidance behavior by focussing on an exemplary element of the perceptions of the agent (in the 1-agent scenario). Considering the action Move_Right and the perception ObstacleImmediatelyRight, we see that among all rules with the Move_Right action, a larger number of rules have this perception set to false, see figure <ref type="figure" target="#fig_6">9</ref>. 2) Processing the rules: As the set of rules with a truly positive Q-value is, in all scenarios, far too large to be transparently presented to a human expert, we suggest using a post-processing step for improving the analysis of the rule set on a detailed level. There are a number of candidates that may be suitable for generalizing the rule set in a way that captures all learnt rules in a compact form.</p><p>For this aim, we tested three different machine learning algorithms -mainly classification learners -using all rules with a non-zero, positive Q-value: K Nearest Neighbors (KNN) <ref type="bibr" target="#b18">[19]</ref>, CART Decision Trees <ref type="bibr" target="#b19">[20]</ref> and the CN2 rule inductor <ref type="bibr" target="#b20">[21]</ref>. K-Nearest Neighbors is arguably one of the simplest machine learning algorithms, while Decision Trees and CN2 are of particular interest to this work because of the interpretability provided by their resulting representation of the knowledge captured in the training set. We used KNN with a K value of 5 for the experiments. The Decision Tree is a simple CART with Gini's index of impurity for node splitting. The CN2 algorithm uses the Laplace method for rule quality estimation.</p><p>As mentioned above, the results of this post-processing step have to be evaluated against two criteria: how well they capture the given rule set and how well they are able to generalize it. 
The first can be measured in terms of classification accuracy; the second by the generalization and compactness of the resulting behavior description.</p><p>a) Classification Accuracy: Table <ref type="table" target="#tab_0">II</ref> shows the classification accuracy for the above-mentioned algorithms, both in the 1-agent and in the 5-agents experiments, using 10-fold cross validation on the training set. Table <ref type="table" target="#tab_0">III</ref> shows the average classification accuracy when the model built from one agent's experience is tested with another agent's experience. We can see that the classification accuracy for the case with 1 agent outperformed the case with 5 agents. This is clearly an effect of the explore-exploit tradeoff. The agent from the 1-agent simulation has a lower number of states to visit during the simulation, and this is reflected in the accuracy of the rules, as they are tested more often and converge faster to the optimal solution (state-action mapping). The agents from the 5-agents scenario have a larger set of states that may potentially occur, which is also reflected in the number of rules. This requires more cycles to converge to an optimal solution. While they are all good models -each provides a solution to the problem (as seen in section V-A) -they cannot be generalized to other good solutions (other agents' experiences). 
The convergence of the solution, which determines its generalization to the problem, is therefore a function of the configuration of the learning and, more importantly, of the explore-exploit distribution, the number of agents and the set of perceptions and actions, which determine the size of the state-action mapping.</p><p>Figure <ref type="figure" target="#fig_7">10</ref> shows the confusion matrix for the decision tree learnt from the simulation with 1 agent, tested with cross-validation: rows represent the expected class (action) from the classification model, as given in the Q-Learning mapping, and columns represent the classification determined by the decision tree. We highlight the number of correctly classified instances. The majority of misclassified instances fall on cases where different actions could result in similarly good rewards. For instance, there is a common misclassification among the actions Move_Straight, Move_SlightlyRight and Move_SlightlyLeft. This comes from the fact that when the agent is facing the exit, all three of these actions will maximize the reward (represented by reaching the exit). The second dimension to be analyzed concerns improving the representation of the behavior for a human modeler. We assumed that the best result would be produced by the decision tree learner. However, the CART decision tree learner was not able to produce an understandable, compact model for this problem. In the case of 1 agent the tree has 117 nodes and 59 leaves; for the case with 5 agents the tree has 1637 nodes and 819 leaves. For illustration, figure <ref type="figure" target="#fig_8">11</ref> outlines a part of the tree generated from the experience of the agent in the case of 1 agent and 10 obstacles. In this figure, the codes in the rules stand for different agent perceptions; for instance, EIA means Exit Immediately Ahead. 
The post-processing result provided by the CN2 algorithm is better than that of the decision tree learner: For example, figure <ref type="figure" target="#fig_9">12</ref> shows the best 3 (out of a total of 29) rules created by the CN2 algorithm from the training set of non-zero, positive rules in the case of 1 agent. CN2 was able to reduce the rule representation from 229 to 29 rules. The rules can be evaluated by their quality, size and coverage. Here, as for the decision trees, the perceptions are represented by codes. The rules are clear and concise. Because of that, CN2 can be seen as a step further towards interpretability. In principle, a set of rules producing a solution to the evacuation problem could be learnt for the agents, using a technique that results in human-readable rules. However, from the rules found by the Q-Learning we could not construct a behavior representation that fully resembles the knowledge coded in the rule set, nor derive a representation of the rules that a human modeler could easily survey. On the other hand, the scenario is so simple that it is possible to directly program a set of about 10 rules exhibiting almost optimal behavior.</p></div>
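The rule evaluation by quality, size and coverage mentioned above can be sketched as follows. This assumes a simplified reading of CN2-style rule scoring; the perception codes, rule and training examples are illustrative, not the paper's data.

```python
# Sketch of CN2-style rule evaluation (simplified, illustrative): a rule
# is a set of perception-code conditions plus an action; coverage counts
# the matched training examples and quality the fraction carrying the
# rule's action.

def matches(rule_conditions, example_perceptions):
    return all(example_perceptions.get(code) == value
               for code, value in rule_conditions.items())

def coverage_and_quality(rule, training_set):
    covered = [ex for ex in training_set if matches(rule["if"], ex["perceptions"])]
    if not covered:
        return 0, 0.0
    correct = sum(1 for ex in covered if ex["action"] == rule["then"])
    return len(covered), correct / len(covered)

rule = {"if": {"EIA": True, "OIR": False}, "then": "MoveStraight"}
training_set = [
    {"perceptions": {"EIA": True, "OIR": False}, "action": "MoveStraight"},
    {"perceptions": {"EIA": True, "OIR": False}, "action": "MoveSlightlyLeft"},
    {"perceptions": {"EIA": False, "OIR": True}, "action": "MoveLeft"},
]

print(coverage_and_quality(rule, training_set))  # coverage 2, quality 0.5
```

Ranking candidate rules by such scores is what lets CN2 keep a short list of concise rules rather than the full 229-entry mapping.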
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION AND FUTURE WORK</head><p>In this paper we presented our investigation towards a learning-driven methodology by evaluating Reinforcement Learning as an agent learning architecture. The main motivation for this work is to investigate the possibilities of creating a learning-based methodology for the design of a multi-agent simulation model, avoiding a time-consuming trial-and-error process when determining the details of agent behavior.</p><p>In a small evacuation scenario, we showed that the employed learning technique can produce plausible behavior in an agent-based simulation. However, the interface between the learning technique and the agent environment is by no means trivial. The environmental model, feedback function, and perception and action sets are critical. There are also ideas on the analysis of the different architectures that may improve the usability of the learned behavior model.</p><p>Using a learning technique transfers the basic problem from direct behavior modeling to designing the agent interface and the environment's reward computation. To do so successfully, a general understanding of the scenario's difficulties and the available machine learning techniques is necessary. An example is the fundamental requirement of the Markov property in reinforcement-based approaches <ref type="bibr" target="#b4">[5]</ref>, in our case Q-Learning. The provided perceptions need to contain sufficient information to accurately learn the expectation of immediate and possible future reward.</p><p>The standard implementation of Q-Learning, used in this paper, offers us only the estimated reward for each possible condition-action pair. For a more intelligent interpretation of the rule set, which in its raw state lacks any form of generalization, we decided to use three different machine learning algorithms: K-Nearest Neighbors, Decision Trees and the CN2 rule inducer. 
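The standard tabular Q-Learning update referred to above can be written in a few lines. This is a minimal sketch: the learning rate, discount factor and the tiny example transition are assumed values for illustration, not the paper's parameters.

```python
# Minimal sketch of standard tabular Q-Learning: the Q-table stores only
# an estimated reward per condition-action pair. ALPHA, GAMMA and the
# example transition are illustrative assumptions.

ALPHA, GAMMA = 0.5, 0.9   # learning rate, discount factor (assumed)

def q_update(q, state, action, reward, next_state, actions):
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

actions = ["MoveStraight", "MoveLeft", "MoveRight"]
q = {}
# Agent facing the exit moves straight and reaches it (reward 1.0).
q_update(q, "EIA", "MoveStraight", 1.0, "AtExit", actions)
print(q[("EIA", "MoveStraight")])  # 0.5 after one update
```

The table q is exactly the raw condition-action mapping that the post-processing step then has to generalize.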
The resulting, full behavior model for the Q-Learning is only partially helpful as guidance for modeling in this case. Generalization still needs to be improved, either as part of the learning process or as a post-processing step. This could be achieved by using more flexible classification techniques, such as multi-label classification, since in this process we have to deal with multiple good solutions. Another important aspect to be considered here is the tradeoff between explore and exploit, and how this scales with the complexity of the problem, in terms of the number of agents and the size of the state-action mapping in a multi-agent simulation. This relation is yet to be analyzed in detail.</p><p>There are admittedly many more challenging application scenarios than an evacuation scenario in which all agents have the same goal, the behavior repertoire is quite restricted, and there is no direct communication between agents. In such advanced environments, the learning and environment design will certainly pose additional challenges.</p><p>Our next steps include testing other learning techniques to investigate their performance, outcome and appropriateness for this methodology. A short analysis of Learning Classifier Systems and Neural Networks can be found in <ref type="bibr" target="#b2">[3]</ref>. We also plan to test approaches such as evolutionary programming and support vector machines, as well as other forms of reinforcement learning, in particular learning automata. An alternative for the post-processing step worth testing could be multi-label classification <ref type="bibr" target="#b21">[22]</ref>, where we could gather the experience from different agents and find different best actions for a given situation, increasing generalization.</p><p>Besides that, we will pursue further self-modeling agent experiments. 
We are considering the application of the learning technique in other, more complex scenarios, such as the evacuation of a train with about 500 agents, complex geometry with exit signs, and time pressure. We are also interested in a scenario where cooperation/collaboration is required, in order to investigate the possible emergence of cooperation in the agent model through the learning process. This experimentation should consider situations with and without direct communication between the agents.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Agent perception sectors</figDesc><graphic coords="3,325.39,308.09,200.22,92.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Figure 2 illustrates the adaptation speed by depicting the number of collisions over time for an exemplary run with 5 agents and 10 obstacles. We can see that the number of collisions decreases fast in the beginning, and the behavioral knowledge converges quite fast: after 50 cycles, there is no further improvement. To give a better illustration of the learning process, we show in figure 3 the trajectories of the agents in exploit phases after a) 10, b) 100, c) 500 and d) 1000 exploit trials. In this figure we consider the situation with 5 agents and 10 obstacles. We can see the progress of adaptation with more and more collision-free and goal-directed movement. Experience hereby does not just mean positive reinforcement. Even if the agents</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Development of the number of collisions for an exemplary run with 5 agents and 10 obstacles</figDesc><graphic coords="4,303.29,88.09,245.22,181.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Exemplary trajectories during exploit trials, for 5 agents and 10 obstacles</figDesc><graphic coords="4,303.29,354.09,245.22,200.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Two out of 12 rules with the highest Q-value for the agent in the 1-agent scenario.</figDesc><graphic coords="5,81.69,217.99,175.72,182.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. Rules distribution over the actions for an exemplary agent from a simulation with 1 agent and 10 obstacles</figDesc><graphic coords="6,94.39,212.49,150.22,82.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 9 .</head><label>9</label><figDesc>Fig. 9. Frequency of rules with perception ObstacleImmediatelyRight as false (left bar) and true (right bar) for action MoveRight</figDesc><graphic coords="6,350.39,88.09,150.22,167.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Fig. 10 .</head><label>10</label><figDesc>Fig. 10. Confusion matrix for the decision tree in the simulation with 1 agent and 10 obstacles</figDesc><graphic coords="7,47.29,486.29,245.22,99.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Fig. 11 .</head><label>11</label><figDesc>Fig. 11. A branch of the decision tree for the case with 1 agent and 10 obstacles</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Fig. 12 .</head><label>12</label><figDesc>Fig. 12. CN2 best three rules for the simulation with 1 agent and 10 obstacles</figDesc><graphic coords="7,47.29,171.89,245.22,101.82" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>TABLE I MEAN</head><label>I</label><figDesc>NUMBER OF COLLISIONS PER RUN - ROWS REPRESENT THE NUMBER OF AGENTS AND COLUMNS THE NUMBER OF OBSTACLES.</figDesc><table><row><cell></cell><cell>10</cell></row><row><cell>1</cell><cell>0.01 ±0.23</cell></row><row><cell>5</cell><cell>1.39 ±1.78</cell></row><row><cell>10</cell><cell>6.66 ±3.88</cell></row><row><cell>20</cell><cell>25.17 ±8.77</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Approaches for resolving the dilemma between model structure refinement and parameter calibration in agentbased simulations</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fehler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Puppe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAMAS &apos;06: Proceedings of the 5th international joint conference on Autonomous agents and multiagent systems</title>
				<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="120" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Multiagent simulation model design strategies</title>
		<author>
			<persName><forename type="first">F</forename><surname>Klügl</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">MAS&amp; S Workshop at MALLOW 2009</title>
		<title level="s">ser. CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">Sept. 2009</date>
			<biblScope unit="volume">494</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Agent architectures for a learning-driven modeling methodology in multiagent simulation</title>
		<author>
			<persName><forename type="first">R</forename><surname>Junges</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Klügl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th German Conference on Multiagent System Technologies (to appear)</title>
				<meeting>the 8th German Conference on Multiagent System Technologies (to appear)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>MATES 2010</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Adaptation and learning in multi-agent systems: Some remarks and a bibliography</title>
		<author>
			<persName><forename type="first">G</forename><surname>Weiß</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCAI &apos;95: Proceedings of the Workshop on Adaption and Learning in Multi-Agent Systems</title>
				<meeting><address><addrLine>London, UK</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="1996">1996</date>
			<biblScope unit="page" from="1" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Reinforcement Learning: An Introduction</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Barto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Learning automata as a basis for multi agent reinforcement learning</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Verbeeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Peeters</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="page" from="71" to="85" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Adami</surname></persName>
		</author>
		<title level="m">Introduction to artificial life</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag New York, Inc</publisher>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">The evolution of strategies for multi-agent environments</title>
		<author>
			<persName><forename type="first">J</forename><surname>Grefenstette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Adaptive Behavior</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="65" to="90" />
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Antfarm: Towards simulated evolution</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Collins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">R</forename><surname>Jefferson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial Life II</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1991">1991</date>
			<biblScope unit="page" from="579" to="601" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Experiments in learning prototypical situations for variants of the pursuit game</title>
		<author>
			<persName><forename type="first">J</forename><surname>Denzinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fuchs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings on the International Conference on Multi-Agent Systems (ICMAS-1996</title>
				<meeting>on the International Conference on Multi-Agent Systems (ICMAS-1996</meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="page" from="48" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Simulation for behavior learning of multi-agent robot</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Maeda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Intelligent and Fuzzy Systems</title>
		<imprint>
			<biblScope unit="page" from="53" to="64" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Automatic programming of behaviorbased robots using reinforcement learning</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mahadevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Connell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="issue">2-3</biblScope>
			<biblScope unit="page" from="311" to="365" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Learning enabled cooperative agent behavior in an evolutionary and competitive environment</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E.-K</forename><surname>Kang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computing &amp; Applications</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="124" to="135" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Performance comparison of relational reinforcement learning and RBF neural networks for small mobile robots</title>
		<author>
			<persName><forename type="first">R</forename><surname>Neruda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Slusny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vidnerova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2008 Second International Conference on Future Generation Communication and Networking Symposia</title>
				<meeting>the 2008 Second International Conference on Future Generation Communication and Networking Symposia<address><addrLine>Washington, DC, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
	<note>FGCNS &apos;08</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Living Design for Open Computational Systems</title>
		<author>
			<persName><forename type="first">J.-P</forename><surname>Georg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Picard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-P</forename><surname>Gleizes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Glize</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Theory And Practice of Open Computational Systems (TAPOCS) at 12th IEEE International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE&apos;03)</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Fredriksson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Ricci</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Gustavsson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Omicini</surname></persName>
		</editor>
		<meeting><address><addrLine>Linz, Austria</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2003-06">June 2003</date>
			<biblScope unit="page" from="389" to="394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Q-learning</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J C H</forename><surname>Watkins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dayan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="279" to="292" />
			<date type="published" when="1992">1992</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Agent learning instead of behavior implementation for simulations -a case study using classifier systems</title>
		<author>
			<persName><forename type="first">F</forename><surname>Klügl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hatko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">V</forename><surname>Butz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th German Conference on Multiagent System Technologies</title>
				<meeting>the 6th German Conference on Multiagent System Technologies<address><addrLine>Berlin / Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="111" to="122" />
		</imprint>
	</monogr>
	<note>MATES 2008</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Agent-based pedestrian simulation of train evacuation integrating environmental data</title>
		<author>
			<persName><forename type="first">F</forename><surname>Klügl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Klubertanz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rindsfüser</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Artificial Intelligence, 32nd Annual German Conference on AI</title>
		<title level="s">Proceedings, ser. Lecture Notes in Computer Science</title>
		<meeting><address><addrLine>Paderborn, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">September 15-18, 2009</date>
			<biblScope unit="volume">5803</biblScope>
			<biblScope unit="page" from="631" to="638" />
		</imprint>
	</monogr>
	<note>KI 2009</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<title level="m" type="main">Machine Learning</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Mitchell</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>McGraw-Hill</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Classification and Regression Trees</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Stone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Olshen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1984-01">January 1984</date>
			<publisher>Chapman and Hall/CRC</publisher>
		</imprint>
	</monogr>
	<note>1st ed</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The CN2 induction algorithm</title>
		<author>
			<persName><forename type="first">P</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Niblett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="261" to="283" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Multi-label classification: An overview</title>
		<author>
			<persName><forename type="first">G</forename><surname>Tsoumakas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Katakis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int J Data Warehousing and Mining</title>
		<imprint>
			<biblScope unit="volume">2007</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
