<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Dynamic scheduling in Petroleum process using reinforcement learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nassima</forename><surname>Aissani</surname></persName>
							<email>aissani.nassima@yahoo.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Oran University</orgName>
								<address>
									<addrLine>BP 1524 El M&apos;nouer</addrLine>
									<settlement>Oran</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<author>
<persName><forename type="first">Bouziane</forename><surname>Beldjilali</surname></persName>
							<email>bouzianebeldjilali@yahoo.fr</email>
							<affiliation key="aff0">
								<orgName type="institution">Oran University</orgName>
								<address>
									<addrLine>BP 1524 El M&apos;nouer</addrLine>
									<settlement>Oran</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Dynamic scheduling in Petroleum process using reinforcement learning</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B926E455C24776649075EB7A4370FAF8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>reactive scheduling</term>
					<term>reinforcement learning</term>
					<term>petroleum process</term>
					<term>multi-agent system</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Petroleum industry production systems are highly automated. In this industry, all functions (e.g., planning, scheduling and maintenance) are automated, and in order to remain competitive, researchers attempt to design an adaptive control system that not only optimizes the process but is also able to adapt to rapidly evolving demands at a fixed cost. In this paper, we present a multi-agent approach to dynamic task scheduling in a petroleum industry production system. Agents simultaneously ensure effective production scheduling and the continuous improvement of the solution quality by means of reinforcement learning, using the SARSA algorithm. Reinforcement learning allows the agents to adapt, learning the best behaviors for their various roles without reducing performance or reactivity. To demonstrate the innovation of our approach, we include a computer simulation of our model and the results of applying our model to an Algerian petroleum refinery.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Current oil and gas market trends, characterized by great competitiveness and increasingly complex contradictory constraints, have pushed researchers to design an adaptive control system that is not only able to react effectively, but is also able to adapt to rapidly evolving demands at a fixed cost. The system does this by using the available resources as efficiently as possible to optimize this adaptation. <ref type="bibr" target="#b3">[4]</ref> presented an analysis of the needs of production systems, highlighting the advantages of adopting a self-organized heterarchical control system. The term, heterarchy, is used to describe a relationship between entities on the same hierarchical level <ref type="bibr" target="#b5">[6]</ref>. Initially proposed in the field of medical biology, it was then adapted for several other domains <ref type="bibr">[9; 10; 7]</ref>. In the multi-agent domain, the term, heterarchy, is relatively close to the concept of "distribution", as used in "distributed systems". However, from our point of view, the fact that the decisional capacities are distributed does not mean that the multi-agent system is organized heterarchically, even though this is often the case <ref type="bibr">[15;17]</ref>. Nonetheless, the heterarchic organization of distributed systems is the assumption that we make in this paper. From our point of view, this assumption is justified by the system dynamics and the volatility of the information, which make a purely or partially hierarchical approach inappropriate for creating an effective reactive system <ref type="bibr" target="#b3">[4]</ref>.</p><p>In this paper, we focus on the dynamic control of complex manufacturing systems, such as those found in the petroleum industry. In this industry, all functions (e.g., planning, scheduling and maintenance) and resources (e.g., turbines, storage systems) are automated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BRIEF DESCRIPTION OF UNIT3100 IN RA1Z REFINERY</head><p>This unit produces finished oils from the base oils treated in units HB3 and HB4 and from imported additives. The base oils are received in tanks TK2501 to TK2506; each dedicated tank stores a defined grade of base oil (SPO, SAE10-30, BS), for a production of 132,000 t/year with about 10% additives. If the grade of oil stored in a tank must be changed, the tank must first be rinsed for hours, which is usually avoided. The unit produces two major families of oil: engine oils, 81% of production (gasoline, diesel and transmission oils), and industrial oils (hydraulic (TISK), turbine (Torba), spiral (Fodda), compressor (Torrada) and various other oils). Two mixing methods are used: continuous mixing (the mixing line) and discontinuous (batch) mixing (see Figure <ref type="figure">1</ref>). In this article, we focus on the mixing line. To produce a finished oil, a recipe must be applied:</p><p>X1% HB1 + X2% HB2 + X3% Additif1</p><p>where Xi is the proportion and HBi is the base oil.</p></div>
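As a rough illustration of how such a recipe splits a batch into component volumes, here is a minimal Python sketch. The component names and fractions are hypothetical examples, not the refinery's actual recipe (which would come from the unit's product specifications):

```python
def blend(recipe, volume):
    """Split a finished-oil batch volume into component volumes.

    recipe maps component name -> fraction Xi; the fractions must sum to 1.
    volume is the total batch volume (any consistent unit).
    """
    total = sum(recipe.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("recipe fractions sum to %.4f, expected 1.0" % total)
    return {name: frac * volume for name, frac in recipe.items()}

# Hypothetical rates, keeping roughly the unit's ~10% additive share.
batch = blend({"HB1": 0.55, "HB2": 0.35, "Additif1": 0.10}, 1000.0)
```

The guard on the fraction sum catches mistyped recipes before any pumping plan is derived from them.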
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 1. Unit 3100 model</head><p>The mixing line receives its base oil from the docking tanks, which feed it according to the decade production plan (see figure <ref type="figure">2</ref>):</p><p>In this paper, we aim to develop an adaptive control system for Unit3100 that dynamically produces efficient scheduling solutions while using resources optimally.</p><p>We consider each resource and each oil stock in a tank as a decisional entity, and we model them as agents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 3. Production plan</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">STATE-OF-THE-ART</head><p>We conducted a state-of-the-art review of the dynamic scheduling problem in the literature. This section highlights the studies most relevant to our point of view.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dynamic scheduling</head><p>In manufacturing control, scheduling is the most important function. In this paper, we focus on dynamic scheduling.</p><p>[5] have classified dynamic scheduling into three categories: predictive, proactive, and reactive. The first, predictive, assumes a deterministic environment. Predictive solutions call for a priori off-line resource allocation. However, when the environment is uncertain, some data (e.g., the actual durations) only become available when the solution is being executed. This kind of situation requires either a proactive or a reactive solution. Proactive solutions are able to take environmental uncertainties into account. They allocate the operations to resources and define the order of the operations, though, because the durations are uncertain, without precise starting times. However, such solutions can only be applied when the durations of the operations are stochastic and the states of the resources are known perfectly (e.g., stochastic job-shop scheduling) <ref type="bibr" target="#b2">[3]</ref>. The third type of dynamic scheduling, reactive, is also able to deal with environmental uncertainties, but is better suited to evolving processes.</p><p>Reactive solutions call for on-line scheduling of resources. As the resource allocation process evolves, more information becomes available, allowing decisions to be made in real time <ref type="bibr">[16; 11; 5; 1]</ref>. Naturally, a reactive solution is not a simple objective function, but rather a resource allocation policy (i.e., a state-action mapping) which controls the process. In this paper, we focus exclusively on reactive solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Reinforcement learning</head><p>For decades, scheduling methods were based exclusively on operational-research algorithms of exponential complexity; over the last few decades, scheduling researchers have instead drawn on artificial intelligence. Taking both effectiveness and efficiency into account, which means optimizing several criteria, increases problem complexity even more. Artificial intelligence has allowed such complex problems to be solved, yielding satisfactory, if not always optimal, solutions.</p><p>[9] used genetic algorithms (GA) to adapt the decision strategies of autonomous controllers. Their control agents use pre-assigned decision rules for a limited amount of time only, and obey a rule replacement policy that propagates the most successful rules to the subsequent populations of concurrently operating agents. However, GA do not provide satisfactory solutions for reactive scheduling. Therefore, a reactive technique must be integrated into GA to allow the system to be controlled in real time.</p><p>Reinforcement learning (RL) might be an appropriate way to obtain quasi-real-time solutions that can be improved over time. Reinforcement learning is learning by trial and error, well suited to agent learning. In this paradigm, agents perceive their individual states and perform actions for which numerical rewards are given. The goal of the agents is thus to maximize the total reward they receive over time. <ref type="bibr" target="#b7">[8]</ref> used reinforcement learning to optimize resource use in a very expensive electric motor production system. Such systems are characterized by a variety of products that are produced on request, which requires a great deal of flexibility and adaptability. The assembly units must be autonomous and modular, which makes performance control and development difficult. <ref type="bibr" target="#b7">[8]</ref> considered these units as insect colonies able to organize themselves to carry out a task. Self-organization can reduce the number of resources used, allowing production risk problems to be solved more easily.</p><p>The most widely used reinforcement learning algorithm is Q-learning. <ref type="bibr" target="#b17">[18]</ref> extended this algorithm by using a reward function based on EMLT (Estimated Mean LaTeness) scheduling criteria, which are effective though not efficient. <ref type="bibr" target="#b1">[2]</ref> proposed an intelligent agent-based scheduling system. They employed the Q-III algorithm to dynamically select dispatching rules. Their state determination criteria were the queue's mean slack time and the machine's buffer size. These authors take advantage of domain knowledge and experience in the learning process.</p><p>In this paper, we explore a more advanced algorithm, SARSA, within a heterarchical organisation of agents. In summary, we experiment with reinforcement learning, using the SARSA algorithm, to design an adaptive and reactive manufacturing control system for the petroleum process, based on a heterarchical multi-agent architecture. In the next section, we present our system architecture and motivate our choices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">THE PROPOSED CONTROL SYSTEM</head><p>A multi-agent system is a distributed system with localized decision-making and interaction among agents. An agent is an autonomous entity with its own value system and the means to communicate with other such entities. For a general survey of the application of multi-agent systems in manufacturing, see the review by <ref type="bibr" target="#b0">[1]</ref>. In order to develop a multi-agent system with a reactive decision capability in an uncertain environment, the agents may be modelled as a Markov Decision Process (MDP) <ref type="bibr" target="#b11">[12]</ref>. To improve system performance and learn an optimal policy in a Markov environment when the transition function T (modelling the system's evolution from state to state) is unknown but an objective can be identified, a learn-by-trial process such as RL <ref type="bibr">[12;13]</ref> can be designed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">The proposed manufacturing control system</head><p>We consider that a petroleum refinery exists in a dynamic, uncertain and unpredictable environment, since it is subject to internal stress (e.g., production risks) and external constraints (e.g., forced markets, unexpected orders). According to <ref type="bibr" target="#b11">[12]</ref>, the decisions made in such environments involve Markov decision processes (MDP). Clearly, in such a Markovian context, it is necessary to consider the transition function T, modelling the system's evolution from state to state, as an unknown. According to <ref type="bibr" target="#b11">[12]</ref> and <ref type="bibr" target="#b12">[13]</ref>, a learn-by-trial process, such as reinforcement learning, should be used to determine the optimal policy. This modelling approach is widespread. Figure <ref type="figure">1</ref> shows the main functions embedded in each agent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">SARSA (State, Action, Reward, new State, new Action) algorithm to resolve the dynamic scheduling problem</head><p>An MDP is a tuple &lt;S, A, T, R&gt;, where S is a set of problem states, A is a set of actions, T(s, a, s') → [0, 1] is a function defining the probability that taking action a in state s results in a transition to state s', and R(s, a, s') → ℝ defines the reward received after such a transition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig.1. MDP RL improvement of on-line scheduling performances</head><p>If all the parameters of the MDP are known, an optimal policy can be found by dynamic programming. If T and R are initially unknown (which is commonly the case in industrial case studies), reinforcement learning (RL) methods can learn an optimal policy by direct interaction with the environment. RL is learning to act by trial and error. Agents perceive their individual states and perform actions for which numerical rewards are given. The goal of the agents is thus to maximize the total reward received over time. This technique is often used in robotics, to teach a robot the behavior needed to achieve its goals and to overcome obstacles.</p><p>The SARSA algorithm is used to learn the function Qπ(s, a), defined as the expected total discounted return when starting in state s, executing action a and thereafter using the policy π to choose actions:</p><formula xml:id="formula_0">Qπ(s, a) = Σ_{s'} T(s, a, s') [ R(s, a, s') + γ Qπ(s', π(s')) ]<label>(1)</label></formula><p>The discount factor γ ∈ [0,1] determines the relative importance of short-term and long-term rewards. For each s and a, we store a floating-point number Q(s, a) holding the current estimate of Qπ(s, a).</p><p>As experience tuples &lt;s, a, r, s', a'&gt; are generated through interaction with the environment, the table of Q-values is updated using the following rule:</p><formula xml:id="formula_1">Q(s, a) ← (1 − α) Q(s, a) + α ( r + γ Q(s', a') )<label>(2)</label></formula><p>The learning rate α ∈ [0,1] determines how much the existing estimate of Qπ(s, a) contributes to the new estimate.</p><p>If the agent's policy tends towards greedy choices as time passes, the Q(s, a) values will eventually converge to the optimal value function Q*(s, a).
To achieve this, we use a Boltzmann probability, which determines the probability of choosing a random action.</p><p>Figure <ref type="figure">2</ref> shows the steps of the SARSA algorithm.</p><p>Fig. <ref type="figure">2</ref>. The SARSA algorithm</p><p>In our case, this algorithm makes the Resource Agent learn its action policy π, which in turn makes it able to choose the best action for each state (accept the task/request or not). The algorithm works with the following data:</p><p>State parameters are the current time t ∈ 0…T; the inventory of pumps p1…pn and their states Sp1…Spn (e.g., maximum capacity, feeding, receiving); and the list of storage tanks T1…Tm and their states ST1…STm (e.g., capacity). Actions concern receiving the product or not, and stopping or starting pumping. The reward function assigns no reward to most states and positive rewards to a specific goal state. For more precision and to obtain proper convergence, the reward depends on the state combination engendered by an action. One idea was to take into account the volume in the tanks (Ci) and the feeding and unloading streams (Fdi, Udi) in the reward function:</p><formula xml:id="formula_2">R_Part-Ag(i) = 1 if Ci(t) = Cmax; 0 if Ci(t) ≥ Cmin; −1 if Ci(t) &lt; Cmin.    R_Resource-Ag = 1 if Σ_{i=1..6} Fdi = 1500 m³/h; −1 if Σ_{i=1..6} Fdi = 0.</formula></div>
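The SARSA update of Eq. (2) and the Boltzmann exploration rule described above can be sketched in a few lines of Python. This is a minimal tabular illustration; the state/action encodings and the temperature value are assumptions, not the paper's implementation:

```python
import math
import random

def boltzmann_action(Q, state, actions, tau=1.0, rng=random):
    """Sample an action with probability proportional to exp(Q(s,a)/tau).

    A high temperature tau gives near-random exploration; a low tau
    approaches the greedy (exploitation) policy.
    """
    prefs = [math.exp(Q.get((state, a), 0.0) / tau) for a in actions]
    r = rng.random() * sum(prefs)
    acc = 0.0
    for a, p in zip(actions, prefs):
        acc += p
        if r <= acc:
            return a
    return actions[-1]

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """Tabular SARSA rule, Eq. (2): Q(s,a) <- (1-α)Q(s,a) + α(r + γ Q(s',a'))."""
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) \
        + alpha * (r + gamma * Q.get((s2, a2), 0.0))
```

Unlike Q-learning, the update uses the action a' actually chosen by the (Boltzmann) policy in s', which is what makes SARSA an on-policy method.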
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Multi-agent interaction</head><p>As shown in Figure <ref type="figure">3</ref>, the MCSR (Manufacturing Control System using Reinforcement learning) architecture consists of "resource agents" for the pumps, "parts agents" for the tanks containing oil, and an "observer agent" to control the process.</p><p>The idea is roughly the following: a part agent proposes a task request to the resource agents, which reply with their propositions. The part agent chooses the best proposition and establishes the contract. A detailed illustration of the agent interaction is provided in figure <ref type="figure">4</ref>.</p></div>
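The request/propose/contract exchange just described can be sketched roughly as follows. This is a hypothetical minimal contract-net illustration; the class names, the message content, and the earliest-completion-time bid are assumptions, not the paper's protocol:

```python
class ResourceAgent:
    """Pump agent: bids its earliest completion time for a proposed task."""
    def __init__(self, name, busy_until=0.0):
        self.name = name
        self.busy_until = busy_until  # end time of the agent's last commitment

    def propose(self, task):
        # Proposition = earliest time at which this pump could finish the task.
        return self.busy_until + task["duration"]

class PartAgent:
    """Tank-oil agent: announces a task, collects bids, contracts the best."""
    def negotiate(self, task, resources):
        bids = {r.name: r.propose(task) for r in resources}
        winner = min(bids, key=bids.get)  # best = earliest completion
        return winner, bids[winner]
```

In the full system the observer agent would supervise such exchanges and the reward signal would feed back into each resource agent's Q-table.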
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">IMPLEMENTATION AND EXPERIMENTS</head><p>Our model was simulated in the Borland JBuilder environment because of its potential for facilitating communication and thread programming, and because of its compatibility with the MADKIT platform chosen for multi-agent system development (see http://www.madkit.org/downloads). One of the advantages of reinforcement learning algorithms is that they allow evaluation during learning. To permit this evaluation, we selected the following criteria.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Description of the process &amp; constraints</head><p>A petroleum refinery is subject to many operational constraints. These include the requirement that only one tank at a time can receive oil, while several can simultaneously feed the mixing line, and that a tank cannot receive and send oil at the same time. Problem inputs include the base oil arrival schedule, which describes the volumes and qualities of the base oils and additives that will be received in the refinery during the desired time horizon; the finished oil demands; and the current levels and qualities of the base oil in the storage tanks. The major constraints can be formalized as follows (see the parameter definitions given in 4.2):</p><p>C1: the tank storage level can never be less than a given threshold: Ci(t) ≥ Cmin. C2: the tank storage level can never be greater than a given threshold: Ci(t) ≤ Cmax. C3: a tank cannot feed and receive at the same time:</p><formula xml:id="formula_3">Udi(t) &gt; 0 ⟹ Fdi(t) = 0, and Fdi(t) &gt; 0 ⟹ Udi(t) = 0</formula><p>The base oil is stored in specific storage tanks (TK2501-TK2506; see figure <ref type="figure">5</ref>). The total time horizon spans 160 hours, during which completely defined oil parcels have to be received from the pipeline. Six oil tanks are available; all of them have the same capacity, but different amounts of oil at the beginning of the time horizon (figure 6). The aims are to receive all the base oil, using the available pumps, into tanks with sufficient capacity, and to produce exactly the requested quantity with the available base oils within the decade. For this reason, we consider as an evaluation criterion the Cmax (the maximum time needed to produce the requested products).</p></div>
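A feasibility check over constraints C1-C3 for a single tank might look like the following sketch (the dictionary field names are illustrative, not from the paper):

```python
def feasible(tank):
    """Check constraints C1-C3 for one tank at the current instant.

    tank holds: level (current volume), c_min / c_max (level thresholds),
    fd (feeding stream to the mixing line), ud (receiving/unloading stream).
    """
    level_ok = tank["c_min"] <= tank["level"] <= tank["c_max"]  # C1 and C2
    # C3: a tank may feed the line (fd > 0) or receive oil (ud > 0), never both.
    exclusive = not (tank["fd"] > 0 and tank["ud"] > 0)
    return level_ok and exclusive
```

A resource agent could call such a predicate before accepting a pumping request, rejecting any action whose resulting state would violate a constraint.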
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Experimental results</head><p>The experiment was conducted as follows: we launched the system with the data described above. The graph (Figure <ref type="figure" target="#fig_3">7</ref>) shows the results for the first phase of the learning algorithm. As this graph shows, before 5000 iterations the Cmax variation is rather high: it varied in the interval [100 h, 1500 h], which is a modest result. This can be explained by the fact that these results come from the exploration phase, in which actions are executed randomly according to the Boltzmann probability <ref type="bibr" target="#b0">[1]</ref>. The second phase is the exploitation phase, in which the choice of actions is based on Q-values (shortly before and after 5000 iterations), and the results are better. This phase produced solutions with a very interesting Cmax of 45 h. Thus, we can state that our system converges towards optimal solutions by minimizing the total production time, even with maintenance tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Reactive behavior</head><p>Although perturbations are relatively under control in a refinery, thanks to preventive maintenance plans, they are always possible. To test our system in the face of such random events, we caused system perturbations in order to observe the system's behavior. We caused the same perturbation (a breakdown of pump P3102) in the exploration phase at the 2000th iteration and again in the exploitation phase at the 15000th iteration. When such perturbations occur in the current system, some production tasks have to be cancelled to allow the maintenance tasks to be performed; the human expert then has to manually find a solution to replace the cancelled production tasks. In our experiment, however, the disturbance in the exploitation phase was quickly compensated for, with Cmax never exceeding 49 h, and the system was brought back to the level of its best performance. These results show that our system is able to learn a continuously improving optimal control policy that schedules maintenance tasks within a production plan without reducing the production rate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION AND FUTURE WORKS</head><p>In this paper, we have presented a multi-agent model for dynamic scheduling in the petroleum process. In this model, agents simultaneously ensure effective scheduling and continuous improvement of the solution quality by means of reinforcement learning, using the SARSA algorithm. We have also provided an overview of the research done in the field of manufacturing control, focusing on dynamic and reactive scheduling. The results of our experiments with this model show that our approach can generate on-line scheduling solutions and improve their quality by minimizing Cmax. Nevertheless, we want to widen the time horizon of our experimentation, taking into consideration more complex production units. Finally, we are going to work on a holonic version of our model for future comparison with the multi-agent model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig</head><label></label><figDesc>Fig. 3.</figDesc><graphic coords="8,160.27,410.40,55.48,170.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 .Fig. 6 .</head><label>56</label><figDesc>Fig. 5. Tank setting</figDesc><graphic coords="10,126.30,147.42,167.22,79.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. Cmax graph</figDesc><graphic coords="10,181.86,466.92,244.50,176.94" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="2,126.30,441.18,345.84,163.50" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Use of Machine Learning for Continuous improvement of the Real Time Manufacturing control system performances</title>
		<author>
			<persName><forename type="first">N</forename><surname>Aissani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Trentesaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Beldjilali</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IJISE: International Journal of Industrial System Engineering</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="474" to="497" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Dynamic job-shop scheduling using reinforcement learning agents</title>
		<author>
			<persName><forename type="middle">M E</forename><surname>Aydin</surname></persName>
		</author>
		<author>
			<persName><forename type="middle">E</forename><surname>Öztemel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Robotics and Autonomous Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="169" to="178" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A General Framework for Scheduling in a Stochastic Environment</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bidot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Vidal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Laborie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Beck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc International Joint Conference on Artificial Intelligence IJICAI07</title>
				<meeting>International Joint Conference on Artificial Intelligence IJICAI07</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="56" to="61" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Self-Organization in Distributed Manufacturing Control: state-of-the-art and future trends</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bousbia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Trentesaux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International conference on Systems, Man &amp; Cybernetics</title>
				<meeting><address><addrLine>Hammamet, Tunisia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page">6</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Adaptive algorithms in distributed resource allocation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Csaji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Monostori</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc of the 6th International Workshop on Emergent Synthesis</title>
				<meeting>of the 6th International Workshop on Emergent Synthesis<address><addrLine>Tokyo, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006-08-18">2006. August 18-19</date>
			<biblScope unit="page" from="69" to="75" />
		</imprint>
		<respStmt>
			<orgName>The University of</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Heterarchical control of highly distributed manufacturing Systems</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Duffie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Prabhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Integrated Manufacturing</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="270" to="281" />
			<date type="published" when="1996">1996. 1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-actionreward association learning</title>
		<author>
			<persName><forename type="middle">M</forename><surname>Haruno</surname></persName>
		</author>
		<author>
			<persName><surname>Kawato</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Networks</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="1242" to="1254" />
			<date type="published" when="2006">2006. 2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Bionic assembly system: concept, structure and function</title>
		<author>
			<persName><forename type="middle">B</forename><surname>Katalinic</surname></persName>
		</author>
		<author>
			<persName><surname>Kordic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc of the 5th IDMME 2004</title>
				<meeting>of the 5th IDMME 2004<address><addrLine>Bath, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004-04-05">2004. April 5-7, 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Discrete-event modeling of heterarchical manufacturing control systems</title>
		<author>
			<persName><forename type="middle">G</forename><surname>Maione</surname></persName>
		</author>
		<author>
			<persName><surname>Naso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Systems, Man and Cybernetics</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1783" to="1788" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Stability and Fault Adaptation in Distributed Control of Heterarchical Manufacturing Job Shops</title>
		<author>
			<persName><forename type="first">V</forename><surname>Prabhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Robotics and Automation</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="142" to="147" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Pujo</surname></persName>
		</author>
		<author>
			<persName><surname>Brun-Picard</surname></persName>
		</author>
		<title level="m">Pilotage sans plan prévisionnel ni ordonnancement préalable, Méthodes du pilotage des systèmes de production</title>
				<imprint>
			<publisher>Hermès</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="129" to="162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Artificial Intelligence: A Modern Approach, The Intelligent Agent Book</title>
		<author>
			<persName><forename type="first">S</forename><surname>Russell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Norvig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Prentice Hall Series in Artificial Intelligence</title>
				<imprint>
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Reinforcement learning with replacing eligibility traces</title>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sutton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="issue" from="1" to="3" />
			<biblScope unit="page" from="123" to="158" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Contract Net Protocol: High-Level Communication and Control in a Distributed Problem Solver</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions On Computers</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="1104" to="1113" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A MultiCriteria Decision Support System for Dynamic task Allocation in a Distributed Production Activity Control Structure</title>
		<author>
			<persName><forename type="first">D</forename><surname>Trentesaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Dindeleux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tahon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. Journal of Computer Integrated Manufacturing</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="3" to="17" />
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">D-Sign: un cadre méthodologique pour l&apos;ordonnancement décentralisé et réactif</title>
		<author>
			<persName><forename type="first">D</forename><surname>Trentesaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gzara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hammadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tahon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Borne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal Européen des Systèmes Automatisés</title>
		<imprint>
			<biblScope unit="page" from="933" to="962" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Les systèmes de pilotage hétérarchiques : innovations réelles ou modèles stériles ?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Trentesaux</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal Européen des Systèmes Automatisés</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="1165" to="1202" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A reinforcement learning-based approach to dynamic Job-shop scheduling</title>
		<author>
			<persName><forename type="first">Y-Z</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M-Y</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Acta Automatica Sinica</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="765" to="771" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
