Electricity Market (Virtual) Agents
                Paulo Trigo                               Paulo Marques                                Helder Coelho
    LabMAg, GuIAA; DEETC, ISEL                        GuIAA; DEETC, ISEL                           LabMAg; DI, FCUL
  Instituto Superior de Eng. de Lisboa        Instituto Superior de Eng. de Lisboa       Faculdade de Ciências da Univ. de Lisboa
                Portugal                                    Portugal                                     Portugal
     Email: ptrigo@deetc.isel.ipl.pt            Email: 28562@alunos.isel.ipl.pt                 Email: hcoelho@di.fc.ul.pt


Abstract—This paper describes a multi-agent based simulation (MABS) framework to construct an artificial electric power market populated with learning agents. The artificial market, named TEMMAS (The Electricity Market Multi-Agent Simulator), explores the integration of two design constructs: i) the specification of the environmental physical market properties, and ii) the specification of the decision-making (deliberative) and reactive agents. TEMMAS is materialized in an experimental setup involving distinct power generator companies which operate in the market and search for the trading strategies that best exploit their generating units' resources. The experimental results show a coherent market behavior that emerges from the overall simulated environment.

I. INTRODUCTION

The start-up of nation-wide electric markets, along with its recent expansion to intercountry markets, aims at providing competitive electricity service to consumers. The new market-based power industry calls for human decision-making in order to settle the energy assets' trading strategies. The interactions and influences among the market participants are usually described by game theoretic approaches, which are based on the determination of equilibrium points against which the actual market performance is compared [1], [2]. However, those approaches find it difficult to incorporate the ability of market participants to repeatedly probe markets and adapt their strategies. Usually, the problem of finding the equilibria strategies is relaxed (simplified) both in terms of: i) the human agents' bidding policies, and ii) the technical and economical operation of the power system.

As an alternative to the equilibrium approaches, the multi-agent based simulation (MABS) comes forth as being particularly well fitted to analyze dynamic and adaptive systems with complex interactions among constituents [3], [4].

In this paper we describe a MABS modeling framework that provides constructs for the (human) designer to specify a dynamic environment, its resources, observable properties and its inhabitant decision-making agents. We used the framework to capture the behavior of the electricity market and to build a simulator, named TEMMAS (The Electricity Market Multi-Agent Simulator), which incorporates the operation of several generator company (GenCo) operators, each with distinct power generating units (GenUnit), and a market operator (Pool) which computes the hourly market price (driven by the electricity demand).

TEMMAS agents exhibit bounded rationality, i.e., they make decisions based on local information (partial knowledge) of the system and of other agents while learning and adapting their strategies during a simulation. The purpose of TEMMAS is not to explicitly search for equilibrium points, but rather to reveal, and help to understand, the complex and aggregate system behaviors that emerge from the interactions of the market agents.

II. THE MABS MODELING FRAMEWORK

We describe the structural MABS constituents by means of two concepts: i) the environmental entity, which owns a distinct existence in the real environment, e.g. a resource such as an electricity producer, or a decision-making agent such as a market bidder generator company, and ii) the environmental property, which is a measurable aspect of the real environment, e.g. the price of a bid or the demand for electricity. Hence, we define the environmental entity set, E_T = { e_1, ..., e_n }, and the environmental property set, E_Y = { p_1, ..., p_m }. The whole environment is the union of its entities and properties: E = E_T ∪ E_Y.

The environmental entities, E_T, are often clustered in different classes, or types, thus partitioning E_T into a set, P_{E_T}, of disjoint subsets, P^i_{E_T}, each containing entities that belong to the same class. Formally, P_{E_T} = { P^1_{E_T}, ..., P^k_{E_T} } defines a full partition of E_T, such that P^i_{E_T} ⊆ E_T, E_T = ∪_{i=1...k} P^i_{E_T}, and P^i_{E_T} ∩ P^j_{E_T} = ∅ ∀ i ≠ j. The partitioning may be used to distinguish between decision-making agents and available resources, e.g. a company that decides the bidding strategy to pursue or a plant that provides the demanded power.

The environmental properties, E_Y, can also be clustered, in a similar way as for the environmental entities, thus grouping properties that are related. The partitioning may be used to express distinct categories, e.g. economical, electrical, ecological or social aspects. Another, more technical usage, is to separate constant parameters from dynamic state variables.

The factored state space representation. The state of the simulated environment is implicitly defined by the state of all its environmental entities and properties. We follow a factored representation, which describes the state space as a set, V, of discrete state variables [5]. Each state variable, v_i ∈ V, takes on values in its domain D(v_i) and the global (i.e., over E) state space, S ⊆ ×_{v_i ∈ V} D(v_i), is a subset of the Cartesian product of the state variable domains. A state s ∈ S is an
assignment of values to the set of state variables V. We define f_C, C ⊆ V, as a projection such that if s is an assignment to V, f_C(s) is the assignment of s to C; we define a context c as an assignment to the subset C ⊆ V; the initial state variables of each entity and property are defined, respectively, by the functions init_{E_T} : E_T → C and init_{E_Y} : E_Y → C.

From environmental entities to resources and agents. The embodiment is central in describing the relation between the entities and the environment [6]. Each environmental entity can be seen as a body, possibly with the capability to influence the environmental properties. Based on this idea of embodiment, two higher-level concepts (decoupled from the environment, E, characterization) are introduced: i) agent, owning reasoning and decision-making capabilities, and ii) resource, without any reasoning capability. Thus, given a set of agents, Υ, we define an association function embody : Υ → E_T, which connects an agent to its physical entity. In a similar way, given a set of resources, Φ, we define the mapping function identity : Φ → E_Y. We consider that |E| = |Υ| + |Φ|, thus each entity is either mapped to an agent or to a resource; there is no third category.

The decision-making approach. Each agent perceives (the market) and acts (sells or buys) and there are two main approaches to develop the reasoning and decision-making capabilities: i) the qualitative mental-state based reasoning, such as the belief-desire-intention (BDI) architecture [7], which is founded on logic theories, and ii) the quantitative, decision-theoretic, evaluation of causal effects, such as the Markov decision process (MDP) support for sequential decision-making in stochastic environments. There are also hybrid approaches that combine the qualitative and quantitative formulations [8], [9].

The qualitative mental-state approaches capture the relation between high level components (e.g. beliefs, desires, intentions) and tend to follow heuristic (or rule-based) decision-making strategies, thus being better fitted to tackle large-scale problems and worse fitted to deal with stochastic environments.

The quantitative decision-theoretic approaches deal with low level components (e.g., primitive actions and immediate rewards) and search for long-term policies that maximize some utility function, thus being worse fitted to tackle large-scale problems and better fitted to deal with stochastic environments.

The electric power market is a stochastic environment and we currently formulate medium-scale problems that can fit a decision-theoretic agent model. Therefore, TEMMAS adaptive agents (e.g., market bidders) follow an MDP based approach and resort to experience (sampled sequences of states, actions and rewards from simulated interaction) to search for optimal, or near-optimal, policies using reinforcement learning methods such as Q-learning [10] or SARSA [11].

III. TEMMAS DESIGN

Within the current design model of TEMMAS the electricity asset is traded through a spot market (no bilateral agreements), which is operated via a Pool institutional power entity. Each generator company, GenCo, submits (to Pool) how much energy each of its generating units, GenUnit_GenCo, is willing to produce and at what price. Thus, we have: i) the power supply system comprises a set, E_GenCo, of generator companies, ii) each generator company, GenCo, contains its own set, E_{GenUnit_GenCo}, of generating units, iii) each generating unit, GenUnit_GenCo, of a GenCo, has constant marginal costs, and iv) the market operator, Pool, trades all the GenCos' submitted energy.

The bidding procedure conforms to the so-called "block bids" approach [12], where a block represents a quantity of energy being bid for a certain price; also, GenCos are not allowed to bid higher than a predefined price ceiling. Thus, the essential measurable aspects of the market supply are the energy price, quantity and production cost. The consumer side of the market is mainly described by the quantity of demanded energy; we assume that there is no price elasticity of demand (i.e., no demand-side market bidding).

Therefore, we have: E_T = { Pool } ∪ E_GenCo ∪ ( ∪_{g ∈ E_GenCo} E_{GenUnit_g} ) and E_Y = { quantity, price, productionCost }. The quantity refers both to the supply and demand sides of the market. The price refers both to the supply bid values and to the market settled (by Pool) value.

The E_GenCo contains the decision-making agents. The Pool is a reactive agent that always applies the same predefined auction rules in order to determine the market price and hence the block bids that clear the market. Each E_{GenUnit_GenCo} represents the GenCo's set of available resources.

The resources' specification. Each generating unit, GenUnit_GenCo, defines its marginal costs and constructs the block bids according to the strategy indicated by its generator company, GenCo. Each GenUnit_GenCo calculates its marginal costs according to either the "WithHeatRate" [13] or the "WithCO2" [14] formulation.

The "WithHeatRate" formulation estimates the marginal cost, MC, by combining the variable operations and maintenance costs, vO&M, the number of heat rate intervals, nPat, each interval's capacity, cap_i, and the corresponding heat rate value, hr_i, and the price of the fuel, fPrice, being used; the marginal cost for a given i ∈ [1, nPat] interval is given by,

\[ MC_{i+1} = vO\&M + \frac{(cap_{i+1} \times hr_{i+1}) - (cap_i \times hr_i)}{blockCap_{i+1}} \times fPrice \tag{1} \]

where each block's capacity is given by: blockCap_{i+1} = cap_{i+1} − cap_i.

The "WithCO2" marginal cost, MC, combines the variable operations and maintenance costs, vO&M, the price of the fuel, fPrice, the CO2 cost, CO2cost, and the unit's productivity, η, through the expression,

\[ MC = \frac{fPrice}{\eta} \times K + CO2cost + vO\&M \tag{2} \]

where K is a fuel-dependent constant factor, and CO2cost is given by,

\[ CO2cost = CO2price \times \frac{CO2emit}{\eta} \times K \tag{3} \]
where CO2emit is the fuel's CO2 emissions. Here all blocks have the same capacity; given a unit's maximum capacity, maxCap, and a number of blocks, nBlocks, to sell, each block's capacity is given by: blockCap = maxCap / nBlocks.

The decision-making strategies. Each generator company defines the bidding strategy for each of its generating units. We designed two types of strategies: a) the basic-adjustment, which chooses among a set of basic rigid options, and b) the heuristic-adjustment, which selects and follows a predefined well-known heuristic. There are several basic-adjustment strategies already defined in TEMMAS. Here we outline seven of those strategies, sttg_i where i ∈ { 1, ..., 7 }, available for a GenCo to apply: i) sttg_1, bid according to the marginal production cost of each GenUnit_GenCo (follow heat rate curves, e.g., cf. Tables II and III), ii) sttg_2, make a "small" increment in the prices of all the previous-day's block bids, iii) sttg_3, similar to sttg_2, but makes a "large" increment, iv) sttg_4, make a "small" decrement in the prices of all the previous-day's block bids, v) sttg_5, similar to sttg_4, but makes a "large" decrement, vi) sttg_6, hold the prices of all previous-day's block bids, vii) sttg_7, set the price to zero. There are two heuristic-adjustment defined strategies: a) the "Fixed Increment Price Probing" (FIPP), which uses a percentage to increment the price of last day's transacted energy blocks and to decrement the non-transacted blocks, and b) the "Physical Withholding based on System Reserve" (PWSR), which reduces the block's capacity, so as to decrement the next day's estimated system reserve (difference between total capacity and total demand), and then bids the remaining energy at the maximum market price.

The agents' decision process. The above strategies correspond to the GenCo agent's primary actions. The GenCo has a set, E_{GenUnit_GenCo}, of generating units and, at each decision-epoch, it decides the strategy to apply to each generating unit, thus choosing a vector of strategies, \vec{sttg}, where the ith vector's component refers to the GenUnit^i_GenCo generating unit; thus, its action space is given by: A = ×_{i=1}^{|E_{GenUnit_GenCo}|} { sttg_1, ..., sttg_7 }_i ∪ { FIPP, PWSR }. The GenCo's perceived market share, mShare, is used to characterize the agent's internal memory, so its state space is given by mShare ∈ [ 0..100 ]. Each GenCo is an MDP decision-making agent such that the decision process period represents a daily market. At each decision-epoch each agent computes its daily profit (which is regarded as an internal reward function) and the Pool agent receives all the GenCos' block bids for the 24 daily hours and settles the hourly market price by matching offers in a classic supply and demand equilibrium price (we assume an hourly constant demand).

TEMMAS architecture and construction. The TEMMAS agents along with the major inter-agent communication paths are represented in the bottom region of Figure 1; the top region represents the user interface that enables the specification of each of the resources' and agents' configurable parameters. The implementation of the TEMMAS architecture followed the INGENIAS [15] methodology and used its supporting development platform. Figure 2 presents the general "agent's perspective", where the tasks and the goals are clustered into individual and social perspectives. Figure 3 gives additional detail on the construction of tasks and goals using INGENIAS.

[Diagram: User Interface region (top); Agents region (bottom) with Generating Unit, Generator Company, Market Operator (Pool) and Buyer; legend: Marginal Cost, Sale Offers, Buying Offers, Market Results]
Fig. 1. The TEMMAS architecture and the configurable parameters.

[Diagram: social perspective and individual perspective]
Fig. 2. TEMMAS agent's view using INGENIAS framework.

[Diagram: relations labeled "satisfies", "consumes" and "uses"]
Fig. 3. TEMMAS tasks and goals specification using INGENIAS framework.
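
The Pool's hourly settlement described in the agents' decision process can be illustrated with a minimal sketch. It is not the TEMMAS implementation: it assumes a uniform-price, merit-order auction in which the last accepted block sets the market price and may be partially accepted, with inelastic demand as stated in the text. The names `BlockBid` and `clear_hour`, the example bids, and the 180 MW demand are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockBid:
    gen_unit: str    # hypothetical label of the bidding unit
    quantity: float  # block quantity in MW
    price: float     # bid price in EUR/MWh


def clear_hour(bids, demand):
    """Clear one hour: accept the cheapest blocks until demand is met.

    Assumption (not from the paper): the market price is the price of the
    last accepted block, which may be only partially accepted.
    Returns (market_price, [(bid, accepted_MW), ...]).
    """
    accepted = []
    remaining = demand
    price = None
    for bid in sorted(bids, key=lambda b: b.price):  # merit order
        if remaining <= 0:
            break
        taken = min(bid.quantity, remaining)
        accepted.append((bid, taken))
        remaining -= taken
        price = bid.price
    if remaining > 0:
        raise ValueError("submitted supply cannot cover the demand")
    return price, accepted


# Illustrative bids only (quantities and prices chosen for the example).
bids = [
    BlockBid("GT", 50, 44.0),
    BlockBid("CO", 100, 11.9),
    BlockBid("CC", 50, 29.8),
    BlockBid("CO", 50, 12.5),
]
price, accepted = clear_hour(bids, demand=180)
print(price, [(b.gen_unit, mw) for b, mw in accepted])
```

Under these assumptions the GT block stays out of the market and the partially accepted CC block sets the hourly price.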
IV. TEMMAS ILLUSTRATIVE SETUP

We used TEMMAS to build a specific electric market simulation model. We drew inspiration from the Iberian Electricity Market (MIBEL - "Mercado Ibérico de Electricidade") with Portuguese (e.g., EDP - "Electricidade de Portugal", "Turbogás", "Tejo Energia") and Spanish (e.g., "Endesa", "Iberdrola", "Union Fenosa", "Hidro Cantábrico", "Viesgo", "Gas Natural", "Elcogás") generator companies. Regarding the total installed electricity capacity, the Iberian market is composed of a major player (Spain) and a minor player (Portugal). Our experiments exploit the combined market behavior of a major and a minor electricity market player.

We abstracted intra-nation market details and modeled each country as a single generator company (with several generating units). Figure 4 uses INGENIAS notation to depict the hierarchical structure of the electricity market; the Pool (OMEL - "Operador do Mercado Ibérico de Electricidade") settles the market price (and coupled bids) after the bids submitted by each GenCo (PT - "Portugal" and ES - "Spain") according to a strategy that depends on the marginal production costs of each GenUnit.

TABLE I
PROPERTIES OF GENERATING UNITS; THE UNITS' TYPES ARE COAL (CO), COMBINED CYCLE (CC) AND GAS TURBINE (GT); THE O&M INDICATES "OPERATION AND MAINTENANCE" COST.

                             Type of generating unit
Property        Unit         CO            CC          GT
Fuel            -            Coal (BIT)    Nat. Gas    Nat. Gas
Capacity        MW           500           250         125
Fuel price      €/MMBtu      1.5           5           5
Variable O&M    €/MWh        1.75          2.8         8

TABLE II
CO AND CC UNIT'S CAPACITY BLOCK (MW) AND HEAT RATE (Btu/kWh) AND THE CORRESPONDING MARGINAL COST (€/MWh).

CO generating unit                      CC generating unit
Cap.    Heat rate    Marg. cost         Cap.    Heat rate    Marg. cost
250     12000        -                  100     9000         -
350     10500        11.9               150     7800         29.8
400     10080        12.5               200     7200         29.8
450     9770         12.7               225     7010         30.3
500     9550         13.1               250     6880         31.4

TABLE III
GT UNIT'S CAPACITY BLOCK (MW) AND HEAT RATE (Btu/kWh) AND THE CORRESPONDING MARGINAL COST (€/MWh).

GT generating unit
Cap.    Heat rate    Marg. cost
50      14000        -
100     10600        44.0
110     10330        46.2
120     10150        48.9
125     10100        52.5
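
The marginal-cost columns of Tables II and III can be recomputed from expression (1) and the properties in Table I. The sketch below is illustrative only; the function name `marginal_costs` is hypothetical, and the factor 1e-3 that converts Btu/kWh into MMBtu/MWh is an assumption not stated in the text (it is the unit conversion under which the tabulated values are recovered).

```python
def marginal_costs(caps, heat_rates, v_om, f_price):
    """Marginal cost (EUR/MWh) of every block after the first, per expression (1).

    caps:       cumulative capacities in MW (first column of Tables II and III)
    heat_rates: heat rate in Btu/kWh at each capacity
    v_om:       variable O&M cost in EUR/MWh (Table I)
    f_price:    fuel price in EUR/MMBtu (Table I)
    """
    costs = []
    for i in range(len(caps) - 1):
        block_cap = caps[i + 1] - caps[i]  # blockCap_{i+1}, in MW
        incremental_heat = (caps[i + 1] * heat_rates[i + 1]
                            - caps[i] * heat_rates[i]) / block_cap  # Btu/kWh
        # Assumed unit conversion: 1 Btu/kWh = 1e-3 MMBtu/MWh.
        costs.append(v_om + incremental_heat * 1e-3 * f_price)
    return costs


co = marginal_costs([250, 350, 400, 450, 500],
                    [12000, 10500, 10080, 9770, 9550], v_om=1.75, f_price=1.5)
cc = marginal_costs([100, 150, 200, 225, 250],
                    [9000, 7800, 7200, 7010, 6880], v_om=2.8, f_price=5)
gt = marginal_costs([50, 100, 110, 120, 125],
                    [14000, 10600, 10330, 10150, 10100], v_om=8, f_price=5)

for name, values in (("CO", co), ("CC", cc), ("GT", gt)):
    print(name, [round(v, 2) for v in values])
```

Under this assumption the computed values agree with the one-decimal figures in Tables II and III to within about 0.05 €/MWh, i.e. up to rounding.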


Fig. 4. An illustrative TEMMAS formulation (using INGENIAS notation).

We considered three types of generating units: i) one base load coal plant, CO, ii) one combined cycle plant, CC, to cover intermediate load, and iii) one gas turbine, GT, peaking unit. Table I shows the essential properties of each plant type and Tables II and III show the heat rate curves used to define the bidding blocks. The marginal cost was computed using expression (1); the bidding block's quantity is the capacity increment, e.g. for CO, the 11.9 marginal cost bidding block's quantity is 350 − 250 = 100 MW (cf. Table II, CO, top lines 2 and 1).

V. EXPERIMENTS AND RESULTS

Our experiments have two main purposes: i) illustrate the TEMMAS functionality, and ii) analyze the agents' resulting behavior, e.g. the learnt bidding policies, in light of the market specific dynamics.

We designed three experimental scenarios and Table IV shows the GenCo's name along with its production capacity, computed according to the respective GenUnits (cf. Table I). The "active" suffix (cf. Table IV, name column) means that the GenCo searches for its GenUnits' best bidding strategies; i.e. "active" is a policy learning agent.

TABLE IV
THE EXPERIMENT'S GenCos AND GenUnits.

Exp.    GenCo name            Prod. capac. (MW)    GenUnits
#1      GenCo_active          875                  CO & CC & GT
#2      GenCo_major           2000                 2×CO & 4×CC
        GenCo_minor&active    875                  3×CC & 1×GT
#3      GenCo_major&active    2000                 2×CO & 4×CC
        GenCo_minor&active    875                  3×CC & 1×GT

Experiment #1. The experiment sets a constant, 600 MW, hourly demand for electricity. Figure 5 shows the GenCo_active process of learning the bidding policy that gives the highest long-term profit. We used Q-learning, with an ε-greedy exploration strategy, which picks a random action with probability ε and behaves greedily otherwise (i.e., picks
the action with the highest estimated action value); we defined ε = 0.2. The learning rate of Q-learning was defined as α = 0.01 and the discount factor (which measures the present value of future rewards) was set to γ = 0.5. Figure 6 shows the bid blocks that cleared the market (at the first hour of the last simulated day). As there is no market competition the cheapest, CO, bids zero, the GT sets the market price (to its ceiling) and the most expensive 200 MW are distributed among the most expensive GenUnits (CC, GT). Therefore, the GenCo_active agent found, for each perceived market share, mShare, the best strategy, \vec{sttg}, to bid its GenUnits' energy blocks.

[Plot: GenCos' Market Share; y-axis: Market Share (%), 0 to 100; x-axis: Simulation Cycle (1 Day; 24 Hours), 0 to 100; series: GenCo_major, GenCo_minor&active]
Fig. 7. Market share evolution induced by GenCo_minor&active. [Exp. #2]
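
The learning setup reported for Experiment #1 (ε = 0.2, α = 0.01, γ = 0.5) corresponds to the standard tabular Q-learning update with ε-greedy exploration, sketched below. This is a generic illustration rather than the TEMMAS code: the function names, the integer market-share states, and the three action labels are hypothetical, and a real GenCo action would be a vector of strategies, one per generating unit.

```python
import random
from collections import defaultdict

# Parameter values reported for Experiment #1.
EPSILON, ALPHA, GAMMA = 0.2, 0.01, 0.5


def choose_action(q, state, actions, rng):
    """Epsilon-greedy: random action with probability EPSILON, else greedy."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += ALPHA * (r + GAMMA * max Q(s',.) - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - q[(state, action)]
    q[(state, action)] += ALPHA * td_error
    return q[(state, action)]


# Hypothetical usage: the state is the perceived market share (mShare),
# the reward stands in for the daily profit.
actions = ["sttg1", "sttg2", "FIPP"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
rng = random.Random(0)

action = choose_action(q, state=50, actions=actions, rng=rng)
new_value = q_update(q, state=50, action="sttg2", reward=1.0,
                     next_state=60, actions=actions)
print(action, new_value)
```

With a small learning rate such as α = 0.01 each daily reward moves the estimate only slightly, so many simulated days are needed before the action values settle.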
[Figure 5 plot: "Profit of GenCo_active"; y-axis: Profit (M€); x-axis: Simulation Cycle (1 Day; 24 Hours).]
Fig. 5. The process of learning a bid policy to maximize profit. [Exp. #1]

[Figure 6 plot: "GenCo_active Coupled Block Bids (Day=2500; Hour=1)"; y-axis: Price (€/MWh); x-axis: Capacity (MW); series: Base Coal (CO), Comb. Cycle (CC), Gas Turbine (GT).]
Fig. 6. The bid policy that maximizes profit (price ceiling is 180). [Exp. #1]

   Experiment #2. The experiment sets a constant 2000 MW
hourly demand for electricity. Figure 7 shows the market share
evolution while GenCo_minor&active learns to play in the
market with GenCo_major, which is a larger company with a
fixed strategy: “bid each block 5 € higher than its marginal
cost”. We see that GenCo_minor&active takes around 18%
(75 − 57) of the market from GenCo_major. To earn that market
share, GenCo_minor&active learnt to lower its prices in order
to exploit the “5 € space” left open by the fixed strategy of
GenCo_major.
   Experiment #3. In this experiment both GenCos are
“active”; the remaining setup is the same as in experiment #2.
Figure 8 shows the market share oscillation while each company
reacts to the other’s strategy to win the market. Despite the
competition, each company learns to secure its own fringe of
the market.

[Figure 8 plot: "GenCos' Market Share"; series: GenCo_major&active, GenCo_minor&active; y-axis: Market Share (%); x-axis: Simulation Cycle (1 Day; 24 Hours).]
Fig. 8. Market share evolution induced by both GenCos. [Exp. #3]

                VI. CONCLUSIONS AND FUTURE WORK

   This paper describes our preliminary work in the
construction of a MABS framework to analyze the macro-scale
dynamics of the electric power market. Although both research
fields (MABS and market simulation) have achieved considerable
progress, there is a lack of cross-cutting approaches. We used
the proposed MABS framework to support our preliminary
work in the construction of the TEMMAS agent-based
electricity market simulator.
   Hence, our contribution is twofold: i) a comprehensive
formulation of MABS, including the simulated environment
and the inhabiting decision-making and learning agents, and ii)
a simulation model (TEMMAS) of the electric power market
framed in the proposed formulation.
   Our initial results reveal an emerging and coherent market
behavior, thus encouraging us to further extend the experimental
setup with additional bidding strategies and to incorporate
specific market rules, such as congestion management and
pricing regulation mechanisms.

                          REFERENCES

 [1] Berry, C., Hobbs, B., Meroney, W., O’Neill, R., Jr, W.S.: Understanding
     how market power can arise in network competition: a game theoretic
     approach. Utilities Policy 8(3) (September 1999) 139–158
 [2] Gabriel, S., Zhuang, J., Kiet, S.: A Nash-Cournot model for the
     North American natural gas market. In: Proceedings of the 6th IAEE
     European Conference: Modelling in Energy Economics and Policy.
     (2–3 September 2004)
 [3] Schuster, S., Gilbert, N.: Simulating online business models. In:
     Proceedings of the 5th Workshop on Agent-Based Simulation (ABS-
     04). (May 3–5 2004) 55–61
 [4] Helleboogh, A., Vizzari, G., Uhrmacher, A., Michel, F.: Modeling
     dynamic environments in multi-agent simulation. JAAMAS 14(1) (2007)
     87–116
 [5] Boutilier, C., Dearden, R., Goldszmidt, M.: Exploiting structure in policy
     construction. In: Proceedings of the IJCAI-95. (1995) 1104–1111
 [6] Clark, A.: Being there: putting brain, body, and world together again.
     MIT (1998)
 [7] Rao, A., Georgeff, M.: BDI agents: From theory to practice. In: Pro-
     ceedings of the First International Conference on Multiagent Systems,
     S (1995) 312–319
 [8] Simari, G., Parsons, S.: On the relationship between MDPs and the
     BDI architecture. In: Proceedings of the AAMAS-06. (May 8–12 2006)
     1041–1048
 [9] Trigo, P., Coelho, H.: Decision making with hybrid models: the case of
     collective and individual motivations. International Journal of Reasoning
     Based Intelligent Systems (IJRIS); Inderscience Publishers (2009)
[10] Watkins, C., Dayan, P.: Q-learning. Mach. Learning 8 (1992) 279–292
[11] Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT
     Press (1998)
[12] OMIP - The Iberian Electricity Market Operator. Online:
     http://www.omip.pt
[13] Botterud, A., Thimmapuram, P., Yamakado, M.: Simulating GenCo
     bidding strategies in electricity markets with an agent-based model.
     In: Proceedings of the 7th Annual IAEE European Energy Conference
     (IAEE-05). (August 28–30 2005)
[14] Sousa, J., Lagarto, J.: How market players adjusted their strategic
     behaviour taking into account the CO2 emission costs - an application
     to the Spanish electricity market. In: Proceedings of the 4th International
     Conference on the European Electricity Market (EEM-07), Cracow,
     Poland (May 23–27 2007)
[15] Gómez-Sanz, J., Fuentes-Fernández, R., Pavón, J., Garcı́a-Magariño, I.:
     INGENIAS development kit: a visual multi-agent system development
     environment (BEST ACADEMIC DEMO OF AAMAS’08). In:
     Proceedings of the Seventh AAMAS, Estoril, Portugal (May 12–16 2008)
     1675–1676