Electricity Market (Virtual) Agents

Paulo Trigo
LabMAg, GuIAA; DEETC, ISEL
Instituto Superior de Eng. de Lisboa, Portugal
Email: ptrigo@deetc.isel.ipl.pt

Paulo Marques
GuIAA; DEETC, ISEL
Instituto Superior de Eng. de Lisboa, Portugal
Email: 28562@alunos.isel.ipl.pt

Helder Coelho
LabMAg; DI, FCUL
Faculdade de Ciências da Univ. de Lisboa, Portugal
Email: hcoelho@di.fc.ul.pt

Abstract. This paper describes a multi-agent based simulation (MABS) framework to construct an artificial electric power market populated with learning agents. The artificial market, named TEMMAS (The Electricity Market Multi-Agent Simulator), explores the integration of two design constructs: i) the specification of the environmental physical market properties, and ii) the specification of the decision-making (deliberative) and reactive agents. TEMMAS is materialized in an experimental setup involving distinct power generator companies which operate in the market and search for the trading strategies that best exploit their generating units' resources. The experimental results show a coherent market behavior that emerges from the overall simulated environment.

I. INTRODUCTION

The start-up of nation-wide electric markets, along with their recent expansion to intercountry markets, aims at providing competitive electricity service to consumers. The new market-based power industry calls for human decision-making in order to settle the energy assets' trading strategies. The interactions and influences among the market participants are usually described by game theoretic approaches, which are based on the determination of equilibrium points against which the actual market performance is compared [1], [2]. However, those approaches find it difficult to incorporate the ability of market participants to repeatedly probe markets and adapt their strategies. Usually, the problem of finding the equilibria strategies is relaxed (simplified) both in terms of: i) the human agents' bidding policies, and ii) the technical and economical operation of the power system.

As an alternative to the equilibrium approaches, multi-agent based simulation (MABS) comes forth as being particularly well fitted to analyze dynamic and adaptive systems with complex interactions among constituents [3], [4].

In this paper we describe a MABS modeling framework that provides constructs for the (human) designer to specify a dynamic environment, its resources, observable properties and its inhabitant decision-making agents. We used the framework to capture the behavior of the electricity market and to build a simulator, named TEMMAS (The Electricity Market Multi-Agent Simulator), which incorporates the operation of several generator company (GenCo) operators, each with distinct power generating units (GenUnit), and a market operator (Pool) which computes the hourly market price (driven by the electricity demand).

TEMMAS agents exhibit bounded rationality, i.e., they make decisions based on local information (partial knowledge) of the system and of other agents while learning and adapting their strategies during a simulation. The purpose of TEMMAS is not to explicitly search for equilibrium points, but rather to reveal, and help to understand, the complex and aggregate system behaviors that emerge from the interactions of the market agents.

II. THE MABS MODELING FRAMEWORK

We describe the structural MABS constituents by means of two concepts: i) the environmental entity, which owns a distinct existence in the real environment, e.g. a resource such as an electricity producer, or a decision-making agent such as a generator company bidding in the market, and ii) the environmental property, which is a measurable aspect of the real environment, e.g. the price of a bid or the demand for electricity. Hence, we define the environmental entity set, $E_T = \{e_1, \dots, e_n\}$, and the environmental property set, $E_Y = \{p_1, \dots, p_m\}$. The whole environment is the union of its entities and properties: $E = E_T \cup E_Y$.

The environmental entities, $E_T$, are often clustered in different classes, or types, thus partitioning $E_T$ into a set, $P_{E_T}$, of disjoint subsets, $P^i_{E_T}$, each containing entities that belong to the same class. Formally, $P_{E_T} = \{P^1_{E_T}, \dots, P^k_{E_T}\}$ defines a full partition of $E_T$, such that $P^i_{E_T} \subseteq E_T$, $E_T = \bigcup_{i=1}^{k} P^i_{E_T}$ and $P^i_{E_T} \cap P^j_{E_T} = \emptyset$ for all $i \neq j$. The partitioning may be used to distinguish between decision-making agents and available resources, e.g. a company that decides the bidding strategy to pursue or a plant that provides the demanded power.

The environmental properties, $E_Y$, can also be clustered, in a similar way as for the environmental entities, thus grouping properties that are related. The partitioning may be used to express distinct categories, e.g. economical, electrical, ecological or social aspects. Another, more technical, usage is to separate constant parameters from dynamic state variables.

The factored state space representation. The state of the simulated environment is implicitly defined by the state of all its environmental entities and properties. We follow a factored representation that describes the state space as a set, $V$, of discrete state variables [5]. Each state variable, $v_i \in V$, takes on values in its domain $D(v_i)$ and the global (i.e., over $E$) state space, $S \subseteq \times_{v_i \in V} D(v_i)$, is a subset of the Cartesian product of the state variable domains. A state $s \in S$ is an assignment of values to the set of state variables $V$. We define $f_C$, $C \subseteq V$, as a projection such that if $s$ is an assignment to $V$, $f_C(s)$ is the assignment of $s$ to $C$; we define a context $c$ as an assignment to the subset $C \subseteq V$; the initial state variables of each entity and property are defined, respectively, by the functions $init_{E_T} : E_T \to C$ and $init_{E_Y} : E_Y \to C$.

From environmental entities to resources and agents. The embodiment is central in describing the relation between the entities and the environment [6]. Each environmental entity can be seen as a body, possibly with the capability to influence the environmental properties. Based on this idea of embodiment, two higher-level concepts (decoupled from the characterization of the environment, $E$) are introduced: i) agent, owning reasoning and decision-making capabilities, and ii) resource, without any reasoning capability. Thus, given a set of agents, $\Upsilon$, we define an association function $embody : \Upsilon \to E_T$, which connects an agent to its physical entity. In a similar way, given a set of resources, $\Phi$, we define the mapping function $identity : \Phi \to E_Y$. We consider that $|E| = |\Upsilon| + |\Phi|$, thus each entity is either mapped to an agent or to a resource; there is no third category.

The decision-making approach. Each agent perceives (the market) and acts (sells or buys) and there are two main approaches to develop the reasoning and decision-making capabilities: i) the qualitative mental-state based reasoning, such as the belief-desire-intention (BDI) architecture [7], which is founded on logic theories, and ii) the quantitative, decision-theoretic, evaluation of causal effects, such as the Markov decision process (MDP) support for sequential decision-making in stochastic environments. There are also hybrid approaches that combine the qualitative and quantitative formulations [8], [9].

The qualitative mental-state approaches capture the relation between high level components (e.g. beliefs, desires, intentions) and tend to follow heuristic (or rule-based) decision-making strategies, thus being better fitted to tackle large-scale problems and worse fitted to deal with stochastic environments. The quantitative decision-theoretic approaches deal with low level components (e.g., primitive actions and immediate rewards) and search for long-term policies that maximize some utility function, thus being worse fitted to tackle large-scale problems and better fitted to deal with stochastic environments. The electric power market is a stochastic environment and we currently formulate medium-scale problems that can fit a decision-theoretic agent model. Therefore, TEMMAS adaptive agents (e.g., market bidders) follow an MDP based approach and resort to experience (sampled sequences of states, actions and rewards from simulated interaction) to search for optimal, or near-optimal, policies using reinforcement learning methods such as Q-learning [10] or SARSA [11].

III. TEMMAS DESIGN

Within the current design model of TEMMAS the electricity asset is traded through a spot market (no bilateral agreements), which is operated via a Pool institutional power entity. Each generator company, GenCo, submits (to Pool) how much energy each of its generating units, $GenUnit_{GenCo}$, is willing to produce and at what price. Thus, we have: i) the power supply system comprises a set, $E_{GenCo}$, of generator companies, ii) each generator company, GenCo, contains its own set, $E_{GenUnit_{GenCo}}$, of generating units, iii) each generating unit, $GenUnit_{GenCo}$, of a GenCo, has constant marginal costs, and iv) the market operator, Pool, trades all the GenCos' submitted energy.

The bidding procedure conforms to the so-called "block bids" approach [12], where a block represents a quantity of energy being bid for a certain price; also, GenCos are not allowed to bid higher than a predefined price ceiling. Thus, the essential measurable aspects of the market supply are the energy price, quantity and production cost. The consumer side of the market is mainly described by the quantity of demanded energy; we assume that there is no price elasticity of demand (i.e., no demand-side market bidding).

Therefore, we have: $E_T = \{Pool\} \cup E_{GenCo} \cup \bigcup_{g \in E_{GenCo}} E_{GenUnit_g}$, where $E_Y = \{quantity, price, productionCost\}$. The quantity refers both to the supply and demand sides of the market. The price refers both to the supply bid values and to the market value settled by Pool.

The set $E_{GenCo}$ contains the decision-making agents. The Pool is a reactive agent that always applies the same predefined auction rules in order to determine the market price and hence the block bids that clear the market. Each $E_{GenUnit_{GenCo}}$ represents the GenCo's set of available resources.

The resources' specification. Each generating unit, $GenUnit_{GenCo}$, defines its marginal costs and constructs the block bids according to the strategy indicated by its generator company, GenCo. Each $GenUnit_{GenCo}$ calculates its marginal costs according to either the "WithHeatRate" [13] or the "WithCO2" [14] formulation.

The "WithHeatRate" formulation estimates the marginal cost, $MC$, by combining the variable operations and maintenance costs, $vO\&M$, the number of heat rate intervals, $nPat$, each interval's capacity, $cap_i$, and the corresponding heat rate value, $hr_i$, and the price of the fuel, $fPrice$, being used; the marginal cost for a given interval $i \in [1, nPat]$ is given by,

$$MC_{i+1} = vO\&M + \frac{(cap_{i+1} \times hr_{i+1}) - (cap_i \times hr_i)}{blockCap_{i+1}} \times fPrice \qquad (1)$$

where each block's capacity is given by: $blockCap_{i+1} = cap_{i+1} - cap_i$.

The "WithCO2" marginal cost, $MC$, combines the variable operations and maintenance costs, $vO\&M$, the price of the fuel, $fPrice$, the CO2 cost, $CO_2cost$, and the unit's productivity, $\eta$, through the expression,

$$MC = \frac{fPrice}{\eta} \times K + CO_2cost + vO\&M \qquad (2)$$

where $K$ is a fuel-dependent constant factor, and $CO_2cost$ is given by,

$$CO_2cost = CO_2price \times \frac{CO_2emit}{\eta} \times K \qquad (3)$$

where $CO_2emit$ is the fuel's CO2 emissions. Here all blocks have the same capacity; given a unit's maximum capacity, $maxCap$, and a number of blocks, $nBlocks$, to sell, each block's capacity is given by: $blockCap = maxCap / nBlocks$.

The decision-making strategies. Each generator company defines the bidding strategy for each of its generating units. We designed two types of strategies: a) the basic-adjustment, which chooses among a set of basic rigid options, and b) the heuristic-adjustment, which selects and follows a predefined well-known heuristic. There are several basic-adjustment strategies already defined in TEMMAS. Here we outline seven of those strategies, $sttg_i$ where $i \in \{1, \dots, 7\}$, available for a GenCo to apply: i) $sttg_1$, bid according to the marginal production cost of each $GenUnit_{GenCo}$ (follow heat rate curves, cf. Tables II and III), ii) $sttg_2$, make a "small" increment in the prices of all the previous-day's block bids, iii) $sttg_3$, similar to $sttg_2$, but makes a "large" increment, iv) $sttg_4$, make a "small" decrement in the prices of all the previous-day's block bids, v) $sttg_5$, similar to $sttg_4$, but makes a "large" decrement, vi) $sttg_6$, hold the prices of all previous-day's block bids, vii) $sttg_7$, set the price to zero.

There are two heuristic-adjustment strategies defined: a) the "Fixed Increment Price Probing" (FIPP), which uses a percentage to increment the price of last day's transacted energy blocks and to decrement the non-transacted blocks, and b) the "Physical Withholding based on System Reserve" (PWSR), which reduces the block's capacity, so as to decrement the next day's estimated system reserve (difference between total capacity and total demand), and then bids the remaining energy at the maximum market price.

The agents' decision process. The above strategies correspond to the GenCo agent's primary actions. The GenCo has a set, $E_{GenUnit_{GenCo}}$, of generating units and, at each decision-epoch, it decides the strategy to apply to each generating unit, thus choosing a vector of strategies, $\overrightarrow{sttg}$, where the $i$th component of the vector refers to the $GenUnit^i_{GenCo}$ generating unit; thus, its action space is given by: $A = \times_{i=1}^{|E_{GenUnit_{GenCo}}|} \{sttg_1, \dots, sttg_7\}_i \cup \{FIPP, PWSR\}$. The GenCo's perceived market share, $mShare$, is used to characterize the agent's internal memory, so its state space is given by $mShare \in [0..100]$. Each GenCo is an MDP decision-making agent such that the decision process period represents a daily market. At each decision-epoch each agent computes its daily profit (which is regarded as an internal reward function) and the Pool agent receives all the GenCos' block bids for the 24 daily hours and settles the hourly market price by matching offers in a classic supply and demand equilibrium price (we assume an hourly constant demand).

TEMMAS architecture and construction. The TEMMAS agents, along with the major inter-agent communication paths, are represented in the bottom region of Figure 1; the top region represents the user interface that allows specifying each of the resources' and agents' configurable parameters. The implementation of the TEMMAS architecture followed the INGENIAS [15] methodology and used its supporting development platform. Figure 2 presents the general "agent's perspective", where the tasks and the goals are clustered into individual and social perspectives. Figure 3 gives additional detail on the construction of tasks and goals using INGENIAS.

Fig. 1. The TEMMAS architecture and the configurable parameters. [Diagram: a user interface layer above the agents (Generating Unit, Generator Company, Market Operator (Pool), Buyer); legend: Marginal Cost, Buying Offers, Sale Offers, Market Results.]

Fig. 2. TEMMAS agent's view using INGENIAS framework. [Diagram: individual perspective and social perspective.]

Fig. 3. TEMMAS tasks and goals specification using INGENIAS framework.

IV. TEMMAS ILLUSTRATIVE SETUP

We used TEMMAS to build a specific electric market simulation model. We took inspiration from the Iberian Electricity Market (MIBEL – "Mercado Ibérico de Electricidade") with Portuguese (e.g., EDP - "Electricidade de Portugal", "Turbogás", "Tejo Energia") and Spanish (e.g., "Endesa", "Iberdrola", "Union Fenosa", "Hidro Cantábrico", "Viesgo", "Gas Natural", "Elcogás") generator companies. Regarding the total installed electricity capacity, the Iberian market is composed of a major player (Spain) and a minor player (Portugal). Our experiments exploit the combined market behavior of a major and a minor electricity market player.

We abstracted intra-nation market details and modeled each country as a single generator company (with several generating units). Figure 4 uses INGENIAS notation to depict the hierarchical structure of the electricity market; the Pool (OMEL – "Operador do Mercado Ibérico de Electricidade") settles the market price (and coupled bids) after the bids submitted by each GenCo (PT – "Portugal" and ES – "Spain") according to a strategy that depends on the marginal production costs of each GenUnit.

Fig. 4. An illustrative TEMMAS formulation (using INGENIAS notation).

We considered three types of generating units: i) one base load coal plant, CO, ii) one combined cycle plant, CC, to cover intermediate load, and iii) one gas turbine, GT, peaking unit. Table I shows the essential properties of each plant type and Tables II and III show the heat rate curves used to define the bidding blocks. The marginal cost was computed using expression (1); the bidding block's quantity is the capacity increment, e.g. for CO, the quantity of the bidding block with marginal cost 11.9 is 350 − 250 = 100 MW (cf. Table II, CO, top lines 2 and 1).

TABLE I
PROPERTIES OF GENERATING UNITS; THE UNITS' TYPES ARE COAL (CO), COMBINED CYCLE (CC) AND GAS TURBINE (GT); O&M INDICATES "OPERATION AND MAINTENANCE" COST.

                         Type of generating unit
Property      Unit       CO          CC        GT
Fuel          -          Coal (BIT)  Nat. Gas  Nat. Gas
Capacity      MW         500         250       125
Fuel price    €/MMBtu    1.5         5         5
Variable O&M  €/MWh      1.75        2.8       8

TABLE II
CO AND CC UNIT'S CAPACITY BLOCK (MW) AND HEAT RATE (Btu/kWh) AND THE CORRESPONDING MARGINAL COST (€/MWh).

CO generating unit                CC generating unit
Cap.  Heat rate  Marg. cost       Cap.  Heat rate  Marg. cost
250   12000      -                100   9000       -
350   10500      11.9             150   7800       29.8
400   10080      12.5             200   7200       29.8
450   9770       12.7             225   7010       30.3
500   9550       13.1             250   6880       31.4

TABLE III
GT UNIT'S CAPACITY BLOCK (MW) AND HEAT RATE (Btu/kWh) AND THE CORRESPONDING MARGINAL COST (€/MWh).

GT generating unit
Cap.  Heat rate  Marg. cost
50    14000      -
100   10600      44.0
110   10330      46.2
120   10150      48.9
125   10100      52.5

V. EXPERIMENTS AND RESULTS

Our experiments have two main purposes: i) illustrate the TEMMAS functionality, and ii) analyze the agents' resulting behavior, e.g. the learnt bidding policies, in light of the market specific dynamics.

We designed three experimental scenarios and Table IV shows each GenCo's name along with its production capacity, computed according to the respective GenUnits (cf. Table I). The "active" suffix (cf. Table IV, name column) means that the GenCo searches for the best bidding strategies for its GenUnits; i.e. "active" denotes a policy learning agent.

TABLE IV
THE EXPERIMENT'S GenCos AND GenUnits.

Exp.  GenCo name          Prod. Capac.  GenUnits
#1    GenCo_active        875           CO & CC & GT
#2    GenCo_major         2000          2×CO & 4×CC
#2    GenCo_minor&active  875           3×CC & 1×GT
#3    GenCo_major&active  2000          2×CO & 4×CC
#3    GenCo_minor&active  875           3×CC & 1×GT

Experiment #1. The experiment sets a constant, 600 MW, hourly demand for electricity. Figure 5 shows the process by which GenCo_active learns the bidding policy that gives the highest long-term profit. We used Q-learning with an ε-greedy exploration strategy, which picks a random action with probability ε and behaves greedily otherwise (i.e., picks the action with the highest estimated action value); we defined ε = 0.2. The learning rate of Q-learning was defined as α = 0.01 and the discount factor (which measures the present value of future rewards) was set to γ = 0.5. Figure 6 shows the bid blocks that cleared the market (at the first hour of the last simulated day). As there is no market competition, the cheapest unit, CO, bids zero, the GT sets the market price (to its ceiling) and the most expensive 200 MW are distributed among the most expensive GenUnits (CC, GT). Therefore, the GenCo_active agent found, for each perceived market share, $mShare$, the best strategy, $\overrightarrow{sttg}$, to bid its GenUnits' energy blocks.

Fig. 5. The process of learning a bid policy to maximize profit. [Exp. #1] [Plot: "Profit of GenCo_active"; Profit (M€) versus Simulation Cycle (1 Day; 24 Hours).]

Fig. 6. The bid policy that maximizes profit (price ceiling is 180). [Exp. #1] [Plot: "GenCo_active Coupled Block Bids (Day=2500; Hour=1)"; Price (€/MWh) versus Capacity (MW); series: Base Coal (CO), Comb. Cycle (CC), Gas Turbine (GT).]

Experiment #2. The experiment sets a constant, 2000 MW, hourly demand for electricity. Figure 7 shows the market share evolution while GenCo_minor&active learns to play in the market with GenCo_major, which is a larger company with a fixed strategy: "bid each block 5 € higher than its marginal cost". We see that GenCo_minor&active gets around 18% (75 − 57) of the market from GenCo_major. To earn that market, GenCo_minor&active learnt to lower its prices in order to exploit the "5 € space" offered by the fixed strategy of GenCo_major.

Fig. 7. Market share evolution induced by GenCo_minor&active. [Exp. #2] [Plot: "GenCos' Market Share"; Market Share (%) versus Simulation Cycle (1 Day; 24 Hours); series: GenCo_major, GenCo_minor&active.]

Experiment #3. In this experiment both GenCos are "active"; the remaining settings are the same as in experiment #2. Figure 8 shows the market share oscillation while each company reacts to the other's strategy to win the market. Despite the competition, each company learns to secure its own fringe of the market.

Fig. 8. Market share evolution induced by both GenCos. [Exp. #3] [Plot: "GenCos' Market Share"; Market Share (%) versus Simulation Cycle (1 Day; 24 Hours); series: GenCo_major&active, GenCo_minor&active.]

VI. CONCLUSIONS AND FUTURE WORK

This paper describes our preliminary work in the construction of a MABS framework to analyze the macro-scale dynamics of the electric power market. Although both research fields (MABS and market simulation) have achieved considerable progress, there is a lack of cross-cutting approaches. We used the proposed MABS framework to support our preliminary work in the construction of the TEMMAS agent-based electricity market simulator.

Hence, our contribution is twofold: i) a comprehensive formulation of MABS, including the simulated environment and the inhabiting decision-making and learning agents, and ii) a simulation model (TEMMAS) of the electric power market framed in the proposed formulation.

Our initial results reveal an emerging and coherent market behavior, thus inciting us to further extend the experimental setup with additional bidding strategies and to incorporate specific market rules, such as congestion management and pricing regulation mechanisms.

REFERENCES

[1] Berry, C., Hobbs, B., Meroney, W., O'Neill, R., Jr, W.S.: Understanding how market power can arise in network competition: a game theoretic approach. Utilities Policy 8(3) (September 1999) 139–158
[2] Gabriel, S., Zhuang, J., Kiet, S.: A Nash-Cournot model for the North American natural gas market. In: Proceedings of the 6th IAEE European Conference: Modelling in Energy Economics and Policy. (2–3 September 2004)
[3] Schuster, S., Gilbert, N.: Simulating online business models. In: Proceedings of the 5th Workshop on Agent-Based Simulation (ABS-04). (May 3–5 2004) 55–61
[4] Helleboogh, A., Vizzari, G., Uhrmacher, A., Michel, F.: Modeling dynamic environments in multi-agent simulation. JAAMAS 14(1) (2007) 87–116
[5] Boutilier, C., Dearden, R., Goldszmidt, M.: Exploiting structure in policy construction. In: Proceedings of the IJCAI-95. (1995) 1104–1111
[6] Clark, A.: Being there: putting brain, body, and world together again. MIT (1998)
[7] Rao, A., Georgeff, M.: BDI agents: From theory to practice. In: Proceedings of the First International Conference on Multiagent Systems, S (1995) 312–319
[8] Simari, G., Parsons, S.: On the relationship between MDPs and the BDI architecture. In: Proceedings of the AAMAS-06. (May 8–12 2006) 1041–1048
[9] Trigo, P., Coelho, H.: Decision making with hybrid models: the case of collective and individual motivations. International Journal of Reasoning Based Intelligent Systems (IJRIS); Inderscience Publishers (2009)
[10] Watkins, C., Dayan, P.: Q-learning. Mach. Learning 8 (1992) 279–292
[11] Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT P. (1998)
[12] OMIP - The Iberian Electricity Market Operator. online: 'http://www.omip.pt'
[13] Botterud, A., Thimmapuram, P., Yamakado, M.: Simulating GenCo bidding strategies in electricity markets with an agent-based model. In: Proceedings of the 7th Annual IAEE European Energy Conference (IAEE-05). (August 28–30 2005)
[14] Sousa, J., Lagarto, J.: How market players adjusted their strategic behaviour taking into account the CO2 emission costs - an application to the Spanish electricity market. In: Proceedings of the 4th International Conference on the European Electricity Market (EEM-07), Cracow, Poland (May 23–27 2007)
[15] Gómez-Sanz, J., Fuentes-Fernández, R., Pavón, J., García-Magariño, I.: INGENIAS development kit: a visual multi-agent system development environment (BEST ACADEMIC DEMO OF AAMAS'08). In: Proceedings of the Seventh AAMAS, Estoril, Portugal (May 12-16 2008) 1675–1676