<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Texas Hold'em Meets Risk-Aware HTN Planning: A Modelling and Architectural Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ebaa Alnazer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilche Georgievski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Aiello</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Service Computing Department, IAAS, University of Stuttgart</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Games have long served as both a testbed and catalyst for advances in AI. Texas Hold'em stands out as one of the most strategically demanding variants of poker, characterised by high dynamism, uncertainty, and inherent risk, features commonly shared with real-world domains. At its core, every hand requires planning: deciding what action to take in the face of partial information and evolving strategy. This motivates us to propose a new direction for AI poker by framing Texas Hold'em as an AI planning problem. As a first step, we identify the key knowledge traits of the game using a conceptual framework for real-world planning domains, laying the foundation for a principled modelling approach. We then map gameplay elements into the framework of risk-aware HTN planning, which aligns well with modelling and playing poker, as it can effectively capture the game's strategic structure, uncertainty, and risk sensitivity. This approach allows player-specific risk attitudes to inform action choices based on hand strength and opponent behaviour. Building on this formulation, we propose a layered system architecture that enables adaptive decision-making during gameplay by interleaving risk-aware HTN planning, reactive refinement, learning, and execution. Taken together, our proposal represents a research agenda for modelling and solving strategic decision-making problems through risk-aware HTN planning.</p>
      </abstract>
      <kwd-group>
        <kwd>HTN planning</kwd>
        <kwd>Poker</kwd>
        <kwd>Risk-Awareness</kwd>
        <kwd>Uncertainty</kwd>
        <kwd>Domain Modelling</kwd>
        <kwd>Texas Hold'em</kwd>
        <kwd>System Architecture</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>When Beyoncé sings, "This ain’t Texas, ain’t no hold ’em, so lay your cards down...", she likely is not
thinking about poker algorithms. However, the lyrics capture something essential about Texas Hold’em:
the game is a performance of intention and uncertainty, played out in layers of concealed information
and strategy. Behind every hand lies a strategic plan of when to raise, bluff, or fold. This is based
not only on card strength but also on a player’s evolving beliefs about their opponents and the risks
involved. In this sense, playing poker involves more than tactical moves; it requires constructing and
adapting plans in real time, under uncertainty and risk.</p>
      <p>
        Poker falls into the class of imperfect-information games, which serve as valuable domains for AI
research because they reflect real-world decision-making challenges, such as uncertainty. In addition,
a poker agent has to deal with hidden information, stochasticity, possibilities for
deception, and adversarial dynamics. Over the years, the game of poker has attracted attention in AI as
a challenging domain, e.g., [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. This has led to the development of approaches that range from
knowledge-based agents and simulation engines to theoretical equilibrium solutions and exploitative
counter-strategies, culminating in AI-powered poker agents that are on par with professional players [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        The notion of building a plan when playing poker resonates with AI planning, where the objective is
to compute a course of action that achieves a desired outcome. AI planning has already been successfully
applied in games, such as the bridge game [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and video games [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Here, we postulate that
decision-making in poker can likewise be viewed as solving a planning problem. Taking a planning perspective
on poker allows us to frame decision-making not merely as reactive behaviour but as structured,
goal-driven deliberation. This opens the door to applying established methods from AI planning.
      </p>
      <p>
        Inspired by classical approaches that use game tree search for games of strategy (e.g., extensive-form
games [7]), we consider Hierarchical Task Network (HTN) planning [8], an AI planning technique,
particularly relevant: its hierarchical structure mirrors the game tree, enabling it to provide a natural
model for the layered structure of decision-making in poker strategy. HTN planning has proven
effective across many real-world domains, such as games, e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], robotics, e.g., [9], autonomous
vehicles, e.g., [10], cloud computing, e.g., [11], and building automation, e.g., [12]. Our recently
developed risk-aware HTN planning framework [13] enables explicit modelling of risk and uncertainty
through probability distributions over action efects and costs, while preserving the expressiveness and
computational strengths of HTN planning. These features make the framework a particularly suitable
match for capturing the strategic reasoning required under risk and uncertainty in Texas Hold’em.
      </p>
      <p>We propose a novel perspective: viewing Texas Hold’em, particularly No-Limit Texas Hold’em, the
most strategically complex variant, as a risk-aware HTN planning problem. We explore this idea through
two complementary aspects: modelling and system architecture. First, we examine how components of
the game map onto constructs in risk-aware HTN planning, providing first steps toward formalising
poker in this framework. Second, we propose a layered system architecture that interleaves long-term
planning via risk-aware HTN planning, reactive refinement, learning, and execution monitoring.</p>
      <p>Our contributions also include (1) identifying poker’s key traits using an existing conceptual
framework that emphasises realistic aspects in planning domains [14]; (2) mapping these traits to constructs
in risk-aware HTN planning; and (3) designing a layered system architecture that enables adaptive
planning for Texas Hold’em when conceptualised through the lens of uncertainty and risk attitudes.</p>
      <p>The remainder of the paper covers related work (Section 2), background (Section 3), an analysis
of knowledge traits (Section 4), the mapping to risk-aware HTN planning (Section 5), the layered
architecture (Section 6), and conclusion (Section 7).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Despite the research interest in applying AI planning to games, to the best of our knowledge, no
prior work has specifically studied the use of AI planning for poker. Most existing literature either
addresses general game planning or focuses on game types that differ significantly from poker in terms
of structure, information availability, and objectives. Although poker shares certain features with other
games, such as uncertainty, it also presents unique challenges, including hidden information, adversarial
dynamics, and strategic deception, which remain largely unaddressed in planning research for games.</p>
      <p>Moreover, the aspect of risk, particularly the incorporation of risk attitudes into decision-making, is
absent in prior work. Existing approaches do not model or reason about varying attitudes toward risk
when solving games as planning problems, which is a central concern here. We aim to close this gap by
explicitly addressing how risk and risk sensitivity can be integrated into planning for playing poker.</p>
      <p>
        The card game Bridge has served as a successful domain for AI planning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], offering a perspective
similar to ours in viewing gameplay as a planning task. This approach adapts HTN planning to handle
uncertainty using belief functions to estimate unknown card positions and probabilistic reasoning for
opponents’ actions. However, it is domain-specific, meaning a specialised HTN planner was developed
to play the game. In contrast, our proposal is general: it leverages domain-independent HTN planning
engines, supports broader forms of non-determinism, and considers strategic elements like bluffing.
Moreover, our proposal also accounts for adapting plans and learning opponent models.
      </p>
      <p>Other studies on games propose to model uncertainty through expected effects [15]. Although
their focus is on video games, we find this concept relevant and also explore its potential applicability
in the context of poker. Likewise, research on combining multi-criteria evaluation and AI planning
in computer games demonstrates how complex planning problems can be decomposed into smaller,
manageable problems using techniques such as partial order bounding and decomposition search [16].
While the modelling of these planning problems in conjunction with multi-criteria evaluation is not
explored, this line of work presents a compelling approach for managing the strategic trade-offs in poker
and reducing the complexity by locally solving subgames within the larger game structure.</p>
      <p>Some architectural work has laid the groundwork for integrating AI planning with game behaviour.
For example, a three-layer planning architecture was proposed for highly dynamic and uncertain game
environments [17]. While this architecture inspired aspects of our proposed architecture, the overall
work is tailored to a specific video game and does not capture poker complexities and characteristics.</p>
      <p>Other relevant research integrates game-theoretic techniques with AI planning to reason about
opponents’ beliefs and their capacity to disrupt or defeat computed plans [18]. Although this work does
not present game modelling specifically for AI planning, it offers valuable insights into planning under
uncertainty. When combined with planning methods, the presented techniques could enhance an agent’s
ability to handle uncertainty originating from opponent strategies, beliefs, and tactics, and adapt plans
in dynamic game environments accordingly.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Background</title>
      <sec id="sec-3-1">
        <title>3.1. No-Limit Texas Hold’em</title>
        <p>No-Limit Texas Hold’em stands out as the most widely played, popular, and advanced poker variation
from a strategy point of view, evidenced by the experts’ divergent opinions on how to play a specific
hand [19]. It is not only the variant most commonly associated with poker and the standard in casinos, but also
the official ruleset in the World Series of Poker, the most prestigious poker tournament [20]. No-Limit
Texas Hold’em is more aggressive and popular than its limit variant, as players can bet any amount,
including their entire stack, and there is no limit on the number of raises per round.</p>
        <p>Poker is typically played with a standard deck of 52 cards, consisting of combinations of 13 different
ranks (A, K, Q, J, T, 9, 8, 7, 6, 5, 4, 3, 2) and 4 different suits (♠, ♡, ♢, ♣), and can involve two to nine
players. In most tournaments, each player starts with the same amount of chips, which is the currency
used to buy into rounds (or hands) or bet with. If a player loses all their chips, they are eliminated
from the tournament. To win chips, multiple game rounds are played.</p>
        <p>Each hand follows a fixed sequence of five stages (or streets), pre-flop, flop, turn, river, and showdown,
each with a betting round where players choose one of five rule-governed actions: (1) Raise: add chips to the pot, at least twice the current highest bet and with no upper limit, to increase the current bet, thereby forcing opponents to match the bet or fold; valid only when an open bet exists and the minimum raise amount is met. (2) Check: stay in the game without betting; allowed only if no open bet exists and it is not the pre-flop stage. (3) Call: match the highest current bet after a previous bet or raise. (4) Fold: discard the hand and exit the current round; possible at any time. (5) Bet: place chips to initiate wagering, forcing others to call or fold; allowed only when no open bet exists and it is not the pre-flop stage.</p>
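        <p>The legality rules above can be sketched as a small routine that derives the set of legal actions from a betting state. This is an illustrative simplification under assumed state fields (it ignores, for instance, the big blind's pre-flop option), not part of any existing poker library.</p>

```python
# Hypothetical sketch: deriving legal actions from a betting state, following
# the five rules above. BettingState and its fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class BettingState:
    street: str          # "pre-flop", "flop", "turn", or "river"
    current_bet: int     # highest open bet this round (0 if none)
    player_bet: int      # chips this player already committed this round
    stack: int           # chips the player still holds

def legal_actions(s: BettingState) -> set[str]:
    actions = {"fold"}                      # folding is always allowed
    if s.current_bet > 0:                   # an open bet exists
        if s.stack >= s.current_bet - s.player_bet:
            actions.add("call")             # match the highest current bet
        min_raise = 2 * s.current_bet       # at least twice the current bet
        if s.stack + s.player_bet >= min_raise:
            actions.add("raise")
    elif s.street != "pre-flop":            # no open bet, past the pre-flop
        actions.add("check")                # stay in without betting
        actions.add("bet")                  # open the wagering
    return actions
```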
        <p>Each betting round lasts until only one player has not folded their hand, or all players have matched
the bet. If a player goes all-in, betting their entire bankroll (i.e., all their remaining chips), they cannot
bet further but remain eligible to win the pot up to their invested amount.</p>
        <p>A hand begins with the pre-flop stage, where each player is dealt two face-down hole cards. Two
players must post mandatory bets, the Small Blind (SB) and Big Blind (BB), before any cards are dealt.
BB is typically twice the size of SB. These blinds create an initial pot and encourage action. The blinds
increase as the tournament progresses, and the players responsible for posting them, as well as the turn
order, rotate after each round. In pre-flop rounds, the first to act is the player left of the Big Blind; in
other rounds, the SB acts first. Betting then proceeds clockwise.</p>
        <p>After the pre-flop betting round, the flop stage begins, which includes players who have matched
the BB. The dealer places three community cards face-up on the table, and a new betting round starts.
The dealer does not bet but manages dealing, announces all-ins, and calculates hand ranks. After the
flop betting round, the turn stage begins with one community card dealt, followed by a betting round.
Then the river stage deals the final community card, followed by the last betting round. Finally, the
showdown occurs, where players reveal their hands, and the highest-ranked hand wins the pot,
which is split in the event of a tie. Hand rank is determined by the best five cards from the community cards and
the player’s hole cards. Thus, the goal of each player is to form the best five-card hand using the shared
community cards in conjunction with their hole cards at the end of the final betting round.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Risk-Aware HTN Planning</title>
        <p>HTN planning is based on enhancing planning domains with detailed knowledge about how tasks
can be performed [8]. This rich domain knowledge and its intrinsic hierarchical structure allow HTN
planning to naturally reflect human-like decision-making processes. Specifically, HTN planning begins
with an initial state and an initial task network as an objective to be accomplished. It also requires
domain knowledge structured as networks of primitive and compound tasks. In this hierarchy, primitive
tasks can be executed directly by actions, while compound tasks must be decomposed into subtasks
using methods. One way to search for solutions involves decomposing the initial task network and
continuing to do so until only primitive tasks remain. If successful, the result is a plan, which is an
ordered sequence of primitive tasks executable from the initial state.</p>
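        <p>The decomposition process described above can be illustrated minimally; the task and method names below are invented for the poker setting and do not come from any particular planner.</p>

```python
# Minimal illustration of HTN-style decomposition: compound tasks are expanded
# by methods until only primitive tasks remain. All task names are invented.
methods = {
    # compound task -> one possible decomposition into subtasks
    "play-round": ["post-blinds", "play-streets"],
    "play-streets": ["bet-pre-flop", "bet-flop", "bet-turn", "bet-river"],
}

def decompose(task_network):
    """Repeatedly replace the first compound task with its subtasks."""
    plan = []
    stack = list(task_network)
    while stack:
        task = stack.pop(0)
        if task in methods:                 # compound: expand via a method
            stack = methods[task] + stack
        else:                               # primitive: append to the plan
            plan.append(task)
    return plan
```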
        <p>We developed risk-aware HTN planning, a framework that extends classical HTN planning with
constructs that consider the uncertainty of real-world environments [13]. It enables modelling of risk
and uncertainty through probability distributions over actions’ costs and effects, where costs are defined
as unbounded negative functions. This probability distribution ranges from totally known to totally
unknown, depending on the available knowledge about the application domain. When the probability
distribution is totally known, planning decisions have to be made under risk, whereas when it is not
totally known, they have to be made under uncertainty. In an environment of risk and uncertainty,
planning decisions should be made by following some risk attitude, which defines the decision maker’s
mindset towards taking risks. Agents can be risk-seeking, i.e., prefer risky choices and tolerate losses,
risk-averse, i.e., avoid risky choices, or risk-neutral, i.e., indifferent to risk. Risk attitudes are expressed
mathematically using utility functions, which evaluate the action outcomes according to the risk attitude.</p>
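        <p>The three risk attitudes can be sketched as utility functions over monetary outcomes; the exponential form is one standard choice, and the parameter value and example payoffs below are illustrative assumptions rather than part of the framework.</p>

```python
# Hedged sketch of risk attitudes as utility functions, used to rank action
# outcomes by expected utility. Parameter a and the gamble are assumptions.
import math

def utility(x: float, attitude: str, a: float = 0.01) -> float:
    if attitude == "risk-averse":
        return 1.0 - math.exp(-a * x)   # concave: penalises large losses
    if attitude == "risk-seeking":
        return math.exp(a * x) - 1.0    # convex: rewards large gains
    return x                            # risk-neutral: utility = monetary value

def expected_utility(outcomes, attitude: str) -> float:
    """outcomes: list of (probability, payoff) pairs."""
    return sum(p * utility(x, attitude) for p, x in outcomes)

# A 50/50 gamble between winning and losing 100 chips: a risk-averse agent
# values it below the sure outcome 0, a risk-seeking agent above it.
gamble = [(0.5, 100.0), (0.5, -100.0)]
```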
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Characterising Texas Hold’em as a Planning Domain</title>
      <p>Treating Texas Hold’em as an AI planning problem requires capturing its relevant aspects as domain
knowledge, which is then formalised into a planning domain. Incomplete or inaccurate knowledge
can result in domain models that misrepresent the application domain [21], leading to plans that fail
real-world execution. We address this by applying a conceptual framework that supports identifying
and categorising aspects of real-world planning domains [14]. We discuss the game knowledge traits in
terms of objectives, tasks, quantities, non-determinism, behaviour, constraints, and qualities.</p>
      <p>Objectives. At the highest level, a Texas Hold’em player’s hard goal is to maximise long-term profit
across many hands, with the subgoal of winning each individual hand. Both goals are quantitative.</p>
      <p>Tasks. Texas Hold’em involves a hierarchy of tasks with varying levels of complexity. At the
highest level, playing the game is a complex, long-term task composed of repeated iterations of playing
individual rounds. Each round consists of sub-tasks corresponding to the different betting stages
(pre-flop, flop, turn, river), which further decompose into finer-grained tasks. For example, before
each round begins, a player must check their table position to determine whether they are required
to post BB or SB. Within each stage of a round, a central task is selecting an action (folding, calling,
betting, or raising), which depends on the current game state (e.g., position, stack size, community
cards, opponents’ actions) and the strategic framework the player adopts (e.g., exploitative versus
equilibrium-based play, bluff frequency, risk tolerance). In addition, tasks in the game can often be
accomplished in multiple ways. For example, the task of determining which action to take can vary
depending on the player’s strategy, their tendency to bluff, and their use of deception.</p>
      <p>Quantities. Texas Hold’em contains several quantities that should be considered when representing
the game knowledge. The quantities can be resources, such as the money and chips of each player, costs
of the actions performed by the player or the opponents, i.e., the winnings and losses, the expected
utility of each action computed based on the player’s risk attitude, and environmental inputs, such as the
number of players, stack size, pot, amount to call, amount to raise, BB, and SB. The computation of
some quantities is done using linear functions, such as the computation of the expected utility, number
of players, and pot size, or non-linear functions, such as the utility of a certain action outcome using the
utility function of the player, probability of at least one player calling or raising, and evaluation metrics
to assess hand strength (e.g., Immediate Hand Strength, Hand Potential, and Effective Hand Strength),
and to assess the pot odds for each bet (e.g., Expressed Pot Odds and Implied Pot Odds [22]).</p>
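        <p>As one concrete example of these quantities, expressed pot odds can be computed as the fraction of the final pot a player must contribute to call; the break-even reading below is our own framing of the metric, not a quotation from [22].</p>

```python
# Illustrative computation of expressed pot odds, one of the metrics named
# above. Function name and interpretation are our own framing.
def expressed_pot_odds(pot: float, to_call: float) -> float:
    """Return the break-even equity needed to call: call / (pot + call)."""
    return to_call / (pot + to_call)

# With 90 chips in the pot and 10 to call, the call breaks even whenever
# the hand wins more than 10% of the time.
```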
      <p>Non-determinism. One of the key aspects of Texas Hold’em is the presence of uncertainty, which
can lead to risky situations, where the probability distribution of the outcome is known or can be
inferred statistically [13], and/or uncertain situations, where the probability distribution
is partially known, unknown, or unknowable. There are mainly two sources of uncertainty in this
game. The first is related to the inherent randomness of the game, introduced by the shuffled deck
and the stochastic distribution of cards. This is often modelled as a chance player, a concept used in
game-theoretic formulations to represent non-deterministic events such as the dealing of cards [23].
The probabilities associated with these events are fully known and calculable, based on combinatorial
analysis. For example, the probability of being dealt a specific pocket pair, such as two Aces, in Texas
Hold’em is calculated using the formula C(4,2) / C(52,2) = 6/1326 ≈ 0.452%. This kind of probabilistic reasoning
also applies to other scenarios, such as the odds of the opponent having a stronger hand, the probability
of improving the hand after the flop, such as completing a straight draw, or the probability of specific
cards appearing in the game’s stages. The other source of uncertainty is related to the hidden or private
information, such as the opponent’s hand, their strategies and risk attitudes. Because poker is a game
of imperfect information, players cannot directly observe opponents’ hands. However, by interpreting
betting patterns and timing, a skilled player can infer likely hand strength. For example, if an opponent
consistently bets and raises aggressively, we might infer that they hold a strong hand. However, such
inferences are strategically complex because opponents are aware that their actions convey information.
Thus, they may bluff, intentionally acting strong (e.g., raising with a weak hand) to deceive other players
into folding better hands. This interplay of belief modelling and deception introduces uncertainty.
An additional and more technical source of uncertainty arises when considering poker-playing AI
agents. Unlike deterministic systems that always choose the single best action, advanced agents typically
compute a probability distribution over legal actions in a given game state. Even if raising has the highest
expected value, the agent may occasionally choose to call or fold with a small probability. This form of
action randomisation is intentional and serves to reduce exploitability by avoiding predictable patterns.
These uncertainties in the game create a situation where there is always a degree of risk involved in
every hand, as players must weigh the potential losses and gains and make decisions accordingly. As a
result, these uncertainties make the consequences and costs of the player’s actions non-deterministic, ranging
from known to unknown probability distributions of consequences, depending on the player’s knowledge.</p>
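        <p>The combinatorial calculation above can be checked directly: two Aces are one of C(4,2) Ace pairs out of C(52,2) possible two-card hands.</p>

```python
# Verifying the pocket-pair probability: C(4,2) / C(52,2) = 6 / 1326 ≈ 0.452%.
from math import comb

p_pocket_aces = comb(4, 2) / comb(52, 2)   # 6 / 1326
```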
      <p>Behaviour. Players in Texas Hold’em have risk attitudes (or risk tolerance), where some players
are risk-loving, while others are risk-averse. Multiple works study the dynamics of the player’s risk
attitude during the game. For example, in [24], it is shown that within a reference point of wealth,
players are risk-averse and the risk aversion decreases as wealth moves away from the reference point
in either direction, while in [25], it is shown that players have lower marginal utility for additional
wealth when they are wealthy than when they are poor. Although poker is not typically a cooperative
game, the trust a player has in opponents can influence their decisions. For example, in [26], it is shown
that players fold more often when their opponent’s face appears trustworthy, showing how visual cues
shape trust in decisions.</p>
      <p>Constraints. Most tasks in the game have ordering constraints for their execution, i.e., are
sequentially dependent. That is, they are totally ordered and certain decisions (e.g., posting blinds) must occur
before others (e.g., choosing an action), and this order must be respected to maintain the game rules.</p>
      <p>Qualities. When automating the planning process of a Texas Hold’em player, it is important that the
behaviour of the planning system is explainable regarding three perspectives: the formation of domain
knowledge, the formation of plans computed in the game, and the planning process itself.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Mapping Texas Hold’em to Risk-Aware HTN Planning</title>
      <p>With the key components of Texas Hold’em identified as planning traits, we examine how gameplay can
be modelled using risk-aware HTN planning, a framework well-suited to capture the game’s uncertainty,
probabilistic outcomes, and strategies, while supporting decision-making under varying risk attitudes.</p>
      <p>We begin by mapping the diferent components of the Texas Hold’em domain to corresponding
constructs in risk-aware HTN planning. Table 1 summarises this mapping, including examples from
gameplay. Generally, the tasks performed in the game can be modelled as compound tasks and operators
in HTN planning. Game actions, such as call, fold, and check, can be modelled as operators, with
preconditions governing the legality of game rules, i.e., allowing the execution of only valid actions in
a specific game state (e.g., street). More complex game actions, such as raising, can be modelled as a
compound task that can be decomposed by multiple methods, based on variables, such as the amount
to raise. Here, we can apply abstraction, a common concept used in AI poker to minimise the number
of possible actions that can be performed (e.g., [27, 28]). Consider the case of raise amount, where bet
sizes are always relative, typically measured against either the current pot size or the size of the blinds.
For instance, a bet of 100 chips may be considered large when the pot is only 10 chips, but the same
amount would be negligible if the pot has already grown to 10,000 chips. Due to this relativity, we can
categorise our raising options into multiple abstract levels, where the chip amounts are determined
relative to the current pot size, allowing the agent to scale its raises appropriately with the game state.</p>
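        <p>The pot-relative abstraction described above can be sketched as a bucketing function; the bucket names and thresholds are illustrative assumptions, not taken from any cited abstraction scheme.</p>

```python
# Sketch of pot-relative raise abstraction: concrete chip amounts are bucketed
# into a few abstract raise levels scaled by the pot. Thresholds are assumed.
def abstract_raise_level(raise_amount: float, pot: float) -> str:
    ratio = raise_amount / pot
    if ratio < 0.5:
        return "small"        # under half-pot
    if ratio < 1.0:
        return "medium"       # between half-pot and full pot
    if ratio < 2.0:
        return "large"        # up to twice the pot
    return "overbet"          # more than twice the pot

# 100 chips is an overbet into a 10-chip pot but small into a 10,000-chip pot.
```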
      <p>Other decision-making processes in the game, such as determining which action to take, can also be
modelled as compound tasks. These tasks may be decomposed using multiple methods, each governed
by a set of preconditions that not only reflect the rules of the game but also incorporate strategic
factors, such as bluffing and deception. For instance, the compound task for selecting an action can be
decomposed via a specific method triggered when the agent intends to bluff. The decision of which
method to apply can be dynamically selected during planning, depending on the agent’s current strategy,
potentially encoded in the game state, and its risk attitude. Additionally, strategic metrics (e.g., effective
hand strength, expressed pot odds) that are computed to determine an action to be taken can be
encoded in the preconditions of operators and methods. While these metrics may need to be computed
dynamically via external functions, they play a key role in action selection.</p>
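        <p>To make the method-selection idea concrete, a hypothetical applicability check for a choose-action compound task might look as follows; the state keys, method names, and thresholds are all invented for illustration.</p>

```python
# Hypothetical sketch of decomposing a 'choose-action' compound task via
# methods guarded by preconditions, including a bluffing method.
def choose_action_methods(state: dict) -> list[str]:
    """Return names of applicable decomposition methods, in preference order."""
    applicable = []
    if state["hand_strength"] >= 0.8:
        applicable.append("value-raise")        # strong hand: raise for value
    if state["hand_strength"] < 0.3 and state["intends_bluff"]:
        applicable.append("bluff-raise")        # weak hand, bluffing strategy
    if state["hand_strength"] >= state["pot_odds"]:
        applicable.append("call")               # equity justifies the call
    applicable.append("fold")                   # folding is always a fallback
    return applicable
```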
      <p>Based on the logical or temporal structure of the poker game, ordering constraints between the tasks
modelling the game knowledge can be modelled using totally, partially, or unordered task networks [8].</p>
      <p>The non-deterministic aspects of Texas Hold’em can be represented in several ways, depending
on the nature of the uncertainty involved. When the underlying probabilities are fully known, such
as the likelihood of being dealt specific hole cards, they can be represented as chance events with
probabilistic outcomes, often modelled via a chance player whose actions introduce stochastic effects
into the environment. When the uncertainty is related to hidden information, such as the true value of
the pot or the opponent’s hand strength, modelling is more complex. Here, estimated metrics, such
as expressed pot odds and implied pot odds, can inform probabilistic estimates and be modelled as
preconditions of operators/methods. These estimates affect the operators’ effects and costs (in terms of
wins and losses) and make them variable. This modelling allows the agent to reason under uncertainty
even when outcomes are not strictly random but instead depend on incomplete or inferred information.</p>
      <p>Player behaviour is captured via a utility function, which evaluates outcomes in terms of expected
utility based on the agent’s risk attitude. This utility function is a component of the risk-aware HTN
planning formulation, enabling the computation of plans with the highest expected utility. Such plans
reflect an agent’s individual preferences toward risk when playing poker.</p>
      <p>Finally, the initial task network contains the overarching gameplay objective, such as playing a
game of multiple rounds or playing a single round, while the initial state includes contextual game information
like pot size, stack, SB, BB, position of the player around the table, current street, and possibly the state
of other opponents regarding their aggression levels and known strategies. Together, these elements
define the planning problem instance that governs the agent’s decision-making throughout the game.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Layered Planning Architecture for Adaptive Play</title>
      <p>The primary challenge in developing approaches that model and solve Texas Hold’em as a planning
problem lies in the highly dynamic nature of the game, which makes long-term planning non-trivial.
Long-term planning has to be based on estimations of opponents’ behaviour, hand strengths, pot
assessments, hole card assessments, and the likely future flow of the game, all of which are inherently
uncertain. As a result, preconditions assumed during planning may no longer hold during execution,
making long-term plans fail to match the actual flow and game events. This mismatch between planned
and actual (game) trajectories is a well-known challenge in dynamic environments.</p>
      <p>This challenge is commonly addressed by interleaving planning and execution [29]. If execution
fails due to unforeseen events, an execution module can trigger plan repair or replanning [30, 17,
31]. In rapidly changing environments like poker, such failures may occur frequently, leading to
repeated replanning. This not only imposes a high computational cost but can also slow down the agent’s
responsiveness. To mitigate this, some approaches do not replan starting from the initial task network,
but opt for localised planning, that is, plan repair or replanning from the point of failure [31].
      <p>To address this primary challenge in Texas Hold’em, we propose a layered architecture, inspired by
prior work in game environments [17]. Figure 1 provides an overview of the proposed architecture.
At the top is the long-term planning layer represented by risk-aware HTN planning. This layer is
responsible for generating abstract plans with the highest expected utility that may include compound
tasks and benefit from knowledge abstraction. Below that is a reactive planning layer responsible
for plan refinement at runtime, enabling adaptation to evolving game conditions. That is, the layer
interprets abstract plans based on current context or failures and adjusts plans dynamically while still
aiming for plans with the highest expected utility. Reactive techniques, such as Behaviour Trees [32]
and Monte Carlo Tree Search [33], are applicable here. The architecture contains two other layers: the
execution and learning layers. The execution layer interacts directly with the game environment. It
executes actions from refined plans and continuously monitors the game state. If an action fails, for
example due to invalid preconditions, the execution layer triggers the reactive planning layer to revise the
action or the compound task it belongs to. For instance, suppose raise is a high-level task and the
reactive planning layer specifies raising a certain number of chips. If the specified raise is no longer
valid because an intervening player has raised higher, the reactive planning layer must adapt
and generate a valid raise. For more substantial deviations or failures, such as when raising is no longer
a valid option, the player busts, or a fundamental strategy change is required, the reactive planning
layer escalates the issue to the long-term planning layer for repair or recomputation of the abstract
plan. Importantly, this process does not need to start from scratch; the system can retain and leverage
search space traversal history and resume from the point of failure.</p>
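      <p>The escalation path just described can be sketched as follows; the class and field names are hypothetical stand-ins for the reactive and long-term layers, and the doubling rule is only a placeholder raise policy.</p>

```python
class ReactivePlanner:
    """Refines abstract tasks into concrete actions, adapting to context."""
    def refine(self, task, ctx):
        if task == "raise":
            amount = 2 * ctx["current_bet"]   # placeholder min-raise policy
            if amount > ctx["stack"]:
                return None                   # no valid raise: escalate
            return ("raise", amount)
        return (task, None)

class LongTermPlanner:
    """Stands in for the risk-aware HTN layer: recomputes the abstract plan."""
    def replan(self, ctx):
        return ["call"] if ctx["stack"] >= ctx["current_bet"] else ["fold"]

def decide(task, ctx, reactive, longterm):
    """Try reactive refinement first; escalate on substantial failure."""
    refined = reactive.refine(task, ctx)
    if refined is not None:
        return refined
    abstract_plan = longterm.replan(ctx)      # repair/recompute upstream
    return reactive.refine(abstract_plan[0], ctx)
```

      <p>When an opponent’s higher raise makes the refined raise exceed the stack, decide falls through to the long-term layer, which replaces the abstract plan (here, with a call) before refinement resumes.</p>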
      <p>
        The learning layer is concerned with opponent modelling. Integrating it into the planning loop
enables the agent to adapt its strategies based on observed opponent behaviour. Learning from patterns,
such as fold frequency or bluffing tendencies, can improve AI agents playing poker [
        <xref ref-type="bibr" rid="ref1">1, 34</xref>
        ]. Building
opponent models through learning allows agents to act exploitatively [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Embedding these models
in the planning layers lets the agent adjust task and method selection based on inferred strategies. Moreover,
repeated interaction can inform not only strategy but also risk attitude: if high-risk play consistently
fails against certain opponents, the agent can adapt its utility function to favour lower-risk decisions.
Such changes in the risk attitude are propagated to the reactive planning layer, which forwards them to
long-term planning whenever replanning is required. Thus, integrating learning, such as reinforcement
learning, enables both convergence to equilibrium strategies and dynamic adjustment of risk sensitivity.
      </p>
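      <p>One concrete way to realise such risk sensitivity, assuming an exponential utility model (a standard choice in expected-utility theory, not one prescribed here), is to score actions by expected utility under a tunable risk-aversion parameter and nudge that parameter after repeated losses:</p>

```python
import math

def utility(payoff, risk_aversion):
    """Exponential utility: positive risk_aversion is risk-averse,
    negative is risk-seeking, zero reduces to the risk-neutral payoff."""
    if risk_aversion == 0:
        return payoff
    return (1.0 - math.exp(-risk_aversion * payoff)) / risk_aversion

def expected_utility(outcomes, risk_aversion):
    """outcomes: iterable of (probability, payoff) pairs."""
    return sum(p * utility(x, risk_aversion) for p, x in outcomes)

def adapt_risk(risk_aversion, lost_high_risk_play, step=0.005):
    """Placeholder update rule: shift towards risk aversion after a
    high-risk play fails, relax slightly after it succeeds."""
    return risk_aversion + step if lost_high_risk_play else risk_aversion - step
```

      <p>Under this model, a coin-flip raise worth plus or minus 100 chips scores below a sure call of 0 for a risk-averse agent and above it for a risk-seeking one, so learned shifts in the risk-aversion parameter directly change action choice.</p>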
      <p>We further treat estimation tasks (e.g., pot assessment, hand strength estimation) as
distinct components, which can also be seen as systems external to the architecture. These systems may
feed context updates into the reactive planning and learning layers. If that new context significantly
alters the game dynamics, these layers may prompt the long-term planning layer to initiate replanning.</p>
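      <p>As an example of such an estimation component, hand strength against a single random opponent can be approximated by Monte Carlo rollouts. The sketch below assumes a standard 52-card deck encoded as (rank, suit) pairs; it is a baseline estimator, not a tuned equity calculator.</p>

```python
import random
from collections import Counter
from itertools import combinations

DECK = [(r, s) for r in range(13) for s in range(4)]   # 0 = deuce ... 12 = ace

def rank5(cards):
    """Rank a 5-card hand as a comparable tuple (category, tiebreakers)."""
    ranks = [r for r, _ in cards]
    counts = Counter(ranks)
    ordered = [r for r, c in sorted(counts.items(), key=lambda x: (-x[1], -x[0]))]
    flush = len({s for _, s in cards}) == 1
    distinct = sorted(set(ranks))
    straight = len(distinct) == 5 and distinct[-1] - distinct[0] == 4
    if set(ranks) == {12, 0, 1, 2, 3}:                 # ace-low straight (wheel)
        straight, ordered = True, [3, 2, 1, 0, -1]
    shape = sorted(counts.values(), reverse=True)
    if straight and flush: cat = 8
    elif shape == [4, 1]: cat = 7
    elif shape == [3, 2]: cat = 6
    elif flush: cat = 5
    elif straight: cat = 4
    elif shape == [3, 1, 1]: cat = 3
    elif shape == [2, 2, 1]: cat = 2
    elif shape == [2, 1, 1, 1]: cat = 1
    else: cat = 0
    return (cat, ordered)

def best7(cards):
    """Best 5-card rank achievable from 7 cards."""
    return max(rank5(c) for c in combinations(cards, 5))

def hand_strength(hole, board, trials=500):
    """Estimated probability of beating one random opponent at showdown."""
    rest = [c for c in DECK if c not in hole + board]
    wins = 0.0
    for _ in range(trials):
        drawn = random.sample(rest, 2 + (5 - len(board)))
        opp, runout = drawn[:2], drawn[2:]
        ours, theirs = best7(hole + board + runout), best7(opp + board + runout)
        wins += 1.0 if ours > theirs else 0.5 if ours == theirs else 0.0
    return wins / trials
```

      <p>Such an estimator would feed its context updates (here, a scalar in [0, 1]) into the reactive planning and learning layers, as described above.</p>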
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We propose a new direction for AI poker research by framing No-Limit Texas Hold’em as a risk-aware
HTN planning problem. This perspective leverages the game’s strategic depth and inherent uncertainty,
offering a structured approach to decision-making that accounts for win/loss outcomes and risk attitudes.
Our work introduces this framing and lays the groundwork for modelling and solving poker through
risk-aware HTN planning. We began by identifying the key knowledge characteristics of No-Limit
Texas Hold’em using a conceptual framework for real-world planning domains. These features informed
our mapping of the game’s components into constructs within the risk-aware HTN planning framework.
Building on this, we proposed a layered architecture that supports adaptive decision-making during
play by interleaving long-term planning, reactive refinement, learning, and execution monitoring.</p>
      <p>Our proposal opens several avenues for further investigation. One question is which game aspects
can and cannot be effectively modelled within risk-aware HTN planning. For example, opponent
models and strategies could be learned, but their outputs should still be integrated into planning
by encoding them in the initial state and in the preconditions of actions and methods. Another area of interest is
the practical realisation of the proposed framework. We are working on encoding a domain model for
No-Limit Texas Hold’em and implementing the layered planning architecture.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
<p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[7] T. Davis, K. Waugh, M. Bowling, Solving large extensive-form games with strategy constraints, in: AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 1861–1868.</p>
      <p>[8] I. Georgievski, M. Aiello, HTN planning: Overview, comparison, and beyond, Artificial Intelligence 222 (2015) 124–156.</p>
      <p>[9] M. Weser, D. Off, J. Zhang, HTN robot planning in partially observable dynamic environments, in: IEEE International Conference on Robotics and Automation, 2010, pp. 1505–1510.</p>
      <p>[10] E. Alnazer, I. Georgievski, N. Prakash, M. Aiello, A role for HTN planning in increasing trust in autonomous driving, in: IEEE International Smart Cities Conference, 2022, pp. 1–7.</p>
      <p>[11] I. Georgievski, F. Nizamic, A. Lazovik, M. Aiello, Cloud ready applications composed via HTN planning, in: IEEE International Conference on Service-Oriented Computing and Applications, 2017, pp. 23–33.</p>
      <p>[12] I. Georgievski, T. A. Nguyen, F. Nizamic, B. Setz, A. Lazovik, M. Aiello, Planning meets activity recognition: Service coordination for intelligent buildings, Pervasive and Mobile Computing 38 (2017) 110–139.</p>
      <p>[13] E. Alnazer, I. Georgievski, M. Aiello, Risk awareness in HTN planning, arXiv preprint arXiv:2204.10669, 2022.</p>
      <p>[14] E. Alnazer, I. Georgievski, Understanding real-world AI planning domains: A conceptual framework, in: SummerSOC, Communications in Computer and Information Science, 2023, pp. 3–23.</p>
      <p>[15] T. Humphreys, Exploring HTN planners through example, in: Game AI Pro 360: Guide to Architecture, CRC Press, 2019, pp. 103–122.</p>
      <p>[16] M. Muller, Multicriteria evaluation in computer game-playing, and its relation to AI planning, in: International Conference on AI Planning &amp; Scheduling, 2002, p. 1.</p>
      <p>[17] X. Neufeld, Long-term Planning and Reactive Execution in Highly Dynamic Environments, Ph.D. thesis, Otto von Guericke University Magdeburg, 2020.</p>
      <p>[18] R. Vane, P. E. Lehner, Hypergames and AI in automated adversarial planning, in: DARPA Planning Workshop, 1990, pp. 198–206.</p>
      <p>[19] D. Sklansky, M. Malmuth, Hold’em Poker for Advanced Players, Two Plus Two Publishing, 1999.</p>
      <p>[20] S. Braids, The Intelligent Guide to Texas Hold’em Poker, Chartley Publishing LLC, 2010.</p>
      <p>[21] T. S. Vaquero, J. R. Silva, F. Tonidandel, J. C. Beck, itSIMPLE: Towards an integrated design system for real planning applications, The Knowledge Engineering Review 28 (2013) 215–230.</p>
      <p>[22] D. Billings, A. Davidson, J. Schaeffer, D. Szafron, The challenge of poker, Artificial Intelligence 134 (2002) 201–240.</p>
      <p>[23] M. Bowling, N. Burch, M. Johanson, O. Tammelin, Heads-up limit hold’em poker is solved, Science 347 (2015) 145–149.</p>
      <p>[24] G. Smith, M. Levere, R. Kurtzman, Poker player behavior after big wins and big losses, Management Science 55 (2009) 1547–1555.</p>
      <p>[25] M. Rabin, Risk aversion and expected-utility theory: A calibration theorem, in: Handbook of the Fundamentals of Financial Decision Making: Part I, World Scientific, 2013, pp. 241–252.</p>
      <p>[26] E. J. Schlicht, S. Shimojo, C. F. Camerer, P. Battaglia, K. Nakayama, Human wagering behavior depends on opponents’ faces, PLoS ONE 5 (2010) e11663.</p>
      <p>[27] M. Johanson, N. Burch, R. Valenzano, M. Bowling, Evaluating state-space abstractions in extensive-form games, in: International Conference on Autonomous Agents and Multi-Agent Systems, 2013, pp. 271–278.</p>
      <p>[28] M. Moravčík, M. Schmid, N. Burch, et al., DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, Science 356 (2017) 508–513.</p>
      <p>[29] I. Georgievski, M. Aiello, Automated planning for ubiquitous computing, ACM Computing Surveys 49 (2016) 63:1–63:46.</p>
      <p>[30] D. J. Soemers, M. H. Winands, Hierarchical task network plan reuse for video games, in: IEEE Conference on Computational Intelligence and Games, 2016, pp. 1–8.</p>
      <p>[31] Y. Bansod, D. Nau, S. Patra, M. Roberts, Integrating planning and acting with a re-entrant HTN planner, in: ICAPS Workshop on Hierarchical Planning, 2021, pp. 22–54.</p>
      <p>[32] G. Robertson, I. Watson, Building behavior trees from observations in real-time strategy games, in: IEEE International Symposium on Innovations in Intelligent Systems and Applications, 2015, pp. 1–7.</p>
      <p>[33] M. Chung, Monte Carlo planning in RTS games, 2005.</p>
      <p>[34] S. Ganzfried, T. Sandholm, Potential-aware imperfect-recall abstraction with earth mover’s distance in imperfect-information games, in: AAAI Conference on Artificial Intelligence, volume 28, 2014.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Johanson</surname>
          </string-name>
          ,
          <article-title>Robust strategies and counter-strategies: Building a champion level computer poker player</article-title>
          ,
          <source>Ph.D. thesis</source>
          , University of Alberta,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Beattie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nicolai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gerhard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Hilderman</surname>
          </string-name>
          ,
          <article-title>Pattern classification in no-limit poker: A head-start evolutionary approach</article-title>
          ,
          <source>in: CCSCSI on Advances in AI</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>204</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gilpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sandholm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Sørensen</surname>
          </string-name>
          ,
          <article-title>A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs</article-title>
          ,
          <source>in: International Joint Conference on Autonomous Agents and Multiagent Systems</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>911</fpage>
          -
          <lpage>918</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <article-title>Computer poker: A review</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>175</volume>
          (
          <year>2011</year>
          )
          <fpage>958</fpage>
          -
          <lpage>987</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Throop</surname>
          </string-name>
          ,
          <article-title>Computer bridge: A big win for AI planning</article-title>
          ,
          <source>AI Mag</source>
          .
          <volume>19</volume>
          (
          <year>1998</year>
          )
          <fpage>93</fpage>
          -
          <lpage>93</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Botea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koenig</surname>
          </string-name>
          ,
          <article-title>Offline planning with hierarchical task networks in video games</article-title>
          ,
          <source>in: AAAI Conference on AI and Interactive Digital Entertainment</source>
          , volume
          <volume>4</volume>
          ,
          <year>2008</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>