=Paper=
{{Paper
|id=Vol-2419/paper8
|storemode=property
|title=Computational Strategies for the Trustworthy Pursuit and the Safe Modeling of Probabilistic Maintenance Commitments
|pdfUrl=https://ceur-ws.org/Vol-2419/paper_8.pdf
|volume=Vol-2419
|authors=Qi Zhang,Edmund Durfee,Satinder Singh
|dblpUrl=https://dblp.org/rec/conf/ijcai/ZhangDS19
}}
==Computational Strategies for the Trustworthy Pursuit and the Safe Modeling of Probabilistic Maintenance Commitments==
Qi Zhang, Edmund Durfee, Satinder Singh
Computer Science and Engineering, University of Michigan
{qizhg,durfee,baveja}@umich.edu

===Abstract===

Most research on probabilistic commitments focuses on commitments to achieve conditions for other agents. Our work reveals that probabilistic commitments to instead maintain conditions for others are surprisingly different from their achievement counterparts, despite strong semantic similarities. We focus on the question of how the commitment recipient should model the provider's effect on the recipient's local environment, with only imperfect information being provided in the commitment specification. Our theoretical analyses show that we can more tightly bound the inefficiency of this imperfect modeling for achievement commitments than for maintenance commitments. We empirically demonstrate that probabilistic maintenance commitments are qualitatively more challenging for the recipient to model, and addressing the challenges can require the provider to adhere to a more detailed profile and sacrifice flexibility.

===1 Introduction===

Safe cooperative behavior among humans is often realized via social commitments that constrain people to acting reliably. By making a commitment, a person promises to act in a manner to fulfill it. This form of commitment-based interaction also exists among autonomous agents. To build safe artificial multiagent systems, we need trustworthy and reliable mechanisms that are accountable for pursuing and modeling agent-based commitments. Adopting a decision-theoretic framework, this paper formulates and studies problems that arise in managing commitments in multiagent systems.

In multiagent systems, agents are often interdependent in the sense that what one agent does can be beneficial or harmful to another. If trust means having confidence that another will act so as to reciprocate help in a safe and reliable manner, then being trustworthy—worthy of such trust—constrains the agent to acting thusly. To persuade an agent designer to create trustworthy agents, other agents (individually and/or collectively) can form and share opinions about agents' trustworthiness, and won't act to benefit agents with a bad reputation. Our work assumes that the designer has been persuaded. Even so, however, it isn't always clear how to create a trustworthy agent, given that an agent often lacks complete control over its environment. Specifically, the form of interdependency we focus on is with respect to a scenario where an agent (the commitment provider) makes a social commitment [Singh, 1999; Kalia et al., 2014] to another (the commitment recipient). When stochasticity is inherent in the environment, the provider cannot guarantee to bring about the outcomes that the recipient expects [Kwiatkowska et al., 2007; Nuzzo et al., 2019], and in fact could discover after making the commitment that how it planned to try to bring about the outcomes would be more costly or risky than it had previously realized. Given that the recipient is unable to predict precisely the future situations it will face, it's also unclear how it should model the commitment safely and effectively.

There exists work focusing on semantics and mechanisms for an agent to follow such that it is assured of faithfully pursuing its commitments despite the uncertainty [Jennings, 1993; Xing and Singh, 2001; Winikoff, 2006; Durfee and Singh, 2016]. Previous work articulated the perspective that such a probabilistic commitment should be considered fulfilled if the provider's actions would have brought about the desired outcome with a high enough expectation, even if in a particular instance the desired outcome was not realized. That is, the provider acted in good faith. Thus, even if the provider changes its course of action as it learns more about costs and risks on the fly, it can still fulfill its commitment if whatever course of action it pursued could be expected to achieve the desired outcome with at least the promised likelihood. With this perspective, previous work has focused largely on commitments of achievement [Xuan and Lesser, 1999; Maheswaran et al., 2008; Witwicki and Durfee, 2009; Zhang et al., 2016; Zhang et al., 2017], which we also call enablement commitments, where the provider commits to changing some features of the state in a way desired by the recipient with some probability by some time. For example, the recipient plans to take an action (e.g., move from one room to another) with a precondition (e.g., the door separating them is open) that it is counting on the provider to enable.

This paper focuses on another form of commitment, which we refer to as a maintenance commitment, where instead of committing to some course of action that in expectation will enable conditions the recipient wants, the provider instead commits to courses of action to probabilistically avoid changing conditions that are already the way the recipient wants them maintained, up until a particular time. After that time, the condition cannot be assumed to remain unchanged, and before that time, there is a (usually small) probability it could be changed at any point. For example, an open door the recipient needs might be initially achieved, but as the provider opens and closes other doors during its housekeeping tasks, a resulting draft could close the door the recipient needs open. The provider could plan its tasks to postpone altering the riskiest doors as long as possible, but an ill-placed breeze could close the door at any time.

Our claim is that decision-theoretic mechanisms for representing and reasoning about enablement commitments cannot straightforwardly apply to maintenance commitments because, despite strong superficial similarities, the two types of commitments are fundamentally different. We will substantiate this claim analytically and empirically. This in turn raises the question of whether modifications could be made to existing decision-theoretic mechanisms for representing and reasoning about enablement commitments so that they can be applied to maintenance commitments. We will show empirical results that cast doubt on whether this is possible, thus suggesting that in the future a different treatment of maintenance commitments should be considered.

===2 Preliminaries===

In this section, we describe the decision-theoretic setting we adopt for analyzing probabilistic commitments, including both enablement commitments and maintenance commitments. We review the prior work on the definition and semantics of enablement commitments, which can be extended to maintenance commitments.

The recipient's environment is modeled as an MDP defined by the tuple M = (S, A, P, R, H, s0), where S is the finite state space, A is the finite action space, P : S × A → ∆(S) (∆(S) denotes the set of all probability distributions over S), R : S × A × S → ℝ is the reward function, H is the finite horizon, and s0 is the initial state. The state space is partitioned into disjoint sets by the time step, S = S_0 ∪ S_1 ∪ … ∪ S_H, where states in S_h only transition to states in S_{h+1}. The MDP starts in s0 and terminates in S_H. Given a policy π : S → A and starting in the initial state, a random trajectory is generated by a_h = π(s_h), s_{h+1} ∼ P(s_h, a_h), r_h = R(s_h, a_h, s_{h+1}) for h = 0, …, H−1. The value function of π is V^π_M(s) = E[Σ_{h'=h}^{H−1} r_{h'} | π, s_h = s], where h is such that s ∈ S_h. There exists an optimal policy in M, denoted as π*_M, and its value function maximizes V^π_M for all s ∈ S and is abbreviated as V*_M. The value of the initial state is abbreviated as v^π_M := V^π_M(s0).

As one way to model the interaction between the provider and the recipient [Witwicki and Durfee, 2010; Zhang et al., 2016], we assume that the recipient's state can be factored as s = (l, u), where l is the set of all the recipient's state features locally controlled by the recipient, and u is the state feature shared with the provider. The provider and the recipient are weakly coupled in the sense that u is the only shared state feature and is only controllable by the provider. Formally, the dynamics of the recipient's state can be factored as

: P(s′|s, a) = P((l′, u′)|(l, u), a) = P_u(u′|u) P_l(l′|(l, u), a).

We assume the recipient's cumulative reward can be expressed in the trajectory of l:

: R(s, a, s′) = R(s′) = R((l′, u′)) = R(l′).

Note that though the value of u does not directly determine the reward, it does affect the value of l′ at the next time step. Throughout, we refer to P_u as the true profile of u, which is fully determined by the provider's policy.

====2.1 Commitment Semantics====

An enablement or maintenance commitment is concerned with state feature u, which is shared by both agents but only controllable by the provider. Intuitively, a commitment provides partial information about P_u from which the recipient can plan accordingly. We will refer to u as the commitment feature. In this paper, we focus on the setting where the value of u is binary, letting u+, as opposed to u−, be the value of u that is desirable for the recipient. Further, we assume that u can be toggled at most once. Prior work making this assumption includes [Hindriks and van Riemsdijk, 2007; Witwicki and Durfee, 2009; Zhang et al., 2016]. In transactional settings, a feature (e.g., possession of goods) changing once is common. It is also common in multiagent planning domains where one agent establishes a precondition needed by an action of another. Some cooperative agent work requires agents to return changed features to prior values (e.g., shutting the door after opening and passing through it). And in the extreme case where toggling reliably repeats frequently (e.g., a traffic light), there may be no need for explicit commitments. More generally, while removing this assumption can complicate the specification of a commitment (e.g., a compound commitment to enable and then maintain a condition over a time interval), we think the fundamental difference between modeling enablement and maintenance commitments can be best theoretically explained and conceptually understood without such complications. We next formally give the definition and semantics of enablement commitments and maintenance commitments, respectively.

'''Enablement Commitments'''

Let the initial state be factored as s0 = (l0, u0). For enablement commitments, the initial value of the commitment feature is u−, i.e. u0 = u−. The provider commits to pursuing a course of action that can bring about the commitment feature desirable to the recipient with some probability. Formally, an enablement commitment is defined by tuple c_e = (T_e, p_e), where T_e is the enablement commitment time and p_e is the enablement commitment probability. The provider's commitment semantics is to follow a policy μ that sets u to u+ by time step T_e with at least probability p_e, i.e.

: Pr(u_{T_e} = u+ | u0 = u−, μ) ≥ p_e.

'''Maintenance Commitments'''

As a reminder, our maintenance commitment is motivated by scenarios where the initial value of state feature u is desirable to the recipient, who wants it to maintain its initial value for some interval of time (e.g., [Hindriks and van Riemsdijk, 2007; Duff et al., 2014]), but where the provider could want to take actions that could change it. Formally, a maintenance commitment is defined by tuple c_m = (T_m, p_m), where T_m is the maintenance commitment time and p_m is the maintenance commitment probability. Given such a maintenance commitment, the provider is constrained to follow a policy μ that keeps u unchanged for the first T_m time steps with at least probability p_m. Since u can be toggled at most once, this is equivalent to guaranteeing u = u+ at T_m, i.e.

: Pr(u_{T_m} = u0 | u0 = u+, μ) ≥ p_m.

====2.2 The Recipient's Approximate Profile====

The commitment semantics provides partial information on P_u by specifying the profile at a single time step, rather than in a potentially more gradual manner. Why would we choose to do that? The most important reason is that it opens up even more latitude for the provider to evolve its policy as it learns more about its environment: instead of needing to meet probabilistic expectations at multiple time steps, it can modify its policy much more flexibly as long as in the long run (by the commitment time) it hits the target probability. Prior work has shown the value of having such flexibility [Zhang et al., 2017]. Here, we are interested in the problem where the recipient creates a profile P̂_u with this partial information as an approximation of P_u, and plans accordingly. Specifically, we are interested in the quality of the plan computed from approximate profile P̂_u when evaluated in (true) profile P_u. Formally, given P̂_u, let M̂ = (S, A, P̂, R, H, s0) be the approximate model that only differs from M in terms of the profile of u, i.e. P̂ = (P_l, P̂_u). Throughout, let π̂* denote π*_{M̂}, the optimal policy of the approximate model. The quality of P̂_u is evaluated using the difference between the value of the optimal policy for M and the value of the optimal policy for M̂ when both policies are evaluated in M starting in s0, i.e.

: Suboptimality := v*_M − v^{π̂*}_M.

Note that when the support of P_u is not fully contained in the support of P̂_u, the recipient could end up in un-modelled states when executing π̂* in M, which makes V^{π̂*}_M ill-defined. In this paper, we fix this by re-planning: during execution of π̂*, the recipient re-plans from un-modelled states.

Previous work chooses an intuitive and straightforward approximate profile for enablement commitments that models a single branch (at the commitment time) for when u− probabilistically toggles to u+. This strategy reduces the complexity of the recipient's reasoning; the inefficiency caused by imperfect modelling is easily outweighed by computational benefits [Witwicki and Durfee, 2010; Zhang et al., 2016]. This profile takes a pessimistic view in the sense that u is (stochastically) enabled at the latest possible time consistent with the commitment, and, if it was not enabled at that time, will never be enabled after the commitment time. Therefore, we refer to it as the pessimistic profile, as formalized in Definition 1. For maintenance commitments, the pessimistic profile should probabilistically disable u at the earliest time, and should deterministically disable u after the commitment time, as formalized in Definition 2.

'''Definition 1.''' Given enablement commitment c_e = (T_e, p_e), its pessimistic profile P̂_{u,c_e} toggles u in the transition from time step t = T_e − 1 to t = T_e with probability p_e, and does not toggle u at any other time step.

'''Definition 2.''' Given maintenance commitment c_m = (T_m, p_m), its pessimistic profile P̂_{u,c_m} toggles u in the transition from time step t = 0 to t = 1 with probability 1 − p_m, and from t = T_m to t = T_m + 1 with probability one. It does not toggle u at any other time step.

In a previous, unpublished workshop paper [Zhang et al., 2018], we addressed the general topic of using an approximate profile, especially the pessimistic profile, for the recipient to model probabilistic commitments, but didn't provide any results. This paper presents both theoretical analysis (Section 3) and empirical results (Section 4) that reveal the fundamental difference between enablement and maintenance commitments.

===3 Theoretical Analysis===

In this section, we derive bounds on the suboptimality of the pessimistic profiles. Our analysis makes the following two assumptions. Assumption 1 intuitively says that u+ establishes a condition for an action that would be irrational, or even unsafe, to take when u− holds. For example, if u+ is a door being open, then the action of moving into the doorway could be part of an optimal plan, but taking that action if the door is closed (u−) never is. Assumption 2 is a simplifying assumption for our analysis stating that the true profile agrees with the pessimistic profile after the commitment time, so that the suboptimality is caused by the imperfect modeling by the commitment time.

'''Assumption 1.''' Let s− = (l, u−) and s+ = (l, u+) be a pair of states that only differ in u. For any M with arbitrary profile P_u, we have

: P_l(·|s−, π*_M(s−)) = P_l(·|s+, π*_M(s−)).

'''Assumption 2.''' P_u(u_{h+1}|u_h) agrees with the pessimistic profile for h ≥ T, where T is the commitment time.

To derive bounds on enablement and maintenance commitments, we will make use of the following lemma, where M+ (M−) is defined as the recipient's MDP identical to M except that u is always set to u+ (u−). Lemma 1 directly follows from Assumption 1, stating that the value of M− is no more than that of M+, and the value of any M is between the two.

'''Lemma 1.''' For any M with arbitrary profile P_u and initial value of u, we have v*_{M−} ≤ v*_M ≤ v*_{M+}.

''Proof.'' Let's first consider the case in which P_u toggles u only at a single time step. We show v*_{M−} ≤ v*_M by constructing a policy in M for which the value is v*_{M−} by mimicking π*_{M−}. Whether u is initially u− and later toggled to u+ or the other way around, we can construct a policy π_M that chooses the same actions as π*_{M−}, assuming u = u− throughout the episode. Formally, for any s− = (l, u−), letting s+ = (l, u+), we have π_M(s+) = π_M(s−) = π*_{M−}(s−). By Assumption 1, π_M in M yields the same trajectory distribution of l as π*_{M−} in M−, and therefore v^{π_M}_M = v*_{M−}, since value only depends on the trajectory of l.

Similarly, we show v*_M ≤ v*_{M+} by constructing a policy π_{M+} in M+ for which the value is v*_M by mimicking π*_M. Formally, for time steps when u = u− in M, let π_{M+}(s+) = π*_M(s−). For time steps when u = u+ in M, let π_{M+}(s+) = π*_M(s+), where s− = (l, u−), s+ = (l, u+).

For the case in which P_u toggles u at K > 1 time steps, we can decompose the value function under P_u as the weighted average of K value functions corresponding to the K profiles that toggle u at a single time step, where the weights of the average are the toggling probabilities of P_u at these K time steps.

====3.1 Bounding Suboptimality for Enablement====

Here, we derive a bound on the suboptimality for enablement commitments. From the two assumptions, we also make use of Lemma 2 to prove Theorem 1, which bounds the suboptimality for enablement commitments as the difference between v*_{M−} and v*_{M+}. Lemma 2 states that, for enablement commitments, the possible imperfect modeling of the pessimistic profile can only improve the expected value.

'''Lemma 2.''' Given enablement commitment c_e = (T_e, p_e), let P̂_u = P̂_{u,c_e}; then we have v^{π̂*}_M ≥ v^{π̂*}_{M̂}, where profile P_u in M respects the commitment semantics of c_e.

''Proof.'' For enablement commitments, the initial value of u is u−. Let P_u(t) be the probability that u is enabled to u+ at t in profile P_u, and let v^π_t be the initial state value of π when u is enabled from u− to u+ at t with probability one. By Assumption 2, v^{π̂*}_M and v^{π̂*}_{M̂} can be decomposed as

: v^{π̂*}_M = Σ_{t=1}^{T_e} P_u(t) v^{π̂*}_t + (1 − Σ_{t=1}^{T_e} P_u(t)) v^{π̂*}_{M−},
: v^{π̂*}_{M̂} = p_e v^{π̂*}_{T_e} + (1 − p_e) v^{π̂*}_{M−}.

When u is enabled at t in M, π̂* can be executed as if u is not enabled, by Assumption 1, yielding an identical trajectory distribution of l (and therefore value) as in M̂; moreover, the recipient's replanning at t when u = u+ will derive a better policy if possible. Therefore, v^{π̂*}_t ≥ v^{π̂*}_{T_e}, and

: v^{π̂*}_M ≥ Σ_{t=1}^{T_e} P_u(t) v^{π̂*}_{T_e} + (1 − Σ_{t=1}^{T_e} P_u(t)) v^{π̂*}_{M−}
: ≥ p_e v^{π̂*}_{T_e} + (1 − p_e) v^{π̂*}_{M−}   (commitment semantics: Σ_{t=1}^{T_e} P_u(t) ≥ p_e)
: = v^{π̂*}_{M̂}.

'''Theorem 1.''' Given enablement commitment c_e, let P̂_u = P̂_{u,c_e}. The suboptimality can be bounded as

: v*_M − v^{π̂*}_M ≤ v*_{M+} − v*_{M−}   (1)

where profile P_u in M respects the commitment semantics of c_e. Further, there exists an enablement commitment for which the equality is attained.

''Proof.'' The derivation is straightforward from Lemma 2:

: v*_M − v^{π̂*}_M ≤ v*_{M+} − v^{π̂*}_{M̂} ≤ v*_{M+} − v*_{M−}.

Then, we use a simple illustrative example to give an enablement commitment for which the equality is attained.

'''Example: An Enablement Commitment in 1D Walk'''

Consider the example of a 1D walk on [0, L], as illustrated in Figure 1 (top), in which the recipient starts at L0 and can move right, move left, or stay still. There is a gate between 0 and 1, for which u+ denotes the state of open and u− denotes closed. The gate toggles stochastically according to P_u. For each step until the recipient reaches either end, a −1 reward is given. Therefore, the optimal policy is to reach either end as soon as possible in expectation. We assume 1 ≤ L0 < L/2 to avoid the uninteresting trivial case of v*_{M−} = v*_{M+}. A negative reward is incurred when bumping into the closed gate, which makes Assumption 1 hold.

Here, we derive an enablement commitment for which the bound in Theorem 1 is attained. Consider L = 14, L0 = 5, H = 15, enablement commitment (T_e = L − L0 = 9, p_e = 1), and the true profile P_u in M that toggles the gate to open at t = 4 with probability p_e = 1. The optimal policy in M is to move left to 0. Therefore, v*_M = v*_{M+} = −L0 = −5. Given the pessimistic profile, moving right to L (arriving at time 9) is faster than waiting for the gate to toggle at T_e = 9 and then reaching location 0 at time 10. Had the recipient known the gate would toggle at time 4, it would have moved left, but by the time it toggles at time 4 the recipient is at location 9, and going to L is the faster choice. Therefore v^{π̂*}_M = v*_{M−} = −(L − L0) = −9, and bound (1) is attained.

Figure 1: 1D Walk on [0, L] with L = 14, L0 = 5. Top: Example in the proof of Theorem 1 (approximate-profile policy vs. optimal policy). Bottom: Example in the proof of Theorem 2.

====3.2 Bounding Suboptimality for Maintenance====

We next ask if the bound in Equation (1) on suboptimality for enablement commitments also holds for maintenance commitments. Unfortunately, as stated in Theorem 2, the optimal policy of the pessimistic profile for maintenance commitments can be arbitrarily bad when evaluated in the true profile, incurring a suboptimality exceeding (1). An existence proof is given with an example.

'''Theorem 2.''' There exists an MDP M with nonnegative rewards, and a maintenance commitment c_m, such that the profile P_u in M respects the commitment semantics of c_m, v*_M = v*_{M+}, v^{π̂*}_M = 0, and therefore the suboptimality is

: v*_M − v^{π̂*}_M = v*_{M+}   (2)

where P̂_u = P̂_{u,c_m} is the profile in M̂.

''Proof.'' As an existence proof, we give an example of a maintenance commitment in 1D Walk with nonnegative rewards for which v*_M = v*_{M+} and v^{π̂*}_M = 0. Consider 1D Walk with the same L = 14, L0 = 5, H = 15 as in the example for Theorem 1. Here we offset the rewards by +1 such that the rewards are nonnegative. Consider maintenance commitment (T_m = 7, p_m = 0), and P_u that toggles the gate from open to closed at t = 6 with probability 1 − p_m = 1. As illustrated in Figure 1 (bottom), the optimal policy should take 5 steps to move directly to 0, for which the value is v*_{M+}. With probability p_m, the gate is kept open through T_m, and π̂* takes 7 steps to reach 0. With probability 1 − p_m, the gate is closed at t = 6, and π̂* takes 19 > H steps to reach L = 14. Therefore, v^{π̂*}_M = 0.

Comparing bound (1) in Theorem 1 with bound (2) in Theorem 2 reveals a fundamental difference between enablement and maintenance commitments: maintenance commitments are inherently less tolerant to an unexpected change in the commitment feature. For enablement commitments, it is easy to construct a pessimistic profile such that any unexpected changes to the feature, if they impact the recipient at all, can only improve the expected value. Thus, if despite the pessimistic profile a recipient has chosen to follow a policy that exploits the commitment, it can never experience a true profile that would lead it to regret having done so. The same cannot be said for maintenance commitments. The easily-constructed pessimistic profile does not guarantee that any deviations from the profile can only improve the expected value. As our theoretical results show, the pessimistic profile of assuming toggling from u+ to u− right away can still lead to negative surprises, since if the toggling doesn't occur the profile suggests that it is safe to assume no toggling until T_m; but that is not true, since toggling could happen sooner, after the recipient has incurred cost for a policy that would need to be abandoned. The poor performance of the pessimistic model for maintenance is because it is not reliably pessimistic enough: in the example for Theorem 2, the worst time for toggling to u− is not right away, but right before the condition would be used (the gate shutting just as the recipient was about to pass through).

===4 Empirical Results===

Our analyses suggest the pessimistic profile might not be the best approximate profile for a recipient to adopt for maintenance commitments. In this section, we identify several alternative heuristics to create approximate profiles for the recipient, and evaluate them for both maintenance and enablement commitments. We conduct our evaluations in two domains. The first is the same 1D Walk domain as in our theoretical analysis, and the second is a Gate Control problem with a more interesting transition profile (violating Assumption 2).

====4.1 1D Walk====

As previously defined, the 1D Walk domain restricts the set of profiles to toggle u only at a single time step no later than the commitment time, and to agree with Assumption 2 thereafter. We denote the set of such profiles as P^1_u, from which P_u, P̂_u are chosen. Besides using the pessimistic profile to approximate the true profile, we consider the following three heuristics for generating approximate profile P̂_u ∈ P^1_u:

'''Optimistic.''' As opposed to the pessimistic, the optimistic profile toggles u right after the initial time step for enablement commitments, and at the commitment time for maintenance commitments.

'''Minimum Value.''' The toggling time minimizes the optimal value over all possible profiles in P^1_u, i.e., arg min_{P̂_u ∈ P^1_u} v*_{M̂}, where P̂_u is the profile of u in M̂.

'''Minimax Regret.''' The toggling time is chosen based on the minimax regret principle. Formally,

: arg min_{P̂_u ∈ P^1_u} max_{P_u ∈ P^1_u} (v*_M − v^{π̂*}_M)

where P_u, P̂_u are the profiles of u in M, M̂, respectively.

The four heuristics include two simple, inexpensive heuristics (Pessimistic and Optimistic), and two more complex and expensive heuristics (Minimum Value and Minimax Regret). Recall that our theoretical analysis suggests that, for maintenance, the worst time for toggling to u− is not right away but right before the recipient uses the condition, and this causes the poor performance of the pessimistic profile. We hypothesize that the latter two heuristics can improve on the pessimistic profile by identifying the worst toggling time.

'''Results.''' Here we evaluate the suboptimality of our candidate heuristics for both enablement commitments and maintenance commitments. The setting is the same as the example for Theorem 1 except that the horizon is longer: L = 14, L0 = 5, H = 30. Figure 2 shows the mean, minimum, and maximum suboptimality over all realizations of P_u ∈ P^1_u for commitment time T_e, T_m ∈ {5, 7, 10}. We see that for enablement commitments, the suboptimality of the pessimistic profile is comparable to the two more sophisticated strategies, and the optimistic profile incurs the most suboptimality overall. For maintenance commitments, however, the two expensive strategies incur overall less suboptimality than the pessimistic and the optimistic, yet it is difficult to identify a single best heuristic that reliably reduces the suboptimality for all the maintenance commitments.

Figure 2: Suboptimality in 1D Walk, for enablement and maintenance commitments with commitment time 5, 7, and 10, as a function of commitment probability. Markers on the curves show the mean suboptimality over possible time steps of toggling, P_u ∈ P^1_u. Bars show the minimum and maximum.

====4.2 Gate Control====

In this domain, we are concerned with the more general situation in which P_u ∉ P^1_u can toggle u at more than one time step by the commitment time, and can even toggle u after the commitment time. We also consider approximate profiles P̂_u that are not elements of P^1_u.

As illustrated in Figure 3, the provider's environment contains four cells, A ↔ B ↔ C ↔ D ↔ A, that are connected circularly. The provider can deterministically move to an adjacent cell or stay in the current cell. Upon a transition, the gate could toggle with probability 0.5 if the provider ends up in cell C. In the enablement commitment scenario, the provider gets a +1 reward if it ends up in cell C, and in the maintenance commitment scenario it gets a +1 reward if ending up in cell A. For a given commitment, the provider adopts a policy that aims to maximize its cumulative reward while respecting the commitment semantics. The recipient gets a −0.1 reward each time step. Upon reaching cell G, the recipient gets a +1 reward and the episode ends.

Figure 3: Gate Control. Left: The provider. Cell C toggles the gate. Right: The recipient.

Besides the four heuristics we considered for the 1D Walk, we further consider the following two, which choose an approximate profile outside of the set P^1_u:

'''Constant.''' This profile toggles u at every time step up to the commitment time with a constant probability, and the probability is chosen such that the overall probability of toggling by the commitment time matches the commitment probability. It agrees with the pessimistic profile after the commitment time.

'''Multi-timepoints.''' Besides the commitment time T, the provider also provides the recipient with the toggling probabilities for a set of other time steps; here, we consider {1, ⌊T/2⌋}, and the pessimistic heuristic is then used to match the toggling probabilities at these three time steps.

'''Results.''' We consider the combination of the following scenarios: the provider can start in any one of the four cells, and the toggling can happen in even, odd, or all time steps. The time horizon is H = 10 for both the provider and the recipient. This yields in total 12 (true) profiles P_u. Figure 4 shows the mean, maximum, and minimum suboptimality for T_e, T_m ∈ {4, 6} over the 12 profiles. Similar to 1D Walk, the results show that the pessimistic profile is among the best for enablement commitments, but for maintenance commitments it is difficult to identify a best heuristic, besides the multi-timepoints, that reliably reduces the suboptimality for all commitment time/probability pairs we consider. Using the multi-timepoints profile that is more aligned with the true profile, the suboptimality can be dramatically reduced for maintenance commitments, but it has a less significant impact for enablement commitments. This suggests that, unlike enablement commitments, where the cost is low of the provider retaining considerable flexibility by only committing to a single time-probability pair (leaving itself freedom to change its policy dynamically so long as it meets or exceeds that target), maintenance commitments greatly benefit from a provider committing to a more detailed profile, sacrificing flexibility in order to improve the quality of the recipient's expectations and so reduce the frequency and costs of negative surprises.

Figure 4: Suboptimality in Gate Control, for enablement and maintenance commitments with commitment time 4 and 6, as a function of commitment probability. Markers on the curves show the mean suboptimality over possible time steps of toggling. Bars show the minimum and maximum.

===5 Conclusion===

We have shown that we cannot straightforwardly extend the semantics and algorithms for trustworthy fulfillment of enablement commitments to maintenance commitments. Our theoretical and empirical results suggest that, despite their similarities in describing the toggling of conditions over time, maintenance commitments are fundamentally different from enablement commitments. We have theoretically shown that the easily-constructed pessimistic profile can only improve the expected value in the face of unexpected changes for enablement commitments but not for maintenance commitments. Empirically, we have seen that an inexpensive pessimistic approximation of the profile works comparably to more sophisticated approximations for enablement commitments, but not for maintenance commitments.

The fact that approximating profiles well is harder for maintenance commitments could mean that agents engaged in maintenance commitments might need to make a different tradeoff. That is, for enablement, we could give the provider a lot of flexibility by only constraining it to meet the probability at the commitment time, so it can unilaterally change the profile before then. The gain in flexibility for the provider is worth the relatively small value loss to the recipient from using the pessimistic profile. However, for maintenance commitments, the potential for the recipient to lose more value with a bad (simple or sophisticated) approximate profile could mean that the provider should commit to a more detailed profile—the loss of flexibility for the provider in this case is warranted because the recipient makes much better decisions. In other words, our theoretical and empirical work suggests that maintenance commitments could require providers and recipients to inherently be more tightly coupled than they need to be for enablement commitments.

===References===

… Computer, Communication and Software Systems, pages 220–270. Springer, 2007.

[Maheswaran et al., 2008] Rajiv Maheswaran, Pedro Szekely, Marcel Becker, Stephen Fitzpatrick, Gergely Gati, Jing Jin, Robert Neches, Narges Noori, Craig Rogers, Romeo Sanchez, et al. Predictability & criticality metrics for coordination in complex environments. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems, Volume 2, pages 647–654, 2008.

[Nuzzo et al., 2019] Pierluigi Nuzzo, Jiwei Li, Alberto L. Sangiovanni-Vincentelli, Yugeng Xi, and Dewei Li. Stochastic assume-guarantee contracts for cyber-physical system design. ACM Trans. Embed. Comput. Syst., 18(1):2:1–2:26, January 2019.

[Singh, 1999] M. P. Singh. An ontology for commitments in multiagent systems. Artificial Intelligence and Law, 7(1):97–113, 1999.

[Winikoff, 2006] Michael Winikoff. Implementing flexible and robust agent interactions using distributed commitment machines. Multiagent and Grid Systems, 2(4):365–381, 2006.

[Witwicki and Durfee, 2009] Stefan J. Witwicki and Edmund H. Durfee. Commitment-based service coordination. Int. J.
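The recipient's planning problem in M is a standard finite-horizon MDP, so the optimal values can be computed by backward induction. A minimal sketch of that computation (the tabular encoding and the tiny two-state example are ours, not from the paper; for brevity it uses one stationary state set rather than the time-partitioned S_h):

```python
# Backward induction for a finite-horizon MDP M = (S, A, P, R, H, s0).
# V[h][s] is the optimal value-to-go from state s at step h;
# P[(s, a)] maps next states to probabilities.

def optimal_values(states, actions, P, R, H):
    """Return V with V[h][s] = optimal expected sum of rewards from step h."""
    V = [{s: 0.0 for s in states} for _ in range(H + 1)]
    for h in range(H - 1, -1, -1):
        for s in states:
            V[h][s] = max(
                sum(p * (R[(s, a, s2)] + V[h + 1][s2])
                    for s2, p in P[(s, a)].items())
                for a in actions
            )
    return V

# Tiny deterministic two-state example (illustrative only):
states, actions = ["x", "y"], ["stay", "go"]
P = {("x", "stay"): {"x": 1.0}, ("x", "go"): {"y": 1.0},
     ("y", "stay"): {"y": 1.0}, ("y", "go"): {"x": 1.0}}
R = {(s, a, s2): (1.0 if s2 == "y" else 0.0)
     for s in states for a in actions for s2 in states}
V = optimal_values(states, actions, P, R, H=3)  # V[0]["x"] is v*_M for s0 = "x"
```

The same routine, run on the factored model with either the true profile P_u or an approximate profile P̂_u plugged into the transitions, yields the values compared throughout the paper.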
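Because u toggles at most once, a provider policy induces a distribution over the single toggling step, and checking whether it respects a commitment reduces to summing toggling probabilities up to the commitment time. A sketch under our own encoding (a profile as a map from transition step t to the probability of toggling then; this representation is an assumption for illustration, not the paper's notation):

```python
# A profile maps transition step t (into step t+1) to the probability
# that u toggles then; since u toggles at most once, the probability of
# having toggled by step T is the sum of the entries with t < T.

def respects_enablement(profile, Te, pe):
    """Enablement: u- is toggled to u+ by step Te with probability >= pe."""
    return sum(p for t, p in profile.items() if t < Te) >= pe

def respects_maintenance(profile, Tm, pm):
    """Maintenance: u stays u+ through step Tm with probability >= pm."""
    return 1.0 - sum(p for t, p in profile.items() if t < Tm) >= pm

ok_e = respects_enablement({8: 1.0}, Te=9, pe=1.0)          # toggles into step 9
ok_m = respects_maintenance({0: 0.5, 7: 1.0}, Tm=7, pm=0.4)  # survives w.p. 0.5
```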
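The two pessimistic profiles of Definitions 1 and 2 are easy to write out explicitly. A sketch, again under our map-from-transition-step-to-toggling-probability encoding (an illustrative assumption, not the paper's code):

```python
# Profiles represented as {t: probability that u toggles on the transition
# from step t to t+1}; steps not listed never toggle.

def pessimistic_enablement(Te, pe):
    """Definition 1: toggle u- -> u+ on the transition into step Te, w.p. pe."""
    return {Te - 1: pe}

def pessimistic_maintenance(Tm, pm):
    """Definition 2: toggle u+ -> u- immediately w.p. 1 - pm, and
    deterministically on the transition out of the commitment time Tm."""
    return {0: 1.0 - pm, Tm: 1.0}

# E.g., a maintenance commitment (Tm = 7, pm = 0.8):
profile = pessimistic_maintenance(7, 0.8)
```

Note the asymmetry this makes concrete: the enablement version concentrates all uncertainty at the latest consistent step, while the maintenance version splits it between the earliest possible step and the step just past the commitment time.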
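The arithmetic in the 1D Walk example is easy to check directly: under the true profile the recipient walks left and passes the gate right after it opens at t = 4, while the planner using the pessimistic profile walks right to L. A small sanity check of the quoted values (our own encoding of the walk's step costs, not the authors' code):

```python
# 1D Walk with L = 14, L0 = 5: each step costs -1 until an end is reached.
L, L0 = 14, 5

# True profile: gate opens at t = 4 w.p. 1, so moving left reaches the gate
# at location 1 at t = 4 and exits at 0 at t = 5: five -1 rewards.
v_true_optimal = -L0

# Pessimistic profile: the gate only (stochastically) opens at Te = 9, so
# the planner walks right, reaching L at t = 9: nine -1 rewards.
v_pessimistic_plan = -(L - L0)

# Matches the bound v*_{M+} - v*_{M-} = (-5) - (-9) = 4.
suboptimality = v_true_optimal - v_pessimistic_plan
```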
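Over the finite set P^1_u, the Minimax Regret heuristic reduces to a selection over a table of evaluated regrets. A schematic sketch, assuming a precomputed regret(approx, true) table (the candidate names and table values here are invented for illustration):

```python
# Minimax-regret selection over a finite set of candidate profiles.
# regret[(approx, true)] = v*_M under the true profile, minus the value in
# the true profile of the policy planned under the approximate profile.

def minimax_regret_choice(profiles, regret):
    """Pick the approximate profile whose worst-case regret is smallest."""
    return min(profiles,
               key=lambda ap: max(regret[(ap, tr)] for tr in profiles))

profiles = ["t=1", "t=3", "t=5"]  # hypothetical single-toggle candidates
regret = {("t=1", "t=1"): 0, ("t=1", "t=3"): 4, ("t=1", "t=5"): 4,
          ("t=3", "t=1"): 2, ("t=3", "t=3"): 0, ("t=3", "t=5"): 2,
          ("t=5", "t=1"): 5, ("t=5", "t=3"): 5, ("t=5", "t=5"): 0}
choice = minimax_regret_choice(profiles, regret)  # worst cases: 4, 2, 5
```

The Minimum Value heuristic is the analogous one-dimensional selection, `min` over candidates of the optimal value computed in each approximate model.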
Agent-Oriented Software Engineering, 3:59– Acknowledgments We thank the anonymous reviewers for 87, 01 2009. great suggestions. This work was supported in part by the [Witwicki and Durfee, 2010] Stefan J. Witwicki and Ed- Air Force Office of Scientific Research under grant FA9550- mund H. Durfee. Influence-based policy abstraction for 15-1-0039. Any opinions, findings, conclusions, or recom- weakly-coupled Dec-POMDPs. In Int. Conf. Auto. Plan- mendations expressed here are those of the authors and do ning Sys. (ICAPS), pages 185–192, 2010. not necessarily reflect the views of the sponsors. [Xing and Singh, 2001] Jie Xing and Munindar P Singh. Formalization of commitment-based agent interaction. In References Proceedings of the 2001 ACM symposium on Applied com- [Duff et al., 2014] S. Duff, J. Thangarajah, and J. Harland. puting, pages 115–120. ACM, 2001. Maintenance goals in intelligent agents. Computational [Xuan and Lesser, 1999] Ping Xuan and Victor R Lesser. In- Intelligence, 30(1):71–114, 2014. corporating uncertainty in agent commitments. In Inter- [Durfee and Singh, 2016] Edmund H. Durfee and Satinder national Workshop on Agent Theories, Architectures, and Singh. On the trustworthy fulfillment of commitments. In Languages, pages 57–70. Springer, 1999. Nardine Osman and Carles Sierra, editors, AAMAS Work- [Zhang et al., 2016] Qi Zhang, Edmund H Durfee, Satinder shops, pages 1–13. Springer, 2016. Singh, Anna Chen, and Stefan J Witwicki. Commitment [Hindriks and van Riemsdijk, 2007] K. V. Hindriks and semantics for sequential decision making under reward un- M. B. van Riemsdijk. Satisfying maintenance goals. certainty. In Int J. Conf. on Artificial Intelligence (IJCAI), In 5th Int. Workshop Declarative Agent Languages and pages 3315–3323, 2016. Technologies (DALT), pages 86–103, 2007. [Zhang et al., 2017] Qi Zhang, Satinder Singh, and Edmund [Jennings, 1993] Nick R Jennings. Commitments and con- Durfee. 
Minimizing maximum regret in commitment con- ventions: The foundation of coordination in multi-agent strained sequential decision making. In Int. Conf. Auto- systems. The knowledge engineering review, 8(3):223– mated Planning Systems (ICAPS), pages 348–356, 2017. 250, 1993. [Zhang et al., 2018] Qi Zhang, Edmund H. Durfee, and [Kalia et al., 2014] A. K. Kalia, Z. Zhang, and M. P. Singh. Satinder P. Singh. Challenges in the trustworthy pursuit of Estimating trust from agents’ interactions via commit- maintenance commitments under uncertainty. In Proceed- ments. In 21st Euro. Conf. on AI (ECAI), pages 1043– ings of the 20th International Trust Workshop co-located 1044, 2014. with AAMAS/IJCAI/ECAI/ICML 2018, Stockholm, Swe- [Kwiatkowska et al., 2007] Marta Kwiatkowska, Gethin den, July 14, 2018., pages 75–86, 2018. Norman, and David Parker. Stochastic model checking. In International School on Formal Methods for the Design of
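As an illustration of the Constant heuristic from the Gate Control experiments, the following sketch computes a constant per-step toggling probability whose cumulative effect matches the commitment probability at the commitment time. This is not code from the paper: the function names are hypothetical, and it assumes the per-step toggling chances are independent, so that the probability of toggling at least once in T steps with per-step probability p is 1 - (1 - p)^T.

```python
def constant_profile_step_prob(rho: float, T: int) -> float:
    """Per-step toggle probability p such that the probability of toggling
    at least once within T steps equals the commitment probability rho,
    assuming independent per-step toggles: 1 - (1 - p)**T = rho."""
    if not 0.0 <= rho <= 1.0 or T < 1:
        raise ValueError("need 0 <= rho <= 1 and T >= 1")
    return 1.0 - (1.0 - rho) ** (1.0 / T)

def prob_toggled_by(p: float, T: int) -> float:
    """Probability that the condition has toggled at least once in T steps."""
    return 1.0 - (1.0 - p) ** T

# Example: a commitment probability of 0.8 at commitment time T = 5.
p = constant_profile_step_prob(0.8, 5)
assert abs(prob_toggled_by(p, 5) - 0.8) < 1e-12
```

Under the same assumption, the Multi-timepoints heuristic would instead report the cumulative toggling probabilities at the additional time steps 1 and ⌊T/2⌋ and match all three, which is why it tracks the true profile more closely.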