How to Tame Your Anticipatory Algorithm

Allegra De Filippo, Michele Lombardi, and Michela Milano
DISI, University of Bologna, Bologna, Italy
{allegra.defilippo,michele.lombardi2,michela.milano}@unibo.it

Abstract. Sampling-based anticipatory algorithms can be very effective at solving online optimization problems under uncertainty, but their computational cost may be prohibitive in some cases. Given an arbitrary anticipatory algorithm, we present three methods that retain its solution quality at a fraction of the online computational cost, via a substantial degree of offline preparation. Our approaches are obtained by combining: 1) a simple technique to identify likely future outcomes based on past observations; 2) the (expensive) offline computation of a "contingency table"; and 3) an efficient solution-fixing heuristic. We ground our techniques on two case studies: an energy management system with uncertain renewable generation and load demand, and a traveling salesman problem with uncertain travel times. In both cases, our techniques achieve high solution quality while substantially reducing the online computation time.

1 Introduction

Optimization problems under uncertainty arise in many important application domains, such as production scheduling or energy system management. These problems often benefit from making all or part of their decisions online, reacting and adapting to external events. In this context, stochastic online anticipatory algorithms have proved particularly effective (see e.g. [11]). However, many such algorithms have a considerable computational cost, which may be problematic if (as is often the case) online decisions must be taken within a short time frame. In most practical settings, however, a substantial amount of time and information is available before the online problem is solved, in an offline phase. We refer to this sort of data as offline information.
Usually, it is employed to characterize the uncertain elements and to sample likely outcomes (i.e. scenarios). We will show how to exploit this information at a much deeper level. This paper reports and summarizes the methods and results presented in [7], accepted for publication at IJCAI 2019.

We propose three hybrid offline/online methods that build on a given sampling-based anticipatory algorithm and match its solution quality at a fraction of the online computational cost: 1) a technique to estimate the probability of future outcomes, given past observations; 2) a scheme for building a "contingency table", with precomputed solutions to guide the online choices; and 3) an efficient fixing heuristic for adapting the precomputed solutions to run-time conditions.

We ground our approaches on a (numeric) energy management problem with uncertain loads and generation from Renewable Energy Sources (RES), and on a (discrete) Traveling Salesman Problem with uncertain travel times. We show how our methods reach a solution quality comparable with the anticipatory algorithm, with lower (or dramatically lower) online computational cost.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Background and Related Work

Historically, the literature on optimization under uncertainty has focused on offline problems [17, 19]. These methods usually rely on sampling (yielding a number of scenarios) to obtain a statistical model of future uncertainty. More recently, improvements in solution techniques and computational power have enabled the application of similar methods to online optimization. This led to so-called online anticipatory algorithms, which have proved very effective at finding robust, high-quality solutions as uncertainty slowly reveals itself [11].
Notable algorithms and results are described in [5, 2, 3, 15, 6]. Online anticipatory algorithms typically rely on scenario sampling to estimate the possible developments over a fixed number of future steps, known as the look-ahead horizon. Larger sample sizes result in higher accuracy, but also in more and bigger (possibly NP-hard) problems to be solved. Considerable research effort has therefore focused on improving the efficiency of these algorithms [2, 4, 14]. The approaches from [12, 16, 13] attempt instead to reduce the number of scenarios by increasing their relevance, i.e. by taking past observations into account while sampling.

3 "Taming" an Anticipatory Algorithm

Our goal is to reduce the online computational cost of a given sampling-based anticipatory algorithm, referred to as A, by exploiting the existence of an offline phase. This algorithm A is the main input for all our methods.

Similarly to [11], we view online optimization under uncertainty as a stochastic n-stage problem. At each stage some uncertainty is resolved, and some decision must be made. A stage k is associated to a decision variable x_k (e.g. the power flows between loads and generators) and a state variable s_k (summarizing the effect of past decisions). All variables may be vector-valued. We assume that uncertainty is exogenous, i.e. not affected by the decisions (e.g. the RES generation does not depend on how we choose to route it), and modeled via a set of random variables ξ_i. Which variables are observed at each stage depends on the state, and is controlled by a peek function:

    O = peek(s_k)

which returns a set O with the indices of the observed variables. We will use the notation ξ_O to denote the observed ξ variables, and ξ_Ō for the unobserved ones.

Base Anticipatory Algorithm. Sampling-based algorithms rely on scenarios to estimate future outcomes. Formally, a scenario ω specifies a value ξ_i^ω for all the random variables.
Given a set Ω of scenarios, the system state s_k, and values for ξ_O corresponding to the observed uncertainty, we assume that A can compute the decisions for stage k:

    x_k = A(s_k, ξ_O, {ξ^ω}_{ω∈Ω})    (1)

Once the decisions are computed, the next state can be determined. This is controlled via a state transition function next that, based on the current state, decisions, and observed uncertainty, computes:

    s_{k+1} = next(s_k, x_k, ξ_O)    (2)

Given the initial state s_1, a set of scenarios Ω, and a set of values sampled from the ξ variables (which represent the online observations), we can simulate the execution of the method via Algorithm 1. The O set is assumed to be initially empty. This generic anticipatory algorithm will be referred to as ANTICIPATE in the remainder of the paper.

Algorithm 1 ANTICIPATE(s_1, ξ)
  Sample a set of scenarios Ω
  for k = 1 . . . n do
    O = O ∪ peek(s_k)                   # observe uncertainty
    x_k = A(s_k, ξ_O, {ξ^ω}_{ω∈Ω})      # obtain decisions
    s_{k+1} = next(s_k, x_k, ξ_O)       # determine next state
  return s, x

Offline Information. Defining a representative set of scenarios Ω is critical for the effectiveness of the approach, and it is usually done by exploiting the available offline information. Here, we assume that this offline information is a collection of observed uncertain values. This definition captures many practical cases (e.g. forecasts or predictions, historical data, data from test runs). More importantly, this means that the offline information is in fact a collection of scenarios. We will denote the offline information as I, index its elements with ω, and assume (as is usual) that I is representative of the true probability distribution of the random variables. A set Ω of scenarios for ANTICIPATE can be obtained by sampling a number of elements from I.

3.1 Basic Contributions

All our methods rely on three techniques, which are described in this section.
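As a running illustration, Algorithm 1 amounts to a generic simulation loop in which the problem-specific pieces (A, peek, next, the scenario set, and the realization ξ) are injected as parameters. The following is a minimal pure-Python sketch with a toy grounding; all names are illustrative and not taken from the paper's implementation.

```python
def anticipate(s1, xi, A, peek, next_state, scenarios, n):
    """Generic simulation of Algorithm 1 (ANTICIPATE).

    s1         -- initial state
    xi         -- dict {index: value} of online realizations of the xi variables
    A          -- sampling-based solver: A(state, observed, scenarios) -> decision
    peek       -- peek(state) -> set of indices of newly observed variables
    next_state -- state transition: next_state(state, decision, observed) -> state
    scenarios  -- fixed scenario set Omega (a list of dicts)
    n          -- number of stages
    """
    s, x, O = [s1], [], set()
    for k in range(n):
        O |= peek(s[-1])                       # observe uncertainty
        observed = {i: xi[i] for i in O}
        xk = A(s[-1], observed, scenarios)     # obtain decisions
        s.append(next_state(s[-1], xk, observed))
        x.append(xk)
    return s, x

# Toy grounding (hypothetical): the state is the stage index, one uncertain
# value is revealed per stage, and A returns the mean over the scenarios.
n = 3
xi = {k: float(k) for k in range(n)}
scen = [{k: k + d for k in range(n)} for d in (-1.0, 0.0, 1.0)]
peek = lambda s: {s}
A = lambda s, obs, W: sum(w[s] for w in W) / len(W)
nxt = lambda s, xk, obs: s + 1
states, decisions = anticipate(0, xi, A, peek, nxt, scen, n)
```

With a real grounding, A would be an expensive sampling-based optimization model, which is exactly the cost the methods below aim to shift offline.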
Probability Estimation for Scenario Sampling. Using a fixed set of scenarios (as in ANTICIPATE) is adequate when the ξ_i variables are statistically independent. When they are not, however, the set of scenarios may lose relevance as uncertainty is resolved. For example, a scenario based on a cloudy day forecast becomes less likely if fair weather is observed at the beginning of online execution. Formally, at stage k we wish to sample scenarios that are likely to occur given the past observations, i.e. to sample the unobserved variables ξ_Ō according to the conditional distribution P(ξ_Ō | ξ_O). If we draw the scenarios from the offline information (which guarantees physical meaningfulness), then sampling requires estimating the conditional probabilities of the elements in I. From basic probability theory, this is given by the ratio of two joint probabilities:

    ∀ω ∈ I:  P(ξ_Ō^ω | ξ_O) = P(ξ_O ξ_Ō^ω) / P(ξ_O)    (3)

where P(ξ_O ξ_Ō^ω) is the probability that the observed values occur together with the remaining predictions from the scenario, and P(ξ_O) is the probability that the values are observed. The joint probability at the numerator can be approximated via any density estimation method, such as Kernel Density Estimation [20], Gaussian Mixture Models [10], or recent Deep Learning techniques such as Normalizing Flows [18] and Real NVP [8]. Any such method can be trained on the offline information to obtain an estimator P̃(ξ) for the joint distribution of the random variables. An estimator for the distribution P(ξ_O) at the denominator can then be derived from P̃(ξ) via marginalization, i.e. by averaging out the contribution of the unobserved variables. We perform this step over all possible completions of the observed values in the offline information.

Algorithm 2 BUILDTABLE(s_1, AA)
  for ω ∈ I do
    s^ω, x^ω = AA(s_1, ξ^ω)
  return {ξ^ω, s^ω, x^ω}_{ω∈T}
Overall, we have:

    ∀ω ∈ I:  P̃(ξ_Ō^ω | ξ_O) = P̃(ξ_O ξ_Ō^ω) / Σ_{ω'∈I} P̃(ξ_O ξ_Ō^{ω'})    (4)

This estimator defines a discrete distribution over the offline information I. The chosen marginalization technique guarantees an estimate that is approximately proportional (not approximately equal) to the true P(ξ_O). Hence, we have that:

    ∀ω ∈ I:  P(ξ_Ō^ω | ξ_O) ∝ P̃(ξ_Ō^ω | ξ_O)    (5)

Sampling from I according to Equation (4) yields scenarios with a distribution that takes the observed values into account.

Building a Contingency Table. If a significant amount of time is available in the offline phase, we can exploit the offline information more aggressively, by trying to prepare for each likely future development. Intuitively, we can treat each scenario ω ∈ I as if it were an actual sequence of online observations, and process it via some anticipatory algorithm. By doing this, we build a pool of solutions that can then be used to guide an online method. The process is outlined in Algorithm 2, which requires as input the initial state s_1 of the system and a solution algorithm AA accepting the same parameters as ANTICIPATE. The result is an augmented version of the offline information, where each scenario ω is additionally associated to the sequence of states s^ω visited by the algorithm and its sequence of decisions x^ω. We refer to this data structure as a contingency table, and to its elements as traces. We denote the table as T.

Fixing Heuristic. We use the traces from T to guide an efficient fixing heuristic, which tries to choose decisions having the largest chance of being optimal. Formally, it solves:

    arg max { P*(x_k | s_k, ξ_O) : x_k ∈ X_k }    (6)

where P* is the probability that the chosen x_k is optimal, given the state s_k and the observed uncertainty. The X_k set represents the feasible decision space, which is defined via problem-dependent constraints and auxiliary variables.
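The re-weighting of Equation (4) only requires a joint density estimator that can be evaluated on each offline scenario after overwriting its observed components with the actual observations; normalizing over I then yields the discrete distribution. The sketch below uses a hand-rolled product-Gaussian KDE as a stand-in for the estimators cited above (KDE, GMMs, normalizing flows); all names and the toy data are hypothetical.

```python
import math

def kde_logdensity(point, data, bandwidth=1.0):
    """Log of a product-Gaussian KDE fitted on `data`, evaluated at `point`.
    `point` and each element of `data` are dicts {var_index: value}."""
    logs = []
    for sample in data:
        sq = sum((point[i] - sample[i]) ** 2 for i in point)
        logs.append(-sq / (2 * bandwidth ** 2))
    m = max(logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(l - m) for l in logs)) - math.log(len(data))

def scenario_weights(I, observed, bandwidth=1.0):
    """Discrete distribution over the offline scenarios I, as in Equation (4).
    For each scenario, the observed components are overwritten with the actual
    observations before evaluating the joint density."""
    logd = []
    for omega in I:
        point = dict(omega)
        point.update(observed)          # evaluate on (xi_O, xi_Obar^omega)
        logd.append(kde_logdensity(point, I, bandwidth))
    m = max(logd)
    w = [math.exp(l - m) for l in logd]
    z = sum(w)
    return [wi / z for wi in w]         # normalized as in Equation (4)

# Toy offline information: two 'sunny' and one 'cloudy' scenario, two stages.
I = [{0: 1.0, 1: 1.0}, {0: 0.9, 1: 1.1}, {0: -1.0, 1: -1.0}]
w = scenario_weights(I, observed={0: 1.0}, bandwidth=0.5)
# The scenarios consistent with the observation xi_0 = 1.0 get most of the mass.
```

Note that any constant factor of the density estimate cancels in the normalization, which is exactly why proportionality (Equation (5)) suffices.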
Closed-form expressions for P* can be obtained separately for discrete and numeric problems, based on the contingency table. In the case of discrete decisions, the problem from Equation (6) translates into:

    arg min { − Σ_{j=1}^m Σ_{v∈D_j} log(p_jv) [x_kj = v] : x_k ∈ X_k }    (7)

where [·] denotes the truth value of a predicate, D_j is the domain of x_kj, and:

    p_jv = ( Σ_{ω∈T, x_kj^ω = v} P(ω) ) / ( Σ_{ω∈T} P(ω) )    (8)

Here, P(ω) is a compact notation for the probability that we reach the same state as trace ω, and then everything goes according to plan. It can be approximated using:

    ∀ω ∈ T:  P(ω) ∝ P̃(s_{k+1}^ω | s_k) P̃(ξ_Ō^ω | ξ_O)    (9)

where P̃(ξ_Ō^ω | ξ_O) is the estimator from Equation (4), and P̃(s_{k+1} | s_k) is a second estimator obtained via similar means. The cost function in Equation (7) is linear if a one-hot encoding is adopted for x_kj, and the size of T affects only the computation of the p_jv values. Overall, the problem is efficient to solve. In the case of numeric decisions, we have instead:

    arg min { Σ_{j=1}^m Σ_{ω∈T} (p_ω / 2σ_j) (x_kj − x_kj^ω)² : x_k ∈ X_k }    (10)

with:

    p_ω = P(ω) / Σ_{ω'∈T} P(ω')    (11)

The cost function is quadratic and convex, and the problem size is small due to the same arguments as for Equation (7). Intuitively, the discrete version of the heuristic is related to minimizing weighted discrepancies w.r.t. the traces in T, i.e. weighted Hamming distances. The numeric version is instead related to weighted Euclidean distances. The pseudo-code for the heuristic is provided in Algorithm 3.

Algorithm 3 FIXING(s_1, ξ, T)
  for k = 1 . . . n do
    O = O ∪ peek(s_k)
    Ω = top elements of T by descending Equation (9)
    Compute p_jv and/or p_ω based on Ω
    Solve Equation (7)/(10) to obtain x_k
    s_{k+1} = next(s_k, x_k, ξ_O)
  return s, x

The only difference with the process described so far is that the p_jv and p_ω probabilities may be computed based on a subset Ω of the full contingency table.
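When the feasible space X_k is simply a box (bound constraints only), the numeric problem of Equation (10) separates per component, and its minimizer has a closed form: the p_ω-weighted average of the trace decisions, projected onto the bounds (the 1/2σ_j factors do not move the per-component minimizer). A sketch under this box assumption, with the Equation (11) weights given as input; all names are illustrative:

```python
def fix_numeric(traces_xk, p, bounds):
    """Closed-form minimizer of Equation (10) when X_k is a box.

    traces_xk -- list of decision vectors x_k^omega (one per trace)
    p         -- trace weights p_omega from Equation (11), summing to 1
    bounds    -- per-component (lo, hi) bounds defining the box X_k
    """
    m = len(traces_xk[0])
    xk = []
    for j in range(m):
        mean = sum(pw * t[j] for pw, t in zip(p, traces_xk))
        lo, hi = bounds[j]
        xk.append(min(max(mean, lo), hi))   # projection onto the box
    return xk

# Three traces, two decision components; the second hits its upper bound.
x = fix_numeric(
    traces_xk=[[1.0, 10.0], [3.0, 10.0], [2.0, 40.0]],
    p=[0.5, 0.3, 0.2],
    bounds=[(0.0, 5.0), (0.0, 12.0)],
)
```

With general problem-dependent constraints, the same objective would instead be handed to a QP solver. Note also that, as Algorithm 3 states, the weights may be computed from only the most relevant traces rather than the whole table.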
Restricting the computation to a subset in this way may be useful to bias the choice of the online decision according to the most relevant traces.

3.2 Hybrid Offline/Online Methods

Our three solution methods can now be defined with relative ease, by combining the techniques just described.

ANTICIPATE-D. Our first hybrid method is obtained from ANTICIPATE by simply replacing the static set of samples with a dynamically adjusted one. The dynamic set can be populated according to the estimated probabilities from Equation (4), so as not to lose relevance: this may enable reaching a similar solution quality with fewer scenarios, at the cost of training an estimator offline. We refer to this approach as ANTICIPATE-D, and its pseudo-code is in Algorithm 4.

Algorithm 4 ANTICIPATE-D(s_1, ξ)
  Train the P̃(ξ) estimator on I
  for k = 1 . . . n do
    O = O ∪ peek(s_k)
    Sample Ω from I, according to Equation (4)
    x_k = A(s_k, ξ_O, {ξ^ω}_{ω∈Ω})
    s_{k+1} = next(s_k, x_k, ξ_O)
  return s, x

CONTINGENCY. The second method is based on the idea of computing robust solutions for the scenarios in the offline information, and then using them as guidance for the FIXING heuristic. Robust solutions are obtained by using ANTICIPATE, so that hopefully the (fast) fixing heuristic will be able to match their quality: the price to pay is a hefty offline computational effort. We refer to this approach as CONTINGENCY, and its pseudo-code is reported in Algorithm 5.

Algorithm 5 CONTINGENCY(s_1, ξ)
  Train the P̃(ξ) estimator on I
  T = BUILDTABLE(s_1, ANTICIPATE)
  Train the P̃(s_{k+1} | s_k) estimators on T, for all k
  s, x = FIXING(s_1, ξ, T)
  return s, x

CONTINGENCY-D. Our final method is similar to the previous one, except that the contingency table is populated with non-robust solutions. This is done by using ANTICIPATE with a single scenario, given by the values of ξ^ω (i.e. the pretend online observations).
This technique (referred to as ANTICIPATE1) provides perfect information about the future, so that achieving robustness is entirely delegated to the FIXING heuristic. The approach is likely to lose reliability, but has two important advantages: 1) lower offline computational costs; and 2) while ANTICIPATE is a stochastic algorithm, ANTICIPATE1 is deterministic. So, this method may provide anticipatory-like results even when no anticipatory algorithm is available. We refer to this method as CONTINGENCY-D, and its pseudo-code is reported in Algorithm 6.

Algorithm 6 CONTINGENCY-D(s_1, ξ)
  Train the P̃(ξ) estimator on I
  T = BUILDTABLE(s_1, ANTICIPATE1)
  Train the P̃(s_{k+1} | s_k) estimators on T, for all k
  s, x = FIXING(s_1, ξ, T)
  return s, x

4 Grounding the Approaches

Grounding our approaches requires specifying: 1) the x, s and ξ variables; 2) the peek and next functions; 3) the sampling-based algorithm A; and 4) the feasible space X_k for the FIXING heuristic. Additionally, evaluating the solution quality requires defining 5) a cost metric.

We show how this can be done in two case studies: 1) a Virtual Power Plant (VPP) energy management problem with numerical decisions; and 2) a combinatorial Traveling Salesman Problem with uncertain travel times. In both cases, the input anticipatory algorithm A is given by a Mathematical Programming model, based on the Sample Average Approximation. The models are slight improvements over those by [6], whose work brought to attention the interplay between offline and online phases. Both approaches are serviceable, but not necessarily representative of the state of the art (especially for the TSP). We focus on the minimal information needed.

5 Experimentation

We empirically evaluated the three hybrid offline/online methods on realistic instances for the two case studies.
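Before turning to the results, it is worth seeing the core of the discrete FIXING step (Equations (7)-(8)) that both contingency methods rely on. Ignoring the coupling constraints in X_k (which the paper handles via a Mathematical Programming model), each decision component can simply take the value with the largest aggregated trace weight. A toy pure-Python sketch of this unconstrained core, with hypothetical data:

```python
from collections import defaultdict

def fix_discrete_unconstrained(traces_xk, p):
    """Per-component maximizer of the p_jv weights (Equations (7)-(8)),
    ignoring the coupling constraints of X_k.

    traces_xk -- list of discrete decision vectors x_k^omega (one per trace)
    p         -- trace weights P(omega), not necessarily normalized
    """
    m = len(traces_xk[0])
    xk = []
    for j in range(m):
        pjv = defaultdict(float)
        for pw, t in zip(p, traces_xk):
            pjv[t[j]] += pw                 # Equation (8), up to normalization
        xk.append(max(pjv, key=pjv.get))    # value with the largest weight
    return xk

# Three traces over two discrete decision components; the weights favor the
# first trace, so its choices win wherever the traces disagree.
x = fix_discrete_unconstrained(
    traces_xk=[["b", "c"], ["b", "d"], ["a", "d"]],
    p=[0.6, 0.3, 0.1],
)
```

The real discrete grounding must of course enforce feasibility (e.g. that the chosen moves form a valid tour in the TSP), which is why Equation (7) is stated as a constrained problem.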
The baselines in both cases are myopic heuristics, obtained by running ANTICIPATE with an empty set of scenarios. Our methods are evaluated over different uncertainty realizations, obtained by sampling the random variables for the loads and RES generation in the VPP, and for the travel times in the TSP. In both cases, we use models of uncertainty that ensure realistic statistical dependence between the variables. These models are used to obtain the offline information I and the sequences of observations for the actual experiments.

For the VPP, grid electricity prices change every 15 minutes, which is also the duration of our online stages. New offline information (e.g. market prices) becomes available every day, hence our horizon corresponds to 24 × 4 = 96 stages. We use real physical bounds for power generation from [1, 9]. The initial battery state, efficiency, and power flow limit are also based on real data [1, 9], as is the load and generation data, which comes from [9]. Different instances have then been obtained by manually scaling load and RES generation. For the TSP we use classical benchmarks (http://myweb.uiowa.edu/bthoa/TSPTWBenchmarkDataSets.htm), including problems from 10 to 40 nodes. In the TSP each stage represents a visit, hence our horizon corresponds to the total number of nodes.

We use Kernel Density Estimation (KDE with Gaussian kernels) to obtain all approximate distributions. As an underlying solver we use Gurobi (http://www.gurobi.com), which can handle both MILPs and Quadratic Programs. Each evaluated algorithm (in each considered configuration) is run 50 times, with the same 50 sequences of realizations. We use a time limit of 300 seconds for both the VPP and the TSP. For each run we record both the time required by each approach and the corresponding solution cost, and we report their average values over the 50 realizations. In all cases, |I| = |T| = 100, and for the CONTINGENCY method the contingency table is built using ANTICIPATE with 20 scenarios. Due to space constraints, we report results only for a few representative instances.

Fig. 1: Solution cost (top) and online computation time (bottom) as the number of scenarios/traces grows, for the VPP (left, myopic heuristic: cost ∼316 keuro, time ∼2 s) and the TSP (right, myopic heuristic: cost ∼136, time ∼0.1 s).

The offline training times of the KDE models are roughly the same for all three hybrid methods (∼65 sec for the VPP and ∼32 sec for the TSP). Building the contingency tables for CONTINGENCY takes ∼6,000 sec for the VPP and ∼15,000 sec for the TSP, but only ∼400 and ∼2,000 sec for CONTINGENCY-D. Figure 1 shows the cost/quality tradeoff of the proposed methods and of ANTICIPATE for the VPP (base instance) and for the TSP (a representative 20-customer instance). The use of a dynamic Ω set of scenarios allows ANTICIPATE-D to work better than ANTICIPATE. The CONTINGENCY method is surprisingly close in quality to the original anticipatory algorithm, especially considering that its online computational cost is dramatically smaller (one to two orders of magnitude). CONTINGENCY-D performs (as expected) slightly worse than CONTINGENCY, but is still much better than the myopic heuristic.
Increasing the number of guiding traces is beneficial for both methods, and in particular for CONTINGENCY-D.

6 Conclusion

We have presented three methods that can be applied to a generic anticipatory algorithm to reduce its online computational effort by exploiting offline information. In particular, both CONTINGENCY and CONTINGENCY-D are dramatically faster than ANTICIPATE during online operation. Between the two, CONTINGENCY is significantly more reliable in terms of quality, but may require a substantial amount of offline computation. The ANTICIPATE-D technique provides a modest advantage in terms of solution time, but can match and even surpass ANTICIPATE in terms of quality.

The ability to shift part of the computational cost to an offline stage provides a significant degree of flexibility to stochastic anticipatory algorithms, and is likely to increase their applicability. We believe there is room for improving the scalability and efficiency of our methods, and achieving this goal is part of our current research directions.

References

1. Bai, H., Miao, S., Ran, X., Ye, C.: Optimal dispatch strategy of a virtual power plant containing battery switch stations in a unified electricity market. Energies 8(3), 2268–2289 (2015)
2. Bent, R., Van Hentenryck, P.: Regrets only! Online stochastic optimization under time constraints. In: AAAI. vol. 4, pp. 501–506 (2004)
3. Bent, R., Van Hentenryck, P.: Online stochastic optimization without distributions. In: ICAPS. vol. 5, pp. 171–180 (2005)
4. Borodin, A., El-Yaniv, R.: Online computation and competitive analysis. Cambridge University Press (2005)
5. Chang, H.S., Givan, R., Chong, E.K.: On-line scheduling via sampling. In: AIPS. pp. 62–71 (2000)
6. De Filippo, A., Lombardi, M., Milano, M.: Methods for off-line/on-line optimization under uncertainty. In: IJCAI. pp. 1270–1276 (2018)
7.
De Filippo, A., Lombardi, M., Milano, M.: How to tame your anticipatory algorithm. In: IJCAI (2019, to appear)
8. Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. arXiv preprint arXiv:1605.08803 (2016)
9. Espinosa, A.N., Ochoa, L.N.: Dissemination document "Low voltage networks models and low carbon technology profiles". Tech. rep., University of Manchester (June 2015)
10. Gauvain, J.L., Lee, C.H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2(2), 291–298 (1994)
11. Van Hentenryck, P., Bent, R.: Online stochastic combinatorial optimization. The MIT Press (2009)
12. John, G.H., Langley, P.: Static versus dynamic sampling for data mining. In: KDD. vol. 96, pp. 367–370 (1996)
13. Lin, M., Tang, K., Yao, X.: Dynamic sampling approach to training neural networks for multiclass imbalance classification. IEEE Transactions on Neural Networks and Learning Systems 24(4), 647–660 (2013)
14. Mercier, L., Van Hentenryck, P.: Performance analysis of online anticipatory algorithms for large multistage stochastic integer programs. In: IJCAI. pp. 1979–1984 (2007)
15. Mercier, L., Van Hentenryck, P.: AMSAA: A multistep anticipatory algorithm for online stochastic combinatorial optimization. In: Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pp. 173–187 (2008)
16. Philpott, A.B., De Matos, V.L.: Dynamic sampling algorithms for multi-stage stochastic programs with risk aversion. European Journal of Operational Research 218(2), 470–483 (2012)
17. Powell, W.B.: A Unified Framework for Optimization Under Uncertainty, chap. 3, pp. 45–83. INFORMS (2016)
18. Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770 (2015)
19. Sahinidis, N.V.: Optimization under uncertainty: state-of-the-art and opportunities.
Computers & Chemical Engineering 28(6), 971–983 (2004), http://www.sciencedirect.com/science/article/pii/S0098135403002369, FOCAPO 2003 special issue
20. Silverman, B.W.: Density estimation for statistics and data analysis. Routledge (2018)