                                How to Tame Your Anticipatory Algorithm

                                    Allegra De Filippo, Michele Lombardi, and Michela Milano

                                         DISI, University of Bologna, Bologna, Italy
                         {allegra.defilippo,michele.lombardi2,michela.milano}@unibo.it



                             Abstract. Sampling-based anticipatory algorithms can be very effective at solving online optimization problems under uncertainty, but their computational cost may be prohibitive in some cases. Given an arbitrary anticipatory algorithm, we present three methods that retain its solution quality at a fraction of the online computational cost, via a substantial degree of offline preparation. Our approaches are obtained by combining: 1) a simple technique to identify likely future outcomes based on past observations; 2) the (expensive) offline computation of a “contingency table”; and 3) an efficient solution-fixing heuristic. We ground our techniques on two case studies: an energy management system with uncertain renewable generation and load demand, and a traveling salesman problem with uncertain travel times. In both cases, our techniques achieve high solution quality, while substantially reducing the online computation time.


                     1    Introduction

                     Optimization problems under uncertainty arise in many important application domains,
                     such as production scheduling or energy system management. These problems often
                     benefit from making all or part of their decisions online, reacting and adapting to exter-
                     nal events. In this context, stochastic online anticipatory algorithms have proved particularly effective (see e.g. [11]). However, many such algorithms have a considerable computational cost, which may be problematic if (as is often the case) online decisions must be taken within a short time frame.
                         In most practical settings, however, a substantial amount of time and information
                     is available before the online problem is solved, in an offline phase. We refer to this
                     sort of data as offline information. Usually, it is employed to characterize the uncertain
                     elements and for sampling likely outcomes (i.e. scenarios). We will show how to exploit
                     this information at a much deeper level.
                         This paper summarizes the methods and results presented in [7], accepted for publication at IJCAI 2019.
                         We propose three hybrid offline/online methods that build on a given sampling-based anticipatory algorithm and match its solution quality at a fraction of the online computational cost: 1) a technique to estimate the probability of future out-
                     comes, given past observations; 2) a scheme for building a “contingency table”, with
                     precomputed solutions to guide the online choices; and 3) an efficient fixing heuristic
                     for adapting the precomputed solutions to run-time conditions.
                         We ground our approaches on a (numeric) energy management problem with uncer-
                     tain loads and generation from Renewable Energy Sources (RES), and on a (discrete)




Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Traveling Salesman Problem with uncertain travel times. We show how our methods reach a solution quality comparable with that of the anticipatory algorithm, at a lower (or dramatically lower) online computational cost.


2   Background and Related Work
Historically, the literature on optimization under uncertainty has focused on offline
problems [17, 19]. These methods usually rely on sampling (yielding a number of sce-
narios) to obtain a statistical model of future uncertainty.
    More recently, improvements in the solution techniques and computational power
have enabled the application of similar methods to online optimization. This led to so-called online anticipatory algorithms, which have proved very effective at finding robust, high-quality solutions as uncertainty slowly reveals itself [11]. Notable algorithms and results are described in [5, 2, 3, 15, 6].
    Online anticipatory algorithms typically rely on scenario sampling to estimate the possible developments for a fixed number of future steps, known as the look-ahead horizon. Larger sample sizes result in higher accuracy, but also in more and bigger (possibly NP-hard) problems to be solved. Considerable research effort has therefore focused on improving the efficiency of these algorithms [2, 4, 14]. The approaches from [12, 16, 13] attempt instead to reduce the number of scenarios while increasing their relevance, by taking past observations into account during sampling.


3   “Taming” an Anticipatory Algorithm
Our goal is to reduce the online computational cost of a given sampling-based anticipatory algorithm, referred to as A, by exploiting the existence of an offline phase. The algorithm A is the main input to all our methods.
    Similarly to [11], we view online optimization under uncertainty as a stochastic n-stage problem. At each stage some uncertainty is resolved, and some decision must be made. A stage k is associated with a decision variable xk (e.g. the power flows between loads and generators) and a state variable sk (summarizing the effect of past decisions). All variables may be vector-valued.
    We assume that uncertainty is exogenous, i.e. not affected by the decisions (e.g. the
RES generation does not depend on how we choose to route it), and modeled via a set of
random variables ξi . Which variables are observed at each stage depends on the state,
and is controlled by a peek function: O = peek(sk ) which returns a set O with the
indices of the observed variables. We will use the notation ξO to denote the observed ξ
variables, and ξŌ for the unobserved ones.

Base Anticipatory Algorithm Sampling-based algorithms rely on scenarios to estimate
future outcomes. Formally, a scenario ω specifies a value ξiω for all the random vari-
ables. Given a set Ω of scenarios, the system state sk , and values for ξO corresponding
to the observed uncertainty, we assume that A can compute the decisions for stage k:

    xk = A(sk, ξO, {ξ^ω}_{ω∈Ω})                                                         (1)

Algorithm 1 ANTICIPATE (s1 , ξ)
 Sample a set of scenarios Ω
 for k = 1 . . . n do
     O = O ∪ peek(sk)                                                       # observe uncertainty
     xk = A(sk, ξO, {ξ^ω}_{ω∈Ω})                                            # obtain decisions
     sk+1 = next(sk, xk, ξO)                                                # determine next state
 return s, x



Once the decisions are computed, the next state can be determined. This is controlled via
a state transition function next that, based on the current state, decisions, and observed
uncertainty, computes:
      sk+1 = next(sk, xk, ξO)                                                             (2)
Given the initial state s1, a set of scenarios Ω, and a set of values sampled from ξ
(which represent the online observations), we can simulate the execution of the method
via Algorithm 1. The O set is assumed to be initially empty. This generic anticipatory
algorithm will be referred to as ANTICIPATE in the remainder of the paper.
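    As a concrete reference, the loop can be sketched in a few lines of Python. All names are hypothetical stand-ins rather than notation from the paper: solver plays the role of A, next_state of the next function, and the scenario set Ω is passed in explicitly, after being sampled from the offline information.

    def anticipate(s1, xi, scenarios, solver, peek, next_state, n_stages):
        """Algorithm 1: simulate the anticipatory algorithm on one sequence
        of realizations `xi` (a dict: variable index -> realized value)."""
        observed = set()                             # the O set, initially empty
        s, states, decisions = s1, [s1], []
        for k in range(n_stages):
            observed |= peek(s)                      # observe uncertainty
            xi_obs = {i: xi[i] for i in observed}    # values of the xi_O variables
            x = solver(s, xi_obs, scenarios)         # obtain decisions, Eq. (1)
            s = next_state(s, x, xi_obs)             # determine next state, Eq. (2)
            states.append(s)
            decisions.append(x)
        return states, decisions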
     OFFLINE INFORMATION: Defining a representative set of scenarios Ω is critical for the effectiveness of the approach, and is usually done by exploiting the available offline information. Here, we assume that the offline information is a collection of observed uncertain values. This definition captures many practical cases (e.g. forecasts or predictions, historical data, data from test runs). More importantly, this means that the offline information is in fact a collection of scenarios. We will denote the offline information as I, index its elements with ω, and assume (as is usual) that I is representative of the true probability distribution of the random variables. A set Ω of scenarios for ANTICIPATE can be obtained by sampling a number of elements from I.

3.1    Basic Contributions
All our methods rely on three techniques, which will be described in this section.
     PROBABILITY ESTIMATION FOR SCENARIO SAMPLING: Using a fixed set of scenarios (as in ANTICIPATE) is beneficial when the ξi variables are statistically independent. When they are not, however, the set of scenarios may lose relevance as uncertainty is resolved. For example, a scenario based on a cloudy day forecast becomes less likely if fair weather is observed at the beginning of online execution. Formally, at stage k we wish to sample scenarios that are likely to occur given the past observations, i.e. to sample the unobserved variables ξŌ according to the conditional distribution P(ξŌ | ξO). If we draw the scenarios from the offline information (which guarantees physical meaningfulness), then sampling requires estimating the conditional probabilities of the elements in I. From basic probability theory, this is given by the ratio of two joint probabilities:
    ∀ω ∈ I,   P(ξŌ^ω | ξO) = P(ξO ξŌ^ω) / P(ξO)                                           (3)
where P(ξO ξŌ^ω) is the probability that the observed values occur together with the remaining predictions from the scenario, and P(ξO) is the probability that the values are observed. The joint probability at the numerator can be approximated via any density

Algorithm 2 BUILDTABLE (s1 , AA)
 for ω ∈ I do
     s^ω, x^ω = AA(s1, ξ^ω)
 return T = {ξ^ω, s^ω, x^ω}_{ω∈I}




estimation method, such as Kernel Density Estimation [20], Gaussian Mixture Models
[10], or recent Deep Learning techniques such as Normalizing Flows [18] and Real
NVP [8]. Any such method can be trained on the offline information to obtain an es-
timator P̃ (ξ) for the joint distribution of the random variables. An estimator for the
distribution P (ξO ) at the denominator can then be derived from P̃ (ξ) via marginaliza-
tion, i.e. by averaging the contribution of the unobserved variables. We perform this step
over all possible completions of the observed values in the offline information. Overall,
we have:
    ∀ω ∈ I,   P̃(ξŌ^ω | ξO) = P̃(ξO ξŌ^ω) / Σ_{ω′∈I} P̃(ξO ξŌ^ω′)                          (4)

This estimator defines a discrete distribution over the offline information I. The chosen
marginalization technique guarantees an estimate that is approximately proportional
(not approximately equal) to the true P (ξO ). Hence, we have that:
    ∀ω ∈ I,   P(ξŌ^ω | ξO) ∝ P̃(ξŌ^ω | ξO)                                                (5)

Sampling from I according to Equation (4) yields scenarios with a distribution that
takes into account the observed values.
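    As an illustration, the following Python sketch implements Equation (4) with a KDE joint estimator; the estimator choice and all names are ours, and any of the density estimation methods mentioned above would work equally well.

    import numpy as np
    from scipy.stats import gaussian_kde

    def fit_joint_estimator(I):
        """Offline step: a KDE trained on the offline information plays the
        role of the joint estimator P~(xi). I is an (N, d) array, one row
        per offline scenario omega."""
        return gaussian_kde(I.T)

    def sample_scenarios(kde, I, obs_idx, obs_val, n_samples, rng):
        """Online step, Eq. (4): draw scenarios from I with probability
        proportional to the estimated joint density of the observed values
        together with each scenario's remaining predictions."""
        completions = I.copy()
        completions[:, obs_idx] = obs_val      # plug in the observed xi_O values
        weights = kde(completions.T)           # numerator of Eq. (4), per scenario
        weights /= weights.sum()               # denominator: normalize over I
        idx = rng.choice(len(I), size=n_samples, p=weights)
        return I[idx]

In practice the estimator is fit once in the offline phase, so that only the reweighting and sampling steps run online.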
     BUILDING A CONTINGENCY TABLE: If a significant amount of time is available in the offline phase, we can exploit the offline information more aggressively, by trying to prepare for each likely future development. Intuitively, we can treat each scenario ω ∈ I as if it were an actual sequence of online observations, and process it via some anticipatory algorithm. By doing this, we build a pool of solutions that can then be used to guide an online method. The process is outlined in Algorithm 2, which requires as input the initial state s1 of the system, and a solution algorithm AA, accepting the same parameters as ANTICIPATE. The result is an augmented version of the offline information, where each scenario ω is additionally associated with the sequence of states sω visited by the algorithm and its sequence of decisions xω. We refer to this data structure as a contingency table, and to its elements as traces. We denote the table as T.
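    Under the same hypothetical interface as the earlier sketch, Algorithm 2 amounts to a simple loop:

    def build_table(s1, aa, offline_info):
        """Algorithm 2: process each offline scenario as if it were the
        actual sequence of online observations, and store the trace."""
        table = []
        for xi in offline_info:                # each scenario omega in I
            states, decisions = aa(s1, xi)
            table.append((xi, states, decisions))
        return table                           # the contingency table T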
     FIXING HEURISTIC: We use the traces from T to guide an efficient fixing heuristic, which tries to choose decisions having the largest chance of being optimal. Formally, it solves:

    arg max { P*(xk | sk, ξO) : xk ∈ Xk }                                                 (6)

where P* is the probability that the chosen xk is optimal, given the state sk and the observed uncertainty. The Xk set represents the feasible decision space, which is defined via problem-dependent constraints and auxiliary variables. Closed forms for P* can be obtained separately for discrete and numeric problems, based on the contingency table.

Algorithm 3 FIXING (s1 , ξ, T )
 for k = 1 . . . n do
     O = O ∪ peek(sk )
     Ω = top elements of T, ranked by descending Equation (9) weight
     Compute pjv and/or pω based on Ω
     Solve Equation (7)/(10) to obtain xk
     sk+1 = next(sk , xk , ξO )
 return s, x




Overall, in case of discrete decisions, the problem from Equation (6) translates into:

    arg min { −Σ_{j=1..m} Σ_{v∈Dj} log pjv ⟦xkj = v⟧ : xk ∈ Xk }                          (7)


where ⟦·⟧ denotes the truth value of a predicate, Dj is the domain of xkj, and:

    pjv = Σ_{ω∈T : xkj^ω = v} P(ω) / Σ_{ω∈T} P(ω)                                         (8)

Here, P (ω) is a compact notation for the probability that we reach the same state as
trace ω, and then everything goes according to plan. It can be approximated using:

    ∀ω ∈ T,   P(ω) ∝ P̃(sk+1^ω | sk) · P̃(ξŌ^ω | ξO)                                      (9)

where P̃(ξŌ | ξO) is the estimator from Equation (4), and P̃(sk+1 | sk) is a second estimator obtained via similar means. The cost function in Equation (7) is linear if a one-hot encoding is adopted for xkj, and the size of T affects only the computation of the pjv values. Overall, the problem is efficient to solve. In case of numeric decisions, we have instead:

    arg min { Σ_{j=1..m} Σ_{ω∈T} pω (1/(2σj)) (xkj − xkj^ω)² : xk ∈ Xk }                  (10)

with:
    pω = P(ω) / Σ_{ω′∈T} P(ω′)                                                            (11)

The cost function is quadratic and convex, and the problem size is small due to the same
arguments as Equation (7).
    Intuitively, the discrete version of the heuristic is related to minimizing weighted
discrepancies w.r.t. the traces in T , i.e. to weighted Hamming distances. The numeric
version is instead related to weighted Euclidean distances. The pseudo-code for the heuristic is provided in Algorithm 3. The only difference from the process described so far is that the pjv and pω probabilities may be computed based on a subset Ω of the full contingency table. This may be useful to bias the choice of the online decision according to the most relevant traces.
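    The sketch below illustrates the weight computations and Algorithm 3 in the special case where Xk imposes no coupling constraints: Equation (7) then reduces to a per-variable argmax over the pjv, and Equation (10) is minimized by the pω-weighted average of the trace decisions (the 1/(2σj) factors cancel). With real constraints, the −log pjv costs and the quadratic objective would instead be handed to a MILP or QP solver. The trace layout and the p_state, p_xi estimator callables are our own assumptions, consistent with the earlier sketches.

    import numpy as np

    def trace_weights(T, k, s_k, xi_obs, p_state, p_xi):
        """Eq. (9) and (11): normalized weight p_omega of each trace, given
        the current state and the observed values."""
        w = np.array([p_state(states[k + 1], s_k) * p_xi(xi_w, xi_obs)
                      for (xi_w, states, _) in T])
        return w / w.sum()

    def fix_discrete(T, k, w, domains):
        """Eq. (7)-(8), unconstrained case: for each decision variable j,
        pick the value v with the largest p_jv (smallest -log p_jv)."""
        x = []
        for j, dom in enumerate(domains):
            p_jv = {v: sum(wi for wi, (_, _, dec) in zip(w, T)
                           if dec[k][j] == v) for v in dom}
            x.append(max(p_jv, key=p_jv.get))
        return x

    def fix_numeric(T, k, w):
        """Eq. (10), unconstrained case: the weighted least-squares objective
        is minimized by the p_omega-weighted average of trace decisions."""
        X = np.array([dec[k] for (_, _, dec) in T])    # one row per trace
        return w @ X

    def fixing_loop(s1, xi, T, peek, next_state, p_state, p_xi, n_stages,
                    domains=None):
        """Algorithm 3: at each stage, weight the traces and fix the
        decisions (here using the full table rather than a top-Omega subset)."""
        observed, s = set(), s1
        states, decisions = [s1], []
        for k in range(n_stages):
            observed |= peek(s)
            xi_obs = {i: xi[i] for i in observed}
            w = trace_weights(T, k, s, xi_obs, p_state, p_xi)
            x = fix_discrete(T, k, w, domains) if domains else fix_numeric(T, k, w)
            s = next_state(s, x, xi_obs)
            states.append(s)
            decisions.append(x)
        return states, decisions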

Algorithm 4 ANTICIPATE - D (s1 , ξ)
    Train the P̃ (ξ) estimator on I
    for k = 1 . . . n do
        O = O ∪ peek(sk )
        Sample Ω from I, according to Equation (4)
        xk = A(sk, ξO, {ξ^ω}_{ω∈Ω})
        sk+1 = next(sk , xk , ξO )
    return s, x


Algorithm 5 CONTINGENCY (s1 , ξ)
    Train the P̃ (ξ) estimator on I
    T = BUILDTABLE(s1 , ANTICIPATE)
    Train the P̃(sk+1 | sk) estimators on T, for all k
    s, x = FIXING(s1 , ξ, T )
    return s, x




3.2      Hybrid Offline/Online Methods

Our three solution methods can now be defined with relative ease, by combining the
techniques just described.
     ANTICIPATE-D Our first hybrid method is obtained from ANTICIPATE by simply replacing the static set of samples with a dynamically adjusted one. The dynamic set can be populated according to the estimated probabilities from Equation (4), so as not to lose relevance: this may make it possible to reach a similar solution quality with fewer scenarios, at the cost of training an estimator offline. We refer to this approach as ANTICIPATE-D, and its pseudo-code is reported in Algorithm 4.
     CONTINGENCY The second method is based on the idea of computing robust solutions for the scenarios in the offline information, and then using them as guidance for the FIXING heuristic. Robust solutions are obtained by using ANTICIPATE, so that hopefully the (fast) fixing heuristic will be able to match their quality: the price to pay is a hefty offline computational effort. We refer to this approach as CONTINGENCY, and its pseudo-code is reported in Algorithm 5.
     CONTINGENCY-D Our final method is similar to the previous one, except that the contingency table is populated with non-robust solutions. This is done by using ANTICIPATE with a single scenario, given by the values of ξ^ω (i.e. the pretend online observations). This technique (referred to as ANTICIPATE1) provides perfect information about the future, so that achieving robustness is entirely delegated to the FIXING heuristic. The approach is likely to lose reliability, but has two important advantages: 1) lower offline computational costs; and 2) while ANTICIPATE is a stochastic algorithm, ANTICIPATE1 is deterministic. So, this method may provide anticipatory-like results even when no anticipatory algorithm is available. We refer to this method as CONTINGENCY-D, and its pseudo-code is reported in Algorithm 6; a minimal sketch follows.
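    Reusing the hypothetical helpers from the previous sketches, the two contingency methods differ only in the scenario set handed to the table-building algorithm:

    import random

    def contingency(s1, xi, I, solver, peek, next_state, p_state, p_xi,
                    n_stages, n_scen=20):
        """Algorithm 5 skeleton: robust traces from the full ANTICIPATE
        (hefty offline cost), then fast online fixing."""
        aa = lambda s, xi_w: anticipate(s, xi_w, random.sample(list(I), n_scen),
                                        solver, peek, next_state, n_stages)
        T = build_table(s1, aa, I)
        return fixing_loop(s1, xi, T, peek, next_state, p_state, p_xi, n_stages)

    def contingency_d(s1, xi, I, solver, peek, next_state, p_state, p_xi,
                      n_stages):
        """Algorithm 6 skeleton: ANTICIPATE_1 plans with perfect information
        by passing each pretend scenario as the only element of its scenario
        set; robustness is entirely delegated to the fixing heuristic."""
        aa1 = lambda s, xi_w: anticipate(s, xi_w, [xi_w],
                                         solver, peek, next_state, n_stages)
        T = build_table(s1, aa1, I)
        return fixing_loop(s1, xi, T, peek, next_state, p_state, p_xi, n_stages)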


4       Grounding the Approaches

Grounding our approaches requires specifying: 1) the x, s and ξ variables, 2) the peek and next functions, 3) the sampling-based algorithm A, and 4) the feasible space Xk

Algorithm 6 CONTINGENCY- D (s1 , ξ)
    Train the P̃ (ξ) estimator on I
    T = BUILDTABLE(s1 , ANTICIPATE1 )
    Train the P̃(sk+1 | sk) estimators on T, for all k
    s, x = FIXING(s1 , ξ, T )
    return s, x




for the FIXING heuristic. Additionally, evaluating the solution quality requires defining 5) a cost metric.
    We show how this can be done in two case studies: 1) a Virtual Power Plant en-
ergy management problem with numerical decisions; and 2) a combinatorial Traveling
Salesman Problem with uncertain travel times. In both cases, the input anticipatory al-
gorithm A is given by a Mathematical Programming model, based on the Sample Aver-
age Approximation. The models are slight improvements over those in [6], a work that brought to attention the interplay between the offline and online phases. Both approaches are serviceable, but not necessarily representative of the state of the art (especially for the TSP). Here we focus only on the minimal information needed to ground the approaches; full details can be found in [7].
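    As a purely illustrative summary (this interface does not appear in the paper), the five ingredients can be collected in a small Python protocol that each case study must implement:

    from typing import Any, Protocol, Sequence, Set

    class Grounding(Protocol):
        """The five problem-specific ingredients; the x, s and xi variables
        are whatever data types the methods below exchange."""
        def peek(self, state: Any) -> Set[int]:
            """Indices of the random variables observed in this state."""
        def next(self, state: Any, decision: Any, xi_obs: dict) -> Any:
            """State transition after applying a decision."""
        def solve(self, state: Any, xi_obs: dict, scenarios: Sequence) -> Any:
            """The sampling-based algorithm A (here, an SAA-based
            Mathematical Programming model)."""
        def feasible(self, stage: int) -> Any:
            """The feasible decision space X_k for the FIXING heuristic."""
        def cost(self, states: Sequence, decisions: Sequence) -> float:
            """The metric used to evaluate solution quality."""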


5       Experimentation
We empirically evaluated the three hybrid offline/online methods on realistic instances
for the two case studies. The baselines in both cases are myopic heuristics, obtained by running ANTICIPATE with an empty set of scenarios.
     Our methods are evaluated over different uncertainty realizations, obtained by sam-
pling the random variables for the loads and RES generation in the VPP, and for the
travel times in the TSP. In both cases, we use models of uncertainty that ensure real-
istic statistical dependence between the variables. These models are used to obtain the
offline information I and the sequences of observations for the actual experiments. For
the VPP, grid electricity prices change every 15 minutes, which is also the duration
of our online stages. New offline information (e.g. market prices) becomes available
every day, hence our horizon corresponds to 24 × 4 = 96 stages. We use (real) phys-
ical bounds for power generation from [1, 9]. The initial battery state, efficiency, and power flow limit are also based on real data [1, 9], as are the load and RES generation profiles [9]. Different instances have then been obtained by manually scaling load and RES generation. For the TSP we use classical benchmarks1 , including prob-
lems from 10 to 40 nodes. In the TSP each stage represents a visit, hence our horizon
corresponds to the total number of nodes. We use Kernel Density Estimation (KDE
with Gaussian Kernels) to obtain all approximate distributions. As an underlying solver
we use Gurobi 2 , which can handle both MILPs and Quadratic Programs. Each eval-
uated algorithm (in each considered configuration) is run 50 times, with the same 50
sequences of realizations. We use a time limit of 300 seconds for both the VPP and the
TSP. For each run we record both the time required by each approach and the corre-
sponding solution cost, and we report their average values over the 50 realizations. In
 1
     http://myweb.uiowa.edu/bthoa/TSPTWBenchmarkDataSets.htm
 2
     Available at http://www.gurobi.com
[Figure 1: two panels showing solution cost (top: cost in keuro for the VPP, total travel time for the TSP) and online computation time in seconds (bottom, log scale) versus the number of scenarios/traces, for ANTICIPATE, ANTICIPATE-D, CONTINGENCY and CONTINGENCY-D; the myopic heuristics score ∼316 keuro / ∼2 s on the VPP and ∼136 / ∼0.1 s on the TSP.]

                 Fig. 1: (left) Solution quality and online time of the methods for the VPP. (right) For the TSP.


all cases, |I| = |T | = 100, and for the CONTINGENCY method, the contingency table is
built using ANTICIPATE with 20 scenarios. Due to space constraints, we report results
only for a few representative instances.
     The offline training times of the KDE models are roughly the same for all three hybrid methods (∼65 sec for the VPP and ∼32 sec for the TSP). Building the contingency tables for CONTINGENCY takes ∼6,000 sec in the VPP and ∼15,000 sec in the TSP, but only ∼400 and ∼2,000 sec for CONTINGENCY-D. Figure 1 shows the cost/quality tradeoff of the proposed methods and of ANTICIPATE for the VPP (base instance) and for the TSP (a representative 20-customer instance). The use of a dynamic Ω set of scenarios allows ANTICIPATE-D to work better than ANTICIPATE. The CONTINGENCY method is surprisingly close in terms of quality to the original anticipatory algorithm, especially considering that its online computational cost is dramatically smaller (one to two orders of magnitude). CONTINGENCY-D performs (as expected) slightly worse than CONTINGENCY, but is still much better than the myopic heuristic. Increasing the number of guiding traces is beneficial for both methods, and in particular for CONTINGENCY-D.


6   Conclusion
We have presented three methods that can be applied to a generic anticipatory algorithm to reduce its online computational effort by exploiting offline information. In particular, both CONTINGENCY and CONTINGENCY-D are dramatically faster than ANTICIPATE during online operation. Between the two, CONTINGENCY is significantly more reliable in terms of quality, but may require a substantial amount of offline computation. The ANTICIPATE-D technique provides a modest advantage in terms of solution time, but can match and even surpass ANTICIPATE in terms of quality.
    The ability to shift part of the computational cost to an offline stage provides a significant degree of flexibility to stochastic anticipatory algorithms, and is likely to increase their applicability. We believe there is room for improving the scalability and efficiency of our methods, and achieving this goal is part of our current research directions.

References
 1. Bai, H., Miao, S., Ran, X., Ye, C.: Optimal dispatch strategy of a virtual power plant contain-
    ing battery switch stations in a unified electricity market. Energies 8(3), 2268–2289 (2015)
 2. Bent, R., Van Hentenryck, P.: Regrets only! online stochastic optimization under time con-
    straints. In: AAAI. vol. 4, pp. 501–506 (2004)
 3. Bent, R., Van Hentenryck, P.: Online stochastic optimization without distributions. In:
    ICAPS. vol. 5, pp. 171–180 (2005)
 4. Borodin, A., El-Yaniv, R.: Online computation and competitive analysis. Cambridge University Press (2005)
 5. Chang, H.S., Givan, R., Chong, E.K.: On-line scheduling via sampling. In: AIPS. pp. 62–71
    (2000)
 6. De Filippo, A., Lombardi, M., Milano, M.: Methods for off-line/on-line optimization under
    uncertainty. In: IJCAI. pp. 1270–1276 (2018)
 7. De Filippo, A., Lombardi, M., Milano, M.: How to tame your anticipatory algorithm. In: to
    be published at IJCAI (2019)
 8. Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. arXiv preprint arXiv:1605.08803 (2016)
 9. Espinosa, A.N., Ochoa, L.N.: Dissemination document “low voltage networks models and
    low carbon technology profiles”. Tech. rep., University of Manchester (June 2015)
10. Gauvain, J.L., Lee, C.H.: Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2(2), 291–298 (1994)
11. Hentenryck, P.V., Bent, R.: Online stochastic combinatorial optimization. The MIT Press
    (2009)
12. John, G.H., Langley, P.: Static versus dynamic sampling for data mining. In: KDD. vol. 96,
    pp. 367–370 (1996)
13. Lin, M., Tang, K., Yao, X.: Dynamic sampling approach to training neural networks for
    multiclass imbalance classification. IEEE Transactions on Neural Networks and Learning
    Systems 24(4), 647–660 (2013)
14. Mercier, L., Van Hentenryck, P.: Performance analysis of online anticipatory algorithms for
    large multistage stochastic integer programs. In: IJCAI. pp. 1979–1984 (2007)
15. Mercier, L., Van Hentenryck, P.: Amsaa: A multistep anticipatory algorithm for online
    stochastic combinatorial optimization. Integration of AI and OR Techniques in Constraint
    Programming for Combinatorial Optimization Problems pp. 173–187 (2008)
16. Philpott, A.B., De Matos, V.L.: Dynamic sampling algorithms for multi-stage stochastic pro-
    grams with risk aversion. European Journal of Operational Research 218(2), 470–483 (2012)
17. Powell, W.B.: A Unified Framework for Optimization Under Uncertainty, chap. 3, pp. 45–83.
    INFORMS (2016)
18. Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. arXiv preprint
    arXiv:1505.05770 (2015)
19. Sahinidis, N.V.: Optimization under uncertainty: state-of-the-art and opportunities. Computers & Chemical Engineering 28(6), 971–983 (2004), http://www.sciencedirect.com/science/article/pii/S0098135403002369, FOCAPO 2003 special issue
20. Silverman, B.W.: Density estimation for statistics and data analysis. Routledge (2018)