On Augmented Stochastic Submodular Optimization: Adaptivity, Multi-Rounds, Budgeted, and Robustness

Vincenzo Auletta 1,†, Diodato Ferraioli 1,*,†, Cosimo Vinci 1,2,†

1 Università degli Studi di Salerno, Fisciano, Salerno, Italy
2 Università del Salento, Lecce, Italy

IPS-RiCeRcA-SPIRIT 2022: 10th Italian Workshop on Planning and Scheduling, RiCeRcA Italian Workshop, and SPIRIT Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy.
* Corresponding author.
† These authors contributed equally.
Email: auletta@unisa.it (V. Auletta); dferraioli@unisa.it (D. Ferraioli); cosimo.vinci@unisalento.it (C. Vinci)
URL: https://docenti.unisa.it/001366/home (V. Auletta); https://docenti.unisa.it/023604/home (D. Ferraioli); https://www.unisalento.it/scheda-utente/-/people/cosimo.vinci (C. Vinci)
ORCID: 0000-0002-7875-3366 (V. Auletta); 0000-0002-7962-5200 (D. Ferraioli); 0000-0001-7741-9342 (C. Vinci)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
In this work we consider the problem of Stochastic Submodular Maximization, in which the goal is to maximize the value of a monotone and submodular objective function whose value depends on the realization of stochastic events. This problem has applications in several areas; in particular, it naturally models basic problems such as influence maximization and stochastic probing. We advocate extending the study of this problem to include several additional features, such as a budget constraint on the number of observations, the possibility of adaptively choosing what to observe, and the presence of multiple rounds. We then speculate on the possible directions that this line of research can take; in particular, we discuss interesting open problems mainly in the settings of robust optimization and online learning.

Keywords
Influence Maximization, Stochastic Probing, Approximation Algorithms, Online Learning

Introduction

Consider the well-known problem of influence maximization: we are given a (social) network, with edge probabilities encoding the strength of the relationships among the network's components; these probabilities describe how easily influence (e.g., a piece of information or an innovation) spreads over the network starting from a limited set of nodes, usually termed seeds; the goal is then to find the set of seeds that maximizes the number of "influenced" nodes. The non-adaptive version of the problem, in which all seeds are chosen before the influence process starts, has been widely studied, with numerous successful algorithms proposed [1, 2, 3] and applications ranging from viral marketing [1], to election manipulation [4, 5], to infectious disease prevention [6]. Recently, the focus has moved to the adaptive variant of the problem, in which seeds can be added while influence is still spreading from previously placed seeds [7, 8, 9, 10]. This variant has a strong relationship with Stochastic Probing problems [11], in which there are n items whose states are stochastically determined according to (not necessarily identical) independent distributions, and the goal is to choose a feasible subset of items maximizing the value of a non-negative, monotone, and submodular set function.
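To make the kind of objective just described concrete, here is a minimal Python sketch, our own toy illustration rather than an example taken from the cited works, of a non-negative, monotone, and submodular set function (a simple coverage function) evaluated on independently drawn item states; all names (COVERS, ACTIVE_PROB, realized_value, and so on) are hypothetical.

```python
import random

# Hypothetical toy instance: each item covers a set of ground elements,
# and each item is independently "active" with its own probability.
COVERS = {0: {1, 2}, 1: {2, 3, 4}, 2: {4, 5}, 3: {1, 5, 6}}
ACTIVE_PROB = {0: 0.9, 1: 0.5, 2: 0.7, 3: 0.3}

def coverage(items):
    """Non-negative, monotone, submodular set function: size of the union of covered elements."""
    covered = set()
    for i in items:
        covered |= COVERS[i]
    return len(covered)

def sample_states(rng=random):
    """Draw an independent active/inactive state for every item."""
    return {i: rng.random() < p for i, p in ACTIVE_PROB.items()}

def realized_value(chosen, states):
    """Value of a chosen subset under one realization: only active items contribute."""
    return coverage([i for i in chosen if states[i]])

if __name__ == "__main__":
    states = sample_states()
    print(realized_value({0, 1, 3}, states))
```

The expected value of a fixed subset can then be estimated by averaging realized_value over many sampled realizations; the probing problem asks for the feasible subset maximizing this expectation.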
Stochastic probing settings arise, for example, in expert hiring [12], online dating and kidney exchange [13], Bayesian mechanism design [14], and sequentially posted pricing [15, 16]. Research on both influence maximization and stochastic probing has mainly focused on the single-round setting. However, in several cases the optimization is run over multiple rounds: viral marketing and electoral campaigns are usually deployed in multiple rounds of (usually adaptive) influence, and, similarly, expert hiring sessions and sequentially posted pricing are run in multiple steps. In this paper, we therefore advocate extending the study of these problems so as to consider all of these features together: budget, adaptivity, and multiple rounds. Indeed, previous works only consider them separately.

The analysis of the optimization advantage of adaptive over non-adaptive algorithms has recently been applied to stochastic maximization problems, including single-round influence maximization and stochastic probing. In particular, Golovin and Krause [7] showed that whenever the objective function satisfies a property named adaptive submodularity, a greedy policy achieves a constant-factor approximation of the optimal adaptive policy. Some results have recently been presented on (adaptive) multi-round influence maximization [17, 18]; these results, however, do not take budget into account. Multiple rounds have mainly been considered in the setting of online learning algorithms, also known as bandit algorithms, which are asked to work even when the input is only partially known and to improve their performance over multiple rounds by essentially learning the input. This class of algorithms was introduced by Gittins [19] and has recently been applied also to influence maximization and other submodular maximization processes [20, 21, 22, 23, 24]. While these works focus on multi-round settings, they do not consider the possibility of adaptively choosing observations, and often they also disregard budget constraints.

Our contribution

In this work, we introduce the problem of Stochastic Budgeted Multi-round Adaptive Submodular Maximization (SBMSm, for short). The problem is defined by: a number T of rounds; a ground set of n items, whose states are drawn at each round from an (item-specific) set Θ according to some time-dependent distribution; a time-dependent monotone and submodular state function that assigns a non-negative real value to each realization of the states of the observed items; and a budget B on the observations that can be made within the time horizon T. The problem is then to choose which observations to make in each of the T rounds, subject to the budget limit, so as to maximize the expected total value of the state function over the T rounds.

It is not hard to see that this problem generalizes both influence maximization and stochastic probing. In the first case, the state of an item v describes which nodes would be influenced if v were selected as a seed, with the distribution of the state of v depending on the edge probabilities; an observation consists in choosing a given node as a seed, running the influence spread, and finally evaluating the realized state by, for example, counting the number or the total weight of influenced nodes. In the second case, the state of an item is either active or inactive, and we can observe the states of the selected items only; the value of the submodular set function then depends on the set of selected items that are active.

Note that SBMSm not only generalizes influence maximization and stochastic probing by extending them to multi-round settings, but also allows us to model more realistic settings in which states change over time: e.g., in influence maximization, edge probabilities or node weights may vary across rounds.

Notwithstanding the complexity of the problem, we claim that the current state of knowledge is sufficient to face it. In fact, we can prove that, by greedily allocating each unit of budget to the round in which it is expected to produce the largest increase in the expected value of the score function, given the amount of budget still available, and then running one of the well-known single-round adaptive budgeted approaches to choose the observations to make at each round, we can achieve a constant approximation of the optimal algorithm (which adaptively decides both the budget allocation among rounds and the observations within each round). A sketch of this two-level greedy approach is given below.
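The following Python sketch illustrates the two-level greedy idea just described, under simplifying assumptions: it is only our schematic reading of the approach, not the actual algorithm or its analysis, and the callables estimated_marginal_gain and adaptive_greedy_round are hypothetical placeholders for, respectively, the per-round estimates of budget gains and a single-round adaptive budgeted policy.

```python
def two_level_greedy(T, B, estimated_marginal_gain, adaptive_greedy_round):
    """Sketch: allocate B observation units across T rounds, then spend them adaptively.

    estimated_marginal_gain(t, k): estimated gain of giving round t its (k+1)-th unit of budget.
    adaptive_greedy_round(t, budget): runs a single-round adaptive budgeted policy on round t
    and returns the realized value of the state function for that round.
    """
    allocation = [0] * T
    # Outer greedy: give each budget unit to the round with the largest estimated marginal gain,
    # given the budget already assigned to that round.
    for _ in range(B):
        best_round = max(range(T), key=lambda t: estimated_marginal_gain(t, allocation[t]))
        allocation[best_round] += 1

    # Inner policy: in each round, spend the allocated budget with a single-round
    # adaptive budgeted approach (e.g., adaptive greedy).
    total_value = 0.0
    for t in range(T):
        if allocation[t] > 0:
            total_value += adaptive_greedy_round(t, allocation[t])
    return allocation, total_value

if __name__ == "__main__":
    # Toy usage with synthetic, diminishing marginal gains per round.
    gains = [3.0, 2.0, 1.0]
    est = lambda t, k: gains[t] / (k + 1)
    run = lambda t, b: sum(gains[t] / (k + 1) for k in range(b))
    print(two_level_greedy(T=3, B=5, estimated_marginal_gain=est, adaptive_greedy_round=run))
```

In the toy usage, marginal gains are diminishing within each round, which is the regime where a greedy budget allocation is expected to behave well.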
Let us now discuss a few research directions on Stochastic Budgeted Multi-round Adaptive Submodular Maximization that we deem very interesting and promising. In particular, we will henceforth focus on two main approaches.

Online Learning. Suppose that the state distributions are unknown, but all rounds share the same state distribution. Can we define an adaptive policy that maximizes the expected total value of the state function by "learning" the underlying probability distribution? In particular, we will consider an agnostic (adaptive multi-round) policy, where the choice of each picked item v is based on the observed partial states (both in the current round and in previous ones) and on the initial instance, but not on the distribution of states. Since each observation provides samples of several local states, such samples can be used to build an estimated distribution of states at each round t, which becomes more accurate as t increases and more samples are collected. The estimated distribution is then used to "simulate" the real one with good approximation (e.g., when computing the item maximizing the expected increment). As a benchmark quality metric for the proposed policy π, we will consider the α-approximate regret Reg_α(T) = α · σ_μ(π*) − σ_μ(π), where π* is an optimal non-agnostic adaptive policy (i.e., one based on the knowledge of the distribution of states) and σ_μ(π) denotes the expected value of π. We believe it would be interesting to establish whether there is a policy π with constant approximation and sublinear regret. In particular, an approach named Combinatorial Upper Confidence Bound (CUCB) has recently been introduced to design online learning algorithms with constant approximation and poly-logarithmic regret [20, 21, 22]; it has been successfully used by Gabillon et al. [23] and by Das et al. [24] in settings that are similar to, but simpler than, the one proposed in this work. Unfortunately, it is not hard to see that a naive application of CUCB to the model defined above cannot work, since we need to estimate not only which observations to make at each round, but also how many, and the latter can be heavily affected by an imprecise estimation of the state distribution.
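To fix ideas on the CUCB-style ingredient, here is a minimal Python sketch, with hypothetical names and a standard upper-confidence-bound formula, of the optimistic per-item estimates that such approaches maintain: an empirical mean of the observed (normalized) outcomes plus an exploration bonus that shrinks as the item is observed more often. This is only a generic illustration of the UCB estimate, not an implementation of the specific algorithms in [20, 21, 22].

```python
import math

class UCBEstimator:
    """Optimistic (upper-confidence) estimates of per-item quantities, as in CUCB-style methods."""

    def __init__(self, n_items):
        self.counts = [0] * n_items   # how many times each item has been observed
        self.means = [0.0] * n_items  # empirical mean of the observed values, assumed in [0, 1]

    def update(self, item, value):
        """Record one observation of `item` with outcome `value` in [0, 1]."""
        self.counts[item] += 1
        self.means[item] += (value - self.means[item]) / self.counts[item]

    def index(self, item, t):
        """Empirical mean plus a confidence bonus that shrinks with more observations (t >= 1)."""
        if self.counts[item] == 0:
            return float("inf")  # force at least one observation of every item
        bonus = math.sqrt(3 * math.log(t) / (2 * self.counts[item]))
        return min(1.0, self.means[item] + bonus)
```

In CUCB, optimistic estimates of this kind are fed to an offline approximation oracle that selects the combinatorial action for the round; the difficulty discussed above is that, in our setting, such an oracle would also have to decide how much of the remaining budget each round should receive.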
This limitation, however, raises other interesting questions: are there settings in which the CUCB framework can still be used to achieve constant approximation and poly-logarithmic regret? Can we design another framework that achieves constant approximation and poly-logarithmic regret in our more general setting? Or, how high a price must be paid in terms of approximation guarantee in order to achieve poly-logarithmic regret?

While regret is a useful measure (and the de facto standard) for understanding the learning abilities of an algorithm, other measures can be of interest. For example, one may be interested in agnostic policies that are able to reach a given goal (e.g., the aggregate state function exceeding a given threshold) within a given deadline. Note that learning algorithms with low regret may fail to achieve this goal: indeed, the regret may depend on time-independent factors that are often large. Moreover, low regret does not rule out long exploration phases during which a lot of "value" is lost.

Robust Optimization. The second approach is also motivated by partial or imprecise knowledge of the state distributions. However, while above we aimed to learn these distributions, here we are concerned with how, and how much, this limited knowledge affects our optimization goals. This approach has been receiving much attention very recently, motivated by the need to run optimization algorithms not on real data but on predictions, often generated by machine learning algorithms. The field has blossomed with applications to facility location [25], scheduling [26], clustering [27], and influence maximization [28]. Here, natural questions are: how sensitive are the known optimization frameworks to distortions of the input data? And how much must be lost within such a framework if one wants an algorithm that is robust to distortions within a given threshold? A simple way to probe the first question empirically is sketched below.
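As a purely illustrative starting point for the first question, and not a result claimed here, the following Python sketch (with hypothetical names) perturbs the element weights seen by a plain greedy maximizer of a weighted coverage function and measures the resulting relative loss with respect to the value obtained on the true weights.

```python
import random

def greedy(weights, covers, k):
    """Plain greedy maximization of a weighted coverage function under a cardinality constraint k."""
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for item, elems in covers.items():
            if item in chosen:
                continue
            gain = sum(weights[e] for e in elems - covered)
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break
        chosen.append(best)
        covered |= covers[best]
    return chosen

def value(weights, covers, items):
    """True weighted coverage of a set of items."""
    covered = set().union(*(covers[i] for i in items)) if items else set()
    return sum(weights[e] for e in covered)

def sensitivity(weights, covers, k, noise, trials=100, rng=random):
    """Average relative loss when greedy runs on multiplicatively perturbed weights."""
    base = value(weights, covers, greedy(weights, covers, k))
    losses = []
    for _ in range(trials):
        noisy = {e: w * (1 + rng.uniform(-noise, noise)) for e, w in weights.items()}
        losses.append(1 - value(weights, covers, greedy(noisy, covers, k)) / base)
    return sum(losses) / trials

if __name__ == "__main__":
    covers = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}}
    weights = {e: 1.0 for e in range(1, 6)}
    print(sensitivity(weights, covers, k=2, noise=0.5))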
Acknowledgments

This work is partially supported by the Italian MIUR PRIN 2017 Project ALGADIMAR "Algorithms, Games, and Digital Markets", by "GNCS–INdAM", and by the PON R&I 2014-2020 Project TEBAKA "Sistema per acquisizione conoscenze di base del territorio".

References

[1] D. Kempe, J. Kleinberg, É. Tardos, Maximizing the spread of influence through a social network, in: KDD, 2003, pp. 137–146.
[2] N. Chen, On the approximability of influence in social networks, SIAM Journal on Discrete Mathematics 23 (2009) 1400–1415.
[3] C. Borgs, M. Brautbar, J. Chayes, B. Lucier, Maximizing social influence in nearly optimal time, in: SODA, 2014, pp. 946–957.
[4] B. Wilder, Y. Vorobeychik, Controlling elections through social influence, in: AAMAS, 2018, pp. 265–273.
[5] M. Castiglioni, D. Ferraioli, N. Gatti, G. Landriani, Election manipulation on social networks: Seeding, edge removal, edge addition, Journal of Artificial Intelligence Research 71 (2021) 1049–1090.
[6] A. Yadav, B. Wilder, E. Rice, R. Petering, J. Craddock, A. Yoshioka-Maxwell, M. Hemler, L. Onasch-Vera, M. Tambe, D. Woo, Influence maximization in the field: The arduous journey from emerging to deployed application, in: AAMAS, 2017, pp. 150–158.
[7] D. Golovin, A. Krause, Adaptive submodularity: Theory and applications in active learning and stochastic optimization, Journal of Artificial Intelligence Research 42 (2011) 427–486.
[8] W. Chen, B. Peng, On adaptivity gaps of influence maximization under the independent cascade model with full-adoption feedback, in: ISAAC, 2019, pp. 24:1–24:19.
[9] G. D'Angelo, D. Poddar, C. Vinci, Better bounds on the adaptivity gap of influence maximization under full-adoption feedback, in: AAAI, 2021, pp. 12069–12077.
[10] G. D'Angelo, D. Poddar, C. Vinci, Improved approximation factor for adaptive influence maximization via simple greedy strategies, in: ICALP, 2021, pp. 59:1–59:19.
[11] A. Asadpour, H. Nazerzadeh, A. Saberi, Stochastic submodular maximization, in: WINE, 2008, pp. 477–489.
[12] M. Hoefer, K. Schewior, D. Schmand, Stochastic probing with increasing precision, in: IJCAI, 2021, pp. 4069–4075.
[13] A. Gupta, V. Nagarajan, A stochastic probing problem with applications, in: IPCO, 2013, pp. 205–216.
[14] A. Asadpour, H. Nazerzadeh, Maximizing stochastic monotone submodular functions, Management Science 62 (2016) 2374–2391.
[15] S. Chawla, J. D. Hartline, D. L. Malec, B. Sivan, Multi-parameter mechanism design and sequential posted pricing, in: STOC, 2010, pp. 311–320.
[16] M. Adamczyk, A. Borodin, D. Ferraioli, B. de Keijzer, S. Leonardi, Sequential posted price mechanisms with correlated valuations, in: WINE, 2015, pp. 1–15.
[17] L. Sun, W. Huang, P. S. Yu, W. Chen, Multi-round influence maximization, in: KDD, 2018, pp. 2249–2258.
[18] G. Tong, R. Wang, Z. Dong, X. Li, Time-constrained adaptive influence maximization, IEEE Transactions on Computational Social Systems 8 (2020) 33–44.
[19] J. C. Gittins, Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society: Series B (Methodological) 41 (1979) 148–164.
[20] S. Vaswani, L. Lakshmanan, M. Schmidt, et al., Influence maximization with bandits, arXiv preprint arXiv:1503.00024 (2015).
[21] W. Chen, Y. Wang, Y. Yuan, Q. Wang, Combinatorial multi-armed bandit and its extension to probabilistically triggered arms, The Journal of Machine Learning Research 17 (2016) 1746–1778.
[22] Z. Wen, B. Kveton, M. Valko, S. Vaswani, Online influence maximization under independent cascade model with semi-bandit feedback, in: NeurIPS, 2017.
[23] V. Gabillon, B. Kveton, Z. Wen, B. Eriksson, S. Muthukrishnan, Adaptive submodular maximization in bandit setting, in: NeurIPS, 2013.
[24] D. Das, S. Jain, S. Gujar, Budgeted combinatorial multi-armed bandits, in: AAMAS, 2022, pp. 345–353.
[25] Y. Azar, D. Panigrahi, N. Touitou, Online graph algorithms with predictions, in: SODA, 2022, pp. 35–66.
[26] Y. Azar, S. Leonardi, N. Touitou, Flow time scheduling with uncertain processing time, in: STOC, 2021, pp. 1070–1080.
[27] J. C. Ergun, Z. Feng, S. Silwal, D. Woodruff, S. Zhou, Learning-augmented k-means clustering, in: ICLR, 2021.
[28] B. Wilder, N. Immorlica, E. Rice, M. Tambe, Maximizing influence in an unknown social network, in: AAAI, 2018, pp. 4743–4750.