<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Augmented Online Learning Algorithms - The Adversarial Bandit with Knapsacks framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Drago</string-name>
          <email>davide.drago@studbocconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Celli</string-name>
          <email>andrea.celli2@unibocconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marek Eliáš</string-name>
          <email>marek.elias@unibocconi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bocconi University</institution>
          ,
          <addr-line>Via Guglielmo Röntgen 1, Milan, 20136</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We delve into the Bandits with Knapsacks framework with the aim of designing a learning-augmented online algorithm with better competitive guarantees than the state-of-the-art worst-case algorithms. In particular, we obtain better competitive ratios when the input predictions are accurate, while also upholding worst-case guarantees when predictions are imprecise. We introduce two algorithms: the first works in a full-feedback environment, while the other is tailored to the bandit setting. Both algorithms integrate a static prediction into a worst-case α-competitive algorithm. This yields an improved competitive ratio of 1/[λ + (1/α)(1 − λ)] when the prediction is perfect, and a marginally compromised constant competitive ratio of α/(1 − λ) when the prediction is highly imprecise, with λ ∈ (0, 1) a parameter chosen by the decision-maker.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Related Work</title>
        <p>
          Bandits with Knapsacks. In the case of adversarial bandits with knapsacks, Immorlica et al.
[
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] provide a competitive ratio of O(m log T). This was improved by Kesselheim and Singla
[
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to O(log m log T). Subsequently, Castiglioni et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provided the first constant-factor
competitive ratio for the case in which B = Ω(T). Such a competitive ratio is 1/α = B/T.
Learning-augmented online algorithms. The framework of learning-augmented online
algorithms was formally established by Lykouris and Vassilvitskii [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Applications of this
framework are wide-ranging and include scheduling [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and caching or paging algorithms [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
In addition, a general framework for integrating predictions into online primal-dual
algorithms was recently introduced in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Setting</title>
      <p>
        The decision maker makes a sequence of T decisions, drawing actions from a finite set A. A
randomized strategy is a distribution x_t ∈ Δ(A). We denote by x̃ the predicted mixed
strategy, and by x* the best fixed distribution. The decision maker has m available resources
and a budget B for each of them. A sequence of items γ is selected by an adversary. In our
setting, γ_t is composed of a reward vector r_t ∈ [0, 1]^|A| and a cost matrix c_t ∈ [0, 1]^(|A|×m). We focus
on the case in which B = Ω(T). We denote as ρ = B/T the ratio of budget to time horizon.
Benchmark. The benchmark used in the paper is the Fixed Distribution benchmark, defined
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and denoted as OPT_FD. Such a quantity is defined as the expected total reward of the best
distribution over actions x*, i.e., the one maximizing E[REW].
      </p>
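      <p>To make the interaction protocol concrete, the following minimal sketch renders it in Python. It is our own illustration, not code from the paper: the callables adversary and strategy, and all identifiers, are hypothetical names.</p>
      <preformat>
import numpy as np

def run_protocol(T, B, m, n_actions, adversary, strategy):
    """One pass of the BwK protocol: each round the learner commits to a mixed
    strategy over the n_actions arms, the adversary reveals rewards and costs,
    and the process stops once any of the m resource budgets is depleted."""
    budget = np.full(m, float(B))        # budget B for each of the m resources
    total_reward = 0.0
    for t in range(T):
        x_t = strategy(t)                # a point in the simplex over actions
        a = np.random.choice(n_actions, p=x_t)
        r_t, c_t = adversary(t)          # r_t in [0,1]^|A|, c_t in [0,1]^(|A| x m)
        if np.any(c_t[a] > budget):      # stop before overspending any resource
            break
        budget -= c_t[a]
        total_reward += r_t[a]
    return total_reward
      </preformat>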
      <p>Regret. To evaluate the algorithm we use the notion of pseudo-regret, expressed according to the following inequality:</p>
      <p>E[REW_ALG] ≥ (1/α) OPT_FD − reg,
where 1/α is the competitive ratio, OPT_FD is the profit of the fixed distribution benchmark, and
reg is a sublinear regret term.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Algorithms</title>
      <p>Full feedback. In the full-feedback algorithm, at each iteration, with probability λ the
prediction is played, with probability δ the iteration is skipped, and with the remaining
probability 1 − λ − δ the worst-case algorithm is played. Both the prediction and the worst-case
algorithm are assigned the full budget B and are stopped when the budget assigned to them
would be depleted, had they been played for the full sequence. The worst-case algorithm is
updated at each iteration.
Bandit. The difference in the bandit-feedback algorithm lies in the update rules. The worst-case
algorithm is updated only in the iterations in which it is played; otherwise, we set the feedback
to (0, 0). Moreover, since the expected stopping times cannot be computed under bandit
feedback, the budget must be divided preemptively between the two algorithms,
proportionally to their probabilities of being played. A sketch of one round of the full-feedback
variant follows.</p>
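      <p>The sketch below is our own rendering of one round of the full-feedback meta-algorithm, under the assumption that the worst-case algorithm exposes play/update methods in the natural way; none of these identifiers come from the paper.</p>
      <preformat>
import numpy as np

def meta_round(t, lam, delta, x_pred, wc_alg, virt_pred, virt_wc, B, feedback):
    """One round of the full-feedback meta-algorithm (a sketch).  virt_pred and
    virt_wc track the cost each component would have accrued had it played every
    round; a component is shut down once that virtual spend exceeds B."""
    x_wc = wc_alg.play(t)                       # hypothetical interface
    u = np.random.rand()
    if u >= lam + delta:                        # remaining prob.: worst-case alg.
        x_t = x_wc if not np.any(virt_wc > B) else None
    elif u >= lam:                              # prob. delta: skip the iteration
        x_t = None
    else:                                       # prob. lam: play the prediction
        x_t = x_pred if not np.any(virt_pred > B) else None
    r_t, c_t = feedback(t)                      # full feedback: always observed
    virt_pred += c_t.T @ x_pred                 # expected spend of the prediction
    virt_wc += c_t.T @ x_wc                     # expected spend of worst-case alg.
    wc_alg.update(r_t, c_t)                     # full feedback: update every round
    return x_t
      </preformat>
      <p>In the bandit variant, wc_alg.update would instead be called only on rounds where x_wc is actually played, with (0, 0) fed back otherwise, and the budget would be split beforehand between prediction and worst-case algorithm in proportion to λ and 1 − λ − δ.</p>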
      <p>Results. Both algorithms enjoy the same competitive-ratio guarantees in their respective settings.
Theorem 3.1. The algorithms with x̃ = x* and δ = 2√(2 log(1/λ))/T^(1/2), for a sequence of inputs γ,
achieve w.h.p. a competitive ratio of 1/[λ + ρ(1 − λ)]. When Σ_t r_t(x̃) = 0, i.e., the prediction
accrues no reward, the competitive ratio degrades to 1/[ρ(1 − λ)].</p>
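      <p>As a numerical illustration of Theorem 3.1 (ours, not from the paper), the snippet below evaluates both ratios for a sample budget ratio ρ and trust parameter λ.</p>
      <preformat>
def competitive_ratios(rho, lam):
    """Competitive ratio with a perfect prediction vs. a worthless one,
    for budget ratio rho = B/T and trust parameter lam in (0, 1)."""
    perfect = 1.0 / (lam + rho * (1.0 - lam))
    worthless = 1.0 / (rho * (1.0 - lam))
    return perfect, worthless

# With rho = 0.5 the worst-case ratio is alpha = 1/rho = 2.  Trusting the
# prediction with lam = 0.5 improves the ratio to about 1.33 when the
# prediction is perfect, at the price of ratio 4 when it is worthless.
print(competitive_ratios(0.5, 0.5))   # (1.333..., 4.0)
      </preformat>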
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>Our findings, although encouraging, have some limitations. Specifically, our algorithms do not
ensure sublinear regret under stochastic inputs, and are not designed to adjust the probability
parameter λ in response to real-time performance. Future research could focus on adapting
our framework to stochastic environments and on creating algorithms capable of dynamically
modifying λ as system dynamics change. Moreover, enhancing our model to
provide best-of-both-worlds guarantees may be useful in diverse applications.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>De Benedictis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Castiglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ferraioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Malvone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maratea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Scala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Serina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tosello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Umbrico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vallati</surname>
          </string-name>
          ,
          <article-title>Preface to the Italian Workshop on Planning and Scheduling, RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, and SPIRIT Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy (IPS-RCRA-SPIRIT 2023)</article-title>
          , in:
          <source>Proceedings of the Italian Workshop on Planning and Scheduling, RCRA Workshop on Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion, and SPIRIT Workshop on Strategies, Prediction, Interaction, and Reasoning in Italy (IPS-RCRA-SPIRIT 2023), co-located with the 22nd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2023)</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badanidiyuru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Slivkins</surname>
          </string-name>
          ,
          <article-title>Bandits with knapsacks</article-title>
          ,
          <source>in: 2013 IEEE 54th Annual Symposium on Foundations of Computer Science</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          . doi:10.1109/FOCS.2013.30.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Immorlica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sankararaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schapire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Slivkins</surname>
          </string-name>
          ,
          <article-title>Adversarial bandits with knapsacks</article-title>
          ,
          <source>Journal of the ACM</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ). doi:10.1145/3557045.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kesselheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singla</surname>
          </string-name>
          ,
          <article-title>Online learning with vector costs and bandits with knapsacks</article-title>
          , in: J. Abernethy, S. Agarwal (Eds.),
          <source>Proceedings of Thirty Third Conference on Learning Theory</source>
          , volume
          <volume>125</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2286</fpage>
          -
          <lpage>2305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Castiglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kroer</surname>
          </string-name>
          ,
          <article-title>Online learning with knapsacks: the best of both worlds</article-title>
          , in: K. Chaudhuri,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szepesvari</surname>
          </string-name>
          , G. Niu, S. Sabato (Eds.),
          <source>Proceedings of the 39th International Conference on Machine Learning</source>
          , volume
          <volume>162</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2767</fpage>
          -
          <lpage>2783</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lykouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          ,
          <article-title>Competitive caching with machine learned advice</article-title>
          , in: J. Dy, A. Krause (Eds.),
          <source>Proceedings of the 35th International Conference on Machine Learning</source>
          , volume
          <volume>80</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3296</fpage>
          -
          <lpage>3305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lattanzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavastida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moseley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vassilvitskii</surname>
          </string-name>
          ,
          <article-title>Online Scheduling via Learned Weights</article-title>
          , in:
          <source>Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms (SODA)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1859</fpage>
          -
          <lpage>1877</lpage>
          . doi:10.1137/1.9781611975994.114.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitzenmacher</surname>
          </string-name>
          ,
          <article-title>Scheduling with predictions and the price of misprediction</article-title>
          ,
          <year>2019</year>
          . arXiv:1902.00732.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rohatgi</surname>
          </string-name>
          ,
          <article-title>Near-optimal bounds for online caching with machine learned advice</article-title>
          ,
          <year>2019</year>
          . arXiv:1910.12172.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Antoniadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Coester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <article-title>Online metric algorithms with untrusted predictions</article-title>
          , in: H. Daumé III, A. Singh (Eds.),
          <source>Proceedings of the 37th International Conference on Machine Learning</source>
          , volume
          <volume>119</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>345</fpage>
          -
          <lpage>355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>É.</given-names>
            <surname>Bamas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maggiori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Svensson</surname>
          </string-name>
          ,
          <article-title>The primal-dual method for learning augmented algorithms</article-title>
          ,
          <year>2020</year>
          . arXiv:2010.11632.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>