<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimal Rates for Online Bayesian Persuasion (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martino Bernasconi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Castiglioni</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Celli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto Marchesi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Trovò</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Gatti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing Sciences, Università Bocconi</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Bayesian persuasion studies how an informed sender should influence beliefs of rational receivers who take decisions through Bayesian updating of a common prior. We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types. We show how to obtain a tight ̃︀( 1/2) regret bound in the case in which the sender faces a single receiver and has partial feedback, improving over the best previously-known bound of ̃︀( 4/5). We also provide the first no-regret guarantees for the multi-receiver setting with partial feedback.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Bayesian Persuasion</kwd>
        <kwd>Online Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        with multiple receivers [27, 14, 28]. A key question that has emerged is whether computational
techniques can be used to ease some of the assumptions made in the original model by Kamenica
and Gentzkow [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Two main lines of research have emerged: one is aimed at developing robust
algorithms that can bypass the common-prior assumption [29, 30], and the other is focused on
the robustness of persuasion when the sender is unaware of the receiver’s goals [31, 32, 33].
      </p>
      <p>This work follows the second perspective, and studies the online Bayesian persuasion
framework introduced by Castiglioni et al. [31, 34], where the sender repeatedly faces a receiver
whose type is unknown and chosen adversarially at each round from a finite set of types.</p>
      <p>We start by describing a general no-regret algorithm for online learning against an oblivious
adversary with a finite number of possible loss functions. We use this algorithm to provide
a tight ˜( 1/2) regret upper bound in the setting with one receiver and partial feedback,
improving over the ˜( 4/5) rate by Castiglioni et al. [31]. This result also improves the best
known bound of ˜( 2/3) for online learning in repeated Stackelberg games provided by Balcan
et al. [35]. Then, we show that our general framework can be applied to obtain the first no-regret
guarantees under partial feedback in the multi-receiver setting introduced by Castiglioni et al.
[32]. In particular, we provide a tight ˜( 1/2) regret bound under the assumption the set of
possible type profiles of the receivers is known beforehand by the sender. In each of these
settings, our no-regret algorithms may sufer from exponential per-iteration running time, as
expected from known hardness results for the online Bayesian persuasion settings [31]. In the
last part of the paper, we provide the first no-regret algorithms for online Bayesian persuasion
with guaranteed polynomial per-iteration running time. We do that by considering the type
reporting framework introduced by Castiglioni et al. [36], where the sender can commit to a
menu of signaling schemes, and then let the receivers choose their preferred signaling scheme
depending on their private types. In such a setting, we provide a ( 1/2) regret upper bound for
the single-receiver setting. Moreover, by designing a general procedure based on the follow the
regularized leader algorithm, we show that it is possible to achieve the same rate of convergence
with polynomial-time per-iteration time complexity also in the multi-receiver setting, when
receivers have binary actions and the utility of the sender is specified by a function of receivers’
actions that is either supermodular or anonymous.</p>
      <p>The main motivation for introducing the reduction from online problems with finite number of
losses to online linear optimization was to solve online Bayesian persuasion problems. However,
this result finds applicability in other settings such as learning in security games and bidding in
combinatorial auctions.</p>
      <p>Learning in security games [35] extends classic (one-shot) security games (see, e.g., [37]) by
introducing the problem of learning a no-regret strategy for the defender against a sequence
of attackers that is adversarially selected. In this model, at each round , the defender chooses
a strategy , which is a distribution over  targets. Then, an attacker of type  ∈  best
responds to such a strategy and the defender experiences a loss of  (). Our reduction yields
a ˜(poly()√ ) regret bound under partial feedback, which improves the previously-known
regret bound given by Balcan et al. [35], which is of order (poly( ) 2/3).</p>
      <p>Another application of our framework is bidding in repeated combinatorial auctions [38].
In these auctions the action space is combinatorial and, therefore, exponentially large. Our
reduction to online linear optimization gives a ˜(poly()√ ) bound for this problem under
partial feedback.
This paper is supported by the Italian MIUR PRIN 2022 Project “Targeted Learning Dynamics:
Computing Eficient and Fair Equilibria through No-Regret Algorithms”, by the FAIR (Future
Artificial Intelligence Research) project, funded by the NextGenerationEU program within the
PNRR-PE-AI scheme (M4C2, Investment 1.3, Line on Artificial Intelligence), and by the EU
Horizon project ELIAS (European Lighthouse of AI for Sustainability, No. 101120237).
[10] R. Alonso, O. Câmara, Persuading voters, American Economic Review 106 (2016) 3590–
3605.
[11] M. Castiglioni, A. Celli, N. Gatti, Persuading voters: It’s easy to whisper, it’s hard to
speak loud, in: The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020, pp.
1870–1877.
[12] M. Castiglioni, N. Gatti, Persuading voters in district-based elections, in: Proceedings of
the AAAI Conference on Artificial Intelligence, volume 35, 2021, pp. 5244–5251.
[13] S. Vasserman, M. Feldman, A. Hassidim, Implementing the wisdom of waze, in:
Twenty</p>
      <p>Fourth International Joint Conference on Artificial Intelligence, 2015, pp. 660–666.
[14] U. Bhaskar, Y. Cheng, Y. K. Ko, C. Swamy, Hardness results for signaling in Bayesian
zero-sum and network routing games, in: Proceedings of the 2016 ACM Conference on
Economics and Computation, 2016, pp. 479–496.
[15] M. Castiglioni, A. Celli, A. Marchesi, N. Gatti, Signaling in bayesian network congestion
games: the subtle power of symmetry, in: Proceedings of the AAAI Conference on Artificial
Intelligence, volume 35, 2021, pp. 5252–5259.
[16] Z. Rabinovich, A. X. Jiang, M. Jain, H. Xu, Information disclosure as a means to security, in:
Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent
Systems, 2015, pp. 645–653.
[17] H. Xu, R. Freeman, V. Conitzer, S. Dughmi, M. Tambe, Signaling in Bayesian Stackelberg
games, in: Proceedings of the 2016 International Conference on Autonomous Agents and
Multiagent Systems, 2016, pp. 150–158.
[18] J. Wu, Z. Zhang, Z. Feng, Z. Wang, Z. Yang, M. I. Jordan, H. Xu, Sequential information
design: Markov persuasion process and its eficient reinforcement learning, in: Proceedings
of the 23rd ACM Conference on Economics and Computation, 2022, p. 471–472.
[19] J. Gan, R. Majumdar, G. Radanovic, A. Singla, Bayesian persuasion in sequential
decisionmaking, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 36,
2022, pp. 5025–5033.
[20] M. Bernasconi, M. Castiglioni, A. Marchesi, N. Gatti, F. Trovò, Sequential information
design: Learning to persuade in the dark, in: NeurIPS, 2022.
[21] I. Kremer, Y. Mansour, M. Perry, Implementing the “wisdom of the crowd”, Journal of</p>
      <p>Political Economy 122 (2014) 988–1012.
[22] L. Cohen, Y. Mansour, Optimal algorithm for bayesian incentive-compatible exploration,
in: Proceedings of the 2019 ACM Conference on Economics and Computation, 2019, pp.
135–151.
[23] Y. Mansour, A. Slivkins, V. Syrgkanis, Z. S. Wu, Bayesian exploration: Incentivizing
exploration in Bayesian games, in: Proceedings of the 2016 ACM Conference on Economics
and Computation, 2016, pp. 661–661.
[24] M. Sellke, A. Slivkins, The price of incentivizing exploration: A characterization via
thompson sampling and sample complexity, in: Proceedings of the 22nd ACM Conference
on Economics and Computation, 2021, pp. 795–796.
[25] Y. Mansour, A. Slivkins, V. Syrgkanis, Z. S. Wu, Bayesian exploration: Incentivizing
exploration in Bayesian games, Operations Research 70 (2022) 1105–1127.
[26] S. Dughmi, H. Xu, Algorithmic Bayesian persuasion, in: Proceedings of the forty-eighth
annual ACM symposium on Theory of Computing, 2016, pp. 412–425.
[27] S. Dughmi, H. Xu, Algorithmic persuasion with no externalities, in: Proceedings of the
2017 ACM Conference on Economics and Computation, 2017, pp. 351–368.
[28] H. Xu, On the tractability of public persuasion with no externalities, in: Proceedings of
the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2020, pp.
2708–2727.
[29] Y. Zu, K. Iyer, H. Xu, Learning to persuade on the fly: Robustness against ignorance,</p>
      <p>Proceedings of the 22nd ACM Conference on Economics and Computation (2021) 927–928.
[30] M. K. Camara, J. D. Hartline, A. Johnsen, Mechanisms for a no-regret agent: Beyond the
common prior, in: 2020 ieee 61st annual symposium on foundations of computer science
(focs), IEEE, 2020, pp. 259–270.
[31] M. Castiglioni, A. Celli, A. Marchesi, N. Gatti, Online Bayesian persuasion, Advances in</p>
      <p>Neural Information Processing Systems 33 (2020) 16188–16198.
[32] M. Castiglioni, A. Marchesi, A. Celli, N. Gatti, Multi-receiver online bayesian persuasion,
in: International Conference on Machine Learning, PMLR, 2021, pp. 1314–1323.
[33] Y. Babichenko, I. Talgam-Cohen, H. Xu, K. Zabarnyi, Regret-minimizing Bayesian
persuasion, arXiv preprint arXiv:2105.13870 (2021).
[34] M. Castiglioni, A. Celli, A. Marchesi, N. Gatti, Regret minimization in online bayesian
persuasion: Handling adversarial receiver’s types under full and partial feedback models,
Artificial Intelligence 314 (2023) 103821.
[35] M.-F. Balcan, A. Blum, N. Haghtalab, A. D. Procaccia, Commitment without regrets: Online
learning in Stackelberg security games, in: Proceedings of the Sixteenth ACM Conference
on Economics and Computation, 2015, p. 61–78.
[36] M. Castiglioni, A. Marchesi, N. Gatti, Bayesian persuasion meets mechanism design: Going
beyond intractability with type reporting, 2022, pp. 226–234.
[37] M. Tambe, Security and game theory: algorithms, deployed systems, lessons learned,</p>
      <p>Cambridge university press, 2011.
[38] C. Daskalakis, V. Syrgkanis, Learning in auctions: Regret is hard, envy is easy, Games and
Economic Behavior (2022).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>R. De Benedictis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Gatti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maratea</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Murano</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Scala</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Serafini</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Serina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Tosello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Umbrico</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Vallati</surname>
          </string-name>
          , Preface to the
          <source>Italian Workshop on Planning and Scheduling</source>
          , RCRA Workshop on
          <article-title>Experimental evaluation of algorithms for solving problems with combinatorial explosion, and</article-title>
          SPIRIT Workshop on Strategies, Prediction, Interaction, and
          <article-title>Reasoning in Italy (IPS-RCRA-SPIRIT</article-title>
          <year>2023</year>
          ),
          <source>in: Proceedings of the Italian Workshop on Planning and Scheduling</source>
          , RCRA Workshop on
          <article-title>Experimental evaluation of algorithms for solving problems with combinatorial explosion, and</article-title>
          SPIRIT Workshop on Strategies, Prediction, Interaction, and
          <article-title>Reasoning in Italy (IPS-RCRA-SPIRIT 2023) co-located with 22th International Conference of the Italian Association for Artificial Intelligence (AI* IA</article-title>
          <year>2023</year>
          ),
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernasconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Castiglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Celli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Trovò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gatti</surname>
          </string-name>
          ,
          <article-title>Optimal rates and eficient algorithms for online Bayesian persuasion</article-title>
          , in: A.
          <string-name>
            <surname>Krause</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Brunskill</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Engelhardt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sabato</surname>
          </string-name>
          , J. Scarlett (Eds.),
          <source>Proceedings of the 40th International Conference on Machine Learning</source>
          , volume
          <volume>202</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2164</fpage>
          -
          <lpage>2183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kamenica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gentzkow</surname>
          </string-name>
          , Bayesian persuasion,
          <source>American Economic Review</source>
          <volume>101</volume>
          (
          <year>2011</year>
          )
          <fpage>2590</fpage>
          -
          <lpage>2615</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bro Miltersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Shefet</surname>
          </string-name>
          ,
          <article-title>Send mixed signals: earn more, work less</article-title>
          ,
          <source>in: Proceedings of the 13th ACM Conference on Electronic Commerce</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Emek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gamzu</surname>
          </string-name>
          , R. PaesLeme, M. Tennenholtz,
          <article-title>Signaling schemes for revenue maximization</article-title>
          ,
          <source>ACM Transactions on Economics and Computation</source>
          <volume>2</volume>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Badanidiyuru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bhawalkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Targeting and signaling in ad auctions</article-title>
          ,
          <source>in: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2545</fpage>
          -
          <lpage>2563</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Castiglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gatti</surname>
          </string-name>
          ,
          <article-title>Signaling in posted price auctions</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>4941</fpage>
          -
          <lpage>4948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bacchiocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Castiglioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marchesi</surname>
          </string-name>
          , G. Romano,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gatti</surname>
          </string-name>
          ,
          <article-title>Public signaling in bayesian ad auctions</article-title>
          ,
          <source>in: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI</source>
          <year>2022</year>
          , Vienna, Austria,
          <fpage>23</fpage>
          -
          <issue>29</issue>
          <year>July 2022</year>
          ,
          <article-title>ijcai</article-title>
          .org,
          <year>2022</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , H. Y. Cheung,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dughmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Emamjomeh-Zadeh</surname>
          </string-name>
          , L. Han,
          <string-name>
            <given-names>S.-H.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <article-title>Mixture selection, mechanism design, and signaling</article-title>
          ,
          <source>in: 56th Annual Symposium on Foundations of Computer Science</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1426</fpage>
          -
          <lpage>1445</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>