<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pretrained Language Models Rankers on Private Data: Is Online and Federated Learning the Solution?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guido Zuccon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Queensland</institution>
          ,
          <addr-line>St Lucia</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Modern search engines rely upon extensive mining of users' queries and interactions: this is the case also for recent advances in rankers based on pretrained language models. Users rightly worry about the privacy implications of this. For domains such as health, personal and enterprise search, sharing of data is strictly prohibited. In this paper we outline a possible solution for building effective and privacy-preserving rankers based on state-of-the-art pretrained language models, and the problems that need to be resolved for such a solution to successfully materialise.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>and enterprise search, sharing of data is strictly prohibited. Three main barriers are present:
1. Editorial labelling (i.e., labels provided by editors that are not the user/entity who owns the
data) would pose a great privacy threat as it requires others to access private data, e.g., an
editor reading your WeChat messages.
2. It is unlikely that users would provide a large quantity of explicit labels for their own data,
e.g., it does not seem viable to ask you to label a large number of your own Messenger chats.
3. The data itself needs to be transferred to a central search service where the ranker would be
trained and operated – and this is often not possible, e.g., consider extracting private medical
records from a hospital and allowing a commercial search provider to index and search them.
An alternative is to fine-tune a private PLM ranker just on the (likely small) user data and with
limited labels: this however will not produce an efective ranker. A key problem then arises:
How do we improve a PLM ranker capacity to be efective without large training data, while still
preserving the users’ privacy (i.e., without sharing or leaking of their data)?</p>
      <p>
        The intuition I present in this paper is that a novel framework, in which PLM rankers are
fine-tuned in a federated and online manner using implicit user interactions, such as queries
and clicks, may provide a solution for creating effective PLM rankers on private, unshared
data and with no need for explicit labels. The on-device federated learning of a ranker would
allow the data and the ranker to reside directly on the user device, while exploiting signals from
multiple users for training the ranker in a privacy-preserving manner, i.e., without sharing the
actual data. Instead of sharing data, clients fine-tune the ranker on their local data and then
communicate the resulting ranker updates to a central server. The server then aggregates
the local updates from clients to create a new global ranker, which is shared back with each
client. This process may be repeated throughout the lifetime of the search product. Exploiting
data from multiple users is vital, as PLM rankers are data-hungry. The online learning process
would allow learning rankers “on-the-fly” via continuous updates and by letting the ranker
probe the search space to gain higher effectiveness and robustness to shifting query intents [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
This online learning is coupled with the exploitation of user interactions, such as queries posed
to the search service and clicks made on search results. This would make the unlikely labelling
of user data unnecessary, as implicit feedback, although noisier than labels, can be effectively
exploited for learning rankers [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
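      <p>The loop just described (local fine-tuning, sharing of updates only, server-side aggregation) can be sketched as follows. This is a toy federated averaging round given only as an illustration: the names are assumptions of mine, and a random vector stands in for the locally computed gradient rather than an actual PLM ranker update.</p>

```python
# Toy sketch of one federated averaging round; illustrative only.
import numpy as np

def local_finetune(weights, local_data, lr=0.01, rng=None):
    """Stand-in for client-side fine-tuning on private interaction data.
    A real client would take gradient steps on its own queries and clicks;
    here a random vector plays the role of the locally computed gradient."""
    rng = rng or np.random.default_rng()
    fake_grad = rng.standard_normal(weights.shape) * 0.01
    return weights - lr * fake_grad  # the data never leaves the device

def federated_round(global_weights, clients, rng):
    """One round: each client fine-tunes locally; the server only ever
    sees parameter deltas, which it averages into a new global ranker."""
    deltas = []
    for local_data in clients:
        local_w = local_finetune(global_weights.copy(), local_data, rng=rng)
        deltas.append(local_w - global_weights)  # only the update is shared
    return global_weights + np.mean(deltas, axis=0)

rng = np.random.default_rng(0)
weights = np.zeros(8)        # toy stand-in for PLM ranker parameters
clients = [object()] * 5     # placeholders for five private datasets
for _ in range(3):           # repeated over the product's lifetime
    weights = federated_round(weights, clients, rng)
```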
      <p>Key to the realisation of the envisioned novel federated online framework for creating effective
PLM rankers is the need to address the following challenges:</p>
      <p>
        Challenge 1: How to learn PLM rankers federatively, in an effective and efficient
manner? Our previous work on federated online learning to rank has contributed effective
rankers and aggregation methods [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], but in the context of feature-based rankers. PLM rankers
have been shown to be more effective than traditional feature-based learning to rank methods.
The architecture of rankers based on PLMs is largely different, and it is unclear how advances
in federated learning to rank apply to PLMs. Recent research has explored federated learning
for the training of PLMs [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]; however: (1) these methods are for the training of the actual
language model, not for the creation of rankers; (2) the effectiveness loss attributed to the
federated process vs. centralised learning is substantial [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Besides learning effective PLM rankers, a major
challenge will be to do so in an efficient manner. BERT-based rankers are 400+ MB in size, and
other PLMs are even larger (e.g., GPT-3 is 350 GB). In practical settings that require frequent,
real-time ranker updates, it is unimaginable to pass complete models within the federated
online learning process after only a few interactions (searches) have occurred. On the other hand,
infrequent model updates may lead to low ranker effectiveness, as we have shown in the context
of federated online learning to rank with feature-based rankers [
        <xref ref-type="bibr" rid="ref10 ref7">7, 10</xref>
        ].
      </p>
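      <p>One possible direction for this efficiency problem is to communicate sparsified updates rather than complete models. The sketch below is illustrative only, under the assumption that a small fraction of parameters captures most of an update; the function names and sizes are mine, not taken from any existing system.</p>

```python
# Illustrative top-k sparsification of a ranker update to cut the
# communication cost of each federated round; names are assumptions.
import numpy as np

def sparsify_topk(update, k):
    """Keep only the k largest-magnitude entries; send (indices, values)."""
    idx = np.argsort(np.abs(update))[-k:]
    return idx, update[idx]

def densify(idx, vals, size):
    """Server-side reconstruction of the sparse update."""
    out = np.zeros(size)
    out[idx] = vals
    return out

rng = np.random.default_rng(0)
update = rng.standard_normal(100_000)       # toy update vector
idx, vals = sparsify_topk(update, k=1_000)  # only 1% of entries shared
restored = densify(idx, vals, update.size)
# communication drops ~100x, at the cost of an approximate update
```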
      <p>
        Challenge 2: What are effective ways to handle noise and bias from clicks when
training PLM rankers? The envisioned framework relies on learning PLM rankers using
interaction data, such as clicks, in place of editorial labels. PLM rankers in fact require a
substantial amount of data for fine-tuning: the use of readily available interaction data would
make up for the absence of labelled data. Learning from this noisy and biased feedback has been
largely investigated in the context of learning to rank [
        <xref ref-type="bibr" rid="ref11 ref6">11, 6</xref>
        ] and online learning to rank [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Key to effective learning in the presence of such a signal are methods for de-biasing and
denoising clicks, e.g., online and counterfactual learning. Previous work has devised effective
techniques for this problem in the context of feature-based rankers, including merging the
counterfactual and online techniques [13]. However, it is still unexplored how this noisy
and biased signal affects the learning of PLM rankers, and how effective learning can take place
in this context. Recent work has made initial inroads into this problem by creating a dense
retriever method that can exploit historical interaction data (however, not online data) [14].
      </p>
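      <p>To make the de-biasing idea concrete, the sketch below shows an inverse-propensity-scored (IPS) click loss, the standard counterfactual recipe; the position-based propensity model and all values are illustrative assumptions of mine, not taken from the cited works.</p>

```python
# Sketch of an IPS-weighted pointwise click loss; illustrative only.
import math

def position_propensity(rank, eta=1.0):
    """Assumed examination model: P(examined) = (1/rank)^eta."""
    return (1.0 / rank) ** eta

def ips_pointwise_loss(scores, clicks, ranks):
    """Weight each clicked document by 1/propensity so that, in
    expectation, the loss matches full-information relevance labels."""
    loss = 0.0
    for s, c, r in zip(scores, clicks, ranks):
        if c:  # only clicks contribute; non-clicks are ambiguous
            w = 1.0 / position_propensity(r)
            loss += w * -math.log(1.0 / (1.0 + math.exp(-s)))  # -log sigmoid
    return loss

# a click observed at rank 3 counts ~3x more than one at rank 1,
# compensating for the lower chance that rank 3 was examined at all
l = ips_pointwise_loss(scores=[2.0, 1.0], clicks=[1, 1], ranks=[1, 3])
```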
      <p>Challenge 3: What is the effect of out-of-distribution data, e.g., non-identically and
independently distributed (non-IID) data? Our previous work on non-IID data in federated
online learning to rank has shown the effect of out-of-distribution data on rankers and in
which situations this is a problem [15]. This was done for feature-based rankers: whether, and
to what extent, these problems apply to PLM rankers is hard to forecast. We note, however, that
in non-federated settings PLM rankers have already been shown to suffer when fine-tuning
and testing data are out-of-distribution [16, 17, 18].</p>
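      <p>How non-IID a set of clients is can be simulated, as is common in federated learning studies, by drawing per-client topic mixtures from a Dirichlet distribution; the sketch below is illustrative, and the topic counts and concentration values are arbitrary assumptions.</p>

```python
# Toy simulation of non-IID client partitions via a Dirichlet split.
import numpy as np

def dirichlet_partition(n_topics, n_clients, alpha, rng):
    """Each client's query-topic mixture ~ Dirichlet(alpha).
    Small alpha -> skewed, non-IID clients; large alpha -> near-IID."""
    return rng.dirichlet([alpha] * n_topics, size=n_clients)

rng = np.random.default_rng(42)
near_iid = dirichlet_partition(10, 4, alpha=100.0, rng=rng)
non_iid = dirichlet_partition(10, 4, alpha=0.1, rng=rng)
# skew measured as each client's largest topic share, averaged over clients
iid_skew = near_iid.max(axis=1).mean()
noniid_skew = non_iid.max(axis=1).mean()
```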
      <p>Challenge 4: Are federated PLM rankers secure, and is user privacy maintained?
There are two components that can compromise the security of federated rankers and expose
users' private data. One is the message passing between clients and server; this can generally
be secured either via noise injection or via encryption – but the effect of these mechanisms on
PLM rankers needs investigation. The other component is the ranker itself. Assume a malicious
agent joins the federation and accesses the PLM ranker. Can the ranker be “inspected” to gain
knowledge about user data and queries? Unlike feature-based rankers, PLM rankers directly
learn from the raw user searchable data (and not its features) and queries. Previous work has
shown that PLMs (not for ranking) can expose private information contained in the text they
were trained on [19, 20]. How can we then ensure that the model parameters cannot be
“reversed” to reveal the data used to obtain them?</p>
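      <p>A minimal sketch of the noise-injection defence mentioned above, in the style of the Gaussian mechanism used in differentially private federated learning, is given below; the clipping and noise settings are illustrative assumptions and come with no formal privacy accounting.</p>

```python
# Sketch: clip each client update and add Gaussian noise before it
# leaves the device; settings are illustrative, not a DP guarantee.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_mult=1.0, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    # clipping bounds each client's influence (the "sensitivity")
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_mult * clip_norm, size=update.shape)
    return clipped + noise

rng = np.random.default_rng(7)
raw = rng.standard_normal(16) * 10.0   # toy, over-large ranker update
safe = privatize_update(raw, rng=rng)
# the server aggregates `safe`; `raw` (and the data behind it) never leaves
```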
      <p>
        Note that the use of online learning and of federated learning is not new to Information
Retrieval; in fact, the two have even already been combined in the context of learning to
rank [
        <xref ref-type="bibr" rid="ref10 ref7">21, 7, 10, 22, 15</xref>
        ] – but no work has investigated how these apply to PLM rankers. We also
note that while federated learning has been shown to be highly effective in general machine learning
tasks [23], in online ranking tasks federated learning still displays substantial gaps compared to
centralised learning [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This may well be the case also for initial attempts at creating federated
PLM rankers. Similarly, federated learning has been applied to pretrained language models in
the context of natural language processing tasks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], but these works do not consider online
learning. We believe that while federated online learning of PLM rankers would likely share
some of the challenges present in the aforementioned areas, it also presents specific twists on
these challenges, as well as challenges that are unique.
      </p>
      <p>I believe that embracing the framework for PLM rankers envisioned in this paper, and
addressing the challenges outlined above, would allow the creation of PLM search engines that
can be trained in a federated manner, without surrendering user data to a central search service.
This would enable search vendors to shift ranker creation directly to the user device – a vital
requirement when user data must remain unshared.</p>
      <p>Acknowledgments
This research has been partially sponsored by the CCF-Baidu Open Fund. I would like to thank
Mr Shengyao Zhuang and Dr Harrisen Scells (IElab, The University of Queensland), and Prof
Jimmy Lin (University of Waterloo) for comments and suggestions on the ideas exposed here.</p>
      <p>[13] S. Zhuang, G. Zuccon, Counterfactual online learning to rank, in: ECIR'20, 2020.
[14] S. Zhuang, H. Li, G. Zuccon, Implicit feedback for dense passage retrieval: A counterfactual approach, in: SIGIR'22, 2022.
[15] S. Wang, G. Zuccon, Is non-IID data a threat in federated online learning to rank?, in: SIGIR'22, 2022.
[16] S. Zhuang, G. Zuccon, Dealing with typos for BERT-based passage retrieval and ranking, in: EMNLP'21, 2021.
[17] S. Zhuang, G. Zuccon, CharacterBERT and self-teaching for improving the robustness of dense retrievers on queries with typos, in: SIGIR'22, 2022.
[18] C. Sciavolino, Z. Zhong, J. Lee, D. Chen, Simple entity-centric questions challenge dense retrievers, in: EMNLP'21, 2021.
[19] T. Vakili, H. Dalianis, Are clinical BERT models privacy preserving? The difficulty of extracting patient-condition associations (2021).
[20] C. Qu, W. Kong, L. Yang, M. Zhang, M. Bendersky, M. Najork, Natural language understanding with privacy-preserving BERT, in: CIKM'21, 2021.
[21] E. Kharitonov, Federated online learning to rank with evolution strategies, in: WSDM'19, 2019, pp. 249-257.
[22] C. Li, H. Ouyang, Federated unbiased learning to rank, arXiv preprint arXiv:2105.04761 (2021).
[23] Q. Yang, Y. Liu, Y. Cheng, Y. Kang, T. Chen, H. Yu, Federated learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 13 (2019) 1-207.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Pretrained transformers for text ranking: Bert and beyond</article-title>
          ,
          <source>Synthesis Lectures on Information Concepts, Retrieval, and Services</source>
          <volume>14</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: NAACL'19</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: NIPS'17</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , G. Zuccon,
          <article-title>How do online learning to rank methods adapt to changes of intent?</article-title>
          ,
          <source>in: SIGIR'21</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whiteson</surname>
          </string-name>
          , M. de Rijke,
          <article-title>Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval</article-title>
          ,
          <source>Information Retrieval</source>
          <volume>16</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jagerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Oosterhuis</surname>
          </string-name>
          , M. de Rijke,
          <article-title>To model or to intervene: A comparison of counterfactual and online learning to rank from user interactions</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , G. Zuccon,
          <article-title>Effective and privacy-preserving federated online learning to rank</article-title>
          ,
          <source>in: ICTIR'21</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Basu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Naidu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Muftuoglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mireshghallah</surname>
          </string-name>
          ,
          <article-title>Benchmarking differential privacy and federated learning for BERT models</article-title>
          ,
          <source>arXiv preprint arXiv:2106.13973</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          , L. Sun, FedBERT:
          <article-title>When federated learning meets pre-training</article-title>
          ,
          <source>TIST</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , G. Zuccon,
          <article-title>Federated online learning to rank with evolution strategies: A reproducibility study</article-title>
          ,
          <source>in: ECIR'21</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Swaminathan</surname>
          </string-name>
          , T. Schnabel,
          <article-title>Unbiased learning-to-rank with biased feedback</article-title>
          ,
          <source>in: WSDM'17</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>781</fpage>
          -
          <lpage>789</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiao</surname>
          </string-name>
          , G. Zuccon,
          <article-title>Reinforcement online learning to rank with unbiased reward shaping</article-title>
          ,
          <source>Information Retrieval</source>
          (
          <year>2022</year>
          ) (to appear).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>