<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Differential Privacy Approaches for Query Obfuscation in Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Protecting the privacy of a user while they interact with an Information Retrieval (IR) system is crucial. This becomes more challenging when the IR system is not cooperative in satisfying the user's privacy needs. Recent advancements in Natural Language Processing (NLP) have demonstrated Differential Privacy's (DP) effectiveness in safeguarding text privacy for tasks like spam detection and sentiment analysis, even under the assumption of a non-cooperative system. Our investigation explores whether DP methods, originally designed for specific NLP tasks, can effectively obscure queries in IR. Our analyses show that the Vickrey DP mechanism, employing the Mahalanobis norm with a privacy budget ranging from ε = 10 to 12.5, provides cutting-edge privacy protection and enhances effectiveness. Unlike previous methods, DP allows users to fine-tune their desired level of privacy by adjusting the privacy budget ε. This flexibility offers a balance between how effective the system is and how much privacy is maintained, unlike the more rigid nature of previous approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        the randomness of the mechanism. DP is particularly effective in the NLP domain. A line of
research [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] operationalizes DP to release text by obfuscating each word individually. Such
mechanisms work as follows: i) each word in the text is mapped to a non-contextual embedding
space; ii) the embeddings are perturbed with noise drawn from a specific distribution; iii) each
        ] operationalizes DP to release text by obfuscating each word individually. Such
mechanisms work as follows: i) each word in the text is mapped to a non-contextual embedding
space; ii) the embeddings are perturbed with noise drawn from a specific distribution; iii) each
word is replaced with the word closest to the noisy embedding. A major advantage of DP
is that it allows setting the privacy budget based on the needs of the user. This is different
from current obfuscation mechanisms in IR, which can only be either active or inactive and
cannot be tuned to the user's needs. In this work, we focus on three DP mechanisms: the
Calibrated Multivariate Perturbation (CMP) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the Mahalanobis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the Vickrey [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These
approaches were originally devised and tested for NLP tasks that include text classification
and sentiment analysis. We assume the IR system to not preserve user privacy, and to possibly
be malicious. In our use case, users are the ones concerned about their privacy. They do not
want to reveal their real information needs and prefer to transmit obfuscated queries to the IR
system while still retrieving relevant documents. Therefore, to operationalize our mechanism,
we assume each user to locally obfuscate their query and transmit the obfuscated query, or
possibly multiple queries, to the IR system instead of their real query. Our goal is to determine
if the DP mechanisms introduced above can successfully obfuscate users’ information needs
while still retrieving relevant documents.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Approaches</title>
      <p>
        All the approaches described in this work are based on a relaxation of classical DP, called
Metric-DP. To achieve traditional DP in a metric space, an obfuscation mechanism should
have an equal probability of obfuscating any pair of points as the same point, irrespective of
their distance. While this grants the highest level of privacy, it also requires high levels of
noise, decreasing the utility of the data. In the case of metric spaces, it is often sufficient if
the probability of obfuscating two points with the same one is proportional to the distance
between the two points. Equivalently, the probability of sampling a certain noise is inversely
proportional to the norm of the noise itself. To this end, a relaxation of DP, called Metric-DP,
has been introduced. Metric-DP [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ] is defined as follows: given a privacy budget ε and
a distance measure d : Rⁿ × Rⁿ → [0, ∞), a randomized mechanism ℳ : Rⁿ → Rⁿ defined
over a geometric space is Metric-DP if, for any three points in the space x, x′, x̂ ∈ Rⁿ, the
following holds: P{ℳ(x) = x̂} / P{ℳ(x′) = x̂} ≤ exp(ε · d(x, x′)). If d(x, x′) is small, x and x′ are more
likely to be obfuscated with the same point. Vice versa, far-apart points might be obfuscated
with different points, without violating privacy constraints.
      </p>
      <p>
        We describe here the three major DP efforts for obfuscating text in the NLP scenario, which
we evaluate for the IR task. More in detail, these approaches take as input a sequence of
words. Each word is mapped into a non-contextual embedding, such as GloVe [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Then,
the embedding is obfuscated by adding appositely sampled noise to it. To ensure that
Metric-DP is achieved, the noise vector z is expected to be sampled from a distribution p such
that the probability of observing z is p(z) ∝ exp(−ε‖z‖), i.e., the probability of sampling a
noise with norm ‖z‖ is inversely proportional to ‖z‖. Finally, the closest word to the noisy
embedding is used to obfuscate the corresponding word in the original text. We propose to use
these approaches in the IR scenario to perturb the queries instead of the documents, as done for
NLP tasks.
      </p>
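The word-by-word pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary, embedding matrix, and `sample_noise` function are hypothetical placeholders standing in for GloVe and the DP samplers described below.

```python
import numpy as np

def obfuscate(query_terms, vocab, emb, sample_noise):
    """Word-level Metric-DP obfuscation sketch: map each term to its
    non-contextual embedding, add sampled noise, and replace the term
    with the vocabulary word closest to the noisy embedding."""
    obfuscated = []
    for w in query_terms:
        noisy = emb[vocab.index(w)] + sample_noise(emb.shape[1])
        nearest = int(np.argmin(np.linalg.norm(emb - noisy, axis=1)))
        obfuscated.append(vocab[nearest])
    return obfuscated
```

Any of the three mechanisms described next plugs in as `sample_noise`; only the noise distribution changes.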
      <p>
        The Calibrated Multivariate Perturbation (CMP) mechanism, defined by Feyisetan et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], is
based on sampling a noise vector for each term in the query following an n-dimensional Laplace
distribution. Such sampling draws two quantities: i) an n-dimensional unit vector v ∈ Rⁿ that
represents the direction of the perturbation; ii) the radius of the perturbation r ∈ R⁺, sampled
from a Gamma distribution. To sample v, a vector u ∈ Rⁿ is sampled from a multivariate
normal distribution with location 0 and identity covariance matrix I: u ∼ 𝒩(0, I). Then
v = u/‖u‖₂. The radius r of the noise is sampled from a Gamma distribution with shape n
and scale 1/ε, as r ∼ Γ(n, 1/ε). It is possible to observe that the stronger the privacy
requirement, i.e., the smaller the ε, the bigger the noise. The noise z is defined as z = r · v.
To perturb a word w, the noise vector z is added to the original word embedding ϕ(w) ∈ Rⁿ,
and the word closest to the noisy word embedding is used as obfuscation. Feyisetan
et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] demonstrate that for any word sequence of length ℓ ≥ 1 and any ε &gt; 0, CMP
satisfies ε-privacy with respect to d, where d is the Euclidean distance.
      </p>
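The CMP sampling steps can be sketched directly; this is a minimal NumPy rendition of the two-vector scheme above (direction v, radius r), not the reference code of Feyisetan et al.

```python
import numpy as np

def sample_cmp_noise(n, epsilon, rng=None):
    """Sample z with density p(z) proportional to exp(-epsilon * ||z||_2):
    a uniform direction on the unit sphere scaled by a Gamma(n, 1/epsilon) radius."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=n)                       # u ~ N(0, I)
    v = u / np.linalg.norm(u)                    # unit direction v = u / ||u||_2
    r = rng.gamma(shape=n, scale=1.0 / epsilon)  # radius r ~ Gamma(n, 1/epsilon)
    return r * v
```

Since E[r] = n/ε, shrinking the privacy budget ε inflates the expected noise norm, matching the trade-off stated above.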
      <p>
        The second mechanism investigated is the Mahalanobis (Mhl) mechanism. Xu et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] noticed
that the perturbation induced by the CMP mechanism tends to be weak, especially for high ε. They
hypothesize that sampling the direction of the perturbation on a circumference (‖v‖₂ = 1)
increases the risk of sampling a point in an empty region. Therefore, Xu et al. adapt the CMP
mechanism by transforming the direction of the noise from a circumference to an ellipse whose
orientation can be set to be towards the other embeddings. To do so, it is necessary to modify
the sampling mechanism so that, instead of sampling v such that ‖v‖₂ = 1, v is sampled so
that ‖v‖_M = 1, where ‖·‖_M is the Mahalanobis norm. To ensure that the noise z is sampled
such that its probability distribution is p(z) ∝ exp(−ε‖z‖_M), a vector u is sampled from the
multivariate normal distribution u ∼ 𝒩(0, I). Then, v is such that v = Σ^{1/2} · (u/‖u‖₂),
where Σ ∈ Rⁿˣⁿ is the covariance matrix of all the word embeddings. This forces the noise
towards more populated areas. The sampling of the norm r of the noise is the same as for CMP.
      </p>
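A minimal sketch of the modified direction sampling, assuming a precomputed matrix square root `Sigma_sqrt` of the (possibly regularized) embedding covariance Σ; the parameter names are ours, not from the original paper.

```python
import numpy as np

def sample_mahalanobis_noise(Sigma_sqrt, epsilon, rng=None):
    """Mahalanobis mechanism sketch: direction v = Sigma^{1/2} u / ||u||_2
    (noise elongated toward denser embedding regions), radius as in CMP."""
    rng = rng or np.random.default_rng()
    n = Sigma_sqrt.shape[0]
    u = rng.normal(size=n)                       # u ~ N(0, I)
    v = Sigma_sqrt @ (u / np.linalg.norm(u))     # direction on the Mahalanobis ellipse
    r = rng.gamma(shape=n, scale=1.0 / epsilon)  # same radius sampling as CMP
    return r * v
```

With Σ = I the ellipse degenerates to the unit sphere and the sampler reduces to CMP.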
      <p>
        Finally, we investigate the Vickrey (Vkr) mechanism. Mhl still tends to obfuscate a word
with itself for large ε. To reduce the probability of masking a token with itself, Xu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
        define the Vickrey DP mechanism (we refer to it as Vkr). Vkr is based on two steps. In the
first step, a noisy vector is sampled using any of the mechanisms described above: we can
instantiate Vkr with either the Mhl mechanism (Vkr_Mhl) or the CMP mechanism (Vkr_CMP). In
the second step, with probability P the word corresponding to the closest embedding to the
noisy vector is used as the obfuscation word. Vice versa, with probability 1 − P the word
corresponding to the second closest embedding is used as obfuscation. The probability P is
defined as P(w, ẑ) = (1−t)‖ϕ(w_(2)) − ẑ‖₂ / (t‖ϕ(w_(1)) − ẑ‖₂ + (1−t)‖ϕ(w_(2)) − ẑ‖₂),
where ϕ(w_(1)) and ϕ(w_(2)) are respectively the
closest and second closest word embeddings to ẑ, the perturbed embedding of w, and t is an
additional free parameter. We set t = 0.75, being the best performing [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
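The two-step selection can be sketched as below; the vocabulary and embedding matrix are illustrative placeholders, and P follows the formula above with the default t = 0.75.

```python
import numpy as np

def vickrey_select(noisy_emb, vocab, emb, t=0.75, rng=None):
    """Vickrey selection step: return the closest word with probability P,
    the second closest with probability 1 - P, where
    P = (1-t)*d2 / (t*d1 + (1-t)*d2)."""
    rng = rng or np.random.default_rng()
    d = np.linalg.norm(emb - noisy_emb, axis=1)
    i1, i2 = np.argsort(d)[:2]  # indices of closest and second closest embeddings
    p_first = (1 - t) * d[i2] / (t * d[i1] + (1 - t) * d[i2])
    return vocab[i1] if rng.random() < p_first else vocab[i2]
```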
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>
        We consider two different collections, TREC Robust '04 and TREC Deep Learning (DL '19). As
word embeddings, we used GloVe [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] with 300 dimensions trained on the Common Crawl.
[Figure: average MiniLM sentence similarity between the original query and 20 obfuscation queries generated with different approaches.]
[…] a DP Vkr_CMP mechanism with ε ∈ [5, 10] or a Vkr_Mhl mechanism with ε ∈ [10, 12.5] for the Robust '04, and
DP Vkr_CMP and Vkr_Mhl mechanisms with ε ∈ [5, 10] for the DL '19. The privacy achieved
by AED can be achieved with ε in the range [10, 12.5] by CMP and Mhl on both collections. The ε
values that grant a comparable level of privacy are much higher for the Vkr-based mechanisms,
especially Vkr_Mhl, on both collections – this means that the Vkr mechanisms are substantially
more secure from a privacy perspective.
      </p>
      <p>As both CMP and Mhl are less effective from a privacy perspective, we focus the following
analyses on the Vkr mechanism, with ε ∈ {10, 12.5, 15}. More in detail, we compare these DP
mechanisms with AED and FSH along three axes: i) the obfuscation; ii) the pooled recall;
iii) the nDCG@10 observed if we re-rank the documents pooled by the obfuscation queries.
We define the obfuscation as 1 minus the similarity between the original query and the obfuscated
one. The pooled recall is obtained by transmitting to the IR system 20 obfuscated queries: for
each ranked list in response to an obfuscated query, we select the first 100 documents and
merge all the sets of documents obtained. We compute the recall on this new set of documents.
Finally, to compute nDCG@10, we re-rank the pooled documents using a different IR model
(we use TAS-B to avoid biasing toward any IR model) and evaluate the quality of this ranked
list. For each approach, these measures are reported on a radar plot where, as a rule of thumb,
a larger area corresponds to more desirable results. Figure 1 reports the radar plots, showing
the performance of the different obfuscation approaches over the three axes mentioned above. We
notice that the area corresponding to the AED approach (in red) is encompassed within the
area corresponding to Vkr_Mhl with ε = 15 (green). In fact, on the Robust '04 collection, AED
achieves nDCG@10 of 0.410 and 0.424 for BM25 and Contriever respectively, recall of 0.420 and
0.419, and obfuscation of 0.513. Vice versa, Vkr_Mhl with ε = 15 obtains nDCG@10 of 0.416 and
0.431, recall of 0.493 and 0.462, and obfuscation of 0.618. The exception is DL '19 with Contriever
as the IR system, where AED has higher recall than Vkr_Mhl (0.497 against 0.418). Nevertheless,
this larger recall does not correspond to a much larger nDCG@10, indicating that Vkr_Mhl is
preferable over AED, as it has comparable nDCG@10 (0.604 for Vkr_Mhl against 0.607 for AED),
with improved obfuscation (0.785 against 0.491). When it comes to FSH (purple), the behaviour
depends on the collection. On DL '19, using Vkr_Mhl with ε = 10 (blue) provides an edge
over FSH: they have comparable obfuscation (0.916 the former, 0.923 the latter), but Vkr_Mhl has
a much larger nDCG@10 (0.254 compared to 0.064). On the Robust '04 collection, to overcome
FSH in terms of nDCG@10 (0.140 and 0.194), it is necessary to use Vkr_Mhl with ε = 12.5
(nDCG@10 of 0.349 and 0.355 for BM25 and Contriever respectively). While Vkr_Mhl with
ε = 12.5 exhibits nDCG@10 performance slightly lower than AED, its obfuscation (0.719) is
relatively close to that of FSH (0.797), much closer than AED's (0.513). As a general guideline,
we propose to use Vkr_Mhl as the obfuscation mechanism, with ε chosen in the interval [10, 15],
depending on the optimal trade-off between privacy and effectiveness, as chosen by the user.
      </p>
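The pooled-recall computation described above amounts to a small set operation; the document identifiers below are illustrative.

```python
def pooled_recall(ranked_lists, relevant, k=100):
    """Union the top-k documents retrieved for each obfuscated query,
    then compute recall of that pool against the relevant set."""
    pool = set()
    for ranking in ranked_lists:
        pool.update(ranking[:k])
    return len(pool & relevant) / len(relevant)
```

With 20 rankings (one per obfuscated query) and k = 100, this reproduces the pooled recall reported on the radar plots.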
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>In this work, we analyzed for the first time the performance of three DP mechanisms, originally
designed for NLP, on the query obfuscation IR task. We evaluated these mechanisms in the
IR setting by considering three aspects: their obfuscation capabilities, their effectiveness in
terms of recall, and their ability to retrieve highly relevant documents. Our findings
highlight that the Vickrey mechanism with ε ∈ [10, 12.5] achieves higher privacy guarantees,
with improved effectiveness, than current state-of-the-art approaches. Furthermore, lower
or higher values of ε allow the user to be better satisfied, either in terms of privacy or accuracy,
depending on their inclinations. As future work, we plan to investigate how to perturb dense
representations of the queries and to combine them with generative language models to produce
obfuscation queries with the same dense representation but different terms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Query Obfuscation for Information Retrieval through Differential Privacy</article-title>
          , in: Procs.
          <source>of ECIR</source>
          <year>2024</year>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Arampatzis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Efraimidis</surname>
          </string-name>
          , G. Drosatos,
          <article-title>Enhancing deniability against query-logs</article-title>
          ,
          <source>in: Procs. of ECIR</source>
          <year>2011</year>
          ,
          <year>2011</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. O.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <article-title>Efficient query obfuscation with keyqueries</article-title>
          ,
          <source>in: Procs. of WI-IAT '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>The algorithmic foundations of differential privacy</article-title>
          ,
          <source>Found. Trends Theor. Comput. Sci. 9</source>
          (
          <year>2014</year>
          )
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Drake</surname>
          </string-name>
          , T. Diethe,
          <article-title>Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations</article-title>
          ,
          <source>in: Procs. of WSDM '20</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Teissier</surname>
          </string-name>
          ,
          <article-title>A differentially private text perturbation method using regularized mahalanobis metric</article-title>
          , in: Procs. of the Second Workshop on Privacy in NLP,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Teissier</surname>
          </string-name>
          ,
          <article-title>On a utilitarian approach to privacy preserving text generation</article-title>
          ,
          <source>CoRR abs/2104.11838</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Andrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bordenabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatzikokolakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palamidessi</surname>
          </string-name>
          ,
          <article-title>Geo-indistinguishability: differential privacy for location-based systems</article-title>
          , in: A.
          <string-name>
            <surname>Sadeghi</surname>
            ,
            <given-names>V. D.</given-names>
          </string-name>
          <string-name>
            <surname>Gligor</surname>
          </string-name>
          , M. Yung (Eds.),
          <source>2013 ACM SIGSAC Conference on Computer and Communications Security, CCS'13</source>
          , Berlin, Germany, November 4-8
          ,
          <year>2013</year>
          , ACM,
          <year>2013</year>
          , pp.
          <fpage>901</fpage>
          -
          <lpage>914</lpage>
          . doi:10.1145/2508859.2516735.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatzikokolakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Andrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bordenabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palamidessi</surname>
          </string-name>
          ,
          <article-title>Broadening the scope of differential privacy using metrics</article-title>
          , in: E. D.
          <string-name>
            <surname>Cristofaro</surname>
          </string-name>
          , M. K. Wright (Eds.),
          Privacy Enhancing Technologies - 13th
          <source>International Symposium, PETS</source>
          <year>2013</year>
          ,
          Bloomington, IN, USA, July 10-12
          ,
          <year>2013</year>
          . Proceedings, volume
          <volume>7981</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2013</year>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>102</lpage>
          . URL: https://doi.org/10.1007/978-3-642-39077-7_5. doi:10.1007/978-3-642-39077-7_5.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Laud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pettai</surname>
          </string-name>
          ,
          <article-title>A framework of metrics for differential privacy from local sensitivity</article-title>
          ,
          <source>Proc. Priv. Enhancing Technol</source>
          .
          <year>2020</year>
          (
          <year>2020</year>
          )
          <fpage>175</fpage>
          -
          <lpage>208</lpage>
          . doi:10.2478/popets-2020-0023.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Glove:
          <article-title>Global vectors for word representation</article-title>
          ,
          <source>in: Procs. of EMNLP</source>
          <year>2014</year>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hosseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave,
          <article-title>Unsupervised dense information retrieval with contrastive learning</article-title>
          ,
          <source>Trans. Mach. Learn. Res</source>
          . (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers</article-title>
          , in: Procs. of NeurIPS '20,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>