<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preserving Privacy When Processing User Queries in IR.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Luigi De Faveri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Data represents one of the most valuable assets of today's digital age. Privacy-preserving strategies play a crucial role in safeguarding the confidentiality of sensitive user data throughout the processing pipeline in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. This paper presents an overview of obfuscation strategies and evaluation metrics employed to process users' textual information privately when interacting with IR systems, framing these solutions within the formal framework of ε-Differential Privacy (ε-DP). The methodologies and findings presented in this paper describe the author's preliminary studies in his current PhD activity.</p>
      </abstract>
      <kwd-group>
        <kwd>Privacy Preserving Information Access</kwd>
        <kwd>Differential Privacy</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Information Security</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Data has become one of the most valuable resources for researchers and industry in today’s digital
age. In such a scenario, an ever-growing amount of data for training, validation, and testing is needed
to enhance the performance of NLP and IR systems. This includes highly sensitive and personal
information, such as health records [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], financial situations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and individual preferences [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], all
of which are used to refine and enhance models’ performance. For example, when a user interacts
with an IR system, like a search engine, the information need is formulated into a natural language
query. When the search engine processes such a query to retrieve relevant documents, confidential
information, like the motivations of the search and personal identifiers, e.g., social security number and
other personal attributes for ego-surfing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], can be extracted and analysed [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ], thus presenting the
user with the dilemma of exchanging personal information in order to retrieve relevant results.
      </p>
      <p>
        Recent works in NLP and IR [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ] have shown the potential of applying the formal
Differential Privacy (ε-DP) framework [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to provide privacy guarantees for textual data through
obfuscation mechanisms. In this context, an obfuscation mechanism is an algorithm that, upon receiving
a text as input, randomly produces another text composed of different words as output. In ε-DP, the
number of changed words and the semantic relation to the original text depend on the ε value, which
sets the statistical noise used during the output computation. However, introducing ε-DP mechanisms
to obfuscate the real meaning of texts poses some open research challenges. State-of-the-art DP
mechanisms do not guarantee that a given term is changed in loose privacy regimes, i.e., high values
of ε. In addition, standard evaluation measures pivot the analysis on varying the formal privacy budget
ε, leading to extreme cases where a low ε, i.e., a strong privacy setting, may result in preserving the
original text, thus giving a false perception of the privacy granted [12].
      </p>
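To make the role of ε concrete, the following toy sketch (not taken from the cited works; the vocabulary, distance function, and candidate set are invented for illustration) samples a replacement word with the exponential mechanism and shows that the original word can still be returned with non-negligible probability under a small ε:

```python
import math
import random

def exponential_mechanism(word, candidates, distance, epsilon):
    """Sample a replacement with probability proportional to
    exp(-epsilon * d(word, candidate) / 2): the classic exponential
    mechanism with a distance-based utility (closer = likelier)."""
    weights = [math.exp(-epsilon * distance(word, c) / 2) for c in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(candidates, weights=probs, k=1)[0], probs

def toy_distance(a, b):
    # Hand-made word distance for the demo (0 = identical word).
    return 0.0 if a == b else abs(len(a) - len(b)) + 1.0

vocab = ["cancer", "illness", "disease", "flu", "health"]
for eps in (0.1, 1.0, 10.0):
    _, probs = exponential_mechanism("cancer", vocab, toy_distance, eps)
    keep = probs[vocab.index("cancer")]
    print(f"eps={eps:>5}: Pr[output == original] = {keep:.3f}")
```

At ε = 0.1 the distribution is near-uniform, so the original word is kept roughly 1/|V| of the time despite the "strong" formal setting, which is exactly the false-perception issue raised in [12].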
      <p>In this paper, submitted to the Doctoral Consortium track, the author reports the methodology
proposed in previous works [13, 14, 15, 16] to address the above open challenges: an obfuscation
mechanism based on the ε-DP framework that guarantees the removal of the original words from the
obfuscated output. Moreover, to address the problem of measuring actual privacy beyond the formal
privacy budget ε, we report a privacy analysis in an adversarial scenario where the attacker exploits a
public query log to infer the original query.</p>
      <p>The paper is structured as follows: Section 2 presents the related work on providing and measuring
text privacy, also presenting the query obfuscation protocol in IR. Section 3 explains the methodology
used to ensure privacy for textual queries and the method proposed to evaluate actual privacy. Finally,
Section 4 outlines the findings of prior studies, and Section 5 concludes, highlighting open challenges.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work and Background</title>
      <p>
        Obfuscating Texts with ε-DP. The ε-DP framework was introduced by Dwork et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to formalize
the privacy guarantees when releasing data publicly. Given a privacy budget ε ∈ ℝ⁺ and any pair of
neighbouring datasets D, D′, i.e., datasets that differ in only one entry, an obfuscation mechanism ℳ
is ε-DP if it satisfies the inequality Pr[ℳ(D) ∈ S] ≤ e^ε · Pr[ℳ(D′) ∈ S] for all S ⊆ Im(ℳ). DP introduces
calibrated noise levels during output computation using the privacy budget ε, which controls the balance
between data privacy and utility. The adoption of the DP framework for metric spaces, and therefore for
NLP tasks, was proposed in [17]. Metric-DP extends the traditional DP definition by ensuring that
the probability of obfuscating two distinct points x, x′ is proportional to the distance d(x, x′) between
them. The DP framework has enabled the privacy research community to develop two main obfuscation
strategies: one based on leveraging noisy embeddings, the other on randomly sampling a new obfuscation term.
The former approaches introduce statistical noise into text term embeddings calibrated on the ε
budget, as in the Calibrated Multivariate Perturbation (CMP), Mahalanobis (Mhl), and their respective
Vickrey-based variant mechanisms [18, 19, 20]. The latter employ random sampling to select a term as
the obfuscated text, as in the Customized Text (CusText), Sanitization Text (SanText), and Truncated
Exponential (TEM) mechanisms [21, 22, 23]. For full details, we recommend the original papers.
Measuring Privacy. Wagner and Eckhoff [24] systematically classified over eighty privacy metrics,
offering a comprehensive framework for assessing privacy across different domains, e.g., communication,
databases, and social networks. The work identifies the specific aspects of privacy that a metric aims to
quantify, suggesting nine guiding questions for selecting the appropriate privacy measures. Specifically,
the authors underlined the importance of considering the adversary’s knowledge and capabilities when
evaluating privacy. In addition, Sousa and Kern [25] described how different mechanisms developed
for NLP tasks provide privacy for textual data, with Habernal [26] stressing the importance of not
relying strictly on the formal analysis of DP in its application to NLP and encouraging research towards new
privacy metrics. Traditional privacy measures focus on calculating the failure rates of obfuscation
mechanisms [27] or assessing the similarities between original and obfuscated texts [
        <xref ref-type="bibr" rid="ref9">28, 9</xref>
        ]. Uncertainty
measures [18, 19] estimate, respectively, the probability that a term remains unchanged
after obfuscation and the minimum cardinality of the set of words onto which the mechanism maps it.
The similarity between the input and output texts is commonly estimated using metrics
like the Jaccard similarity or the cosine similarity between sentence embeddings computed by a Transformer.
The Query Obfuscation Protocol in IR. Figure 1 reports a high-level view of the query obfuscation
protocol, considering two distinct sides: one for the user (“Safe Side”) and one for the IR system (“Unsafe
Side”). On the user side, the original query is formulated considering the User information need and
privatized using an obfuscation mechanism, i.e., an algorithm that, given an original sensitive query,
generates different non-sensitive obfuscated queries that (theoretically) prevent the unveiling of the
original information need while still retrieving relevant documents for the user from the system,
without explicitly disclosing that need. On the IR system side, documents are retrieved considering
the queries received. If the obfuscation has been correctly performed, relevant documents to the user’s
original query are placed at a lower rank in the resultant document list (yellow documents in Figure 1),
thus masking the actual intentions of the user. Once the list returns to the user, the latter can privately
use its original query to re-rank the documents, placing the correct relevant ones first in the final
list. The scenario studied works under the assumption of an IR system that does not collaborate to
protect the privacy of the received user query. Therefore, the user is willing to renounce part of the
effectiveness of the search to protect his privacy.
      </p>
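The protocol described above can be sketched end-to-end; the token-overlap scorer, toy documents, and random word-replacement "mechanism" below are invented stand-ins for a real IR system and a real DP mechanism:

```python
import random

def obfuscate(query, vocab, n_variants=3, rng=random):
    """Toy obfuscation: replace each query term with a random vocabulary
    term (a stand-in for a DP mechanism), producing n_variants queries."""
    return [" ".join(rng.choice(vocab) for _ in query.split())
            for _ in range(n_variants)]

def score(query, doc):
    """Token-overlap relevance score (a stand-in for a real IR model)."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d)

def private_search(original_query, documents, vocab, rng=random):
    # Unsafe side: the IR system only ever sees the obfuscated queries.
    variants = obfuscate(original_query, vocab, rng=rng)
    retrieved = set()
    for v in variants:
        ranked = sorted(documents, key=lambda d: score(v, d), reverse=True)
        retrieved.update(ranked[:3])
    # Safe side: the user re-ranks the pooled results with the real query.
    return sorted(retrieved, key=lambda d: score(original_query, d), reverse=True)

docs = ["flu symptoms and treatment", "stock market news",
        "cancer treatment options", "weather forecast today"]
vocab = ["flu", "cancer", "weather", "stock", "treatment", "news"]
results = private_search("cancer treatment", docs, vocab)
print(results)  # locally re-ranked result list
```

The key design point mirrors the figure: the original query never leaves the safe side; only the obfuscated variants reach the system, and relevance to the true intent is restored by local re-ranking.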
      <p>A final remark to consider is the use of cryptographic protocols, such as Private Information Retrieval
(PIR) protocols, to ensure privacy when interacting with IR systems. This approach introduces open
challenges and limitations, given the higher computational demand and time needed to retrieve
relevant information from the systems [29, 30]. However, implementing PIR protocols can be seen as
complementary to query obfuscation protocols. While query obfuscation focuses on concealing the
user’s true intent and altering the original query sent to the system, PIR protocols can interact with the
system’s index, ensuring that the documents are retrieved without revealing sensitive information.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Obfuscating a Text: The Words Blending Boxes (WBB) ε-DP Mechanism.</title>
        <p>Current state-of-the-art obfuscation mechanisms either ensure the privacy of obfuscated queries by
providing formal privacy guarantees under the DP framework or account for the presence of synonyms
and holonyms. The WBB mechanism [13] addresses the limitations of these approaches by integrating
both strategies. Specifically, the mechanism ensures that the top-k most semantically similar words, i.e.,
synonyms and holonyms closely positioned to the original term in the embedding space, are excluded
from the obfuscation process. Instead, it selects the words that are similar but do not belong to the
top-k set as obfuscation candidates. The final obfuscation term is then sampled according to the DP
exponential mechanism [31], which defines the selection probability based on the privacy parameter ε.</p>
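A minimal sketch of this selection strategy follows; the 2-d toy embeddings and parameter values are invented for illustration and this is not the authors' implementation (real mechanisms would use e.g. 300-d GloVe vectors):

```python
import math
import random

def wbb_like_obfuscate(word, vocab_vectors, k=2, n_candidates=3, epsilon=1.0, rng=random):
    """Sketch of the WBB idea: drop the word itself and its top-k nearest
    neighbours (likely synonyms), keep the next n_candidates words, and
    sample one of them with the DP exponential mechanism."""
    origin = vocab_vectors[word]
    others = [(w, math.dist(origin, vec))
              for w, vec in vocab_vectors.items() if w != word]
    others.sort(key=lambda p: p[1])
    candidates = others[k:k + n_candidates]      # outside the top-k "safe box"
    weights = [math.exp(-epsilon * d / 2) for _, d in candidates]
    return rng.choices([w for w, _ in candidates], weights=weights, k=1)[0]

vectors = {"cancer": (0.0, 0.0), "tumour": (0.1, 0.0), "illness": (0.2, 0.1),
           "disease": (0.5, 0.4), "health": (0.9, 0.8), "medicine": (1.2, 1.0)}
out = wbb_like_obfuscate("cancer", vectors, k=2, n_candidates=3)
assert out not in {"cancer", "tumour", "illness"}  # original and near-synonyms excluded
print(out)
```

By construction the output can never be the original word or one of its k closest neighbours, which is the property that later yields a zero failure rate for WBB.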
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluating Privacy: The Query Inference for Privacy and Utility (QuIPU) Score.</title>
        <p>Traditional methods (see Section 2) often rely on theoretical privacy guarantees, such as those provided
by the ε in DP, which may not accurately reflect the real-world privacy risks associated with obfuscated
queries. The QuIPU score [15] addresses this gap by assessing the extent to which an obfuscated query
hides the user’s original intent from potential adversaries. Specifically, the score evaluates different
obfuscation strategies by examining both the utility of the obfuscated query in performing the intended
task and the risk of re-identification. The computation of risk probabilities in the QuIPU framework is
grounded in assessing the effectiveness of adversarial strategies that attempt to reverse engineer the
original user intent using a Transformer model to cluster obfuscated queries and an available query log.
The probability of successfully reconstructing the original query is computed based on its rank among
the most similar queries within the log, following the adversary’s clustering of the obfuscated queries.</p>
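The rank-based idea can be illustrated with a toy attacker; the Jaccard similarity and four-query log below are invented for the sketch, whereas the actual QuIPU computation relies on Transformer embeddings and clustering of the obfuscated queries:

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two queries."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def reidentification_rank(obfuscated, original, query_log):
    """Rank the log queries by similarity to the obfuscated query and
    return the 1-based rank of the true original query: a higher rank
    means the obfuscation hides the intent better."""
    ranked = sorted(query_log, key=lambda q: jaccard(obfuscated, q), reverse=True)
    return ranked.index(original) + 1

log = ["cancer treatment", "flu remedies", "stock prices", "weather today"]
rank = reidentification_rank("flu cure", "cancer treatment", log)
print(f"true query rank: {rank}/{len(log)}  ->  attack success ~ 1/rank")
```

Here the obfuscated query shares a token with a decoy log entry, so the true query is pushed below rank one and the adversary's reconstruction probability drops accordingly.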
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Preliminary Experimental Findings</title>
      <p>
        The mechanisms based on ε-DP are tested on TREC collections using the Python package
ir_datasets (https://ir-datasets.com/). Specifically, we used the TREC Deep Learning ‘19 [32] (DL‘19) and Deep
Learning ‘20 [33] (DL‘20) collections, thus considering 43 and 54 queries, respectively. In addition, to understand the
impact of a different distribution of the queries, we also applied the obfuscations to the TREC
Robust collection [34] (Robust ‘04), containing 250 queries. For each privacy setting of the mechanisms,
i.e., ε ∈ {1, 5, 10, 15, 20, 25, 30, 50}, each query produces 20 different obfuscated variants, as done
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. To generate such obfuscations and measure the privacy guarantees provided, we employed the
pyPANTERA framework [14], using as default vocabulary the 300-d words and embeddings from
GloVe [35]. Moreover, to compute the QuIPU score, we analysed the scenarios described in [15] of
three different attackers, i.e., Lazy, Active, and Motivated, using the AOL dataset
(https://ir-datasets.com/aol-ia.html) as query log. For brevity, we report the performance analysis only on DL‘19, using as IR system the Contriever
model [36] for retrieval and reranking. We refer to the original papers [13, 14, 15] for the full results.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Performance Analysis</title>
        <p>Evaluating the obfuscation mechanisms (Table 1) across different privacy budgets ε reveals a clear
trade-off between formal privacy and the utility gained by the user during the retrieval pipeline, measured
as Precision (P) and normalized Discounted Cumulative Gain (nDCG) at cut-off 10. At low ε
values, the obfuscation is performed in a strong privacy regime, reducing performance in both ranking
metrics for all the mechanisms analysed, except for CusText and WBB. Among the tested mechanisms,
embedding-based methods show significant improvements as ε increases, achieving stable performance
at higher ε values. On the other hand, sampling-based mechanisms exhibit different behaviours, with
TEM maintaining consistently high performance across all privacy budgets. Generally speaking, for the
values considered, the sampling mechanisms are not influenced by the formal parameter ε above 5.</p>
        <p>These experiments demonstrate that ranking performance deteriorates significantly under stringent
privacy constraints (i.e., low ε). Moreover, utility improves as privacy constraints relax (i.e., high
ε), with most mechanisms achieving utility levels comparable to non-private settings. A further
insight concerns the obfuscation strategy: sampling-based mechanisms achieve higher
performance at lower ε, while noisy embedding methods require higher ε values to reach saturation.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Privacy Analysis</title>
        <p>Table 2 compares two different aspects of privacy. The average failure rate of a mechanism
ℳ assesses the probability that a term is mapped to itself over repeated obfuscations, with higher values
indicating weaker privacy. Conversely, the QuIPU score measures how well the mechanism resists a
query inference attack [15] from different attackers. The higher the score, the better the resistance.</p>
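As a sketch, the failure rate can be estimated empirically; the two toy mechanisms below are invented for illustration, standing in for the ε-DP families discussed above:

```python
import random

random.seed(0)  # deterministic toy run

def empirical_failure_rate(mechanism, word, n=1000):
    """Fraction of n independent obfuscations that leave the word
    unchanged: the empirical failure rate for this word."""
    return sum(mechanism(word) == word for _ in range(n)) / n

# A leaky mechanism keeps the word ~30% of the time; a WBB-style mechanism
# never returns the original word by construction.
leaky = lambda w: w if random.random() < 0.3 else w + "_obf"
always_change = lambda w: w + "_obf"

print(empirical_failure_rate(leaky, "cancer"))
print(empirical_failure_rate(always_change, "cancer"))  # always 0.0
```

A failure rate of 1.00 means the mechanism effectively stopped obfuscating, while 0 corresponds to the WBB guarantee that the original word is always changed.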
        <p>Noisy embedding-based methods such as CMP and Mhl show a gradual loss of privacy as ε increases,
while VickreyCMP and VickreyMhl maintain lower failure rates, indicating more robust privacy
guarantees. Sampling-based methods exhibit a different behaviour: CusText, SanText, and TEM rapidly
lose their obfuscation capability, reaching a failure rate of 1.00 for relatively low ε. WBB, in contrast, preserves
complete privacy with a failure rate of 0 across all budgets by design: the original word is always changed.</p>
        <p>The QuIPU scores demonstrate the robustness of these mechanisms against the different attacker
models [15]. Vickrey-based embedding methods offer the best resistance among all the mechanisms
studied, while the sampling-based methods, particularly SanText and TEM, do not perform well in all
the adversarial settings. WBB provides a null QuIPU score, which means an equal performance-utility
trade-off. Future research is needed to improve its robustness against the query inference attack.</p>
        <p>In conclusion, WBB and the Vickrey-based embeddings are more suitable for scenarios requiring stringent
privacy guarantees. In contrast, the CMP and Mhl obfuscations provide a more balanced trade-off between
privacy and utility. Finally, sampling-based approaches show lower effectiveness in adversarial
environments, considering their probability of failure and resilience against inference attacks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The paper presented the problems faced when providing privacy to textual data, addressed in the
author’s first works during his PhD studies. The paper showed the methodology adopted
to provide privacy to the analysed texts and the strategies adopted to assess the privacy
guarantees obtained. Possible Open Research Discussions (RD1-3) that will be proposed during the
Doctoral Consortium session are formulated as follows:
RD1. Current privacy-preserving obfuscation techniques often operate independently of the underlying
IR models. How can obfuscation methods be optimized to leverage the characteristics of specific
retrieval models while maintaining formal privacy guarantees?
RD2. The trade-off between privacy and utility in obfuscated queries remains a critical challenge for
the WBB mechanism. Can we design adaptive obfuscation mechanisms to dynamically balance
privacy and retrieval effectiveness based on user needs and system constraints?
RD3. The effectiveness of privacy-preserving obfuscation methods can vary depending on the structure
and semantics of different query types. How can obfuscation techniques be adapted to different
query characteristics while ensuring consistent privacy guarantees? Can we adapt the obfuscation
to domain-specific sensitive contexts like health scenarios?</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly for Readability and Spelling
checks. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
      <p>[11] C. Dwork, F. McSherry, K. Nissim, A. Smith, Calibrating noise to sensitivity in private data
analysis, in: S. Halevi, T. Rabin (Eds.), Theory of Cryptography, Springer Berlin Heidelberg, Berlin,
Heidelberg, 2006, pp. 265–284.
[12] A. Blanco-Justicia, D. Sánchez, J. Domingo-Ferrer, K. Muralidhar, A critical review on the use (and
misuse) of differential privacy in machine learning, ACM Comput. Surv. 55 (2023) 160:1–160:16.</p>
      <p>URL: https://doi.org/10.1145/3547139. doi:10.1145/3547139.
[13] F. L. De Faveri, G. Faggioli, N. Ferro, Words blending boxes. Obfuscating queries in information
retrieval using differential privacy, CoRR abs/2405.09306 (2024). URL: https://doi.org/10.48550/
arXiv.2405.09306. doi:10.48550/ARXIV.2405.09306. arXiv:2405.09306.
[14] F. L. De Faveri, G. Faggioli, N. Ferro, pyPANTERA: A python package for natural language
obfuscation enforcing privacy &amp; anonymization, in: E. Serra, F. Spezzano (Eds.), Proceedings
of the 33rd ACM International Conference on Information and Knowledge Management, CIKM
2024, Boise, ID, USA, October 21-25, 2024, ACM, 2024, pp. 5348–5353. URL: https://doi.org/10.1145/
3627673.3679173. doi:10.1145/3627673.3679173.
[15] F. L. De Faveri, G. Faggioli, N. Ferro, Measuring actual privacy of obfuscated queries in
information retrieval, in: Proceedings of the 47th European Conference on Information
Retrieval, Lucca, Italy, 2025. URL: https://www.dei.unipd.it/~defaverifr/papers/25_ECIR_DFF_
MeasuringActualPrivacyQuIPU_CameraReady.pdf.
[16] F. L. De Faveri, G. Faggioli, N. Ferro, A comparative study of large language models and
traditional privacy measures to evaluate query obfuscation approaches, in: Proceedings of the 48th
International ACM SIGIR Conference on Research and Development in Information Retrieval,
SIGIR ’25, Association for Computing Machinery, New York, NY, USA, 2025, p. 2711–2716. URL:
https://doi.org/10.1145/3726302.3730158. doi:10.1145/3726302.3730158.
[17] K. Chatzikokolakis, M. E. Andrés, N. E. Bordenabe, C. Palamidessi, Broadening the scope of
differential privacy using metrics, in: E. D. Cristofaro, M. K. Wright (Eds.), Privacy Enhancing
Technologies - 13th International Symposium, PETS 2013, Bloomington, IN, USA, July 10-12, 2013.
Proceedings, volume 7981 of Lecture Notes in Computer Science, Springer, 2013, pp. 82–102. URL:
https://doi.org/10.1007/978-3-642-39077-7_5. doi:10.1007/978-3-642-39077-7\_5.
[18] O. Feyisetan, B. Balle, T. Drake, T. Diethe, Privacy- and utility-preserving textual analysis via
calibrated multivariate perturbations, in: J. Caverlee, X. B. Hu, M. Lalmas, W. Wang (Eds.),
Proceedings of the 13th International Conference on Web Search and Data Mining, ACM, 2020, pp.
178–186. doi:10.1145/3336191.3371856.
[19] Z. Xu, A. Aggarwal, O. Feyisetan, N. Teissier, A differentially private text perturbation method
using regularized mahalanobis metric, in: Proceedings of the Second Workshop on Privacy in NLP,
Association for Computational Linguistics, 2020. doi:10.18653/v1/2020.privatenlp-1.2.
[20] Z. Xu, A. Aggarwal, O. Feyisetan, N. Teissier, On a utilitarian approach to privacy
preserving text generation, CoRR abs/2104.11838 (2021). doi:10.48550/ARXIV.2104.11838.
arXiv:2104.11838.
[21] S. Chen, F. Mo, Y. Wang, C. Chen, J.-Y. Nie, C. Wang, J. Cui, A customized text sanitization
mechanism with differential privacy, in: A. Rogers, J. Boyd-Graber, N. Okazaki (Eds.), Findings
of the Association for Computational Linguistics: ACL 2023, Association for Computational
Linguistics, Toronto, Canada, 2023, pp. 5747–5758. URL: https://aclanthology.org/2023.findings-acl.
355. doi:10.18653/v1/2023.findings-acl.355.
[22] X. Yue, M. Du, T. Wang, Y. Li, H. Sun, S. S. M. Chow, Differential privacy for text analytics via
natural text sanitization, in: C. Zong, F. Xia, W. Li, R. Navigli (Eds.), Findings of the Association
for Computational Linguistics: ACL-IJCNLP 2021, Association for Computational Linguistics,
Online, 2021, pp. 3853–3866. URL: https://aclanthology.org/2021.findings-acl.337. doi:10.18653/
v1/2021.findings-acl.337.
[23] R. S. Carvalho, T. Vasiloudis, O. Feyisetan, K. Wang, TEM: high utility metric differential privacy
on text, in: S. Shekhar, Z. Zhou, Y. Chiang, G. Stiglic (Eds.), Proceedings of the 2023 SIAM
International Conference on Data Mining, SDM 2023, Minneapolis-St. Paul Twin Cities, MN, USA,
April 27-29, 2023, SIAM, 2023, pp. 883–890. URL: https://doi.org/10.1137/1.9781611977653.ch99.
doi:10.1137/1.9781611977653.CH99.
[24] I. Wagner, D. Eckhoff, Technical privacy metrics: A systematic survey, ACM Comput. Surv. 51
(2018) 57:1–57:38. URL: https://doi.org/10.1145/3168389. doi:10.1145/3168389.
[25] S. Sousa, R. Kern, How to keep text private? A systematic review of deep learning methods
for privacy-preserving natural language processing, Artif. Intell. Rev. 56 (2023) 1427–1492. URL:
https://doi.org/10.1007/s10462-022-10204-6. doi:10.1007/S10462-022-10204-6.
[26] I. Habernal, When differential privacy meets NLP: the devil is in the detail, in: M. Moens,
X. Huang, L. Specia, S. W. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods
in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic,
7-11 November, 2021, Association for Computational Linguistics, 2021, pp. 1522–1528. URL: https:
//doi.org/10.18653/v1/2021.emnlp-main.114. doi:10.18653/V1/2021.EMNLP-MAIN.114.
[27] S. Clauß, S. Schiffner, Structuring anonymity metrics, in: A. Juels, M. Winslett, A. Goto
(Eds.), Proceedings of the 2006 Workshop on Digital Identity Management, Alexandria, VA,
USA, November 3, 2006, ACM, 2006, pp. 55–62. URL: https://doi.org/10.1145/1179529.1179539.
doi:10.1145/1179529.1179539.
[28] S. J. Meisenbacher, N. Nandakumar, A. Klymenko, F. Matthes, A comparative analysis of word-level
metric differential privacy: Benchmarking the privacy-utility trade-off, in: N. Calzolari, M. Kan,
V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference
on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25
May, 2024, Torino, Italy, ELRA and ICCL, 2024, pp. 174–185. URL: https://aclanthology.org/2024.
lrec-main.16.
[29] H. Seo, H. Lee, W. Choi, Fundamental limits of private information retrieval with unknown cache
prefetching, IEEE Transactions on Communications 69 (2021) 8132–8144. doi:10.1109/TCOMM.
2021.3117936.
[30] G. Persiano, K. Yeo, Limits of preprocessing for single-server PIR, in: J. S. Naor, N. Buchbinder
(Eds.), Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022,
Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, SIAM, 2022, pp. 2522–2548. URL:
https://doi.org/10.1137/1.9781611977073.99. doi:10.1137/1.9781611977073.99.
[31] F. McSherry, K. Talwar, Mechanism design via diferential privacy, in: 48th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA,
Proceedings, IEEE Computer Society, 2007, pp. 94–103. URL: https://doi.org/10.1109/FOCS.2007.41.
doi:10.1109/FOCS.2007.41.
[32] N. Craswell, B. Mitra, E. Yilmaz, D. Campos, E. M. Voorhees, Overview of the TREC
2019 deep learning track, CoRR abs/2003.07820 (2020). URL: https://arxiv.org/abs/2003.07820.
arXiv:2003.07820.
[33] N. Craswell, B. Mitra, E. Yilmaz, D. Campos, Overview of the TREC 2020 deep learning track,
in: E. M. Voorhees, A. Ellis (Eds.), Proceedings of the Twenty-Ninth Text REtrieval Conference,
TREC 2020, Virtual Event [Gaithersburg, Maryland, USA], November 16-20, 2020, volume 1266
of NIST Special Publication, National Institute of Standards and Technology (NIST), 2020. URL:
https://trec.nist.gov/pubs/trec29/papers/OVERVIEW.DL.pdf.
[34] E. M. Voorhees, Overview of the TREC 2004 robust track, in: E. M. Voorhees, L. P. Buckland (Eds.),
Proceedings of the Thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, Maryland,
USA, November 16-19, 2004, volume 500-261 of NIST Special Publication, National Institute of
Standards and Technology (NIST), 2004. URL: http://trec.nist.gov/pubs/trec13/papers/ROBUST.OVERVIEW.pdf.
[35] J. Pennington, R. Socher, C. D. Manning, Glove: Global Vectors for Word Representation, in:
A. Moschitti, B. Pang, W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical
Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A
meeting of SIGDAT, a Special Interest Group of the ACL, ACL, 2014, pp. 1532–1543. URL: https:
//doi.org/10.3115/v1/d14-1162. doi:10.3115/v1/d14-1162.
[36] G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, E. Grave, Unsupervised
dense information retrieval with contrastive learning, Trans. Mach. Learn. Res. 2022 (2022). URL:
https://openreview.net/forum?id=jKN1pXi7b0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruiz-Martínez</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of the state-of-the-art on security and privacy issues in healthcare</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <volume>249</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>249</lpage>
          :
          <fpage>38</fpage>
          . URL: https://doi.org/10.1145/3571156. doi:
          <volume>10</volume>
          .1145/3571156.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sawhney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Neerkaje</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Habernal</surname>
          </string-name>
          , L. Flek,
          <article-title>How much user context do we need? privacy by design in mental health NLP applications</article-title>
          , in: Y.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Cha</surname>
          </string-name>
          , D. Quercia (Eds.),
          <source>Proceedings of the Seventeenth International AAAI Conference on Web and Social Media</source>
          ,
          <string-name>
            <surname>ICWSM</surname>
          </string-name>
          <year>2023</year>
          , Limassol, Cyprus, June 5-8,
          <year>2023</year>
          , AAAI Press,
          <year>2023</year>
          , pp.
          <fpage>766</fpage>
          -
          <lpage>776</lpage>
          . URL: https://doi.org/10.1609/icwsm.v17i1. 22186. doi:
          <volume>10</volume>
          .1609/ICWSM.V17I1.22186.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Akanfe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valecha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <article-title>Design of an inclusive financial privacy index (INF-PIE): A financial privacy and digital financial inclusion perspective</article-title>
          ,
          <source>ACM Trans. Manag. Inf. Syst</source>
          .
          <volume>12</volume>
          (
          <year>2021</year>
          ) 7:
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          :
          <fpage>21</fpage>
          . URL: https://doi.org/10.1145/3403949. doi:
          <volume>10</volume>
          .1145/3403949.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>My TiVo thinks I'm gay: Algorithmic culture and its discontents</article-title>
          ,
          <source>Television &amp; New Media</source>
          <volume>17</volume>
          (
          <year>2016</year>
          )
          <fpage>675</fpage>
          -
          <lpage>690</lpage>
          . URL: https://doi.org/10.1177/1527476416644978. doi:10.1177/1527476416644978.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. U.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Topic model based privacy protection in personalized web search</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Perego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Aslam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ruthven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zobel</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016</source>
          , Pisa, Italy, July 17-21,
          <year>2016</year>
          , ACM, 2016, pp.
          <fpage>1025</fpage>
          -
          <lpage>1028</lpage>
          . URL: https://doi.org/10.1145/2911451.2914753. doi:10.1145/2911451.2914753.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Poblete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving query log mining for business confidentiality protection</article-title>
          ,
          <source>ACM Trans. Web</source>
          <volume>4</volume>
          (
          <year>2010</year>
          ) 10:
          <fpage>1</fpage>
          -10:
          <lpage>26</lpage>
          . URL: https://doi.org/10.1145/1806916.1806919. doi:10.1145/1806916.1806919.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tigunova</surname>
          </string-name>
          ,
          <article-title>Extracting personal information from conversations</article-title>
          , in:
          <string-name>
            <given-names>A. E. F.</given-names>
            <surname>Seghrouchni</surname>
          </string-name>
          , G. Sukthankar, T. Liu, M. van Steen (Eds.),
          <source>Companion of The 2020 Web Conference</source>
          , Taipei, Taiwan, April 20-24,
          <year>2020</year>
          , ACM / IW3C2, 2020, pp.
          <fpage>284</fpage>
          -
          <lpage>288</lpage>
          . URL: https://doi.org/10.1145/3366424.3382089. doi:10.1145/3366424.3382089.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Klymenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Meisenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Differential privacy in natural language processing: The story so far</article-title>
          , in:
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghanavati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Habernal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mireshghallah</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Fourth Workshop on Privacy in Natural Language Processing</source>
          , Association for Computational Linguistics, Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . URL: https://aclanthology.org/2022.privatenlp-1.1/. doi:10.18653/v1/2022.privatenlp-1.1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Query obfuscation for information retrieval through differential privacy</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Goharian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tonellotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>McDonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024</source>
          , Glasgow, UK, March 24-28,
          <year>2024</year>
          , Proceedings, Part I, volume
          <volume>14608</volume>
          of Lecture Notes in Computer Science, Springer, 2024, pp.
          <fpage>278</fpage>
          -
          <lpage>294</lpage>
          . URL: https://doi.org/10.1007/978-3-031-56027-9_17. doi:10.1007/978-3-031-56027-9_17.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Habernal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Differentially private natural language models: Recent advances and future directions</article-title>
          , in: Y. Graham, M. Purver (Eds.),
          <source>Findings of the Association for Computational Linguistics: EACL 2024</source>
          , St. Julian's, Malta, March 17-22,
          <year>2024</year>
          , Association for Computational Linguistics, 2024, pp.
          <fpage>478</fpage>
          -
          <lpage>499</lpage>
          . URL: https://aclanthology.org/2024.findings-eacl.33
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Calibrating noise to sensitivity in private data analysis</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>