<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Privacy of Textual Data: The pyPANTERA Package</article-title>
        <subtitle>Discussion Paper</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Luigi De Faveri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guglielmo Faggioli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Privacy is an essential aspect to consider when processing sensitive textual information in Natural Language Processing (NLP) and Information Retrieval (IR) tasks. Private medical records, queries, online posts, and reviews can contain sensitive information that endangers the confidentiality of users' data. The gold-standard framework for protecting such information in text is ε-Differential Privacy (ε-DP) obfuscation. However, implementing, developing, and testing state-of-the-art mechanisms requires a unified framework for such obfuscation mechanisms. pyPANTERA is a modular, extensible library that supports the integration of new ε-DP mechanisms and the reproducible comparison of existing ones. The effectiveness of the pyPANTERA package is measured by applying it to sentiment analysis and query obfuscation protocols. The library's source code is available in the public repository at https://github.com/Kekkodf/pypantera.</p>
      </abstract>
      <kwd-group>
        <kwd>Privacy Preserving Mechanisms</kwd>
        <kwd>Differential Privacy</kwd>
        <kwd>NLP</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Security</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Natural Language Processing (NLP) and Information Retrieval (IR) systems are commonly developed and
trained on textual data, e.g., queries, documents, and online posts, that contain sensitive and personal
user information. Processing such textual data can pose risks to the privacy and safety of users. For
example, the queries a user submits to a search engine or the content they post on online
social networks can contain personally identifiable information, e.g., the name or address of the searcher,
and details about the user’s private sphere, e.g., political views or sexual orientation, that might expose
them to blackmail and cyberbullying [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or even endanger their safety in illiberal countries [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
Consequently, the privacy research community [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9</xref>
        ] has stressed the importance of privacy for
textual data analysis, proposing different strategies for textual obfuscation. Such privatization techniques
are based on the gold-standard definition of privacy, represented by ε-Differential Privacy (ε-DP) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
The ε-DP formal framework was introduced to provide users with the “Plausible Deniability” property,
i.e., the outcome of any analysis is statistically indistinguishable up to a given privacy budget
ε. A limitation of state-of-the-art obfuscation methodologies is that, although they have been
evaluated across different tasks and datasets, they have not been structured within a
unified framework for text obfuscation in NLP and IR. Therefore, privacy practitioners can benefit
from a compact, modular, and adaptable framework that encourages the rapid design of novel ε-DP
methodologies and permits a consistent and efficient evaluation against state-of-the-art techniques
across multiple experimental tasks.
      </p>
      <p>
        In this work, we discuss pyPANTERA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], an open-source, unified, flexible, and user-friendly
framework for implementing and comparing ε-DP mechanisms. Moreover, we bring together
state-of-the-art mechanisms [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14 ref15 ref16">11, 12, 13, 14, 15, 16</xref>
        ] based on the ε-DP framework and used for NLP and IR
privacy tasks. pyPANTERA is structured into modules that implement different families of obfuscation
mechanisms, specifically sampling and embedding perturbation approaches, along with an evaluation
module used to assess privacy and test the empirical correctness of the mechanisms. The obfuscation
modules provide distinct interfaces for different mechanism families, supporting the integration of new
algorithms alongside existing ones. The evaluation module, in turn, enables practitioners
to assess the privacy of the obfuscated text by measuring the similarity between the original and
obfuscated sentences and by testing the effectiveness of the implemented obfuscation mechanisms.
      </p>
      <p>Finally, we report the results of the implemented mechanisms on real NLP and
IR tasks and show that they are comparable to those reported in the original mechanism
studies. This highlights the effectiveness of pyPANTERA as a tool for privacy practitioners
to implement prospective obfuscation techniques and accurately replicate results from prior studies.
The code is open source under the GNU General Public License version 3.0 and publicly available<sup>1</sup>.</p>
      <p>The paper is organized as follows: Section 2 describes related work on obfuscation
techniques and publicly available tools; Section 3 illustrates the design of the Python
package, providing technical information about the resource; and Section 4 reports the results
obtained from the tasks performed to evaluate the overall obfuscation framework.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Background and Differential Privacy Approaches</title>
        <p>
          Formal privacy is mathematically guaranteed by the definition of ε-Differential Privacy (ε-DP) introduced
by Dwork et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. An ε-DP obfuscation mechanism ℳ is an algorithm that receives a text as input and
produces as output one or more noisy versions of it, regulating the amount of noise
according to the parameter ε ∈ ℝ, called the privacy budget of the mechanism. An obfuscation
mechanism ℳ satisfies ε-DP if and only if, for any pair of neighbouring datasets D, D′, i.e., datasets
that differ in only one record, and given ε &gt; 0, Equation 1 holds for all subsets S ⊆ Image(ℳ).
        </p>
        <p>
          <disp-formula id="eq1"><label>(1)</label><tex-math>\Pr\{\mathcal{M}(D) \in S\} \leq e^{\varepsilon} \, \Pr\{\mathcal{M}(D') \in S\}</tex-math></disp-formula>
Equation 1 grants the property of “plausible deniability” to the user: an adversary cannot confirm with
absolute certainty the specific input (the user’s original data) corresponding to a selected output.</p>
        <p>
          However, to provide this property to textual data, the original definition of ε-DP is extended to metric
spaces [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Once a text is encoded into a vector, Metric-DP [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] ensures that a randomized mechanism
ℳ : ℝⁿ → ℝⁿ defined over a geometric space with distance function d : ℝⁿ × ℝⁿ → ℝ⁺ respects the
definition of ε-DP if, for any triplet of points x, x′, x̂ ∈ ℝⁿ, the inequality in Equation 2 holds.
        </p>
        <p>
          <disp-formula id="eq2"><label>(2)</label><tex-math>\Pr\{\mathcal{M}(x) = \hat{x}\} \leq e^{\varepsilon \, d(x, x')} \, \Pr\{\mathcal{M}(x') = \hat{x}\}</tex-math></disp-formula>
        </p>
        <p>
          Obfuscation mechanisms based on ε-DP and ε-Metric DP for natural-language texts have attracted strong interest
from the research and industry communities [
          <xref ref-type="bibr" rid="ref18 ref5">18, 5</xref>
          ]. These mechanisms are commonly categorized by the nature of the perturbation applied
to the texts. On the one hand, the mechanisms presented in [
          <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
          ] obfuscate the embeddings of the
terms within the sentence by adding statistical noise calibrated to the privacy budget ε. Conversely, the
mechanisms outlined in [
          <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
          ] rely on the initial computation of a score between word embeddings
to rank analogous terms, using ε to modify the probability of sampling the new words.
        </p>
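        <p>The embedding-perturbation family can be illustrated with a minimal sketch. It assumes a toy vocabulary with made-up two-dimensional embeddings and noise with density proportional to exp(-ε‖z‖), drawn as a uniform direction scaled by a Gamma-distributed magnitude; the names EMBEDDINGS, sample_noise, and obfuscate_term are hypothetical and are not part of the pyPANTERA API.</p>
        <preformat>
```python
import math
import random

# Toy 2-dimensional embeddings (made-up values, for illustration only).
EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "hospital": (0.7, 0.4),
    "pizza": (-0.6, 0.8),
}


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_noise(dim, epsilon, rng):
    """Draw noise whose density is proportional to exp(-epsilon * ||z||)."""
    # Direction: a normalised Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in direction))
    # Magnitude: Gamma distribution with shape `dim` and scale 1 / epsilon.
    magnitude = rng.gammavariate(dim, 1.0 / epsilon)
    return [magnitude * c / norm for c in direction]


def obfuscate_term(term, epsilon, rng):
    """Add noise to the embedding of `term`, then return the nearest word."""
    vector = EMBEDDINGS[term]
    noise = sample_noise(len(vector), epsilon, rng)
    noisy = [v + z for v, z in zip(vector, noise)]
    return min(EMBEDDINGS, key=lambda word: euclidean(EMBEDDINGS[word], noisy))


rng = random.Random(0)
print([obfuscate_term("doctor", epsilon=5.0, rng=rng) for _ in range(5)])
```
        </preformat>
        <p>With a large ε the noise is small and the original term is usually returned, while a small ε makes distant replacements more likely.</p>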
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Differential Privacy Resources</title>
        <p>
          Several libraries are available to provide privacy for structured tabular data. Such libraries primarily
facilitate the implementation of private statistical queries and private machine learning pipelines,
such as Differentially Private Stochastic Gradient Descent [
          <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
          ]. According
to the evaluations in [
          <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
          ], examples of such libraries include IBM Diffprivlib [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], Meta
PyTorch Opacus [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], and the Google TensorFlow Differential Privacy [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] toolkit. Furthermore, built as a
fork of Google TensorFlow Differential Privacy and OpenDP [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], OpenMined has released
PyDP [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], a Python wrapper library for aggregating sensitive statistics across tabular datasets.
        </p>
        <p>
          In NLP, text sanitization and anonymization are another aspect of privacy. They
concern removing sensitive data from text by substituting it with placeholders
or censoring it with random symbols. Microsoft Presidio [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] is built on the SpaCy [29]
library and consists of an Analyzer and an Anonymizer, which are designed to detect and mask personally
identifiable information within a given sentence. The Analyzer leverages regular-expression rules
and Named Entity Recognition Machine Learning models supplied by SpaCy to identify sensitive terms
within the provided context. After the identification phase, Presidio employs the anonymization
module to obfuscate such information by redacting, hashing, or replacing the identified sensitive data,
generating a censored version of the original text. Although both Presidio and pyPANTERA
operate on textual data, they address distinct privacy considerations: data sanitization and obfuscation,
respectively. Therefore, these two resources can be considered complementary. In future work, we
intend to integrate Presidio’s data identification
and sanitization capabilities with the semantic obfuscation features of the ε-DP framework in pyPANTERA
within the obfuscation pipeline, thus designing an ε-DP mechanism able to redact texts formally.
        </p>
        <fn-group>
          <fn id="fn1"><label>1</label><p>https://github.com/Kekkodf/pypantera</p></fn>
        </fn-group>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. pyPANTERA</title>
      <sec id="sec-3-1">
        <title>3.1. Obfuscation Pipeline</title>
        <p>Figure 1 shows the general pipeline through which pyPANTERA processes text. The
initial step tokenizes and parses the input text, removing punctuation and
converting all uppercase letters in the sentence to lowercase. The Initialization Phase concludes
upon receiving the practitioner’s selected parameters necessary to initialize the chosen obfuscation
mechanism. The available mechanisms are based either on noisy embedding obfuscation strategies or
on the noisy sampling of the terms used in the obfuscated text. After each term in the
sentence is obfuscated, the terms are reassembled to generate the user-required number of
obfuscation variants. The obfuscated texts are stored either as a single text or in a suitable data frame
and saved to a CSV file. Finally, the obfuscated versions of the texts can be used to perform NLP
and IR tasks privately. Additionally, pyPANTERA offers a module to assess the level of privacy granted.</p>
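        <p>The pipeline steps described above can be sketched with the Python standard library alone. This is an illustrative outline under stated assumptions, not pyPANTERA code: the function names, the CSV column names, and the placeholder mechanism (which randomly masks terms and carries no privacy guarantee) are all hypothetical stand-ins.</p>
        <preformat>
```python
import csv
import io
import random
import string


def preprocess(text):
    """Lowercase the text, strip punctuation, and split it on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def placeholder_mechanism(term, rng):
    """Stand-in for a real mechanism: keep the term or mask it at random."""
    return term if rng.random() >= 0.5 else "xxx"


def obfuscate(text, n_variants, rng):
    """Obfuscate every term and reassemble `n_variants` output texts."""
    terms = preprocess(text)
    return [
        " ".join(placeholder_mechanism(term, rng) for term in terms)
        for _ in range(n_variants)
    ]


def to_csv(rows):
    """Serialise (original, obfuscated) pairs as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "obfuscated_text"])  # hypothetical column names
    writer.writerows(rows)
    return buffer.getvalue()


query = "What is Differential Privacy?"
variants = obfuscate(query, n_variants=3, rng=random.Random(0))
print(to_csv([(query, variant) for variant in variants]))
```
        </preformat>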
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Development Workflow</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Requirements and Initial Usage</title>
          <p>pyPANTERA is developed in Python (version 3.10) and requires Python ≥ 3.7 as the minimum version.
Python was selected for its accessibility, fast prototyping, and active user community. Moreover, as
the tasks for which the obfuscation mechanisms are implemented depend on deep learning methods,
ensuring rapid interoperability between obfuscation and the overall pipeline significantly enhances
the efficiency of conducting experiments within the framework. The library can be installed and used
in two ways. The recommended one is to clone the repository of the resource
available on GitHub<sup>2</sup>, which ensures access to the latest version of the mechanisms and
methods. In addition, the README provides detailed instructions for setting up the virtual environment
for conducting obfuscation and analysis. Alternatively, pyPANTERA can be installed using pip to
download the package from PyPI<sup>3</sup>, using the command pip install pypantera.</p>
          <p>One of the advantages of pyPANTERA is that it is accessible to privacy practitioners of all levels of expertise. To
achieve this, pyPANTERA builds upon popular data science libraries, i.e., NumPy [30], Pandas [31],
and SciPy [32]. In addition, to speed up the obfuscation of large amounts of text, the library supports parallel
computing with the Python library multiprocessing<sup>4</sup>.</p>
          <fn-group>
            <fn id="fn2"><label>2</label><p>https://github.com/Kekkodf/pypantera</p></fn>
            <fn id="fn3"><label>3</label><p>https://pypi.org/</p></fn>
            <fn id="fn4"><label>4</label><p>https://docs.python.org/3/library/multiprocessing.html</p></fn>
          </fn-group>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Mechanisms Overview</title>
          <p>New obfuscation mechanisms can be developed using the abstract classes provided by pyPANTERA.
The library’s UML diagram is available in the project repository; it features a general abstract
ε-DP mechanism class for initializing new mechanisms. Additionally, distinct child abstract classes
corresponding to embedding and sampling perturbation define each specific obfuscation process. An
obfuscation mechanism has three main phases, i.e., Preprocessing, Distortion, and Selection, depicted in
Figure 2. The Preprocessing phase deals with the tokenization and removal of alpha-numeric terms,
after which an embedding model is usually employed to obtain the vectors of the terms in the original
text. The Distortion phase modifies these term embeddings according to the ε
privacy budget and the other parameters specific to each mechanism. Finally, in the Selection
phase, the final obfuscated word is selected to compose the privatized text.</p>
          <p>[Figure 2. Original Text → Obfuscation Mechanism → Obfuscated Text. Phases: Preprocessing, Distortion, Selection. Steps: Parsing &amp; POS Tagging; Embedding model; Computation of obfuscated terms; Production of obfuscated text.]</p>
          <p>We report a list of the state-of-the-art mechanisms available in the package. The mechanisms are
grouped into Embedding (Calibrated Multivariate Perturbation Mechanism (CMP), Mahalanobis
(Mhl), Vickrey CMP Mechanism (VickreyCMP), Vickrey Mahalanobis Mechanism (VickreyMhl)) and
Sampling (Customized Text Mechanism (CusText), Sanitization Text Mechanism (SanText), Truncated
Exponential Mechanism (TEM)) perturbation groups, according to the type of obfuscation process they
perform. More details can be found in the repository.</p>
          <p>• Embedding Obfuscation: Generally speaking, this family of obfuscation mechanisms
alters the embeddings of the terms in the original text by adding statistical
noise whose amount depends on the ε privacy budget.</p>
          <p>
            – CMP [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]: After each term in the original text is encoded, statistical noise is sampled from
an n-dimensional Laplace distribution and added to the embedding of the term.
Finally, the new obfuscated terms are selected considering the proximity of the respective
embeddings to the noisy ones computed.
          </p>
          <p>
            – Mhl [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]: Similarly to the CMP mechanism, after the encoding of the terms in the input
text, the noise is sampled from an n-dimensional Normal distribution proportional to the
λ-regularized Mahalanobis norm of the term embedding, stretching the obfuscation noise
towards more similar terms, and to the ε parameter. Finally, the selection of the new term is
based on the proximity of the obfuscated embeddings to the original one.
          </p>
          <p>
            – VickreyCMP and VickreyMhl [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]: In these mechanisms, the preprocessing and the distortion
with noise are defined by the parent method (CMP or Mhl), and the obfuscated term is then
selected based on a free threshold parameter t ∈ (0, 1).
          </p>
          <p>• Sampling Obfuscation: While Embedding obfuscation mainly concerns the Distortion phase
of an obfuscation mechanism, these strategies deal with the noisy selection of the obfuscated
terms in the output texts. In this case, the mechanisms do not alter the embedding representation
of the terms, thus skipping the Distortion phase in Figure 2.</p>
          <p>
            – CusText [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]: Selecting a new term involves a sampling approach, where the replacement
word is chosen from a set of K possible term candidates. These
candidates are determined by their similarity to the original term, which is assessed through the
distances between word embeddings. The K words with the highest similarity scores, i.e.,
lowest distances, are identified, and one of them is sampled with a probability that depends exponentially on ε.
          </p>
          <p>
            – SanText [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]: In this mechanism, in contrast to CusText, the candidates are not limited to the
top K most similar words, and all possible terms can be used.
          </p>
          <p>
            – TEM [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]: The noise, sampled from an n-dimensional Gumbel distribution, is added
to the score computed from the distances between the vector embeddings. A truncation
parameter is introduced to limit the possible obfuscation candidates during the selection
phase. Thus, the new term is chosen according to the maximum noisy score obtained by a
term using the exponential mechanism [33] for sampling.
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Functionalities</title>
          <p>pyPANTERA provides different utility functions to give the practitioner an exhaustive view of all
the pipeline steps. The package offers a dedicated class to speed up the initialization
of the embedding vocabulary, using parallelization to read the embeddings from the supplied file.
Moreover, using the Python logging library<sup>5</sup>, it creates a folder containing a log file that
reports the mechanism parameters, the time of execution, and the steps executed.
Finally, to evaluate the similarities between the original and obfuscated texts and thus assess the privacy
provided to the texts, pyPANTERA implements the Jaccard Index to compute the overlapping terms,
offering a proxy measure of the lexical similarity of the produced texts, and a cosine similarity
between the contextual embeddings of the sentences, computed with a configurable Transformer model,
which captures the sentence-level similarity between input and output texts.</p>
          <fn-group>
            <fn id="fn5"><label>5</label><p>https://docs.python.org/3/library/logging.html</p></fn>
          </fn-group>
        </sec>
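        <p>The sampling family described in the mechanisms overview can be illustrated with a simplified sketch of exponential-mechanism sampling over the closest candidates. It is an assumption-laden illustration rather than the implementation of any specific mechanism: the toy embeddings, the score normalisation to [0, 1], and the names top_k_candidates, sampling_probabilities, and sample_replacement are hypothetical.</p>
        <preformat>
```python
import math
import random

# Toy 2-dimensional embeddings (made-up values, for illustration only).
EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "hospital": (0.7, 0.4),
    "pizza": (-0.6, 0.8),
}


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_candidates(term, k):
    """Return the k vocabulary words closest to `term` (itself included)."""
    origin = EMBEDDINGS[term]
    ranked = sorted(EMBEDDINGS, key=lambda w: euclidean(EMBEDDINGS[w], origin))
    return ranked[:k]


def sampling_probabilities(term, k, epsilon):
    """Exponential-mechanism probabilities over the top-k candidates."""
    candidates = top_k_candidates(term, k)
    distances = [euclidean(EMBEDDINGS[w], EMBEDDINGS[term]) for w in candidates]
    spread = max(distances) or 1.0
    # Score in [0, 1]: 1 for the closest candidate, 0 for the farthest,
    # so the sensitivity of the score is 1.
    scores = [1.0 - d / spread for d in distances]
    weights = [math.exp(epsilon * s / 2.0) for s in scores]
    total = sum(weights)
    return {w: weight / total for w, weight in zip(candidates, weights)}


def sample_replacement(term, k, epsilon, rng):
    """Sample one replacement word according to the probabilities above."""
    probabilities = sampling_probabilities(term, k, epsilon)
    words = list(probabilities)
    weights = [probabilities[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


print(sampling_probabilities("doctor", k=3, epsilon=5.0))
print(sample_replacement("doctor", k=3, epsilon=5.0, rng=random.Random(0)))
```
        </preformat>
        <p>With ε = 0 every candidate is equally likely; as ε grows, the probability mass concentrates on the candidates closest to the original term.</p>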
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>
        In this section, we report the experiments performed to verify the effectiveness of the pyPANTERA
package. As downstream tasks, we followed the setups and methodology of the original studies,
i.e., sentiment analysis, classification, and document retrieval. Finally, to assess the levels of privacy
provided, we followed the methodology proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and computed the cosine similarity and Jaccard
scores between original and obfuscated texts using the methods implemented in pyPANTERA.
      </p>
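      <p>The two similarity measures mentioned above can be sketched as follows. This is a minimal illustration, not the pyPANTERA metrics module: the function names are hypothetical, terms are obtained by whitespace splitting, and the sentence vectors passed to the cosine similarity are assumed to come from a sentence encoder that is not shown here.</p>
      <preformat>
```python
import math


def jaccard_index(original, obfuscated):
    """Share of distinct terms that the two texts have in common."""
    original_terms = set(original.lower().split())
    obfuscated_terms = set(obfuscated.lower().split())
    union = original_terms.union(obfuscated_terms)
    if not union:
        return 1.0  # two empty texts are treated as identical
    return len(original_terms.intersection(obfuscated_terms)) / len(union)


def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


print(jaccard_index("what is a bank", "what is a river"))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```
      </preformat>
      <p>Lower values of either measure indicate that the obfuscated text is less similar to the original one.</p>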
      <sec id="sec-4-1">
        <title>4.1. Dataset and Experimental Setup</title>
        <p>
          To evaluate the correctness and effectiveness of the pyPANTERA library, we conduct experiments using
NLP tasks similar to those employed in the original state-of-the-art mechanism studies, specifically
sentiment analysis. Additionally, following the methodology proposed by Faggioli and Ferro [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], we
assess the library’s robustness in implementing the query obfuscation pipeline for an IR task, i.e.,
document retrieval, while ensuring user privacy protection. For the sentiment analysis task, we used
the Kaggle Twitter sentiment analysis<sup>6</sup> test set. For the document retrieval task,
we obfuscated the queries from the TREC Deep Learning 2019 track (DL’19) [34], based on the MSMARCO [35]
passage corpus. Finally, we measured the privacy levels achieved on this query collection
using the metrics module in pyPANTERA. The default initialization parameters used to configure
the obfuscation mechanisms are reported in Table 1. The privacy budget ε was varied to verify its
impact over a wide range of possible values, i.e., ε ∈ {1, 5, 10, 12.5, 15, 17.5, 20, 50}.
To encode the texts, the default embeddings in the package, used for all the tasks, are read from a
local file containing the publicly available<sup>7</sup> pre-computed GloVe [36] vectors from Wikipedia 2014.
        </p>
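        <p>To get a feel for the range of privacy budgets listed above, recall that, under the standard definitions, ε-DP bounds the ratio between the probabilities of any output on neighbouring inputs by e^ε, and metric-DP by e^(ε·d(x, x′)). The short sketch below simply evaluates this bound; the function name is hypothetical, and a unit distance is assumed by default.</p>
        <preformat>
```python
import math

# Privacy budgets used in the experiments.
PRIVACY_BUDGETS = [1, 5, 10, 12.5, 15, 17.5, 20, 50]


def max_probability_ratio(epsilon, distance=1.0):
    """Upper bound exp(epsilon * distance) on the output-probability ratio."""
    return math.exp(epsilon * distance)


for epsilon in PRIVACY_BUDGETS:
    bound = max_probability_ratio(epsilon)
    print(f"epsilon = {epsilon:5}: ratio bound = {bound:.3e}")
```
        </preformat>
        <p>The bound grows exponentially, so the largest budgets correspond to very weak formal guarantees, while the smallest ones keep the output distributions close to each other.</p>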
        <p>A key feature of the pyPANTERA package is its flexibility in allowing practitioners to configure various
parameters for the obfuscation mechanisms directly via command-line arguments. This functionality
enables users to customize the behaviour of the mechanisms without modifying the underlying code. A
practitioner can specify the desired mechanism and its parameters by executing a command such as:
python3 testObfuscationIR.py --mechanism VickreyCMP -t 0.5</p>
        <p>In this example, the --mechanism flag selects the “VickreyCMP” obfuscation mechanism, while the
-t parameter sets a threshold value of 0.5. This approach facilitates experimentation and fine-tuning,
allowing users to efficiently adapt the obfuscation process to specific use cases.</p>
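        <p>A command-line interface of this kind can be built with the standard argparse module. The sketch below is hypothetical and does not reproduce the parser of testObfuscationIR.py: only the --mechanism and -t flags appear in the command above, while the list of accepted names, the default value, and the help texts are assumptions made for illustration.</p>
        <preformat>
```python
import argparse

# Mechanism names taken from the list in Section 3.2.2; whether the real
# script accepts exactly these strings is an assumption.
MECHANISMS = ["CMP", "Mhl", "VickreyCMP", "VickreyMhl", "CusText", "SanText", "TEM"]


def build_parser():
    """Create a parser exposing the two flags shown in the example command."""
    parser = argparse.ArgumentParser(description="Obfuscate queries (sketch).")
    parser.add_argument(
        "--mechanism",
        choices=MECHANISMS,
        required=True,
        help="name of the obfuscation mechanism to run",
    )
    parser.add_argument(
        "-t",
        dest="threshold",
        type=float,
        default=0.5,  # assumed default, for illustration only
        help="threshold used by the Vickrey-based mechanisms",
    )
    return parser


# Parse the arguments of the example command instead of sys.argv.
args = build_parser().parse_args(["--mechanism", "VickreyCMP", "-t", "0.5"])
print(args.mechanism, args.threshold)
```
        </preformat>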
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Natural Language Processing</title>
        <p>
          To demonstrate the capabilities of pyPANTERA, we conducted a standard NLP task, sentiment
analysis, on a dataset of tweets collected from Twitter. For sentiment classification, we used the
Twitter-roBERTa-base model, commonly referred to as TweetNLP [37], to extract sentiment from
the preprocessed version of the tweets. As a performance metric, we measured accuracy in correctly
identifying the sentiment labels of the tweets, aligning our evaluation with prior studies on obfuscation
mechanisms [
          <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
          ]. This task aimed to demonstrate how the obfuscated tweets generated by
pyPANTERA can be fully integrated into a basic NLP task, comparing different obfuscation
techniques regarding their impact on model performance.
        </p>
        <p>
          Figure 3 presents the accuracy results as a function of the privacy budget ε for different obfuscation
mechanisms. The findings are consistent with those reported in previous studies [
          <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
          ]. The
TEM mechanism surpasses the CMP mechanism in sentiment classification, confirming the results
obtained by Carvalho et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Moreover, CusText performs better than SanText, aligning with the
observations of Chen et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In the context of noisy embedding obfuscation, CMP and Mahalanobis
exhibit a similar performance trend across different values of ε. In contrast, the Vickrey-based variants
consistently demonstrate lower performance. The results highlight a clear distinction between the
two obfuscation families: the sampling-based approaches achieve higher accuracy for lower ε values,
whereas the noisy embedding methods maintain lower performance under the same privacy constraints.
        </p>
        <fn-group>
          <fn id="fn6"><label>6</label><p>https://www.kaggle.com/datasets/jp797498e/twitter-entity-sentiment-analysis/data</p></fn>
          <fn id="fn7"><label>7</label><p>https://nlp.stanford.edu/projects/glove/</p></fn>
        </fn-group>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Information Retrieval and Privacy Analysis</title>
        <p>
          Following the experimental methodology outlined by Faggioli and Ferro [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], we used obfuscated
MSMARCO DL’19 queries to retrieve relevant documents from the collection, ensuring user privacy
during the retrieval process. We then re-ranked the retrieved results using the original
(non-obfuscated) queries. For both retrieval and re-ranking, we used the Meta Contriever dense model [38].
The performance of the retrieval pipeline, measured in terms of Recall and nDCG@10, is presented in
Table 2. Table 3 presents the similarity results, which quantify the relationship between the original and
obfuscated DL’19 queries. Specifically, we evaluated two types of similarity using the metric functions
available in pyPANTERA: lexical similarity, measured using the Jaccard index, and sentence-level
similarity, computed as the cosine similarity between the contextual embeddings of the queries obtained
from the Sentence-BERT MiniLM model [39]. In future versions of the library, we plan to implement
new privacy measures such as the one proposed in [40].
        </p>
        <p>
          As observed by Faggioli and Ferro [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and consistent with the theoretical expectations of an ε-DP
obfuscation mechanism, increasing the privacy budget ε results in enhanced performance of the
obfuscation mechanism. However, this performance improvement comes with a weakening of the
privacy guarantees, as illustrated in both Table 2 and Table 3. Moreover, the Sampling perturbation
mechanisms tend to exhibit higher similarity between the original and obfuscated queries for lower
values of ε compared to the Embedding perturbation mechanisms. This pattern suggests a trade-off [...]
future work. We leave this observation as an open issue for future experiments.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>
        Given the increasing concerns surrounding data confidentiality in textual analysis, privacy remains an
important research domain for NLP and IR. In this paper, we presented the functionality of pyPANTERA,
introduced in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a highly adaptable and extensible framework designed to systematically evaluate
and compare different ε-DP obfuscation mechanisms. Our framework contributes to the
privacy-preserving research community by establishing a well-defined and user-friendly text obfuscation
pipeline, facilitating the development and integration of novel obfuscation techniques by researchers and
practitioners in the privacy research field. The pyPANTERA library encompasses diverse functionalities,
including real-time monitoring of the obfuscation process and a list of evaluation metrics to assess the
level of privacy preserved beyond the formal analysis of the ε privacy budget. Furthermore, we conducted
an extensive empirical analysis across standard NLP and IR tasks, demonstrating the effectiveness
of pyPANTERA in providing a robust and unified environment for the comparative assessment of
different obfuscation strategies based on ε-DP. As part of future research, we aim to broaden the number
of available obfuscation mechanisms in the framework and enhance the privacy evaluation module by
introducing additional metric functions to refine the assessment of privacy guarantees.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for Readability and Spelling checks.
After using this tool, they reviewed and edited the content as needed and took full responsibility for
the publication’s content.
aware, pluggable and customizable pii anonymization service for text and images, 2018.
[29] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language</p>
      <p>Processing in Python (2020). doi:10.5281/zenodo.1212303.
[30] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser,
J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane,
J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser,
H. Abbasi, C. Gohlke, T. E. Oliphant, Array programming with NumPy, Nature 585 (2020) 357–362.</p>
      <p>URL: https://doi.org/10.1038/s41586-020-2649-2. doi:10.1038/s41586-020-2649-2.
[31] W. McKinney, Data Structures for Statistical Computing in Python, in: S. van der Walt,
J. Millman (Eds.), Proceedings of the 9th Python in Science Conference, 2010, pp. 56–61.
doi:10.25080/Majora-92bf1922-00a.
[32] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski,
P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov,
A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas,
D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald,
A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0 Contributors, SciPy 1.0: Fundamental
Algorithms for Scientific Computing in Python, Nature Methods 17 (2020) 261–272.
doi:10.1038/s41592-019-0686-2.
[33] F. McSherry, K. Talwar, Mechanism design via differential privacy, in: 48th Annual IEEE
Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA,
Proceedings, IEEE Computer Society, 2007, pp. 94–103. URL: https://doi.org/10.1109/FOCS.2007.41.
doi:10.1109/FOCS.2007.41.
[34] N. Craswell, B. Mitra, E. Yilmaz, D. Campos, E. M. Voorhees, Overview of the TREC
2019 deep learning track, CoRR abs/2003.07820 (2020). URL: https://arxiv.org/abs/2003.07820.
arXiv:2003.07820.
[35] T. Nguyen, M. Rosenberg, X. Song, J. Gao, S. Tiwary, R. Majumder, L. Deng, MS MARCO: A
human generated machine reading comprehension dataset, in: T. R. Besold, A. Bordes, A. S. d’Avila
Garcez, G. Wayne (Eds.), Proceedings of the Workshop on Cognitive Computation: Integrating
neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural
Information Processing Systems (NIPS 2016), Barcelona, Spain, December 9, 2016, volume 1773 of
CEUR Workshop Proceedings, CEUR-WS.org, 2016.
[36] J. Pennington, R. Socher, C. Manning, GloVe: Global vectors for word representation, in: A.
Moschitti, B. Pang, W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in
Natural Language Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar,
2014, pp. 1532–1543. URL: https://aclanthology.org/D14-1162. doi:10.3115/v1/D14-1162.
[37] J. Camacho-Collados, K. Rezaee, T. Riahi, A. Ushio, D. Loureiro, D. Antypas, J. Boisson, L.
Espinosa Anke, F. Liu, E. Martínez Cámara, TweetNLP: Cutting-edge natural language processing
for social media, in: W. Che, E. Shutova (Eds.), Proceedings of the 2022 Conference on Empirical
Methods in Natural Language Processing: System Demonstrations, Association for Computational
Linguistics, Abu Dhabi, UAE, 2022, pp. 38–49. URL: https://aclanthology.org/2022.emnlp-demos.5.
doi:10.18653/v1/2022.emnlp-demos.5.
[38] G. Izacard, M. Caron, L. Hosseini, S. Riedel, P. Bojanowski, A. Joulin, E. Grave, Unsupervised
dense information retrieval with contrastive learning, Trans. Mach. Learn. Res. 2022 (2022). URL:
https://openreview.net/forum?id=jKN1pXi7b0.
[39] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks,
in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing,
Association for Computational Linguistics, 2019. URL: https://arxiv.org/abs/1908.10084.
[40] F. L. De Faveri, G. Faggioli, N. Ferro, Measuring actual privacy of obfuscated queries in
information retrieval, in: Advances in Information Retrieval: 47th European Conference on
Information Retrieval, ECIR 2025, Lucca, Italy, April 6–10, 2025, Proceedings, Part I,
Springer-Verlag, Berlin, Heidelberg, 2025, pp. 49–66. URL: https://doi.org/10.1007/978-3-031-88708-6_4.
doi:10.1007/978-3-031-88708-6_4.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. L.</given-names>
            <surname>De Faveri</surname>
          </string-name>
          , G. Faggioli, N. Ferro,
          <article-title>pyPANTERA: A python PAckage for Natural language obfuscaTion Enforcing pRivacy &amp; Anonymization</article-title>
          , in: E.
          <string-name>
            <surname>Serra</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Spezzano</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 33rd ACM International Conference on Information and Knowledge Management</source>
          ,
          <string-name>
            <surname>CIKM</surname>
          </string-name>
          <year>2024</year>
          ,
          <article-title>Boise</article-title>
          ,
          <string-name>
            <surname>ID</surname>
          </string-name>
          , USA, October
          <volume>21</volume>
          -
          <issue>25</issue>
          ,
          <year>2024</year>
          , ACM,
          <year>2024</year>
          , pp.
          <fpage>5348</fpage>
          -
          <lpage>5353</lpage>
          . URL: https://doi.org/10.1145/3627673.3679173. doi:
          <volume>10</volume>
          .1145/3627673.3679173.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Chetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alathur</surname>
          </string-name>
          ,
          <article-title>Hate speech review in the context of online social networks</article-title>
          ,
          <source>Aggression and Violent Behavior</source>
          <volume>40</volume>
          (
          <year>2018</year>
          )
          <fpage>108</fpage>
          -
          <lpage>118</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1359178917301064. doi:https://doi.org/10.1016/j.avb.
          <year>2018</year>
          .
          <volume>05</volume>
          .003.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Maragh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ekdale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>High</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Havens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shafiq</surname>
          </string-name>
          ,
          <article-title>Measuring political personalization of google news search</article-title>
          ,
          <source>in: The World Wide Web Conference</source>
          , WWW '19,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>2957</fpage>
          -
          <lpage>2963</lpage>
          . URL: https://doi.org/10.1145/3308558.3313682. doi:
          <volume>10</volume>
          .1145/3308558.3313682.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mustafaraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lurie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Devine</surname>
          </string-name>
          ,
          <article-title>The case for voter-centered audits of search engines during political elections</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency</source>
          , FAT* '
          <volume>20</volume>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>559</fpage>
          -
          <lpage>569</lpage>
          . URL: https://doi.org/10.1145/3351095.3372835. doi:
          <volume>10</volume>
          .1145/3351095.3372835.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A survey on differential privacy for unstructured data content</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1145/3490237. doi:
          <volume>10</volume>
          .1145/3490237.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Habernal</surname>
          </string-name>
          ,
          <article-title>When differential privacy meets NLP: the devil is in the detail</article-title>
          , in: M.
          <string-name>
            <surname>Moens</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Specia</surname>
          </string-name>
          , S. W. Yih (Eds.),
          <source>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP</source>
          <year>2021</year>
          , Virtual Event / Punta Cana, Dominican Republic,
          <fpage>7</fpage>
          -
          <issue>11</issue>
          <year>November</year>
          ,
          <year>2021</year>
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>1522</fpage>
          -
          <lpage>1528</lpage>
          . URL: https://doi.org/10.18653/v1/
          <year>2021</year>
          .emnlp-main.
          <volume>114</volume>
          . doi:
          <volume>10</volume>
          .18653/V1/
          <year>2021</year>
          .EMNLP-MAIN.
          <year>114</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Habernal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mireshghallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghanavati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving natural language processing</article-title>
          , in: F.
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanzotto</surname>
          </string-name>
          , S. Pradhan (Eds.),
          <article-title>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, Association for Computational Linguistics</article-title>
          , Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          . URL: https://aclanthology.org/
          <year>2023</year>
          .eacl-tutorials.6. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .eacl-tutorials.
          <volume>6</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tramèr</surname>
          </string-name>
          , E. Wallace,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Herbert-Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          , D. Song, Ú. Erlingsson,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oprea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>Extracting training data from large language models</article-title>
          ,
          <source>in: 30th USENIX Security Symposium (USENIX Security 21)</source>
          , USENIX Association,
          <year>2021</year>
          , pp.
          <fpage>2633</fpage>
          -
          <lpage>2650</lpage>
          . URL: https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Faggioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Query obfuscation for information retrieval through differential privacy</article-title>
          , in: N.
          <string-name>
            <surname>Goharian</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Tonellotto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>He</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lipani</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
          </string-name>
          , I. Ounis (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Calibrating noise to sensitivity in private data analysis</article-title>
          , in: S. Halevi, T. Rabin (Eds.),
          <source>Theory of Cryptography</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2006</year>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Balle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Drake</surname>
          </string-name>
          , T. Diethe,
          <article-title>Privacy- and utility-preserving textual analysis via calibrated multivariate perturbations</article-title>
          ,
          <source>in: Proceedings of the 13th International Conference on Web Search and Data Mining</source>
          , WSDM '20,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>178</fpage>
          -
          <lpage>186</lpage>
          . URL: https://doi.org/10.1145/3336191.3371856. doi:
          <volume>10</volume>
          .1145/3336191.3371856.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Teissier</surname>
          </string-name>
          ,
          <article-title>A differentially private text perturbation method using regularized Mahalanobis metric</article-title>
          , in: O.
          <string-name>
            <surname>Feyisetan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ghanavati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Malmasi</surname>
          </string-name>
          , P. Thaine (Eds.),
          <source>Proceedings of the Second Workshop on Privacy in NLP</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>17</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .privatenlp-
          <volume>1</volume>
          .2.pdf. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2020</year>
          .privatenlp-
          <volume>1</volume>
          .2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Teissier</surname>
          </string-name>
          ,
          <article-title>On a utilitarian approach to privacy preserving text generation</article-title>
          , in: O.
          <string-name>
            <surname>Feyisetan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ghanavati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Malmasi</surname>
          </string-name>
          , P. Thaine (Eds.),
          <source>Proceedings of the Third Workshop on Privacy in Natural Language Processing</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>20</lpage>
          . URL: https://aclanthology.org/
          <year>2021</year>
          .privatenlp-
          <volume>1</volume>
          .2. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .privatenlp-
          <volume>1</volume>
          .2.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S. M.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <article-title>Differential privacy for text analytics via natural text sanitization</article-title>
          , in: C.
          <string-name>
            <surname>Zong</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Navigli</surname>
          </string-name>
          (Eds.),
          <article-title>Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021</article-title>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>3853</fpage>
          -
          <lpage>3866</lpage>
          . URL: https://aclanthology.org/
          <year>2021</year>
          .findings-acl.
          <volume>337</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2021</year>
          .findings-acl.
          <volume>337</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <article-title>A customized text sanitization mechanism with differential privacy</article-title>
          , in: A.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Boyd-Graber</surname>
          </string-name>
          , N. Okazaki (Eds.),
          <source>Findings of the Association for Computational Linguistics: ACL</source>
          <year>2023</year>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>5747</fpage>
          -
          <lpage>5758</lpage>
          . URL: https://aclanthology.org/
          <year>2023</year>
          .findings-acl.
          <volume>355</volume>
          . doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .findings-acl.
          <volume>355</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vasiloudis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Feyisetan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>TEM: High Utility Metric Differential Privacy on Text</article-title>
          ,
          <year>2023</year>
          , pp.
          <fpage>883</fpage>
          -
          <lpage>890</lpage>
          . URL: https://epubs.siam.org/doi/abs/10.1137/1.9781611977653.ch99. doi:10.1137/1.9781611977653.ch99.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chatzikokolakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Andrés</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. E.</given-names>
            <surname>Bordenabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palamidessi</surname>
          </string-name>
          ,
          <article-title>Broadening the scope of differential privacy using metrics</article-title>
          , in: E. De Cristofaro, M. Wright (Eds.),
          <source>Privacy Enhancing Technologies</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2013</year>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>O.</given-names>
            <surname>Klymenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Meisenbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Matthes</surname>
          </string-name>
          ,
          <article-title>Differential privacy in natural language processing: the story so far</article-title>
          , in: O.
          <string-name>
            <surname>Feyisetan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ghanavati</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Thaine</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Habernal</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Mireshghallah</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the Fourth Workshop on Privacy in Natural Language Processing</source>
          , Association for Computational Linguistics, Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .privatenlp-
          <volume>1</volume>
          .1. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2022</year>
          .privatenlp-
          <volume>1</volume>
          .1.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , H. B.
          <string-name>
            <surname>McMahan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Mironov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>L. Zhang,</given-names>
          </string-name>
          <article-title>Deep learning with differential privacy</article-title>
          , in: E. R. Weippl,
          <string-name>
            <given-names>S.</given-names>
            <surname>Katzenbeisser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kruegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Myers</surname>
          </string-name>
          , S. Halevi (Eds.),
          <source>Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security</source>
          , Vienna, Austria, October 24-28,
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>308</fpage>
          -
          <lpage>318</lpage>
          . URL: https://doi.org/10.1145/2976749.2978318. doi:10.1145/2976749.2978318.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Erlingsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>The secret sharer: evaluating and testing unintended memorization in neural networks</article-title>
          , in:
          <source>Proceedings of the 28th USENIX Conference on Security Symposium</source>
          , SEC'19, USENIX Association, USA,
          <year>2019</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hagermalm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Slavnic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Schiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Almgren</surname>
          </string-name>
          ,
          <article-title>Evaluation of open-source tools for differential privacy</article-title>
          ,
          <source>Sensors</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          6509. URL: https://doi.org/10.3390/s23146509. doi:10.3390/S23146509.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>I. C.</given-names>
            <surname>Ngong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Near</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Evaluating the usability of differential privacy tools with data practitioners</article-title>
          ,
          <year>2024</year>
          . arXiv:2309.13506.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Holohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Braghin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mac Aonghusa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Levacher</surname>
          </string-name>
          ,
          <article-title>Diffprivlib: the IBM differential privacy library</article-title>
          , ArXiv e-prints 1907.02444 [cs.CR] (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yousefpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Shilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sablayrolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Testuggine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Malek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bharadwaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cormode</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mironov</surname>
          </string-name>
          ,
          <article-title>Opacus: User-friendly differential privacy library in PyTorch</article-title>
          ,
          <source>CoRR</source>
          abs/2109.12298 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2109.12298. arXiv:2109.12298.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Subramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vadivelu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kamath</surname>
          </string-name>
          ,
          <article-title>Enabling fast differentially private SGD via just-in-time compilation and vectorization</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021</source>
          , December 6-14, 2021, virtual,
          <year>2021</year>
          , pp.
          <fpage>26409</fpage>
          -
          <lpage>26421</lpage>
          . URL: https://proceedings.neurips.cc/paper/2021/hash/ddf9029977a61241841edeae15e9b53f-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gaboardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vadhan</surname>
          </string-name>
          ,
          <article-title>A programming framework for opendp</article-title>
          , Manuscript, May (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>OpenMined</surname>
          </string-name>
          , Pydp,
          <year>2021</year>
          . URL: https://github.com/OpenMined/PyDP, accessed:
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>O.</given-names>
            <surname>Mendels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Peled</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Vaisman</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lahiani</surname>
          </string-name>
          , et al.,
          Microsoft Presidio: Context
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>