<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SP-Guard: Selective Prompt-adaptive Guidance for Safe Text-to-Image Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sumin Yu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Taesup Moon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electrical and Computer Engineering, Seoul National University</institution>
          ,
          <addr-line>Seoul</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IPAI / ASRI / INMC, Seoul National University</institution>
          ,
          <addr-line>Seoul</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>While diffusion-based T2I models have achieved remarkable image generation quality, they also enable easy creation of harmful content, raising social concerns and highlighting the need for safer generation. Existing inference-time guiding methods lack both adaptivity (adjusting guidance strength based on the prompt) and selectivity (targeting only unsafe regions of the image). Our method, SP-Guard, addresses these limitations by estimating prompt harmfulness and applying a selective guidance mask to guide only unsafe areas. Experiments show that SP-Guard generates safer images than existing methods while minimizing unintended content alteration. Beyond improving safety, our findings highlight the importance of transparency and controllability in image generation. WARNING: This paper contains AI-generated images that may be offensive. Sensitive contents are masked.</p>
      </abstract>
      <kwd-group>
<kwd>Risk Management for Trustworthy AI</kwd>
        <kwd>Safe Generative AI</kwd>
<kwd>Text-to-Image Diffusion Model</kwd>
        <kwd>Safe Image Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
The rapid advancements in text-to-image (T2I) diffusion models [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] have enabled the generation of
high-quality images based on textual inputs. However, the extensive training data often contain unsafe
content and inherent biases [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ], posing significant risks of generating unexpected unsafe images [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
There are also concerns about malicious users exploiting model vulnerabilities to create harmful images
by crafting attack prompts [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ].
      </p>
      <p>
        To mitigate these risks, existing defenses fall into two main categories. Detection-based methods
[
        <xref ref-type="bibr" rid="ref1 ref7 ref8">8, 1, 7</xref>
        ] attempt to identify harmful images, but often suffer from false positives that block benign
content [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Removal-based methods intervene before or during generation by adjusting the diffusion
process at inference time [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], editing model weights [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ], or optimizing prompts [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Most existing
methods struggle to handle multiple harmful concepts simultaneously. Weight-editing and
prompt-based approaches require retraining for new unsafe concepts. In contrast, inference-time methods
enable safe generation through lightweight manipulations. One notable approach in this category is
Safe Latent Diffusion (SLD) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which utilizes classifier-free guidance to adjust noise estimates away
from unsafe concept directions, even when multiple concepts are present. Despite its effectiveness, we
observe that SLD sometimes fails to remove harmfulness from images, even under maximum guidance
(SLD-max). Moreover, its guidance is applied inconsistently across prompts, i.e., some prompts become
sufficiently safe while others remain unsafe with the same configuration (see Fig. 1).
      </p>
      <p>In light of these limitations, we propose SP-Guard, an inference-time method emphasizing the
importance of selective and prompt-adaptive safe guidance to prevent unsafe image generation. We
suspect that the reason SLD fails is that it does not reflect how unsafe the generated image will be.
Therefore, before presenting our method in detail, we underscore the importance of adapting safety
guidance (i.e., unsafe concept removal) to each individual prompt. We demonstrate this in Section 2.2
through a comparative analysis of images generated in a straightforward experiment. SP-Guard, detailed
in Section 2.3, is based on the intuition that the similarity between the noise predictions conditioned on
the prompt and those conditioned on unsafe concepts can serve as a proxy for estimating the unsafe
degree of the generated image. Specifically, SP-Guard proactively estimates the unsafe degree of a
prompt and provides safe guidance during inference. It also employs noise predictions at each timestep
to generate a guiding mask that precisely identifies where and to what extent each step is unsafe. Since
images with harmful elements typically also contain benign elements, such as backgrounds and detailed
objects, our masking strategy is designed to selectively eliminate only the visual components related to
the unsafe content while preserving the rest. The effectiveness of SP-Guard is shown in Fig. 1. While
SLD yields inconsistent results under the same guidance level (i.e., SLD-medium in the second row), due
to the varying degrees of prompt harmfulness, SP-Guard consistently produces safe images regardless of
the initial prompt harmfulness. Moreover, SP-Guard employs a precise masking strategy that selectively
captures regions associated with unsafe concepts, whereas SLD often applies guidance more broadly,
affecting unrelated areas and resulting in images that diverge from the original intent (third row).</p>
      <p>In Section 3, we conduct both quantitative and qualitative evaluations of SP-Guard on four
benchmark datasets. Our findings indicate that SP-Guard achieves a lower detection rate of unsafe content,
while effectively preserving the integrity of the original image content. Qualitative analyses further verify
that SP-Guard consistently converts unsafe elements into safe content, regardless of the potential harm
of the prompt. Furthermore, SP-Guard maintains image fidelity and text alignment comparable to
SD, supporting its practicality. The underlying idea of prompt-adaptive and selective guidance opens
up new opportunities for broader applications in safe and controllable image generation. Moreover,
by concretely analyzing the limitations of previous approaches and highlighting the importance of
estimating the prompt-specific potential harmfulness, SP-Guard contributes to the transparency and
controllability of safe image generation. We further elaborate on these aspects in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. Preliminaries</title>
        <p>
          Diffusion-based T2I and Classifier-Free Guidance. Diffusion models [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] are generative models that
create samples from Gaussian noise, progressively denoising based on a learned data distribution. The
model iteratively predicts an estimate of the noise to be removed. For text-based image generation [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ],
the estimated noises are conditioned on the text prompt. Classifier-free guidance approach [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] allows
conditioning without an additional pre-trained classifier, by randomly training the model with or without text
prompts so that it handles both conditional and unconditional generation. During inference, the model
uses noise estimates $\tilde{\epsilon}_\theta$ at each step, formulated as follows:
        </p>
        <p>$\tilde{\epsilon}_\theta(z_t, c_p) := \epsilon_\theta(z_t) + s_g \big(\epsilon_\theta(z_t, c_p) - \epsilon_\theta(z_t)\big), \quad (1)$
where $z_t$ is the latent variable at timestep $t$, and $c_p$ is the text embedding for the prompt $p$. $s_g$ is the
guidance scale that controls the strength of conditioning.</p>
        <p>Figure 2: Limitations of SLD in safety guidance. (a) Examples generated by the original SD and by SLD (SLD-weak, SLD-medium, SLD-strong, SLD-max) for prompts (a) to (e). (b) Visualization of the element-wise guidance weight $\mu$ of SLD for applying safe guidance, from $t=T$ to $t=0$. See Section 2.2 for details.</p>
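        <p>For concreteness, the classifier-free guidance update in Eq. (1) amounts to a single tensor operation. The following is a minimal sketch, not the authors' released code; the tensors eps_uncond and eps_cond (unconditional and prompt-conditioned noise estimates) and the guidance scale s_g are assumed inputs.</p>
        <preformat>
import torch

def cfg_noise(eps_uncond, eps_cond, s_g):
    """Classifier-free guidance, Eq. (1): start from the unconditional noise estimate
    and push it toward the prompt-conditioned estimate with strength s_g."""
    return eps_uncond + s_g * (eps_cond - eps_uncond)

# usage sketch: both estimates have shape (batch, channels, height, width)
# eps_tilde = cfg_noise(eps_uncond, eps_cond, s_g=7.5)
</preformat>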
        <p>
          Semantic and Safe Guidance at Inference. Controlling T2I models to faithfully reflect user intentions
in generated images remains a challenging task. One line of research addresses this challenge by
extending classifier-free guidance to enable semantic control at inference time [
          <xref ref-type="bibr" rid="ref10 ref16">16, 10</xref>
          ]. This approach introduces
a semantic guidance term $\gamma(z_t, c_p, c_e)$ into Eq. (1), where $e$ is a concept capturing the user's intent.
This results in
$\tilde{\epsilon}_\theta(z_t, c_p, c_e) = \epsilon_\theta(z_t) + s_g \big(\epsilon_\theta(z_t, c_p) - \epsilon_\theta(z_t) + \gamma(z_t, c_p, c_e)\big). \quad (2)$
To reflect the concept $e$ in the image, $\gamma$ applies positive guidance in the direction of $\epsilon_\theta(z_t, c_e)$.
Conversely, to prevent the appearance of $e$, negative guidance is applied. In the realm of safe T2I, where the
objective is to exclude unsafe concepts $c_S$ from the generated image, $\gamma$ is specifically defined as
$\gamma(z_t, c_p, c_S) = -\mu(c_p, c_S) \cdot \big(\epsilon_\theta(z_t, c_S) - \epsilon_\theta(z_t)\big), \quad (3)$
where $\mu$ adjusts the guidance strength to avoid generating unsafe content. Schramowski et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]
propose SLD, which defines $\mu$ as follows:
$\mu(c_p, c_S) = \begin{cases} \min(1, |\phi|), &amp; \text{if } \epsilon_\theta(z_t, c_p) \ominus \epsilon_\theta(z_t, c_S) &lt; \lambda \\ 0, &amp; \text{otherwise} \end{cases}, \qquad \phi = s_S \big(\epsilon_\theta(z_t, c_p) - \epsilon_\theta(z_t, c_S)\big). \quad (4)$
$\phi$ represents the scaled difference between the noise estimates conditioned on the prompt and those
conditioned on the unsafe concept. It is used to modulate $\mu$ based on a predefined threshold $\lambda$.
Intuitively, SLD increases the guidance strength when the current generation direction is close to an
unsafe concept, and otherwise turns the guidance off. The authors propose four configurations with five
hyperparameters to adjust guidance strength. More details of Eq. (4) will be discussed in Section 2.2.
        </p>
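        <p>As a rough illustration of Eqs. (3) and (4), the SLD-style safety term can be sketched as below. This is an illustrative, hedged rendering rather than the released SLD implementation: the $\ominus$ condition is read as an element-wise difference, and the values of s_s and lam are placeholders, not the published configurations.</p>
        <preformat>
import torch

def sld_safety_term(eps_uncond, eps_prompt, eps_unsafe, s_s=1000.0, lam=0.025):
    """SLD-style safety guidance (Eqs. 3-4), sketched element-wise.
    s_s and lam are placeholder hyperparameters, not the published configs."""
    phi = s_s * (eps_prompt - eps_unsafe)
    # guidance is active only where the prompt- and unsafe-conditioned estimates are close
    active = (eps_prompt - eps_unsafe) &lt; lam
    # element-wise weight, clipped to 1 as discussed in Section 2.2
    mu = torch.where(active, torch.clamp(phi.abs(), max=1.0), torch.zeros_like(phi))
    # Eq. (3): negative guidance along the unsafe-concept direction
    return -mu * (eps_unsafe - eps_uncond)
</preformat>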
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Safety Considerations in T2I models</title>
        <p>
          This section delineates critical safety considerations for ensuring safe image generation in T2I models,
particularly highlighting the limitations inherent in the SLD framework evidenced by Eq. (4). First, the
guidance scale in SLD is clipped to 1, which restricts the model’s ability to provide adequate guidance
for highly unsafe prompts, even at its maximum strength setting. As shown in Fig. 2a (d) and (e), even
with maximal guidance (i.e., SLD-max), the generated images retain unsafe content. Moreover, as seen
in (e), while the images diverge significantly from the standard SD outputs, similar unsafe concepts
persist. This issue stems from the masking condition specified in Eq. (4), which permits guidance on
regions not closely related to the unsafe concepts. This effect is visible in the mask visualizations shown
in Fig. 2b, where increasing safe guidance strength spreads its influence across a broader area instead
of focusing on the precise regions associated with the unsafe concepts. Under SLD-max settings, this
dispersion can result in the generation of different yet equally unsafe images. In addition, $\phi$ increases
with the difference in noise estimates between the input prompt and the unsafe concept, resulting in
stronger guidance where they diverge. This contradicts the intuition that guidance should be stronger
in regions where the noise contains unsafe signals. Moreover, the method is applied inconsistently,
failing to adapt to the safety requirements of each prompt. As shown in Fig. 2a (a)-(d), the effectiveness
of safety configurations varies significantly across prompts. Based on these observations, we argue that
effective safety control requires evaluating the potential risk of unsafe content from the input prompt
and applying targeted guidance to relevant regions accordingly. To our knowledge, our work provides
the first in-depth analysis of the SLD framework, identifying why its performance, reported in previous
studies [
          <xref ref-type="bibr" rid="ref11 ref12 ref17">11, 17, 12</xref>
          ], frequently fails to address safety concerns adequately. Notably, no previous work
has investigated this limitation. In the following section, we introduce SP-Guard, which enables safe
image generation by applying precise, prompt-specific guidance at inference time.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. SP-Guard: Selective Prompt-adaptive Guidance</title>
        <p>
          To ensure safe image generation, it is crucial to estimate whether a prompt is likely to produce unsafe
content and to what extent before the final image is generated. We estimate the prompt’s potential
to produce unsafe content using a proxy derived from noise estimates during denoising. Since noise
estimates conditioned on texts contain semantic information [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], they are pivotal for safety assessments.
Prior works have successfully leveraged noise estimates for semantic control [
          <xref ref-type="bibr" rid="ref10 ref16 ref18 ref19">18, 16, 10, 19</xref>
          ]. Building
on this, we define the noise direction $\Delta_{c_p, t}$ for a given text prompt $p$ and timestep $t$ as follows:
$\Delta_{c_p, t} = \epsilon_\theta(z_t, c_p) - \epsilon_\theta(z_t, c_\varnothing), \quad (5)$
where $c_\varnothing$ is a null-text embedding. Intuitively, Eq. (5) tells us which direction the prompt is pushing the
image toward in semantic space. Then, we compute the cosine similarity between the noise direction
of a given text prompt $p$ and that of an unsafe concept $u$, i.e., $\mathrm{Sim}(\Delta_{c_p, t}, \Delta_{c_u, t})$ for each timestep $t$,
where $\mathrm{Sim}(\cdot, \cdot)$ denotes the cosine similarity between two vectors. This similarity measure serves as a
proxy for identifying potential unsafety in the generated images.
        </p>
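        <p>The noise direction of Eq. (5) and its cosine similarity to an unsafe concept's direction can be computed directly from the model's noise estimates. A minimal sketch, assuming the conditioned and null-text estimates are already available as tensors of shape (batch, channels, height, width):</p>
        <preformat>
import torch
import torch.nn.functional as F

def noise_direction(eps_cond, eps_null):
    """Eq. (5): semantic direction induced by a text condition, relative to the null text."""
    return eps_cond - eps_null

def direction_similarity(delta_prompt, delta_unsafe):
    """Cosine similarity between flattened prompt and unsafe-concept directions, per sample."""
    return F.cosine_similarity(delta_prompt.flatten(1), delta_unsafe.flatten(1), dim=1)
</preformat>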
        <p>We propose SP-Guard, which uses the similarity between noise estimates in the early diffusion steps
as a proxy for the prompt's unsafety level. Since different prompts can lead to varying degrees of unsafe
content, the guidance scale should be adjusted to reflect the severity of each prompt. Given an
unsafe-concept set $S = \{c_1, \dots, c_K\}$, SP-Guard first estimates the proxy value $\beta(c_p, c_S)$, which represents the
prompt-specific unsafe degree, during the earlier $T_e$ timesteps.</p>
        <p>
$\beta(c_p, c_S) = \max_{i \in \{1, \dots, K\}} \Big\{ \frac{1}{T - T_e + 1} \sum_{t = T_e}^{T} \mathrm{Sim}\big(\Delta_{c_p, t}, \Delta_{c_i, t}\big) \Big\}, \quad (6)$
where $\Delta_{c, t}$ is the noise direction introduced in Eq. (5), and $\mathrm{Sim}(\cdot, \cdot)$ is the cosine similarity. By estimating
how similar the direction of the given prompt $p$ is to that of the harmful concept set $S$, Eq. (6) serves
as a risk score that predicts how harmful a prompt is before the final image is generated. After the
initial $T_e$ timesteps, SP-Guard incorporates this risk score through a new guidance weight $\mu(c_p, c_S)$,
which is used in $\gamma(z_t, c_p, c_S)$ defined in Eq. (3). This weight controls both the strength and the spatial
positioning of the guidance at each timestep:
$\mu(c_p, c_S) = s(t) \cdot \beta^{+}(c_p, c_S) \cdot M(z_t, c_p, c_S), \quad (7)$
where $\beta^{+}(c_p, c_S) = \max(0, \beta(c_p, c_S))$ to ensure only non-negative contributions influence the
guidance. $s(t)$ is a pre-defined function of the timestep $t$, detailed later in this section. $M(z_t, c_p, c_S)$ acts as
a mask that selectively applies guidance to regions likely to contain unsafe content, thereby promoting
safe image generation. To elaborate on $M(z_t, c_p, c_S)$, we scale each mask value based on the pixel-wise
proxy value, similar to the $\mathrm{Sim}(\cdot, \cdot)$ function used in Eq. (6). Each pixel value of $M(z_t, c_p, c_S)$ is defined
as follows:
$M(z_t, c_p, c_S)[i, j, k] = \begin{cases} 1 + \max(0, |\psi|) &amp; \text{if } |\Delta_{c_S, t}[i, j, k]| &gt; \eta_{\lambda}(|\Delta_{c_S, t}|) \\ 0 &amp; \text{otherwise} \end{cases}, \quad \text{with } \psi = \mathrm{Sim}\big(\Delta_{c_p, t}[i, j, :], \Delta_{c_S, t}[i, j, :]\big), \quad (8)$
where $\eta_{\lambda}(|\Delta_{c_S, t}|)$ denotes the $\lambda$-percentile of $|\Delta_{c_S, t}|$.
        </p>
        <p>
          The masking condition is motivated by Brack et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], who showed that the noise space consists of
semantic concepts, with each concept concentrated in the upper and lower tails of the noise distribution.
Accordingly, we mask the top $\lambda$-percentile elements and compute cosine similarity for the corresponding
pixels. In Fig. 3, we visualize $M(z_t, c_p, c_S)$, showcasing how our mask design strategically applies safe
guidance to specific regions, such as nude body parts. In contrast, the masking process of SLD in Fig. 2b
spreads the guidance over unrelated areas of the image, often altering benign content unnecessarily.
This difference stems from the novelty of SP-Guard, which combines the prompt-adaptive risk score in
Eq. (6) with the selective masking in Eq. (8), enabling more precise and targeted guidance.
        </p>
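        <p>To make Eqs. (6) through (8) concrete, the sketch below computes the prompt-adaptive risk score, the selective mask, and the resulting guidance weight. It is an illustrative reading of the equations under the notation above (per-position cosine similarity over the channel dimension, a lam-percentile threshold on the unsafe direction), not the authors' released implementation.</p>
        <preformat>
import torch
import torch.nn.functional as F

def unsafe_proxy(delta_prompt_steps, delta_unsafe_steps):
    """Eq. (6): average the cosine similarity between the prompt direction and each
    unsafe-concept direction over the early steps, then take the max over concepts.
    delta_prompt_steps: list of (C, H, W) tensors, one per early timestep.
    delta_unsafe_steps: dict mapping each unsafe concept to a matching list of tensors."""
    scores = []
    for concept_deltas in delta_unsafe_steps.values():
        sims = [F.cosine_similarity(dp.flatten(), du.flatten(), dim=0)
                for dp, du in zip(delta_prompt_steps, concept_deltas)]
        scores.append(torch.stack(sims).mean())
    return torch.stack(scores).max()

def selective_mask(delta_prompt, delta_unsafe, lam=0.9):
    """Eq. (8): keep only elements where |delta_unsafe| exceeds its lam-percentile, and
    scale them by 1 + max(0, |psi|), the per-position prompt/unsafe similarity."""
    psi = F.cosine_similarity(delta_prompt, delta_unsafe, dim=0)   # shape (H, W)
    thresh = torch.quantile(delta_unsafe.abs(), lam)               # lam-percentile of |delta_unsafe|
    active = (delta_unsafe.abs() &gt; thresh).float()                 # shape (C, H, W)
    scale = 1.0 + psi.abs()                                        # 1 + max(0, |psi|)
    return active * scale.unsqueeze(0)                             # broadcast scale over channels

def sp_guard_weight(s_t, beta, mask):
    """Eq. (7): schedule s(t) times the non-negative risk score times the selective mask;
    this weight replaces mu in the safety term of Eq. (3)."""
    return s_t * torch.clamp(beta, min=0.0) * mask
</preformat>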
        <p>
          Lastly, we use a step function as the default form of $s(t)$ in Eq. (7). Specifically, after $T_e$ timesteps, $s(t)$
is set to $s_{\max}$ and subsequently reduced to 1.0 in the later steps. The reduction is essential to avoid
visual artifacts, as prior work [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] shows that no such artifacts or distortion occur when the guidance
scale is capped at 1.0. Furthermore, Yi et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and Balaji et al. [21] observe that text prompts primarily
influence the early diffusion steps, while the later stages focus on denoising and completing details
using the latent image itself. These observations support our design: safe guidance should be prominent
in the early steps, but does not require strong influence later on, and should be limited to preserve
image quality. We validate the effectiveness of the step function in Section 3 and also explore the impact
of varying $s_{\max}$ or using alternative scheduling strategies for $s(t)$.
        </p>
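        <p>A possible rendering of the default step-function schedule is sketched below. The exact step at which $s(t)$ drops from $s_{\max}$ to 1.0 is not pinned down in the text, so t_switch is an assumed placeholder; likewise, returning 0.0 during the proxy-estimation window is an assumption.</p>
        <preformat>
def step_schedule(step_index, t_e=10, t_switch=25, s_max=4.0):
    """Sketch of the default step-function s(t), indexed by the denoising step (0 = first step).
    t_switch is an assumed placeholder; the text only states that s(t) is later reduced to 1.0."""
    if step_index &lt; t_e:
        return 0.0      # proxy-estimation window (assumption: no extra safety guidance yet)
    if step_index &lt; t_switch:
        return s_max    # strong safety guidance right after the proxy is available
    return 1.0          # capped at 1.0 later to avoid visual artifacts
</preformat>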
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>
          We compare SP-Guard with the original Stable Diffusion (SD) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and inference-time guiding methods:
SD with a simple negative prompt (NEG), SEGA [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], and four configurations of SLD [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Since many
existing methods [
          <xref ref-type="bibr" rid="ref7">7, 22, 23, 24, 25, 26, 27, 28</xref>
          ] report their results only on a single unsafe concept or
handle different unsafe categories separately, their practical applicability is somewhat limited in
multi-concept scenarios. Therefore, we primarily evaluate SP-Guard against SLD variants, as both address
multiple unsafe concepts concurrently through a unified guidance process, enabling a fair comparison.
We evaluate safe image generation on four datasets: I2P [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], Ring-A-Bell [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], MMA-Diffusion [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
and UnlearnDiff [29]. To assess image quality, we use DrawBench [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and COCO-30k [30], which
contain benign prompts. To assess the safety of generated images, we primarily report the unsafe
content detection rate and its relative improvement over SD. We use an average score across four safety
classifiers, MHSC [31], Q16 [32], NudeNet [33], and SD’s built-in Safety-Checker [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], to provide a
balanced estimate of overall harmfulness. To assess content preservation, we use LPIPS [34], which
measures perceptual similarity between images generated by SD and each method, thereby quantifying
how well the non-unsafe regions are retained. We also report CLIP-score [35] to evaluate image-text
alignment and FID [36] to assess image fidelity. We use SD v1.4 with 50 diffusion steps and default
settings across all baselines. For our method, $s_{\max} = 4.0$, $\lambda = 0.9$, and $T_e = 10$, unless specified otherwise.
        </p>
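        <p>For clarity, the safety metrics described above reduce to simple aggregation. The sketch below shows one plausible way to compute the averaged unsafe-detection rate and the relative improvement over SD; the exact aggregation formula is an assumption, not a quote from the evaluation code.</p>
        <preformat>
def average_unsafe_rate(flags_by_classifier):
    """Average unsafe-detection rate over the four classifiers (MHSC, Q16, NudeNet,
    Safety-Checker), given per-classifier lists of binary unsafe flags."""
    rates = [sum(flags) / len(flags) for flags in flags_by_classifier.values()]
    return sum(rates) / len(rates)

def relative_improvement(rate_method, rate_sd):
    """Relative improvement over plain SD in the averaged unsafe-detection rate."""
    return (rate_sd - rate_method) / rate_sd
</preformat>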
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Qualitative analysis</title>
        <p>The effectiveness of SP-Guard is demonstrated in Fig. 4. SP-Guard consistently generates safe images
where SD fails. For example, it adds clothing in prompts involving nudity and replaces excessive
blood in violent prompts with benign red elements. Unlike SLD, which applies guidance inconsistently,
SP-Guard achieves reliable and prompt-adaptive safety through proxy-based guidance. Moreover,
SP-Guard effectively confines guidance to areas identified as unsafe.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Quantitative results &amp; Analysis</title>
        <p>Evaluation results of safe image generation and content preservation across all datasets are shown in Table 1.</p>
        <p>As shown in the table, SP-Guard achieves safety performance comparable to SLD-max, ranking among the
top inference-time guiding methods. However, it significantly outperforms SLD-max in image preservation.
Notably, the LPIPS values of SP-Guard are comparable to those of baselines that exhibit minimal safety gains,
highlighting the effectiveness of our selective masking strategy.</p>
        <p>Figure 5: Trade-off between safety improvement and content preservation. Points further to the upper right
indicate safer image generation with better content preservation. Methods are distinguished by colors
(SLD-strong, SLD-max, SP-Guard) and datasets by shapes (I2P, Ring-A-Bell, MMA-Diffusion, UnlearnDiff).</p>
        <p>We further evaluate the image quality using FID and CLIP scores on COCO-30k and DrawBench. As
shown in the right-most columns of Table 1, SP-Guard achieves FID and CLIP scores closer to those of
the original SD, while maintaining superior or comparable safe generation performance to SLD-max.
Notably, SP-Guard outperforms the other baselines, except SLD-weak, which shows considerably lower
performance in safety. To illustrate the trade-off between safety and content preservation, Fig. 5 shows
the results for the top-performing methods: SLD-strong, SLD-max, and SP-Guard. The y-axis represents
the average relative improvement over SD in unsafe detection rates, while the reversed x-axis shows
the LPIPS, indicating perceptual similarity to images generated by SD. Points closer to the upper right
indicate a better trade-off between safety and content preservation. SP-Guard consistently achieves
lower LPIPS scores than SLD-max and SLD-strong (except against SLD-strong on Ring-A-Bell), showing
that the generated images by SP-Guard remain closer to the original SD outputs while ensuring safety.</p>
        <p>We vary the maximum guidance scale $s_{\max}$ from 2.0 to 6.0 in increments of 0.5, and also evaluate a
cosine-based schedule as an alternative to the step function. Fig. 6 shows the results in the same format
as the trade-off plot. SP-Guard consistently aligns with the Pareto front, showing robust performance
across different $s_{\max}$ values and scheduling strategies.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion &amp; Conclusion</title>
      <p>This work highlights the importance of accurately estimating the potential harmfulness of generated
content. Moreover, as SP-Guard is an inference-time guiding approach, it allows flexible modification or
addition of unsafe concepts without retraining. Such adaptability enables rapid alignment with evolving
social norms and regulations [37], making the method practical for real-world moderation pipelines
and dynamic regulatory environments. Moreover, since our method relies on the general mechanism of
guidance and the similarity between the intended semantics and harmful concepts, the framework can
be naturally extended to other modalities such as video or speech generation. Beyond improving safety,
our work strengthens the trustworthiness of generative AI systems in two ways. First, by diagnosing the
failure modes of prior approaches, we emphasize the importance of carefully designing both the guidance
mechanism and the masking process. Second, by estimating the prompt-specific potential harmfulness,
SP-Guard offers transparency and controllability: users and deployers can see when and why safety
interventions are applied. These features enhance trustworthiness rather than merely increasing safety.
However, operating at inference time introduces some slowdown compared to standard SD. This could
be mitigated by integrating recent advances in accelerating diffusion models [38, 39]. Finally, although
SP-Guard reduces hyperparameter complexity compared to SLD, it still requires tuning values such as
$s(t)$ and $\lambda$. A promising future direction is to dynamically adjust $s(t)$ based on the guidance signal
at each timestep. Despite these limitations, SP-Guard provides a lightweight, adaptable, and selective
inference-time approach for safer text-to-image generation. Experiments on four unsafe-related datasets
demonstrate significant improvements in safe generation with strong content preservation, while results
on two benign datasets confirm its ability to maintain high fidelity. Looking ahead, we believe SP-Guard
can be further enhanced and integrated with advancements in diffusion models, paving the way toward
safe, responsible, and trustworthy AI.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the National Research Foundation of Korea (NRF) grant
[No.2021R1A2C2007884] and by Institute of Information &amp; communications Technology Planning
&amp; Evaluation (IITP) grants [RS-2021-II211343, RS-2021-II212068, RS-2022-II220113, RS-2022-II220959]
funded by the Korean government (MSIT).</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used Grammarly in order to check grammar and spelling.
After using this tool, the author reviewed and edited the content as needed and takes full responsibility
for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>References (continued)</title>
      <p>
[21] Y. Balaji, S. Nah, X. Huang, A. Vahdat, J. Song, Q. Zhang, K. Kreis, M. Aittala, T. Aila, S. Laine,
et al., eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers, arXiv preprint
arXiv:2211.01324 (2022).
[22] A. Heng, H. Soh, Selective amnesia: A continual learning approach to forgetting in deep generative
models, Advances in Neural Information Processing Systems 36 (2023) 17170–17194.
[23] S. Kim, S. Jung, B. Kim, M. Choi, J. Shin, J. Lee, Safeguard text-to-image diffusion models with
human feedback inversion, arXiv preprint arXiv:2407.21032 (2024).
[24] S. Lu, Z. Wang, L. Li, Y. Liu, A. W.-K. Kong, Mace: Mass concept erasure in diffusion models, in:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp.
6430–6440.
[25] X. Li, Y. Yang, J. Deng, C. Yan, Y. Chen, X. Ji, W. Xu, Safegen: Mitigating unsafe content generation
in text-to-image models, arXiv e-prints (2024) arXiv–2404.
[26] D. Chen, Z. Li, M. Fan, C. Chen, W. Zhou, Y. Li, Eiup: A training-free approach to erase
non-compliant concepts conditioned on implicit unsafe prompts, arXiv preprint arXiv:2408.01014
(2024).
[27] C. Gong, K. Chen, Z. Wei, J. Chen, Y.-G. Jiang, Reliable and efficient concept erasure of
text-to-image diffusion models, in: European Conference on Computer Vision, Springer, 2024, pp.
73–88.
[28] J. Yoon, S. Yu, V. Patil, H. Yao, M. Bansal, Safree: Training-free and adaptive guard for safe
text-to-image and video generation, arXiv preprint arXiv:2410.12761 (2024).
[29] Y. Zhang, J. Jia, X. Chen, A. Chen, Y. Zhang, J. Liu, K. Ding, S. Liu, To generate or not?
safety-driven unlearned diffusion models are still easy to generate unsafe images... for now, in: European
Conference on Computer Vision, 2024.
[30] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft
coco: Common objects in context, in: Computer Vision–ECCV 2014: 13th European Conference,
Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, Springer, 2014, pp. 740–755.
[31] Y. Qu, X. Shen, X. He, M. Backes, S. Zannettou, Y. Zhang, Unsafe diffusion: On the generation of
unsafe images and hateful memes from text-to-image models, in: Proceedings of the 2023 ACM
SIGSAC Conference on Computer and Communications Security, 2023, pp. 3403–3417.
[32] P. Schramowski, C. Tauchmann, K. Kersting, Can machines help us answering question 16 in
datasheets, and in turn reflecting on inappropriate content?, in: Proceedings of the 2022 ACM
Conference on Fairness, Accountability, and Transparency, 2022, pp. 1350–1361.
[33] P. Bedapudi, Nudenet: Neural nets for nudity classification, detection and selective censoring,
2019.
[34] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The unreasonable effectiveness of deep
features as a perceptual metric, in: CVPR, 2018.
[35] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin,
J. Clark, et al., Learning transferable visual models from natural language supervision, in:
International conference on machine learning, PMLR, 2021, pp. 8748–8763.
[36] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, Gans trained by a two time-scale
update rule converge to a local nash equilibrium, Advances in neural information processing
systems 30 (2017).
[37] I. Solaiman, Z. Talat, W. Agnew, L. Ahmad, D. Baker, S. L. Blodgett, C. Chen, H. Daumé III, J. Dodge,
I. Duan, et al., Evaluating the social impact of generative ai systems in systems and society, arXiv
preprint arXiv:2306.05949 (2023).
[38] A. Habibian, A. Ghodrati, N. Fathima, G. Sautiere, R. Garrepalli, F. Porikli, J. Petersen, Clockwork
diffusion: Efficient generation with model-step distillation, in: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2024, pp. 8352–8361.
[39] Y.-H. Chen, R. Sarokin, J. Lee, J. Tang, C.-L. Chang, A. Kulik, M. Grundmann, Speed is all you need:
On-device acceleration of large difusion models via gpu-aware optimizations, in: Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4651–4655.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>10684</fpage>
          -
          <lpage>10695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Saharia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ghasemipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Gontijo</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Karagol</given-names>
            <surname>Ayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          , et al.,
          <article-title>Photorealistic text-to-image diffusion models with deep language understanding</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>36479</fpage>
          -
          <lpage>36494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schuhmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Beaumont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vencu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wightman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cherti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Coombes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Katta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mullis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wortsman</surname>
          </string-name>
          , et al.,
          <article-title>Laion-5b: An open large-scale dataset for training next generation image-text models</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>25278</fpage>
          -
          <lpage>25294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Birhane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. U.</given-names>
            <surname>Prabhu</surname>
          </string-name>
          , E. Kahembwe,
          <article-title>Multimodal datasets: misogyny, pornography, and malignant stereotypes</article-title>
          ,
          <source>arXiv preprint arXiv:2110</source>
          .
          <year>01963</year>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Tsai</surname>
          </string-name>
          , C.-Y. Hsu,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-H. Lin</surname>
            ,
            <given-names>J.-Y.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.-Y.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-M. Yu</surname>
          </string-name>
          , C.-Y. Huang,
          <article-title>Ring-a-bell! how reliable are concept removal methods for diffusion models?</article-title>
          ,
          <source>arXiv preprint arXiv:2310.10012</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , T.-Y. Ho,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Mma-diffusion: Multimodal attack on diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>7737</fpage>
          -
          <lpage>7746</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Guardt2i: Defending text-to-image models from adversarial prompts</article-title>
          ,
          <source>arXiv preprint arXiv:2403.01446</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khakzar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pizzati</surname>
          </string-name>
          ,
          <article-title>Latent guard: a safety framework for text-to-image generation</article-title>
          ,
          <source>arXiv preprint arXiv:2404.08031</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zannettou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Unsafebench:
          <article-title>Benchmarking image safety classifiers on real-world and ai-generated images</article-title>
          ,
          <source>arXiv preprint arXiv:2405.03486</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Schramowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deiseroth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Safe latent diffusion: Mitigating inappropriate degeneration in diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>22522</fpage>
          -
          <lpage>22531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gandikota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Materzynska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fiotto-Kaufman</surname>
          </string-name>
          , D. Bau,
          <article-title>Erasing concepts from diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2426</fpage>
          -
          <lpage>2436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gandikota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Orgad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Materzyńska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <article-title>Unified concept editing in diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>5111</fpage>
          -
          <lpage>5120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Wang,
          <article-title>Universal prompt optimizer for safe text-to-image generation</article-title>
          ,
          <source>arXiv preprint arXiv:2402.10882</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ermon</surname>
          </string-name>
          ,
          <article-title>Generative modeling by estimating gradients of the data distribution</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          , T. Salimans,
          <article-title>Classifier-free diffusion guidance</article-title>
          ,
          <source>arXiv preprint arXiv:2207.12598</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hintersdorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Struppek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schramowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          , Sega:
          <article-title>Instructing text-to-image models using semantic guidance</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>25365</fpage>
          -
          <lpage>25389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chavhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hospedales</surname>
          </string-name>
          , Conceptprune:
          <article-title>Concept editing in diffusion models via skilled neuron pruning</article-title>
          ,
          <source>arXiv preprint arXiv:2405.19237</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dalva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Yanardag</surname>
          </string-name>
          ,
          <article-title>Noiseclr: A contrastive learning approach for unsupervised discovery of interpretable directions in diffusion models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>24209</fpage>
          -
          <lpage>24218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kornmeier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tsaban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schramowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          , Ledits++:
          <article-title>Limitless image editing using text-to-image models</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>8861</fpage>
          -
          <lpage>8870</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Towards understanding the working mechanism of text-to-image diffusion model</article-title>
          ,
          <source>arXiv preprint arXiv:2405.15330</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>