<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>METR: Image Watermarking with Large Number of Unique Messages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Varlamov</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daria Diatlova</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Egor Spirin</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>VK Lab</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>deepvk</string-name>
        </contrib>
      </contrib-group>
      <abstract>
<p>Improvements in diffusion models have boosted the quality of image generation, which has led researchers, companies, and creators to focus on improving watermarking algorithms, so that the creators of generative art can be clearly identified. The main challenges that modern watermarking algorithms face concern their ability to withstand attacks and to encrypt many unique messages, such as user IDs. In this paper, we present METR: Message Enhanced Tree-Ring, an approach that aims to address these challenges. METR is built on the Tree-Ring watermarking algorithm and makes it possible to encode multiple distinct messages without compromising attack resilience or image quality, which makes the algorithm suitable for any diffusion model. To surpass the limitations on the quantity of encoded messages, we propose METR++, an enhanced version of METR. This approach, while limited to the Latent Diffusion Model architecture, is designed to inject a virtually unlimited number of unique messages. We demonstrate robustness to attacks and the ability to encrypt many unique messages while preserving image quality, which gives METR and METR++ great potential for practical applications in real-world settings. Our code is available at https://github.com/deepvk/metr.</p>
      </abstract>
      <kwd-group>
        <kwd>generative models</kwd>
        <kwd>diffusion</kwd>
        <kwd>image watermarking</kwd>
        <kwd>watermark robustness</kwd>
        <kwd>message encryption</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Nowadays, image generation is one of the major applications of computer vision technology. It is used
in various spheres, including entertainment [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], medicine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], security [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and retail [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Recent
advances in deep learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], such as Variational Autoencoders (VAE) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Generative Adversarial
Networks (GAN) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and Diffusion models [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], have allowed it to become a rapidly growing area of
research. The latest achievements in text-to-image models, namely DALL-E 2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Kandinsky [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and
Stable Diffusion [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], facilitate the generation of highly realistic images based on specific prompts.
      </p>
      <p>
        With the growing popularity of image generation, the risks of it being used inappropriately or
maliciously are also increasing. These risks include policy [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ] and privacy violations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the
generation of fake news [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], document fraud [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ], and the creation of harmful content [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. To help
alleviate these risks, it is necessary to develop a mechanism to detect whether an image is generated or
not. One possible solution is to label generated images with special messages, watermarks [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">18, 19, 20</xref>
        ].
To ensure the suitability of this solution for practical applications, it is essential that these watermarks
remain invisible [
        <xref ref-type="bibr" rid="ref20 ref21">21, 20</xref>
        ]. Additionally, the watermarks should be robust to a range of attacks [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ],
ensuring that perturbations to the image cannot remove an encoded message.
      </p>
      <p>
        Watermarking has been a widely used technique for image content protection long before the
emergence of generative models. The first works on the subject introduced algorithms that make
it possible to add watermarks to existing images [
        <xref ref-type="bibr" rid="ref18 ref19 ref23">18, 19, 23</xref>
        ]. These methods can be used to apply
watermarks to generated images as well. However, this approach has several drawbacks. For example,
the watermarks are not necessarily completely invisible to the human eye [
        <xref ref-type="bibr" rid="ref20 ref24">20, 24</xref>
        ], and the latest
algorithms [
        <xref ref-type="bibr" rid="ref25 ref26 ref27">25, 26, 27</xref>
        ] require training an additional model. Moreover, if the watermarking step is not
built into the image generation process, then those with access to the generative model could generate
images without watermarks.
      </p>
      <p>[Figure 1: METR watermark generation and detection pipelines.]</p>
      <p>
        A different group of algorithms was proposed, which add watermarks to images during the generation
process [
        <xref ref-type="bibr" rid="ref28 ref24 ref29">28, 24, 29</xref>
        ]. These approaches successfully address the challenges of previous algorithms. Newer
watermarking methods require either tuning of the model’s weights [
        <xref ref-type="bibr" rid="ref28 ref24 ref29">28, 24, 29</xref>
        ], or adjustments to
some latent image representation (e.g., initial noise), that is used in the generation process [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Both
approaches make it possible to create “truly invisible” watermarks [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Moreover, the second one is
applicable to any diffusion architecture, with the only restriction being the sampling strategy (e.g., DDIM [30]),
and it also does not require additional model training. The only disadvantage of the second approach
for watermarking images in the generation process arises when someone gains access to the model’s
weights. In this scenario, the algorithm responsible for watermark embedding could be removed from
the inference pipeline. This drawback does not affect the first approach, since the ability to watermark
images is contained within the tuned weights of the model. In a recent work by Wen et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], the
authors present Tree-Ring, a watermarking algorithm for diffusion models that utilizes initial noise
as a latent representation of the image. A watermark is injected as a subtle modification of the initial
noise used during diffusion model sampling. This approach shows high robustness to any white-box
attacks [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ], i.e., attacks only on the output of the model, as the model’s weights stay inaccessible.
      </p>
      <p>
        In addition to being able to determine whether an image is generated or not, it is also important to
find out the author of the generated image. This can be done by incorporating watermarks that contain
specific information, such as user IDs. Despite the high reliability and imperceptibility of Tree-Ring
watermarks, the algorithm cannot encrypt messages within the watermark, limiting its practical use for
certain real-world scenarios. One of the existing algorithms capable of encrypting messages, Stable
Signature [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], requires training a separate model for each user, as it can only manage one unique
message per model. This is also not ideal for practical use in real-world applications.
      </p>
      <p>
        In this paper, we propose METR, a watermarking algorithm based on Tree-Ring watermarking [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
of diffusion models [
        <xref ref-type="bibr" rid="ref9">9, 30</xref>
        ]. This approach is able to handle many unique messages and is robust to
white-box attacks [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ] without a loss in image quality. Similar to Tree-Ring, METR utilizes a
modification of initial noise distribution without changing model architecture. Therefore, it does not
require any additional training and can be easily transferred to another model. In addition, we suggest
METR++, a simple modification of METR. It combines METR with Stable Signature [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and extends
the number of unique messages that can be encrypted to as many as one might need. Our contributions can be
summarized as follows:
• METR (Message Enhanced Tree-Ring) – a new watermarking algorithm that is capable of encoding
a large number of unique messages into a Tree-Ring watermark without noticeable image quality
degradation or a decrease in watermark robustness to attacks.
• We introduce an algorithm to select an optimal value of the message-scaler hyperparameter Δ for the METR
watermark based on the “detection resolution” metric, which captures the difference in detection
distances between images with and without a watermark.
• METR++ – an extended version of the METR watermarking algorithm that combines METR with
the Stable Signature algorithm and allows the encryption of an even larger number of unique
messages compared to METR.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <sec id="sec-2-1">
        <title>2.1. Diffusion Process</title>
        <p>
          Diffusion models [
          <xref ref-type="bibr" rid="ref9">9, 30</xref>
          ] are generative models employed to approximate a data distribution q(x0) using
a parametrized form p_θ(x0), which is expressed via latent variables x1, ..., x_T. The parameters θ are
optimized to maximize the evidence lower bound (ELBO).
        </p>
        <p>The forward diffusion process involves the gradual addition of noise to the initial data point x0:
x_t = √ᾱ_t · x0 + √(1 − ᾱ_t) · ε, where ε ∼ N(0, I) and ᾱ_t are values that parametrize the variance at step t of the noising
distribution q(x_t | x_{t−1}).</p>
        <p>
          A point from the initial distribution can be approximated by employing a denoising process. While
reverse diffusion was characterized as probabilistic in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], a deterministic sampling method with an
equivalent evidence lower bound (ELBO) was introduced in [30], known as DDIM sampling: x′0(x_t) :=
(x_t − √(1 − ᾱ_t) · ε_θ(x_t, t)) / √ᾱ_t, where ε_θ is a trained model that predicts the noise added to x0 to
obtain x_t, and x′0(x_t) represents an estimate of x0 derived from denoising x_t. Applying T such denoising steps to the initial noise yields the generated image: D_θ(x_T) ≈ x0.
        </p>
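        <p>As a concrete illustration of the formulas above, the forward step and the DDIM x0-estimate invert each other exactly when the true noise is known. The linear beta schedule below is an illustrative assumption, and eps stands in for the trained noise predictor ε_θ:</p>

```python
import numpy as np

# Toy noise schedule (an illustrative assumption, not the paper's exact values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, eps):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def ddim_x0_estimate(x_t, t, eps):
    """DDIM estimate: x0'(x_t) = (x_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
```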
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Tree-Ring Watermark</title>
        <p>
          The determinism of DDIM sampling [30] can be leveraged to trace back the initial noise from which
the image was generated. This process is called “inverse diffusion” [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. We can obtain an estimate
of the initial noise, x′_T, based on the assumption that x_{t+1} − x_t ≈ x_t − x_{t−1}. Each step of DDIM
inversion mirrors a step of the forward process, but utilizes the “trained noise”: x_{t+1} = √ᾱ_{t+1} · x′0(x_t) +
√(1 − ᾱ_{t+1}) · ε_θ(x_t, t). T steps of DDIM inversion estimate the initial noise of an image: D⁻¹_θ(x0) ≈ x_T.
        </p>
        <p>
          The Tree-Ring watermarking method, proposed in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], employs the DDIM inversion technique [30]. In
this method, the watermark is encoded as concentric circles or squares within the Fourier space of the
initial noise, ℱ(x_T). This encoding results in a modified version of the initial noise, denoted as x_T^wm.
Subsequently, DDIM sampling is applied to produce a watermarked image from the noise that carries
the embedded fingerprint: x0^wm = D_θ(x_T^wm). For watermark detection, the DDIM inversion process is
used to estimate the initial noise, and the watermark is identified within the Fourier representation of the
approximated initial noise of the image: WM′ = ℱ(D⁻¹_θ(x0^wm)).
        </p>
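        <p>A single DDIM inversion step can be sketched as follows (again with an illustrative linear schedule; in practice eps would come from the trained predictor ε_θ evaluated on x_t):</p>

```python
import numpy as np

# Illustrative linear schedule (an assumption for the sketch).
T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def ddim_inversion_step(x_t, t, eps):
    """One inversion step: x_{t+1} = sqrt(abar_{t+1}) * x0'(x_t)
    + sqrt(1 - abar_{t+1}) * eps, mirroring the forward process
    with the predicted ("trained") noise."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t + 1]) * x0_hat + np.sqrt(1.0 - alpha_bar[t + 1]) * eps
```

With a perfect noise prediction, one inversion step maps x_t exactly onto the forward-process latent x_{t+1}.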
        <p>
          The Tree-Ring watermark, as presented in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], transforms the distribution of x_T into a non-Gaussian
form. Since the Fourier transform of Gaussian noise remains Gaussian, we can determine whether a watermark is present or absent
in an image by assessing whether the distribution of y = ℱ(x′_T), where x′_T is the initial noise predicted by
the inverse DDIM process, deviates from normality. Non-normality can be assessed by performing
a test of the null hypothesis ℋ0: y = ℱ(x′_T) ∼ N(0, σ²). σ² can be estimated for every input as
σ² ≈ (1/|M|) · Σ_{i∈M} |y_i|², where M is the watermarked area of the image. To calculate a p-value for ℋ0, we
define η(y) = (1/σ²) · Σ_{i∈M} |WM_i − y_i|², where WM denotes the encrypted watermark values on the area M.
We then apply the following equation:
p = P(χ²_{|M|,λ} &lt; η | ℋ0) = Φ_{χ²}(η),
(1)
where χ²_{|M|,λ} denotes a non-central chi-squared random variable [31] with λ = (1/σ²) · Σ_{i∈M} |WM_i|², and
Φ_{χ²} is its cumulative distribution function [32]. Small p-values indicate the presence of a
watermark in the image, while large p-values indicate its absence.
        </p>
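        <p>Equation 1 can be illustrated by approximating the p-value with a Monte-Carlo simulation of the null hypothesis (the closed form is the non-central chi-squared CDF, available as scipy.stats.ncx2.cdf); this sketch assumes real-valued Fourier coefficients for simplicity:</p>

```python
import numpy as np

def watermark_p_value(y, wm, sigma2, n_sim=4000, seed=0):
    """Approximate p = P(chi2_{|M|,lambda} < eta | H0) by simulating H0.
    `y` holds the coefficients of the inverted noise on the watermarked
    area M, `wm` the expected watermark values, `sigma2` the estimated
    noise variance. Under H0 the coefficients are plain Gaussian noise,
    which we draw directly."""
    eta = np.sum(np.abs(wm - y) ** 2) / sigma2           # test statistic
    rng = np.random.default_rng(seed)
    null = rng.normal(0.0, np.sqrt(sigma2), size=(n_sim, y.size))
    eta_null = np.sum(np.abs(wm - null) ** 2, axis=1) / sigma2
    return np.mean(eta_null < eta)                       # small p => watermark
```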
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Latent Diffusion and Stable Signature</title>
        <p>
          The Latent Diffusion Model concept, proposed in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], involves implementing the diffusion process
within a latent space, which substantially enhances the quality of image generation. A key architectural
update is the incorporation of a Variational Autoencoder (VAE) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] model, which converts the original
image into a compact, low-dimensional representation for use in the subsequent diffusion process.
During inference, noise is sampled and then transformed into a latent representation using a trained
diffusion model. This representation is finally decoded back into the final image through the decoder
component of the VAE.
        </p>
        <p>
          Drawing on the Latent Diffusion Model concept that leverages a VAE, the Stable Signature watermarking
method was proposed in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Stable Signature enables the encryption of binary messages into images
during their generation. To embed a Stable Signature key into an image, the decoder weights of the
Variational Autoencoder (VAE) in the latent diffusion model are fine-tuned. This fine-tuning incorporates
a loss function that includes an extra term for message detection: ℒ = ℒ_message + λ · ℒ_image. Message
detection is carried out by the pre-trained network W, as detailed in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. During the inference process,
Stable Signature relies solely on the W network to detect and retrieve the watermark. For further
information on the Fine-Tuning and Extraction procedures, refer to Figure 2.
        </p>
        <p>The inference process for Stable Signature is simple, yet the watermarking algorithm requires a unique
Variational Autoencoder (VAE) decoder to be trained for each specific message. In real-world scenarios,
especially for services with millions of users where messages are often user IDs, the implementation of
Stable Signature is not feasible.</p>
        <p>[Figure 2: Stable Signature fine-tuning and extraction procedures.]</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Watermark Attacks</title>
        <p>
          To assess the robustness of watermarking
algorithms, it is common practice to not only
evaluate detection accuracy metrics, but also to
measure the resilience of watermark detection against
attacks [
          <xref ref-type="bibr" rid="ref20 ref24 ref25">25, 20, 24</xref>
          ]. In the field of image
watermarking, there are standard white-box attacks, as
detailed in [
          <xref ref-type="bibr" rid="ref33 ref21 ref22">33, 21, 22</xref>
          ]. These attacks involve
transformations of the generated image to verify
the robustness of the proposed method, and include
operations such as rotation, JPEG compression,
cropping followed by scaling, Gaussian blur, and
Gaussian noise. More advanced attacks employ generative models: the diffusion model [
          <xref ref-type="bibr" rid="ref9">9, 30</xref>
          ] attack described in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], as well as the VAE [
          <xref ref-type="bibr" rid="ref7">7, 34</xref>
          ] attack detailed in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. The
VAE attack involves embedding an image into the latent space of a Variational Autoencoder (VAE) and
then reconstructing it. The diffusion model attack works by denoising injected Gaussian noise on the
generated image with the aim of altering the image to erase the watermark.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>In this section, we first provide a detailed description of METR, the Message Enhanced Tree-Ring [20] algorithm, and its extension METR++. We then present an algorithm designed to select the best hyperparameters for optimal METR performance in any use case.</p>
      <sec id="sec-3-2">
        <title>3.1. METR</title>
        <p>
          Figure 1 presents the pipelines for watermark generation and detection using METR. Similar to
Tree-Ring [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], METR operates with the noise or latent noise of an image. The watermarking procedure
modifies this noise through the corresponding Fourier [35] space. The resulting image can be generated
using any diffusion model with the DDIM [30] sampling algorithm.
        </p>
        <p>In METR, messages consist of binary sequences encoded using concentric circles with radii increasing
from 1 to r, where r is defined as the watermark radius. This radius is a fixed hyperparameter
representing the number of bits in the message; it is therefore possible to encode 2^r messages. Each bit
of the message is represented by a single circle, where ones are assigned the value Δ and zeros
the value −Δ. Δ is another crucial parameter that we call the message scaler. Figure 3
illustrates examples of concentric circles with both encrypted and decrypted messages. Algorithm 1
details the pseudocode for generating an image with message encryption using the METR algorithm.</p>
        <p>Algorithm 1: METR image generation</p>
        <p>Input: Scaler Δ, radius r, binary message m</p>
        <p>(e.g., user ID)</p>
        <p>Output: Generated image
1 x_T ∼ N(0, I);
2 xℱ ← Fourier(x_T);
3 for i = 1 to r do
4   mask ← circle of radius i;
5   if m[i] = 1 then
6     xℱ[mask] ← Δ;
7   else
8     xℱ[mask] ← −Δ;
9   end
10 end
11 x_T^wm ← Inverse Fourier(xℱ);
12 Image ← DDIM Sampling(x_T^wm);
13 return Image</p>
      </sec>
      <sec id="sec-3-3">
        <title>Algorithm 2: METR message detection</title>
        <p>Input: Image, radius r, threshold p0</p>
        <p>Output: Predicted watermark message m
1 x′_T ← DDIM Inversion(Image);
2 x′ℱ ← Fourier(x′_T);
3 if p(η(x′ℱ)) &gt; p0 then // see Equation 1
4   return NO WM
5 end
6 m ← [];
7 for i = 1 to r do
8   mask ← circle of radius i;
9   if x′ℱ[mask].mean() &gt; 0 then
10    m.append(1);
11  else
12    m.append(0);
13  end
14 end
15 return m</p>
        <p>The message can be decoded by reverting the image to its noise using DDIM Inversion, followed by a
transformation of the inverted image into the Fourier space. Then, we determine whether the image
was watermarked or not by evaluating the p-value using Equation 1 and comparing it with a previously
defined threshold p0.</p>
        <p>The decryption process for a binary message requires determining the sign of the value for each
circle, obtained by averaging all values across the circle.</p>
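        <p>The encryption and decryption cores of Algorithms 1 and 2 (minus the diffusion sampling and inversion steps) can be sketched in a few lines of numpy; the ring construction and the sign-of-the-mean decoding follow the description above, while the array shapes and FFT conventions are our own assumptions:</p>

```python
import numpy as np

def _rings(shape):
    """Distance of every Fourier coefficient from the spectrum center."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)

def embed_metr_message(noise, message, delta):
    """Algorithm 1 core: write each bit onto a concentric circle in the
    Fourier space of the initial noise (+delta for 1, -delta for 0)."""
    dist = _rings(noise.shape)
    freq = np.fft.fftshift(np.fft.fft2(noise))
    for i, bit in enumerate(message, start=1):
        ring = (dist >= i - 1) & (dist < i)      # "circle of radius i"
        freq[ring] = delta if bit == 1 else -delta
    return np.real(np.fft.ifft2(np.fft.ifftshift(freq)))

def decode_metr_message(noise, radius):
    """Algorithm 2 core: recover each bit from the sign of the mean
    value on its circle."""
    dist = _rings(noise.shape)
    freq = np.fft.fftshift(np.fft.fft2(noise))
    return [int(np.real(freq[(dist >= i - 1) & (dist < i)]).mean() > 0)
            for i in range(1, radius + 1)]
```

In the full pipeline, the watermarked noise would be passed to DDIM sampling, and decoding would run on the DDIM-inverted latent rather than on the noise itself.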
        <p>The corresponding pseudocode for message decryption is shown in Algorithm 2.</p>
        <p>[Figure: examples of watermarks with different message scalers, (b) Δ = 60, (c) Δ = 100, (d) Δ = 140.]</p>
        <sec id="sec-3-3-1">
          <title>3.2. Detection Resolution Metric</title>
          <p>In this section we describe how to select the watermark radius r and the message scaler Δ, which are the
parameters of the METR watermarking algorithm introduced in Section 3.1, and propose the Detection
Resolution metric for selecting the scale parameter.</p>
          <p>Increasing the radius parameter increases the number of potential messages,
and enlarging the scale makes the message more detectable. However, increasing either parameter
also leads to the production of corrupted images marked by various visible artifacts (see Figure 4).
To quantify detectability, we define the mean detection error between the true
and restored watermarks: η_det(x0) = (1/|M|) · Σ_{i∈M} |WM_i − Detect(x0)_i|, where M is the watermarked area,
WM is the true watermark, x0 is the original image with a possible watermark, and the “Detect” function is the
Fourier transform of the DDIM inversion of its argument.</p>
          <p>The detection resolution is defined as the difference in detection errors between an image without
a watermark, x0, and an image with one, x*0:
det(x0, x*0) = η_det(x0) − η_det(x*0).
(2)</p>
          <p>A binary message is considered accurately detected if the inverse process error does not negate the
value of Δ. This means that the detection error must fall below the threshold η0 for watermarked images,
η0 − η_det(x*0) &gt; 0, and exceed it for non-watermarked ones, η0 − η_det(x0) &lt; 0. Furthermore, it is essential that the detection resolution metric, which
is the error contrast between watermarked and non-watermarked images, is relatively large
compared to Δ. To specify this, we computed δ = det/Δ for situations with perfect detection of the
message, and found that δ = aΔ + b, where a and b are linear fit parameters. This means that, to
prevent detection collisions, we need δ ≥ aΔ + b, or det/(aΔ² + bΔ) ≥ 1. During early experiments,
we found the most reliable values of a and b to be −2.23 · 10⁻³ and 0.653, respectively.</p>
          <p>Since the error of the DDIM Inversion [30] process with a fixed model remains relatively constant
with respect to the chosen image, we can assess the detection accuracy for a given value of Δ using just
a single pair (x0, x*0). This way, we bring the selection of Δ down to defining the possible range of
its value and selecting the best one based on a subset of images. The algorithm consists of two steps.
Initially, it measures the detection resolution using a generated image with and without watermark
from the same noise. Then, if the criterion is satisfied, it assesses the image quality across the full test
dataset. See Algorithm 3 for details.</p>
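          <p>The two-step selection loop can be sketched as follows; measure_etas and eval_quality are hypothetical callables standing in for image generation plus detection and for FID evaluation, and the detection-resolution normalization is omitted for brevity:</p>

```python
def find_message_scaler(measure_etas, eval_quality, eta0, theta, delta_grid):
    """Sketch of the Algorithm 3 search loop. `measure_etas(delta)`
    returns the detection errors (eta_plain, eta_wm) for one image
    generated without and with a METR watermark from the same noise;
    `eval_quality(delta)` returns an image-quality score (e.g., FID)
    over the test prompts. Both callables are placeholders for the
    actual generation pipeline."""
    for delta in delta_grid:
        eta_plain, eta_wm = measure_etas(delta)
        # cheap first check on a single image pair: both sides of the
        # threshold eta0 must be classified correctly
        if eta_plain > eta0 and eta_wm < eta0:
            # only then run the expensive quality evaluation
            if eval_quality(delta) < theta:
                return delta
    return None
```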
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>Algorithm 3: Parameter Δ search</title>
        <p>Input: latent noise x_T ∼ N(0, I), test dataset D = ⟨prompt, x_true⟩, image quality threshold Θ,
detection threshold η_thr, step, radius r</p>
        <p>Output: best message scaler Δ
1 xℱ ← Fourier(x_T);
2 Δ_min, Δ_max ← min(xℱ), max(xℱ);
3 for Δ = Δ_min to Δ_max; step do
4   x0 ← Generate(x_T);
5   x*0 ← Generate(METR(x_T, Δ, r));
6   η0, η*0 ← η_det(x0), η_det(x*0);
7   Det_Res ← det(x0, x*0) / (aΔ² + bΔ);
8   if η0 &gt; η_thr and η*0 &lt; η_thr and Det_Res ≥ 1 then
9     test ← Generate from prompts in D with WM;
10    Q ← Eval(test, D) // e.g., FID;
11    if Q &lt; Θ then
12      return Δ;
13    end
14  end
15 end</p>
        <p>
3.3. METR++
Although METR already provides support for encoding messages into images, the selected radius r
still restricts their number. To overcome this, we developed METR++, an extension of the original
METR algorithm. METR++ is designed to significantly increase the original algorithm’s potential for
encoding various unique messages, while being limited in the models it can be applied to. The
extended algorithm can only be used with Latent Diffusion [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which is currently the most commonly
used image generation architecture [
          <xref ref-type="bibr" rid="ref10 ref11 ref2">11, 2, 10</xref>
          ]. METR++ incorporates two watermarks into an image.
As demonstrated in Figure 6, it adopts the METR watermarking algorithm (Section 3.1), augmented by a VAE
decoder derived from the latent diffusion model [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], as proposed in Stable Signature [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>To encode multiple messages, they are first categorized into groups with a capacity of 2^r, a size that
reflects the number of potential unique messages in the METR watermarking algorithm. Then, each
group is assigned a distinct ID, and uses a specifically fine-tuned VAE decoder designed to encode the
group’s ID as a Stable Signature watermark. Consequently, METR++ expands the capacity for encoding
unique messages within METR, multiplying it by the number of specially fine-tuned VAE decoders for
each group.</p>
        <p>The identification process consists of two parts, that can be done in parallel. First is the group’s
unique key decryption via the Stable Signature approach. Second is the METR message decoding, which
identifies a user within a group. Decrypting the METR message requires carrying out the inverse
diffusion process on the image latent, which is obtained with the VAE encoder part of the LDM, whose weights
stay intact.</p>
        <p>To conclude, METR++ can be adapted to any Latent Diffusion model and can encrypt approximately
2^r · k unique messages, where r is METR’s watermark radius and k is the number of fine-tuned VAE
decoders.</p>
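        <p>The grouping scheme can be sketched as follows (function names are illustrative, not from the paper's code):</p>

```python
def metrpp_encode(user_id, radius):
    """METR++ message split: user IDs are grouped into blocks of
    2**radius. The group ID selects a fine-tuned VAE decoder (Stable
    Signature), while the within-group index becomes the METR bits."""
    group_id, local_id = divmod(user_id, 2 ** radius)
    bits = [(local_id >> i) & 1 for i in range(radius)]
    return group_id, bits

def metrpp_decode(group_id, bits, radius):
    """Recombine the two watermarks into the original user ID."""
    local_id = sum(b << i for i, b in enumerate(bits))
    return group_id * 2 ** radius + local_id

def metrpp_capacity(radius, n_decoders):
    """Total unique messages: 2**r per group times k decoders."""
    return 2 ** radius * n_decoders
```

The two halves can be recovered in parallel at detection time, since the Stable Signature extractor and the METR inversion operate independently.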
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In this section, we describe our experimental setup and present the results of experiments conducted
using METR and METR++.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>
          In all experiments, we utilize the base version of Stable Diffusion 2.1 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and limit the generation
process to 40 steps of DDIM [30] sampling. The inverse diffusion process used for METR message
detection was run with no prompt for the same number of steps as the generation process. The guidance
scale was set to 7.5.
        </p>
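        <p>For reference, this setup corresponds to a standard diffusers configuration; the snippet below is an illustrative sketch (the model identifier and API calls follow common diffusers usage, not the authors' exact script):</p>

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

# Stable Diffusion 2.1 base with a DDIM scheduler, as in the setup above.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

image = pipe(
    "an example prompt",
    num_inference_steps=40,   # 40 DDIM sampling steps
    guidance_scale=7.5,       # guidance scale from the setup
).images[0]
```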
        <p>
          For the baseline, we selected the Tree-Ring [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] and Stable Signature [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] watermarking algorithms
for several reasons. In this paper, we aim to present a robust watermarking method capable of encrypting
multiple messages. Tree-Ring [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is considered a state-of-the-art algorithm when it comes to robustness
to various attacks [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. To our knowledge, Stable Signature is the only method capable of encrypting
messages during the generation process itself rather than post-generation [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
Finally, METR and METR++ build upon these existing watermarking algorithms.
        </p>
        <p>
          We compare the models based on three main
criteria: the accuracy of watermark detection, the
accuracy of watermarked message decryption, and
image quality, since we aim to ensure that the
encrypted message does not degrade the generated
image. The accuracy of watermark detection is
measured by False Positive Rate (FPR), True Positive
Rate (TPR), Area Under Curve (AUC) of Receiver
Operator Characteristic (ROC), and the TPR value when
FPR equals 1%, denoted as “TPR@1%FPR”. [Figure 7: METR evaluation of image quality with different radii.] The
quality of message detection is assessed using Bit Accuracy, Word Accuracy (the accuracy of fully
detected messages, as proposed in [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]), and the detection resolution metric described in Section 3.2.
To assess image quality, we use FID (Fréchet Inception Distance) [36] and CLIP score [37]. FID is
calculated by comparing the generated watermarked image to its corresponding ground-truth pair from
the dataset, denoted as FID gt, or to a generated image created with the same prompt but without a
watermark, denoted as FID gen. Comparison with the ground truth image indicates overall quality,
while comparison with the generated image illustrates the impact of watermark encryption on the
generation process. For the CLIP score, we measure the cosine similarity between the embedding of the
watermarked image and the embedding of the reference prompt, following OpenCLIP-ViT/G [38].
        </p>
        <p>We utilized the MSCOCO-5000 dataset [39] to evaluate FID. This dataset consists of 5000 paired
images and prompts. To assess watermark detection accuracy, message detection quality, and image
quality using CLIP, the images were generated using a subset of 1000 randomly selected prompts from
the Stable Diffusion prompts [40].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. METR Evaluation</title>
        <p>In this subsection, we describe a set of experiments that compare METR with Tree-Ring. We focus
on evaluating their robustness to white-box attacks (including ones that utilize generative models, as
described in Section 2.4), watermark detection accuracy, and the overall quality of the generated images.</p>
        <p>To select proper METR parameters, we first searched for the optimal message scaler Δ with the
algorithm described in Section 3.2 in the range 60 ≤ Δ ≤ 160 with multiple different radii. The resulting
optimal value was always between 80 and 100. For the rest of our experiments, Δ was set to 100 unless
specified otherwise. Regarding the message radius r, we evaluated image quality for different radii on a
subset of the MSCOCO-5000 dataset with 500 images. See Figure 7 for results.</p>
        <p>It is clear that keeping the radius as small as possible benefits the quality of the resulting image.
However, we argue that increasing the radius to 16 can still maintain acceptable overall quality. The
benefit of doing so is that METR can then encode up to 2¹⁶ = 65536 messages. In our experiments,
we set r = 10, which allowed us to encode 2¹⁰ = 1024 messages with almost no loss in quality.</p>
        <p>
          Detection accuracy under image transformation attacks. The detection metrics for METR and
Tree-Ring under white-box attacks [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] are presented in Table 1. “B acc” and “W acc”
denote Bit Accuracy and Word Accuracy, respectively. Since Tree-Ring cannot encode messages,
these metrics are only calculated for METR. As the results in Table 1 demonstrate, METR is as
resilient as, or more resilient than, Tree-Ring against most image transformations. Both algorithms
find rotation and cropping to be the most challenging types of attacks.
        </p>
        <p>
          Detection accuracy under generative models’ attacks. We performed a diffusion attack using
the base version of Stable-Diffusion 2.1 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and a VAE attack using the model from [34] with the quality
hyperparameter set to 1. The detection metrics for images generated with the Tree-Ring and METR
watermarks, both under the diffusion attack, can be seen in Figure 8. As the chart shows, the AUC and
TPR@1%FPR for METR are higher than those for Tree-Ring, and decrease more slowly with the number of diffusion
steps. Similar results are shown for the VAE attack in Figure 9, parametrized by the quality hyperparameter. One can
see that the Tree-Ring detection metrics stay below 1 until the quality hyperparameter reaches 4, while METR is robust to this attack and
the watermark is almost always detected correctly. The Tree-Ring algorithm is not capable of message
encryption, and thus word and bit accuracy are shown only for the METR algorithm.
(Figure: (a) Watermark detection; (b) Watermark decryption.)
Overall, the detection accuracy of METR is comparable to that of the Tree-Ring
algorithm, with less than a 2% difference. However, METR makes it
possible to encode multiple unique messages in a watermark. The accuracy of the decrypted messages
showed high resilience to the majority of white-box attacks, the exceptions being rotation and cropping.
The word accuracy for messages without any attack reached 0.91.
        </p>
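<p>For readers reimplementing the evaluation, a minimal sketch of how the TPR@1%FPR metric used above can be computed from raw detection scores (a generic definition, not the paper's exact evaluation code):</p>

```python
import numpy as np

def tpr_at_fpr(pos_scores, neg_scores, target_fpr=0.01):
    """TPR at a fixed FPR: set the detection threshold so that at most
    target_fpr of the non-watermarked (negative) scores exceed it,
    then count how many watermarked (positive) scores exceed it."""
    neg = np.sort(np.asarray(neg_scores))[::-1]      # descending
    k = max(int(np.floor(target_fpr * len(neg))), 1)
    threshold = neg[k - 1]                           # k-th highest negative
    return float(np.mean(np.asarray(pos_scores) > threshold))

# Well-separated scores: every watermarked image clears the threshold.
print(tpr_at_fpr([0.9] * 100, [0.1] * 100))  # 1.0
```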
      </sec>
      <sec id="sec-4-3">
        <title>4.3. METR++ Evaluation</title>
        <p>
          In this section, we perform a series of experiments to evaluate the robustness of METR++ against
white-box attacks. We also compare its detection metrics with those of METR and the Stable Signature
watermark [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>
          As previously detailed in Section 3.3, METR++ consists of the METR watermark and a fine-tuned VAE
decoder designed to encrypt a 48-bit Stable Signature [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] message. Therefore, our evaluation begins by
investigating whether the inclusion of the METR watermark in the Fourier space of the latent noise
decreases detection resilience. To assess this, we fine-tune the VAE decoder using images drawn from
standard latent noise and a distribution in which the METR watermark is embedded into the latent
noise space. We decode the messages encoded into the images using both the standard VAE decoder
and the one that was fine-tuned on images with the METR watermark. The results are presented in
Table 4, where one configuration denotes the VAE decoder that was fine-tuned on images sampled from a
normal distribution and subsequently applied to images with the METR watermark. It can be
observed that the bit accuracy of the decrypted Stable Signature message in METR++ is not affected by
whether the VAE decoder was fine-tuned on images sampled from latent noise with the embedded
METR watermark. Under attacks, the bit accuracy of the decrypted Stable Signature
message remains almost constant, changing by less than 1% compared to the basic Stable Signature.
        </p>
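<p>The bit- and word-accuracy metrics reported in Tables 1 and 4 can be sketched as follows (generic definitions; the variable names are ours):</p>

```python
import numpy as np

def bit_accuracy(decoded: np.ndarray, target: np.ndarray) -> float:
    """Fraction of message bits recovered correctly."""
    return float(np.mean(decoded == target))

def word_accuracy(decoded: np.ndarray, target: np.ndarray) -> float:
    """1.0 only when the entire message is recovered exactly."""
    return float(np.all(decoded == target))

target  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
decoded = np.array([1, 0, 1, 1, 0, 1, 1, 0])  # one bit flipped
print(bit_accuracy(decoded, target))   # 0.875
print(word_accuracy(decoded, target))  # 0.0
```

In a benchmark, word accuracy is then averaged over many generated images, which is why a single flipped bit per message is far more damaging to word accuracy than to bit accuracy.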
        <p>The detection accuracy of METR++ consists of the detection accuracies for both the Stable Signature
and METR watermarks. Table 2 presents the results of the detection accuracy evaluation. The term
“METR: ft VAE” refers to the accuracy of detecting the METR watermark when processed through the
METR++ pipeline. “Stable-Signature” refers to the accuracy of decrypting the referenced watermark
using the METR++ pipeline. Lastly, “METR++ ” denotes the overall detection accuracy of the complete
METR++ message, which includes both the METR and Stable Signature messages. The detection
accuracy of the METR watermark remains unaffected by the METR++ pipeline, as the image decoded by
the fine-tuned decoder closely resembles the one generated by the original VAE decoder. Consequently,
both the Bit and Word accuracy for METR watermark detection, as previously shown in Table 1,
demonstrate robustness against most white-box attacks, aside from rotation and cropping. On the other
hand, Stable Signature is vulnerable to white-box attacks, which in turn impacts the Word Accuracy of
METR++. In Table 1, we highlighted the higher Word Accuracy between the METR and Stable Signature
watermarks. It is important to note that, in terms of word accuracy, the robustness of METR++ is
limited by the less robust of the two watermarks, since it requires the correct decryption of both;
in terms of bit accuracy, METR++ is highly correlated with its Stable Signature part, because that part
contributes more bits than the METR part.</p>
        <p>In terms of the potential number of messages, we use 58 bits for each message; therefore, it is possible
to encode 2<sup>58</sup> unique messages. When considering the practical application of either METR or METR++,
it is important to evaluate the trade-off between the capability to encode numerous unique messages
and the resilience of the watermarking method against white-box attacks.</p>
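<p>The combined capacity and the word-accuracy bound discussed above can be made concrete with a small sketch (the independence assumption between the two decoders is ours, used only for illustration):</p>

```python
def metrpp_capacity(metr_bits: int = 10, ss_bits: int = 48) -> int:
    # 10 METR bits plus the 48-bit Stable Signature message give
    # 58 bits in total, i.e. 2**58 unique messages.
    return 2 ** (metr_bits + ss_bits)

def metrpp_word_accuracy(p_metr: float, p_ss: float) -> float:
    # METR++ counts a message as correct only when BOTH component
    # messages decode correctly; assuming independence, the combined
    # word accuracy is the product, bounded by the weaker watermark.
    return p_metr * p_ss

print(metrpp_capacity())                           # 2**58
print(round(metrpp_word_accuracy(0.95, 0.80), 2))  # 0.76
```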
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>In this study, we assessed the detection accuracy of METR and METR++ against several white-box
attacks. METR demonstrated high resilience to most of the attacks, including state-of-the-art ones.
We also propose researching methods to fine-tune the weights of the generative model to enhance the
robustness of the METR watermarking algorithm. METR++, in contrast, showed lower robustness to most
attacks, because its word accuracy is constrained by the lower detection accuracy of its two component
messages. The Stable Signature watermark message is generally less
robust to attacks than the METR watermark message. Therefore, to increase the encoding capacity
of the METR watermarking algorithm, we suggest exploring alternative watermarking methods that
could replace Stable Signature, or investigating Stable Signature modifications within METR++ that
improve its detection accuracy under attacks.</p>
      <p>Another area of research that needs further exploration is the impact of specific messages on the
quality of the generated images. Our findings indicate that the quality of images generated using an
average METR message is comparable to those produced with the Tree-Ring watermark. However, in
real-world applications, it is essential to examine various combinations of binary values to ascertain
whether certain messages with “extreme values” might significantly affect image quality. This
consideration could be crucial for commercial projects aiming to ensure consistent image generation quality for
all their users.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we introduced the METR watermarking algorithm, which can be applied to any diffusion
model architecture with a non-probabilistic sampler (e.g., DDIM [30]). Building upon the Tree-Ring
watermarking algorithm, METR retains its robust detection under white-box attacks and high
image quality. The most significant advantage of METR is its ability to encrypt multiple unique
messages without fine-tuning of the model’s weights. This positions the proposed watermarking
algorithm as the current leading one in terms of robustness, image quality, and message encoding
capacity. We also propose an extension of METR, named METR++, which is specifically tailored to
Latent Diffusion Models and requires additional fine-tuning of a VAE decoder for each new user group.
METR++ increases the potential number of encoded messages by a factor equal to the number of
fine-tuned VAE decoders. Comparing METR with its extension METR++ shows that the decision of
which algorithm to use for practical applications should balance the total number of messages that
can be encoded against the watermark’s overall robustness to attacks. The implementation of our work
can be found at https://github.com/deepvk/metr.</p>
      <p>arXiv:1807.09937.
[26] K. A. Zhang, L. Xu, A. Cuesta-Infante, K. Veeramachaneni, Robust invisible video watermarking with attention, 2019. arXiv:1909.01285.
[27] M. Tancik, B. Mildenhall, R. Ng, Stegastamp: Invisible hyperlinks in physical photographs, 2020. arXiv:1904.05343.
[28] Z. Jiang, J. Zhang, N. Z. Gong, Evading watermark based detection of ai-generated content, 2023. arXiv:2305.03807.
[29] Y. Liu, Z. Li, M. Backes, Y. Shen, Y. Zhang, Watermarking diffusion model, 2023. arXiv:2305.12502.
[30] J. Song, C. Meng, S. Ermon, Denoising diffusion implicit models, 2022. arXiv:2010.02502.
[31] P. B. Patnaik, The non-central χ²- and F-distribution and their applications, Biometrika 36 (1949) 202–232. URL: https://www.jstor.org/stable/2332542. doi:10.2307/2332542.
[32] P. Glasserman, Monte Carlo Methods in Financial Engineering, volume 53 of Stochastic Modelling and Applied Probability, Springer, New York, NY, 2003. URL: http://link.springer.com/10.1007/978-0-387-21617-1. doi:10.1007/978-0-387-21617-1.
[33] S. Peng, Y. Chen, C. Wang, X. Jia, Intellectual property protection of diffusion models via the watermark diffusion process, 2023. arXiv:2306.03436.
[34] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston, Variational image compression with a scale hyperprior, 2018. arXiv:1802.01436.
[35] E. O. Brigham, R. E. Morrow, The fast fourier transform, IEEE Spectrum 4 (1967) 63–70. doi:10.1109/MSPEC.1967.5217220.
[36] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, Gans trained by a two time-scale update rule converge to a local nash equilibrium, 2018. arXiv:1706.08500.
[37] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, 2021. arXiv:2103.00020.
[38] M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, J. Jitsev, Reproducible scaling laws for contrastive language-image learning, 2022. arXiv:2212.07143.
[39] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft coco: Common objects in context, in: European conference on computer vision, Springer, 2014, pp. 740–755.
[40] Gustavosta, Stable diffusion prompts, 2022. URL: https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] CivitAI, Civitai,
          <year>2024</year>
          . URL: https://civitai.com/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nichol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Hierarchical text-conditional image generation with clip latents</article-title>
          ,
          <year>2022</year>
          . arXiv:2204.06125.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Khader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mueller-Franzes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Arasteh</surname>
          </string-name>
          , T. Han,
          <string-name>
            <given-names>C.</given-names>
            <surname>Haarburger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schulze-Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Schad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Engelhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baessler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Foersch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stegmaier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kuhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nebelung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Kather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Truhn</surname>
          </string-name>
          ,
          <article-title>Medical diffusion: Denoising diffusion probabilistic models for 3d medical image generation</article-title>
          ,
          <year>2023</year>
          . arXiv:2211.03364.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Somepalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Singla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldblum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geiping</surname>
          </string-name>
          , T. Goldstein,
          <article-title>Diffusion art or digital forgery? investigating data replication in diffusion models</article-title>
          ,
          <year>2022</year>
          . arXiv:2212.03860.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choo</surname>
          </string-name>
          ,
          <article-title>High-resolution virtual try-on with misalignment and occlusion-handled conditions</article-title>
          ,
          <year>2022</year>
          . arXiv:2206.14180.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Y. Bengio, G. Hinton,
          <article-title>Deep learning</article-title>
          ,
          <source>Nature</source>
          <volume>521</volume>
          (
          <year>2015</year>
          )
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          . URL: https://doi.org/10.1038/nature14539. doi:10.1038/nature14539.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          , Auto-encoding variational bayes,
          <year>2022</year>
          . arXiv:1312.6114.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Generative adversarial networks,
          <year>2014</year>
          . arXiv:1406.2661.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Abbeel</surname>
          </string-name>
          , Denoising diffusion probabilistic models,
          <year>2020</year>
          . arXiv:2006.11239.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Razzhigaev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shakhmatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maltseva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Arkhipkin</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Pavlov</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Ryabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <article-title>Kandinsky: an improved text-to-image synthesis with image prior and latent diffusion</article-title>
          ,
          <year>2023</year>
          . arXiv:2310.03502.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <year>2022</year>
          . arXiv:2112.10752.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Pinterest</surname>
          </string-name>
          , Pinterest,
          <year>2024</year>
          . URL: https://www.pinterest.com/.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Novelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Casolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Spedicato</surname>
          </string-name>
          , L. Floridi,
          <article-title>Generative ai in eu law: Liability, privacy, intellectual property, and cybersecurity</article-title>
          ,
          <year>2024</year>
          . arXiv:2401.07348.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bashardoust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feuerriegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. R.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <article-title>Comparing the willingness to share for human-generated vs. ai-generated fake news</article-title>
          ,
          <year>2024</year>
          . arXiv:2402.07395.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Benalcazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Tapia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Busch</surname>
          </string-name>
          ,
          <article-title>Synthetic id card image generation for improving presentation attack detection</article-title>
          ,
          <year>2022</year>
          . arXiv:2211.00098.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <article-title>Synthetic document generator for annotation-free layout recognition</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>128</volume>
          (
          <year>2022</year>
          )
          <fpage>108660</fpage>
          . URL: http://dx.doi.org/10.1016/j.patcog.2022.108660. doi:10.1016/j.patcog.2022.108660.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Backes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zannettou</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y. Zhang,</surname>
          </string-name>
          <article-title>Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.13873.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>I.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bloom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fridrich</surname>
          </string-name>
          , T. Kalker, Digital Watermarking and Steganography, 2nd ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>C.-C. Chang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Tsai</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-C. Lin</surname>
          </string-name>
          ,
          <article-title>Svd-based digital image watermarking scheme</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>26</volume>
          (
          <year>2005</year>
          )
          <fpage>1577</fpage>
          -
          <lpage>1586</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0167865505000140. doi:10.1016/j.patrec.2005.01.004.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kirchenbauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Geiping</surname>
          </string-name>
          , T. Goldstein,
          <article-title>Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust</article-title>
          ,
          <year>2023</year>
          . arXiv:2305.20030.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vasan</surname>
          </string-name>
          , I. Grishchenko,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kruegel</surname>
          </string-name>
          , G. Vigna,
          <string-name>
            <given-names>Y.-X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Invisible image watermarks are provably removable using generative ai</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2306</volume>
          .
          <fpage>01953</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barni</surname>
          </string-name>
          ,
          <article-title>A survey of deep neural network watermarking techniques</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2103</volume>
          .
          <fpage>09274</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Wolfgang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Delp</surname>
          </string-name>
          ,
          <article-title>A watermark for digital images</article-title>
          ,
          <source>in: Proceedings of 3rd IEEE International Conference on Image Processing</source>
          , volume
          <volume>3</volume>
          , IEEE,
          <year>1996</year>
          , pp.
          <fpage>219</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Couairon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Furon</surname>
          </string-name>
          ,
          <article-title>The stable signature: Rooting watermarks in latent diffusion models</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2303</volume>
          .
          <fpage>15435</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>HiDDeN: Hiding data with deep networks</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>