METR: Image Watermarking with Large Number of Unique Messages

Alexander Varlamov 1,2, Daria Diatlova 3 and Egor Spirin 2
1 MIPT  2 VK Lab  3 deepvk, VK

Abstract
Improvements in diffusion models have boosted the quality of image generation, which has led researchers, companies, and creators to focus on improving watermarking algorithms. Watermarking makes it possible to reliably identify the creators of generative art. The main challenges that modern watermarking algorithms face are the ability to withstand attacks and to encrypt many unique messages, such as user IDs. In this paper, we present METR: Message Enhanced Tree-Ring, an approach that aims to address these challenges. METR is built on the Tree-Ring watermarking algorithm and makes it possible to encode multiple distinct messages without compromising attack resilience or image quality, which makes it suitable for any diffusion model. To surpass the limit on the number of encoded messages, we propose METR++, an enhanced version of METR. This approach, while limited to the Latent Diffusion Model architecture, is designed to inject a virtually unlimited number of unique messages. We demonstrate robustness to attacks and the ability to encrypt many unique messages while preserving image quality, which gives METR and METR++ great potential for practical applications in real-world settings. Our code is available at https://github.com/deepvk/metr.

Keywords
generative models, diffusion, image watermarking, watermark robustness, message encryption

1. Introduction
Nowadays, image generation is one of the major applications of computer vision technology. It is used in various spheres, including entertainment [1, 2], medicine [3], security [4], and retail [5].
Recent advances in deep learning [6], such as Variational Autoencoders (VAE) [7], Generative Adversarial Networks (GAN) [8], and diffusion models [9], have made it a rapidly growing area of research. The latest achievements in text-to-image models, namely DALL-E 2 [2], Kandinsky [10], and Stable Diffusion [11], facilitate the generation of highly realistic images based on specific prompts. With the growing popularity of image generation, the risks of it being used inappropriately or maliciously are also increasing. These risks include policy [1, 12] and privacy violations [13], the generation of fake news [14], document fraud [15, 16], and the creation of harmful content [17]. To help alleviate these risks, it is necessary to develop a mechanism to detect whether an image is generated or not. One possible solution is to label generated images with special messages, called watermarks [18, 19, 20]. To ensure the suitability of this solution for practical applications, it is essential that these watermarks remain invisible [21, 20]. Additionally, the watermarks should be robust to a range of attacks [21, 22], ensuring that perturbations of the image cannot remove an encoded message. Watermarking has been a widely used technique for image content protection since long before the emergence of generative models. The first works on the subject introduced algorithms that make it possible to add watermarks to existing images [18, 19, 23]. These methods can be used to apply watermarks to generated images as well. However, this approach has several drawbacks. For example, the watermarks are not necessarily completely invisible to the human eye [20, 24], and the latest algorithms [25, 26, 27] require training an additional model. Moreover, if the watermarking step is not built into the image generation process, then those with access to the generative model could generate images without watermarks.
CREAI 2024 - Workshop on Artificial Intelligence and Creativity, editors Allegra De Filippo, Francois Pachet, Valentina Presutti, Luc Steels. © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Figure 1: METR watermarking pipeline. Figure (a) outlines the steps to encrypt a binary message into an image via the corresponding latent noise. Figure (b) details the process of detecting whether an image contains a watermark and deciphering the encrypted message.

A different group of algorithms was proposed that adds watermarks to images during the generation process [28, 24, 29]. These approaches successfully address the challenges of the previous algorithms. Newer watermarking methods require either tuning of the model's weights [28, 24, 29] or adjustments to some latent representation of the image (e.g., the initial noise) that is used in the generation process [20]. Both approaches make it possible to create "truly invisible" watermarks [20]. Moreover, the second one is applicable to any diffusion architecture, with the only limitation being the sampling strategy (e.g., DDIM [30]), and it does not require additional model training. The only disadvantage of the second approach arises when someone gains access to the model's weights: in this scenario, the algorithm responsible for watermark embedding could be removed from the inference pipeline. This drawback does not affect the first approach, since the ability to watermark images is contained within the tuned weights of the model. In a recent work by Wen et al.
[20], the authors present Tree-Ring, a watermarking algorithm for diffusion models that utilizes the initial noise as a latent representation of the image. A watermark is injected as a subtle modification of the initial noise used during diffusion model sampling. This approach shows high robustness to white-box attacks [21, 22], i.e., attacks that operate only on the output of the model, since the model's weights stay inaccessible. In addition to determining whether an image is generated or not, it is also important to identify the author of the generated image. This can be done by incorporating watermarks that contain specific information, such as user IDs. Despite the high reliability and imperceptibility of Tree-Ring watermarks, the algorithm cannot encrypt messages within the watermark, limiting its practical use in certain real-world scenarios. One of the existing algorithms capable of encrypting messages, Stable Signature [24], requires training a separate model for each user, as it can only manage one unique message per model. This is also not ideal for practical use in real-world applications. In this paper, we propose METR, a watermarking algorithm based on the Tree-Ring watermarking [20] of diffusion models [9, 30]. This approach is able to handle many unique messages and is robust to white-box attacks [21, 22] without a loss in image quality. Similar to Tree-Ring, METR utilizes a modification of the initial noise distribution without changing the model architecture. Therefore, it does not require any additional training and can be easily transferred to another model. In addition, we suggest METR++, a simple modification of METR. It combines METR with Stable Signature [24] and extends the number of unique messages that can be encrypted to as many as one might need.
Our contributions can be summarized as follows:

• METR (Message Enhanced Tree-Ring) – a new watermarking algorithm that is capable of encoding a large number of unique messages into a Tree-Ring watermark without noticeable image quality degradation or a decrease in watermark robustness to attacks.
• We introduce an algorithm to select an optimal value of the hyperparameter S for the METR watermark based on the "detection resolution" metric, which captures the difference in detection distances between an image with and without a watermark.
• METR++ – an extended version of the METR watermarking algorithm that combines METR with the Stable Signature algorithm and allows the encryption of an even larger number of unique messages compared to METR.

2. Related Works

2.1. Diffusion Process
Diffusion models [9, 30] are generative models employed to approximate a data distribution q(x_0) using a parametrized form, p_θ(x_0), which is expressed via latent variables x_1, ..., x_T. The parameters θ are optimized to maximize the evidence lower bound (ELBO). The forward diffusion process involves the gradual addition of noise to the initial data point x_0:

$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,$

where $\epsilon \sim \mathcal{N}(0, I)$ and the values $\bar{\alpha}_t$ parametrize the variance at step t of the distribution q(x_t | x_{t-1}). A point from the initial distribution can be approximated by employing a denoising process. While reverse diffusion was characterized as probabilistic in [9], a deterministic sampling method with an equivalent ELBO was introduced in [30], known as DDIM sampling:

$x_0^{\prime(t)} = D_\theta(x_t) = \dfrac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},$

where $\epsilon_\theta$ is a trained model that predicts the noise added to x_0 to obtain x_t, and $x_0^{\prime(t)}$ represents an estimate of x_0 derived from denoising x_t. The determinism of DDIM sampling [30] can be leveraged to trace back the initial noise from which the image was generated. This process is called "inverse diffusion" [20].
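The two equations above can be checked numerically. Below is a hedged numpy sketch (not from the paper): a toy ᾱ_t schedule stands in for a real one, and the true noise ε stands in for the trained predictor ε_θ, so the DDIM estimate recovers x_0 exactly.

```python
import numpy as np

# Hedged numpy sketch of the forward-noising and DDIM-denoising equations
# (Section 2.1). The schedule `alpha_bar` is a toy stand-in, and the true
# noise `eps` stands in for the trained predictor eps_theta, so the DDIM
# estimate of x_0 is exact here.
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.9999, 0.01, 40)  # toy \bar{alpha}_t schedule

x0 = rng.normal(size=(4, 4))               # "clean" data point
eps = rng.normal(size=x0.shape)            # noise added in the forward process
t = 20

# forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# DDIM estimate: x_0' = (x_t - sqrt(1 - abar_t) * eps_theta) / sqrt(abar_t)
x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

assert np.allclose(x0_hat, x0)  # exact when the noise prediction is exact
```

With a real model, ε_θ(x_t, t) only approximates ε, so the reconstruction (and the inversion below) is approximate rather than exact.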
We can obtain an estimate of the initial noise, $x'_T$, based on the assumption that $x_{t+1} - x_t \approx x_t - x_{t-1}$. Each step of DDIM inversion mirrors a step of the forward process, but utilizes the "trained noise":

$x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\, x_0^{\prime(t)} + \sqrt{1 - \bar{\alpha}_{t+1}}\, \epsilon_\theta(x_t, t).$

T steps of DDIM Inversion estimate the initial noise of an image: $D_\theta^{inv}(x_0) = x'_T \approx x_T$.

2.2. Tree-Ring Watermark
The Tree-Ring watermarking method, proposed in [20], employs the DDIM inversion technique [30]. In this method, the watermark is encoded as concentric circles or squares within the Fourier space of the initial noise, $\mathcal{F}(x_T)$. This encoding results in a modified version of the initial noise, denoted as $x_T^{wm}$. Subsequently, DDIM sampling is applied to produce a watermarked image from the noise that carries the embedded fingerprint: $x_0^{wm} = D_\theta(x_T^{wm})$. For watermark detection, the DDIM Inversion process is used to estimate the initial noise and identify the watermark within the Fourier representation of the approximated initial noise of an image: $\mathrm{WM}' = \mathcal{F}(D_\theta^{inv}(x_0^{wm}))$.

The Tree-Ring watermark, as presented in [20], transforms the distribution of $x_T$ into a non-Gaussian form. Since $\mathcal{F}[e^{-a x^2}](q) \sim e^{-\pi^2 q^2 / a}$, we can determine whether a watermark is present or absent in the image by assessing whether the distribution of $y = \mathcal{F}(x'_T)$, where $x'_T$ is the initial noise predicted by the inverse DDIM process, deviates from normality. Non-normality can be assessed by testing the null hypothesis $\mathcal{H}_0: y = \mathcal{F}(x'_T) \sim \mathcal{N}(0, \sigma^2 I)$. $\sigma$ can be estimated for every input as $\sigma^2 \approx \frac{1}{|M|} \sum_{i=1}^{|M|} |y_i|^2$, where M is the watermarked area of the image. To calculate a p-value for $\mathcal{H}_0$, we define $z(y) = \frac{1}{\sigma^2} \sum_{i=1}^{|M|} |\mathrm{WM}_i - y_i|^2$, where WM denotes the encrypted watermark values on the area M. We then apply the following equation:

$p = \mathbb{P}(\chi^2_{|M|,\lambda} < z \mid \mathcal{H}_0) = F_{\chi^2}(z), \qquad (1)$

where $\chi^2_{|M|,\lambda}$ denotes a non-central chi-squared random variable [31] with $\lambda = \frac{1}{\sigma^2} \sum_{i=1}^{|M|} |\mathrm{WM}_i|^2$, and $F_{\chi^2}$ represents its cumulative distribution function [32]. Large p-values indicate the presence of a watermark in the image, while small p-values indicate its absence.

2.3. Latent Diffusion and Stable Signature
The Latent Diffusion Model concept, proposed in [11], implements the diffusion process within a latent space, which substantially enhances the quality of image generation. A key architectural update is the incorporation of a Variational Autoencoder (VAE) [7] model, which converts the original image into a compact, low-dimensional representation for use in the subsequent diffusion process. During inference, noise is sampled and then transformed into a latent representation using the trained diffusion model. This representation is finally decoded into the end image through the decoder component of the VAE.

Drawing on the Latent Diffusion Model concept that leverages a VAE, the Stable Signature watermarking method was proposed in [24]. Stable Signature enables the encryption of binary messages into images during their generation. To embed a Stable Signature key into an image, the decoder weights of the VAE in the latent diffusion model are fine-tuned. This fine-tuning uses a loss function that includes an extra term for message detection: $L = L_{\mathrm{message}} + \lambda L_{\mathrm{image}}$. Message detection is carried out by the pre-trained network W, as detailed in [25]. During inference, Stable Signature relies solely on the W network to detect and retrieve the watermark. For further information on the fine-tuning and extraction procedures, refer to Figure 2. The inference process for Stable Signature is simple, yet the watermarking algorithm requires a unique VAE decoder to be trained for each specific message.
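Returning to the Tree-Ring test of Section 2.2, the statistic z(y) can be made concrete with a short numpy sketch. The pattern `wm`, its length, and its amplitude are illustrative assumptions, not values from the paper; plugging z into the non-central chi-squared CDF of Equation 1 would yield the p-value.

```python
import numpy as np

# Hedged sketch of the Tree-Ring test statistic z(y) from Section 2.2.
# `wm` plays the role of the encoded pattern WM on the watermarked area M;
# its size and amplitude are invented for illustration.
rng = np.random.default_rng(1)
wm = rng.choice([-100.0, 100.0], size=256)       # hypothetical watermark values

def z_stat(y, wm):
    sigma2 = np.mean(np.abs(y) ** 2)             # sigma^2 estimated per input
    return np.sum(np.abs(wm - y) ** 2) / sigma2  # z(y); p = F_chi2(z), Eq. (1)

y_no_wm = rng.normal(size=wm.shape)              # H0: y ~ N(0, I), no watermark
y_wm = wm + rng.normal(size=wm.shape)            # watermarked: y is close to WM

z_no_wm, z_wm = z_stat(y_no_wm, wm), z_stat(y_wm, wm)
assert z_wm < z_no_wm  # the statistic clearly separates the two cases
```

The statistic is near zero when y matches WM and very large for pure Gaussian noise, which is what makes the chi-squared test of Equation 1 discriminative.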
In real-world scenarios, especially for services with millions of users where messages are often user IDs, the implementation of Stable Signature is not feasible.

Figure 2: Stable Signature [24] scheme: algorithms for watermark extraction and VAE decoder fine-tuning.

2.4. Watermark Attacks
To assess the robustness of watermarking algorithms, it is common practice to not only evaluate detection accuracy metrics, but also to measure the resilience of watermark detection against attacks [25, 20, 24]. In the field of image watermarking, there are standard white-box attacks, as detailed in [33, 21, 22]. These attacks involve transformations of the generated image to verify the robustness of the proposed method, and include operations such as rotation, JPEG compression, cropping followed by scaling, Gaussian blur, Gaussian noise, and color jitter. In our study, we also employ attacks utilizing generative models, such as the diffusion model [9, 30] attack described in [21], as well as the VAE [7, 34] attack detailed in [21]. The VAE attack involves embedding an image into the latent space of a Variational Autoencoder (VAE) and then reconstructing it. The diffusion model attack works by denoising injected Gaussian noise on the generated image, with the aim of altering the image to erase the watermark.

Figure 3: The message becomes circles in the Fourier space ℱ of the latent noise x_T: (a) encrypted message, (b) decrypted message.

3. Methods
In this section, we first provide a detailed description of METR, the Message Enhanced Tree-Ring [20] algorithm, and its extension METR++. We then present an algorithm designed to select the best hyperparameters for optimal METR performance in any use case.

3.1. METR
Figure 1 presents the pipelines for watermark generation and detection using METR.
Similar to Tree-Ring [20], METR operates with the noise or latent noise of an image. The watermarking procedure modifies this noise through the corresponding Fourier [35] space. The resulting image can be generated using any diffusion model with the DDIM [30] sampling algorithm. In METR, messages consist of binary sequences encoded using concentric circles with radii increasing from 1 to R, where R is defined as the watermark radius. This radius is a fixed hyperparameter representing the number of bits in the message. Therefore, it is possible to encode 2^R messages. Each bit of the message is represented by a single circle, where ones are assigned the value S and zeros the value -S. S is another crucial parameter that we call the message scaler. Figure 3 illustrates examples of concentric circles with both encrypted and decrypted messages. Algorithm 1 details the pseudocode for generating an image with message encryption using the METR algorithm.

Algorithm 1: METR image generation
Input: scaler S, radius R, binary message m (e.g., user ID)
Output: generated image
1  x_T ∼ N(0, I);
2  x^F_T ← Fourier(x_T);
3  for r = 1 to R do
4      mask_r ← circle of radius r;
5      if m[r] = 1 then
6          x^F_T[mask_r] ← S;
7      else
8          x^F_T[mask_r] ← -S;
9      end
10 end
11 x^wm_T ← Inverse Fourier(x^F_T);
12 Image ← DDIM Sampling(x^wm_T);
13 return Image

Algorithm 2: METR message detection
Input: Image, radius R, p_0
Output: predicted watermark
1  x'_T ← DDIM Inversion(Image);
2  x'^F_T ← Fourier(x'_T);
3  if F_χ²(z(x'^F_T)) < p_0 then  // see Equation 1
4      return NO WM
5  end
6  m ← [];
7  for r = 1 to R do
8      mask_r ← circle of radius r;
9      if x'^F_T[mask_r].mean() > 0 then
10         m.append(1);
11     else
12         m.append(0);
13     end
14 end
15 return m

The message can be decoded by reverting the image to its noise using DDIM Inversion, followed by a transformation of the inverted image into the Fourier space.
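The ring-mask embedding of Algorithm 1 and the per-ring sign test of Algorithm 2 can be sketched in a few lines of numpy. This is a hedged illustration, not the paper's implementation: the latent size, S, R, and the message are made up, and the DDIM sampling/inversion steps are omitted, so the message is read straight back from the modified spectrum.

```python
import numpy as np

# Hedged numpy sketch of Algorithms 1-2: embed a binary message as signed
# concentric rings in the Fourier space of the noise and read it back from
# the per-ring mean sign. Latent size, S, R, and the message are invented,
# and DDIM sampling/inversion is skipped, so we decode from the spectrum.
rng = np.random.default_rng(2)
N, R, S = 64, 8, 100.0                        # latent size, radius, scaler
msg = [1, 0, 1, 1, 0, 0, 1, 0]                # binary message, len(msg) == R

yy, xx = np.mgrid[:N, :N]
dist = np.sqrt((yy - N / 2) ** 2 + (xx - N / 2) ** 2)

def ring(r):
    return (dist >= r - 1) & (dist < r)       # one-pixel-wide ring of radius r

x_T = rng.normal(size=(N, N))                 # initial noise x_T ~ N(0, I)
xF = np.fft.fftshift(np.fft.fft2(x_T))        # Fourier(x_T), centered
for r in range(1, R + 1):                     # Algorithm 1, lines 3-10
    xF[ring(r)] = S if msg[r - 1] == 1 else -S

# Algorithm 2, lines 7-14: the sign of the mean over each ring gives the bit
decoded = [int(xF[ring(r)].real.mean() > 0) for r in range(1, R + 1)]
assert decoded == msg
```

In the full pipeline, DDIM inversion only approximates x_T, so the per-ring means are noisy and decoding can fail under strong attacks, which is what the experiments in Section 4 quantify.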
Then, we determine whether the image was watermarked or not by evaluating the p-value using Equation 1 and comparing it with a previously defined threshold p_0. The decryption process for a binary message requires determining the sign of the value for each circle, obtained by averaging all values across the circle. The corresponding pseudocode for message decryption is shown in Algorithm 2.

Figure 4: Original image and corresponding images with a METR watermark, generated with a fixed scale S but different radii r: (a) no WM, (b) r = 4, (c) r = 16, (d) r = 34.

Figure 5: Original image and corresponding images with a METR watermark, generated with a fixed radius r but different scales S: (a) no WM, (b) S = 60, (c) S = 100, (d) S = 140.

3.2. Detection Resolution Metric
In this section, we describe how to select the watermark radius r and message scaler S, the parameters of the METR watermarking algorithm introduced in Section 3.1, and propose the Detection Resolution metric for selecting the scale parameter. Increasing the radius increases the number of potential messages, and enlarging the scale makes the message more detectable. However, increasing either parameter also produces corrupted images marked by various visible artifacts; see Figure 4 and Figure 5 for examples. When aiming to encrypt a desired number of messages with the proposed algorithm, it is advisable to estimate the upper bound of possible messages when selecting the radius parameter r. Utilizing the smallest suitable radius is recommended in order to preserve the highest possible image quality. When choosing the message scaler S, we recommend finding a balance that ensures precise watermark detection and high image quality, using the proposed Detection Resolution metric. This metric is designed to capture the contrast between watermarked images and those without watermarks.
It is based on the detection distance [20], which is essentially an average per-pixel error between the true watermark and the restored watermark: $d_{det}(x_0) = \frac{1}{|M|} \sum_{i=1}^{|M|} |\mathrm{WM}_i - \mathrm{Detect}(x_0)_i|$, where M is the watermarked area, WM is the true watermark, x_0 is the original image with a possible watermark, and the "Detect" function is the Fourier transform of the DDIM Inversion of its argument.

Figure 6: METR++ pipeline, where messages are divided into groups. METR is used to encode messages inside a group, and Stable Signature [24] is used to encode the group itself.

The Detection Resolution metric computes the difference in detection distances between an image without a watermark, x_0, and an image with one, x*_0:

$R_{det}(x_0, x_0^*) = d_{det}(x_0) - d_{det}(x_0^*) \qquad (2)$

A binary message m is considered accurately detected if the inverse-process error does not negate the value of S. Consistent with Algorithm 3, this means that S - d_det(x*_0) > 0 should hold for watermarked images, and S - d_det(x_0) < 0 for non-watermarked ones. Furthermore, it is essential that the detection resolution metric, which is the error contrast between watermarked and non-watermarked images, is relatively large compared to S. To specify this, we computed g = R_det / S for situations with perfect message detection, and found that g ≈ kS + b, where k and b are linear fit parameters. This means that to prevent detection collisions, we need g ≥ kS + b, or equivalently R_det / (kS² + bS) ≥ 1. During early experiments, we found the most reliable values to be k = -2.23 · 10⁻³ and b = 0.653.
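Putting these conditions together, checking whether a candidate scale S is acceptable reduces to a few arithmetic comparisons. A minimal sketch with the fitted k and b reported above; the detection distance values are invented for illustration:

```python
# Hedged sketch of the S acceptance test from Section 3.2, using the fitted
# constants k, b reported in the text. d0 / d0_wm stand for the detection
# distances of a non-watermarked / watermarked image; the numeric values
# below are invented for illustration.
k, b = -2.23e-3, 0.653

def accepts(S, d0, d0_wm):
    det_res = (d0 - d0_wm) / (k * S**2 + b * S)  # R_det / (k S^2 + b S)
    return d0 > S and d0_wm < S and det_res >= 1.0

assert accepts(S=100.0, d0=120.0, d0_wm=40.0)     # criterion satisfied
assert not accepts(S=100.0, d0=90.0, d0_wm=40.0)  # fails the d0 > S check
```

An S that passes this check is then validated on image quality over the full test set, as described next.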
Since the error of the DDIM Inversion [30] process with a fixed model remains relatively constant with respect to the chosen image, we can assess the detection accuracy for a given value of S using just a single pair (x_0, x*_0). This way, we reduce the selection of S to defining the possible range of its values and selecting the best one based on a subset of images. The algorithm consists of two steps. Initially, it measures the detection resolution using images generated with and without a watermark from the same noise. Then, if the criterion is satisfied, it assesses the image quality across the full test dataset. See Algorithm 3 for details.

Algorithm 3: Parameter S search
Input: latent noise x_T ∼ N(0, I), test dataset D = ⟨prompt, x_true⟩, image quality threshold Θ, k, b, radius r
Output: best message scaler S
1  x^F_T ← Fourier(x_T);
2  p_min, p_max ← min(x^F_T), max(x^F_T);
3  for S = p_min to p_max, with a given step do
4      x_0 ← Generate(x_T);
5      x*_0 ← Generate(METR(x_T, S, r));
6      d_0, d*_0 ← d(x_0), d(x*_0);
7      Det_Res ← R_det(d_0, d*_0) / (kS² + bS);
8      if d_0 > S and d*_0 < S and Det_Res ≥ 1 then
9          D_test ← Generate from prompts in D with WM;
10         L ← Eval(D_test, D)  // e.g., FID;
11         if L < Θ then
12             return S
13         end
14     end
15 end

3.3. METR++
Although METR already provides support for encoding messages into images, the selected radius r still restricts their number. To overcome this, we developed METR++, an extension of the original METR algorithm. METR++ is designed to significantly increase the original algorithm's capacity for encoding unique messages, while being limited in terms of the models it can be applied to. The extended algorithm can only be used with Latent Diffusion [11], which is currently the most commonly used image generation architecture [11, 2, 10]. METR++ incorporates two watermarks into an image.
As demonstrated in Figure 6, it adopts the METR watermarking algorithm (Section 3.1), augmented by a VAE decoder derived from the latent diffusion model [11], as proposed in Stable Signature [24]. To encode multiple messages, they are first categorized into groups with a capacity of 2^r, a size that reflects the number of potential unique messages in the METR watermarking algorithm. Each group is then assigned a distinct ID and uses a specifically fine-tuned VAE decoder designed to encode the group's ID as a Stable Signature watermark. Consequently, METR++ expands METR's capacity for encoding unique messages, multiplying it by the number of specially fine-tuned VAE decoders, one per group. The identification process consists of two parts that can be done in parallel. The first is the decryption of the group's unique key via the Stable Signature approach. The second is the METR message decoding, which identifies a user within a group. Decrypting the METR message requires carrying out the inverse diffusion process on the image latent, which is obtained with the VAE encoder part of the LDM, whose weights stay intact. To conclude, METR++ can be adapted to any Latent Diffusion model and can encrypt approximately 2^r · n unique messages, where r is METR's watermark radius and n is the number of fine-tuned VAE decoders.

4. Experiments
In this section, we describe our experimental setup and present the results of experiments conducted using METR and METR++.

4.1. Experimental Setup
In all experiments, we utilize the base version of Stable Diffusion 2.1 [11] and limit the generation process to 40 steps of DDIM [30] sampling. The inverse diffusion process used for METR message detection was run with no prompt for the same number of steps as the generation process. The guidance scale was set to 7.5. For the baseline, we selected the Tree-Ring [20] and Stable Signature [24] watermarking algorithms for several reasons.
In this paper, we aim to present a robust watermarking method capable of encrypting multiple messages. Tree-Ring [20] is considered a state-of-the-art algorithm when it comes to robustness to various attacks [21]. To our knowledge, Stable Signature is the only method capable of encrypting messages non-post-generation, i.e., where watermarking happens during the generation process [24]. Finally, METR and METR++ build upon these existing watermarking algorithms.

We compare the models based on three main criteria: the accuracy of watermark detection, the accuracy of watermarked message decryption, and image quality, since we aim to ensure that the encrypted message does not degrade the generated image. The accuracy of watermark detection is measured by the False Positive Rate (FPR), True Positive Rate (TPR), Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC), and the TPR value when FPR equals 1%, denoted as "TPR@1%FPR". The quality of message detection is assessed using Bit Accuracy, Word Accuracy (the accuracy of fully detected messages, as proposed in [21]), and the detection resolution metric described in Section 3.2.

Figure 7: METR evaluation of image quality with different radii.

To assess image quality, we use FID (Fréchet Inception Distance) [36] and the CLIP score [37]. FID is calculated by comparing the generated watermarked image to its corresponding ground-truth pair from the dataset, denoted as FID gt, or to a generated image created with the same prompt but without a watermark, denoted as FID gen. Comparison with the ground-truth image indicates overall quality, while comparison with the generated image illustrates the impact of watermark encryption on the generation process. For the CLIP score, we measure the cosine similarity between the embedding of the watermarked image and the embedding of the reference prompt, using OpenCLIP-ViT/G [38]. We utilized the MSCOCO-5000 dataset [39] to evaluate FID.
This dataset consists of 5000 paired images and prompts. To assess watermark detection accuracy, message detection quality, and image quality using CLIP, images were generated using a subset of 1000 randomly selected prompts from the Stable Diffusion prompts dataset [40].

4.2. METR Evaluation
In this subsection, we describe a set of experiments that compare METR with Tree-Ring. We focus on evaluating their robustness to white-box attacks, including ones that utilize generative models (as described in Section 2.4), their watermark detection accuracy, and the overall quality of the generated images. To select proper METR parameters, we first searched for the optimal message scale S with the algorithm described in Section 3.2 in the range 60 ≤ S ≤ 160 with multiple different radii. The resulting optimal value was always between 80 and 100. For the rest of our experiments, S was set to 100 unless specified otherwise. Regarding the message radius r, we evaluated image quality for different radii on a subset of the MSCOCO-5000 dataset with 500 images. See Figure 7 for results. It is clear that keeping the radius as small as possible benefits the quality of the resulting image. However, we argue that increasing the radius to 16 can still maintain acceptable overall quality. The benefit of doing so is that METR can then encode up to 2^16 = 65536 messages. In our experiments, we set r = 10, which allowed us to encode 1024 messages with almost no loss in quality.

Table 1
Detection metrics for Tree-Ring and METR. R_det denotes the detection resolution metric. Note that for most of the white-box attacks METR is as resilient as, or even more resilient than, Tree-Ring.
Attack          | Tree-Ring AUC (↑) | METR AUC (↑)   | METR Bit Acc (↑) | METR Word Acc (↑) | R_det (↑)
No attack       | 1.000 (+0.0%)     | 1.000 (+0.0%)  | 0.991            | 0.910             | 42.7
VAE [34] q = 1  | 0.999 (-0.1%)     | 0.999 (-0.1%)  | 0.968            | 0.747             | 22.4
diff, 150 steps | 0.952 (-4.8%)     | 0.999 (-0.1%)  | 0.967 (-3.3%)    | 0.732             | 21.3
rotate 75       | 0.959 (-4.1%)     | 0.907 (-9.3%)  | 0.694            | 0.033             | 5.1
brightness 6.0  | 0.994 (-0.6%)     | 0.995 (-0.5%)  | 0.912            | 0.463             | 26.0
noise σ = 0.1   | 0.941 (-5.9%)     | 0.833 (-16.7%) | 0.695            | 0.134             | 7.9
blur r = 4      | 1.000 (-0.0%)     | 1.000 (-0.0%)  | 0.979            | 0.805             | 34.1
crop 0.75       | 0.911 (-8.9%)     | 0.807 (-19.3%) | 0.668            | 0.020             | 5.4
JPEG 25         | 0.995 (-0.5%)     | 0.992 (-0.8%)  | 0.953            | 0.719             | 27.0

Detection accuracy under image transformation attacks. The detection metrics for white-box attacks [21] on METR and Tree-Ring are presented in Table 1. Since Tree-Ring cannot encode messages, Bit Accuracy and Word Accuracy are only calculated for METR. As demonstrated by the results in Table 1, METR is as resilient as, or even more resilient than, Tree-Ring against most image transformations. For both algorithms, rotation and cropping are the most challenging types of attacks.

Table 2
Message decryption accuracy for METR++ and its components: the METR and Stable Signature watermarks. Note that the word accuracy of METR++ is constrained by the lower word accuracy of either the METR or Stable Signature watermark.

                | METR: ft VAE       | Stable-Signature   | METR++
Attack          | Bit Acc | Word Acc | Bit Acc | Word Acc | Bit Acc | Word Acc
None            | 0.991   | 0.914    | 0.995   | 0.843    | 0.995   | 0.772
VAE q = 1       | 0.970   | 0.760    | 0.490   | 0.000    | 0.572   | 0.000
diff, 150 steps | 0.966   | 0.719    | 0.477   | 0.000    | 0.561   | 0.000
rot 75          | 0.687   | 0.029    | 0.547   | 0.000    | 0.572   | 0.000
bright 6.0      | 0.909   | 0.460    | 0.904   | 0.264    | 0.905   | 0.209
noise σ = 0.1   | 0.694   | 0.137    | 0.539   | 0.000    | 0.565   | 0.000
blur r = 4      | 0.981   | 0.816    | 0.419   | 0.000    | 0.516   | 0.000
crop 0.75       | 0.669   | 0.023    | 0.982   | 0.569    | 0.928   | 0.013
JPEG 25         | 0.941   | 0.672    | 0.769   | 0.000    | 0.799   | 0.000

Detection accuracy under generative model attacks.
We performed a diffusion attack using the base version of Stable Diffusion 2.1 [11] and a VAE attack using the model from [34] with a quality hyperparameter q = 1. The detection metrics for images generated with the Tree-Ring and METR watermarks under the diffusion attack can be seen in Figure 8. As the chart shows, the AUC and TPR@1%FPR for METR are higher than those for Tree-Ring and decrease more slowly with the number of diffusion steps. Similar results are shown for the VAE attack in Figure 9, parametrized by q: Tree-Ring detection metrics are below 1 until q = 4, while METR is robust to this attack and the watermark is almost always detected correctly. The Tree-Ring algorithm is not capable of message encryption, and thus word and bit accuracy are shown only for the METR algorithm.

Figure 8: Watermark detection (a) and decryption (b) for Tree-Ring and METR under the diffusion attack [34]. Note that the watermark detection metrics for METR are always higher than for Tree-Ring and decrease more slowly with the number of diffusion steps.

Figure 9: Robustness of Tree-Ring and METR under a VAE attack [34]: watermark detection (a) and decryption (b). Note that the detection metrics for Tree-Ring are below 1 until q = 4, while for METR they always equal 1.

Image quality. The results of comparing METR with Tree-Ring are presented in Table 3. For this experiment, we sampled a random message for the METR algorithm. The quality of images generated with the METR watermark is close to that of images with the Tree-Ring watermark. In terms of the CLIP score, the performance of Tree-Ring is similar to that of an image without a watermark, while it drops by 2% for METR. When it comes to the FID score, the relative increase for images generated with the Tree-Ring and METR watermarks over those without a watermark is 1.7% and 3.2%, respectively.
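For reference, the message-level metrics used in the tables above, bit accuracy and word accuracy [21], can be sketched as follows; the example messages are invented.

```python
# Hedged sketch of the message-level metrics from Section 4.1: bit accuracy
# is the fraction of correctly recovered bits, word accuracy the fraction of
# fully recovered messages [21]. The example messages are invented.
def bit_accuracy(true_msgs, pred_msgs):
    pairs = [(t, p) for tm, pm in zip(true_msgs, pred_msgs)
             for t, p in zip(tm, pm)]
    return sum(t == p for t, p in pairs) / len(pairs)

def word_accuracy(true_msgs, pred_msgs):
    return sum(tm == pm for tm, pm in zip(true_msgs, pred_msgs)) / len(true_msgs)

true_msgs = [[1, 0, 1, 1], [0, 0, 1, 0]]
pred_msgs = [[1, 0, 1, 1], [0, 1, 1, 0]]  # one flipped bit in the second word

assert bit_accuracy(true_msgs, pred_msgs) == 7 / 8
assert word_accuracy(true_msgs, pred_msgs) == 1 / 2
```

Word accuracy is the stricter metric: a single flipped bit zeroes out the whole message, which explains the low word-accuracy entries under the harsher attacks.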
In conclusion, the METR algorithm demonstrates high resilience to white-box attacks. Its watermark detection accuracy, both without attacks and under white-box attacks, is very close to that of Tree-Ring, and surpasses Tree-Ring under attacks with generative models [21]. As for image quality, we observed only a slight decrease with the METR algorithm compared to Tree-Ring, with less than a 2% difference. However, METR makes it possible to encode multiple unique messages in a watermark. The decrypted messages showed high accuracy under the majority of white-box attacks, with the exceptions of rotation and cropping; the word accuracy without any attack reached 0.91.

Table 3
Image quality for images generated without a watermark and with the Tree-Ring, METR, and METR++ watermarks. Note that images generated with Tree-Ring produce results nearly identical to those generated with METR, although Tree-Ring scores are slightly higher.

Method     Message  FID gen (↓)     FID gt (↓)      CLIP (↑)
No WM      -        -               25.570 (+0.0%)  0.364 (-0.0%)
Tree-Ring  -        10.283 (+0.0%)  26.007 (+1.7%)  0.364 (-0.0%)
METR       +        10.971 (+6.7%)  26.398 (+3.2%)  0.357 (-1.9%)
METR++     +        11.017 (+6.9%)  25.206 (-1.0%)  0.365 (+0.3%)

4.3. METR++ Evaluation
In this section, we evaluate the robustness of METR++ against white-box attacks and compare its detection metrics with those of METR and the Stable Signature watermark [24]. As detailed in Section 3.3, METR++ combines the METR watermark with a VAE decoder fine-tuned to encrypt a 48-bit Stable Signature [24] message. Our evaluation therefore begins by investigating whether embedding the METR watermark in the Fourier space of the latent noise decreases detection resilience.
To assess this, we fine-tune the VAE decoder on two image distributions: images generated from standard latent noise and images whose latent noise contains the embedded METR watermark. We then decode the encoded messages using both the standard fine-tuned VAE decoder and the one fine-tuned on images with the METR watermark. The results are presented in Table 4. The term 𝒩-METR denotes the VAE decoder that was fine-tuned on images sampled from a normal distribution and subsequently applied to images carrying the METR watermark. The bit accuracy of the decrypted Stable Signature message in METR++ is not affected by whether the VAE decoder was fine-tuned on images sampled from latent noise with the embedded METR watermark. Under attacks, the bit accuracy of the decrypted Stable Signature message remains almost constant, changing by less than 1% compared to basic Stable Signature.

Table 4
Bit accuracy of the Stable Signature watermark. Note that the bit accuracy of the Stable Signature message decrypted from images containing a METR watermark remains nearly constant, varying by less than 1% compared to images without a METR watermark.

VAE decoder   no attack   crop 0.1        bright 2        jpeg 50
𝒩-𝒩           0.997       0.969 (+0.0%)   0.990 (+0.0%)   0.870 (+0.0%)
𝒩-METR        0.997       0.972 (+0.3%)   0.991 (+0.1%)   0.867 (-0.3%)
METR-𝒩        0.997       0.971 (+0.3%)   0.989 (-0.1%)   0.866 (-0.4%)
METR-METR     0.997       0.974 (+0.5%)   0.990 (-0.0%)   0.863 (-0.8%)

The detection accuracy of METR++ comprises the detection accuracies of both the Stable Signature and METR watermarks. Table 2 presents the results of this evaluation. The term “METR: ft VAE” refers to the accuracy of detecting the METR watermark when the image passes through the METR++ pipeline, and “Stable-Signature” refers to the accuracy of decrypting that watermark within the METR++ pipeline.
Lastly, “METR++” denotes the overall detection accuracy of the complete METR++ message, which includes both the METR and Stable Signature messages. The detection accuracy of the METR watermark remains unaffected by the METR++ pipeline, as the image decoded by the fine-tuned decoder closely resembles the one generated by the original VAE decoder. Consequently, both the bit and word accuracy of METR watermark detection, as previously shown in Table 1, remain robust against most white-box attacks, aside from rotation and cropping. Stable Signature, on the other hand, is vulnerable to white-box attacks, which in turn impacts the word accuracy of METR++. In Table 2, we highlighted the higher word accuracy between the METR and Stable Signature watermarks. It is important to note that the robustness of METR++ in terms of word accuracy is limited by the less robust of the two watermarks, since both must be decrypted correctly; in terms of bit accuracy, METR++ is highly correlated with the Stable Signature component, because it carries more bits than the METR component. In terms of the potential number of messages, we use 58 bits for each message, so it is possible to encode 2^58 unique messages. When considering the practical application of either METR or METR++, it is important to evaluate the trade-off between the capability to encode numerous unique messages and the resilience of the watermarking method against white-box attacks.

5. Future Work
In this study, we assessed the detection accuracy of METR and METR++ against several white-box attacks. METR demonstrated high resilience to most of the attacks, including state-of-the-art ones. We also propose researching methods to fine-tune the weights of the generative model to further enhance the robustness of the METR watermarking algorithm.
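Two METR++ properties from Section 4.3 can be made concrete before turning to its future directions: a METR++ message counts as correct only when both component messages decode exactly, so its word accuracy is bounded by the weaker component, and the 58-bit payload admits 2^58 messages. In the sketch below, the 48/10 bit split is our reading of the 58-bit total given the 48-bit Stable Signature message, and the per-image flags are toy data:

```python
import numpy as np

# Per-image flags: did each component message decode with zero bit errors?
metr_ok = np.array([True, True, False, True])  # METR message recovered
ss_ok = np.array([True, False, False, True])   # Stable Signature recovered

# A METR++ message is correct only when BOTH components are recovered,
# so its word accuracy cannot exceed that of the weaker component.
metrpp_word_acc = float((metr_ok & ss_ok).mean())
print(metrpp_word_acc)  # 0.5 here, vs 0.75 (METR) and 0.5 (Stable Signature)

# Capacity: 48 Stable Signature bits + 10 METR bits = 58 bits in total.
ss_bits, metr_bits = 48, 10
print(2 ** (ss_bits + metr_bits))  # 288230376151711744 (= 2^58) messages
```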
As for METR++, this method showed low robustness to most attacks because its word accuracy is constrained by the less robust of its two messages, and the Stable Signature message is generally less robust to attacks than the METR message. Therefore, to increase the encoding capacity of the METR watermarking algorithm, we suggest exploring alternative watermarking methods that could replace Stable Signature, or investigating modifications of Stable Signature within METR++ that improve its detection accuracy under attacks. Another area that needs further exploration is the impact of specific messages on the quality of the generated images. Our findings indicate that the quality of images generated with an average METR message is comparable to those produced with the Tree-Ring watermark. However, in real-world applications, it is essential to examine various combinations of binary values to determine whether certain messages with “extreme values” might significantly affect image quality. This consideration could be crucial for commercial projects aiming to ensure consistent image generation quality for all their users.

6. Conclusion
In this paper, we introduced the METR watermarking algorithm, which can be applied to any diffusion model architecture with a non-probabilistic sampler (e.g., DDIM [30]). Building upon the Tree-Ring watermarking algorithm, METR retains its robust detection under white-box attacks and high image quality. The most significant advantage of METR is its ability to encrypt multiple unique messages without fine-tuning the model’s weights. This positions METR as a leading watermarking algorithm in terms of robustness, image quality, and message encoding capacity.
We also propose an extension of METR, named METR++, which is specifically tailored to Latent Diffusion Models and requires additional fine-tuning of a VAE decoder for each new user group. METR++ multiplies the potential number of encoded messages by the number of fine-tuned VAE decoders. Comparing METR with its extension METR++ shows that the decision of which algorithm to use in practice should balance the total number of messages that can be encoded against the watermark’s overall robustness to attacks. The implementation of our work can be found at https://github.com/deepvk/metr.

References
[1] CivitAI, Civitai, 2024. URL: https://civitai.com/.
[2] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, M. Chen, Hierarchical text-conditional image generation with CLIP latents, 2022. arXiv:2204.06125.
[3] F. Khader, G. Mueller-Franzes, S. T. Arasteh, T. Han, C. Haarburger, M. Schulze-Hagen, P. Schad, S. Engelhardt, B. Baessler, S. Foersch, J. Stegmaier, C. Kuhl, S. Nebelung, J. N. Kather, D. Truhn, Medical diffusion: Denoising diffusion probabilistic models for 3D medical image generation, 2023. arXiv:2211.03364.
[4] G. Somepalli, V. Singla, M. Goldblum, J. Geiping, T. Goldstein, Diffusion art or digital forgery? Investigating data replication in diffusion models, 2022. arXiv:2212.03860.
[5] S. Lee, G. Gu, S. Park, S. Choi, J. Choo, High-resolution virtual try-on with misalignment and occlusion-handled conditions, 2022. arXiv:2206.14180.
[6] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. URL: https://doi.org/10.1038/nature14539. doi:10.1038/nature14539.
[7] D. P. Kingma, M. Welling, Auto-encoding variational Bayes, 2022. arXiv:1312.6114.
[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, 2014. arXiv:1406.2661.
[9] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, 2020. arXiv:2006.11239.
[10] A. Razzhigaev, A. Shakhmatov, A.
Maltseva, V. Arkhipkin, I. Pavlov, I. Ryabov, A. Kuts, A. Panchenko, A. Kuznetsov, D. Dimitrov, Kandinsky: an improved text-to-image synthesis with image prior and latent diffusion, 2023. arXiv:2310.03502.
[11] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, 2022. arXiv:2112.10752.
[12] Pinterest, Pinterest, 2024. URL: https://www.pinterest.com/.
[13] C. Novelli, F. Casolari, P. Hacker, G. Spedicato, L. Floridi, Generative AI in EU law: Liability, privacy, intellectual property, and cybersecurity, 2024. arXiv:2401.07348.
[14] A. Bashardoust, S. Feuerriegel, Y. R. Shrestha, Comparing the willingness to share for human-generated vs. AI-generated fake news, 2024. arXiv:2402.07395.
[15] D. Benalcazar, J. E. Tapia, S. Gonzalez, C. Busch, Synthetic ID card image generation for improving presentation attack detection, 2022. arXiv:2211.00098.
[16] N. Raman, S. Shah, M. Veloso, Synthetic document generator for annotation-free layout recognition, Pattern Recognition 128 (2022) 108660. URL: http://dx.doi.org/10.1016/j.patcog.2022.108660. doi:10.1016/j.patcog.2022.108660.
[17] Y. Qu, X. Shen, X. He, M. Backes, S. Zannettou, Y. Zhang, Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models, 2023. arXiv:2305.13873.
[18] I. Cox, M. Miller, J. Bloom, J. Fridrich, T. Kalker, Digital Watermarking and Steganography, 2nd ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2007.
[19] C.-C. Chang, P. Tsai, C.-C. Lin, SVD-based digital image watermarking scheme, Pattern Recognition Letters 26 (2005) 1577–1586. URL: https://www.sciencedirect.com/science/article/pii/S0167865505000140. doi:10.1016/j.patrec.2005.01.004.
[20] Y. Wen, J. Kirchenbauer, J. Geiping, T. Goldstein, Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust, 2023. arXiv:2305.20030.
[21] X. Zhao, K. Zhang, Z. Su, S. Vasan, I. Grishchenko, C. Kruegel, G. Vigna, Y.-X.
Wang, L. Li, Invisible image watermarks are provably removable using generative AI, 2023. arXiv:2306.01953.
[22] Y. Li, H. Wang, M. Barni, A survey of deep neural network watermarking techniques, 2021. arXiv:2103.09274.
[23] R. B. Wolfgang, E. J. Delp, A watermark for digital images, in: Proceedings of 3rd IEEE International Conference on Image Processing, volume 3, IEEE, 1996, pp. 219–222.
[24] P. Fernandez, G. Couairon, H. Jégou, M. Douze, T. Furon, The stable signature: Rooting watermarks in latent diffusion models, 2023. arXiv:2303.15435.
[25] J. Zhu, R. Kaplan, J. Johnson, L. Fei-Fei, HiDDeN: Hiding data with deep networks, 2018. arXiv:1807.09937.
[26] K. A. Zhang, L. Xu, A. Cuesta-Infante, K. Veeramachaneni, Robust invisible video watermarking with attention, 2019. arXiv:1909.01285.
[27] M. Tancik, B. Mildenhall, R. Ng, StegaStamp: Invisible hyperlinks in physical photographs, 2020. arXiv:1904.05343.
[28] Z. Jiang, J. Zhang, N. Z. Gong, Evading watermark based detection of AI-generated content, 2023. arXiv:2305.03807.
[29] Y. Liu, Z. Li, M. Backes, Y. Shen, Y. Zhang, Watermarking diffusion model, 2023. arXiv:2305.12502.
[30] J. Song, C. Meng, S. Ermon, Denoising diffusion implicit models, 2022. arXiv:2010.02502.
[31] P. B. Patnaik, The non-central 𝜒²- and 𝐹-distribution and their applications, Biometrika 36 (1949) 202–232. URL: https://www.jstor.org/stable/2332542. doi:10.2307/2332542.
[32] P. Glasserman, Monte Carlo Methods in Financial Engineering, volume 53 of Stochastic Modelling and Applied Probability, Springer, New York, NY, 2003. URL: http://link.springer.com/10.1007/978-0-387-21617-1. doi:10.1007/978-0-387-21617-1.
[33] S. Peng, Y. Chen, C. Wang, X. Jia, Intellectual property protection of diffusion models via the watermark diffusion process, 2023. arXiv:2306.03436.
[34] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston, Variational image compression with a scale hyperprior, 2018. arXiv:1802.01436.
[35] E. O. Brigham, R. E.
Morrow, The fast Fourier transform, IEEE Spectrum 4 (1967) 63–70. doi:10.1109/MSPEC.1967.5217220.
[36] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, 2018. arXiv:1706.08500.
[37] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, I. Sutskever, Learning transferable visual models from natural language supervision, 2021. arXiv:2103.00020.
[38] M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, J. Jitsev, Reproducible scaling laws for contrastive language-image learning, 2022. arXiv:2212.07143.
[39] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft COCO: Common objects in context, in: European conference on computer vision, Springer, 2014, pp. 740–755.
[40] Gustavosta, Stable diffusion prompts, 2022. URL: https://huggingface.co/datasets/Gustavosta/Stable-Diffusion-Prompts.