Towards Image Data Hiding via Facial Stego Synthesis With Generative Model

Li Dong1,2, Jie Wang1,2, Rangding Wang1,2, Yuanman Li3 and Weiwei Sun4
1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang, China, 315211
2 Southeast Digital Economic Development Institute, Zhejiang, China, 324000
3 Shenzhen University, Guangdong, China, 518061
4 Alibaba Group, Zhejiang, China, 310052

International Workshop on Safety & Security of Deep Learning, 21st-26th August, 2021
dongli@nbu.edu.cn (L. Dong); 1811082196@nbu.edu.cn (J. Wang); wangrangding@nbu.edu.cn (R. Wang); yuanmanli@szu.edu.cn (Y. Li); sunweiwei.sww@alibaba-inc.com (W. Sun)
Β© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract

Stego synthesis-based data hiding aims to directly produce a plausible natural image that conveys a secret message. However, most existing works neglect the communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator, and a forensic network. Specifically, the generator produces a realistic facial stego image from the secret message and a secret key, while the extractor recovers the secret message from the stego image given the secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which guides the update of the generator. Three degradation layers are further incorporated, forcing the generator to account for communication degradations. Experimental results demonstrate that the proposed framework can accurately extract the secret message and effectively resist forensic detection and certain degradations, while producing realistic facial stego images.

Keywords

data hiding, stego synthesis, generative adversarial network

1. Introduction

Data hiding aims to embed a secret message into a cover signal without raising the awareness of an adversary. It is widely used in many applications, e.g., covert communication [1] and multimedia data protection [2, 3]. The primitive ad-hoc Least-Significant-Bit (LSB) method replaces the bit in the least significant bit-plane of each pixel with a secret bit, whereas modern data hiding methods attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography [1] designs a sophisticated distortion function according to prior knowledge and uses Syndrome-Trellis coding to embed the secret message. Recently, neural network-based data hiding has become one of the most active research directions. Baluja [4] employed convolutional neural networks to hide an entire secret image inside a cover image in an end-to-end fashion. SSGAN [5] attempted to exploit a GAN to synthesize a cover image that is more suitable for the subsequent steganographic embedding. ASDL-GAN [6] integrated content-adaptive steganography and GAN, in which the generator produces the modification probability maps. HayersGAN [7], HiDDeN [8], and SteganoGAN [9] all designed encoder-decoder-like frameworks based on GANs; these methods automatically learn suitable areas for embedding the secret bitstream.

In recent years, adversarial examples against neural networks have met data hiding and continuously drawn extensive attention from the community. Some studies, e.g., [10, 11], found that adding slight perturbations to the input data can paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose data hiding on a stego signal and usually involves machine-learning classifiers. It is therefore possible for data hiding methods to bypass steganalysis by borrowing strategies from adversarial-example works. Tang et al. [12] presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego images could effectively fool the steganalytic network, revealing the vulnerability of deep learning-based steganalyzers.

Note that all the aforementioned data hiding techniques are based on cover modification. Their common characteristic is that they cannot avoid modifying the given cover image, which inevitably leaves artifacts exposed to steganalysis. On the contrary, stego synthesis-based data hiding, e.g., [13, 14], refers to synthesizing the stego image directly from the secret message, which poses more challenges for steganalysis. Under this concept, traditional methods tried to produce stego images based on hand-crafted designs. Although the capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative, some methods [15, 16] use GANs to synthesize stego images with rich semantics, e.g., faces and food.

Figure 1: Overview of the proposed FSIS-GAN framework. The generator G maps the secret message m and the secret key k to a synthesized stego image S; the degradation layers N produce the degraded stego image N(S), from which the extractor E recovers the message mβ€². A discriminator D (adversarial training, against genuine images I) and a forensic network Fπœƒ (anti-forensics) guide the generator.
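As a concrete contrast to the synthesis-based paradigm discussed above, the classical LSB replacement mentioned at the start of this section can be sketched in a few lines. This is a generic illustration of cover-modification embedding under our own toy values, not code from the paper:

```python
import numpy as np

def lsb_embed(cover: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Replace the least significant bit of the first len(bits) pixels."""
    flat = cover.flatten()  # flatten() returns a copy, so the cover is untouched
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | bits
    return flat.reshape(cover.shape)

def lsb_extract(stego: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the message back from the least significant bit-plane."""
    return stego.flatten()[:n_bits] & 1

cover = np.array([[120, 37], [54, 201]], dtype=np.uint8)
msg = np.array([1, 0, 1, 1], dtype=np.uint8)
stego = lsb_embed(cover, msg)
assert (lsb_extract(stego, 4) == msg).all()
```

Each pixel changes by at most 1, which is visually imperceptible but, as the introduction notes, leaves statistical traces that modern steganalyzers detect; synthesis-based methods avoid this by never touching a pre-existing cover.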
However, the accuracy of message extraction was unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.

In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, termed FSIS-GAN. Unlike cover modification-based data hiding methods, FSIS-GAN does not require a cover image beforehand. Compared with existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results on a public facial dataset validate these merits. The main contributions of this work can be summarized as follows:

β€’ We explicitly consider the image degradation during covert communication and integrate multiple degradation layers into the framework. This boosts the robustness of message extraction.
β€’ We incorporate a forensic network during the training of FSIS-GAN. By exploiting the gradients from this forensic network, the stego images produced by the learned generator can effectively fool it.
β€’ We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which further improves the reliability of secret message extraction.

The rest of this paper is organized as follows. Section 2 briefly reviews related work on stego synthesis-based data hiding. Section 3 describes the proposed FSIS-GAN, including the network architecture and loss functions. Section 4 presents the experimental results, and conclusions are drawn in Section 5.

2. Stego Synthesis-based Data Hiding

The majority of data hiding methods involve modification of a given cover image. However, such cover modification leaves embedding traces that can be detected. To resist detection by a steganalyzer, stego synthesis-based data hiding methods directly produce the stego image from the given secret message. As an early attempt, Wu et al. [13] proposed a texture image synthesis-based method, which selectively distributes the source patches of the original texture image onto the synthesized stego image; message hiding and extraction depend on the choice of source patches. Motivated by fingerprint biometrics, Li et al. [14] proposed to use a hologram phase constructed from the secret message to synthesize a fingerprint stego image. The hologram phase consists of two parts: the first, a spiral phase, encodes the secret message into two-dimensional points with different polarities, and the second, a continuous phase, is used to synthesize the fingerprint image. It is worth noting that such conventional stego image synthesis-based methods can only synthesize patterned stego images such as textures, lacking rich semantics, which limits their practical applications.

Instead, Hu et al. [15] suggested using the generator of a GAN to synthesize a facial stego image from the secret message, with the secret message extracted from the stego image by a corresponding extractor network. Similarly, Zhang et al. [16] exploited a GAN to generate stego images with different semantic labels, which improves the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of the GAN-based works is that they can synthesize stego images with rich semantics. However, we shall note that such stego images can be easily identified by well-trained forensic networks. In addition, these methods offer no trade-off between capacity and extraction accuracy.

3. Facial Image Data Hiding via Generative Stego Synthesis

In this section, we first give an overview of the proposed FSIS-GAN framework and then introduce each component, accompanied by a thorough discussion of the loss functions, network structures, and training procedure.

3.1. Overview of FSIS-GAN

The proposed FSIS-GAN framework is illustrated in Figure 1. In general, it is an end-to-end framework consisting of three parts, each designed to achieve a specific goal. First, the facial stego image synthesis and message extraction part contains a generator G, an extractor E, and the degradation layers N. The generator G converts the secret message, along with the secret key, into a facial stego image. The degradation layers N simulate common image degradations in the communication channel. The extractor E learns to recover the secret message from the degraded stego image. Second, the adversarial training part contains a discriminator D, which aims to distinguish genuine data samples from those produced by the generator G. Third, a well-trained existing forensic network Fπœƒ (parameterized by πœƒ) is introduced in the anti-forensics part, which distinguishes genuine images from synthesized facial stego images. Note that this target forensic network is treated as a fixed adversary, and its parameters are always frozen.

3.2. Stego Image Synthesis and Message Extraction

The facial stego image synthesis and message extraction part achieves two functionalities. First, using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure communication reliability and high diversity of the generated facial stego images.

Generally, the generator G and extractor E aim to learn two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1}^{l_m} and k ∈ {0, 1}^{l_k} be the binary secret message and the secret key, respectively. The generator G learns the first mapping, transforming the message m along with the secret key k into a stego image:

S = G(m, k),  (1)

where S denotes the synthesized facial stego image of shape C Γ— H Γ— W. To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should be from the degraded stego image, along with the secret key k, to the secret message:

mβ€² = E(N(S), k),  (2)

where N(β‹…) models the image degradation process and N(S) is the degraded stego image. Here, mβ€² ∈ (0, 1)^{l_m} denotes the extracted secret message. The extracted message mβ€² shall be (approximately) equal to the original secret message m, and one can employ an error-correcting mechanism to fully correct the erroneous bits.

To measure the distortion between the original secret message m and the extracted message mβ€², we use the cross-entropy loss as the message extraction loss L_E:

L_E(m, mβ€²) = βˆ’(1/l_m) Ξ£_{i=1}^{l_m} [m_i log(mβ€²_i) + (1 βˆ’ m_i) log(1 βˆ’ mβ€²_i)],  (3)

where m_i and mβ€²_i are the i-th elements of m and mβ€², respectively.

Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy Kerckhoffs' principle: even if the extractor network E is completely exposed to an attacker, the secret message m can be recovered only if the receiver obtains both the secret key k and the facial stego image S. It is worth emphasizing that most existing GAN-based methods, e.g., [15, 16], involve no secret key. Further notice that, as an input to the extractor E, the secret key k has far fewer dimensions than the facial stego image S; the extractor E thus tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to use a randomly generated incorrect secret key kΜƒ ∈ {0, 1}^{l_k}, where kΜƒ β‰  k, as an additional input during the training stage. Instead of only using the correct secret key and minimizing the difference between the extracted and original messages, we also maximize this difference when the incorrect secret key is applied. Mathematically, this inverse loss L_EΜƒ can be expressed as the negative cross-entropy:

L_EΜƒ(m, mΜƒβ€²) = (1/l_m) Ξ£_{i=1}^{l_m} [m_i log(mΜƒβ€²_i) + (1 βˆ’ m_i) log(1 βˆ’ mΜƒβ€²_i)],  (4)

where mΜƒβ€²_i is the i-th element of the message mΜƒβ€² extracted with the incorrect key kΜƒ, i.e., mΜƒβ€² = E(N(S), kΜƒ).

Enhancing robustness with degradation layers: In a practical communication channel, the synthesized stego image S is often degraded while being transmitted to the receiver. The data hiding system therefore requires a certain robustness to ensure the accuracy of message extraction. In this work, we take three representative degradations into account: noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely used noise models, Gaussian noise. For blurring, Gaussian blurring is used. For compression, JPEG image compression is employed, which is extensively used to reduce transmission bandwidth. In experiments, we implement these three degradations as neural network layers N that degrade the stego image, with one layer simulating each type of degradation. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate it with a differentiable polynomial function; this differentiation technique follows the work HiDDeN [8].

3.3. Adversarial Training Part

As aforementioned, hand-crafted stego synthesis-based data hiding methods [13, 14] can only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with semantics is a challenging task, but this problem can be alleviated with the guidance of adversarial training. In this part, the discriminator D conducts adversarial training with the generator G to improve the plausibility of the synthesized facial stego images.

More specifically, let I be a genuine facial image sample of shape C Γ— H Γ— W from a publicly available genuine facial image dataset. The discriminator D estimates the probability that a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, BEGAN [17] provides a good reference for improving training stability through its network structure and loss function; we therefore employ the adversarial training loss of BEGAN. Mathematically, the adversarial loss L_adv for the generator G can be calculated as

L_adv(D(S), S) = (1/CHW) [|D(S) βˆ’ S|],  (5)

where the output D(S) has the same shape as the facial stego image. The adversarial loss L_D for the discriminator D is

L_D(I, S) = (1/CHW) [|D(I) βˆ’ I| βˆ’ h_t β‹… |D(S) βˆ’ S|],  (6)

where h_t controls the discrimination ability of D at the t-th training step to equilibrate the adversarial training. It is updated as

h_{t+1} = h_t + (Ξ»/CHW) [Ξ³|D(I) βˆ’ I| βˆ’ |D(S) βˆ’ S|].  (7)

Here, Ξ» is the learning rate and Ξ³ is a hyper-parameter that controls the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning the parameter Ξ³.

3.4. Anti-forensics Part

Recall that no explicit cover image is involved in stego synthesis-based data hiding. This merit makes such methods effective at resisting conventional steganalysis detection. However, as pointed out in [15], a well-trained forensic network can readily distinguish a synthesized stego image from a genuine one, even when the synthesized image shows no perceptual differences to an observer.

Although Fπœƒ is an expert at such a detection task, some studies [10, 11] have shown that deep neural network-based classifiers are vulnerable to adversarial examples. Inspired by this, we propose to apply strategies for crafting adversarial examples to evade the stego detection network as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., we assume full knowledge of the target forensic network. The target forensic network Fπœƒ is trained with genuine images from a publicly available facial dataset and synthesized images produced by BEGAN [17]. We then integrate the well-trained Fπœƒ into the FSIS-GAN framework, where it receives the synthesized facial stego image S and outputs a confidence score. The gradients back-propagated through Fπœƒ are used to update the parameters of the generator G. To measure the loss of resisting forensic detection, we define the anti-forensic loss L_Fπœƒ as the cross-entropy between the output of Fπœƒ and the genuine image label:

L_Fπœƒ(S) = βˆ’log(1 βˆ’ Fπœƒ(S)),  (8)

where Fπœƒ(S) ∈ (0, 1) is the confidence output by Fπœƒ. Clearly, a decrease of L_Fπœƒ indicates an increased probability of S being identified as a genuine image by Fπœƒ.

3.5. Network Structure and Training Strategy

The network architectures of the generator G and the extractor E are shown in Figure 2. For the generator G, the secret key vector k is first concatenated with the secret message vector m and fed to the subsequent layers. G then applies two fully-connected (FC) layers and three transposed-convolution (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) [18] and the ReLU activation function to the intermediate tensors. In experiments, we found that because both m and k are composed of binary values 0 and 1, this form is not suitable as a direct input and causes the adversarial training loss to diverge. To solve this issue, additional BN layers were added so that normalization is carried out inside the network; experimental results show that this trick greatly alleviates the divergence problem.

Figure 2: Network structures of (a) the generator G and (b) the extractor E. "Concat", "FC", "ConvT", "BN", and "Conv" denote concatenation, fully-connected layer, transposed-convolution layer, batch normalization, and convolution layer, respectively.

For the extractor E, we must fuse the secret key vector k and the facial stego image S in a way that prevents the extractor from neglecting the information provided by the secret key. To this end, the extractor E first applies an FC layer to the secret key to form an intermediate matrix of shape 1 Γ— W Γ— H. The facial stego image S and this intermediate matrix are then concatenated, and the fused tensor is fed to four convolutional (Conv) layers. Finally, the extractor E applies an FC layer and a Sigmoid activation function to produce the message vector mβ€² (or mΜƒβ€²) of size 1 Γ— l_m. For the discriminator D, we adopt the auto-encoder-like structure from BEGAN [17]. For the target forensic network F, we use Ye-Net [19], a widely used steganalytic network.

The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except the well-trained forensic network Fπœƒ. We apply the extraction loss L_E and the adversarial loss L_D as the loss functions for the extractor E and the discriminator D, respectively. In particular, the total loss L_G for the generator G is a fusion of the four losses defined above:

L_G = L_adv + Ξ±(L_E + L_EΜƒ) + Ξ²L_Fπœƒ,  (9)

where the hyper-parameters Ξ± and Ξ² control the relative importance of the message extraction, inverse, and anti-forensic losses against the adversarial loss.

4. Experiment results

In this section, we first introduce the experimental setup. Then, to verify the robustness of the proposed FSIS-GAN, it is evaluated with and without image degradations. Finally, the anti-forensic capability of FSIS-GAN is validated.

4.1. Experimental Setup

Our experiments are conducted on the CelebA dataset [20], where the region containing the face is identified and extracted. All images are reshaped to 3 Γ— 64 Γ— 64. The following three metrics are used for evaluation:

β€’ FrΓ©chet Inception Distance (FID) [21]: a widely used perceptual quality metric for synthesized images and a de facto standard for assessing images created by GAN generators. A lower FID score indicates better consistency with human perception of natural images.
β€’ Accuracy of message extraction (ACC): computed as ACC = L_Ext / L, where L_Ext is the number of correctly extracted message bits and L is the length of the secret message m.
β€’ Probability of missed detection (PMD): computed as PMD = FN / (FN + TP), where FN (False Negative) is the rate at which a synthesized facial image is misclassified as a genuine one, and TP (True Positive) is the rate at which a synthesized facial image is correctly detected. A larger PMD indicates a higher ability to resist the forensic network.

The proposed FSIS-GAN framework is implemented in PyTorch and trained on four NVIDIA GTX 1080Ti GPUs with 11 GB memory each. The number of training epochs is set to 400 with a mini-batch size of 64. We use Adam [22] as the optimizer with a learning rate of 2 Γ— 10⁻⁴. For the hyper-parameters Ξ± and Ξ² in (9), after a number of trials, we empirically set both to 0.1. The parameter Ξ³ in (7) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most closely related work [15]; we implemented it ourselves because there is no publicly available code.
With certain tweaking and fine-tuning, the tested results were com- where Ladv is the adversarial loss for G, LE Μƒ is the in- parable to the originally reported data from [15]. For a verse loss, and LFπœƒ is the anti-forensic loss. The hyper- fair comparison, the length of the secret message π‘™π‘š and parameters of 𝛼 and 𝛽 are used to control the relative the secret key π‘™π‘˜ are all set to 100, so as to the payload is importance among the four losses. identical to that of work [15]. Table 1 Comparison of message extraction accuracy (%) for the case of no communication degradations. Here, k and kΜƒ denote the correct and incorrect secret key, respectively. FSIS-GAN- WD is a variant of the proposed method by excluding the degradation layers, and FSIS-GAN-WD (ex LE Μƒ ) represents the FSIS-GAN-WD trained without inverse loss LE Μƒ. Hu et al. FSIS-GAN-WD FSIS-GAN-WD (ex LEΜƒ) Figure 3: Comparison of exemplar synthesized stego images. Scheme [15] with k with kΜƒ with k with kΜƒ Top: Hu et al. [15]; Bottom: Proposed FSIS-GAN-WD. Accuracy 85.23 98.76 71.50 99.41 97.01 4.2. Performance Without Degradations would significantly deduce the extraction accuracy from Notice that the competing method [15] does not consider 97.01% to 71.50%, while FSIS-GAN-WD almost retains the image degradations. To verify the effectiveness of the same extraction accuracy. This phenomena means the proposed method under same settings and make a that the involvement of the secret key will not work if fair comparison. We in this subsection to evaluate the we exclude the inverse loss. In contrast, FSIS-GAN-WD performance without degradation layers N. The facial (ex LE Μƒ Μƒ ) with the incorrect key k still attains a quite high stego image S will be transmitted to extractor E without extraction accuracy of (> 97%). In a short summary, with- any degradation. 
To avoid confusion, this variation of out the inverse loss LE Μƒ , the variant FSIS-GAN-WD (ex our proposed method is termed as FSIS-GAN-WD (WD is LEΜƒ ) will violate the Kerckhsoffs’ principle. abbreviated for Without Degradations). We first compare the visual quality of the facial stego images with the competing method [15]. As can be seen from Figure 3, the 4.3. Performance With Degradations proposed FSIS-GAN-WD could synthesize more realistic In this subsection, we test the robustness performance facial stego images in comparison with Hu et al. [15]. of the proposed framework under certain image degra- With more careful inspection, one can notice that the dations. The image degradation type and level are given stego images produced by FSIS-GAN-WD are more vivid as prior knowledge. This scenario is common in practice and with more correct semantic structures. It is difficult because one can obtain some prior knowledge on the for a common human to aware the inauthenticity of the degradation through probing the communication chan- facial stego images synthesized by FSIS-GAN-WD. In nel. Thus, one can fix the degradation layers N and its contrast, the stego images generated by Hu et al. [15] are associated parameters during training stage. Specifically, typically blurry and severely distorted, which apparently in our experiments, the standard deviation 𝜎1 of the Gaus- draw attentions from a forensic analyzer. For the FID sian noise layer (GNL) is set to 0.2. The kernel width 𝑑 evaluation experiment, we use 10, 000 pairs of genuine and the standard deviation 𝜎2 of the Gaussian blurring images and synthesized facial stego images to compute layer (GBL) are set to 3 and 1, respectively. The differ- the FID score. The FID score of FSIS-GAN-WD is 23.20, entiable JPEG compression layer (JCL) is implemented which is much smaller than that of Hu et al. [15]’s 32.07. 
as suggested by the work HiDDEN [8] For referring sim- Then, we evaluate the extraction accuracy for the case plicity, this variation is termed as FSIS-GAN-FD (FD is of without degradation. The results are tabulated in Table abbreviated for Fixed Degradation) in the sequel. 1. To demonstrate the impact of the inverse loss LE Μƒ on Firstly, the stego images synthesized by FSIS-GAN-FD the extraction accuracy, the ablation experiments are also are provided in Figure 4. One can observe that some conducted, by excluding the inverse loss during training. speckle noises emerge in the generated stego images, This LE Μƒ -ablated version is denoted as FSIS-GAN-WD (ex which can be clearly seen from the highlighted regions LEΜƒ ). From the Table 1, one can draw the following con- with red line in Figure 4 (b). Quantitatively, the FID score clusions. First, the extraction accuracy of FSIS-GAN-WD of FSIS-GAN-FD is 41.40, which is inferior to that of with the correct secret key k is 98.76%, which dramati- FSIS-GAN-WD (23.20) and Hu et al. [15] (32.07). Never- cally outperforms 85.23% of the competing method [15]. theless, the stego images produced by FSIS-GAN-FD are Second, by comparing FSIS-GAN-WD and FSIS-GAN- intuitively more realistic than that of Hu et al. [15]. WD (ex LE Μƒ ), one can see that, the extraction accuracy of Secondly, in Table 2, we report the extraction accuracy FSIS-GAN-WD with a correct secret key k slightly infe- performance under fixed degradations. Not surprisingly, rior to that of FSIS-GAN-WD (ex LE Μƒ ). This suggests that one can notice that the extraction accuracy of Hu et al. the introduced inverse loss would marginally harm the [15] and FSIS-GAN-WD greatly degrade, which can be extraction accuracy. However, when comparing the case attributed to the overlooking on degradation-resistant of incorrect key k,Μƒ the participation of the inverse loss LEΜƒ message extraction issue. In contrast, FSIS-GAN-FD ex- 100 Hu et al. 
FSIS-GAN-WD FSIS-GAN-FD 90 Message extraction accuracy 80 70 60 (a) (b) Figure 4: The comparison of synthesized facial stego images, 50 95 75 55 45 15 5 where four images of (a) are produced by FSIS-GAN-WD; Compression quality factors (QF) images of (b) are stego images produced by FSIS-GAN-FD. Figure 5: Comparison of the message extraction accuracy (%) With the introduction of degradation layers, minor speckle under various levels of JPEG compression degradation. noises emerge (highlighted with red rectangular). ever, as pointed in [15], a well-trained forensic network Table 2 can effectively identify a synthesized image. To solve this Comparison of message extraction accuracy (%) under various degradation conditions. The bold and marked value with issue, we explicitly considered the anti-forensics scenario an asterisk (*) denote the highest extraction accuracy with and introduce the anti-forensic loss LFπœƒ . correct secret key k and the lowest extraction accuracy with To demonstrate the influence of anti-forensic loss LFπœƒ , the incorrect secret key k,Μƒ respectively. we conduct the ablation experiment by excluding the loss term LFπœƒ , and thus this variant is termed as FSIS-GAN (ex Hu et al. FSIS-GAN-WD FSIS-GAN-FD LFπœƒ ). For a concrete example, we employ the well-trained Scheme [15] with k with kΜƒ with k with kΜƒ forensic network Ye-Net [19] Fπœƒ to detect 3000 facial W/o degradation 85.23 98.76 71.50βˆ— 98.22 72.08 stego images produced by different methods, and record the probability of missed detection (PMD). The PMD’s Fixed GNL 52.72 59.78 56.23βˆ— 95.58 72.74 of Hu et al. [15], FSIS-GAN (ex LFπœƒ ), and FSIS-GAN are βˆ— Fixed GBL 69.68 57.52 54.68 98.58 73.78 3.23%, 8.84%, and 89.91, respectively. As clearly shown, Fixed JCL 65.33 61.38 58.00βˆ— 98.46 72.67 for FSIS-GAN (ex LFπœƒ ), despite the facial stego images look natural for human, they are easily exposed to the forensic network, where the PMD value is lower than 10%. 
Secondly, in Table 2 we report the extraction accuracy under fixed degradations. Not surprisingly, the extraction accuracy of Hu et al. [15] and of FSIS-GAN-WD degrades greatly, which can be attributed to their neglect of degradation-resistant message extraction. In contrast, FSIS-GAN-FD exhibits quite promising results: under all three types of degradation layers, its extraction accuracy typically exceeds 94% (though lower than that of FSIS-GAN-WD, which is specifically designed for the non-degradation scenario). These results verify that, for the case of known degradations, the proposed framework can learn to effectively resist the fixed degradations by employing the fixed degradation layers during training.

Finally, to illustrate how the robustness of message extraction changes with the degradation level, we test each degradation type at a variety of levels. Due to space limits, we only report the JPEG compression degradation, in Figure 5. As can be seen, the extraction accuracy generally decreases with the quality factor (QF). Although the JCL adopted from HiDDeN [8] can handle non-differentiable JPEG compression, it cannot perfectly reproduce the JPEG compression artifacts. Nevertheless, FSIS-GAN-FD still achieves superior robustness compared with the other two schemes.
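The sweep behind Figure 5 amounts to degrading every stego image at each quality factor and re-measuring the extraction accuracy. A schematic harness is sketched below; the degradation and extractor here are deliberately simple stubs (a real evaluation would plug in actual JPEG compression at the given QF and the trained extractor):

```python
import numpy as np

def accuracy_sweep(stego_images, messages, extract_fn, degrade_fn, levels):
    """For each degradation level, degrade every stego image, re-run the
    extractor, and report the mean bit-extraction accuracy (%)."""
    results = {}
    for level in levels:
        accs = [
            100.0 * np.mean(extract_fn(degrade_fn(img, level)) == msg)
            for img, msg in zip(stego_images, messages)
        ]
        results[level] = float(np.mean(accs))
    return results

# Stubs (illustrative only): the "stego image" is the message itself, the
# extractor thresholds at 0.5, and the degradation flips a fraction of
# pixels that grows as the quality factor drops.
def stub_degrade(img, qf, seed=0):
    rng = np.random.default_rng(seed)
    flip = rng.random(img.shape) < (100 - qf) / 200.0
    return np.where(flip, 1.0 - img, img)

def stub_extract(img):
    return (img > 0.5).astype(np.uint8)

rng = np.random.default_rng(1)
msgs = [rng.integers(0, 2, 256, dtype=np.uint8) for _ in range(10)]
stegos = [m.astype(float) for m in msgs]
curve = accuracy_sweep(stegos, msgs, stub_extract, stub_degrade, [95, 55, 5])
# Accuracy falls as QF drops, mirroring the trend in Figure 5.
```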
4.4. Performance of Anti-forensics

Recall that, since no cover image is involved in the data hiding, our method enjoys relatively good undetectability when exposed to a steganalyzer. However, as pointed out in [15], a well-trained forensic network can effectively identify a synthesized image. To address this issue, we explicitly considered the anti-forensics scenario and introduced the anti-forensic loss LFθ.

To demonstrate the influence of the anti-forensic loss LFθ, we conduct an ablation experiment that excludes the loss term LFθ; this variant is termed FSIS-GAN (ex LFθ). As a concrete example, we employ the well-trained forensic network Ye-Net [19] as Fθ to detect 3000 facial stego images produced by the different methods, and record the probability of missed detection (PMD). The PMDs of Hu et al. [15], FSIS-GAN (ex LFθ), and FSIS-GAN are 3.23%, 8.84%, and 89.91%, respectively. As clearly shown, for FSIS-GAN (ex LFθ), although the facial stego images look natural to humans, they are easily exposed by the forensic network, with a PMD below 10%. In contrast, by introducing the anti-forensic loss term, the PMD of FSIS-GAN reaches 89.91%. This means the proposed FSIS-GAN can effectively bypass the existing forensic network, retaining a nice anti-forensic capability.
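The PMD figures above can be computed directly from the forensic network's decisions on synthesized images: a missed detection occurs when a stego image is classified as natural. A small sketch, assuming the detector outputs a "probability of being stego" score thresholded at 0.5 (the score values below are made up for illustration):

```python
import numpy as np

def probability_missed_detection(stego_scores, threshold=0.5):
    """Fraction (%) of stego images the forensic detector fails to flag,
    i.e. whose 'stego' score falls below the decision threshold."""
    scores = np.asarray(stego_scores, dtype=float)
    return 100.0 * np.mean(scores < threshold)

# Hypothetical detector scores for 10 stego images:
scores = [0.9, 0.2, 0.4, 0.95, 0.1, 0.3, 0.8, 0.45, 0.05, 0.6]
print(probability_missed_detection(scores))  # 60.0
```

A higher PMD means more synthesized images slip past the detector, so the anti-forensic loss is trained to push the generator toward high PMD.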
5. Conclusion

In this work, we proposed a stego synthesis-based data hiding method using a generative neural network, explicitly taking image degradation and anti-forensic needs into account. Specifically, the generator synthesizes a facial stego image from the given secret message and secret key, while the extractor aims to recover the secret message with the secret key. Through adversarial training with the discriminator, the generator learns to produce realistic facial stego images. Degradation layers are introduced during training, which significantly enhance the robustness of message extraction, and a forensic network is incorporated in response to possible adversarial forensic analysis in the communication channel. Experimental results verified that our approach can generate more natural facial stego images, while retaining higher message extraction accuracy and a nice anti-forensic ability.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to Southeast Digital Economic Development Institute for supporting the computing facility.

References

[1] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Security 11 (2015) 221–234.
[2] J. Zhou, W. Sun, L. Dong, X. Liu, O. C. Au, Y. Y. Tang, Secure reversible image data hiding over encrypted domain via key modulation, IEEE Trans. Circuits Syst. Video Technol. 26 (2015) 441–452.
[3] L. Dong, J. Zhou, W. Sun, D. Yan, R. Wang, First steps toward concealing the traces left by reversible image data hiding, IEEE Trans. Circuits Syst. II, Exp. Briefs 67 (2020) 951–955.
[4] S. Baluja, Hiding images within images, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 1685–1697.
[5] H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: Secure steganography based on generative adversarial networks, in: Pacific Rim Conference on Multimedia, 2017, pp. 534–544.
[6] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (2017) 1547–1551.
[7] J. Hayes, G. Danezis, Generating steganographic images via adversarial training, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1954–1963.
[8] J. Zhu, R. Kaplan, J. Johnson, F. Li, HiDDeN: Hiding data with deep networks, in: Proc. Eur. Conf. Comput. Vis., 2018, pp. 657–672.
[9] K. A. Zhang, A. Cuesta-Infante, L. Xu, K. Veeramachaneni, SteganoGAN: High capacity image steganography with GANs, arXiv preprint arXiv:1901.03892 (2019).
[10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).
[11] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).
[12] W. Tang, B. Li, S. Tan, M. Barni, J. Huang, CNN-based adversarial embedding for image steganography, IEEE Trans. Inf. Forensics Security 14 (2019) 2074–2087.
[13] K. Wu, C. Wang, Steganography using reversible texture synthesis, IEEE Trans. Image Process. 24 (2014) 130–139.
[14] S. Li, X. Zhang, Toward construction-based data hiding: From secrets to fingerprint images, IEEE Trans. Image Process. 28 (2018) 1482–1497.
[15] D. Hu, L. Wang, W. Jiang, S. Zheng, B. Li, A novel image steganography method via deep convolutional generative adversarial networks, IEEE Access 6 (2018) 38303–38314.
[16] Z. Zhang, G. Fu, R. Ni, J. Liu, X. Yang, A generative method for steganography by cover synthesis with auxiliary semantics, Tsinghua Science and Technology 25 (2020) 516–527.
[17] D. Berthelot, T. Schumm, L. Metz, BEGAN: Boundary equilibrium generative adversarial networks, arXiv preprint arXiv:1703.10717 (2017).
[18] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).
[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Security 12 (2017) 2545–2557.
[20] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proc. IEEE Int. Conf. Comput. Vis., 2015.
[21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6629–6640.
[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).