<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Image Data Hiding via Facial Stego Synthesis With Generative Model</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Li</forename><surname>Dong</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Ningbo University</orgName>
								<address>
									<postCode>315211</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Southeast Digital Economic Development Institute</orgName>
								<address>
									<postCode>324000</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jie</forename><surname>Wang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Ningbo University</orgName>
								<address>
									<postCode>315211</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Southeast Digital Economic Development Institute</orgName>
								<address>
									<postCode>324000</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rangding</forename><surname>Wang</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Yuanman</forename><surname>Li</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Ningbo University</orgName>
								<address>
									<postCode>315211</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Southeast Digital Economic Development Institute</orgName>
								<address>
									<postCode>324000</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Shenzhen University</orgName>
								<address>
									<postCode>518061</postCode>
									<settlement>Guangdong</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Weiwei</forename><surname>Sun</surname></persName>
							<affiliation key="aff3">
								<orgName type="department">Alibaba Group</orgName>
								<address>
									<postCode>310052</postCode>
									<settlement>Zhejiang</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Image Data Hiding via Facial Stego Synthesis With Generative Model</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">FA78DDB3AA82ECFED4DD3263385E3DE9</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T11:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>data hiding</term>
					<term>stego synthesis</term>
					<term>generative adversarial network</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Stego synthesis-based data hiding aims to directly produce a plausible natural image to convey a secret message. However, most existing works neglect the communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator and a forensic network. Specifically, the generator produces a realistic facial stego image from the secret message and a secret key, while the extractor aims at recovering the secret message from the stego image with the provided secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which is responsible for guiding the update of the generator. Three degradation layers are further incorporated, forcing the generator to characterize the communication degradations. Experimental results demonstrate that the proposed framework can accurately extract the secret message and effectively resist forensic detection and certain degradations, while producing realistic facial stego images.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Data hiding aims to embed a secret message into a cover signal without arousing the awareness of an adversary. It is widely used in many applications, e.g., covert communication <ref type="bibr" target="#b0">[1]</ref> and multimedia data protection <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. The primitive ad-hoc Least-Significant Bit (LSB) replacement substitutes the bit in the least significant bit-plane of each pixel with a secret bit, while modern data hiding methods attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography <ref type="bibr" target="#b0">[1]</ref> designed sophisticated distortion functions according to prior knowledge and used Syndrome-Trellis coding to embed the secret message. Recently, neural network-based data hiding has become one of the most active research directions. Baluja <ref type="bibr" target="#b3">[4]</ref> employed convolutional neural networks to hide an entire secret image inside a cover image in an end-to-end fashion. SSGAN <ref type="bibr" target="#b4">[5]</ref> attempted to exploit GAN to synthesize a cover image that is more suitable for the subsequent steganographic data embedding. ASDL-GAN <ref type="bibr" target="#b5">[6]</ref> integrated content-adaptive steganography and GAN, in which the generator was able to produce the modification probability maps. HayersGAN <ref type="bibr" target="#b6">[7]</ref>, HiDDeN <ref type="bibr" target="#b7">[8]</ref> and SteganoGAN <ref type="bibr" target="#b8">[9]</ref> all designed encoder-decoder-like frameworks based on GAN, which could automatically learn the suitable areas for embedding the secret bitstream message.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>International Workshop on Safety &amp; Security of Deep Learning, 21st -26th August, 2021</head><p>Email: dongli@nbu.edu.cn (L. Dong); 1811082196@nbu.edu.cn (J. Wang); wangrangding@nbu.edu.cn (R. Wang); yuanmanli@szu.edu.cn (Y. Li); sunweiwei.sww@alibaba-inc.com (W. Sun)</p><p>For the last several years, adversarial examples for neural networks have met data hiding, continuously drawing extensive attention from the community. Some studies, e.g., <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>, found that adding slight perturbations to the input data can paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose the data hiding on a stego signal and usually involves machine-learning classifiers. Therefore, it is possible for data hiding methods to bypass steganalysis by borrowing strategies from adversarial example-related works. Tang et al. <ref type="bibr" target="#b11">[12]</ref> presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego could effectively fool the steganalytic network, revealing the vulnerability of deep learning-based steganalyzers.</p><p>Note that all the aforementioned data hiding techniques are based on cover modification; they cannot avoid modifying the given cover image. 
As such, it inevitably leaves artifacts that can be exposed by steganalysis. On the contrary, stego synthesis-based data hiding, e.g., <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>, synthesizes the stego image directly from the secret message, posing more challenges for steganalysis. Under this concept, traditional methods tried to produce stego images based on hand-crafted designs. Although the capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative, some methods <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref> use GAN to synthesize stego images with rich semantics, e.g., faces and food. However, the accuracy of message extraction was unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.</p><p>In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, termed FSIS-GAN. Unlike cover modification-based data hiding methods, FSIS-GAN is designed without a cover image being provided beforehand. Compared with the existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results conducted on a public facial dataset validate these merits of our proposed method. The main contributions of this work can be summarized as follows,</p><p>• We explicitly consider the image degradation during covert communication, and integrate multiple degradation layers into the framework. This boosts the robustness of message extraction. • We incorporate a forensic network during the training of FSIS-GAN. By exploiting the gradients from this forensic network, the stego images produced by the learned generator can effectively fool the forensic network. • We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which further improves the reliability of secret message extraction.</p><p>The rest of this paper is organized as follows. Section 2 briefly reviews the related work on stego synthesis-based data hiding. Section 3 describes the proposed FSIS-GAN, including the network architecture and loss functions. Section 4 presents the experimental results, and the conclusions are drawn in Section 5.</p></div>
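The ad-hoc LSB replacement mentioned in the introduction can be sketched in a few lines of Python; the function names and toy pixel values below are purely illustrative:

```python
# Minimal sketch of LSB replacement: each secret bit overwrites the
# least significant bit of one cover pixel. Illustrative only.

def lsb_embed(cover, bits):
    """Replace the LSB of each of the first len(bits) pixels."""
    assert len(bits) <= len(cover)
    stego = list(cover)
    for i, b in enumerate(bits):
        stego[i] = (stego[i] & ~1) | b   # clear the LSB, set the secret bit
    return stego

def lsb_extract(stego, n_bits):
    """Read the secret bits back from the LSB plane."""
    return [p & 1 for p in stego[:n_bits]]

pixels = [137, 200, 64, 255, 18, 91]
secret = [1, 0, 1, 1, 0, 0]
stego = lsb_embed(pixels, secret)
assert lsb_extract(stego, len(secret)) == secret
```

As the text notes, such replacement leaves statistical traces in the LSB plane, which is precisely what steganalysis exploits.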
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Stego Synthesis-based Data Hiding</head><p>The majority of data hiding methods involve modification of the given cover images. However, such cover modification inevitably leaves detectable traces. Instead, Hu et al. <ref type="bibr" target="#b14">[15]</ref> suggested using the generator of GAN to synthesize a facial stego image from the secret message. Meanwhile, the secret message can be extracted from the stego image by the corresponding extractor network. Similarly, Zhang et al. <ref type="bibr" target="#b15">[16]</ref> exploited GAN to generate stego images with different semantic labels, which could improve the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of the GAN-based works is that they can synthesize stego images with rich semantics. However, we shall note that such stego images can be easily identified by some well-trained forensic networks. In addition, no satisfactory trade-off between capacity and extraction accuracy has been achieved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Facial Image Data Hiding via Generative Stego Synthesis</head><p>In this section, we first give an overview of the proposed FSIS-GAN framework and then introduce each component of the framework, accompanied by a thorough discussion of the loss functions, network structure and training procedure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Overview of FSIS-GAN</head><p>The proposed FSIS-GAN framework is illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. In general, it is an end-to-end framework consisting of three parts, where each part is designed to achieve a specific goal. First, the part of facial stego image synthesis and message extraction contains a generator G, an extractor E and the degradation layers N. The generator G converts the secret message along with the secret key into a facial stego image. The degradation layers N simulate common image degradations that may occur within the communication channel. The extractor E is learned to recover the secret message from the degraded stego image. Second, the adversarial training part contains a discriminator D, which aims at distinguishing genuine data samples from the ones produced by the generator G. Third, a well-trained existing forensic network F 𝜃 (parameterized by 𝜃) is introduced in the anti-forensics part, which can distinguish genuine facial images from synthesized facial stego images. Note that this target forensic network is treated as a fixed adversary, and its network parameters are always frozen.</p></div>
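The overall dataflow can be illustrated with a toy stand-in, where the generator, degradation layers and extractor are replaced by a trivial XOR scheme — purely illustrative placeholders, not the actual networks:

```python
# Toy stand-ins for the FSIS-GAN dataflow of Section 3.1.
def G(m, k):          # generator: message + key -> stego "image" (XOR toy)
    return [(mi + ki) % 2 for mi, ki in zip(m, k)]

def N(S):             # degradation layers (identity in this sketch)
    return S

def E(S, k):          # extractor: degraded stego + key -> message
    return [(si + ki) % 2 for si, ki in zip(S, k)]

m = [1, 0, 1, 1]
k = [0, 1, 1, 0]
S = G(m, k)                      # S = G(m, k), Eq. (1)
m_prime = E(N(S), k)             # m' = E(N(S), k), Eq. (2)
assert m_prime == m              # the correct key recovers the message
k_wrong = [1, 1, 0, 0]
assert E(N(S), k_wrong) != m     # a wrong key fails, as intended
```

In the real framework both mappings are learned networks and N is a noise/blur/compression layer, but the composition E(N(G(m, k)), k) ≈ m is the same.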
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Stego Image Synthesis and Message Extraction</head><p>The part of facial stego image synthesis and message extraction achieves two functionalities. First, by using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure communication reliability and high diversity of the generated facial stego images. Generally, the generator G and extractor E aim to learn two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1} 𝑙 𝑚 and k ∈ {0, 1} 𝑙 𝑘 be the binary secret message and the secret key, respectively. The generator G learns the first mapping, transforming the message m along with the secret key k into a stego image:</p><formula xml:id="formula_0">S = G(m, k),<label>(1)</label></formula><p>where S denotes the synthesized facial stego image of shape 𝐶 × 𝐻 × 𝑊. To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should be from the degraded stego image along with the secret key k to the secret message, which can be expressed by</p><formula xml:id="formula_1">m ′ = E(N(S), k),<label>(2)</label></formula><p>where N(⋅) models the image degradation process, and N(S) is the degraded stego image. Here, m ′ ∈ (0, 1) 𝑙 𝑚 denotes the extracted secret message. 
It shall be noted that the extracted message m ′ should (approximately) equal the original secret message m, so that an error-correcting mechanism can be employed to fully correct the erroneous bits.</p><p>To measure the distortion between the original secret message m and the extracted message m ′ , we use the cross-entropy loss as the message extraction loss L E , which is given by</p><formula xml:id="formula_2">L E (m, m ′ ) = − 1 𝑙 𝑚 𝑙 𝑚 ∑ 𝑖=1 [𝑚 𝑖 log(𝑚 ′ 𝑖 ) + (1 − 𝑚 𝑖 )log(1 − 𝑚 ′ 𝑖 )],<label>(3)</label></formula><p>where 𝑚 𝑖 and 𝑚 ′ 𝑖 are the 𝑖-th elements of m and m ′ , respectively. Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy Kerckhoffs' principle. This means that even if the extractor network E is completely exposed to an attacker, the secret message m can be recovered only if the receiver obtains both the secret key k and the facial stego image S. It is worth emphasizing that most of the existing GAN-based methods, e.g., <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>, involve no secret key. Further notice that, as an input to the extractor E, the dimension of the secret key k is much smaller than that of the facial stego image S. Thus, the extractor E tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to additionally feed randomly generated incorrect secret keys k ∈ {0, 1} 𝑙 𝑘 , where k ≠ k, during the training stage. Besides minimizing the difference between the extracted and original message under the correct secret key, we maximize this difference when an incorrect secret key is applied. 
Mathematically, the inverse loss term L Ẽ can be expressed as the negative cross-entropy loss:</p><formula xml:id="formula_3">L Ẽ(m, m ′ ) = 1 𝑙 𝑚 𝑙 𝑚 ∑ 𝑖=1 [𝑚 𝑖 log( m′ 𝑖 ) + (1 − 𝑚 𝑖 )log(1 − m′ 𝑖 )],<label>(4)</label></formula><p>where m′ 𝑖 is the 𝑖-th element of the message m ′ extracted with the incorrect key k , i.e., m ′ = E(N(S), k ).</p></div>
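A minimal NumPy sketch of the extraction loss of Eq. (3) and the inverse loss of Eq. (4), applied to hypothetical soft-bit extractor outputs:

```python
import numpy as np

def extraction_loss(m, m_prime, eps=1e-12):
    """Cross-entropy L_E of Eq. (3): small when m' matches m."""
    m_prime = np.clip(m_prime, eps, 1 - eps)   # guard log(0)
    return -np.mean(m * np.log(m_prime) + (1 - m) * np.log(1 - m_prime))

def inverse_loss(m, m_tilde, eps=1e-12):
    """Negative cross-entropy L_~E of Eq. (4): minimized when the
    message extracted with a wrong key is far from m."""
    return -extraction_loss(m, m_tilde, eps)

m = np.array([1., 0., 1., 1.])                # secret bits
good = np.array([0.95, 0.05, 0.90, 0.97])     # hypothetical output, correct key
bad = np.array([0.5, 0.5, 0.5, 0.5])          # hypothetical output, wrong key
assert extraction_loss(m, good) < extraction_loss(m, bad)
```

Minimizing L_E plus L_Ẽ pushes the extractor towards confident bits under the correct key and towards uninformative (near-0.5) bits under a wrong key.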
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Enhancing robustness with degradation layers:</head><p>In a practical communication channel, degradations are often imposed on the synthesized stego image S when transmitting it to a receiver. The data hiding system therefore requires certain robustness to ensure the accuracy of message extraction. In this work, we take three representative degradations into account, i.e., noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely-used noise models: Gaussian noise. For blurring, Gaussian blurring is used. For compression, JPEG image compression is employed, which is extensively used for reducing the transmission bandwidth. In experiments, we implement these three types of degradation as neural network layers N that degrade the stego image, with one layer simulating each type of degradation. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate it with a differentiable polynomial function. This differentiating technique follows the work HiDDeN <ref type="bibr" target="#b7">[8]</ref>.</p></div>
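The three degradation layers can be sketched as follows. The `soft_round` surrogate shown here is one common differentiable approximation of rounding; it is an assumption for illustration, not necessarily the exact polynomial used by HiDDeN:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise_layer(img, sigma=0.2):
    """GNL: add zero-mean Gaussian noise to the stego image."""
    return img + rng.normal(0.0, sigma, img.shape)

def gaussian_blur_layer(img, d=3, sigma=1.0):
    """GBL: Gaussian blur with a d-tap kernel (1-D demo for brevity)."""
    r = d // 2
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()                         # normalized kernel
    return np.convolve(img, k, mode="same")

def soft_round(x):
    """Differentiable surrogate for the rounding step of JPEG
    quantization: round(x) approximated by x - sin(2*pi*x)/(2*pi)."""
    return x - np.sin(2 * np.pi * x) / (2 * np.pi)

# Near-integer inputs are rounded almost exactly, yet gradients flow.
x = np.array([0.1, 0.9, 2.1, 2.9])
assert np.all(np.abs(soft_round(x) - np.round(x)) < 0.1)
```

During training, these layers sit between G and E, so the generator learns stego patterns whose message survives the simulated channel.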
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Adversarial Training Part</head><p>As aforementioned, the hand-crafted stego synthesis-based data hiding methods <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref> could only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with semantics is a challenging task. However, this problem can be alleviated with the guidance of adversarial training. In this part, the purpose of the discriminator D is to conduct adversarial training with the generator G and improve the plausibility of the synthesized facial stego images.</p><p>More specifically, let I be a genuine facial image sample of shape 𝐶 × 𝐻 × 𝑊 from a publicly available facial image dataset. The discriminator D estimates whether a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, the network structure and loss function of BEGAN <ref type="bibr" target="#b16">[17]</ref> provide a good reference for improving training stability. Thus, in this work we employ the adversarial training loss of BEGAN. Mathematically, the adversarial loss L adv for the generator G can be calculated as</p><formula xml:id="formula_4">L adv (D(S), S) = 1 𝐶𝐻 𝑊 [|D(S) − S|],<label>(5)</label></formula><p>where the output D(S) has the same shape as the facial stego image. 
The corresponding loss L D for the discriminator D is</p><formula xml:id="formula_5">L D (I, S) = 1 𝐶𝐻 𝑊 [|D(I) − I| − ℎ 𝑡 ⋅ |D(S) − S|],<label>(6)</label></formula><p>where ℎ 𝑡 controls the discrimination ability of D in the 𝑡-th training step so as to equilibrate the adversarial training. It can be computed as</p><formula xml:id="formula_6">ℎ 𝑡+1 = ℎ 𝑡 + 𝜆 𝐶𝐻 𝑊 [𝛾|D(I) − I| − |D(S) − S|].<label>(7)</label></formula><p>Here, the parameter 𝜆 is the learning rate, and 𝛾 is a hyper-parameter that controls the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning 𝛾.</p></div>
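Equations (5)–(7) can be sketched as a single update step. Here `D` is a placeholder for the auto-encoder discriminator, and clipping h to [0, 1] follows common BEGAN practice (an assumption of this sketch):

```python
import numpy as np

def began_losses(D, I, S, h_t, lam=1e-3, gamma=0.7):
    """One BEGAN-style step for Eqs. (5)-(7). Reconstruction errors
    are mean absolute errors over the C*H*W image elements."""
    rec_real = np.mean(np.abs(D(I) - I))        # |D(I) - I| / CHW
    rec_fake = np.mean(np.abs(D(S) - S))        # |D(S) - S| / CHW
    L_adv = rec_fake                            # generator loss, Eq. (5)
    L_D = rec_real - h_t * rec_fake             # discriminator loss, Eq. (6)
    h_next = h_t + lam * (gamma * rec_real - rec_fake)   # Eq. (7)
    return L_adv, L_D, float(np.clip(h_next, 0.0, 1.0))

D = lambda x: 0.5 * x            # toy auto-encoder stand-in
I = np.ones(4)                   # "genuine" sample
S = np.full(4, 0.2)              # "synthesized" sample
L_adv, L_D, h1 = began_losses(D, I, S, h_t=0.0)
assert abs(L_adv - 0.1) < 1e-9
```

The equilibrium term h_t rises when the discriminator reconstructs real images too well relative to fakes, automatically rebalancing the two players.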
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Anti-forensics Part</head><p>Recall that no explicit cover image is involved in stego synthesis-based data hiding methods. This merit allows this type of data hiding to effectively resist conventional steganalysis detection. However, as pointed out in <ref type="bibr" target="#b14">[15]</ref>, a well-trained forensic network can readily distinguish a synthesized stego image from a genuine one, even if the synthesized stego image exhibits no perceptual difference to an observer. Although F 𝜃 is an expert in such a detection task, some studies <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref> have shown that deep neural network-based classifiers are vulnerable to adversarial examples. Inspired by this, we propose to apply the strategies for crafting adversarial examples to evade the stego detection network, as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., we assume full knowledge of the target forensic network. The target forensic network F 𝜃 is trained with genuine images from a publicly available facial dataset and synthesized images produced by BEGAN <ref type="bibr" target="#b16">[17]</ref>. Then, we integrate the well-trained F 𝜃 into the FSIS-GAN framework, in which F 𝜃 receives the synthesized facial stego image S and outputs a confidence score. The gradients back-propagated through F 𝜃 are used to update the parameters of the generator G. To measure the loss of resisting forensic detection, we define the anti-forensic loss L F 𝜃 as the cross-entropy between the output of F 𝜃 and our target genuine-image label:</p><formula xml:id="formula_7">L F 𝜃 (S) = − log (1 − F 𝜃 (S)),<label>(8)</label></formula><p>where F 𝜃 (S) ∈ (0, 1) is the confidence output by F 𝜃 .</p><p>Clearly, a decrease of L F 𝜃 indicates an increased probability of S being identified as a genuine image by F 𝜃 .</p></div>
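A one-line sketch of the anti-forensic loss of Eq. (8); following the text, the confidence is read as the forensic network's belief that the image is synthesized:

```python
import math

def anti_forensic_loss(confidence):
    """L_F(S) = -log(1 - F_theta(S)) of Eq. (8). `confidence` is
    F_theta(S) in (0, 1), the (assumed) probability that S is
    synthesized. In training, F_theta's weights stay frozen and
    only its gradients w.r.t. S update the generator."""
    return -math.log(1.0 - confidence)

# Lower loss <=> S is more likely judged genuine by F_theta.
assert anti_forensic_loss(0.1) < anti_forensic_loss(0.9)
```

Because the loss targets the "genuine" label, minimizing it drives the generator to produce exactly the kind of adversarial stego that fools the fixed detector.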
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Network Structure and Training Strategy</head><p>The network architectures of the generator G and the extractor E are shown in Figure <ref type="figure">2</ref>. For the generator G, the secret key vector k is first concatenated to the secret message vector m and then fed to the subsequent layers. G applies two fully-connected (FC) layers and three conv-transpose (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) <ref type="bibr" target="#b17">[18]</ref> and the ReLU activation function to the intermediate vectors. In experiments, we found that since both m and k are composed of the binary numbers 0 and 1, such a form is not suitable as a direct input, and the adversarial training loss would diverge.</p><p>To solve this issue, additional BN layers were added, so that the normalization operation is carried out inside the network. Empirical results show that this trick greatly alleviates the divergence problem. For the extractor E, we shall fuse the secret key vector k and the facial stego image S in a way such that the extractor E does not neglect the information provided by the secret key. To this end, the extractor E first applies an FC layer to the secret key to form an intermediate matrix of shape 1 × 𝐻 × 𝑊. Then, the facial stego image S and the intermediate matrix are concatenated, and the fused tensor is fed to four convolutional (Conv) layers. Finally, the extractor E applies an FC layer and the Sigmoid activation function to produce the message vector m ′ (or m′ ) of size 1 × 𝑙 𝑚 .</p><p>For the discriminator D, we adopt the auto-encoder-like structure from BEGAN <ref type="bibr" target="#b16">[17]</ref>. 
For the target forensic network F 𝜃 , we use Ye-Net <ref type="bibr" target="#b18">[19]</ref>, which is a widely-used steganalytic method.</p><p>The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except that of the well-trained forensic network F 𝜃 . We apply the extraction loss L E and the adversarial loss L D as the loss functions for the extractor E and the discriminator D, respectively. In particular, the total loss L G for the generator G is a weighted fusion of the aforementioned losses:</p><formula xml:id="formula_8">L G = L adv + 𝛼(L E + L Ẽ) + 𝛽L F 𝜃 ,<label>(9)</label></formula><p>where L adv is the adversarial loss for G, L Ẽ is the inverse loss, and L F 𝜃 is the anti-forensic loss. The hyper-parameters 𝛼 and 𝛽 control the relative importance among the four losses.</p></div>
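The fusion of Eq. (9) is a simple weighted sum; the default weights below are the values the paper reports in its experimental setup:

```python
def total_generator_loss(L_adv, L_E, L_E_inv, L_F, alpha=0.1, beta=0.1):
    """Total generator loss L_G of Eq. (9): adversarial loss plus
    weighted extraction, inverse, and anti-forensic terms.
    alpha = beta = 0.1 follows the paper's experimental setting."""
    return L_adv + alpha * (L_E + L_E_inv) + beta * L_F

# Hypothetical loss values for one training step.
assert abs(total_generator_loss(1.0, 0.5, -0.5, 2.0) - 1.2) < 1e-9
```

Each mini-batch thus updates G against four objectives at once, while E and D are updated only against L_E and L_D, respectively.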
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Results</head><p>In this section, we first introduce the experimental setup. Then, to verify the robustness of the proposed FSIS-GAN, we evaluate it without and with image degradations, respectively. Finally, the anti-forensic capability of FSIS-GAN is validated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Setup</head><p>Our experiments are conducted on the CelebA dataset <ref type="bibr" target="#b19">[20]</ref>, where the facial region is identified and cropped. All images are reshaped into 3 × 64 × 64. The following three metrics are used for evaluation:</p><p>• Fréchet Inception Distance (FID) <ref type="bibr" target="#b20">[21]</ref>, which is a widely-used perceptual image quality assessment metric for synthesized images. • Probability of missed detection (PMD). This metric is calculated by PMD = 𝐹 𝑁 𝐹 𝑁 +𝑇 𝑃 , where 𝐹 𝑁 (False Negative) is the ratio of cases where a synthesized facial image is misclassified as a genuine one, and 𝑇 𝑃 (True Positive) is the ratio of cases where a synthesized facial image is correctly detected. A larger PMD indicates a higher ability to resist the forensic network. • Message extraction accuracy, i.e., the percentage of secret message bits correctly recovered by the extractor E.</p><p>The proposed FSIS-GAN framework is implemented with PyTorch and trained on four NVIDIA GTX 1080Ti GPUs with 11GB memory. Training uses a mini batch-size of 64. We use Adam <ref type="bibr" target="#b21">[22]</ref> as the optimizer with a learning rate of 2 × 10 −4 . For the hyper-parameters 𝛼 and 𝛽 in (<ref type="formula" target="#formula_8">9</ref>), after a number of trials, we empirically set both to 0.1 in the experiments. The parameter 𝛾 in ( <ref type="formula" target="#formula_6">7</ref>) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most related work <ref type="bibr" target="#b14">[15]</ref>. We implemented this work ourselves because there is no publicly available code. With certain tweaking and fine-tuning, our tested results were comparable to those originally reported in <ref type="bibr" target="#b14">[15]</ref>. 
For a fair comparison, the lengths of the secret message 𝑙 𝑚 and the secret key 𝑙 𝑘 are both set to 100, so that the payload is identical to that of the work <ref type="bibr" target="#b14">[15]</ref>. </p></div>
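The PMD metric defined above is straightforward to compute; the counts below are hypothetical:

```python
def probability_missed_detection(fn, tp):
    """PMD = FN / (FN + TP): the fraction of synthesized stego
    images that the forensic network misses (labels as genuine).
    A larger PMD means stronger anti-forensic capability."""
    return fn / (fn + tp)

# Hypothetical counts: 30 synthesized images missed, 70 detected.
assert probability_missed_detection(30, 70) == 0.3
```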
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Performance Without Degradations</head><p>Notice that the competing method <ref type="bibr" target="#b14">[15]</ref> does not consider image degradations. To verify the effectiveness of the proposed method under the same settings and make a fair comparison, we in this subsection evaluate the performance without the degradation layers N. The facial stego image S is transmitted to the extractor E without any degradation. To avoid confusion, this variant of our proposed method is termed FSIS-GAN-WD (WD is abbreviated for Without Degradations). We first compare the visual quality of the facial stego images with the competing method <ref type="bibr" target="#b14">[15]</ref>. As can be seen from Figure <ref type="figure" target="#fig_3">3</ref>, the proposed FSIS-GAN-WD synthesizes more realistic facial stego images than Hu et al. <ref type="bibr" target="#b14">[15]</ref>. Upon closer inspection, one can notice that the stego images produced by FSIS-GAN-WD are more vivid and have more correct semantic structures. It is difficult for a human observer to notice the inauthenticity of the facial stego images synthesized by FSIS-GAN-WD. In contrast, the stego images generated by Hu et al. <ref type="bibr" target="#b14">[15]</ref> are typically blurry and severely distorted, which would apparently draw attention from a forensic analyzer. For the FID evaluation, we use 10,000 pairs of genuine images and synthesized facial stego images to compute the FID score. The FID score of FSIS-GAN-WD is 23.20, which is much smaller than the 32.07 of Hu et al. <ref type="bibr" target="#b14">[15]</ref>.</p><p>Then, we evaluate the extraction accuracy for the case without degradation. The results are tabulated in Table <ref type="table">1</ref>. 
To demonstrate the impact of the inverse loss L Ẽ on the extraction accuracy, ablation experiments are also conducted by excluding the inverse loss during training. This L Ẽ-ablated version is denoted FSIS-GAN-WD (ex L Ẽ). From Table <ref type="table">1</ref>, one can draw the following conclusions. First, the extraction accuracy of FSIS-GAN-WD with the correct secret key k is 98.76%, which dramatically outperforms the 85.23% of the competing method <ref type="bibr" target="#b14">[15]</ref>. Second, by comparing FSIS-GAN-WD and FSIS-GAN-WD (ex L Ẽ), one can see that the extraction accuracy of FSIS-GAN-WD with a correct secret key k is slightly inferior to that of FSIS-GAN-WD (ex L Ẽ). This suggests that the introduced inverse loss marginally harms the extraction accuracy. However, for the case of an incorrect key k , the participation of the inverse loss L Ẽ drives the extraction accuracy towards random guessing, which is exactly the desired key-dependent behavior.</p><p>Table <ref type="table">1</ref>: Comparison of message extraction accuracy (%) for the case of no communication degradations. Here, k and k denote the correct and incorrect secret key, respectively. FSIS-GAN-WD is a variant of the proposed method excluding the degradation layers, and FSIS-GAN-WD (ex L Ẽ) represents FSIS-GAN-WD trained without the inverse loss L Ẽ.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Performance With Degradations</head><p>In this subsection, we test the robustness of the proposed framework under certain image degradations. The image degradation type and level are given as prior knowledge. This scenario is common in practice because one can obtain some prior knowledge of the degradation by probing the communication channel. Thus, one can fix the degradation layers N and their associated parameters during the training stage. Specifically, in our experiments, the standard deviation 𝜎 1 of the Gaussian noise layer (GNL) is set to 0.2. The kernel width 𝑑 and the standard deviation 𝜎 2 of the Gaussian blurring layer (GBL) are set to 3 and 1, respectively. The differentiable JPEG compression layer (JCL) is implemented as suggested by HiDDeN <ref type="bibr" target="#b7">[8]</ref>. For simplicity of reference, this variant is termed FSIS-GAN-FD (FD standing for Fixed Degradations) in the sequel. Firstly, the stego images synthesized by FSIS-GAN-FD are provided in Figure <ref type="figure" target="#fig_6">4</ref>. One can observe that some speckle noise emerges in the generated stego images, which can be clearly seen from the regions highlighted with red lines in Figure <ref type="figure" target="#fig_6">4 (b)</ref>. Quantitatively, the FID score of FSIS-GAN-FD is 41.40, which is inferior to those of FSIS-GAN-WD (23.20) and Hu et al. <ref type="bibr" target="#b14">[15]</ref> (32.07). Nevertheless, the stego images produced by FSIS-GAN-FD are intuitively more realistic than those of Hu et al. <ref type="bibr" target="#b14">[15]</ref>.</p><p>Secondly, in Table <ref type="table" target="#tab_0">2</ref>, we report the extraction accuracy under fixed degradations. Not surprisingly, one can notice that the extraction accuracy of Hu et al. 
<ref type="bibr" target="#b14">[15]</ref> and FSIS-GAN-WD degrades greatly, which can be attributed to their overlooking of the degradation-resistant message extraction issue. In contrast, FSIS-GAN-FD exhibits quite promising results: under all three types of degradation layers, its extraction accuracy typically exceeds 94% (though lower than that of FSIS-GAN-WD, which is specifically designed for the non-degradation scenario). The results verify that, for the case of known degradations, the proposed framework can learn to effectively resist the fixed degradations by employing the fixed degradation layers during training. Finally, to illustrate how the robustness of message extraction changes under different degradation levels, we test different degradation types with a variety of degradation levels. Due to space limits, we only report the JPEG compression degradation in Figure <ref type="figure" target="#fig_7">5</ref>. As can be seen, as the quality factor (𝑄𝐹) decreases, the extraction accuracy generally decreases. Although the JCL adopted from HiDDeN <ref type="bibr" target="#b7">[8]</ref> can approximate the non-differentiable JPEG compression, it cannot perfectly reproduce the JPEG compression artifacts. Nevertheless, FSIS-GAN-FD still achieves superior robustness compared with the other two schemes.</p></div>
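The fixed degradation layers described above (GNL with 𝜎1 = 0.2, GBL with 𝑑 = 3 and 𝜎2 = 1) can be sketched in NumPy. This is an illustrative single-channel approximation under our own naming, not the paper's training-time implementation (which must remain differentiable inside the network):

```python
import numpy as np

def gaussian_noise_layer(img, sigma=0.2, rng=None):
    # GNL: additive white Gaussian noise with standard deviation sigma,
    # clipped back to the valid [0, 1] intensity range
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def gaussian_kernel(d=3, sigma=1.0):
    # GBL kernel: d x d Gaussian weights, normalized to sum to 1
    ax = np.arange(d) - (d - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur_layer(img, d=3, sigma=1.0):
    # GBL: 2-D convolution with reflect padding (single-channel sketch)
    k = gaussian_kernel(d, sigma)
    pad = d // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + d, j:j + d] * k)
    return out
```

Because the blur kernel is normalized, a constant image passes through the GBL unchanged, which is a handy sanity check.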
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Performance of Anti-forensics</head><p>Recall that, since no cover image is involved in the data hiding, our method has relatively good undetectability when exposed to a steganalyzer. However, as pointed out in <ref type="bibr" target="#b14">[15]</ref>, a well-trained forensic network can effectively identify a synthesized image. To address this issue, we explicitly consider the anti-forensic scenario and introduce the anti-forensic loss L F 𝜃 .</p><p>To demonstrate the influence of the anti-forensic loss L F 𝜃 , we conduct an ablation experiment by excluding the loss term L F 𝜃 ; this variant is termed FSIS-GAN (ex L F 𝜃 ). As a concrete example, we employ the well-trained forensic network Ye-Net <ref type="bibr" target="#b18">[19]</ref> as F 𝜃 to detect 3000 facial stego images produced by the different methods, and record the probability of missed detection (PMD). The PMDs of Hu et al. <ref type="bibr" target="#b14">[15]</ref>, FSIS-GAN (ex L F 𝜃 ), and FSIS-GAN are 3.23%, 8.84%, and 89.91%, respectively. As clearly shown, although the facial stego images of FSIS-GAN (ex L F 𝜃 ) look natural to humans, they are easily detected by the forensic network, with a PMD value lower than 10%. In contrast, by introducing the anti-forensic loss term, the PMD of FSIS-GAN reaches 89.91%. This means the proposed FSIS-GAN can effectively bypass the existing forensic network, demonstrating a strong anti-forensic capability.</p></div>
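The probability of missed detection reported here is simply the fraction of synthesized stego images that the forensic network fails to flag. A minimal sketch, where the 0.5 decision threshold and the function name are our assumptions (the paper does not specify Ye-Net's operating threshold):

```python
import numpy as np

def probability_of_missed_detection(detector_scores, threshold=0.5):
    # PMD: fraction of synthesized stego images the forensic network
    # misses, i.e., whose "synthetic" score falls below the threshold
    scores = np.asarray(detector_scores, dtype=float)
    return float(np.mean(scores < threshold))

# Example: the detector misses 2 of the 4 stego images -> PMD = 0.5
pmd = probability_of_missed_detection([0.9, 0.2, 0.1, 0.8])
```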
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In this work, we proposed a stego-synthesis-based data hiding method using a generative neural network, explicitly considering image degradation and anti-forensic needs. Specifically, the generator synthesizes a facial stego image from the given secret message and secret key, while the extractor aims to recover the secret message with the secret key. Through adversarial training with the discriminator, the generator can produce realistic facial stego images. Degradation layers are introduced during training, which significantly enhance the robustness of message extraction. A forensic network is also incorporated during training, in response to possible adversarial forensic analysis in the communication channel. Experimental results verified that our approach can generate more natural facial stego images while retaining higher message extraction accuracy and a strong anti-forensic ability.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the proposed FSIS-GAN framework.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>( a )Figure 2 :</head><label>a2</label><figDesc>Figure 2: Network structure of the generator G and the extractor E. "Concat", "FC", "ConvT", "BN", "Conv" denote the concatenation, fully-connected layer, convtranspose layer, batch norm, and convolution layer, respectively.</figDesc><graphic coords="5,137.02,184.10,158.79,56.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>FID is a de facto metric for assessing the quality of images created by GAN generators. A lower score indicates better consistency with human perception of natural images. • Accuracy of message extraction (ACC), computed as ACC = 𝐿 Ext / 𝐿 , where 𝐿 Ext is the number of correctly extracted message bits and 𝐿 is the length of the secret message m.</figDesc></figure>
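The FID described above compares Gaussians fitted to deep features of real and synthesized images: FID = ||𝜇1 − 𝜇2||² + Tr(Σ1 + Σ2 − 2(Σ1Σ2)^(1/2)). A small NumPy sketch operating on precomputed feature statistics; the symmetric-PSD square-root rewrite is a standard identity, not from the paper, and the function names are ours:

```python
import numpy as np

def _sqrtm_psd(a):
    # matrix square root of a symmetric PSD matrix via eigendecomposition
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(mu1, cov1, mu2, cov2):
    # FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2});
    # for PSD C1, C2, Tr((C1 C2)^{1/2}) = Tr((C2^{1/2} C1 C2^{1/2})^{1/2}),
    # which keeps the inner matrix symmetric and numerically stable
    s2 = _sqrtm_psd(cov2)
    covmean = _sqrtm_psd(s2 @ cov1 @ s2)
    diff = np.asarray(mu1) - np.asarray(mu2)
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * np.trace(covmean))
```

Identical feature distributions give FID 0; shifting the mean of a unit-covariance Gaussian by a vector d adds exactly ||d||².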
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Comparison of exemplar synthesized stego images. Top: Hu et al. [15]; Bottom: Proposed FSIS-GAN-WD.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Comparison of synthesized facial stego images, where the four images in (a) are produced by FSIS-GAN-WD and the images in (b) by FSIS-GAN-FD. With the introduction of degradation layers, minor speckle noise emerges (highlighted with red rectangles).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Comparison of the message extraction accuracy (%) under various levels of JPEG compression degradation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2</head><label>2</label><figDesc>Comparison of message extraction accuracy (%) under various degradation conditions. The bold value and the value marked with an asterisk (*) denote the highest extraction accuracy with the correct secret key k and the lowest extraction accuracy with the incorrect secret key k , respectively.</figDesc><table><row><cell>Scheme</cell><cell>Hu et al. [15]</cell><cell>FSIS-GAN-WD with k</cell><cell>FSIS-GAN-WD with k</cell><cell>FSIS-GAN-FD with k</cell><cell>FSIS-GAN-FD with k</cell></row><row><cell>W/o degradation</cell><cell>85.23</cell><cell>98.76</cell><cell>71.50*</cell><cell>98.22</cell><cell>72.08</cell></row><row><cell>Fixed GNL</cell><cell>52.72</cell><cell>59.78</cell><cell>56.23*</cell><cell>95.58</cell><cell>72.74</cell></row><row><cell>Fixed GBL</cell><cell>69.68</cell><cell>57.52</cell><cell>54.68*</cell><cell>98.58</cell><cell>73.78</cell></row><row><cell>Fixed JCL</cell><cell>65.33</cell><cell>61.38</cell><cell>58.00*</cell><cell>98.46</cell><cell>72.67</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CADCG, Zhejiang University under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to Southeast Digital Economic Development Institute for supporting the computing facility.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Contentadaptive steganography by minimizing statistical detectability</title>
		<author>
			<persName><forename type="first">V</forename><surname>Sedighi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cogranne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fridrich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Inf. Forensics Security</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="221" to="234" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Secure reversible image data hiding over encrypted domain via key modulation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">C</forename><surname>Au</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">Y</forename><surname>Tang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Circuits Syst. Video Technol</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="441" to="452" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">First steps toward concealing the traces left by reversible image data hiding</title>
		<author>
			<persName><forename type="first">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Circuits Syst. II, Exp. Briefs</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="951" to="955" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Hiding images within images</title>
		<author>
			<persName><forename type="first">S</forename><surname>Baluja</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="page" from="1685" to="1697" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SS-GAN: secure steganography based on generative adversarial networks</title>
		<author>
			<persName><forename type="first">H</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Pacific Rim Conference on Multimedia</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="534" to="544" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automatic steganographic distortion learning using a generative adversarial network</title>
		<author>
			<persName><forename type="first">W</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Process. Lett</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1547" to="1551" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Generating steganographic images via adversarial training</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hayes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Danezis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Adv. Neural Inf. Process. Syst</title>
				<meeting>Adv. Neural Inf. Process. Syst.</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1954" to="1963" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">HiDDeN: Hiding data with deep networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Johnson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Eur. Conf. Comput. Vis</title>
				<meeting>Eur. Conf. Comput. Vis</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="657" to="672" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cuesta-Infante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Veeramachaneni</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.03892</idno>
		<title level="m">SteganoGAN: High capacity image steganography with GANs</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zaremba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6199</idno>
		<title level="m">Intriguing properties of neural networks</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6572</idno>
		<title level="m">Explaining and harnessing adversarial examples</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">CNNbased adversarial embedding for image steganography</title>
		<author>
			<persName><forename type="first">W</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Barni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Inf. Forensics Security</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="2074" to="2087" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Steganography using reversible texture synthesis</title>
		<author>
			<persName><forename type="first">K</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Image Process</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="130" to="139" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Toward construction-based data hiding: From secrets to fingerprint images</title>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Image Process</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="1482" to="1497" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A novel image steganography method via deep convolutional generative adversarial networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="38303" to="38314" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">A generative method for steganography by cover synthesis with auxiliary semantics</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Tsinghua Science and Technology</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="516" to="527" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Berthelot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schumm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Metz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1703.10717</idno>
		<title level="m">BEGAN: Boundary equilibrium generative adversarial networks</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Ioffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1502.03167</idno>
		<title level="m">Batch normalization: Accelerating deep network training by reducing internal covariate shift</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Deep learning hierarchical representations for image steganalysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Inf. Forensics Security</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2545" to="2557" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Deep learning face attributes in the wild</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IEEE Int. Conf. Comput. Vis</title>
				<meeting>IEEE Int. Conf. Comput. Vis</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">GANs trained by a two time-scale update rule converge to a local Nash equilibrium</title>
		<author>
			<persName><forename type="first">M</forename><surname>Heusel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ramsauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Unterthiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Nessler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Adv. Neural Inf. Process. Syst</title>
				<meeting>Adv. Neural Inf. Process. Syst.</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="6629" to="6640" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
