<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hiding via Facial Stego Synthesis With Generative Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Li Dong</string-name>
          <email>dongli@nbu.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rangding Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuanman Li</string-name>
          <email>yuanmanli@szu.edu.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weiwei Sun</string-name>
          <email>sunweiwei.sww@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Zhejiang, China, 310052</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Electrical Engineering and Computer Science, Ningbo University</institution>
          ,
          <addr-line>Zhejiang, China, 315211</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Shenzhen University</institution>
          ,
          <addr-line>Guangdong, China, 518061</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Southeast Digital Economic Development Institute</institution>
          ,
          <addr-line>Zhejiang, China, 324000</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Stego synthesis-based data hiding aims to directly produce a plausible natural image to convey a secret message. However, most existing works neglect the possible communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator, and a forensic network. Specifically, the generator is deployed to generate a realistic facial stego image from the secret message and key, while the extractor aims at extracting the secret message from the stego image with the provided secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which is responsible for guiding the update of the generator. Three degradation layers are further incorporated, enforcing the generator to characterize the communication degradations. Experimental results demonstrate that the proposed framework could accurately extract the secret message and effectively resist forensic detection and certain degradations, while attaining realistic facial stego images.</p>
      </abstract>
      <kwd-group>
        <kwd>data hiding</kwd>
        <kwd>stego synthesis</kwd>
        <kwd>generative adversarial network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Data hiding aims to embed a secret message into a cover signal without arousing the awareness of an adversary. It is widely used in many applications, e.g., covert communication [1] and multimedia data protection [2, 3].</p>
      <sec id="sec-1-1">
        <title>Cover Modification-based Data Hiding</title>
        <p>The primitive ad-hoc Least-Significant Bit (LSB) method replaces the bit in the least significant bit-plane of each pixel with a secret bit. Later works attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography [1] designed sophisticated distortion functions according to prior knowledge. Recently, neural network-based data hiding has become one of the most active research directions. Baluja [4] employed convolutional neural networks to hide an entire secret image into the cover image in an end-to-end fashion. The work SSGAN [5] attempted to exploit GAN to synthesize a cover image that is more suitable for the subsequent steganographic data embedding. ASDL-GAN [6] integrated content-adaptive steganography and GAN, in which the generator was able to produce the modification probability map. The works HiDDeN [8] and SteganoGAN [9] both designed an encoder-decoder-like framework based on GAN. These methods could automatically learn the suitable areas for embedding the secret bitstream message.</p>
        <p>In the last several years, adversarial examples to neural networks have met data hiding, continuously drawing extensive attention from the community. Some studies [10, 11] showed that dedicated perturbations to the input data would paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose the data hiding on a stego signal and usually involves machine learning-based classifiers. Researchers have thus tried to bypass steganalysis by borrowing strategies from adversarial examples-related works. Tang et al. [12] presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego could effectively fool the steganalytic network, revealing the vulnerability of the deep learning-based steganalyzer.</p>
        <p>Note that all the aforementioned data hiding techniques are based on cover modification. Their common characteristic is that they cannot be independent of the modification of a given cover image. As such, they inevitably leave artifacts exposed to steganalysis. On the contrary, stego synthesis-based data hiding, e.g., [13, 14], refers to synthesizing the stego image directly from the secret message, which could pose more challenges for steganalysis. Under this concept, traditional methods tried to produce stego images based on some hand-crafted designations. Although the capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative solution, some methods [15, 16] use GAN to synthesize stego images with rich semantics, e.g., faces and food. However, the accuracy of message extraction was unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.</p>
        <p>Figure 1: Overview of the proposed FSIS-GAN framework.</p>
        <p>In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, which is termed FSIS-GAN. Unlike the cover modification-based data hiding methods, FSIS-GAN is designed without providing a cover image beforehand. Compared with the existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results conducted on a public facial dataset validate such merits of our proposed method. The main contributions of this work can be summarized as follows:</p>
        <p>• We explicitly consider the image degradation during the covert communication, and integrate multiple degradation layers into the framework. This boosts the robustness performance in terms of message extraction.</p>
        <p>• We incorporate a forensic network during the training of FSIS-GAN. By exploiting the gradients from such a forensic network, the stego image produced by the learned generator could effectively fool the forensic network.</p>
        <p>• We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which could further improve the reliability of the secret message extraction.</p>
        <p>The rest of this paper is organized as follows. Section II briefly reviews the related work on stego synthesis-based data hiding. Section III describes the proposed FSIS-GAN, including the network architecture and loss functions.</p>
        <p>Section IV presents the experimental results, and the final
conclusions are drawn in Section V.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Stego Synthesis-based Data Hiding</title>
      <p>The majority of data hiding methods involve modifications to the given cover images. However, such cover modification would leave embedding traces that can be detected. To resist detection by a steganalyzer, stego synthesis-based data hiding methods could directly produce the stego images from the given secret message. As an early attempt, Wu et al. [13] proposed a texture image synthesis-based method, which selectively distributes the source patches of the original texture image onto the synthesized stego image. The message hiding and extraction depend on the choice of source patches. Motivated by fingerprint biometrics, Li et al. [14] proposed to use the hologram phase constructed from the secret message to synthesize a fingerprint stego image. The hologram phase consists of two phases: the first spiral phase encodes the secret message into two-dimensional points with different polarities, and the second continuous phase is used to synthesize fingerprint images. It is worth noting that conventional stego image synthesis-based methods can only synthesize patterned stego images such as textures, lacking rich semantics, which limits their practical applications. Instead, Hu et al. [15] suggested using the generator of a GAN to synthesize a facial stego image from the secret message. Meanwhile, the secret message can be extracted from the stego image by the corresponding extractor network. Similarly, Zhang et al. [16] exploited GAN to generate stego images with different semantic labels, which could improve the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of the GAN-based works is that they could synthesize stego images with rich semantics. However, we shall note that such stego images can be easily identified by some well-trained forensic networks. In addition, there is no trade-off between capacity and extraction accuracy.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Facial Image Data Hiding via Generative Stego Synthesis</title>
      <sec id="sec-5-1">
        <p>In this section, we first give an overview of the proposed FSIS-GAN framework and then introduce each component, accompanied by a thorough discussion of the loss functions, network structure, and training procedure.</p>
        <sec id="sec-5-1-1">
          <title>3.1. Overview of FSIS-GAN</title>
          <p>The proposed FSIS-GAN framework is illustrated in Figure 1. In general, it is an end-to-end framework consisting of three parts, where each part is designed to achieve a specific goal. First, the part of facial stego image synthesis and message extraction contains a generator G, an extractor E, and the degradation layers N. The generator G is deployed to convert the secret message along with the secret key into a facial stego image. The degradation layers N are used to simulate common image degradations within the communication channel. The extractor E is learned to recover the secret message from the degraded stego image. Second, there is a discriminator D in the part of adversarial training, which aims at distinguishing the genuine data samples from the ones produced by the generator G. Third, a well-trained existing forensic network F (parameterized by θ) is introduced in the part of anti-forensics, which could distinguish the genuine from the synthesized facial stego image. Note that this target forensic network is treated as a fixed adversary, and its network parameters are always frozen.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>3.2. Stego Image Synthesis and Message Extraction</title>
          <p>The part of facial stego image synthesis and message extraction achieves two functionalities. First, by using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure communication reliability and high diversity of the generated facial stego images.</p>
          <p>Generally, the generator G and the extractor E aim to learn two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1}^Lm and k ∈ {0, 1}^Lk be the binary secret message and the secret key, respectively, where Lm and Lk denote their lengths. The generator G is intended to learn the first mapping, transforming the message along with the secret key k into a stego image:</p>
          <p>S = G(m, k), (1)</p>
          <p>where S denotes the synthesized facial stego image of shape C × H × W.</p>
          <p>To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should be from the degraded stego image along with the secret key k to the secret message, which can be expressed by</p>
          <p>m′ = E(N(S), k), (2)</p>
          <p>where N(⋅) models the image degradation process, and m′ denotes the extracted secret message. It shall be noted that the extracted message m′ shall (approximately) equal the original secret message m, and thus one can employ an error-correcting mechanism to fully correct the erroneous bits.</p>
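          <p>The two mappings above can be made concrete with a deliberately simple stand-in. The sketch below is purely illustrative and is not the learned GAN of this paper: the toy "generator" hides the key-masked message bits in the least-significant bits of a pseudo-random image, and the toy "extractor" recovers the message only when the correct key k is supplied.</p>
          <preformat>
```python
import random

def toy_generate(m, k, h=8, w=8, seed=0):
    """Toy stand-in for the generator G (NOT the paper's learned GAN):
    hide the key-masked bits (m XOR k) in the least-significant bits
    of a pseudo-random grayscale image."""
    assert len(m) == len(k) and h * w >= len(m)
    rng = random.Random(seed)
    img = [rng.randrange(256) for _ in range(h * w)]
    for i, (mb, kb) in enumerate(zip(m, k)):
        img[i] = img[i] - (img[i] % 2) + (mb ^ kb)  # overwrite the LSB
    return img

def toy_extract(img, k):
    """Toy stand-in for the extractor E: read LSBs and unmask with k."""
    return [(img[i] % 2) ^ kb for i, kb in enumerate(k)]

m = [1, 0, 1, 1, 0, 0, 1, 0]
k = [0, 1, 1, 0, 1, 0, 0, 1]
S = toy_generate(m, k)
assert toy_extract(S, k) == m                    # correct key: recovery
assert toy_extract(S, [1 - b for b in k]) != m   # wrong key: garbage
```
          </preformat>
          <p>Unlike this hand-crafted stand-in, the actual G and E are trained networks, so the key dependence must be learned, which is exactly what the inverse loss introduced below enforces.</p>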
        </sec>
      </sec>
      <sec id="sec-5-2">
        <p>To measure the distortion between the original secret message m and the extracted message m′, we use the cross-entropy loss to calculate the message extraction loss LE, which is given by</p>
        <p>LE(m, m′) = −(1/Lm) ∑_{i=1}^{Lm} [ m_i log(m′_i) + (1 − m_i) log(1 − m′_i) ], (3)</p>
        <p>where Lm is the length of the secret message, and m_i and m′_i are the i-th elements of m and m′, respectively.</p>
        <p>Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy the Kerckhoffs' principle. It means that even if the extractor network E is completely exposed to an attacker, the secret message m will be recovered only if the receiver obtains both the secret key k and the facial stego image S. It is worth emphasizing that, for most of the existing GAN-based methods, e.g., [15, 16], there is no involvement of a secret key. Further notice that, as an input of the extractor E, the dimension of the secret key k is greatly smaller than that of the facial stego image S. Thus, the extractor E tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to use a randomly generated incorrect secret key k̃ ∈ {0, 1}^Lk, where k̃ ≠ k, as input during the training stage. Instead of directly using the correct secret key and minimizing the difference between the extracted and original messages, we maximize the difference between the extracted and original messages when applying the incorrect secret key. Mathematically, the inverse loss LẼ can be expressed by the negative cross-entropy loss:</p>
        <p>LẼ(m, m̃′) = (1/Lm) ∑_{i=1}^{Lm} [ m_i log(m̃′_i) + (1 − m_i) log(1 − m̃′_i) ], (4)</p>
        <p>where m̃′_i is the i-th element of the message m̃′ extracted with the incorrect key k̃, i.e., m̃′ = E(N(S), k̃).</p>
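        <p>For illustration, the extraction loss in (3) and the inverse loss in (4) can be sketched in plain Python as binary cross-entropy over the message bits, assuming the extractor outputs per-bit probabilities in (0, 1):</p>
        <preformat>
```python
import math

def bce(m, p, eps=1e-12):
    """Mean binary cross-entropy between bits m and probabilities p."""
    total = sum(mi * math.log(pi + eps) + (1 - mi) * math.log(1 - pi + eps)
                for mi, pi in zip(m, p))
    return -total / len(m)

def extraction_loss(m, p):       # L_E of Eq. (3), minimized during training
    return bce(m, p)

def inverse_loss(m, p_wrong):    # negative cross-entropy of Eq. (4):
    return -bce(m, p_wrong)      # minimized when wrong-key output mismatches m

m = [1, 0, 1]
good = [0.9, 0.1, 0.8]   # output close to m
bad = [0.1, 0.9, 0.2]    # output far from m
assert extraction_loss(m, bad) > extraction_loss(m, good)
assert inverse_loss(m, good) > inverse_loss(m, bad)
```
        </preformat>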
        <p>Enhancing robustness with degradation layers: In a practical communication channel, there often exist degradations of the synthesized stego image S when transmitting it to a receiver. The data hiding system therefore requires certain robustness to ensure the accuracy of message extraction. In this work, we take three representative degradations into account, i.e., image noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely-used noise models: Gaussian noise. For blurring, Gaussian blurring is used. For signal compression, JPEG image compression is employed, which is extensively used for reducing the bandwidth of the transmission process. In experiments, we implement these three types of degradation as neural network layers N to degrade the stego image. Specifically, three network layers are used, one for simulating each type of degradation. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate the quantization operation with a differentiable polynomial function. Such a differentiation technique can also be found in the work HiDDeN [8].</p>
      </sec>
      <sec id="sec-5-3">
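        <p>As a concrete (assumed) instance of such a polynomial surrogate, a cubic approximation commonly used in JPEG-robust training pipelines is round(x) ≈ x + (round(x) − x)³; it keeps the deviation from true rounding bounded while providing a non-zero local gradient. The authors' exact polynomial is not specified here.</p>
        <preformat>
```python
def soft_round(x):
    """Differentiable surrogate for round(x): x + (round(x) - x)**3.
    Between half-integers round(x) is locally constant, so the local
    derivative is 1 - 3*(round(x) - x)**2, at least 0.25, instead of
    the zero gradient of hard rounding."""
    r = float(round(x))
    return x + (r - x) ** 3

def quantize_coeff(c, q):
    """JPEG-style quantization of a coefficient c with table entry q,
    with hard rounding replaced by the smooth surrogate."""
    return q * soft_round(c / q)

# the surrogate stays within 0.375 of true rounding (worst near half-integers)
for x in [0.0, 0.3, 1.49, -3.7]:
    assert 0.376 > abs(soft_round(x) - round(x))
```
        </preformat>
        <p>At test time the surrogate can simply be replaced by true rounding; only training needs the smooth version so that gradients reach the generator through the compression layer.</p>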
        <sec id="sec-5-3-1">
          <title>3.3. Adversarial Training Part</title>
          <p>As aforementioned, the hand-crafted stego synthesis-based data hiding methods [13, 14] could only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with semantics is a challenging task. However, this problem can be alleviated with the guidance of adversarial training. In this part, the purpose of the discriminator D is to conduct adversarial training with the generator G and improve the plausibility of the synthesized facial stego images.</p>
          <p>More specifically, let I be a genuine facial image sample of shape C × H × W from a publicly available genuine facial image dataset. The discriminator D estimates the probability that a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, the network structure and loss function of BEGAN [17] provide a good reference for improving training stability. Thus, in this work we employ the adversarial training loss used in BEGAN. Mathematically, the adversarial loss Ladv for the generator G can be calculated as</p>
          <p>Ladv(D(S), S) = (1/Np) ∑ |D(S) − S|, (5)</p>
          <p>where the output D(S) has the same shape as the facial stego image and Np denotes the number of image elements. The adversarial loss LD for the discriminator D is</p>
          <p>LD(I, S) = (1/Np) ∑ [ |D(I) − I| − h_t ⋅ |D(S) − S| ], (6)</p>
          <p>where h_t controls the discrimination ability of D at the t-th training step to equilibrate the adversarial training.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>3.4. Anti-forensics Part</title>
          <p>Recall that there are no explicit cover images involved in stego synthesis-based data hiding methods. This merit makes such data hiding methods able to effectively resist conventional steganalysis detection. However, as pointed out in [15], a well-trained forensic network could readily distinguish a synthesized stego image from a genuine one, even if the synthesized stego image shows no perceptual differences to an observer.</p>
          <p>Although F is an expert in such a detection task, some studies [10, 11] have shown that deep neural network-based classifiers are vulnerable to adversarial examples.</p>
          <p>Inspired by this, we propose to apply strategies for crafting adversarial examples to evade the stego detection network as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., assuming one has full knowledge of the target forensic network. The target forensic network F is trained with genuine images from a publicly available facial dataset and synthesized images produced by BEGAN [17]. Then, we integrate the well-trained F into the FSIS-GAN framework, in which F receives the synthesized facial stego image S and outputs a confidence score.</p>
          <p>The gradients back-propagated from F are used to update the parameters of the generator G. To measure the loss of resisting forensic detection, we define the anti-forensic loss LF, which computes the cross-entropy between the output of F and our target genuine-image label:</p>
          <p>LF(S) = −log(1 − F(S)), (8)</p>
          <p>where F(S) ∈ (0, 1) is the confidence output by F. Clearly, a decrement of LF indicates an increment of the probability of S being identified as a genuine image by F.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <p>The control variable h_t can be computed as</p>
        <p>h_{t+1} = h_t + λ (1/Np) ∑ [ γ |D(I) − I| − |D(S) − S| ], (7)</p>
        <p>where the parameter λ is the learning rate of training, and γ is a hyper-parameter to control the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning the parameter γ.</p>
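        <p>For illustration, the BEGAN-style losses in (5)-(7) reduce to L1 autoencoder reconstruction errors plus the running balance term. The scalar sketch below uses flat lists in place of images, with the symbol names λ (lam) and γ (gamma) as reconstructed above:</p>
        <preformat>
```python
def l1(a, b):
    """Mean absolute reconstruction error, i.e. |D(x) - x| averaged."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)

def began_losses(D_I, I, D_S, S, h_t):
    L_adv = l1(D_S, S)                     # generator loss, Eq. (5)
    L_D = l1(D_I, I) - h_t * l1(D_S, S)    # discriminator loss, Eq. (6)
    return L_adv, L_D

def update_h(h_t, D_I, I, D_S, S, lam=0.001, gamma=0.7):
    """Balance update of Eq. (7): h grows when the genuine-image term
    dominates and shrinks when the stego term dominates."""
    return h_t + lam * (gamma * l1(D_I, I) - l1(D_S, S))

I, D_I = [0.5, 0.5], [0.4, 0.6]   # genuine image, well reconstructed
S, D_S = [0.2, 0.8], [0.5, 0.5]   # stego image, poorly reconstructed
L_adv, L_D = began_losses(D_I, I, D_S, S, h_t=0.0)
assert L_adv > 0.0                          # generator is penalized
assert 0.0 > update_h(0.0, D_I, I, D_S, S)  # stego error dominates here
```
        </preformat>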
        <sec id="sec-5-4-2">
          <title>3.5. Network Structure and Training Strategy</title>
          <p>The network architectures of the generator G and the extractor E are shown in Figure 2. For the generator G, the secret key vector k is first concatenated to the secret message vector m and then fed to the subsequent layers. Then, G applies two fully-connected (FC) layers and three transposed convolution (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) [18] and a ReLU activation function to process the intermediate features. The extractor E outputs the message vector m′ (or m̃′) of size 1 × Lm. In experiments, we found that, since both m and k are composed of the binary numbers 0 and 1, such a form is not suitable as input, and the adversarial training loss would diverge. To solve this issue, additional BN layers were added, so that a normalization operation is carried out inside the network. Experimental results show that this trick could greatly alleviate the divergence problem.</p>
          <p>Figure 2: Network structures of the generator G and the extractor E.</p>
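          <p>As a minimal sketch of this normalization trick, the snippet below standardizes a binary {0, 1} vector to zero mean and (approximately) unit variance, which is what an input-side BN layer computes; the exact placement of the extra BN layers in the authors' network is not specified here.</p>
          <preformat>
```python
import math

def standardize(bits, eps=1e-5):
    """Rescale a binary {0, 1} vector to zero mean and (near) unit
    variance, mimicking what an input-side BN layer computes."""
    n = len(bits)
    mu = sum(bits) / n
    var = sum((b - mu) ** 2 for b in bits) / n
    return [(b - mu) / math.sqrt(var + eps) for b in bits]

x = standardize([1, 0, 1, 1, 0, 0, 1, 0])
mean = sum(x) / len(x)
var = sum(v * v for v in x) / len(x)
assert 1e-9 > abs(mean)        # zero mean
assert 1e-3 > abs(var - 1.0)   # near unit variance
```
          </preformat>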
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>4. Experimental Results</title>
        <p>In this section, we first introduce the experimental setup. Then, to verify the robustness of our proposed FSIS-GAN, it is evaluated both with and without image degradation. Finally, the anti-forensic capability of FSIS-GAN is validated.</p>
        <sec id="sec-5-5-1">
          <title>4.1. Experimental Setup</title>
          <p>Our experiments are conducted on the CelebA dataset [20], where the region containing the face is identified and extracted. All images are reshaped into 3 × 64 × 64. The following three metrics are used for evaluation:</p>
          <p>• Fréchet Inception Distance (FID) [21], a widely-used perceptual image quality assessment metric for synthesized images and a de facto metric for assessing the quality of images created by the generator of a GAN. A lower FID score indicates better consistency with human perception of natural images.</p>
          <p>• Accuracy of message extraction (ACC), computed as ACC = N_Ext / N, where N_Ext is the number of correctly extracted message bits and N is the length of the secret message m.</p>
          <p>• Probability of missed detection (PMD), calculated as PMD = FN / (FN + TP), where FN (False Negative) is the ratio for the case that a synthesized facial image is misclassified as a genuine one, and TP (True Positive) is the ratio for the case that a synthesized facial image is correctly detected.</p>
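          <p>The ACC and PMD metrics are simple ratios and can be sketched directly (FID is omitted, since it requires a pretrained Inception network):</p>
          <preformat>
```python
def acc(m, m_ext):
    """ACC: fraction of correctly extracted message bits."""
    return sum(a == b for a, b in zip(m, m_ext)) / len(m)

def pmd(fn, tp):
    """PMD = FN / (FN + TP): FN counts synthesized images misclassified
    as genuine, TP counts synthesized images correctly detected."""
    return fn / (fn + tp)

assert acc([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75   # 3 of 4 bits correct
assert pmd(fn=10, tp=90) == 0.1                  # 10% missed detections
```
          </preformat>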
        </sec>
      </sec>
      <sec id="sec-5-6">
        <p>For the discriminator D, we adopt the auto-encoder-like structure from BEGAN [17]. For the target forensic network F, we use Ye-Net [19], which is a widely-used steganalytic method.</p>
      </sec>
      <sec id="sec-5-7">
        <p>The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except for the well-trained forensic network F. We apply the extraction loss LE and the adversarial loss LD as the loss functions for the extractor E and the discriminator D, respectively. In particular, the total loss LG for the generator G is a proper fusion of the four aforementioned losses:</p>
        <p>LG = Ladv + α (LE + LẼ) + β LF, (9)</p>
        <p>where Ladv is the adversarial loss for G, LẼ is the inverse loss, and LF is the anti-forensic loss. The hyper-parameters α and β control the relative importance among the four losses.</p>
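        <p>For illustration, the fusion in (9) is a plain weighted sum of the four scalar losses; with the empirically chosen setting α = β = 0.1:</p>
        <preformat>
```python
def total_generator_loss(l_adv, l_ext, l_inv, l_forensic, alpha=0.1, beta=0.1):
    """Eq. (9): L_G = L_adv + alpha * (L_E + L_E_inv) + beta * L_F."""
    return l_adv + alpha * (l_ext + l_inv) + beta * l_forensic

# only the extraction term contributes here, scaled by alpha = 0.1
assert total_generator_loss(0.0, 1.0, 0.0, 0.0) == 0.1
```
        </preformat>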
      </sec>
      <sec id="sec-5-8">
        <p>A larger PMD indicates a higher ability to resist the forensic network.</p>
      </sec>
      <sec id="sec-5-9">
        <p>The proposed FSIS-GAN framework is implemented with PyTorch and trained on four NVIDIA GTX1080Ti GPUs with 11GB memory.</p>
      </sec>
      <sec id="sec-5-10">
        <p>The number of training epochs is set to 400 with a mini-batch size of 64. We use Adam [22] as the optimizer with a learning rate of 2 × 10−4. For the hyper-parameters α and β in (9), after a number of trials, we empirically set both to 0.1 in experiments. The parameter γ in (7) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most related work, Hu et al. [15]. We implement this work ourselves because there is no publicly available code. With certain tweaking and fine-tuning, the tested results were comparable to the originally reported data from [15]. For a fair comparison, the lengths of the secret message and the secret key are both set to 100, so that the payload is identical to that of the work [15].</p>
      </sec>
      <sec id="sec-5-11">
        <title>4.2. Performance Without Degradations</title>
        <p>Notice that the competing method [15] does not consider image degradations. To verify the effectiveness of the proposed method under the same settings and to make a fair comparison, in this subsection we evaluate the performance without the degradation layers N. The facial stego image S is transmitted to the extractor E without any degradation. To avoid confusion, this variation of our proposed method is termed FSIS-GAN-WD (WD abbreviates Without Degradations). We first compare the visual quality of the facial stego images with that of the competing method [15]. As can be seen from Figure 3, the proposed FSIS-GAN-WD could synthesize more realistic facial stego images in comparison with Hu et al. [15]. With more careful inspection, one can notice that the stego images produced by FSIS-GAN-WD are more vivid and have more correct semantic structures. It is difficult for a common human to perceive the inauthenticity of the facial stego images synthesized by FSIS-GAN-WD. In contrast, the stego images generated by Hu et al. [15] are typically blurry and severely distorted, which apparently draws attention from a forensic analyzer. For the FID evaluation experiment, we use 10,000 pairs of genuine images and synthesized facial stego images to compute the FID score. The FID score of FSIS-GAN-WD is 23.20, which is much smaller than Hu et al. [15]'s 32.07.</p>
        <p>Then, we evaluate the extraction accuracy for the case without degradation. The results are tabulated in Table 1. To demonstrate the impact of the inverse loss LẼ on the extraction accuracy, ablation experiments are also conducted by excluding the inverse loss during training. This LẼ-ablated version is denoted FSIS-GAN-WD (ex LẼ). From Table 1, one can draw the following conclusions. First, the extraction accuracy of FSIS-GAN-WD with the correct secret key k is 98.76%, which dramatically outperforms the 85.23% of the competing method [15]. Second, by comparing FSIS-GAN-WD and FSIS-GAN-WD (ex LẼ), one can see that the extraction accuracy of FSIS-GAN-WD with a correct secret key k is slightly inferior to that of FSIS-GAN-WD (ex LẼ). This suggests that the introduced inverse loss would marginally harm the extraction accuracy. However, for the case of the incorrect key k̃, the participation of the inverse loss LẼ significantly reduces the extraction accuracy from 97.01% to 71.50%, while FSIS-GAN-WD (ex LẼ) with the incorrect key k̃ still attains a quite high extraction accuracy (&gt; 97%). This phenomenon means that the involvement of the secret key will not work if we exclude the inverse loss. In short, without the inverse loss LẼ, the variant FSIS-GAN-WD (ex LẼ) violates the Kerckhoffs' principle.</p>
      </sec>
      <sec id="sec-5-12">
        <title>4.3. Performance With Degradations</title>
        <p>In this subsection, we test the robustness performance of the proposed framework under certain image degradations. The image degradation type and level are given as prior knowledge. This scenario is common in practice because one can obtain some prior knowledge of the degradation through probing the communication channel. Thus, one can fix the degradation layers N and their associated parameters during the training stage. Specifically, in our experiments, the standard deviation σ1 of the Gaussian noise layer (GNL) is set to 0.2. The kernel width and the standard deviation σ2 of the Gaussian blurring layer (GBL) are set to 3 and 1, respectively. The differentiable JPEG compression layer (JCL) is implemented as suggested by the work HiDDeN [8]. For simplicity of reference, this variation is termed FSIS-GAN-FD (FD abbreviates Fixed Degradation) in the sequel.</p>
        <p>Firstly, the stego images synthesized by FSIS-GAN-FD are provided in Figure 4. One can observe that some speckle noise emerges in the generated stego images, which can be clearly seen in the regions highlighted with red lines in Figure 4 (b). Quantitatively, the FID score of FSIS-GAN-FD is 41.40, which is inferior to those of FSIS-GAN-WD (23.20) and Hu et al. [15] (32.07). Nevertheless, the stego images produced by FSIS-GAN-FD are intuitively more realistic than those of Hu et al. [15]. Secondly, in Table 2, we report the extraction accuracy performance under fixed degradations. Not surprisingly, one can notice that the extraction accuracies of Hu et al. [15] and FSIS-GAN-WD greatly degrade, which can be attributed to overlooking the degradation-resistant message extraction issue.</p>
        <p>Figure: Message extraction accuracy of Hu et al. [15], FSIS-GAN-WD, and FSIS-GAN-FD under JPEG compression with different quality factors (QF).</p>
        <p>5
ever, as pointed in [15], a well-trained forensic network
Table 2 can efectively identify a synthesized image. To solve this
dCeogmrapdaaritsioonnocfomndesitsiaognese.xtTrhaectiboonldacacnudracmya(r%k)eudnvdaelruvearwioiuths issue, we explicitly considered the anti-forensics scenario
an asterisk (*) denote the highest extraction accuracy with and introduce the anti-forensic loss LF .
correct secret key k and the lowest extraction accuracy with To demonstrate the influence of anti-forensic loss LF ,
the incorrect secret key k̃, respectively. we conduct the ablation experiment by excluding the loss
term LF , and thus this variant is termed as FSIS-GAN (ex
Scheme Hu et al. FSIS-GAN-WD FSIS-GAN-FD LF ). For a concrete example, we employ the well-trained
[15] with k with k̃ with k with k̃ forensic network Ye-Net [19] F to detect 3000 facial
W/o degradation 85.23 98.76 71.50∗ 98.22 72.08 stego images produced by diferent methods, and record
the probability of missed detection (PMD). The PMD’s
Fixed GNL 52.72 59.78 56.23∗ 95.58 72.74 of Hu et al. [15], FSIS-GAN (ex LF ), and FSIS-GAN are
Fixed GBL 69.68 57.52 54.68∗ 98.58 73.78 3.23%, 8.84%, and 89.91, respectively. As clearly shown,
Fixed JCL 65.33 61.38 58.00∗ 98.46 72.67 for FSIS-GAN (ex LF ), despite the facial stego images
look natural for human, they are easily exposed to the
forensic network, where the PMD value is lower than 10%.
hibits quite promising results. Under three types of degra- In contrast, by introducing the anti-forensic loss term, the
dation layers, the extraction accuracy typically exceeds value of PMD of FSIS-GAN could reach 89.91%. This
94% (though lower than that of FSIS-GAN-WD, which is means the proposed method FSIS-GAN could efectively
specifically designed for the non-degradation scenario). bypass the existing forensic network, retaining an nice
The results verify that for the case of known degrada- anti-forensic capability.
tions, the proposed framework could learn to efectively
resistant the fixed degradations, by employing the fixed
degradation layers during the training. 5. Conclusion</p>
        <p>Finally, to illustrate how the robustness of message
extraction changes under diferent degradation levels,
we test diferent degradation types with a variety of
degradation levels. Due to space limit, we only report
the JPEG compression degradation in Figure 5. As can
be seen, with the decrement of quality factor ( ), the
extraction accuracy generally decreases. Although the
JCL that adopted from HiDDEN [8] could handle
nondiferentiable JPEG compression, it cannot perfectly
reproduce the JPEG compression artifacts. Nevertheless,
FSIS-GAN-FD still achieve superior robustness, when
comparing with other two schemes.</p>
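As a side note on the quality metric quoted in the comparisons above: the Fréchet Inception Distance (FID) measures how far the feature statistics of synthesized face images drift from those of real ones, with lower scores being better. The following minimal Python sketch is illustrative only and is not the paper's code; it assumes feature means and diagonal covariances have already been extracted (e.g., from an Inception network), so the matrix square root in the trace term has an elementwise closed form, and the function name `fid_diagonal` is hypothetical:

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    # Illustrative sketch, not the paper's implementation.
    # Frechet distance between Gaussians N(mu1, S1) and N(mu2, S2):
    #   ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2));
    # with diagonal covariances the trace term reduces elementwise.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    trace_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                     for v1, v2 in zip(var1, var2))
    return mean_term + trace_term

# Identical statistics give a score of 0; the score grows as the
# synthesized-image statistics move away from the real-image ones.
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.5, 1.0], [2.0, 1.0]))
```

In practice the feature statistics carry full covariance matrices, which require a proper matrix square root (e.g., `scipy.linalg.sqrtm`); this diagonal special case is only meant to make scores such as 23.20 versus 41.40 interpretable as "lower means closer to the real-image distribution".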
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CAD&amp;CG, Zhejiang University, under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to the Southeast Digital Economic Development Institute for supporting the computing facility.</p>
      <p>References</p>
      <p>[1] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Security 11 (2015) 221–234.</p>
      <p>[2] J. Zhou, W. Sun, L. Dong, X. Liu, O. C. Au, Y. Y. Tang, Secure reversible image data hiding over encrypted domain via key modulation, IEEE Trans. Circuits Syst. Video Technol. 26 (2015) 441–452.</p>
      <p>[3] L. Dong, J. Zhou, W. Sun, D. Yan, R. Wang, First steps toward concealing the traces left by reversible image data hiding, IEEE Trans. Circuits Syst. II, Exp. Briefs 67 (2020) 951–955.</p>
      <p>[4] S. Baluja, Hiding images within images, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 1685–1697.</p>
      <p>[5] H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: Secure steganography based on generative adversarial networks, in: Pacific Rim Conference on Multimedia, 2017, pp. 534–544.</p>
      <p>[6] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (2017) 1547–1551.</p>
      <p>[7] J. Hayes, G. Danezis, Generating steganographic images via adversarial training, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1954–1963.</p>
      <p>[8] J. Zhu, R. Kaplan, J. Johnson, F. Li, HiDDeN: Hiding data with deep networks, in: Proc. Eur. Conf. Comput. Vis., 2018, pp. 657–672.</p>
      <p>[9] K. A. Zhang, A. Cuesta-Infante, L. Xu, K. Veeramachaneni, SteganoGAN: High capacity image steganography with GANs, arXiv preprint arXiv:1901.03892 (2019).</p>
      <p>[10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).</p>
      <p>[11] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).</p>
      <p>[12] W. Tang, B. Li, S. Tan, M. Barni, J. Huang, CNN-based adversarial embedding for image steganography, IEEE Trans. Inf. Forensics Security 14 (2019) 2074–2087.</p>
      <p>[13] K. Wu, C. Wang, Steganography using reversible texture synthesis, IEEE Trans. Image Process. 24 (2014) 130–139.</p>
      <p>[14] S. Li, X. Zhang, Toward construction-based data hiding: From secrets to fingerprint images, IEEE Trans. Image Process. 28 (2018) 1482–1497.</p>
      <p>[15] D. Hu, L. Wang, W. Jiang, S. Zheng, B. Li, A novel image steganography method via deep convolutional generative adversarial networks, IEEE Access 6 (2018) 38303–38314.</p>
      <p>[16] Z. Zhang, G. Fu, R. Ni, J. Liu, X. Yang, A generative method for steganography by cover synthesis with auxiliary semantics, Tsinghua Science and Technology 25 (2020) 516–527.</p>
      <p>[17] D. Berthelot, T. Schumm, L. Metz, BEGAN: Boundary equilibrium generative adversarial networks, arXiv preprint arXiv:1703.10717 (2017).</p>
      <p>[18] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).</p>
      <p>[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Security 12 (2017) 2545–2557.</p>
      <p>[20] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proc. IEEE Int. Conf. Comput. Vis., 2015.</p>
      <p>[21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6629–6640.</p>
      <p>[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>