=Paper=
{{Paper
|id=Vol-3084/paper8
|storemode=property
|title=Towards Image Data Hiding via Facial Stego Synthesis With Generative Model
|pdfUrl=https://ceur-ws.org/Vol-3084/paper8.pdf
|volume=Vol-3084
|authors=Li Dong,Jie Wang,Rangding Wang,Yuanman Li,Weiwei Sun
}}
==Towards Image Data Hiding via Facial Stego Synthesis With Generative Model==
Li Dong¹,², Jie Wang¹,², Rangding Wang¹,², Yuanman Li³ and Weiwei Sun⁴

¹ Faculty of Electrical Engineering and Computer Science, Ningbo University, Zhejiang, China, 315211
² Southeast Digital Economic Development Institute, Zhejiang, China, 324000
³ Shenzhen University, Guangdong, China, 518061
⁴ Alibaba Group, Zhejiang, China, 310052
Abstract
Stego synthesis-based data hiding aims to directly produce a plausible natural image to convey a secret message. However, most existing works neglect the communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator, and a forensic network. Specifically, the generator produces a realistic facial stego image from the secret message and key, while the extractor recovers the secret message from the stego image with the provided secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which guides the update of the generator. Three degradation layers are further incorporated, enforcing the generator to characterize communication degradations. Experimental results demonstrate that the proposed framework can accurately extract the secret message and effectively resist forensic detection and certain degradations, while attaining realistic facial stego images.
Keywords
data hiding, stego synthesis, generative adversarial network
1. Introduction

Data hiding aims to embed a secret message into a cover signal without arousing the awareness of an adversary. It is widely used in many applications, e.g., covert communication [1] and multimedia data protection [2, 3]. The primitive ad-hoc Least-Significant Bit (LSB) method replaces the bit in the least significant bit-plane of each pixel with a secret bit, while modern data hiding methods attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography [1] designs a sophisticated distortion function according to prior knowledge and uses Syndrome-Trellis coding to embed the secret message. Recently, neural network-based data hiding has become one of the most active research directions. Baluja [4] employed convolutional neural networks to hide an entire secret image into the cover image in an end-to-end fashion. SSGAN [5] attempted to exploit GAN to synthesize a cover image that is more suitable for subsequent steganographic data embedding. ASDL-GAN [6] integrated content-adaptive steganography and GAN, in which the generator produces the modification probability maps. HayersGAN [7], HiDDeN [8], and SteganoGAN [9] all designed encoder-decoder-like frameworks based on GAN; these methods can automatically learn the suitable areas for embedding the secret bitstream message.

In recent years, adversarial examples against neural networks have met data hiding and continuously draw extensive attention from the community. Some studies, e.g., [10, 11], found that adding slight perturbations to the input data can paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose the data hiding on a stego signal and usually involves machine-learning classifiers. Therefore, it is possible for data hiding methods to bypass steganalysis by borrowing strategies from adversarial-example works. Tang et al. [12] presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego can effectively fool the steganalytic network, revealing the vulnerability of deep learning-based steganalyzers.

International Workshop on Safety & Security of Deep Learning, 21st–26th August, 2021
dongli@nbu.edu.cn (L. Dong); 1811082196@nbu.edu.cn (J. Wang); wangrangding@nbu.edu.cn (R. Wang); yuanmanli@szu.edu.cn (Y. Li); sunweiwei.sww@alibaba-inc.com (W. Sun)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Note that all the aforementioned data hiding techniques are based on cover modification. Their common characteristic is that they cannot be decoupled from the modification of a given cover image, which inevitably leaves artifacts exposed to steganalysis. On the contrary, stego synthesis-based data hiding, e.g., [13, 14],
refers to synthesizing the stego image directly from the secret message, which poses more challenges for steganalysis. Under this concept, traditional methods tried to produce stego images based on hand-crafted designs. Although their capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative, some methods [15, 16] use GAN to synthesize stego images with rich semantics, e.g., faces and food. However, the accuracy of message extraction is unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.

Figure 1: Overview of the proposed FSIS-GAN framework. [Figure: the generator G maps the secret message m and the secret key k to a synthesized stego image S; the degradation layers N produce the degraded stego image N(S), from which the extractor E recovers the message m′; the discriminator D is adversarially trained against genuine images I, and the forensic network F_π guides anti-forensics.]

In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, termed FSIS-GAN. Unlike cover modification-based data hiding methods, FSIS-GAN is designed without a cover image provided beforehand. Compared with existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results on a public facial dataset validate these merits. The main contributions of this work can be summarized as follows:

• We explicitly consider the image degradation during covert communication, and integrate multiple degradation layers into the framework. This boosts the robustness of message extraction.
• We incorporate a forensic network during the training of FSIS-GAN. By exploiting the gradients from this forensic network, the stego images produced by the learned generator can effectively fool the forensic network.
• We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which further improves the reliability of secret message extraction.

The rest of this paper is organized as follows. Section 2 briefly reviews related work on stego synthesis-based data hiding. Section 3 describes the proposed FSIS-GAN, including the network architecture and loss functions. Section 4 presents the experimental results, and conclusions are drawn in Section 5.

2. Stego Synthesis-based Data Hiding

The majority of data hiding methods involve modification of a given cover image. However, such cover modification leaves embedding traces that can be detected. To resist detection by a steganalyzer, stego synthesis-based data hiding methods directly produce the stego image from the given secret message. As an early attempt, Wu et al. [13] proposed a texture image synthesis-based method, which selectively distributes source patches of the original texture image onto the synthesized stego image; the message hiding and extraction depend on the choice of source patches. Motivated by fingerprint biometrics, Li et al. [14] proposed to use a hologram phase constructed from the secret message to synthesize a fingerprint stego image. The hologram phase consists of two parts: the first spiral phase encodes the secret message into two-dimensional points with different polarities, and the second continuous phase is used to synthesize the fingerprint images. It is worth noting that conventional stego image synthesis-based methods can only synthesize patterned stego images such as textures, lacking rich semantics, which limits their practical applications.

Instead, Hu et al. [15] suggested using the generator of a GAN to synthesize a facial stego image from the secret message; the secret message can then be extracted from the stego image by a corresponding extractor network. Similarly, Zhang et al. [16] exploited GAN to generate stego images with different semantic labels, which improves the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of these GAN-based works is that they can synthesize stego images with rich semantics. However, we shall note that the resulting stego images can still be easily identified by well-trained forensic networks. In addition, the trade-off between capacity and extraction accuracy has not been addressed.

3. Facial Image Data Hiding via Generative Stego Synthesis

In this section, we first give an overview of the proposed FSIS-GAN framework, and then introduce each component of the framework, accompanied by a thorough discussion on the loss functions, network structures, and training procedure.
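Before detailing the components, the intended data flow m, k → G → S → N(S) → E → m′ can be illustrated with a toy, non-neural sketch. The XOR "embedding" below is purely illustrative (an assumption of this sketch, not the paper's networks); it only shows how the secret key gates correct extraction:

```python
# Toy stand-ins for the FSIS-GAN components, illustrating the data flow
# m, k --G--> S --N--> N(S) --E--> m'. The XOR coding is illustrative only;
# the real G and E are neural networks producing/consuming facial images.

def toy_generator(m, k):
    # "Embed" by XOR-ing each message bit with a key bit (cyclically).
    return [mi ^ k[i % len(k)] for i, mi in enumerate(m)]

def toy_degradation(s):
    # Identity channel here; the real N adds noise, blurring, or JPEG
    # compression to the stego signal.
    return list(s)

def toy_extractor(s, k):
    # Invert the XOR coding; supplying a wrong key yields garbage bits.
    return [si ^ k[i % len(k)] for i, si in enumerate(s)]
```

With the correct key the message round-trips exactly, while a wrong key produces a different bit string, mirroring the Kerckhoffs-style behavior the framework aims for.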
3.1. Overview of FSIS-GAN

The proposed FSIS-GAN framework is illustrated in Figure 1. In general, it is an end-to-end framework consisting of three parts, where each part is designed to achieve a specific goal. First, the part of facial stego image synthesis and message extraction contains a generator G, an extractor E, and the degradation layers N. The generator G converts the secret message along with the secret key into a facial stego image. The degradation layers N simulate common image degradations within the communication channel. The extractor E is learned to recover the secret message from the degraded stego image. Second, there is a discriminator D in the adversarial training part, which aims at distinguishing genuine data samples from those produced by the generator G. Third, a well-trained existing forensic network F_π (parameterized by π) is introduced in the anti-forensics part, which can distinguish genuine images from synthesized facial stego images. Note that this target forensic network is treated as a fixed adversary, and its network parameters are always frozen.

3.2. Stego Image Synthesis and Message Extraction

The part of facial stego image synthesis and message extraction achieves two functionalities. First, by using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure communication reliability and high diversity of the generated facial stego images.

Generally, the generator G and the extractor E aim to learn two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1}^{l_m} and k ∈ {0, 1}^{l_k} be the binary secret message and the secret key, respectively. The generator G learns the first mapping, transforming the message m along with the secret key k into a stego image:

S = G(m, k),   (1)

where S denotes the synthesized facial stego image of shape C × H × W. To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should be from the degraded stego image along with the secret key k to the secret message, which can be expressed as

m′ = E(N(S), k),   (2)

where N(·) models the image degradation process and N(S) is the degraded stego image. Here, m′ ∈ (0, 1)^{l_m} denotes the extracted secret message. It shall be noted that the extracted message m′ shall (approximately) equal the original secret message m, and thus one can employ an error-correcting mechanism to fully correct the erroneous bits.

To measure the distortion between the original secret message m and the extracted message m′, we use the cross-entropy loss to calculate the message extraction loss L_E, which is given by

L_E(m, m′) = −(1/l_m) Σ_{i=1}^{l_m} [m_i log(m′_i) + (1 − m_i) log(1 − m′_i)],   (3)

where m_i and m′_i are the i-th elements of m and m′, respectively.

Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy Kerckhoffs' principle. It means that even if the extractor network E is completely exposed to an attacker, the secret message m can be recovered only if the receiver obtains both the secret key k and the facial stego image S. It is worth emphasizing that, for most of the existing GAN-based methods, e.g., [15, 16], there is no involvement of a secret key. Further notice that, as an input of the extractor E, the dimension of the secret key k is much smaller than that of the facial stego image S. Thus, the extractor E tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to use a randomly generated incorrect secret key k̃ ∈ {0, 1}^{l_k}, where k̃ ≠ k, as an additional input during the training stage. Instead of only using the correct secret key and minimizing the difference between the extracted and original messages, we additionally maximize this difference when the incorrect secret key is applied. Mathematically, this loss term, the inverse loss L_Ẽ, can be expressed as the negative cross-entropy loss:

L_Ẽ(m, m̃′) = (1/l_m) Σ_{i=1}^{l_m} [m_i log(m̃′_i) + (1 − m_i) log(1 − m̃′_i)],   (4)

where m̃′_i is the i-th element of the message m̃′ extracted with the incorrect key k̃, i.e., m̃′ = E(N(S), k̃).

Enhancing robustness with degradation layers: In a practical communication channel, there often exist degradations on the synthesized stego image S when transmitting the stego to a receiver. To this end, the data hiding system requires certain robustness to ensure the accuracy of message extraction. Therefore, in this work, we take three representative degradations into account, i.e., image noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely used noise models: Gaussian noise. For blurring, Gaussian blurring is used. For compression, JPEG image compression is employed, which is extensively used for reducing the bandwidth of the transmission process.
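The extraction loss (3) and the inverse loss (4) can be sketched in plain Python as a minimal scalar implementation (not the authors' code; real training computes these over batches with automatic differentiation):

```python
import math

def extraction_loss(m, m_prime, eps=1e-12):
    """Eq. (3): binary cross-entropy between original bits m and
    extracted bit probabilities m_prime, averaged over the message."""
    assert len(m) == len(m_prime)
    total = 0.0
    for mi, pi in zip(m, m_prime):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp for numerical safety
        total += mi * math.log(pi) + (1 - mi) * math.log(1 - pi)
    return -total / len(m)

def inverse_loss(m, m_tilde_prime, eps=1e-12):
    """Eq. (4): the negative of the same cross-entropy; minimizing it
    maximizes the extraction error when the wrong key is supplied."""
    return -extraction_loss(m, m_tilde_prime, eps)
```

Minimizing `extraction_loss` on outputs obtained with the correct key, together with `inverse_loss` on outputs obtained with a random incorrect key, pushes the extractor to actually depend on the key.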
In experiments, we implement these three types of degradation as neural network layers N to degrade the stego image. Specifically, three network layers are used, one for each type of degradation. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate it with a differentiable polynomial function; this differentiation technique can also be found in the work HiDDeN [8].

3.3. Adversarial Training Part

As aforementioned, the hand-crafted stego synthesis-based data hiding methods [13, 14] can only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with semantics is a challenging task. However, this problem can be alleviated with the guidance of adversarial training. In this part, the purpose of the discriminator D is to conduct adversarial training with the generator G and improve the plausibility of the synthesized facial stego images.

More specifically, let I be a genuine facial image sample of shape C × H × W from a publicly available facial image dataset. The discriminator D estimates the probability that a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, the network structure and loss function of BEGAN [17] provide a good reference for improving training stability. Thus, in this work we employ the adversarial training loss used in BEGAN. Mathematically, the adversarial loss L_adv for the generator G can be calculated as

L_adv(D(S), S) = (1/(CHW)) |D(S) − S|,   (5)

where the output D(S) has the same shape as the facial stego image. The adversarial loss L_D for the discriminator D is

L_D(I, S) = (1/(CHW)) [|D(I) − I| − κ_t · |D(S) − S|],   (6)

where κ_t controls the discrimination ability of D in the t-th training step to equilibrate the adversarial training. It is computed as

κ_{t+1} = κ_t + (λ/(CHW)) [γ|D(I) − I| − |D(S) − S|].   (7)

Here, the parameter λ is the learning rate and γ is a hyper-parameter that controls the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning the parameter γ.

3.4. Anti-forensics Part

Recall that there are no explicit cover images involved in stego synthesis-based data hiding methods. This merit allows such data hiding methods to effectively resist conventional steganalysis detection. However, as pointed out in [15], a well-trained forensic network can readily distinguish a synthesized stego image from a genuine one, even if the synthesized stego image shows no perceptual differences to an observer.

Although F_π is an expert in such a detection task, some studies [10, 11] have shown that deep neural network-based classifiers are vulnerable to adversarial examples. Inspired by this, we propose to apply strategies for crafting adversarial examples to evade the stego detection network as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., we assume full knowledge of the target forensic network. The target forensic network F_π is trained with genuine images from a publicly available facial dataset and synthesized images produced by BEGAN [17]. Then, we integrate the well-trained F_π into the FSIS-GAN framework, in which F_π receives the synthesized facial stego image S and outputs a confidence. The gradients back-propagated through F_π are used to update the parameters of the generator G. To measure the loss of resisting forensic detection, we define the anti-forensic loss L_Fπ as the cross-entropy between the output of F_π and our target genuine-image label:

L_Fπ(S) = −log(1 − F_π(S)),   (8)

where F_π(S) ∈ (0, 1) is the confidence output by F_π. Clearly, a decrease of L_Fπ indicates an increase of the probability of S being identified as a genuine image by F_π.

3.5. Network Structure and Training Strategy

The network architectures of the generator G and the extractor E are shown in Figure 2. For the generator G, the secret key vector k is first concatenated to the secret message vector m and then fed to the subsequent layers. Then, G applies two fully-connected (FC) layers and three transposed-convolution (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) [18] and the ReLU activation function to process the intermediate features. In experiments, we found that since both m and k are composed of binary 0/1 values, such a form is not directly suitable as input, and the adversarial training loss would diverge. To solve this issue, additional BN layers were added so that a normalization operation is carried out inside the network. Experimental results show that this trick can greatly alleviate the divergence problem.
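The differentiable approximation of JPEG quantization mentioned above can be sketched as follows. The cubic surrogate below is one common polynomial relaxation from the robust data hiding literature in the spirit of HiDDeN [8]; the exact polynomial used by the authors is not stated, so this particular form is an assumption:

```python
def soft_round(x):
    """Differentiable surrogate for round(x): round(x) + (x - round(x))**3.
    The value deviates from true rounding by at most 0.5**3 = 0.125, while
    the local derivative 3*(x - round(x))**2 is nonzero between integers,
    so gradients can flow through the quantization step during training."""
    r = float(round(x))
    return r + (x - r) ** 3

def soft_round_grad(x):
    """Piecewise gradient of soft_round w.r.t. x (round(x) treated as
    locally constant)."""
    r = float(round(x))
    return 3.0 * (x - r) ** 2
```

At inference time the true (hard) JPEG quantization is applied; a surrogate like this only replaces it inside the training graph so that the generator receives useful gradients.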
Figure 2: Network structures of the generator G and the extractor E. "Concat", "FC", "ConvT", "BN", and "Conv" denote the concatenation operation, fully-connected layer, transposed-convolution layer, batch normalization, and convolution layer, respectively. [Panel (a): network structure of the generator G; panel (b): network structure of the extractor E.]

For the extractor E, we shall fuse the secret key vector k and the facial stego image matrix S in a way such that the extractor E will not neglect the information provided by the secret key. To this end, the extractor E first applies an FC layer to the secret key to form an intermediate matrix of size 1 × W × H. Then, the facial stego image S and the intermediate matrix are concatenated, and the fused tensor is fed to four convolutional (Conv) layers. Finally, the extractor E applies an FC layer and the Sigmoid activation function to produce the message vector m′ (or m̃′) of size 1 × l_m.

For the discriminator D, we adopt the auto-encoder-like structure from BEGAN [17]. For the target forensic network F_π, we use Ye-Net [19], which is a widely used steganalytic network.

The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except the well-trained forensic network F_π. We apply the extraction loss L_E and the adversarial loss L_D as the loss functions for the extractor E and the discriminator D, respectively. In particular, the total loss L_G for the generator G is a fusion of the four aforementioned losses:

L_G = L_adv + α(L_E + L_Ẽ) + β L_Fπ,   (9)

where L_adv is the adversarial loss for G, L_Ẽ is the inverse loss, and L_Fπ is the anti-forensic loss. The hyper-parameters α and β control the relative importance among the four losses.

4. Experiment Results

In this section, we first introduce the experimental setup. Then, to verify the robustness of our proposed FSIS-GAN, it is evaluated with and without image degradation, respectively. Finally, the anti-forensic capability of FSIS-GAN is validated.

4.1. Experimental Setup

Our experiments are conducted on the CelebA dataset [20], where the facial region is identified and extracted. All images are reshaped to 3 × 64 × 64. The following three metrics are used for evaluation:

• Fréchet Inception Distance (FID) [21], a widely used perceptual image quality assessment metric for synthesized images and a de facto metric for assessing the quality of images created by the generators of GANs. A lower FID score indicates better consistency with human perception of natural images.
• Accuracy of message extraction (ACC), computed by ACC = L_Ext / L, where L_Ext is the length of the correctly extracted message and L is the length of the secret message m.
• Probability of missed detection (PMD), calculated by PMD = FN / (FN + TP), where FN (False Negative) is the ratio for the case "synthesized facial image is misclassified as a genuine one", and TP (True Positive) is the ratio for the case "synthesized facial image is correctly detected". A larger PMD indicates a higher ability to resist the forensic network.

The proposed FSIS-GAN framework is implemented with PyTorch and trained on four NVIDIA GTX1080Ti GPUs with 11 GB memory. The number of training epochs is set to 400 with a mini-batch size of 64. We use Adam [22] as the optimizer with a learning rate of 2 × 10⁻⁴. For the hyper-parameters α and β in (9), after a number of trials, we empirically set both to 0.1. The parameter γ in (7) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most related work [15]. We implemented this work ourselves because there is no publicly available code; with certain tweaking and fine-tuning, the tested results were comparable to the originally reported data in [15]. For a fair comparison, the lengths of the secret message l_m and the secret key l_k are both set to 100, so that the payload is identical to that of [15].
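The ACC and PMD metrics above are straightforward to compute; a minimal sketch (the bit lists and detection counts below are illustrative stand-ins for real extractor and detector outputs):

```python
def message_accuracy(m, m_extracted):
    """ACC = L_Ext / L: fraction of message bits recovered correctly."""
    assert len(m) == len(m_extracted)
    return sum(a == b for a, b in zip(m, m_extracted)) / len(m)

def missed_detection_prob(fn, tp):
    """PMD = FN / (FN + TP): fraction of synthesized images that the
    forensic network misclassifies as genuine."""
    return fn / (fn + tp)
```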
Figure 3: Comparison of exemplar synthesized stego images. Top: Hu et al. [15]; Bottom: proposed FSIS-GAN-WD.

Table 1
Comparison of message extraction accuracy (%) for the case of no communication degradations. Here, k and k̃ denote the correct and incorrect secret key, respectively. FSIS-GAN-WD is a variant of the proposed method that excludes the degradation layers, and FSIS-GAN-WD (ex L_Ẽ) represents FSIS-GAN-WD trained without the inverse loss L_Ẽ.

Scheme      Hu et al. [15]   FSIS-GAN-WD          FSIS-GAN-WD (ex L_Ẽ)
                             with k    with k̃     with k    with k̃
Accuracy    85.23            98.76     71.50       99.41     97.01

4.2. Performance Without Degradations

Notice that the competing method [15] does not consider image degradations. To verify the effectiveness of the proposed method under the same settings and make a fair comparison, in this subsection we evaluate the performance without the degradation layers N: the facial stego image S is transmitted to the extractor E without any degradation. To avoid confusion, this variation of our proposed method is termed FSIS-GAN-WD (WD abbreviates Without Degradations).

We first compare the visual quality of the facial stego images with the competing method [15]. As can be seen from Figure 3, the proposed FSIS-GAN-WD synthesizes more realistic facial stego images than Hu et al. [15]. Upon closer inspection, one can notice that the stego images produced by FSIS-GAN-WD are more vivid and have more correct semantic structures; it is difficult for a human observer to notice the inauthenticity of the facial stego images synthesized by FSIS-GAN-WD. In contrast, the stego images generated by Hu et al. [15] are typically blurry and severely distorted, which would apparently draw the attention of a forensic analyzer. For the FID evaluation, we use 10,000 pairs of genuine images and synthesized facial stego images to compute the FID score. The FID score of FSIS-GAN-WD is 23.20, which is much smaller than that of Hu et al. [15] (32.07).

Then, we evaluate the extraction accuracy for the case without degradation. The results are tabulated in Table 1. To demonstrate the impact of the inverse loss L_Ẽ on the extraction accuracy, ablation experiments are also conducted by excluding the inverse loss during training; this L_Ẽ-ablated version is denoted FSIS-GAN-WD (ex L_Ẽ). From Table 1, one can draw the following conclusions. First, the extraction accuracy of FSIS-GAN-WD with the correct secret key k is 98.76%, which dramatically outperforms the 85.23% of the competing method [15]. Second, by comparing FSIS-GAN-WD and FSIS-GAN-WD (ex L_Ẽ), one can see that the extraction accuracy of FSIS-GAN-WD with a correct secret key k is slightly inferior to that of FSIS-GAN-WD (ex L_Ẽ). This suggests that the introduced inverse loss marginally harms the extraction accuracy. However, for the case of the incorrect key k̃, the participation of the inverse loss L_Ẽ significantly reduces the extraction accuracy from 97.01% to 71.50%, while FSIS-GAN-WD almost retains the same extraction accuracy. This phenomenon means that the involvement of the secret key will not work if we exclude the inverse loss: FSIS-GAN-WD (ex L_Ẽ) with the incorrect key k̃ still attains a quite high extraction accuracy (> 97%). In short, without the inverse loss L_Ẽ, the variant FSIS-GAN-WD (ex L_Ẽ) violates Kerckhoffs' principle.

4.3. Performance With Degradations

In this subsection, we test the robustness of the proposed framework under certain image degradations. The image degradation type and level are given as prior knowledge. This scenario is common in practice because one can obtain some prior knowledge of the degradation by probing the communication channel. Thus, one can fix the degradation layers N and their associated parameters during the training stage. Specifically, in our experiments, the standard deviation σ₁ of the Gaussian noise layer (GNL) is set to 0.2. The kernel width w and the standard deviation σ₂ of the Gaussian blurring layer (GBL) are set to 3 and 1, respectively. The differentiable JPEG compression layer (JCL) is implemented as suggested by HiDDeN [8]. For simplicity of reference, this variation is termed FSIS-GAN-FD (FD abbreviates Fixed Degradation) in the sequel.

Firstly, the stego images synthesized by FSIS-GAN-FD are shown in Figure 4. One can observe that some speckle noise emerges in the generated stego images, which can be clearly seen from the highlighted regions in Figure 4 (b). Quantitatively, the FID score of FSIS-GAN-FD is 41.40, which is inferior to those of FSIS-GAN-WD (23.20) and Hu et al. [15] (32.07). Nevertheless, the stego images produced by FSIS-GAN-FD are intuitively more realistic than those of Hu et al. [15].

Secondly, in Table 2, we report the extraction accuracy under fixed degradations. Not surprisingly, one can notice that the extraction accuracies of Hu et al. [15] and FSIS-GAN-WD greatly degrade, which can be attributed to their overlooking the degradation-resistant message extraction issue. In contrast, FSIS-GAN-FD
100
Hu et al.
FSIS-GAN-WD
FSIS-GAN-FD
90
Message extraction accuracy
80
70
60
(a) (b)
Figure 4: The comparison of synthesized facial stego images, 50
95 75 55 45 15 5
where four images of (a) are produced by FSIS-GAN-WD; Compression quality factors (QF)
images of (b) are stego images produced by FSIS-GAN-FD. Figure 5: Comparison of the message extraction accuracy (%)
With the introduction of degradation layers, minor speckle under various levels of JPEG compression degradation.
noises emerge (highlighted with red rectangular).
ever, as pointed in [15], a well-trained forensic network
Table 2
can effectively identify a synthesized image. To solve this
Comparison of message extraction accuracy (%) under various
degradation conditions. The bold and marked value with
issue, we explicitly considered the anti-forensics scenario
an asterisk (*) denote the highest extraction accuracy with and introduce the anti-forensic loss LFπ .
correct secret key k and the lowest extraction accuracy with To demonstrate the influence of anti-forensic loss LFπ ,
the incorrect secret key k,Μ respectively. we conduct the ablation experiment by excluding the loss
term LFπ , and thus this variant is termed as FSIS-GAN (ex
Hu et al. FSIS-GAN-WD FSIS-GAN-FD LFπ ). For a concrete example, we employ the well-trained
Scheme
[15] with k with kΜ with k with kΜ forensic network Ye-Net [19] Fπ to detect 3000 facial
W/o degradation 85.23 98.76 71.50β 98.22 72.08
stego images produced by different methods, and record
the probability of missed detection (PMD). The PMDβs
Fixed GNL 52.72 59.78 56.23β 95.58 72.74
of Hu et al. [15], FSIS-GAN (ex LFπ ), and FSIS-GAN are
β
Fixed GBL 69.68 57.52 54.68 98.58 73.78 3.23%, 8.84%, and 89.91, respectively. As clearly shown,
Fixed JCL 65.33 61.38 58.00β 98.46 72.67 for FSIS-GAN (ex LFπ ), despite the facial stego images
look natural for human, they are easily exposed to the
forensic network, where the PMD value is lower than 10%.
hibits quite promising results. Under three types of degra- In contrast, by introducing the anti-forensic loss term, the
dation layers, the extraction accuracy typically exceeds value of PMD of FSIS-GAN could reach 89.91%. This
94% (though lower than that of FSIS-GAN-WD, which is means the proposed method FSIS-GAN could effectively
specifically designed for the non-degradation scenario). bypass the existing forensic network, retaining an nice
The results verify that for the case of known degrada- anti-forensic capability.
tions, the proposed framework could learn to effectively
resistant the fixed degradations, by employing the fixed
degradation layers during the training. 5. Conclusion
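As a rough illustration of the three fixed degradation layers evaluated above, the sketch below implements plain NumPy counterparts of a Gaussian noise layer (GNL), a Gaussian blur layer (GBL), and a JPEG-like layer (JCL). The noise level, blur kernel, pixel range [0, 1], and the quantization-based JPEG stand-in are all our own assumptions; in particular, the paper's actual JCL follows HiDDeN's differentiable JPEG approximation rather than this coarse simplification.

```python
import numpy as np

def gaussian_noise_layer(img, sigma=0.05, rng=None):
    """GNL: additive white Gaussian noise (assumed sigma; pixels in [0, 1])."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def gaussian_blur_layer(img, sigma=1.0, radius=2):
    """GBL: separable Gaussian blur applied per channel (H x W x C input)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    # Edge-pad, then convolve columns and rows so the output keeps its size.
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    cols = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, cols)

def jpeg_compression_layer(img, qf=75):
    """JCL stand-in: uniform quantization whose step grows as QF drops.
    Only illustrates the lossy effect; not the paper's actual JCL."""
    step = (100 - qf) / 100.0 * 0.1 + 1e-3
    return np.round(img / step) * step
```

In a training pipeline, one of these layers would be applied to the generated stego image before it is fed to the extractor, so the extractor learns to decode the message from a degraded observation.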
Finally, to illustrate how the robustness of message
extraction changes under different degradation levels, In this work, we proposed a stego-synthesis based data
we test different degradation types with a variety of hiding method using generative neural network, by
degradation levels. Due to space limit, we only report explicitly considering the image degradation and anti-
the JPEG compression degradation in Figure 5. As can forensic need. Specifically, the generator is to synthe-
be seen, with the decrement of quality factor (ππΉ), the size a facial stego image from the given secret message
extraction accuracy generally decreases. Although the and secret key. The extractor aims to recover the secret
JCL that adopted from HiDDEN [8] could handle non- message with the secret key. Through the adversarial
differentiable JPEG compression, it cannot perfectly re-training with the discriminator, the generator could pro-
produce the JPEG compression artifacts. Nevertheless, duce realistic facial stego images. The degradation layers
FSIS-GAN-FD still achieve superior robustness, when are introduced during the training, which significantly
comparing with other two schemes. enhance the robustness of message extraction. A forensic
network is incorporated during training, in response to
the possible adversarial forensic analysis in communi-
4.4. Performance of Anti-forensics cation channel. Experimental results verified that, our
Recall that, owing to that no cover images are involve- approach could generate more natural facial stego im-
ment for data hiding, our method has a relatively good ages, while retaining higher message extraction accuracy
undetectability when exposed to a steganalyzer. How- and nice anti-forensic ability.
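The probability of missed detection (PMD) used in the anti-forensics evaluation is simply the fraction of stego images that the forensic detector fails to flag as synthetic. A minimal sketch, assuming the detector outputs a per-image probability of being synthetic and decides at a 0.5 threshold (both assumptions of ours, not details from the paper):

```python
def probability_of_missed_detection(scores, threshold=0.5):
    """PMD (%): fraction of stego images the forensic detector fails to flag.

    `scores` are assumed to be the detector's per-image probabilities that
    the input is synthetic; a stego image is "missed" when its score falls
    below the decision threshold, i.e. it is judged to be a real photograph.
    """
    missed = sum(1 for s in scores if s < threshold)
    return 100.0 * missed / len(scores)

# Example: out of five stego images, the detector flags only the first two,
# so three are missed.
pmd = probability_of_missed_detection([0.9, 0.8, 0.3, 0.2, 0.1])  # 60.0
```

A higher PMD thus indicates a stronger anti-forensic capability of the stego synthesizer against that detector.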
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CAD&CG, Zhejiang University under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to Southeast Digital Economic Development Institute for supporting the computing facility.

References

[1] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Security 11 (2015) 221-234.
[2] J. Zhou, W. Sun, L. Dong, X. Liu, O. C. Au, Y. Y. Tang, Secure reversible image data hiding over encrypted domain via key modulation, IEEE Trans. Circuits Syst. Video Technol. 26 (2015) 441-452.
[3] L. Dong, J. Zhou, W. Sun, D. Yan, R. Wang, First steps toward concealing the traces left by reversible image data hiding, IEEE Trans. Circuits Syst. II, Exp. Briefs 67 (2020) 951-955.
[4] S. Baluja, Hiding images within images, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 1685-1697.
[5] H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: Secure steganography based on generative adversarial networks, in: Pacific Rim Conference on Multimedia, 2017, pp. 534-544.
[6] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (2017) 1547-1551.
[7] J. Hayes, G. Danezis, Generating steganographic images via adversarial training, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1954-1963.
[8] J. Zhu, R. Kaplan, J. Johnson, F. Li, HiDDeN: Hiding data with deep networks, in: Proc. Eur. Conf. Comput. Vis., 2018, pp. 657-672.
[9] K. A. Zhang, A. Cuesta-Infante, L. Xu, K. Veeramachaneni, SteganoGAN: High capacity image steganography with GANs, arXiv preprint arXiv:1901.03892 (2019).
[10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).
[11] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).
[12] W. Tang, B. Li, S. Tan, M. Barni, J. Huang, CNN-based adversarial embedding for image steganography, IEEE Trans. Inf. Forensics Security 14 (2019) 2074-2087.
[13] K. Wu, C. Wang, Steganography using reversible texture synthesis, IEEE Trans. Image Process. 24 (2014) 130-139.
[14] S. Li, X. Zhang, Toward construction-based data hiding: From secrets to fingerprint images, IEEE Trans. Image Process. 28 (2018) 1482-1497.
[15] D. Hu, L. Wang, W. Jiang, S. Zheng, B. Li, A novel image steganography method via deep convolutional generative adversarial networks, IEEE Access 6 (2018) 38303-38314.
[16] Z. Zhang, G. Fu, R. Ni, J. Liu, X. Yang, A generative method for steganography by cover synthesis with auxiliary semantics, Tsinghua Science and Technology 25 (2020) 516-527.
[17] D. Berthelot, T. Schumm, L. Metz, BEGAN: Boundary equilibrium generative adversarial networks, arXiv preprint arXiv:1703.10717 (2017).
[18] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).
[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Security 12 (2017) 2545-2557.
[20] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proc. IEEE Int. Conf. Comput. Vis., 2015.
[21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6629-6640.
[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).