<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hiding via Facial Stego Synthesis With Generative Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Li Dong</string-name>
          <email>dongli@nbu.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rangding Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuanman Li</string-name>
          <email>yuanmanli@szu.edu.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weiwei Sun</string-name>
          <email>sunweiwei.sww@alibaba-inc.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alibaba Group</institution>
          ,
          <addr-line>Zhejiang, China, 310052</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Electrical Engineering and Computer Science, Ningbo University</institution>
          ,
          <addr-line>Zhejiang, China, 315211</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Shenzhen University</institution>
          ,
          <addr-line>Guangdong, China, 518061</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Southeast Digital Economic Development Institute</institution>
          ,
          <addr-line>Zhejiang, China, 324000</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Stego synthesis-based data hiding aims to directly produce a plausible natural image to convey a secret message. However, most existing works neglect the possible communication degradations and forensic actions that commonly occur in practice. In this paper, we devise a generative adversarial network (GAN)-based framework to synthesize facial stego images. The framework consists of four components: a generator, an extractor, a discriminator, and a forensic network. Specifically, the generator is deployed to generate a realistic facial stego image from the secret message and key, while the extractor aims at extracting the secret message from the stego image with the provided secret key. To combat forensics, we explicitly integrate a forensic network into the proposed framework, which is responsible for guiding the update of the generator. Three degradation layers are further incorporated, enforcing the generator to characterize the communication degradations. Experimental results demonstrate that the proposed framework could accurately extract the secret message and effectively resist forensic detection and certain degradations, while attaining realistic facial stego images.</p>
      </abstract>
      <kwd-group>
        <kwd>data hiding</kwd>
        <kwd>stego synthesis</kwd>
        <kwd>generative adversarial network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Data hiding aims to embed a secret message into a cover signal without arousing the awareness of an adversary. It is widely used in many applications, e.g., covert communication [1] and multimedia data protection [2, 3].</p>
      <sec id="sec-1-1">
        <title>Cover Modification-based Data Hiding</title>
        <p>The primitive ad-hoc Least-Significant Bit (LSB) method replaces the bit in the least significant bit-plane of each pixel with a secret bit. Later works attempt to eliminate the traces of the data hiding action and improve the steganographic capacity. For example, content-adaptive steganography [1] designed sophisticated distortion functions according to prior knowledge. Recently, neural network-based data hiding has become one of the most active research directions. Baluja [4] employed convolutional neural networks to hide an entire secret image into the cover image in an end-to-end fashion. The work SSGAN [5] attempted to exploit GAN to synthesize a cover image that is more suitable for the subsequent steganographic data embedding. ASDL-GAN [6] integrated content-adaptive steganography and GAN, in which the generator was able to produce the modification probability map. The works HiDDeN [8] and SteganoGAN [9] both designed an encoder-decoder-like framework based on GAN. These methods could automatically learn the suitable areas for embedding the secret bitstream message.</p>
        <p>In the last several years, adversarial examples to neural networks have met data hiding, continuously drawing extensive attention from the community. Some studies [10, 11] showed that dedicated perturbations to the input data would paralyze the prediction capability of learning-based classifiers. As the opponent of data hiding, steganalysis aims to expose the data hiding on a stego signal and usually involves machine learning-based classifiers. Researchers have thus tried to bypass steganalysis by borrowing strategies from adversarial examples-related works. Tang et al. [12] presented the Adversarial Embedding (ADV-EMB) method, which adjusts the modification cost of image elements according to the gradients back-propagated from the target steganalytic neural network. The constructed adversarial stego could effectively fool the steganalytic network, revealing the vulnerability of the deep learning-based steganalyzer.</p>
        <p>Note that all the aforementioned data hiding techniques are based on cover modification. Their common characteristic is that they cannot be independent of the modification of a given cover image. As such, they inevitably leave artifacts exposed to steganalysis. On the contrary, stego synthesis-based data hiding, e.g., [13, 14], refers to synthesizing the stego image directly from the secret message, which could pose more challenges for steganalysis. Under this concept, traditional methods tried to produce stego images based on some hand-crafted designations. Although the capacity was relatively high, they were limited to synthesizing patterned images, such as textures and fingerprints. As an alternative solution, some methods [15, 16] use GAN to synthesize stego images with rich semantics, e.g., faces and food. However, the accuracy of message extraction was unsatisfactory under image degradations. Moreover, the synthesized stego images can be easily identified by a well-trained forensic detector. It is thus urgent to further improve the robustness of message extraction and the anti-forensic capability of stego synthesis-based data hiding methods.</p>
        <p>Figure 1: Overview of the proposed FSIS-GAN framework.</p>
        <p>In this work, we propose a Facial Stego Image Synthesis method for data hiding with GAN, which is termed FSIS-GAN. Unlike the cover modification-based data hiding methods, FSIS-GAN is designed without providing a cover image beforehand. Compared with the existing stego synthesis-based methods, FSIS-GAN can not only synthesize realistic facial stego images, but also achieve superior performance in terms of robustness and anti-forensic capability. Experimental results conducted on a public facial dataset validate such merits of our proposed method. The main contributions of this work can be summarized as follows:</p>
        <p>• We explicitly consider the image degradation during the covert communication, and integrate multiple degradation layers into the framework. This boosts the robustness performance in terms of message extraction.</p>
        <p>• We incorporate a forensic network during the training of FSIS-GAN. By exploiting the gradients from such a forensic network, the stego image produced by the learned generator could effectively fool the forensic network.</p>
        <p>• We explicitly adopt a secret key in the data hiding procedure of FSIS-GAN, which could further improve the reliability of the secret message extraction.</p>
        <p>The rest of this paper is organized as follows. Section II briefly reviews the related work on stego synthesis-based data hiding. Section III describes the proposed FSIS-GAN, including the network architecture and loss functions.</p>
        <p>Section IV presents the experimental results, and the final
conclusions are drawn in Section V.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Stego Synthesis-based Data Hiding</title>
      <p>The majority of data hiding methods involve modifications to the given cover images. However, such cover modification would leave embedding traces that can be detected. To resist detection by a steganalyzer, stego synthesis-based data hiding methods could directly produce the stego images from the given secret message. As an early attempt, Wu et al. [13] proposed a texture image synthesis-based method, which selectively distributes the source patches of the original texture image onto the synthesized stego image. The message hiding and extraction depend on the choice of source patches. Motivated by fingerprint biometrics, Li et al. [14] proposed to use the hologram phase constructed from the secret message to synthesize a fingerprint stego image. The hologram phase consists of two phases: the first spiral phase encodes the secret message into two-dimensional points with different polarities, and the second continuous phase is used to synthesize fingerprint images. It is worth noting that conventional stego image synthesis-based methods can only synthesize patterned stego images such as textures, lacking rich semantics, which limits their practical applications. Instead, Hu et al. [15] suggested using the generator of a GAN to synthesize a facial stego image from the secret message. Meanwhile, the secret message can be extracted from the stego image by the corresponding extractor network. Similarly, Zhang et al. [16] exploited GAN to generate stego images with different semantic labels, which could improve the robustness of data extraction but significantly sacrifices the steganographic capacity. The main advantage of the GAN-based works is that they could synthesize stego images with rich semantics. However, we shall note that such stego images can be easily identified by some well-trained forensic networks. In addition, there is no trade-off between capacity and extraction accuracy.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Facial Image Data Hiding via Generative Stego Synthesis</title>
      <sec id="sec-5-1">
        <p>In this section, we first give an overview of the proposed FSIS-GAN framework and then introduce each component, accompanied by a thorough discussion of the loss functions, network structure, and training procedure.</p>
        <sec id="sec-5-1-1">
          <title>3.1. Overview of FSIS-GAN</title>
          <p>The proposed FSIS-GAN framework is illustrated in Figure 1. In general, it is an end-to-end framework consisting of three parts, where each part is designed to achieve a specific goal. First, the part of facial stego image synthesis and message extraction contains a generator G, an extractor E, and the degradation layers N. The generator G is deployed to convert the secret message along with the secret key into a facial stego image. The degradation layers N are used to simulate common image degradations within the communication channel. The extractor E is learned to recover the secret message from the degraded stego image. Second, there is a discriminator D in the part of adversarial training, which aims at distinguishing the genuine data samples from the ones produced by the generator G. Third, a well-trained existing forensic network F (parameterized by θ) is introduced in the part of anti-forensics, which could distinguish the genuine from the synthesized facial stego image. Note that this target forensic network is treated as a fixed adversary, and its network parameters are always frozen.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>3.2. Stego Image Synthesis and Message Extraction</title>
          <p>The part of facial stego image synthesis and message extraction achieves two functionalities. First, by using the generator G, one can convert the given secret message into a facial stego image. Second, the extractor E is responsible for extracting the secret message from the input stego image. Furthermore, a secret key is introduced to ensure communication reliability and high diversity of the generated facial stego images.</p>
          <p>Generally, the generator G and the extractor E aim to learn two mappings, i.e., mapping the given secret message into a stego image, and vice versa. More formally, let m ∈ {0, 1}^Lm and k ∈ {0, 1}^Lk be the binary secret message and the secret key, respectively, where Lm and Lk denote their lengths. The generator G is intended to learn the first mapping, transforming the message along with the secret key k into a stego image:</p>
          <p>S = G(m, k), (1)</p>
          <p>where S denotes the synthesized facial stego image of shape C × H × W.</p>
          <p>To recover the secret message, we next introduce the extractor E. Considering that the facial stego image S may be degraded during transmission, the second mapping should be from the degraded stego image along with the secret key k to the secret message, which can be expressed by</p>
          <p>m′ = E(N(S), k), (2)</p>
          <p>where N(⋅) models the image degradation process, and m′ denotes the extracted secret message. It shall be noted that the extracted message m′ shall (approximately) equal the original secret message m, and thus one can employ an error-correcting mechanism to fully correct the erroneous bits.</p>
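          <p>The two mappings above can be made concrete with a deliberately simple stand-in. The sketch below is purely illustrative and is not the learned GAN of this paper: the toy "generator" hides the key-masked message bits in the least-significant bits of a pseudo-random image, and the toy "extractor" recovers the message only when the correct key k is supplied.</p>
          <preformat>
```python
import random

def toy_generate(m, k, h=8, w=8, seed=0):
    """Toy stand-in for the generator G (NOT the paper's learned GAN):
    hide the key-masked bits (m XOR k) in the least-significant bits
    of a pseudo-random grayscale image."""
    assert len(m) == len(k) and h * w >= len(m)
    rng = random.Random(seed)
    img = [rng.randrange(256) for _ in range(h * w)]
    for i, (mb, kb) in enumerate(zip(m, k)):
        img[i] = img[i] - (img[i] % 2) + (mb ^ kb)  # overwrite the LSB
    return img

def toy_extract(img, k):
    """Toy stand-in for the extractor E: read LSBs and unmask with k."""
    return [(img[i] % 2) ^ kb for i, kb in enumerate(k)]

m = [1, 0, 1, 1, 0, 0, 1, 0]
k = [0, 1, 1, 0, 1, 0, 0, 1]
S = toy_generate(m, k)
assert toy_extract(S, k) == m                    # correct key: recovery
assert toy_extract(S, [1 - b for b in k]) != m   # wrong key: garbage
```
          </preformat>
          <p>Unlike this hand-crafted stand-in, the actual G and E are trained networks, so the key dependence must be learned, which is exactly what the inverse loss introduced below enforces.</p>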
        </sec>
      </sec>
      <sec id="sec-5-2">
        <p>To measure the distortion between the original secret message m and the extracted message m′, we use the cross-entropy loss to calculate the message extraction loss LE, which is given by</p>
        <p>LE(m, m′) = −(1/Lm) ∑_{i=1}^{Lm} [ m_i log(m′_i) + (1 − m_i) log(1 − m′_i) ], (3)</p>
        <p>where Lm is the length of the secret message, and m_i and m′_i are the i-th elements of m and m′, respectively.</p>
        <p>Note that our proposed FSIS-GAN framework explicitly receives a secret key as an input, which is designed to satisfy the Kerckhoffs' principle. It means that even if the extractor network E is completely exposed to an attacker, the secret message m will be recovered only if the receiver obtains both the secret key k and the facial stego image S. It is worth emphasizing that, for most of the existing GAN-based methods, e.g., [15, 16], there is no involvement of a secret key. Further notice that, as an input of the extractor E, the dimension of the secret key k is greatly smaller than that of the facial stego image S. Thus, the extractor E tends to discard the secret key because it carries much less information. To mitigate this issue, we propose to use a randomly generated incorrect secret key k̃ ∈ {0, 1}^Lk, where k̃ ≠ k, as input during the training stage. Instead of directly using the correct secret key and minimizing the difference between the extracted and original messages, we maximize the difference between the extracted and original messages when applying the incorrect secret key. Mathematically, the inverse loss LẼ can be expressed by the negative cross-entropy loss:</p>
        <p>LẼ(m, m̃′) = (1/Lm) ∑_{i=1}^{Lm} [ m_i log(m̃′_i) + (1 − m_i) log(1 − m̃′_i) ], (4)</p>
        <p>where m̃′_i is the i-th element of the message m̃′ extracted with the incorrect key k̃, i.e., m̃′ = E(N(S), k̃).</p>
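        <p>For illustration, the extraction loss in (3) and the inverse loss in (4) can be sketched in plain Python as binary cross-entropy over the message bits, assuming the extractor outputs per-bit probabilities in (0, 1):</p>
        <preformat>
```python
import math

def bce(m, p, eps=1e-12):
    """Mean binary cross-entropy between bits m and probabilities p."""
    total = sum(mi * math.log(pi + eps) + (1 - mi) * math.log(1 - pi + eps)
                for mi, pi in zip(m, p))
    return -total / len(m)

def extraction_loss(m, p):       # L_E of Eq. (3), minimized during training
    return bce(m, p)

def inverse_loss(m, p_wrong):    # negative cross-entropy of Eq. (4):
    return -bce(m, p_wrong)      # minimized when wrong-key output mismatches m

m = [1, 0, 1]
good = [0.9, 0.1, 0.8]   # output close to m
bad = [0.1, 0.9, 0.2]    # output far from m
assert extraction_loss(m, bad) > extraction_loss(m, good)
assert inverse_loss(m, good) > inverse_loss(m, bad)
```
        </preformat>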
        <p>Enhancing robustness with degradation layers: In a practical communication channel, there often exist degradations of the synthesized stego image S when transmitting it to a receiver. The data hiding system therefore requires certain robustness to ensure the accuracy of message extraction. In this work, we take three representative degradations into account, i.e., image noise pollution, blurring, and compression. For noise pollution, we consider one of the most widely-used noise models: Gaussian noise. For blurring, Gaussian blurring is used. For signal compression, JPEG image compression is employed, which is extensively used for reducing the bandwidth of the transmission process. In experiments, we implement these three types of degradation as neural network layers N to degrade the stego image. Specifically, three network layers are used, one for simulating each type of degradation. The Gaussian noise layer (GNL) adds Gaussian noise to the facial stego image S, and the Gaussian blurring layer (GBL) blurs S. For JPEG compression, considering that the quantization operation is non-differentiable, we approximate the quantization operation with a differentiable polynomial function. Such a differentiation technique can also be found in the work HiDDeN [8].</p>
      </sec>
      <sec id="sec-5-3">
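        <p>As a concrete (assumed) instance of such a polynomial surrogate, a cubic approximation commonly used in JPEG-robust training pipelines is round(x) ≈ x + (round(x) − x)³; it keeps the deviation from true rounding bounded while providing a non-zero local gradient. The authors' exact polynomial is not specified here.</p>
        <preformat>
```python
def soft_round(x):
    """Differentiable surrogate for round(x): x + (round(x) - x)**3.
    Between half-integers round(x) is locally constant, so the local
    derivative is 1 - 3*(round(x) - x)**2, at least 0.25, instead of
    the zero gradient of hard rounding."""
    r = float(round(x))
    return x + (r - x) ** 3

def quantize_coeff(c, q):
    """JPEG-style quantization of a coefficient c with table entry q,
    with hard rounding replaced by the smooth surrogate."""
    return q * soft_round(c / q)

# the surrogate stays within 0.375 of true rounding (worst near half-integers)
for x in [0.0, 0.3, 1.49, -3.7]:
    assert 0.376 > abs(soft_round(x) - round(x))
```
        </preformat>
        <p>At test time the surrogate can simply be replaced by true rounding; only training needs the smooth version so that gradients reach the generator through the compression layer.</p>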
        <sec id="sec-5-3-1">
          <title>3.3. Adversarial Training Part</title>
          <p>As aforementioned, the hand-crafted stego synthesis-based data hiding methods [13, 14] could only synthesize patterned images such as textures and fingerprints, limiting their practical applications. Synthesizing a natural image with semantics is a challenging task. However, this problem can be alleviated with the guidance of adversarial training. In this part, the purpose of the discriminator D is to conduct adversarial training with the generator G and improve the plausibility of the synthesized facial stego images.</p>
          <p>More specifically, let I be a genuine facial image sample of shape C × H × W from a publicly available genuine facial image dataset. The discriminator D estimates the probability that a given image sample was synthesized by the generator G, while the generator G attempts to fool the discriminator D. Through such adversarial training, the generator G is encouraged to synthesize much more realistic facial stego images. As a variant of GAN, the network structure and loss function of BEGAN [17] provide a good reference for improving training stability. Thus, in this work we employ the adversarial training loss used in BEGAN. Mathematically, the adversarial loss Ladv for the generator G can be calculated as</p>
          <p>Ladv(D(S), S) = (1/Np) ∑ |D(S) − S|, (5)</p>
          <p>where the output D(S) has the same shape as the facial stego image and Np denotes the number of image elements. The adversarial loss LD for the discriminator D is</p>
          <p>LD(I, S) = (1/Np) ∑ [ |D(I) − I| − h_t ⋅ |D(S) − S| ], (6)</p>
          <p>where h_t controls the discrimination ability of D at the t-th training step to equilibrate the adversarial training.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>3.4. Anti-forensics Part</title>
          <p>Recall that there are no explicit cover images involved in stego synthesis-based data hiding methods. This merit makes such data hiding methods able to effectively resist conventional steganalysis detection. However, as pointed out in [15], a well-trained forensic network could readily distinguish a synthesized stego image from a genuine one, even if the synthesized stego image shows no perceptual differences to an observer.</p>
          <p>Although F is an expert in such a detection task, some studies [10, 11] have shown that deep neural network-based classifiers are vulnerable to adversarial examples.</p>
          <p>Inspired by this, we propose to apply strategies for crafting adversarial examples to evade the stego detection network as a way of realizing anti-forensics. In the FSIS-GAN framework, we consider a white-box scenario, i.e., assuming one has full knowledge of the target forensic network. The target forensic network F is trained with genuine images from a publicly available facial dataset and synthesized images produced by BEGAN [17]. Then, we integrate the well-trained F into the FSIS-GAN framework, in which F receives the synthesized facial stego image S and outputs a confidence score.</p>
          <p>The gradients back-propagated from F are used to update the parameters of the generator G. To measure the loss of resisting forensic detection, we define the anti-forensic loss LF, which computes the cross-entropy between the output of F and our target genuine-image label:</p>
          <p>LF(S) = −log(1 − F(S)), (8)</p>
          <p>where F(S) ∈ (0, 1) is the confidence output by F. Clearly, a decrement of LF indicates an increment of the probability of S being identified as a genuine image by F.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <p>The control variable h_t can be computed as</p>
        <p>h_{t+1} = h_t + λ (1/Np) ∑ [ γ |D(I) − I| − |D(S) − S| ], (7)</p>
        <p>where the parameter λ is the learning rate of training, and γ is a hyper-parameter to control the diversity of the synthesized facial images. The quality and diversity of the facial stego images can be freely adjusted by tuning the parameter γ.</p>
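        <p>For illustration, the BEGAN-style losses in (5)-(7) reduce to L1 autoencoder reconstruction errors plus the running balance term. The scalar sketch below uses flat lists in place of images, with the symbol names λ (lam) and γ (gamma) as reconstructed above:</p>
        <preformat>
```python
def l1(a, b):
    """Mean absolute reconstruction error, i.e. |D(x) - x| averaged."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)

def began_losses(D_I, I, D_S, S, h_t):
    L_adv = l1(D_S, S)                     # generator loss, Eq. (5)
    L_D = l1(D_I, I) - h_t * l1(D_S, S)    # discriminator loss, Eq. (6)
    return L_adv, L_D

def update_h(h_t, D_I, I, D_S, S, lam=0.001, gamma=0.7):
    """Balance update of Eq. (7): h grows when the genuine-image term
    dominates and shrinks when the stego term dominates."""
    return h_t + lam * (gamma * l1(D_I, I) - l1(D_S, S))

I, D_I = [0.5, 0.5], [0.4, 0.6]   # genuine image, well reconstructed
S, D_S = [0.2, 0.8], [0.5, 0.5]   # stego image, poorly reconstructed
L_adv, L_D = began_losses(D_I, I, D_S, S, h_t=0.0)
assert L_adv > 0.0                          # generator is penalized
assert 0.0 > update_h(0.0, D_I, I, D_S, S)  # stego error dominates here
```
        </preformat>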
        <sec id="sec-5-4-2">
          <title>3.5. Network Structure and Training Strategy</title>
          <p>The network architectures of the generator G and the extractor E are shown in Figure 2. For the generator G, the secret key vector k is first concatenated to the secret message vector m and then fed to the subsequent layers. Then, G applies two fully-connected (FC) layers and three transposed convolution (ConvT) layers to produce the facial stego image S. In particular, after each FC or ConvT layer, we apply batch normalization (BN) [18] and a ReLU activation function to process the intermediate features. The extractor E outputs the message vector m′ (or m̃′) of size 1 × Lm. In experiments, we found that, since both m and k are composed of the binary numbers 0 and 1, such a form is not suitable as input, and the adversarial training loss would diverge. To solve this issue, additional BN layers were added, so that a normalization operation is carried out inside the network. Experimental results show that this trick could greatly alleviate the divergence problem.</p>
          <p>Figure 2: Network structures of the generator G and the extractor E.</p>
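          <p>As a minimal sketch of this normalization trick, the snippet below standardizes a binary {0, 1} vector to zero mean and (approximately) unit variance, which is what an input-side BN layer computes; the exact placement of the extra BN layers in the authors' network is not specified here.</p>
          <preformat>
```python
import math

def standardize(bits, eps=1e-5):
    """Rescale a binary {0, 1} vector to zero mean and (near) unit
    variance, mimicking what an input-side BN layer computes."""
    n = len(bits)
    mu = sum(bits) / n
    var = sum((b - mu) ** 2 for b in bits) / n
    return [(b - mu) / math.sqrt(var + eps) for b in bits]

x = standardize([1, 0, 1, 1, 0, 0, 1, 0])
mean = sum(x) / len(x)
var = sum(v * v for v in x) / len(x)
assert 1e-9 > abs(mean)        # zero mean
assert 1e-3 > abs(var - 1.0)   # near unit variance
```
          </preformat>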
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>4. Experimental Results</title>
        <p>In this section, we first introduce the experimental setup. Then, to verify the robustness of our proposed FSIS-GAN, it is evaluated both with and without image degradation. Finally, the anti-forensic capability of FSIS-GAN is validated.</p>
        <sec id="sec-5-5-1">
          <title>4.1. Experimental Setup</title>
          <p>Our experiments are conducted on the CelebA dataset [20], where the region containing the face is identified and extracted. All images are reshaped into 3 × 64 × 64. The following three metrics are used for evaluation:</p>
          <p>• Fréchet Inception Distance (FID) [21], a widely-used perceptual image quality assessment metric for synthesized images and a de facto metric for assessing the quality of images created by the generator of a GAN. A lower FID score indicates better consistency with human perception of natural images.</p>
          <p>• Accuracy of message extraction (ACC), computed as ACC = N_Ext / N, where N_Ext is the number of correctly extracted message bits and N is the length of the secret message m.</p>
          <p>• Probability of missed detection (PMD), calculated as PMD = FN / (FN + TP), where FN (False Negative) is the ratio for the case that a synthesized facial image is misclassified as a genuine one, and TP (True Positive) is the ratio for the case that a synthesized facial image is correctly detected.</p>
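          <p>The ACC and PMD metrics are simple ratios and can be sketched directly (FID is omitted, since it requires a pretrained Inception network):</p>
          <preformat>
```python
def acc(m, m_ext):
    """ACC: fraction of correctly extracted message bits."""
    return sum(a == b for a, b in zip(m, m_ext)) / len(m)

def pmd(fn, tp):
    """PMD = FN / (FN + TP): FN counts synthesized images misclassified
    as genuine, TP counts synthesized images correctly detected."""
    return fn / (fn + tp)

assert acc([1, 0, 1, 1], [1, 0, 0, 1]) == 0.75   # 3 of 4 bits correct
assert pmd(fn=10, tp=90) == 0.1                  # 10% missed detections
```
          </preformat>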
        </sec>
      </sec>
      <sec id="sec-5-6">
        <p>For the discriminator D, we adopt the auto-encoder-like structure from BEGAN [17]. For the target forensic network F, we use Ye-Net [19], which is a widely-used steganalytic method.</p>
      </sec>
      <sec id="sec-5-7">
        <p>The training process of the proposed FSIS-GAN framework iteratively optimizes the loss function of each network, except for the well-trained forensic network F. We apply the extraction loss LE and the adversarial loss LD as the loss functions for the extractor E and the discriminator D, respectively. In particular, the total loss LG for the generator G is a proper fusion of the four aforementioned losses:</p>
        <p>LG = Ladv + α (LE + LẼ) + β LF, (9)</p>
        <p>where Ladv is the adversarial loss for G, LẼ is the inverse loss, and LF is the anti-forensic loss. The hyper-parameters α and β control the relative importance among the four losses.</p>
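        <p>For illustration, the fusion in (9) is a plain weighted sum of the four scalar losses; with the empirically chosen setting α = β = 0.1:</p>
        <preformat>
```python
def total_generator_loss(l_adv, l_ext, l_inv, l_forensic, alpha=0.1, beta=0.1):
    """Eq. (9): L_G = L_adv + alpha * (L_E + L_E_inv) + beta * L_F."""
    return l_adv + alpha * (l_ext + l_inv) + beta * l_forensic

# only the extraction term contributes here, scaled by alpha = 0.1
assert total_generator_loss(0.0, 1.0, 0.0, 0.0) == 0.1
```
        </preformat>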
      </sec>
      <sec id="sec-5-8">
        <p>A larger PMD indicates a higher ability to resist the forensic network.</p>
      </sec>
      <sec id="sec-5-9">
        <p>The proposed FSIS-GAN framework is implemented with PyTorch and trained on four NVIDIA GTX1080Ti GPUs with 11GB memory.</p>
      </sec>
      <sec id="sec-5-10">
        <p>The number of training epochs is set to 400 with a mini-batch size of 64. We use Adam [22] as the optimizer with a learning rate of 2 × 10−4. For the hyper-parameters α and β in (9), after a number of trials, we empirically set both to 0.1 in experiments. The parameter γ in (7) is set to 0.7, which is expected to produce reasonably diverse facial stego images. The competing method is the most related work, Hu et al. [15]. We implement this work ourselves because there is no publicly available code. With certain tweaking and fine-tuning, the tested results were comparable to the originally reported data from [15]. For a fair comparison, the lengths of the secret message and the secret key are both set to 100, so that the payload is identical to that of the work [15].</p>
      </sec>
      <sec id="sec-5-11">
        <title>4.2. Performance Without Degradations</title>
        <p>Notice that the competing method [15] does not consider image degradations. To verify the effectiveness of the proposed method under the same settings and to make a fair comparison, in this subsection we evaluate the performance without the degradation layers N. The facial stego image S is transmitted to the extractor E without any degradation. To avoid confusion, this variation of our proposed method is termed FSIS-GAN-WD (WD abbreviates Without Degradations). We first compare the visual quality of the facial stego images with that of the competing method [15]. As can be seen from Figure 3, the proposed FSIS-GAN-WD could synthesize more realistic facial stego images in comparison with Hu et al. [15]. With more careful inspection, one can notice that the stego images produced by FSIS-GAN-WD are more vivid and have more correct semantic structures. It is difficult for a common human to perceive the inauthenticity of the facial stego images synthesized by FSIS-GAN-WD. In contrast, the stego images generated by Hu et al. [15] are typically blurry and severely distorted, which apparently draws attention from a forensic analyzer. For the FID evaluation experiment, we use 10,000 pairs of genuine images and synthesized facial stego images to compute the FID score. The FID score of FSIS-GAN-WD is 23.20, which is much smaller than Hu et al. [15]'s 32.07.</p>
        <p>Then, we evaluate the extraction accuracy for the case without degradation. The results are tabulated in Table 1. To demonstrate the impact of the inverse loss LẼ on the extraction accuracy, ablation experiments are also conducted by excluding the inverse loss during training. This LẼ-ablated version is denoted FSIS-GAN-WD (ex LẼ). From Table 1, one can draw the following conclusions. First, the extraction accuracy of FSIS-GAN-WD with the correct secret key k is 98.76%, which dramatically outperforms the 85.23% of the competing method [15]. Second, by comparing FSIS-GAN-WD and FSIS-GAN-WD (ex LẼ), one can see that the extraction accuracy of FSIS-GAN-WD with a correct secret key k is slightly inferior to that of FSIS-GAN-WD (ex LẼ). This suggests that the introduced inverse loss would marginally harm the extraction accuracy. However, for the case of the incorrect key k̃, the participation of the inverse loss LẼ significantly reduces the extraction accuracy from 97.01% to 71.50%, while FSIS-GAN-WD (ex LẼ) with the incorrect key k̃ still attains a quite high extraction accuracy (&gt; 97%). This phenomenon means that the involvement of the secret key will not work if we exclude the inverse loss. In short, without the inverse loss LẼ, the variant FSIS-GAN-WD (ex LẼ) violates the Kerckhoffs' principle.</p>
      </sec>
      <sec id="sec-5-12">
        <title>4.3. Performance With Degradations</title>
        <p>In this subsection, we test the robustness performance of the proposed framework under certain image degradations. The image degradation type and level are given as prior knowledge. This scenario is common in practice because one can obtain some prior knowledge of the degradation through probing the communication channel. Thus, one can fix the degradation layers N and their associated parameters during the training stage. Specifically, in our experiments, the standard deviation σ1 of the Gaussian noise layer (GNL) is set to 0.2. The kernel width and the standard deviation σ2 of the Gaussian blurring layer (GBL) are set to 3 and 1, respectively. The differentiable JPEG compression layer (JCL) is implemented as suggested by the work HiDDeN [8]. For simplicity of reference, this variation is termed FSIS-GAN-FD (FD abbreviates Fixed Degradation) in the sequel.</p>
        <p>Firstly, the stego images synthesized by FSIS-GAN-FD are provided in Figure 4. One can observe that some speckle noise emerges in the generated stego images, which can be clearly seen in the regions highlighted with red lines in Figure 4 (b). Quantitatively, the FID score of FSIS-GAN-FD is 41.40, which is inferior to those of FSIS-GAN-WD (23.20) and Hu et al. [15] (32.07). Nevertheless, the stego images produced by FSIS-GAN-FD are intuitively more realistic than those of Hu et al. [15]. Secondly, in Table 2, we report the extraction accuracy performance under fixed degradations. Not surprisingly, one can notice that the extraction accuracies of Hu et al. [15] and FSIS-GAN-WD greatly degrade, which can be attributed to overlooking the degradation-resistant message extraction issue.</p>
        <p>Figure: Message extraction accuracy of Hu et al. [15], FSIS-GAN-WD, and FSIS-GAN-FD under JPEG compression with different quality factors (QF).</p>
        <p>5
ever, as pointed in [15], a well-trained forensic network
Table 2 can efectively identify a synthesized image. To solve this
dCeogmrapdaaritsioonnocfomndesitsiaognese.xtTrhaectiboonldacacnudracmya(r%k)eudnvdaelruvearwioiuths issue, we explicitly considered the anti-forensics scenario
an asterisk (*) denote the highest extraction accuracy with and introduce the anti-forensic loss LF .
correct secret key k and the lowest extraction accuracy with To demonstrate the influence of anti-forensic loss LF ,
the incorrect secret key k̃, respectively. we conduct the ablation experiment by excluding the loss
term LF , and thus this variant is termed as FSIS-GAN (ex
Scheme Hu et al. FSIS-GAN-WD FSIS-GAN-FD LF ). For a concrete example, we employ the well-trained
[15] with k with k̃ with k with k̃ forensic network Ye-Net [19] F to detect 3000 facial
W/o degradation 85.23 98.76 71.50∗ 98.22 72.08 stego images produced by diferent methods, and record
the probability of missed detection (PMD). The PMD’s
Fixed GNL 52.72 59.78 56.23∗ 95.58 72.74 of Hu et al. [15], FSIS-GAN (ex LF ), and FSIS-GAN are
Fixed GBL 69.68 57.52 54.68∗ 98.58 73.78 3.23%, 8.84%, and 89.91, respectively. As clearly shown,
Fixed JCL 65.33 61.38 58.00∗ 98.46 72.67 for FSIS-GAN (ex LF ), despite the facial stego images
look natural for human, they are easily exposed to the
forensic network, where the PMD value is lower than 10%.
hibits quite promising results. Under three types of degra- In contrast, by introducing the anti-forensic loss term, the
dation layers, the extraction accuracy typically exceeds value of PMD of FSIS-GAN could reach 89.91%. This
94% (though lower than that of FSIS-GAN-WD, which is means the proposed method FSIS-GAN could efectively
specifically designed for the non-degradation scenario). bypass the existing forensic network, retaining an nice
The results verify that for the case of known degrada- anti-forensic capability.
tions, the proposed framework could learn to efectively
resistant the fixed degradations, by employing the fixed
degradation layers during the training. 5. Conclusion</p>
        <p>Finally, to illustrate how the robustness of message
extraction changes under diferent degradation levels,
we test diferent degradation types with a variety of
degradation levels. Due to space limit, we only report
the JPEG compression degradation in Figure 5. As can
be seen, with the decrement of quality factor ( ), the
extraction accuracy generally decreases. Although the
JCL that adopted from HiDDEN [8] could handle
nondiferentiable JPEG compression, it cannot perfectly
reproduce the JPEG compression artifacts. Nevertheless,
FSIS-GAN-FD still achieve superior robustness, when
comparing with other two schemes.</p>
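As a side note on the quality metric quoted in the comparisons above: the Fréchet Inception Distance (FID) measures how far the feature statistics of synthesized face images drift from those of real ones, with lower scores being better. The following minimal Python sketch is illustrative only and is not the paper's code; it assumes feature means and diagonal covariances have already been extracted (e.g., from an Inception network), so the matrix square root in the trace term has an elementwise closed form, and the function name `fid_diagonal` is hypothetical:

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    # Illustrative sketch, not the paper's implementation.
    # Frechet distance between Gaussians N(mu1, S1) and N(mu2, S2):
    #   ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 @ S2)^(1/2));
    # with diagonal covariances the trace term reduces elementwise.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    trace_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                     for v1, v2 in zip(var1, var2))
    return mean_term + trace_term

# Identical statistics give a score of 0; the score grows as the
# synthesized-image statistics move away from the real-image ones.
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
print(fid_diagonal([0.0, 1.0], [1.0, 1.0], [0.5, 1.0], [2.0, 1.0]))
```

In practice the feature statistics carry full covariance matrices, which require a proper matrix square root (e.g., `scipy.linalg.sqrtm`); this diagonal special case is only meant to make scores such as 23.20 versus 41.40 interpretable as "lower means closer to the real-image distribution".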
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the National Natural Science Foundation of China under Grant 61901237, in part by the Open Project Program of the State Key Laboratory of CAD&amp;CG, Zhejiang University, under Grant A2006, and in part by the Ningbo Natural Science Foundation under Grant 2019A610103. Thanks to the Southeast Digital Economic Development Institute for supporting the computing facility.</p>
      <p>References</p>
      <p>[1] V. Sedighi, R. Cogranne, J. Fridrich, Content-adaptive steganography by minimizing statistical detectability, IEEE Trans. Inf. Forensics Security 11 (2015) 221–234.</p>
      <p>[2] J. Zhou, W. Sun, L. Dong, X. Liu, O. C. Au, Y. Y. Tang, Secure reversible image data hiding over encrypted domain via key modulation, IEEE Trans. Circuits Syst. Video Technol. 26 (2015) 441–452.</p>
      <p>[3] L. Dong, J. Zhou, W. Sun, D. Yan, R. Wang, First steps toward concealing the traces left by reversible image data hiding, IEEE Trans. Circuits Syst. II, Exp. Briefs 67 (2020) 951–955.</p>
      <p>[4] S. Baluja, Hiding images within images, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2020) 1685–1697.</p>
      <p>[5] H. Shi, J. Dong, W. Wang, Y. Qian, X. Zhang, SSGAN: Secure steganography based on generative adversarial networks, in: Pacific Rim Conference on Multimedia, 2017, pp. 534–544.</p>
      <p>[6] W. Tang, S. Tan, B. Li, J. Huang, Automatic steganographic distortion learning using a generative adversarial network, IEEE Signal Process. Lett. 24 (2017) 1547–1551.</p>
      <p>[7] J. Hayes, G. Danezis, Generating steganographic images via adversarial training, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 1954–1963.</p>
      <p>[8] J. Zhu, R. Kaplan, J. Johnson, F. Li, HiDDeN: Hiding data with deep networks, in: Proc. Eur. Conf. Comput. Vis., 2018, pp. 657–672.</p>
      <p>[9] K. A. Zhang, A. Cuesta-Infante, L. Xu, K. Veeramachaneni, SteganoGAN: High capacity image steganography with GANs, arXiv preprint arXiv:1901.03892 (2019).</p>
      <p>[10] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, arXiv preprint arXiv:1312.6199 (2013).</p>
      <p>[11] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, arXiv preprint arXiv:1412.6572 (2014).</p>
      <p>[12] W. Tang, B. Li, S. Tan, M. Barni, J. Huang, CNN-based adversarial embedding for image steganography, IEEE Trans. Inf. Forensics Security 14 (2019) 2074–2087.</p>
      <p>[13] K. Wu, C. Wang, Steganography using reversible texture synthesis, IEEE Trans. Image Process. 24 (2014) 130–139.</p>
      <p>[14] S. Li, X. Zhang, Toward construction-based data hiding: From secrets to fingerprint images, IEEE Trans. Image Process. 28 (2018) 1482–1497.</p>
      <p>[15] D. Hu, L. Wang, W. Jiang, S. Zheng, B. Li, A novel image steganography method via deep convolutional generative adversarial networks, IEEE Access 6 (2018) 38303–38314.</p>
      <p>[16] Z. Zhang, G. Fu, R. Ni, J. Liu, X. Yang, A generative method for steganography by cover synthesis with auxiliary semantics, Tsinghua Science and Technology 25 (2020) 516–527.</p>
      <p>[17] D. Berthelot, T. Schumm, L. Metz, BEGAN: Boundary equilibrium generative adversarial networks, arXiv preprint arXiv:1703.10717 (2017).</p>
      <p>[18] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015).</p>
      <p>[19] J. Ye, J. Ni, Y. Yi, Deep learning hierarchical representations for image steganalysis, IEEE Trans. Inf. Forensics Security 12 (2017) 2545–2557.</p>
      <p>[20] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proc. IEEE Int. Conf. Comput. Vis., 2015.</p>
      <p>[21] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in: Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6629–6640.</p>
      <p>[22] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>