     Leveraging Multi-task Learning for Unambiguous and Flexible Deep Neural
                             Network Watermarking
                           Fangqi Li1 , Lei Yang1 , Shilin Wang1 * , Alan Wee-Chung Liew2
                   1 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
                                                {solour lfq,yangleisx,wsl}@sjtu.edu.cn
                              2 School of Information and Communication Technology, Griffith University
                                                        a.liew@griffith.edu.au


                              Abstract

  Deep neural networks are playing an important role in many real-life applications. An important prerequisite in commercializing deep neural networks is the identification of their genuine owners. Therefore, watermarking schemes that embed the owner's identity information into the models have been proposed. However, current schemes cannot meet all the security requirements such as unambiguity and are inflexible since most of them focus on classification models. To meet the formal definitions of the security requirements and increase the applicability of deep neural network watermarking schemes, we propose a new method, MTLSign, based on multi-task learning. By treating the watermark embedding as an extra task, the security requirements are explicitly formulated and met with well-designed regularizers and components from cryptography. Experiments have demonstrated that MTLSign is flexible and robust for practical security in machine learning applications.

[Figure 1: Architecture of MTLSign. The orange blocks are the backbone DNN; c_p and c_WM are classifier backends for the primary task and the watermarking task respectively. The primary branch feeds D_primary through the backbone into c_p to produce predictions for the primary task; the watermark branch feeds key-derived samples D_WM through intermediate backbone outputs into c_WM to produce +/− predictions for the watermarking task.]

                       1    Introduction

Deep neural network (DNN) is spearheading artificial intelligence with broad application in assorted fields. Training a DNN is expensive: a large amount of data has to be collected and preprocessed, and data preparation is followed by parameter tuning and DNN structure optimization. On the contrary, using a DNN is easy: a user simply propagates the input forward. Such imbalance between DNN production and deployment calls for protecting DNN models as intellectual properties (IP) against piracy. Moreover, the identification of a DNN's owner forms the basis of the accountability of AI systems.
   Watermarking is an influential method for DNN IP protection (Uchida et al. 2017). Some information is embedded into the neural network as the watermark. After adversaries steal the model and pretend to have built it themselves, an ownership verification (OV) process reveals the hidden information and identifies the authentic owner.
   If the pirated model is deployed as an API then the owner has to adopt backdoor-based watermarking schemes (Zhang et al. 2018; Adi et al. 2018), where special triggers evoke certain outputs. Triggers can be generated from an autoencoder (Li et al. 2019b; Li and Wang 2021), adversarial samples (Le Merrer, Perez, and Trédan 2020), or exceptional samples (Li et al. 2019a). Backdoor-based watermarking schemes are fragile given backdoor clearance methods (Liu et al. 2020; Li et al. 2021; Namba and Sakuma 2019). Model tuning such as fine-pruning (Liu, Dolan-Gavitt, and Garg 2018) can also block some backdoors and hence the watermark.
   If the entire suspicious model is accessible, e.g., in model competitions and project certifications, then weight-based watermarks can incorporate the owner's identity information into the weights of a DNN (Uchida et al. 2017), or the statistics of the intermediate feature maps (Darvish, Chen, and Koushanfar 2019). These white-box schemes usually carry more information and have a larger forensics value.
   Hitherto, most watermarking methods are only designed and examined for DNNs for image classification or depend on specialized layers. Such inflexibility challenges the broader application of DNN watermarking schemes as a commercial standard.

*S. Wang is the corresponding author. This work was supported by the National Natural Science Foundation of China (61771310). Part of the work appeared as https://arxiv.org/pdf/2108.09065.pdf
Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Moreover, some basic security requirements against adversarial attacks have been overlooked. The robustness of watermarks against new adaptive attacks such as the spoil attack (Li, Wang, and Liew 2021) also requires more attention.
   To overcome these difficulties, we propose a new white-box DNN watermarking scheme based on multi-task learning (MTL) (Sener and Koltun 2018), MTLSign, as shown in Fig. 1. By modeling the watermark embedding procedure as an extra task, security requirements are satisfied with well-designed regularizers. This extra task has an independent backend classifier, hence it can verify the ownership of arbitrary models. Cryptological primitives are adopted to instantiate the watermarking task, making MTLSign provably secure against the ambiguity attack. The major contributions of our work are three-fold:
 • We examine the security requirements for the DNN watermark, especially the unambiguity, in a formal manner.
 • A DNN watermarking scheme based on MTL is proposed. It can be applied to DNNs for tasks other than image classification, the major focus of previous works.
 • Experiments show that MTLSign is more robust, flexible, and secure compared with several state-of-the-art schemes.

                 2    Security Requirements

We assume that the adversary possesses fewer data than the owner (otherwise the piracy is unnecessary), but has full knowledge of the watermarking scheme and can tune the model adaptively. The pirated deep learning model fulfils a primary task, T_primary, with dataset D_primary, data space X, label space Y, and a metric d on Y. We study four crucial security requirements confronting DNN IP protection.

2.1   Unambiguity
A DNN watermarking scheme WM is composed of a key generation module Gen and an embedding module Embed. It first generates a key for the owner with security parameter N:

       key ← Gen(1^N),

then embeds key into a clean model M_clean:

       (M_WM, verify) ← Embed(M_clean, key),

where M_WM is the watermarked DNN model and verify is the (possibly publicly available) ownership verifier (Li, Wang, and Liew 2021). To accurately verify the ownership, it is necessary and sufficient that:

       Pr{verify(M_WM, key) = 1} ≥ 1 − ε(N),               (1)

       Pr{verify(M_WM, key′) = 0} ≥ 1 − ε(N),              (2)

where ε declines exponentially in N and key′ ≠ key is a random key. Claiming ownership with verify and key′ is the ambiguity attack, hence Eq. (2) is defined as the unambiguity property, which is demonstrated in Fig. 2(a). Unambiguity has been examined for certain models such as GANs (Ong et al. 2021), but its formal connection with the security parameter has not been established.
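The triple (Gen, Embed, verify) above can be read as a small programmatic interface. The sketch below is illustrative only: the type aliases and the key-length convention are assumptions of this sketch, and Embed is left as a stub because its concrete instantiation is the subject of Section 3.

```python
import secrets
from typing import Callable, Tuple

# Hypothetical aliases for this sketch.
Model = object                        # stands in for a DNN such as M_clean or M_WM
Verifier = Callable[[object, bytes], int]

def Gen(N: int) -> bytes:
    """Key generation with security parameter N: draw an N-bit random key."""
    return secrets.token_bytes(N // 8)

def Embed(M_clean: Model, key: bytes) -> Tuple[Model, Verifier]:
    """Embed `key` into the clean model, returning (M_WM, verify).
    For MTLSign this amounts to jointly training the backbone on the primary
    task and on the key-derived watermarking task (Section 3)."""
    raise NotImplementedError  # instantiated in Section 3

# Requirements (1) and (2) for the returned verify:
#   Pr[verify(M_WM, key)  == 1] >= 1 - eps(N)   (correctness)
#   Pr[verify(M_WM, key') == 0] >= 1 - eps(N)   (unambiguity, key' != key)
```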
2.2   Functionality-preserving and covertness
The watermarked DNN should perform slightly worse than, if not as well as, the clean model. The formal definition is:

       Pr_{(x,y)∼T_primary}{d(M_clean(x), M_WM(x)) ≤ δ} ≈ 1,

which can be examined a posteriori. However, it is hard to explicitly incorporate this definition into the watermarking scheme. Instead, we resort to the following definition:

       ∀x ∈ X,  d(M_clean(x), M_WM(x)) ≤ δ.           (3)

To meet Eq. (3), we only have to ensure that the parameters of M_WM do not deviate from those of M_clean too much. Meanwhile, such small deviation is also the requirement of covertness, i.e., the secrecy of the watermark (Ganju et al. 2018). The owner should be able to control the level of this difference. Let θ be a parameter within WM that regulates such difference. It is desirable that in the extreme case where θ approaches zero, the watermarked model converges to the clean model:

       M_WM → M_clean, when θ → 0.                  (4)

So the owner can select the optimal level of functionality/covertness by modifying θ.

2.3   Robustness against tuning
An adversary can tune M by running backpropagation on a local dataset, pruning unnecessary neurons (NP), or pruning and fine-tuning M (FP). It is suggested that FP can efficiently eliminate backdoors from image classification models and the watermarks within (Liu, Dolan-Gavitt, and Garg 2018). After being tuned on the adversary's dataset D_adversary, the model's parameters shift and the verification of the watermark might fail. Let M′ ← M_WM (tuned on D_adversary) denote a model M′ obtained by tuning M_WM with D_adversary. As shown in Fig. 2(b), a watermarking scheme is robust against tuning if:

       Pr{verify(M′, key) = 1} ≥ 1 − ε(N).           (5)

To meet (5), the owner has to make verify(·, key) insensitive to tuning in the neighbourhood of M_WM.

2.4   Flexibility
Many white-box DNN watermarking schemes rely on extra modules such as passport layers or specialized network architectures (Fan et al. 2021). Therefore, they cannot be readily applied to arbitrary DNN models. To ensure generalization, it is desirable that the watermarking scheme does not depend on specific modules incorporated within the DNN or explicitly modify the product's structure.
   A comprehensive summary of established watermarking schemes judged according to the enumerated security requirements is given in Table 1.

Remark   Apart from these major requirements, there are secondary security demands such as the security against overwriting and the redeclaration attack as shown in Fig. 2(c), removal, privacy concerns, etc. We save the examinations and discussions on these demands for the empirical studies.
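Once a candidate watermarked model is at hand, property (3) can be checked empirically. A minimal PyTorch sketch is given below; the model and loader names are placeholders, and the metric d is taken, as one possible choice, to be the L∞ distance between output logits.

```python
import torch

@torch.no_grad()
def max_output_deviation(m_clean, m_wm, loader, device="cpu"):
    """Estimate sup_x d(M_clean(x), M_WM(x)) over a held-out set,
    using the L-infinity distance between logits as d."""
    m_clean.eval(); m_wm.eval()
    worst = 0.0
    for x, _ in loader:
        x = x.to(device)
        diff = (m_clean(x) - m_wm(x)).abs().max().item()
        worst = max(worst, diff)
    return worst  # compare against the tolerance delta in Eq. (3)
```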
[Figure 2 panels: (a) Security against the ambiguity attack. (b) Robustness against tuning. (c) Redeclaration attack.]

                                Figure 2: Some security requirements and threats in DNN IP protection.

                                 Table 1: Security requirements and established watermarking schemes.

  Scheme                                  Type        Unambiguity   Functionality-preserving   Robustness against tuning   Flexibility
  (Uchida et al. 2017)                    White-box        ✗                   ✓                          ✗                     ✓
  (Darvish, Chen, and Koushanfar 2019)    White-box        ✓                   ✓                          ✓                     ✗
  (Li et al. 2019a)                       Black-box        ✓                   ✓                          ✓                     ✗
  (Zhu et al. 2020)                       Black-box        ✓                   ✓                          ✓                     ✗
  (Guan et al. 2020)                      White-box        ✓                   ✓                          ✗                     ✗
  (Le Merrer, Perez, and Trédan 2020)     Black-box        ✗                   ✓                          ✓                     ✓
  (Ong et al. 2021)                       Black-box        ✗                   ✓                          ✓                     ✗
  (Fan et al. 2021)                       Black-box        ✓                   ✓                          ✓                     ✗
  (Liu, Weng, and Zhu 2021)               White-box        ✗                   ✓                          ✓                     ✗
  Ours                                    White-box        ✓                   ✓                          ✓                     ✓



                  3   The Proposed Method

3.1      Motivation
We leverage multi-task learning to design a white-box watermarking framework for DNN IP protection. The watermark embedding is modeled as an additional task T_WM. A classifier for T_WM is built independently of the backend for T_primary, so common tunings such as fine-tune last layer (FTLL) or re-train last layers (RTLL) (Adi et al. 2018) have no impact on our watermark. After training and watermark embedding, only the network structure for T_primary is published.
   Under this formulation, the functionality-preserving property and the security against tuning can be formally addressed. A decently designed T_WM ensures the security against ambiguity attacks as well, making MTLSign a secure and flexible option for DNN IP protection. To better handle the forensic difficulties involving watermark redeclaration, we adopt a decentralized consensus protocol to authorize the time-stamp correlated with the watermarks.

3.2      The watermarking scheme MTLSign
The structure of the watermarking scheme MTLSign is illustrated in Fig. 1. The entire network consists of the backbone network and two independent backends: c_p and c_WM. The published watermarked model M_WM is the backbone followed by c_p, and f_WM is the watermarking branch in which c_WM takes the output of different layers from the backbone as its input. Since c_WM monitors the outputs of different layers of the backbone network, it is harder to invalidate the watermark completely compared with passport-layer based schemes.
   To produce a watermarked model, the owner should:
1. Generate N samples D_WM^key = {(x_i, y_i)}_{i=1}^N using a pseudo-random algorithm with key as the seed.
2. Optimize the DNN to jointly minimize the loss on D_WM^key and D_primary. During the optimization, a series of regularizers are designed to meet the security requirements enumerated in Section 2.
3. Publish M_WM.
   To prove its ownership over a model M to a third-party customer, the owner and the customer conduct the following steps:
1. The owner submits M, c_WM and key.
2. The customer checks whether c_WM is consistent with M's architecture (a minimal sketch of such a check follows this list).
3. The customer generates D_WM^key from key and combines c_WM with M's backbone to reproduce f_WM.
4. If f_WM statistically fits D_WM^key then the customer confirms the owner's ownership over M.
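The consistency check of step 2 can be as simple as probing whether the submitted c_WM actually accepts the suspicious model's features. The sketch below is an assumption-laden simplification: it feeds a single dummy input through the backbone's final features only, whereas the paper's c_WM reads several intermediate layers, and a full check would also compare the declared architecture information.

```python
import torch

@torch.no_grad()
def is_cwm_consistent(backbone, c_wm, sample_input) -> bool:
    """Customer-side step 2 (sketch): check that the submitted c_WM fits the
    suspicious model by running a dummy forward pass through backbone then c_WM.
    This is only a shape/compatibility probe, not a full architecture audit."""
    try:
        feats = backbone(sample_input)
        c_wm(feats)
        return True
    except RuntimeError:   # e.g., shape mismatch between backbone features and c_WM
        return False
```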
The implementation of T_WM   The watermark task T_WM is instantiated as a binary classification. To generate D_WM^key, key is used as the seed of a pseudo-random generator (e.g., a stream cipher) to generate π^key, a sequence of N different
integers from [0, · · · , 2^m − 1], and a binary string l^key of length N, where m = 3⌈log₂(N)⌉.
   For each type of data space X, a deterministic and injective function is adopted to map each integer in π^key into an element in X. For example, when X is the image domain, the mapping could be the QRcode encoder. When X is the sequence of words in English, the mapping could map an integer n into the n-th word of the dictionary. Without loss of generality, let π^key[i] denote the mapped data from the i-th integer in π^key. Both the pseudo-random generator and the functions that map integers into specialized data spaces are accessible to all parties. Now we set:

       D_WM^key = {(π^key[i], l^key[i])}_{i=1}^N,

where l^key[i] is the i-th bit of l^key. The security requirements raised in Section 2 are merged into MTLSign as in the analysis below.
Unambiguity   To justify the ownership of a model M to an owner with key given c_WM, verify operates as Algo. 1.

Algorithm 1: verify(·, · | c_WM, γ)
Require: M, key.
Ensure: The verification of M's ownership.
 1: Build the watermarking branch f from M and c_WM;
 2: Generate D_WM^key from key;
 3: If f correctly classifies at least γ · N terms within D_WM^key
 4:    Then return 1.
 5:    Else return 0.
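A PyTorch rendering of Algorithm 1 could look like the sketch below. It reuses the generate_wm_dataset sketch from above, and it simplifies the watermarking branch to c_WM applied to the backbone's final features (the paper feeds several intermediate layers); all names are illustrative.

```python
import torch

@torch.no_grad()
def verify(backbone, c_wm, key: bytes, gamma: float = 0.7, N: int = 600) -> int:
    """Algorithm 1 (sketch): rebuild the watermarking branch f = c_WM ∘ backbone,
    regenerate D_WM^key from the key, and accept iff at least gamma*N samples
    are classified correctly. generate_wm_dataset is the earlier sketch."""
    xs, ys = generate_wm_dataset(key, N)
    x = torch.from_numpy(xs).unsqueeze(1)        # assume single-channel image inputs
    logits = c_wm(backbone(x))                   # the watermark branch f_WM
    correct = (logits.argmax(dim=1).cpu().numpy() == ys).sum()
    return int(correct >= gamma * N)
```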
   If M = M_WM then M has been trained to minimize the binary classification loss on T_WM, hence the test is likely to succeed; this justifies the correctness requirement in (1). For an arbitrary key′ ≠ key, the induced watermark training data D_WM^key′ and D_WM^key can hardly overlap. It can be proven that if m ≥ log₂(N³) and γ is selected to be significantly higher than 1/2, then the probability of a successful ambiguity attack declines exponentially with N; details are given in Appendix A. This justifies the unambiguity condition (2).

The functionality-preserving regularizer   Denote the trainable parameters of the DNN model by W. The optimization target for T_primary takes the form:

       L_0(W | D_primary) = Σ_{(x,y)∈D_primary} ℓ(M_WM^W(x), y) + λ_0 · u(W),        (6)

where ℓ(·, ·) is the loss defined by T_primary and u(·) is a regularizer reflecting the prior knowledge on W.
   Since D_WM is much smaller than D_primary, T_WM might not converge properly when being learned simultaneously with T_primary. Hence we first optimize W w.r.t. the loss on the primary task (6) to obtain M_clean with parameter W_0 = arg min_W {L_0(W, D_primary)}.
   Then the model is tuned for T_WM by minimizing:

       L_1(W | D_primary, D_WM^key) = Σ_{(x,y)∈D_WM^key} ℓ_WM(f_WM^W(x), y) + λ_1 · R_func(W),        (7)

where ℓ_WM(·, ·) is the cross entropy loss, and

       R_func(W) = ‖W − W_0‖₂².        (8)

The regularizer R_func in (8) confines W to the neighbourhood of W_0. Then the continuity of M_WM as a function of W ensures the functionality-preserving property defined in (3).

Remark on covertness   Note that λ_1 = θ⁻¹ regarding Eq. (4) regulates the parameter deviation of M_WM from M_clean. If the owner adopts a large λ_1 then it obtains a high level of covertness. Meanwhile, a smaller λ_1 trades covertness for faster convergence of the watermarking task.
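Equations (7) and (8) translate into a few lines of PyTorch. The sketch below is not the authors' implementation: the helper name and hyperparameter are placeholders, the regularizer is applied to the backbone parameters only, and c_WM again reads the backbone's final features for brevity.

```python
import torch

def watermark_tuning_loss(backbone, c_wm, w0, x_wm, y_wm, lambda_1=0.1):
    """Eq. (7): cross-entropy of the watermark branch on a batch of D_WM^key plus
    the functionality-preserving regularizer R_func(W) = ||W - W0||_2^2 (Eq. 8).
    `w0` is a list of detached copies of M_clean's backbone parameters."""
    ce = torch.nn.CrossEntropyLoss()
    loss_wm = ce(c_wm(backbone(x_wm)), y_wm)
    r_func = sum(((p - p0) ** 2).sum() for p, p0 in zip(backbone.parameters(), w0))
    return loss_wm + lambda_1 * r_func
```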
The tuning regularizer   To be robust against adversarial tuning, it is sufficient to make c_WM robust against tuning according to the definition in (5). We assume that D_adversary shares a similar distribution with D_primary; otherwise, the stolen model would not have state-of-the-art performance on the adversary's task. A subset D′_primary of D_primary is firstly sampled as an estimation of D_adversary. Let W be the current configuration of the model's parameters. Tuning is tantamount to minimizing the empirical loss on D′_primary starting from W, which results in the updated parameter W_t. In practice, W_t is obtained by replacing D_primary in (6) with D′_primary and training for a few epochs.
   To achieve the security in (5), for any D_adversary and (x, y) ∈ D_WM^key, the parameter W should meet:

       f_WM^{W_t}(x) = y,   where W_t is obtained by tuning from W on D′_primary.

This condition, together with Algo. 1, implies (5). To exert this constraint in the training process, we design a new regularizer:

       R_DA(W) = Σ_{W_t ← W (tuned on D′_primary), (x,y)∈D_WM^key} ℓ_WM(f_WM^{W_t}(x), y).        (9)

Then the loss to be minimized is updated from (7) to:

       L_2(W | D_primary, D_WM^key) = L_1(W, D_primary, D_WM^key) + λ_2 · R_DA(W).        (10)

R_DA defined by (9) can be understood as one kind of data augmentation for T_WM. Data augmentation aims to improve the model's robustness against some specific perturbation in the input domain (Shorten and Khoshgoftaar 2019). This is usually done by adding an extra regularizer:

       Σ_{(x,y)∈D, x′ ← perturb(x)} ℓ(f^W(x′), y).        (11)

Unlike in the data domain of T_primary, it is hard to explicitly define augmentation for T_WM against tuning. A regularizer with the form of (11) can be derived from (9) by interchanging the order of summation. Concretely, the perturbation in the watermarking task takes the form:

       x′ ∈ (f_WM^W)⁻¹(f_WM^{W_t}(x)),   x′ ← perturb(x),

and can increase the watermarked model's robustness against tuning.
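One practical way to approximate R_DA is sketched below. This is a first-order simplification under stated assumptions, not the exact method: the adversary's tuning is simulated on detached copies of the published model, and gradients flow into c_WM only, whereas differentiating through the inner tuning loop would be the faithful version of Eq. (9).

```python
import copy
import torch

def r_da(backbone, c_p, c_wm, primary_subset_loader, x_wm, y_wm,
         inner_steps=3, inner_lr=1e-3):
    """First-order sketch of R_DA (Eq. 9): simulate the adversary by fine-tuning
    copies of (backbone, c_p) on D'_primary for a few steps (W -> W_t), then
    penalize the watermark loss of c_WM computed under the tuned weights."""
    ce = torch.nn.CrossEntropyLoss()
    tuned_backbone = copy.deepcopy(backbone)
    tuned_cp = copy.deepcopy(c_p)
    opt = torch.optim.SGD(list(tuned_backbone.parameters()) +
                          list(tuned_cp.parameters()), lr=inner_lr)
    it = iter(primary_subset_loader)
    for _ in range(inner_steps):                 # W_t <- W tuned on D'_primary
        x, y = next(it)
        opt.zero_grad()
        ce(tuned_cp(tuned_backbone(x)), y).backward()
        opt.step()
    tuned_backbone.eval()
    with torch.no_grad():
        feats_t = tuned_backbone(x_wm)           # features under the tuned weights W_t
    return ce(c_wm(feats_t), y_wm)               # l_WM(f_WM^{W_t}(x), y), grads to c_WM only
```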
3.3    The ownership verification protocol
To regulate the OV process against watermark overwriting and piracy, one option is to use a trusted authorization center, which is vulnerable and expensive. Therefore, we resort to decentralized consensus protocols such as Raft (Ongaro and Ousterhout 2014) or PBFT (Castro, Liskov et al. 1999), under which messages are responded to and recorded by clients within the community. By storing the necessary information into the servers of a distributed community, the watermark becomes unforgeable (Li, Wang, and Liew 2021).
   To conduct an OV, the owner submits the evidence to the entire community, so each member can independently conduct the verification. The final result is obtained through voting; the process is illustrated in Fig. 3. The key generation process can be tangled with the owner's digital signature (e.g., by a CPA-secure encryption) so revealing key would not violate the privacy or lead to further threats.

[Figure 3: OV process for a DNN. The owner submits the model M, key, and verify to the public verification community; each client verifies independently and the final result is obtained by voting.]

To publish a model   An owner B signs and broadcasts the following message to the entire community:

       ⟨Publish: time ‖ hash(key) ‖ hash(c_WM) ‖ hash(info)⟩,

where ‖ denotes string concatenation, time is the time stamp, info explains how c_WM connects to the backbone model, and hash is a preimage-resistant hash function mapping an object into a string and accessible to all parties. Once B is confirmed that the majority of clients has recorded its broadcast (e.g., when B receives a confirmation from the current leader under the Raft protocol), it publishes M_WM.

To prove the ownership over a model   For model M, B signs and broadcasts the following message:

       ⟨OV: l_M ‖ hash(M) ‖ l_cWM ‖ key⟩,

where l_M and l_cWM are pointers to M and c_WM. Upon receiving this request, any client within the consensus community can independently conduct the ownership proof. It firstly downloads the model from l_M and examines its hash. Then it downloads c_WM and retrieves the corresponding message from B by hash(c_WM). The last steps follow Section 3.2. After finishing the verification, this client broadcasts its result as the proof for B's ownership over the model in l_M.
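The two broadcast messages are straightforward to assemble. The sketch below uses SHA-256 as one example of a preimage-resistant hash; the digital signature and the Raft/PBFT broadcast themselves are outside the sketch, and all function names are illustrative.

```python
import hashlib
import time

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_publish_message(key: bytes, c_wm_bytes: bytes, info: str) -> str:
    """Owner-side Publish broadcast: commit to key, c_WM and the wiring info by
    their hashes, together with a time stamp (Section 3.3)."""
    fields = ["Publish", str(int(time.time())),
              sha256_hex(key), sha256_hex(c_wm_bytes), sha256_hex(info.encode())]
    return "||".join(fields)

def build_ov_message(l_m: str, model_bytes: bytes, l_cwm: str, key: bytes) -> str:
    """Ownership-proof broadcast <OV: l_M || hash(M) || l_cWM || key>."""
    return "||".join(["OV", l_m, sha256_hex(model_bytes), l_cwm, key.hex()])

# Example: build_publish_message(b"owner-key", b"<serialized c_WM>", "c_WM reads layers 1-3")
```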
Security of the OV protocol   To pirate a model under this protocol, an adversary must obtain a legal key, the hash of a c_WM, and the correct info earlier than the owner. This is hard since the adversary has to correctly guess the pirated DNN's architecture and embed its key into it without modifying its c_WM. Otherwise, such piracy can be falsified by examining the time-stamp.

                 4     Experiments and Discussions

4.1    Experiment Setup
To illustrate the flexibility of MTLSign, we considered three primary tasks: image classification (IC), sentimental analysis (SA) of discourse, and image semantic segmentation (SS). We adopted five datasets for IC, two datasets for SS, and two datasets for SA. The descriptions of these datasets and the corresponding DNN structures are listed in Table 2.
   ResNet (He et al. 2016) is a classical model for image processing. For the VirusShare dataset, we compiled a collection of 26,000 malware into images and adopted ResNet as the classifier. Glove (Pennington, Socher, and Manning 2014) is a pre-trained word embedding, while bidirectional long short-term memory (Bi-LSTM) (Huang, Xu, and Yu 2015) is commonly used in NLP. Cascade mask RCNN (CMRCNN) (Cai and Vasconcelos 2018) is a DNN specialized for semantic segmentation.

                 Table 2: Datasets and their DNN structures.

    Dataset                  Description        DNN structure
    MNIST                    IC, 10 classes     ResNet-18
    Fashion-MNIST            IC, 10 classes     ResNet-18
    CIFAR-10                 IC, 10 classes     ResNet-18
    CIFAR-100                IC, 100 classes    ResNet-18
    VirusShare               IC, 10 classes     ResNet-18
    IMDb                     SA, 2 classes      Glove+Bi-LSTM
    SST                      SA, 5 classes      Glove+Bi-LSTM
    Penn-Fudan-Pedestrian    SS, 2 classes      ResNet-50+CMRCNN
    VOC                      SS, 20 classes     ResNet-50+CMRCNN

   For the image datasets, c_WM was a two-layer perceptron that took the outputs of the first three layers from the ResNet as input. QRcode was adopted to generate D_WM^key. For the NLP datasets, the network took the structure in Fig. 4.
   Throughout the experiments, we set N = 600. To set the verification threshold γ in Algo. 1, we tested the classification accuracy of a randomly initialized c_WM across nine datasets over 5,000 watermarking datasets. It was observed that all accuracies fell in [0.425, 0.575]. We selected γ = 0.7 so the probability of a successful piracy is less than
2.69 × 10⁻⁸ with λ = 0.34 in the Chernoff bound according to Appendix A. D′_primary consisted of 10% of the training samples, drawn randomly. For the tuning attacks, we considered FP and NP. As for adaptive attacks, we adopted the overwriting attack and the spoil attack (Li, Wang, and Liew 2021).

[Figure 4: The network architecture for sentimental analysis. Training sentences (e.g., "this is ... AAAI") and watermarking sentences (e.g., "renaissance matrix ... kalends") pass through the LSTM units; c_p outputs the sentimental labels and c_WM outputs the 0/1 watermark decision.]
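For the text tasks, the integers in π^key are mapped to dictionary words rather than QR codes. The sketch below shows one possible injective mapping in that spirit; the exact encoding used by the authors is not specified beyond the dictionary lookup, so the base-decomposition scheme and the sentence length here are assumptions of the sketch.

```python
def int_to_wm_sentence(value, vocab, length=8):
    """Map one integer from pi^key to a pseudo-sentence by reading it in base
    len(vocab) and picking the corresponding dictionary words (cf. the
    'renaissance matrix ... kalends' watermarking sentences in Fig. 4)."""
    words = []
    base = len(vocab)
    for _ in range(length):
        words.append(vocab[value % base])
        value //= base
    return " ".join(words)

# Example: int_to_wm_sentence(123456789, ["renaissance", "matrix", "kalends", "zenith"])
```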
4.2       Ablation Study
To examine the efficacy of R_func and R_DA, we compared the performance of the watermarked DNN M_WM under different configurations. Three metrics are of interest: (i) the performance of M_WM on T_primary; (ii) the decline of the performance of M_WM on T_primary when NP made f_WM's accuracy on T_WM lower than γ; (iii) the performance of f_WM on T_WM after FP. The models were trained by minimizing the MTL loss defined by (10), where we adopted fine-tuning and NP and chose the optimal λ_1 and λ_2 by grid search in [0.02, 0.04, · · · , 0.2]. The results are collected in Fig. 5. We observe that R_func preserves the model's performance on the primary task. On the other hand, R_DA makes the watermarking branch robust against FP: its accuracy on T_WM is significantly higher than that of the models without R_DA. Meanwhile, the performance on the primary task has to decrease much more during NP to invalidate the watermarked model with R_DA, so the adversary has to sacrifice more in order to invalidate the original ownership. Therefore, we suggest that both regularizers be incorporated in watermarking the model.

4.3       Comparative Studies and Discussion
For comparison, several SOTA watermarking schemes (Zhu et al. 2020; Li et al. 2019a; Darvish, Chen, and Koushanfar 2019; Fan et al. 2021) that are secure against the ambiguity attack and tuning were considered. Yet they cannot be readily generalized to semantic segmentation and NLP tasks. We generated 600 backdoor/passport/feature map triggers and assigned them proper labels for each candidate scheme.
   To compare the levels of covertness, we measured the average deviation of parameters after watermarking. For the functionality-preserving property and the robustness against tuning, we recorded the performance of the watermarked models on the primary task, the verification accuracy of watermarks after FP, and the relative decline of the performance on the primary task when NP invalidated the watermarks.
   Finally, we conducted the spoil attack, an improved watermark removal attack (Li, Wang, and Liew 2021), on the watermarked model. The spoil attack can always eliminate the watermark, so as in NP, the statistic of interest is the relative decrease of the performance on T_primary, which reflects the adversary's expense. We measured these values for all compared schemes on five classification datasets; the results are summarized in Fig. 6, and detailed implementations of the spoil attacks are provided in Appendix B.
   Our method resulted in only a slight difference in parameters compared with other candidates, in particular the white-box competitors. It is harder for an adversary to distinguish a model watermarked by MTLSign from a clean one. Regarding robustness and functionality-preserving, our method uniformly outperformed the other competitors. This is due to: (1) MTLSign does not incorporate backdoors into the model, so adversarial modifications such as FP, which are designed to eliminate backdoors, can hardly reduce our watermark. (2) MTLSign relies on an extra module, c_WM, as a verifier. As an adversary cannot tamper with this module, universal tunings such as NP have less impact. MTLSign can also adapt to new tuning operators by incorporating them into R_DA. Moreover, MTLSign asserts weak conditions on both the task (e.g., NLP) and the DNN architecture and is more flexible.
   At last, we consider the overwriting attack, where the adversary embeds its watermark into the pirated DNN. Although the adversary's ownership declaration can be falsified by the OV protocol, it is necessary that such overwriting does not invalidate the owner's watermark. The decrease of the accuracy of the watermarking branch with the overwriting epochs was recorded in Table 3. Since the decrease is uniformly bounded by 5%, overwriting does not form a threat to MTLSign.

   Table 3: Decrease of the accuracy of the watermarking branch against watermark overwriting (in %).

                         Number of overwriting epochs
    Dataset              50      150     250     350
    MNIST                1.0     1.5     1.5     2.0
    F-MNIST              2.0     2.5     2.5     2.5
    CIFAR-10             4.5     4.5     4.5     4.5
    CIFAR-100            0.0     0.5     0.9     0.9
    VirusShare           0.0     0.5     0.5     0.5
    IMDb                 3.0     3.0     3.0     3.0
    SST                  2.5     3.0     3.0     2.5
    PF-Pedestrian        0.5     1.0     1.0     1.0
    VOC                  1.3     2.0     2.1     2.1

                      5     Conclusion
This paper presents MTLSign, an MTL-based DNN watermarking scheme. We examine the basic security requirements for the DNN watermark, especially the unambiguity, and propose to embed the watermark as an additional task. The proposed scheme explicitly meets the security requirements by corresponding regularizers. With a decentralized consensus protocol, MTLSign is secure against adaptive attacks. It is true that, like any other white-box DNN watermarking scheme, MTLSign remains vulnerable to functionality-equivalence attacks such as the neuron permutation. This is one of the aspects that require further effort to increase the applicability of DNN watermarks.
[Figure 5: Ablation study on the efficacy of R_func and R_DA regarding the three metrics. For the watermarked model's performance on SS, the benchmark is mAP; otherwise it is the classification accuracy.]


                        References
Adi, Y.; Baum, C.; Cisse, M.; Pinkas, B.; and Keshet, J. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), 1615–1631.
Cai, Z.; and Vasconcelos, N. 2018. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, 6154–6162.
Castro, M.; Liskov, B.; et al. 1999. Practical byzantine fault tolerance. In OSDI, volume 99, 173–186.
Darvish, R. B.; Chen, H.; and Koushanfar, F. 2019. DeepSigns: an end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 485–497.
Fan, L.; Ng, K. W.; Chan, C. S.; and Yang, Q. 2021. DeepIP: Deep Neural Network Intellectual Property Protection with Passports. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Ganju, K.; Wang, Q.; Yang, W.; Gunter, C. A.; and Borisov, N. 2018. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 619–633.
Guan, X.; Feng, H.; Zhang, W.; Zhou, H.; Zhang, J.; and Yu, N. 2020. Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication. In Proceedings of the 28th ACM International Conference on Multimedia, 2273–2280.
He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778.
Huang, Z.; Xu, W.; and Yu, K. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.
Le Merrer, E.; Perez, P.; and Trédan, G. 2020. Adversarial frontier stitching for remote neural network watermarking. Neural Computing and Applications, 32(13): 9233–9244.
Li, F.; Wang, S.; and Liew, A. W.-C. 2021. Regulating Ownership Verification for Deep Neural Networks: Scenarios, Protocols, and Prospects. IJCAI Workshop.
Li, F.-Q.; and Wang, S.-L. 2021. Persistent Watermark For Image Classification Neural Networks By Penetrating The Autoencoder. In 2021 IEEE International Conference on Image Processing (ICIP), 3063–3067.
Li, H.; Willson, E.; Zheng, H.; and Zhao, B. Y. 2019a. Persistent and unforgeable watermarks for deep neural networks. arXiv preprint arXiv:1910.01226.
[Figure 6: Comparison between MTLSign and other SOTA schemes on (a) MNIST, (b) Fashion-MNIST, (c) CIFAR-10, (d) CIFAR-100, and (e) VirusShare. The covertness measures the average deviation of parameters after watermark embedding, scaling in [0, 10⁻²]. The functionality-preserving measures the performance of the watermarked DNN, scaling in [0%, 100%]. The robustness against FP measures the accuracy of the watermarking branch after FP, scaling in [0%, 100%]. The robustness against NP/spoil measures the decrease of the accuracy of the primary branch when NP/the spoil attack invalidates the watermark, scaling in [−50%, 0%]/[−10%, 0%].]


Li, Y.; Koren, N.; Lyu, L.; Lyu, X.; Li, B.; and Ma, X. 2021. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. arXiv preprint arXiv:2101.05930.
Li, Z.; Hu, C.; Zhang, Y.; and Guo, S. 2019b. How to prove your model belongs to you: a blind-watermark based framework to protect intellectual property of DNN. In Proceedings of the 35th Annual Computer Security Applications Conference, 126–137.
Liu, H.; Weng, Z.; and Zhu, Y. 2021. Watermarking Deep Neural Networks with Greedy Residuals. In International Conference on Machine Learning, 6978–6988. PMLR.
Liu, K.; Dolan-Gavitt, B.; and Garg, S. 2018. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses, 273–294. Springer.
Liu, X.; Li, F.; Wen, B.; and Li, Q. 2020. Removing Backdoor-Based Watermarks in Neural Networks with Limited Data. arXiv preprint arXiv:2008.00407.
Namba, R.; and Sakuma, J. 2019. Robust watermarking of neural network with exponential weighting. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, 228–240.
Ong, D. S.; Chan, C. S.; Ng, K. W.; Fan, L.; and Yang, Q. 2021. Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3630–3639.
Ongaro, D.; and Ousterhout, J. 2014. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14), 305–319.
Pennington, J.; Socher, R.; and Manning, C. D. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543.
Sener, O.; and Koltun, V. 2018. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems, 527–538.
Shorten, C.; and Khoshgoftaar, T. M. 2019. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1): 60–107.
Uchida, Y.; Nagai, Y.; Sakazawa, S.; and Satoh, S. 2017. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 269–277.
Zhang, J.; Gu, Z.; Jang, J.; Wu, H.; Stoecklin, M. P.; Huang, H.; and Molloy, I. 2018. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 159–172.
Zhu, R.; Zhang, X.; Shi, M.; and Tang, Z. 2020. Secure neural network watermarking protocol against forging attack. EURASIP Journal on Image and Video Processing, 2020(1): 1–12.
                                                                  watermarked model on the same triggers with adversarially
        A    Derivation for the unambiguity condition
To formulate this intuition, consider the event where D_WM^key′ shares q · N terms with D_WM^key, q ∈ (0, 1). With a pseudo-random generator, it is computationally impossible to distinguish π^key from a sequence of N randomly selected integers. The same argument holds for l^key and a random binary string of length N. Therefore the probability of this event can be upper bounded by:

       C(N, qN) · r^{qN} · (1 − r)^{(1−q)N} ≤ ((1 + (1 − q)N) · r / (1 − r))^{qN},

where C(N, qN) is the binomial coefficient and r = N / 2^{m+1}. For an arbitrary q, let r < 1 / (2 + (1 − q)N); then the probability that D_WM^key′ overlaps with D_WM^key with a portion of q declines exponentially.
   For numbers that do not appear in π^key, the watermarking branch is expected to output a random guess. Therefore, if q is smaller than a threshold τ then D_WM^key′ can hardly pass the statistical test in Algo. 1 with N big enough. So letting

       m ≥ log₂[2N(2 + (1 − τ)N)]

and N be large enough would make an effective collision in the watermark dataset almost impossible. For simplicity, setting m = 3 · ⌈log₂(N)⌉ ≥ log₂(N³) is sufficient.
   To select the threshold γ, assume that the random guess strategy achieves an average accuracy of at most p = 0.5 + α(N), where α is a negligible function. The verification process returns 1 iff the watermark classifier achieves binary classification accuracy no less than γ. The demand for security is that, by randomly guessing, the probability that an adversary passes the test declines exponentially with N. Let X denote the number of correct guesses with average accuracy p; an adversary succeeds only if X ≥ γ · N. By the Chernoff theorem:

       Pr{X ≥ γ · N} ≤ ((1 − p + p · e^λ) / e^{γ·λ})^N,

where λ is an arbitrary nonnegative number. If γ is larger than p by a constant independent of N, then (1 − p + p · e^λ) / e^{γ·λ} is less than unity with a proper λ, reducing the probability of a successful attack into negligibility.
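The bound is easy to evaluate numerically. The sketch below plugs in the Section 4.1 setting (N = 600, γ = 0.7, λ = 0.34), with p set, as an assumption of this sketch, to the worst random-guess accuracy observed there (0.575); the result is on the order of 10⁻⁸, matching the magnitude quoted in Section 4.1.

```python
from math import exp

def chernoff_bound(gamma: float, p: float, lam: float, N: int) -> float:
    """Upper bound on Pr[X >= gamma*N] for a random-guessing adversary with
    per-sample accuracy p (the Chernoff bound derived above)."""
    return ((1 - p + p * exp(lam)) / exp(gamma * lam)) ** N

# Rough order-of-magnitude check with the Section 4.1 parameters.
print(chernoff_bound(gamma=0.7, p=0.575, lam=0.34, N=600))
```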

        B    Implementation of the spoil attacks
During the spoil attack, the adversary has full knowledge of key and verify, and has obtained M_WM. The adversary's objective is to tune M_WM into M_spoiled in order to escape IP regulation, which means the following condition holds with a large probability:

       verify(M_spoiled, key) = 0.

   For the backdoor-based watermarking schemes, key is uniquely correlated with a collection of labelled triggers {(t_n, y_n)}_{n=1}^N. The spoil attack is tantamount to fitting the watermarked model on the same triggers with adversarially shuffled labels.
   For the weight-based watermarking schemes, key reveals the places where information is hidden. So the adversary only has to replace these parameters (which are usually a small part of the entire model) with random values.
   For hybrid white-box watermarking schemes with a complex verify module such as MTLSign, the adversary has to tune the watermarking branch to fit shuffled labels with the backend fixed. The loss function to be minimized can be written as:

       L(W_backbone) = Σ_{(x,y)∈D_WM^key} ℓ_WM(y′, c_WM(M(x | W_backbone))),

in which y′ is a randomly assigned label independent from y. This attack usually results in a large-scale shift of the parameters within the backbone DNN. If the adversary cannot properly fine-tune the model afterward (which is always the case in practice, since otherwise the adversary would have already acquired enough data and could train its DNN from scratch), then the DNN's SOTA performance is at risk, as demonstrated in the empirical studies.
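A minimal PyTorch rendering of the MTLSign-targeted spoil attack follows. It is a sketch under stated assumptions: x_wm and y_wm stand for the regenerated D_WM^key, the labels are permuted as a stand-in for "adversarially shuffled" labels, and as elsewhere the watermark branch is simplified to c_WM over the backbone's final features.

```python
import torch

def spoil_attack(backbone, c_wm, x_wm, y_wm, steps=100, lr=1e-3):
    """Adversary-side sketch of the spoil attack on MTLSign: tune only the
    backbone so that the watermark branch fits shuffled labels, while the
    verifier c_WM is kept fixed (Appendix B)."""
    y_shuffled = y_wm[torch.randperm(y_wm.shape[0])]   # shuffled watermark labels
    for p in c_wm.parameters():
        p.requires_grad_(False)                        # backend fixed
    opt = torch.optim.SGD(backbone.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = ce(c_wm(backbone(x_wm)), y_shuffled)    # L(W_backbone)
        loss.backward()
        opt.step()
    return backbone
```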