=Paper=
{{Paper
|id=Vol-3087/paper_5
|storemode=property
|title=Leveraging Multi-task Learning for Unambiguous and Flexible Deep Neural Network Watermarking
|pdfUrl=https://ceur-ws.org/Vol-3087/paper_5.pdf
|volume=Vol-3087
|authors=Fangqi Li,Lei Yang,Shilin Wang,Alan Wee-Chung Liew
|dblpUrl=https://dblp.org/rec/conf/aaai/LiYWL22
}}
==Leveraging Multi-task Learning for Unambiguous and Flexible Deep Neural Network Watermarking==
Leveraging Multi-task Learning for Unambiguous and Flexible Deep Neural Network Watermarking

Fangqi Li¹, Lei Yang¹, Shilin Wang¹*, Alan Wee-Chung Liew²
¹ School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, {solour_lfq, yangleisx, wsl}@sjtu.edu.cn
² School of Information and Communication Technology, Griffith University, a.liew@griffith.edu.au

*S. Wang is the corresponding author. This work was supported by the National Natural Science Foundation of China (61771310). Part of the work appeared as https://arxiv.org/pdf/2108.09065.pdf. Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Deep neural networks are playing an important role in many real-life applications. An important prerequisite for commercializing deep neural networks is the identification of their genuine owners. Therefore, watermarking schemes that embed the owner's identity information into the models have been proposed. However, current schemes cannot meet all the security requirements, such as unambiguity, and are inflexible since most of them focus on classification models. To meet the formal definitions of the security requirements and increase the applicability of deep neural network watermarking schemes, we propose a new method, MTLSign, based on multi-task learning. By treating the watermark embedding as an extra task, the security requirements are explicitly formulated and met with well-designed regularizers and components from cryptography. Experiments have demonstrated that MTLSign is flexible and robust for practical security in machine learning applications.

Figure 1: Architecture of MTLSign. The orange blocks are the backbone DNN; c_p and c_WM are the classifier backends for the primary task and the watermarking task, respectively. The primary branch maps D_primary to predictions for the primary task; the watermark branch maps the key-generated D_WM^key to predictions for the watermarking task.

1 Introduction

The deep neural network (DNN) is spearheading artificial intelligence with broad applications in assorted fields. Training a DNN is expensive: a large amount of data has to be collected and preprocessed, and the data preparation is followed by parameter tuning and DNN structure optimization. On the contrary, using a DNN is easy: a user simply propagates the input forward. Such imbalance between DNN production and deployment calls for protecting DNN models as intellectual property (IP) against piracy. Moreover, the identification of a DNN's owner forms the basis of the accountability of AI systems.

Watermarking is an influential method for DNN IP protection (Uchida et al. 2017). Some information is embedded into the neural network as the watermark. After adversaries steal the model and pretend to have built it themselves, an ownership verification (OV) process reveals the hidden information and identifies the authentic owner.

If the pirated model is deployed as an API, the owner has to adopt backdoor-based watermarking schemes (Zhang et al. 2018; Adi et al. 2018), where special triggers evoke certain outputs. Triggers can be generated from an autoencoder (Li et al. 2019b; Li and Wang 2021), adversarial samples (Le Merrer, Perez, and Trédan 2020), or exceptional samples (Li et al. 2019a). Backdoor-based watermarking schemes are fragile given backdoor clearance methods (Liu et al. 2020; Li et al. 2021; Namba and Sakuma 2019). Model tuning such as fine-pruning (Liu, Dolan-Gavitt, and Garg 2018) can also block some backdoors and hence the watermark.

If the entire suspicious model is accessible, e.g., in model competitions and project certifications, then weight-based watermarks can incorporate the owner's identity information into the weights of a DNN (Uchida et al. 2017) or into the statistics of the intermediate feature maps (Darvish, Chen, and Koushanfar 2019). These white-box schemes usually carry more information and have a larger forensic value.

Hitherto, most watermarking methods are designed and examined only for image-classification DNNs, or they depend on specialized layers. Such inflexibility challenges the broader adoption of DNN watermarking schemes as a commercial standard. Moreover, some basic security requirements against adversarial attacks have been overlooked. The robustness of watermarks against new adaptive attacks such as the spoil attack (Li, Wang, and Liew 2021) also requires more attention.

To overcome these difficulties, we propose a new white-box DNN watermarking scheme based on multi-task learning (MTL) (Sener and Koltun 2018), MTLSign, as shown in Fig. 1. By modeling the watermark embedding procedure as an extra task, the security requirements are satisfied with well-designed regularizers. This extra task has an independent backend classifier, hence it can verify the ownership of arbitrary models. Cryptological primitives are adopted to instantiate the watermarking task, making MTLSign provably secure against the ambiguity attack. The major contributions of our work are three-fold:
• We examine the security requirements for the DNN watermark, especially the unambiguity, in a formal manner.
• A DNN watermarking scheme based on MTL is proposed. It can be applied to DNNs for tasks other than image classification, the major focus of previous works.
• Experiments show that MTLSign is more robust, flexible, and secure compared with several state-of-the-art schemes.
2 Security Requirements

We assume that the adversary possesses fewer data than the owner (otherwise the piracy is unnecessary), but has full knowledge of the watermarking scheme and can tune the model adaptively. The pirated deep learning model fulfils a primary task T_primary with dataset D_primary, data space X, label space Y, and a metric d on Y. We study four crucial security requirements confronting DNN IP protection.

2.1 Unambiguity

A DNN watermarking scheme WM is composed of a key generation module Gen and an embedding module Embed. It first generates a key for the owner with security parameter N:

key ← Gen(1^N),

then embeds key into a clean model M_clean:

(M_WM, verify) ← Embed(M_clean, key),

where M_WM is the watermarked DNN model and verify is the (possibly publicly available) ownership verifier (Li, Wang, and Liew 2021). To accurately verify the ownership, it is necessary and sufficient that:

Pr{ verify(M_WM, key) = 1 } ≥ 1 − ε(N),   (1)

Pr{ verify(M_WM, key′) = 0 } ≥ 1 − ε(N),   (2)

where ε declines exponentially in N and key′ ≠ key is a random key. Claiming ownership with verify and a forged key′ is the ambiguity attack, hence Eq. (2) is defined as the unambiguity property, illustrated in Fig. 2(a). Unambiguity has been examined for certain models such as GANs (Ong et al. 2021), but its formal connection with the security parameter has not been established.

2.2 Functionality-preserving and covertness

The watermarked DNN should perform slightly worse than, if not as well as, the clean model. The formal definition is:

Pr_{(x,y)∼T_primary}{ d(M_clean(x), M_WM(x)) ≤ δ } ≈ 1,

which can be examined a posteriori. However, it is hard to explicitly incorporate this definition into the watermarking scheme. Instead, we resort to the following definition:

∀x ∈ X, d(M_clean(x), M_WM(x)) ≤ δ.   (3)

To meet Eq. (3), we only have to ensure that the parameters of M_WM do not deviate too much from those of M_clean. Meanwhile, such a small deviation is also the requirement of covertness, i.e., the secrecy of the watermark (Ganju et al. 2018). The owner should be able to control the level of this difference. Let θ be a parameter within WM that regulates this difference. It is desirable that, in the extreme case where θ approaches zero, the watermarked model converges to the clean model:

M_WM → M_clean, when θ → 0.   (4)

The owner can then select the optimal trade-off between functionality and covertness by modifying θ.
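The first, probabilistic form of the functionality-preserving property can be checked a posteriori on samples from the primary task. The following is a minimal, hypothetical sketch of such a check; the distance d, the threshold δ, and the function and loader names are illustrative assumptions, not part of the paper.

```python
import torch

@torch.no_grad()
def functionality_preserved(m_clean, m_wm, loader, delta=0.05, d=None):
    """Estimate Pr{ d(M_clean(x), M_WM(x)) <= delta } over held-out primary-task samples."""
    if d is None:
        # example metric: total-variation distance between the two output distributions
        d = lambda p, q: 0.5 * (p.softmax(-1) - q.softmax(-1)).abs().sum(-1)
    hits, total = 0, 0
    for x, _ in loader:
        dist = d(m_clean(x), m_wm(x))
        hits += (dist <= delta).sum().item()
        total += dist.numel()
    return hits / total  # should be close to 1 for a well-embedded watermark
```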
2.3 Robustness against tuning

An adversary can tune M by running backpropagation on a local dataset, by pruning unnecessary neurons (NP), or by pruning and then fine-tuning M (FP). It has been suggested that FP can efficiently eliminate backdoors, and the watermarks within them, from image classification models (Liu, Dolan-Gavitt, and Garg 2018). After being tuned on the adversary's dataset D_adversary, the model's parameters shift and the verification of the watermark might fail. Let M′ ←_{D_adversary} M_WM denote a model M′ obtained by tuning M_WM with D_adversary. As shown in Fig. 2(b), a watermarking scheme is robust against tuning if:

Pr{ verify(M′, key) = 1 } ≥ 1 − ε(N).   (5)

To meet (5), the owner has to make verify(·, key) insensitive to tuning in the neighbourhood of M_WM.

2.4 Flexibility

Many white-box DNN watermarking schemes rely on extra modules such as passport layers or on specialized network architectures (Fan et al. 2021). Therefore, they cannot be readily applied to arbitrary DNN models. To ensure generalization, it is desirable that the watermarking scheme neither depends on specific modules incorporated within the DNN nor explicitly modifies the product's structure.

A comprehensive summary of established watermarking schemes, judged according to the enumerated security requirements, is given in Table 1.

Remark. Apart from these major requirements, there are secondary security demands such as security against overwriting and the redeclaration attack shown in Fig. 2(c), removal, privacy concerns, etc. We defer the examination and discussion of these demands to the empirical studies.

Figure 2: Some security requirements and threats in DNN IP protection. (a) Security against the ambiguity attack. (b) Robustness against tuning. (c) The redeclaration attack.

Table 1: Security requirements and established watermarking schemes.

Scheme | Type | Unambiguity | Functionality-preserving | Robustness against tuning | Flexibility
(Uchida et al. 2017) | White-box | ✗ | ✓ | ✗ | ✓
(Darvish, Chen, and Koushanfar 2019) | White-box | ✓ | ✓ | ✓ | ✗
(Li et al. 2019a) | Black-box | ✓ | ✓ | ✓ | ✗
(Zhu et al. 2020) | Black-box | ✓ | ✓ | ✓ | ✗
(Guan et al. 2020) | White-box | ✓ | ✓ | ✗ | ✗
(Le Merrer, Perez, and Trédan 2020) | Black-box | ✗ | ✓ | ✓ | ✓
(Ong et al. 2021) | Black-box | ✗ | ✓ | ✓ | ✗
(Fan et al. 2021) | Black-box | ✓ | ✓ | ✓ | ✗
(Liu, Weng, and Zhu 2021) | White-box | ✗ | ✓ | ✓ | ✗
Ours | White-box | ✓ | ✓ | ✓ | ✓
3 The Proposed Method

3.1 Motivation

We leverage multi-task learning to design a white-box watermarking framework for DNN IP protection. The watermark embedding is modeled as an additional task T_WM. A classifier for T_WM is built independently of the backend for T_primary, so common tunings such as fine-tuning the last layer (FTLL) or re-training the last layers (RTLL) (Adi et al. 2018) have no impact on our watermark. After training and watermark embedding, only the network structure for T_primary is published.

Under this formulation, the functionality-preserving property and the security against tuning can be formally addressed. A decently designed T_WM ensures the security against ambiguity attacks as well, making MTLSign a secure and flexible option for DNN IP protection. To better handle the forensic difficulties involving watermark redeclaration, we adopt a decentralized consensus protocol to authorize the time-stamp correlated with the watermarks.

3.2 The watermarking scheme MTLSign

The structure of MTLSign is illustrated in Fig. 1. The entire network consists of the backbone network and two independent backends: c_p and c_WM. The published watermarked model M_WM is the backbone followed by c_p, and f_WM is the watermarking branch, in which c_WM takes the outputs of different layers from the backbone as its input. Since c_WM monitors the outputs of different layers of the backbone network, it is harder to invalidate the watermark completely compared with passport-layer-based schemes.

To produce a watermarked model, the owner should:
1. Generate N samples D_WM^key = {x_i, y_i}_{i=1}^N using a pseudo-random algorithm with key as the seed.
2. Optimize the DNN to jointly minimize the loss on D_WM^key and D_primary. During the optimization, a series of regularizers is designed to meet the security requirements enumerated in Section 2.
3. Publish M_WM.

To prove its ownership over a model M to a third-party customer, the owner and the customer conduct the following:
1. The owner submits M, c_WM, and key.
2. The customer checks whether c_WM is consistent with M's architecture.
3. The customer generates D_WM^key from key and combines c_WM with M's backbone to reproduce f_WM.
4. If f_WM statistically fits D_WM^key, the customer confirms the owner's ownership over M.

The implementation of T_WM. The watermark task T_WM is instantiated as a binary classification. To generate D_WM^key, key is used as the seed of a pseudo-random generator (e.g., a stream cipher) to produce π^key, a sequence of N different integers from [0, …, 2^m − 1], and a binary string l^key of length N, where m = 3⌈log2(N)⌉.

For each type of data space X, a deterministic and injective function is adopted to map each integer in π^key into an element in X. For example, when X is the image domain, the mapping could be a QR-code encoder; when X is the space of English word sequences, the mapping could map an integer n into the n-th word of the dictionary. Without loss of generality, let π^key[i] denote the mapped data from the i-th integer in π^key. Both the pseudo-random generator and the functions that map integers into specialized data spaces are accessible to all parties. Now we set:

D_WM^key = { (π^key[i], l^key[i]) }_{i=1}^N,

where l^key[i] is the i-th bit of l^key. The security requirements raised in Section 2 are incorporated into MTLSign as analyzed below.
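As a concrete illustration of the key-to-dataset mapping, the sketch below generates π^key and l^key and maps the integers into images. It is only an assumption-laden example: Python's random.Random stands in for the cryptographic pseudo-random generator (a stream cipher in the paper), the third-party qrcode package stands in for the QR-code encoder, and all function names are hypothetical.

```python
import math
import random  # stand-in only; the paper requires a cryptographic PRG such as a stream cipher

def generate_wm_dataset(key: bytes, n: int):
    """Sketch of D_WM^key: N distinct m-bit integers pi^key and N label bits l^key."""
    m = 3 * math.ceil(math.log2(n))
    prg = random.Random(key)                        # NOT cryptographically secure
    pi = prg.sample(range(2 ** m), n)               # N different integers in [0, 2^m - 1]
    labels = [prg.randint(0, 1) for _ in range(n)]  # the binary string l^key
    return pi, labels, m

def to_image(integer: int, m: int):
    """Deterministic, injective mapping of an m-bit integer into the image domain."""
    import qrcode                                   # assumed third-party QR-code encoder
    return qrcode.make(format(integer, f"0{m}b"))

# D_WM^key = {(to_image(pi[i], m), labels[i])}, here with N = 600 as in Section 4.1
pi, labels, m = generate_wm_dataset(b"owner-secret-key", 600)
watermark_dataset = [(to_image(p, m), bit) for p, bit in zip(pi, labels)]
```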
Unambiguity. To justify the ownership of a model M to an owner holding key, given c_WM, verify operates as Algorithm 1.

Algorithm 1: verify(·, · | c_WM, γ)
Require: M, key.
Ensure: The verification of M's ownership.
1: Build the watermarking branch f from M and c_WM;
2: Generate D_WM^key from key;
3: If f correctly classifies at least γ·N terms within D_WM^key
4: Then return 1.
5: Else return 0.

If M = M_WM, then M has been trained to minimize the binary classification loss on T_WM, hence the test is likely to succeed; this justifies the correctness requirement in (1). For an arbitrary key′ ≠ key, the induced watermark training data D_WM^{key′} and D_WM^key can hardly overlap. It can be proven that if m ≥ log2(N^3) and γ is selected to be significantly higher than 1/2, then the probability of a successful ambiguity attack declines exponentially with N; details are given in Appendix A. This justifies the unambiguity condition (2).
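A hypothetical rendering of Algorithm 1 in Python is given below. It reuses the generate_wm_dataset sketch above; encode_to_input and the return_hidden flag are placeholder names for the integer-to-input mapping and for exposing the backbone's intermediate features to c_WM.

```python
import torch

@torch.no_grad()
def verify(backbone, c_wm, key: bytes, n: int = 600, gamma: float = 0.7) -> int:
    """Sketch of Algorithm 1: return 1 iff the rebuilt watermarking branch fits D_WM^key."""
    pi, labels, m = generate_wm_dataset(key, n)       # Step 2: regenerate D_WM^key from key
    correct = 0
    for integer, bit in zip(pi, labels):
        x = encode_to_input(integer, m)               # hypothetical mapping into the data space X
        features = backbone(x, return_hidden=True)    # Step 1: watermark branch f = c_WM o backbone
        pred = c_wm(features).argmax(dim=-1).item()
        correct += int(pred == bit)
    return int(correct >= gamma * n)                  # Steps 3-5: threshold at gamma * N
```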
The functionality-preserving regularizer. Denote the trainable parameters of the DNN model by W. The optimization target for T_primary takes the form:

L_0(W | D_primary) = Σ_{(x,y)∈D_primary} l( M_WM^W(x), y ) + λ_0 · u(W),   (6)

where l(·,·) is the loss defined by T_primary and u(·) is a regularizer reflecting the prior knowledge on W.

Since D_WM is much smaller than D_primary, T_WM might not converge properly when being learned simultaneously with T_primary. Hence we first optimize W w.r.t. the loss on the primary task (6) to obtain M_clean with parameter W_0 = argmin_W { L_0(W | D_primary) }. Then the model is tuned for T_WM by minimizing:

L_1(W | D_primary, D_WM^key) = Σ_{(x,y)∈D_WM^key} l_WM( f_WM^W(x), y ) + λ_1 · R_func(W),   (7)

where l_WM(·,·) is the cross-entropy loss, and

R_func(W) = ||W − W_0||_2^2.   (8)

The regularizer R_func in (8) confines W to the neighbourhood of W_0. The continuity of M_WM as a function of W then ensures the functionality-preserving property defined in (3).

Remark on covertness. Note that λ_1 = θ^{-1} in the sense of Eq. (4): it regulates the parameter deviation of M_WM from M_clean. If the owner adopts a large λ_1, it obtains a high level of covertness; a smaller λ_1 trades covertness for faster convergence of the watermarking task.

The tuning regularizer. To be robust against adversarial tuning, it is sufficient to make c_WM robust against tuning according to the definition in (5). We assume that D_adversary shares a similar distribution with D_primary; otherwise, the stolen model would not have state-of-the-art performance on the adversary's task. A subset D′_primary of D_primary is first sampled as an estimation of D_adversary. Let W be the current configuration of the model's parameters. Tuning is tantamount to minimizing the empirical loss on D′_primary starting from W, which results in the updated parameter W^t ←_{D′_primary} W. In practice, W^t is obtained by replacing D_primary in (6) with D′_primary and training for a few epochs.

To achieve the security in (5), for any D_adversary and any (x, y) ∈ D_WM^key, the parameter W should meet:

f_WM^{W^t}(x) = y,   W^t ←_{D′_primary} W.

This condition, together with Algorithm 1, implies (5). To exert this constraint on the training process, we design a new regularizer:

R_DA(W) = Σ_{W^t ←_{D′_primary} W, (x,y)∈D_WM^key} l_WM( f_WM^{W^t}(x), y ).   (9)

Then the loss to be minimized is updated from (7) to:

L_2(W | D_primary, D_WM^key) = L_1(W | D_primary, D_WM^key) + λ_2 · R_DA(W).   (10)

R_DA defined by (9) can be understood as a kind of data augmentation for T_WM. Data augmentation aims to improve the model's robustness against some specific perturbation in the input domain (Shorten and Khoshgoftaar 2019). This is usually done by adding an extra regularizer:

Σ_{(x,y)∈D, x′ ←_perturb x} l( f^W(x′), y ).   (11)

Unlike in the data domain of T_primary, it is hard to explicitly define augmentation for T_WM against tuning. A regularizer of the form (11) can be derived from (9) by interchanging the order of summation. Concretely, a perturbation in the watermarking task of the form

x′ ∈ (f_WM^W)^{-1}( f_WM^{W^t}(x) ),   x′ ←_perturb x,

can increase the watermarked model's robustness against tuning.
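The following PyTorch-style sketch shows how the terms of Eqs. (7)–(10) could be assembled in one optimization step. It is not the authors' released code: the wm_branch/primary_branch accessors, w0_params (detached copies of W_0), the hyper-parameter values, and especially the treatment of R_DA's gradient (the adversarial offset W^t − W is simulated, detached, temporarily applied, and then removed, similar to sharpness-aware training) are simplifying assumptions on my part; the paper instead reformulates R_DA as input-space augmentation via Eq. (11).

```python
import copy
import torch

def watermark_embedding_step(model, w0_params, wm_batch, d_prime_loader,
                             wm_loss, task_loss, opt,
                             lambda1=0.1, lambda2=0.1, inner_lr=1e-3, inner_steps=1):
    """One step on L2 (Eq. (10)) = watermark loss + lambda1*R_func + lambda2*R_DA (a sketch)."""
    x_wm, y_wm = wm_batch
    opt.zero_grad()

    # L1 (Eq. (7)): watermark loss plus the functionality-preserving regularizer (Eq. (8))
    r_func = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), w0_params))
    (wm_loss(model.wm_branch(x_wm), y_wm) + lambda1 * r_func).backward()

    # Simulate the adversary's tuning W -> W^t on D'_primary (gradients not tracked for W)
    tuned = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(tuned.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        for x, y in d_prime_loader:
            inner_opt.zero_grad()
            task_loss(tuned.primary_branch(x), y).backward()
            inner_opt.step()

    # R_DA (Eq. (9)): evaluate the watermark loss at W + (W^t - W) with the offset held constant,
    # accumulate its gradient, then restore the original weights before the optimizer step.
    deltas = [(pt.detach() - p.detach()) for pt, p in zip(tuned.parameters(), model.parameters())]
    with torch.no_grad():
        for p, d in zip(model.parameters(), deltas):
            p.add_(d)
    (lambda2 * wm_loss(model.wm_branch(x_wm), y_wm)).backward()
    with torch.no_grad():
        for p, d in zip(model.parameters(), deltas):
            p.sub_(d)
    opt.step()
```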
3.3 The ownership verification protocol

To regulate the OV process against watermark overwriting and piracy, one option is to rely on a trusted authorization center, which is both vulnerable and expensive. Therefore, we resort to decentralized consensus protocols such as Raft (Ongaro and Ousterhout 2014) or PBFT (Castro, Liskov et al. 1999), under which messages are responded to and recorded by clients within the community. By storing the necessary information on the servers of a distributed community, the watermark becomes unforgeable (Li, Wang, and Liew 2021).

To conduct an OV, the owner submits the evidence to the entire community, so each member can independently conduct the verification. The final result is obtained through voting; the process is illustrated in Fig. 3. The key generation process can be entangled with the owner's digital signature (e.g., through a CPA-secure encryption), so revealing key would not violate privacy or lead to further threats.

Figure 3: OV process for a DNN.

To publish a model. An owner B signs and broadcasts the following message to the entire community:

⟨Publish ∥ time ∥ hash(key) ∥ hash(c_WM) ∥ hash(info)⟩,

where ∥ denotes string concatenation, time is the time stamp, info explains how c_WM connects to the backbone model, and hash is a preimage-resistant hash function that maps an object into a string and is accessible to all parties. Once B has confirmed that the majority of clients has recorded its broadcast (e.g., when B receives a confirmation from the current leader under the Raft protocol), it publishes M_WM.

To prove the ownership over a model. For a model M, B signs and broadcasts the following message:

⟨OV ∥ l_M ∥ hash(M) ∥ l_cWM ∥ key⟩,

where l_M and l_cWM are pointers to M and c_WM. Upon receiving this request, any client within the consensus community can independently conduct the ownership proof. It first downloads the model from l_M and examines its hash. Then it downloads c_WM and retrieves the corresponding message from B by hash(c_WM). The last steps follow Section 3.2. After finishing the verification, the client broadcasts its result as the proof of B's ownership over the model at l_M.

Security of the OV protocol. To pirate a model under this protocol, an adversary must obtain a legal key, the hash of a c_WM, and the correct info at a time earlier than the owner. This is hard, since the adversary would have to correctly guess the pirated DNN's architecture and embed its key into it without modifying its c_WM. Otherwise, such piracy can be falsified by examining the time-stamp.
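The two broadcast records can be assembled from standard primitives; below is a minimal sketch using SHA-256 as an example of a preimage-resistant hash. The field separator, the timestamp format, the signing step, and the consensus layer (Raft/PBFT) are all omitted or assumed, so this illustrates only the message layout.

```python
import hashlib
import time

def _h(obj: bytes) -> str:
    """An example preimage-resistant hash accessible to all parties (SHA-256 here)."""
    return hashlib.sha256(obj).hexdigest()

def publish_message(key: bytes, c_wm_bytes: bytes, info: str) -> str:
    """Sketch of the <Publish || time || hash(key) || hash(c_WM) || hash(info)> record."""
    fields = ["Publish", str(int(time.time())), _h(key), _h(c_wm_bytes), _h(info.encode())]
    return "||".join(fields)   # the owner signs this string and broadcasts it to the community

def ov_message(model_pointer: str, model_bytes: bytes, c_wm_pointer: str, key: bytes) -> str:
    """Sketch of the <OV || l_M || hash(M) || l_cWM || key> request."""
    return "||".join(["OV", model_pointer, _h(model_bytes), c_wm_pointer, key.hex()])
```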
4 Experiments and Discussions

4.1 Experiment Setup

To illustrate the flexibility of MTLSign, we considered three primary tasks: image classification (IC), sentiment analysis (SA) of discourse, and image semantic segmentation (SS). We adopted five datasets for IC, two datasets for SS, and two datasets for SA. The descriptions of these datasets and the corresponding DNN structures are listed in Table 2. ResNet (He et al. 2016) is a classical model for image processing. For the VirusShare dataset, we compiled a collection of 26,000 malware samples into images and adopted ResNet as the classifier. GloVe (Pennington, Socher, and Manning 2014) is a pre-trained word embedding, while the bidirectional long short-term memory (Bi-LSTM) (Huang, Xu, and Yu 2015) is commonly used in NLP. Cascade Mask R-CNN (CMRCNN) (Cai and Vasconcelos 2018) is a DNN specialized for semantic segmentation.

Table 2: Datasets and their DNN structures.

Dataset | Description | DNN structure
MNIST | IC, 10 classes | ResNet-18
Fashion-MNIST | IC, 10 classes | ResNet-18
CIFAR-10 | IC, 10 classes | ResNet-18
CIFAR-100 | IC, 100 classes | ResNet-18
VirusShare | IC, 10 classes | ResNet-18
IMDb | SA, 2 classes | GloVe + Bi-LSTM
SST | SA, 5 classes | GloVe + Bi-LSTM
Penn-Fudan-Pedestrian | SS, 2 classes | ResNet-50 + CMRCNN
VOC | SS, 20 classes | ResNet-50 + CMRCNN

For the image datasets, c_WM was a two-layer perceptron that took the outputs of the first three layers of the ResNet as its input, and the QR-code encoder was adopted to generate D_WM^key. For the NLP datasets, the network took the structure in Fig. 4.

Figure 4: The network architecture for sentiment analysis. A Bi-LSTM backbone is shared by the training sentences and the watermarking sentences; c_p outputs the sentiment labels, while c_WM outputs the 0/1 watermark bits.

Throughout the experiments, we set N = 600. To set the verification threshold γ in Algorithm 1, we tested the classification accuracy of a randomly initialized c_WM across the nine datasets over 5,000 watermarking datasets. All accuracies fell in [0.425, 0.575]. We selected γ = 0.7, so the probability of a successful piracy is less than 2.69 × 10^-8 with λ = 0.34 in the Chernoff bound, according to Appendix A. D′_primary took 10% of the samples randomly from the training dataset. For the tuning attacks, we considered FP and NP. As for adaptive attacks, we adopted the overwriting attack and the spoil attack (Li, Wang, and Liew 2021).
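For intuition about the scale of the false-ownership probability, the Chernoff bound of Appendix A can be evaluated numerically. The snippet below is only a plausibility check on my part: it assumes the random-guess accuracy p sits at the top of the measured range (0.575), which yields a bound on the order of 10^-8, the same magnitude as the 2.69 × 10^-8 reported above; the exact figure depends on the assumed p and λ.

```python
import math

def chernoff_bound(p: float, gamma: float, lam: float, n: int) -> float:
    """Pr{X >= gamma*N} <= ((1 - p + p*e^lam) / e^(gamma*lam))^N, cf. Appendix A."""
    return ((1.0 - p + p * math.exp(lam)) / math.exp(gamma * lam)) ** n

# Assumed p = 0.575 (upper end of the observed random-guess accuracy), gamma = 0.7, N = 600.
print(chernoff_bound(p=0.575, gamma=0.7, lam=0.34, n=600))  # ~3e-8, i.e. negligible
```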
4.2 Ablation Study

To examine the efficacy of R_func and R_DA, we compared the performance of the watermarked DNN M_WM under different configurations. Three metrics are of interest: (i) the performance of M_WM on T_primary; (ii) the decline of the performance of M_WM on T_primary when NP pushes f_WM's accuracy on T_WM below γ; and (iii) the performance of f_WM on T_WM after FP. The models were trained by minimizing the MTL loss defined by (10), where we adopted fine-tuning and NP and chose the optimal λ_1 and λ_2 by a grid search over [0.02, 0.04, …, 0.2]. The results are collected in Fig. 5.

We observe that R_func preserves the model's performance on the primary task. On the other hand, R_DA makes the watermarking branch robust against FP: its accuracy on T_WM is significantly higher than that of the models trained without R_DA. Meanwhile, with R_DA the performance on the primary task has to decrease much more during NP before the watermark is invalidated, so the adversary has to sacrifice more in order to invalidate the original ownership. Therefore, we suggest that both regularizers be incorporated when watermarking the model.

Figure 5: Ablation study on the efficacy of R_func and R_DA regarding the three metrics. For the watermarked model's performance on SS, the benchmark is mAP; otherwise it is the classification accuracy.

4.3 Comparative Studies and Discussion

For comparison, several SOTA watermarking schemes (Zhu et al. 2020; Li et al. 2019a; Darvish, Chen, and Koushanfar 2019; Fan et al. 2021) that are secure against the ambiguity attack and tuning were considered; they cannot, however, be readily generalized to semantic segmentation and NLP tasks. We generated 600 backdoor/passport/feature-map triggers and assigned them proper labels for each candidate scheme. To compare the levels of covertness, we measured the average deviation of parameters after watermarking. For the functionality-preserving property and the robustness against tuning, we recorded the performance of the watermarked models on the primary task, the verification accuracy of the watermarks after FP, and the relative decline of the performance on the primary task when NP invalidated the watermarks. Finally, we conducted the spoil attack, an improved watermark removal attack (Li, Wang, and Liew 2021), against the watermarked model. The spoil attack can always eliminate the watermark, so, as with NP, the statistic of interest is the relative decrease of the performance on T_primary, which reflects the adversary's expense. We measured these values for all compared schemes on five classification datasets; the results are summarized in Fig. 6, and detailed implementations of the spoil attacks are provided in Appendix B.

Figure 6: Comparison between MTLSign and other SOTA schemes on (a) MNIST, (b) Fashion-MNIST, (c) CIFAR-10, (d) CIFAR-100, and (e) VirusShare. Covertness measures the average deviation of parameters after watermark embedding, scaled in [0, 10^-2]. Functionality-preserving measures the performance of the watermarked DNN, scaled in [0%, 100%]. Robustness against FP measures the accuracy of the watermarking branch after FP, scaled in [0%, 100%]. Robustness against NP/spoil measures the decrease of the accuracy of the primary branch when NP/the spoil attack invalidates the watermark, scaled in [−50%, 0%]/[−10%, 0%].

Our method resulted in only a slight difference in parameters compared with the other candidates, in particular the white-box competitors, so it is harder for an adversary to distinguish a model watermarked by MTLSign from a clean one. Regarding robustness and functionality-preservation, our method uniformly outperformed the other competitors. This is because: (1) MTLSign does not incorporate backdoors into the model, so adversarial modifications such as FP, which are designed to eliminate backdoors, can hardly remove our watermark; and (2) MTLSign relies on an extra module, c_WM, as the verifier, and since an adversary cannot tamper with this module, universal tunings such as NP have less impact. MTLSign can also adapt to new tuning operators by incorporating them into R_DA. Moreover, MTLSign places weak conditions on both the task (e.g., NLP) and the DNN architecture and is therefore more flexible.

Finally, we consider the overwriting attack, in which the adversary embeds its own watermark into the pirated DNN. Although the adversary's ownership declaration can be falsified by the OV protocol, it is necessary that such overwriting does not invalidate the owner's watermark. The decrease of the accuracy of the watermarking branch with the number of overwriting epochs is recorded in Table 3. Since the decrease is uniformly bounded by 5%, overwriting does not pose a threat to MTLSign.

Table 3: Decrease of the accuracy of the watermarking branch against watermark overwriting (in %).

Dataset | 50 epochs | 150 epochs | 250 epochs | 350 epochs
MNIST | 1.0 | 1.5 | 1.5 | 2.0
F-MNIST | 2.0 | 2.5 | 2.5 | 2.5
CIFAR-10 | 4.5 | 4.5 | 4.5 | 4.5
CIFAR-100 | 0.0 | 0.5 | 0.9 | 0.9
VirusShare | 0.0 | 0.5 | 0.5 | 0.5
IMDb | 3.0 | 3.0 | 3.0 | 3.0
SST | 2.5 | 3.0 | 3.0 | 2.5
PF-Pedestrian | 0.5 | 1.0 | 1.0 | 1.0
VOC | 1.3 | 2.0 | 2.1 | 2.1
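Metric (ii) above (the primary-task accuracy sacrificed before NP invalidates the watermark) can be estimated with an evaluation loop like the following sketch; the accuracy callbacks, the pruning granularity, and the use of L1 unstructured pruning are illustrative choices, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn.utils.prune as prune

def primary_drop_when_wm_invalidated(model, primary_acc_fn, wm_acc_fn, gamma=0.7, step=0.05):
    """Prune increasingly until the watermark accuracy falls below gamma; report the
    corresponding change in primary-task accuracy (metric (ii), a negative number)."""
    base_primary = primary_acc_fn(model)
    ratio = step
    while ratio < 1.0:
        pruned = copy.deepcopy(model)
        for module in pruned.modules():
            if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
                prune.l1_unstructured(module, name="weight", amount=ratio)
        if wm_acc_fn(pruned) < gamma:                     # watermark invalidated at this ratio
            return primary_acc_fn(pruned) - base_primary  # adversary's expense on T_primary
        ratio += step
    return 0.0
```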
5 Conclusion

This paper presents MTLSign, an MTL-based DNN watermarking scheme. We examine the basic security requirements for the DNN watermark, especially the unambiguity, and propose to embed the watermark as an additional task. The proposed scheme explicitly meets the security requirements through corresponding regularizers. With a decentralized consensus protocol, MTLSign is secure against adaptive attacks. Like any other white-box DNN watermarking scheme, MTLSign remains vulnerable to functionality-equivalence attacks such as neuron permutation; this is one of the aspects that require further effort to increase the applicability of DNN watermarks.

References

Adi, Y.; Baum, C.; Cisse, M.; Pinkas, B.; and Keshet, J. 2018. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), 1615–1631.

Cai, Z.; and Vasconcelos, N. 2018. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6154–6162.

Castro, M.; Liskov, B.; et al. 1999. Practical Byzantine fault tolerance. In OSDI, volume 99, 173–186.

Darvish, R. B.; Chen, H.; and Koushanfar, F. 2019. DeepSigns: an end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 485–497.

Fan, L.; Ng, K. W.; Chan, C. S.; and Yang, Q. 2021. DeepIP: Deep Neural Network Intellectual Property Protection with Passports. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Ganju, K.; Wang, Q.; Yang, W.; Gunter, C. A.; and Borisov, N. 2018. Property inference attacks on fully connected neural networks using permutation invariant representations. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 619–633.

Guan, X.; Feng, H.; Zhang, W.; Zhou, H.; Zhang, J.; and Yu, N. 2020. Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication. In Proceedings of the 28th ACM International Conference on Multimedia, 2273–2280.

He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

Huang, Z.; Xu, W.; and Yu, K. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

Le Merrer, E.; Perez, P.; and Trédan, G. 2020. Adversarial frontier stitching for remote neural network watermarking. Neural Computing and Applications, 32(13): 9233–9244.

Li, F.; Wang, S.; and Liew, A. W.-C. 2021. Regulating Ownership Verification for Deep Neural Networks: Scenarios, Protocols, and Prospects. IJCAI Workshop.

Li, F.-Q.; and Wang, S.-L. 2021. Persistent Watermark For Image Classification Neural Networks By Penetrating The Autoencoder. In 2021 IEEE International Conference on Image Processing (ICIP), 3063–3067.

Li, H.; Willson, E.; Zheng, H.; and Zhao, B. Y. 2019a. Persistent and unforgeable watermarks for deep neural networks. arXiv preprint arXiv:1910.01226.

Li, Y.; Koren, N.; Lyu, L.; Lyu, X.; Li, B.; and Ma, X. 2021. Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks. arXiv preprint arXiv:2101.05930.

Li, Z.; Hu, C.; Zhang, Y.; and Guo, S. 2019b. How to prove your model belongs to you: a blind-watermark based framework to protect intellectual property of DNN. In Proceedings of the 35th Annual Computer Security Applications Conference, 126–137.

Liu, H.; Weng, Z.; and Zhu, Y. 2021. Watermarking Deep Neural Networks with Greedy Residuals. In International Conference on Machine Learning, 6978–6988. PMLR.

Liu, K.; Dolan-Gavitt, B.; and Garg, S. 2018. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks, Intrusions, and Defenses, 273–294. Springer.

Liu, X.; Li, F.; Wen, B.; and Li, Q. 2020. Removing Backdoor-Based Watermarks in Neural Networks with Limited Data. arXiv preprint arXiv:2008.00407.

Namba, R.; and Sakuma, J. 2019. Robust watermarking of neural network with exponential weighting. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, 228–240.

Ong, D. S.; Chan, C. S.; Ng, K. W.; Fan, L.; and Yang, Q. 2021. Protecting Intellectual Property of Generative Adversarial Networks From Ambiguity Attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3630–3639.

Ongaro, D.; and Ousterhout, J. 2014. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14), 305–319.

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.

Sener, O.; and Koltun, V. 2018. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems, 527–538.

Shorten, C.; and Khoshgoftaar, T. M. 2019. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1): 60–107.

Uchida, Y.; Nagai, Y.; Sakazawa, S.; and Satoh, S. 2017. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 269–277.

Zhang, J.; Gu, Z.; Jang, J.; Wu, H.; Stoecklin, M. P.; Huang, H.; and Molloy, I. 2018. Protecting intellectual property of deep neural networks with watermarking. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 159–172.

Zhu, R.; Zhang, X.; Shi, M.; and Tang, Z. 2020. Secure neural network watermarking protocol against forging attack. EURASIP Journal on Image and Video Processing, 2020(1): 1–12.
Appendix A: Derivation for the unambiguity condition

To formulate this intuition, consider the event in which D_WM^{key′} shares q·N terms with D_WM^key, q ∈ (0, 1). With a pseudo-random generator, it is computationally impossible to distinguish π^key from a sequence of N randomly selected integers; the same argument holds for l^key and a random binary string of length N. Therefore the probability of this event can be upper bounded by:

C(N, qN) · r^{qN} · (1 − r)^{(1−q)N} ≤ [ (1 + (1−q)N) · r / (1 − r) ]^{qN},

where C(N, qN) is the binomial coefficient and r = 1 / 2^{m+1}. For an arbitrary q, let r < 1 / (2 + (1−q)N); then the probability that D_WM^{key′} overlaps with D_WM^key in a portion of q declines exponentially.

For numbers that do not appear in π^key, the watermarking branch is expected to output a random guess. Therefore, if q is smaller than a threshold τ, then D_WM^{key′} can hardly pass the statistical test in Algorithm 1 for N big enough. Hence, letting m ≥ log2[ 2N(2 + (1−τ)N) ] and N be large enough makes an effective collision in the watermark dataset almost impossible. For simplicity, setting m = 3·⌈log2(N)⌉ ≥ log2(N^3) is sufficient.

To select the threshold γ, assume that the random-guess strategy achieves an average accuracy of at most p = 0.5 + α(N), where α is a negligible function. The verification process returns 1 iff the watermark classifier achieves a binary classification accuracy no less than γ. The demand for security is that, by randomly guessing, the probability that an adversary passes the test declines exponentially with N. Let X denote the number of correct guesses with average accuracy p; an adversary succeeds only if X ≥ γ·N. By the Chernoff theorem:

Pr{ X ≥ γ·N } ≤ [ (1 − p + p·e^λ) / e^{γ·λ} ]^N,

where λ is an arbitrary nonnegative number. If γ is larger than p by a constant independent of N, then (1 − p + p·e^λ)/e^{γ·λ} is less than unity for a proper λ, reducing the probability of a successful attack to negligibility.

Appendix B: Implementation of the spoil attacks

During the spoil attack, the adversary has full knowledge of key and verify and has obtained M_WM. The adversary's objective is to tune M_WM into M_spoiled in order to escape IP regulation, which means the following condition holds with a large probability:

verify(M_spoiled, key) = 0.

For the backdoor-based watermarking schemes, key is uniquely correlated with a collection of labelled triggers {t_n, y_n}_{n=1}^N. The spoil attack is tantamount to fitting the watermarked model on the same triggers with adversarially shuffled labels.

For the weight-based watermarking schemes, key reveals the places where the information is hidden, so the adversary only has to replace these parameters (usually a small part of the entire model) with random values.

For hybrid white-box watermarking schemes with a complex verify module, such as MTLSign, the adversary has to tune the watermarking branch to fit shuffled labels with the backend fixed. The loss function to be minimized can be written as:

L(W_backbone) = Σ_{(x,y)∈D_WM^key} l_WM( y′, c_WM( M(x | W_backbone) ) ),

in which y′ is a randomly assigned label independent of y. This attack usually results in a large-scale shift of the parameters within the backbone DNN. If the adversary cannot properly fine-tune the model afterward (which is always the case in practice, since otherwise the adversary would already have acquired enough data and could train its own DNN from scratch), then the DNN's SOTA performance is at risk, as demonstrated in the empirical studies.
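A hypothetical sketch of this hybrid-scheme spoil attack is given below. The backbone/wm_branch accessors, the assumption that the adversary rebuilds D_WM^key from the known key, and the optimizer settings are illustrative; only the structure of the loss L(W_backbone) follows the description above.

```python
import random
import torch

def spoil_attack(model, wm_dataset, epochs=10, lr=1e-3):
    """Sketch: with c_WM frozen, tune only the backbone so that the watermarking
    branch fits adversarially shuffled labels y' (independent of the true bits y)."""
    shuffled = [(x, random.randint(0, 1)) for x, _ in wm_dataset]  # y' independent of y
    opt = torch.optim.SGD(model.backbone.parameters(), lr=lr)      # only W_backbone is updated
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y_prime in shuffled:
            opt.zero_grad()
            logits = model.wm_branch(x)                  # c_WM(M(x | W_backbone)), c_WM fixed
            loss_fn(logits, torch.tensor([y_prime])).backward()
            opt.step()
    return model  # the spoiled model; its primary-task performance typically degrades
```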