<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Interpretable and Robust Face Verification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Preetam</forename><surname>Prabhu</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Srikar</forename><surname>Dammu</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">TCS Research</orgName>
								<orgName type="institution" key="instit2">Tata Consultancy Services Ltd</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Srinivasa</forename><surname>Rao Chalamala</surname></persName>
							<email>srinivas.chalamala@research.iiit.ac.in</email>
							<affiliation key="aff1">
								<orgName type="institution">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ajeet</forename><forename type="middle">Kumar</forename><surname>Singh</surname></persName>
							<email>ajeetk.singh1@tcs.com</email>
							<affiliation key="aff0">
								<orgName type="institution" key="instit1">TCS Research</orgName>
								<orgName type="institution" key="instit2">Tata Consultancy Services Ltd</orgName>
								<address>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yegnanarayana</forename><surname>Bayya</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">International Institute of Information Technology</orgName>
								<address>
									<settlement>Hyderabad</settlement>
									<country key="IN">India</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Interpretable and Robust Face Verification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">E5FF04431867D97C96D0726EDFA12219</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Face Verification</term>
					<term>Interpretability</term>
					<term>Adversarial Robustness</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Advances in deep learning have been instrumental in enhancing the performance of face verification systems. Despite their ability to attain high accuracy, most of these systems fail to provide interpretations of their decisions. With the increased demand for making deep learning models more interpretable, numerous post-hoc methods have been proposed to probe the workings of these systems. Yet, the quest for face verification systems that inherently provide interpretations remains largely unexplored. Additionally, most existing face recognition models are highly susceptible to adversarial attacks. In this work, we propose a face verification system which addresses the issue of interpretability by employing modular neural networks. In it, representations for individual facial parts, such as the nose, mouth and eyes, are learned separately. We also show that our method is significantly more resistant to adversarial attacks, thereby addressing another crucial weakness of deep learning models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over the last decade, many deep learning methods for face verification have been proposed, and a few of them have even surpassed human performance <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>. These deep learning methods, while enabling exceptional performance, do not provide reasoning for their predictions. Blindly relying on the results of these black boxes without interpreting the reasons for their decisions could be detrimental, especially in critical applications in the medical, financial, and security domains.</p><p>In the context of image recognition, various methods have been proposed to tackle interpretability by attempting to reason why an object has been recognized in a particular way. LRP <ref type="bibr" target="#b4">[5]</ref>, Grad-CAM <ref type="bibr" target="#b5">[6]</ref>, and LIME <ref type="bibr" target="#b6">[7]</ref> have been widely used to highlight the regions of the image that the models look at when arriving at the final prediction. Despite the existence of several post-hoc interpretability methods, it is desirable to have a system that is inherently capable of producing interpretations of its decisions. When the latent features generated by the system represent a logical part of an object, it is convenient to infer the contributions of these features to the final prediction.</p><p>Though most interpretability methods produce heatmaps highlighting the regions that contribute to the decision process of the models, in some applications it is still difficult to understand these heatmaps as they are generated at the pixel level. If these heatmaps could instead highlight logical visual concepts in the images, they would be more convenient to interpret (please refer to Figure <ref type="figure" target="#fig_6">7</ref> and Section 5.2).</p><p>Another significant drawback of deep learning models is their susceptibility to adversarial attacks. Seemingly insignificant noise that is imperceptible to the human eye can fool deep learning models. Numerous black-box and white-box adversarial attack methods have been proposed in the literature <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>The problem of detecting and defending against adversarial attacks on deep learning models is still largely unsolved. As these attacks on face verification systems pose a serious security threat, it is imperative to develop trustworthy systems. Our motivation behind this work is to integrate both robustness to attacks and interpretability into face verification systems.</p><p>Hence, in this work, we propose a face verification system that addresses the aforementioned issues by learning independent latent representations of high-level facial features. The proposed method generates intuitive and easily understood heatmaps on the fly, and is also shown to be much more robust against adversarial examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Face recognition is a non-invasive biometric authentication mechanism and has been in commercial use for several years. It has become one of the preferred choices of authentication for mobile device users as it is easy to use and avoids the need to remember passwords. Though people have some reservations about using face recognition in large-scale systems due to privacy issues, it continues to be one of the most widely used technologies for identification.</p><p>Deep learning based face recognition has surpassed hand-crafted feature-based systems and shallow learning systems in performance. In <ref type="bibr" target="#b1">[2]</ref>, the authors proposed a deep learning architecture called VGGFace for generating facial feature representations or face embeddings. These face embeddings can be further used for identifying the person using a similarity measure or a classifier. DeepID2 <ref type="bibr" target="#b10">[11]</ref> uses a Bayesian learning framework for learning metrics for face recognition. In FaceNet <ref type="bibr" target="#b11">[12]</ref>, the authors proposed a compact embedding learned directly from images using triplet loss for face verification. Different loss functions that maximize intra-class similarity and improve discriminability for faces have been proposed, such as ArcFace <ref type="bibr" target="#b13">[13]</ref>, CosFace <ref type="bibr" target="#b14">[14]</ref>, SphereFace <ref type="bibr" target="#b15">[15]</ref>, and CoCo Loss <ref type="bibr" target="#b16">[16]</ref>.</p><p>Existing face recognition models are extremely vulnerable to adversarial attacks even in the black-box setting, which raises security concerns and the need for developing more robust face recognition models. 
Adversarial attacks <ref type="bibr" target="#b17">[17,</ref><ref type="bibr" target="#b18">18,</ref><ref type="bibr" target="#b19">19]</ref> involve adding small, imperceptible and carefully crafted perturbations to the input with the aim of fooling machine learning models. Adversarial attacks allow an attacker to evade detection or recognition, or to impersonate another person. <ref type="bibr" target="#b20">[20]</ref> described a method to realize adversarial attacks by introducing a pair of eye glasses. These glasses could be used to evade detection or to impersonate others. Another approach for fooling ArcFace using adversarial patches has been proposed in <ref type="bibr" target="#b21">[21]</ref>. In <ref type="bibr" target="#b22">[22]</ref>, the authors have proposed an approach for detecting adversarial attacks on faces.</p><p>Understanding and interpreting the decisions of machine learning systems is of high importance in many applications, as it allows verifying the reasoning of the system and provides information to the human expert or end user. Early works include direct visualization of the filters <ref type="bibr" target="#b23">[23]</ref> and deconvolutional networks that reconstruct inputs from different layers <ref type="bibr" target="#b24">[24]</ref>.</p><p>Numerous interpretability methods have been proposed in the literature; some of the widely known ones are Layer-wise Relevance Propagation (LRP) <ref type="bibr" target="#b4">[5]</ref>, Gradient-weighted Class Activation Mapping (Grad-CAM) <ref type="bibr" target="#b25">[25]</ref>, Grad-CAM++ <ref type="bibr" target="#b26">[26]</ref>, SHapley Additive exPlanations (SHAP) values <ref type="bibr" target="#b27">[27]</ref> and Local Interpretable Model-Agnostic Explanations (LIME) <ref type="bibr" target="#b6">[7]</ref>. Most of these techniques attempt to provide pixel-level explanations that indicate the contribution of each pixel to the classification decision. 
However, these methods are mostly suitable for tasks such as object recognition where the deep learning model takes only a single input image.</p><p>Recently, a few methods that attempt to explain the behavior and decisions of face recognition systems have emerged <ref type="bibr" target="#b28">[28,</ref><ref type="bibr" target="#b29">29,</ref><ref type="bibr" target="#b30">30,</ref><ref type="bibr" target="#b31">31,</ref><ref type="bibr" target="#b32">32]</ref>. In <ref type="bibr" target="#b28">[28]</ref>, the authors rely on controlled degradations using inpainting to generate explanations. In <ref type="bibr" target="#b29">[29]</ref>, visual psychophysics was used to probe and study the behavior of face recognition systems. In <ref type="bibr" target="#b30">[30]</ref>, the authors propose a loss function that introduces interpretability to the face verification model through training. In <ref type="bibr" target="#b31">[31]</ref>, the authors use 3D modeling to visualize and understand how the model represents the information in face images. Fooling techniques <ref type="bibr" target="#b32">[32]</ref> have also been used for gaining insights into the facial regions that contribute most to the decision.</p><p>The recently developed explainability methods for face recognition differ considerably from one another in their approach and form of explanations, unlike saliency methods for object recognition, which generate similar forms of explanations. Each of these methods has its own pros and cons and is suitable for different purposes. We believe our method has certain characteristics that are well suited for real-world applications: easily interpretable feature-level explanations, on-the-fly explanations for every prediction, a structurally interpretable model architecture, real-time feedback and, most importantly, robustness towards adversarial attacks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Interpretable and Robust Face Verification System</head><p>Modular neural networks (MNN) <ref type="bibr" target="#b33">[33]</ref> are a class of composite neural networks inspired by the biological modularity of the human brain. MNNs are composed of independent neural networks that serve as modules, each specializing in a specific task. MNNs are inherently more interpretable than monolithic neural networks due to their divide-and-conquer methodology, and their modular structure intrinsically introduces structural interpretability. Studies have shown that MNNs are better at handling noise than monolithic networks <ref type="bibr" target="#b33">[33]</ref>. Several defense mechanisms against adversarial attacks have been proposed in the literature, some of which employ deep generative models <ref type="bibr" target="#b34">[34,</ref><ref type="bibr" target="#b35">35]</ref>. One of the main motivations for using generative models is their capability of representing information in a lower-dimensional latent space, retaining only the most salient features <ref type="bibr" target="#b36">[36]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Model Composition Overview</head><p>In the proposed MNN architecture, we allocate dedicated modules for the eyes, nose and mouth, and one for the rest of the features. We employ autoencoders to learn separate and distinct latent representations for the different facial features.</p><p>To achieve this, we mask the input image to retain only the region of interest of that specific module and present it as the target image (See Fig. <ref type="figure" target="#fig_0">1</ref>). After the autoencoders have been trained, we retain the encoder and substitute the decoder with Siamese networks in all of the modules, resulting in Modular Siamese Networks (MSN) (See Fig. <ref type="figure" target="#fig_1">2</ref>).</p><p>In the task of face verification, a pair of images is given as input, which could be either a valid pair or an impostor pair. In the proposed MSN architecture, disentangled embeddings of facial features are generated for both input images by the feature-extracting encoders present in each feature-specific module. These feature embedding pairs are then fed to the Siamese networks present in each module, which compute 𝐿1 distance vectors for each pair of twin feature embeddings, similar to the method followed in <ref type="bibr" target="#b37">[37]</ref>. The distance vectors from all of the modules are then concatenated and fed to a common decision network which makes the final prediction.</p></div>
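The pair-scoring pipeline above (encode each image per module, take per-module L1 distance vectors, concatenate, and score with the decision network) can be sketched as follows. This is an illustrative NumPy mock-up, not the paper's implementation: the random projections standing in for trained encoders, the 64-d input, the 32-d latents, and the single-layer decision network are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4 modules (eyes, nose, mouth, rest),
# each encoder producing a 32-d latent vector from a 64-d input.
MODULES = ["eyes", "nose", "mouth", "rest"]
LATENT_DIM = 32

# Stand-in frozen encoders: one random linear projection per module.
encoders = {m: rng.standard_normal((LATENT_DIM, 64)) for m in MODULES}

def encode(image_vec):
    """Disentangle an image (flattened to 64-d here) into per-module latents."""
    return {m: W @ image_vec for m, W in encoders.items()}

def msn_forward(img_a, img_b, decision_w, decision_b):
    """Per-module L1 distance vectors -> concatenate -> decision network score."""
    za, zb = encode(img_a), encode(img_b)
    d = np.concatenate([np.abs(za[m] - zb[m]) for m in MODULES])
    logit = decision_w @ d + decision_b      # decision network (sketch: one layer)
    return 1.0 / (1.0 + np.exp(-logit))      # probability of "same identity"

img1, img2 = rng.standard_normal(64), rng.standard_normal(64)
w = rng.standard_normal(LATENT_DIM * len(MODULES)) * 0.01
p = msn_forward(img1, img2, w, 0.0)
```

Because the modules are independent, the four `encode` projections could be trained or evaluated in parallel, which is the property the later training section relies on.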
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Feature-extracting Autoencoders</head><p>In this work, we employ undercomplete autoencoders <ref type="bibr" target="#b36">[36]</ref>, a type of autoencoder whose latent dimension is lower than the input dimension. Undercomplete autoencoders are trained to reconstruct the original image as accurately as possible while constricting the latent space to a sufficiently small dimension, ensuring that only the most salient features are retained in the encoded latent vectors. To extract feature-specific latent vectors, we use a novel technique: instead of giving the full image as the target, we mask the input image to retain only the part containing the feature of interest and present it as the target image. Consequently, the autoencoder learns a latent representation containing the important information about the feature and restores only the required part of the image (See Fig. <ref type="figure" target="#fig_0">1</ref>, examples in 3.2).</p></div>
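The masked-target construction can be illustrated with a small sketch. It assumes the region of interest is available as a bounding box (e.g. from a landmark detector — the paper does not specify how the regions are obtained), and the toy 8×8 "face" and the box coordinates are made up for illustration.

```python
import numpy as np

def masked_target(image, box):
    """Zero out everything except the region of interest.

    `box` = (top, bottom, left, right) bounds of the facial feature; the
    result is the target image the autoencoder is trained to reconstruct.
    """
    t, b, l, r = box
    target = np.zeros_like(image)
    target[t:b, l:r] = image[t:b, l:r]
    return target

face = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "face"
eyes_box = (1, 3, 1, 7)                           # hypothetical eye region
target = masked_target(face, eyes_box)
```

Training the autoencoder on `(face, target)` pairs rather than `(face, face)` pairs is what forces the bottleneck latent vector to carry only that feature's information.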
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Siamese Networks</head><p>Siamese networks have achieved great results in image verification <ref type="bibr" target="#b37">[37,</ref><ref type="bibr" target="#b38">38]</ref>. The two Siamese twin networks share the same weights and parameters. The hypothesis behind this architecture is that if the inputs 𝑥1 and 𝑥2 are similar, then the distance between the output vectors ℎ1 and ℎ2 will be small. The network is trained to maximize the distance between mismatched pairs and minimize the distance between matched pairs. Loss functions like contrastive loss <ref type="bibr" target="#b39">[39]</ref> and triplet loss <ref type="bibr" target="#b40">[40]</ref> can be used to achieve this, and a few improved versions of these loss functions have also been proposed in the literature <ref type="bibr" target="#b41">[41,</ref><ref type="bibr" target="#b42">42]</ref>.</p><p>In our model, we employ Siamese networks to discriminate between the feature-specific latent vectors of impostor and valid pairs. The latent vectors 𝑥1 and 𝑥2 are obtained from the feature-extracting autoencoders described in 3.1.2. L1 distance vectors are computed from the output vectors ℎ1 and ℎ2 obtained from the Siamese twins of each module. The distance vectors of all of the modules are then concatenated and given as input to the decision network (See Fig. <ref type="figure" target="#fig_1">2</ref>).</p></div>
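As a concrete reference, the contrastive loss [39] mentioned above can be written in its standard textbook form; the `margin` value below is an illustrative hyperparameter, not one reported in the paper.

```python
import numpy as np

def contrastive_loss(h1, h2, same, margin=1.0):
    """Contrastive loss on Siamese twin outputs: pull matched pairs
    together, push mismatched pairs at least `margin` apart."""
    d = np.linalg.norm(h1 - h2)
    if same:
        return 0.5 * d**2                      # matched: penalize any distance
    return 0.5 * max(0.0, margin - d)**2       # mismatched: penalize closeness

# Twin outputs at distance 5: costly for a matched pair, free for a mismatch.
loss_same = contrastive_loss(np.array([0.0, 0.0]), np.array([3.0, 4.0]), same=True)
loss_diff = contrastive_loss(np.array([0.0, 0.0]), np.array([3.0, 4.0]), same=False)
```

A mismatched pair closer than the margin (e.g. distance 0.5 with `margin=1.0`) would incur a nonzero penalty, which is exactly the "push apart" behavior described above.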
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.4.">Decision Network</head><p>The decision network is a feed-forward fully connected network that takes the concatenated input from all of the modules. This network enables us to incorporate information from all of the modules to predict the final decision.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.5.">Model Architectural Details</head><p>The model architecture and training setting described in <ref type="bibr" target="#b43">[43]</ref> were used for training the feature-extracting autoencoders. The Siamese networks consist of four fully connected layers with ELU activation functions. The final decision network, which takes the concatenated distance vectors from the modules, has two fully connected layers with ReLU activation functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Training details</head><p>The training of the proposed MSN is carried out in three phases. In the first phase, the feature-extracting autoencoders are trained with perceptual loss <ref type="bibr" target="#b43">[43]</ref>. In the next phase, the decoder in each of the modules is replaced with the Siamese network and trained using the triplet loss, with the layers trained in the previous phase frozen. Finally, the decision network is trained using Binary Cross-Entropy (BCE). The Adam optimization technique <ref type="bibr" target="#b44">[44]</ref> was used for training the network in all three phases.</p><p>From Fig. <ref type="figure" target="#fig_2">3</ref>, 4, 5 and 6, we observe that the feature-extracting autoencoders are able to generate high-quality reconstructions of the intended facial feature. Once training is complete, the autoencoders take unmasked full images as input and reconstruct only the required facial region by incorporating the relevant information of that facial feature into the latent feature vector.</p><p>The subnetworks can be trained in parallel as they are independent of each other. Once the training is complete, we obtain a complete end-to-end face verification system.</p></div>
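The second training phase uses the triplet loss [40]; its standard form is sketched below. The margin value is illustrative — the paper does not state its hyperparameters — and squared Euclidean distance is assumed.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: the anchor embedding should be closer to the positive
    (same identity) than to the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

a = np.zeros(2)
pos = np.zeros(2)            # same identity, identical embedding
neg = np.array([1.0, 0.0])   # different identity, distance 1 away
easy = triplet_loss(a, pos, neg)     # negative already far enough -> no loss
hard = triplet_loss(a, pos, pos)     # negative as close as positive -> margin
```

Because the phase-one encoder layers are frozen here, only the Siamese layers receive gradients from this loss.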
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Interpretability in Modular Siamese Networks</head><p>The proposed system inherently generates feature-level heatmaps that are intuitive and easily interpreted, as humans naturally observe the similarity of high-level visual concepts instead of pixels. Each subnetwork of the MSN generates a distance measure that reflects the visual similarity of the features. This is achieved by computing the Euclidean distance between the twin output vectors produced by the Siamese networks of each module representing a certain feature. Using these distance measures, a pairwise heatmap incorporating the similarity or dissimilarity of the features is generated and overlaid on both of the images. As can be seen in Fig. <ref type="figure" target="#fig_6">7</ref>, the proposed system is able to effectively localize the similarities and dissimilarities of features in a pair of images. These heatmaps could be used as a tool for understanding the decisions taken by the verification system (Refer to Section 5.2).</p></div>
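The heatmap construction can be sketched as follows, assuming each module's image region is available as a bounding box. The region coordinates, the normalizing constant `d_max`, and the use of a flat fill per region are all assumptions for illustration; the paper does not specify its exact colour-mapping pipeline.

```python
import numpy as np

def feature_heatmap(shape, regions, distances, d_max=1.0):
    """Build a per-pixel heatmap from per-module Euclidean distances.

    `regions` maps module name -> (top, bottom, left, right); each region is
    filled with its module's normalized distance (0 = similar, 1 = dissimilar).
    """
    heat = np.zeros(shape)
    for name, (t, b, l, r) in regions.items():
        heat[t:b, l:r] = min(distances[name] / d_max, 1.0)
    return heat

# Hypothetical regions on a toy 8x8 image and made-up module distances.
regions = {"eyes": (1, 3, 1, 7), "nose": (3, 5, 3, 5), "mouth": (5, 7, 2, 6)}
distances = {"eyes": 0.1, "nose": 0.9, "mouth": 0.4}
heat = feature_heatmap((8, 8), regions, distances)
```

The resulting array can then be mapped through any diverging colormap and alpha-blended over both input images, which is what "overlaid on both of the images" amounts to.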
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Experimental Results</head><p>The face verification system was trained on the VGGFace2 dataset <ref type="bibr" target="#b46">[46]</ref> and evaluated on the Labeled Faces in the Wild (LFW) dataset <ref type="bibr" target="#b47">[47]</ref>. For reporting performance, we use 10-fold cross validation using the splits defined by the LFW protocol, which serves as a benchmark for comparison <ref type="bibr" target="#b47">[47]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Verification</head><p>The accuracies of the individual modules and the proposed MSN model are presented in Table <ref type="table" target="#tab_0">1</ref>. The accuracies for the individual modules were calculated by finding the optimum distance threshold that maximizes accuracy.</p><p>We observe that the eyes module outperforms the other modules, indicating that it could be the most discriminating feature. The accuracy of MSN is 98.5%, which is comparable to the SOTA accuracies reported in the literature, which are greater than 99%.</p></div>
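Finding the optimum distance threshold for a module amounts to a simple sweep over candidate thresholds; a minimal sketch with made-up distance scores (label 1 = same identity) follows.

```python
import numpy as np

def best_threshold(distances, labels):
    """Sweep candidate thresholds; return the one maximizing verification
    accuracy, predicting 'same identity' when distance < threshold."""
    best_t, best_acc = 0.0, 0.0
    for t in np.unique(distances):
        acc = np.mean((distances < t) == (labels == 1))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Toy scores: genuine pairs cluster low, impostors high, with one overlap.
d = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 0.25])
y = np.array([1,   1,   1,   0,   0,   0])
t, acc = best_threshold(d, y)
```

With overlapping score distributions, as here, no threshold reaches 100% accuracy, mirroring why the individual modules top out well below the full MSN.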
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Feature-level Heatmaps</head><p>Feature-level heatmaps are intuitive and easily interpretable because humans, unlike computers, look at features as a whole and not at pixels individually. The pairwise heatmaps that are inherently generated by the proposed method incorporate relative information, taking both of the input images into consideration. The feature-wise Euclidean distances computed by the individual modules of the MSN are used to generate the heatmaps. As can be seen in Figure <ref type="figure">7</ref>, features that look visually similar are colored blue, and dissimilar features are colored red, in all of the images. For true positives, the heatmaps indicate high similarity for features that are visually close, as expected. The system shows high dissimilarity between the nose regions of the first impostor pair in Fig. 7(b), which is in line with human perception as their shapes are significantly different. Studying when the system fails could be helpful, since these visual cues may help rectify the workings of the system. In the first pair of Fig. 7(c), we observe that both persons wearing eye glasses caused the eyes module to assign a low distance score, which, when accompanied by another similar-looking feature, resulted in misclassification. The heatmap of the second pair of Fig. 7(c) demonstrates how spectacles and similar-looking facial hair fooled the system. The heatmaps in Fig. 7(d) illustrate how closed eyes and a significant difference in pose can affect verification. In the first pair, the same person closing their eyes in one of the images made the eyes module compute a high distance score. In the second, a significantly different pose, which resulted in only partial visibility of facial features in one of the images, led the system to predict a high dissimilarity score.</p><p>Since these feature-level computations are carried out live, the system can instantly generate meaningful messages that help the user correct any issues in case of a failure, such as removing eye glasses, adjusting their pose, or improving the lighting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Performance under adversarial attacks</head><p>We tested the robustness of the proposed method against widely known adversarial attacks such as the Fast Gradient Sign Method (FGSM) <ref type="bibr" target="#b7">[8]</ref>, DeepFool <ref type="bibr" target="#b48">[48]</ref> and FGSM in fast adversarial training (FFGSM) <ref type="bibr" target="#b49">[49]</ref>.</p><p>Assuming the first image in the two image pairs to be the test image and the other one to be the anchor image, we attack only the test image, similar to the experiments conducted in the studies <ref type="bibr" target="#b50">[50,</ref><ref type="bibr" target="#b51">51]</ref>.</p><p>The proposed method has shown significantly higher robustness than FaceNet against all three adversarial attacks. For FGSM, the accuracy of FaceNet falls below 20% when 𝜖 is 0.05, while MSN is still close to 60% accurate (See Figure <ref type="figure">8</ref>). In the case of the DeepFool attack, we notice a sharp drop in accuracy to below 10% at step 2 for FaceNet, while MSN shows far more resilience, remaining more than 70% accurate. Similarly, for FFGSM, the accuracy of FaceNet drops to just above 30% while MSN still has an accuracy above 60% when 𝜖 equals 0.03. Under all of these attacks, we notice that the individual modules are noticeably more resistant. Since MSN makes the final prediction based on these functionally independent modules, it consequently inherits its robustness from them.</p><p>The enhanced robustness could be attributed to the fault-tolerant nature of MNNs <ref type="bibr" target="#b52">[52,</ref><ref type="bibr" target="#b33">33]</ref>. Additionally, the encoders used for extracting feature-specific latent representations are trained to retain only the most salient features because of the bottleneck latent layer and, as a result, they may be able to provide some immunity against noise or perturbations.</p></div>
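FGSM [8] perturbs the input by a step of size 𝜖 in the direction of the sign of the loss gradient. A minimal sketch on a toy logistic model follows: the weights and inputs are made up, and the gradient is computed analytically instead of by autodiff; the actual experiments attack full face verification networks.

```python
import numpy as np

# Toy differentiable model: logistic regression with fixed (made-up) weights.
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm(x, y, eps):
    """x_adv = x + eps * sign(dL/dx) for the BCE loss L. For logistic
    regression dL/dx = (p - y) * w, so no autodiff is needed here."""
    grad = (predict(x) - y) * w
    return x + eps * np.sign(grad)

x = np.array([0.5, -0.5, 1.0])
y = 1.0                         # true label
x_adv = fgsm(x, y, eps=0.1)     # perturbation increases the loss
```

By construction the attack lowers the model's confidence in the true label, which is the effect the accuracy-vs-𝜖 curves in Figures 8–10 measure at network scale.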
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Work</head><p>Numerous face verification methods have been proposed in the literature, most of which focus solely on improving performance. Consequently, super-human accuracy has already been achieved in face verification. The real need for improvement in this domain is in the areas of robustness, explainability and fairness. The most important attribute of the proposed method is that it is both robust to adversarial attacks and inherently interpretable. To the best of our knowledge, there is no other published method for face verification that provides both of these qualities at the same time. We believe that pursuing this direction is essential for developing more trustworthy systems.</p><p>Having interpretations of predictions or decisions while they are being taken by deep learning models could prove to be paramount in many applications. While post-hoc interpretations might help in understanding the behavior of the model, they may not be of much help in generating real-time explanations. Incorporating interpretability into the system itself could allow us to handle human errors by enabling communication with the user, informing them of what went wrong and suggesting rectifications.</p><p>In this paper, we have presented a new technique to learn latent representations of high-level facial features. We proposed a modular face verification system that inherently generates interpretations of its decisions with the help of the learned feature-specific latent representations. The need for and importance of having such readily interpretable systems were discussed. 
Further, we have demonstrated that the proposed system has a higher resistance to adversarial examples.</p><p>In summary, we have introduced and validated a face verification system that provides on-the-fly, easily interpretable feature-level explanations, has a structurally interpretable model architecture, is able to provide feedback in real time, and has increased robustness towards adversarial attacks.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed feature specific latent representations encoding. Images are encoded to feature specific latent representations using feature extracting autoencoders. Reconstructions and corresponding target images are displayed on the right.</figDesc><graphic coords="3,150.77,84.19,291.69,171.03" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Proposed Modular Siamese Network. Image is initially disentangled by feature-specific encoders to obtain featurewise embedding pairs, then these embedding pairs are fed to Siamese networks which will compute the distance vectors. All of the distance vectors are then concatenated and fed to the decision network for final verification decision.</figDesc><graphic coords="4,109.11,84.19,375.02,151.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Reconstruction of eyes. (a) input image, (b) masked target image, (c) reconstructed image</figDesc><graphic coords="4,330.36,291.76,145.85,106.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Reconstruction of nose. (a) input image, (b) masked target image, (c) reconstructed image</figDesc><graphic coords="4,330.36,439.28,145.85,106.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Reconstruction of mouth. (a) input image, (b) masked target image, (c) reconstructed image</figDesc><graphic coords="5,117.03,84.19,145.85,106.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Reconstruction of remaining facial region. (a) is the input image, (b) is the masked target image, (c) is the reconstructed image</figDesc><graphic coords="5,117.03,231.71,145.85,106.29" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Demonstration of facial feature explanations: Each facial factor and its relevance to face verification. Green indicates similarity while red indicates dissimilarity. (a) True Positives (b) True Negatives (c) False Positives (d) False Negatives (e) Color map indicating dissimilarity. Best viewed in color. (Refer Section. 5.2)</figDesc><graphic coords="6,89.29,84.19,208.34,172.61" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head></head><label></label><figDesc>For comparison, we have considered the well-known FaceNet model, which has reported SOTA performance earlier. The results have been plotted in Figures 8,</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head></head><label></label><figDesc>9 and 10.   </figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 8 :Figure 9 :Figure 10 :</head><label>8910</label><figDesc>Figure 8: Robustness of proposed approach against FGSM Attack. (IFV: Interpretable and Robust Face Verification system (proposed method))</figDesc><graphic coords="6,319.94,84.19,166.67,120.44" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Verification accuracy of individual feature modules and the complete Modular Siamese Network</figDesc><table><row><cell>1.</cell><cell>Module 1 - Eyes</cell><cell>80.8%</cell></row><row><cell>2.</cell><cell>Module 2 - Nose</cell><cell>73.2%</cell></row><row><cell>3.</cell><cell>Module 3 - Mouth</cell><cell>74.5%</cell></row><row><cell>4.</cell><cell>Module 4 - Rest</cell><cell>78.3%</cell></row><row><cell>5.</cell><cell>Modular Siamese Network</cell><cell>98.5%</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Attribute and simile classifiers for face verification</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Belhumeur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Nayar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 12th international conference on computer vision</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="365" to="372" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Deep Face Recognition</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">M</forename><surname>Parkhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="DOI">10.5244/C.29.41</idno>
		<ptr target="http://www.bmva.org/bmvc/2015/papers/paper041/index.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the British Machine Vision Conference 2015</title>
				<meeting>the British Machine Vision Conference 2015</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
	<note>British Machine Vision Association</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">FaceNet: A unified embedding for face recognition and clustering</title>
		<author>
			<persName><forename type="first">F</forename><surname>Schroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kalenichenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Philbin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="815" to="823" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Additive margin softmax for face verification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Processing Letters</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="926" to="930" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On pixel-wise explanations for non-linear classifier decisions by layerwise relevance propagation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Binder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Montavon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Klauschen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Samek</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pone.0130140</idno>
	</analytic>
	<monogr>
		<title level="j">PLoS ONE</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">e0130140</biblScope>
			<date type="published" when="2015">2015</date>
			<publisher>Public Library of Science</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Selvaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cogswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vedantam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Batra</surname></persName>
		</author>
		<ptr target="http://gradcam.cloudcv.org" />
	</analytic>
	<monogr>
		<title level="m">ICCV</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="618" to="626" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">&quot;Why should I trust you?&quot; Explaining the predictions of any classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</title>
				<meeting>the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1135" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6572</idno>
		<title level="m">Explaining and harnessing adversarial examples</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">One pixel attack for fooling deep neural networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">V</forename><surname>Vargas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sakurai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="828" to="841" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Towards evaluating the robustness of neural networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Carlini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wagner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 ieee symposium on security and privacy (sp)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="39" to="57" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1406.4773v1</idno>
	</analytic>
	<monogr>
		<title level="m">Deep Learning Face Representation by Joint Identification-Verification</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">FaceNet: A unified embedding for face recognition and clustering</title>
		<author>
			<persName><forename type="first">F</forename><surname>Schroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kalenichenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Philbin</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2015.7298682</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2015-06">June 2015</date>
			<biblScope unit="page" from="815" to="823" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title/>
		<idno type="DOI">10.1109/CVPR.2015.7298682</idno>
		<idno type="arXiv">arXiv:1503.03832</idno>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">ArcFace: Additive angular margin loss for deep face recognition</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zafeiriou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4690" to="4699" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">CosFace: Large margin cosine loss for deep face recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="5265" to="5274" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">SphereFace: Deep hypersphere embedding for face recognition</title>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Raj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="212" to="220" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Rethinking Feature Discrimination and Polymerization for Large-scale Recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1710.00870" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">S.-M</forename><surname>Moosavi-Dezfooli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fawzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Fawzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Frossard</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1610.08401" />
		<title level="m">Universal adversarial perturbations</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Explaining and harnessing adversarial examples</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">J</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6572</idno>
	</analytic>
	<monogr>
		<title level="m">3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings</title>
				<imprint>
			<publisher>ICLR</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Intriguing properties of neural networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Szegedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zaremba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6199</idno>
	</analytic>
	<monogr>
		<title level="m">2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings</title>
				<imprint>
			<publisher>ICLR</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sharif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bhagavatula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Reiter</surname></persName>
		</author>
		<idno type="DOI">10.1145/2976749.2978392</idno>
		<ptr target="http://dl.acm.org/citation.cfm?doid=2976749.2978392" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM Conference on Computer and Communications Security</title>
				<meeting>the ACM Conference on Computer and Communications Security<address><addrLine>New York, New York, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2016-10">October 2016</date>
			<biblScope unit="page" from="1528" to="1540" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On adversarial patches: real-world attack on arcface-100 face recognition system</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pautov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Melnikov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kaziakhmedov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kireev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Petiushko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="391" to="396" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Adversarial attacks on face detectors using neural net based constrained optimization</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Bose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Aarabi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), IEEE</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Visualizing and understanding convolutional networks</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-10590-1_53</idno>
		<idno type="arXiv">arXiv:1311.2901</idno>
	</analytic>
	<monogr>
		<title level="s">Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</title>
		<imprint>
			<biblScope unit="volume">8689</biblScope>
			<biblScope unit="page" from="818" to="833" />
			<date type="published" when="2014">2014</date>
			<publisher>Springer Verlag</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Adaptive deconvolutional networks for mid and high level feature learning</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">W</forename><surname>Taylor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Fergus</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCV.2011.6126474</idno>
		<ptr target="http://ieeexplore.ieee.org/document/6126474/" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="2018" to="2025" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Grad-cam: Visual explanations from deep networks via gradient-based localization</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Selvaraju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cogswell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vedantam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Batra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="618" to="626" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Grad-CAM++: Generalized gradient-based visual explanations for deep convolutional networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chattopadhay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sarkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Howlader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Balasubramanian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Winter Conference on Applications of Computer Vision (WACV)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="839" to="847" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="4765" to="4774" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Explainable face recognition</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Williford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">B</forename><surname>May</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Byrne</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2020</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Bischof</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Brox</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J.-M</forename><surname>Frahm</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="248" to="263" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Visual psychophysics for making face recognition algorithms more explainable</title>
		<author>
			<persName><forename type="first">B</forename><surname>Richardwebster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Y</forename><surname>Kwon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clarizio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Anthony</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Scheirer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computer Vision -ECCV 2018</title>
				<editor>
			<persName><forename type="first">V</forename><surname>Ferrari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hebert</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Sminchisescu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Weiss</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="263" to="281" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Towards interpretable face recognition</title>
		<author>
			<persName><forename type="first">B</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE International Conference on Computer Vision</title>
				<meeting>the IEEE International Conference on Computer Vision</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="9348" to="9357" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">G</forename><surname>Garrod</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">H</forename><surname>Torr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Ince</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">G</forename><surname>Schyns</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1811.07807</idno>
		<title level="m">Deeper interpretability of deep networks</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Enhancing human face recognition with an interpretable neural network</title>
		<author>
			<persName><forename type="first">T</forename><surname>Zee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Nwogu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops</title>
				<meeting>the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Modularity - a concept for new neural network architectures</title>
		<author>
			<persName><forename type="first">A</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Bandar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. IASTED International Conference on Computer Systems and Applications</title>
				<meeting>IASTED International Conference on Computer Systems and Applications<address><addrLine>Irbid, Jordan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">PuVAE: A variational autoencoder to purify adversarial examples</title>
		<author>
			<persName><forename type="first">U</forename><surname>Hwang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yoon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">I</forename><surname>Cho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="126582" to="126593" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Samangouei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kabkab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chellappa</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.06605</idno>
		<title level="m">Defense-GAN: Protecting classifiers against adversarial attacks using generative models</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<title level="m">Deep learning</title>
				<imprint>
			<publisher>MIT press</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Siamese neural networks for one-shot image recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Koch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICML deep learning workshop</title>
				<meeting><address><addrLine>Lille</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">M</forename><surname>Parkhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vedaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<title level="m">Deep face recognition</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Dimensionality reduction by learning an invariant mapping</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chopra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR&apos;06)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1735" to="1742" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Large scale online learning of image similarity through ranking</title>
		<author>
			<persName><forename type="first">G</forename><surname>Chechik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Shalit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="1109" to="1135" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b41">
	<analytic>
		<title level="a" type="main">Beyond triplet loss: a deep quadruplet network for person re-identification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="403" to="412" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b42">
	<analytic>
		<title level="a" type="main">Person re-identification by multi-channel parts-based cnn with improved triplet loss function</title>
		<author>
			<persName><forename type="first">D</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1335" to="1344" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b43">
	<analytic>
		<title level="a" type="main">Deep feature consistent variational autoencoder</title>
		<author>
			<persName><forename type="first">X</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Qiu</surname></persName>
		</author>
		<idno type="DOI">10.1109/WACV.2017.131</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Winter Conference on Applications of Computer Vision (WACV)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1133" to="1141" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b44">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b45">
	<analytic>
		<title level="a" type="main">Joint face detection and alignment using multitask cascaded convolutional networks</title>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qiao</surname></persName>
		</author>
		<idno type="DOI">10.1109/LSP.2016.2603342</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Signal Processing Letters</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="1499" to="1503" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b46">
	<analytic>
		<title level="a" type="main">VGGFace2: A dataset for recognising faces across pose and age</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">M</forename><surname>Parkhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">13th IEEE International Conference on Automatic Face &amp; Gesture Recognition (FG 2018)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="67" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b47">
	<monogr>
		<title level="m" type="main">Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">B</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Berg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Learned-Miller</surname></persName>
		</author>
		<idno>07-49</idno>
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
		<respStmt>
			<orgName>University of Massachusetts, Amherst</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b48">
	<analytic>
		<title level="a" type="main">DeepFool: A simple and accurate method to fool deep neural networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Moosavi-Dezfooli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Fawzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Frossard</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.2016.282</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="2574" to="2582" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b49">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rice</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Kolter</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2001.03994</idno>
		<title level="m">Fast is better than free: Revisiting adversarial training</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b50">
	<analytic>
		<title level="a" type="main">Exploiting the inherent limitation of L0 adversarial examples</title>
		<author>
			<persName><forename type="first">F</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zeng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">22nd International Symposium on Research in Attacks, Intrusions and Defenses</title>
				<meeting><address><addrLine>RAID</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="293" to="307" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b51">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abubakar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1805.01431</idno>
		<title level="m">Siamese networks for generating adversarial examples</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b52">
	<analytic>
		<title level="a" type="main">Modular neural networks: a survey</title>
		<author>
			<persName><forename type="first">G</forename><surname>Auda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kamel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Neural Systems</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="129" to="151" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
