Multi-script Off-line Signature Verification: A Two Stage Approach

Srikanta Pal
School of Information and Communication Technology, Griffith University, Gold Coast, Australia
Email: srikanta.pal@griffithuni.edu.au

Umapada Pal
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India
Email: umapada@isical.ac.in

Michael Blumenstein
School of Information and Communication Technology, Griffith University, Gold Coast, Australia
Email: m.blumenstein@griffith.edu.au

Abstract: Signature identification and verification are of great importance in authentication systems. The purpose of this paper is to introduce an experimental contribution in the direction of multi-script off-line signature identification and verification using a novel technique involving off-line English, Hindi (Devanagari) and Bangla (Bengali) signatures. In the first evaluation stage of the proposed signature verification technique, the performance of a multi-script off-line signature verification system, considering a joint dataset of English, Hindi and Bangla signatures, was investigated. In the second stage of experimentation, multi-script signatures were identified based on the script type, and subsequently the verification task was explored separately for English, Hindi and Bangla signatures based on the identified script result. The gradient and chain code features were employed, and Support Vector Machines (SVMs) along with the Modified Quadratic Discriminant Function (MQDF) were considered in this scheme. From the experimental results achieved, it is noted that the verification accuracy obtained in the second stage of experiments (where a signature script identification method was introduced) is better than the verification accuracy produced in the first stage of experiments. Average error rates of 20.80% and 16.40% were obtained for the first and second types of verification experiments, respectively.

Keywords: Biometrics; off-line signature verification; multi-script signature identification.

I. INTRODUCTION

Biometrics are the most widely used approaches for personal identification and verification. Among all of the biometric authentication systems, handwritten signatures, a pure behavioral biometric, have been accepted as an official means to verify personal identity for legal purposes on such documents as cheques, credit cards and wills [1].

In general, automated signature verification is divided into two broad categories: static (off-line) methods and dynamic (on-line) methods [2], depending on the mode of handwritten signature acquisition. If both the spatial and the temporal information regarding signatures are available to the system, verification is performed using on-line data [3]. In the case where temporal information is not available and the system can only utilize spatial information gleaned through scanned or even camera-captured documents, verification is performed on off-line data [4].

Considerable research has previously been undertaken in the area of signature verification, particularly involving single-script signatures. On the other hand, less attention has been devoted to the task of multi-script signature verification. Very few published papers involving multi-script signatures, including non-English signatures, have been communicated in the field of signature verification. Pal et al. [5] introduced a signature verification system employing Hindi signatures. The direction of that paper was to present an investigation of the performance of a signature verification system involving Hindi off-line signatures. In that study, two features, namely the gradient feature and the Zernike moment feature, were employed together with SVM classifiers. Encouraging results were obtained in this investigation. In a different contribution by Pal et al. [6], a multi-script off-line signature identification technique was proposed. In that report, signatures involving Bangla (Bengali), Hindi (Devanagari) and English were considered for the signature script identification process. A multi-script off-line signature identification and verification approach, involving English and Hindi signatures, was presented by Pal et al. [7]. In that paper, the multi-script signatures were identified first on the basis of signature script type, and afterward, verification experiments were conducted based on the identified script result.

Development of a general multi-script signature verification system, which can verify signatures of all scripts, is very complicated. The verification accuracy in such multi-script signature environments will not be as high as that of single-script signature verification [10]. To achieve the necessary accuracy for multi-script signature verification, it is important to identify signatures based on the type of script and then use an individual single-script signature verification system for the identified script [10]. Based on this observation, in the proposed system, the signatures of three different scripts are separated and fed into individual signature verification systems. For comparison, multi-script signature verification results on a joint English, Hindi and Bangla dataset, without using any script identification, are also investigated.

The remainder of this paper is organized as follows. The multi-script signature verification concept is described in Section II. Section III introduces the notable properties of Hindi and Bangla script. The Hindi, Bangla and English signature database used for the current research is described in Section IV. Section V briefly presents the feature extraction techniques employed in this work.
The classifier details are described in Section VI. The experimental settings are presented in Section VII. Results and a discussion are provided in Section VIII. Finally, conclusions and future work are discussed in Section IX.

II. MULTI-SCRIPT SIGNATURE VERIFICATION CONCEPT

When a country deals with two or more scripts and languages for reading and writing purposes, it is known as a multi-script and multi-lingual country. In India, there are officially 23 (Indian constitution accepted) languages and 11 different scripts.

In a multi-script and multi-lingual country like India, languages are not only used for writing and reading purposes but also for signing. In such an environment, the signatures of an individual in more than one language (regional language and international language) are needed in official transactions (e.g. in passport application forms, examination question papers, money order forms, bank account application forms etc.). To deal with these situations, signature verification techniques employing single-script signatures are not sufficient. Therefore, in a multi-script and multi-lingual scenario, signature verification methods considering more than one script are required.

Towards this direction of verification, the contribution of this paper is twofold. The first is multi-script signature verification considering joint datasets, as shown in Figure 1. The second is identification of signatures based on script, and subsequent verification for English, Hindi and Bangla signatures based on the identified script result. A diagram of this second verification mode is shown in Figure 2.

Figure 1. Diagram of signature verification considering a joint dataset. (Flow: multi-script off-line signatures of English, Hindi and Bangla; verification based on multi-script signatures; accuracy of verification.)

Figure 2. Diagram of multi-script signature identification and verification based on English, Hindi and Bangla signatures. (Flow: multi-script signatures of English, Hindi and Bangla; signature script identification; signatures of English, Hindi and Bangla script; English, Hindi and Bangla signature verification.)

III. PROPERTIES OF HINDI AND BANGLA SCRIPT

Most of the Indian scripts, including Bangla and Devanagari, have originated from the ancient Brahmi script through various transformations and evolution [8]. Bangla and Devanagari are the two most accepted scripts in India. In both scripts, the writing style is from left to right and there is no concept of upper/lower case. These scripts have a complex composition of their constituent symbols. The scripts are recognizable by a distinctive horizontal line called the 'head line' that runs along the top of full letters and links all the letters together in a word. Both scripts have about fifty basic characters including vowels and consonants.

IV. DATABASE USED FOR EXPERIMENTATION

A. Hindi and Bangla Signature Database

As there has been no public signature corpus available for Hindi and Bangla script, it was necessary to create a database of Hindi and Bangla signatures. The Hindi and Bangla signature databases used for experimentation consisted of 50 sets per script type. Each set consists of 24 genuine signatures and 30 skilled forgeries. Some genuine signature samples of Hindi and Bangla, with their corresponding forgeries, are displayed in Table 1 and Table 2.

B. GPDS English Database

Another database, consisting of 50 sets from GPDS-160 [9], was also utilised for these experiments. Each signature set of this corpus consists of 24 genuine signatures and 30 simple forgeries. Only 50 sets were used from the GPDS corpus because the Bangla and Hindi datasets described previously comprised 50 sets each, and it was considered important to have equivalent signature numbers for experimentation.

TABLE 1. SAMPLES OF HINDI GENUINE AND FORGED SIGNATURES (columns: Genuine Signatures, Forged Signatures)

TABLE 2. SAMPLES OF BANGLA GENUINE AND FORGED SIGNATURES (columns: Genuine Signatures, Forged Signatures)

V. FEATURE EXTRACTION

Feature extraction is a crucial step in any pattern recognition system. Two different feature extraction techniques, gradient feature extraction and chain code feature extraction, are considered here.

A. Computation of 576-dimensional Gradient Features

576-dimensional gradient features were extracted for this research and experimentation; they are described in [7].

B. 64-Dimensional Chain Code Feature Extraction

The 64-dimensional chain code feature is determined as follows. In order to compute the contour points of a two-tone image, a 3 x 3 window is considered surrounding the object point. If any one of the four neighbouring points (as shown in Fig. 3 (a)) is a background point, then this object point (P) is considered as a contour point. Otherwise it is a non-contour point.

The bounding box (minimum rectangle containing the input image content) is then divided into 7 x 7 blocks. In each of these blocks, the direction chain code for each contour point is noted and the frequency of the direction codes is computed. Here, the chain code of four directions only [directions 1 (horizontal), 2 (45 degree slanted), 3 (vertical) and 4 (135 degree slanted)] is used. The chain code directions are shown in Fig. 3 (b). It is assumed that the chain codes of directions 1 and 5, 2 and 6, 3 and 7, and 4 and 8 are the same. Thus, in each block, an array of four integer values representing the frequencies is obtained, and those frequency values are used as features. For 7 x 7 blocks, 7 x 7 x 4 = 196 features are obtained. To reduce the feature dimension, after the histogram calculation into 7 x 7 blocks, the blocks are down-sampled with a Gaussian filter into 4 x 4 blocks. As a result, 4 x 4 x 4 = 64 features are obtained for recognition. To normalize the features, the maximum value of the histograms from all the blocks is computed. Each of the above features is divided by this maximum value to obtain feature values between 0 and 1.

Figure 3. Eight neighbours: (a) a point P and its neighbours; (b) the direction codes of the eight neighbouring points of P.

VI. CLASSIFIER DETAILS

Based on these features, Support Vector Machines (SVMs) and the Modified Quadratic Discriminant Function (MQDF) are applied as the classifiers for the experiments.

A. SVM Classifier

For this experiment, a Support Vector Machine (SVM) classifier is used. The SVM is originally defined for two-class problems and it looks for the optimal hyperplane, which maximizes the margin between the nearest examples of both classes, named support vectors (SVs). Given a training database of M data: {x_m | m = 1, ..., M}, the linear SVM classifier is then defined as:

$$ f(x) = \sum_{j} \alpha_j \, x_j \cdot x + b $$

where {x_j} is the set of support vectors, and the parameters \alpha_j and b have been determined by solving a quadratic problem [11]. The linear SVM can be extended to various non-linear variants; details can be found in [11, 12]. In these experiments, the Gaussian kernel SVM outperformed other non-linear SVM kernels, hence only the identification/verification results based on the Gaussian kernel are reported.

B. MQDF Classifier

The Modified Quadratic Discriminant Function is defined as follows [13]:

$$ g(X) = (N + N_0 + n - 1)\,\ln\!\left[ 1 + \frac{1}{N_0 \sigma^2} \left[ \lVert X - M \rVert^2 - \sum_{i=1}^{k} \frac{\lambda_i}{\lambda_i + \frac{N_0}{N}\sigma^2} \left\{ \Phi_i^{T}(X - M) \right\}^2 \right] \right] + \sum_{i=1}^{k} \ln\!\left( \lambda_i + \frac{N_0}{N}\sigma^2 \right) $$

where X is the feature vector of an input sample; M is a mean vector of samples; \Phi_i is the ith eigenvector of the sample covariance matrix; \lambda_i is the ith eigenvalue of the sample covariance matrix; k is the number of eigenvalues considered here; n is the feature size; \sigma^2 is the initial estimation of a variance; N is the number of learning samples; and N_0 is a confidence constant for \sigma^2, taken as 3N/7 for experimentation. Not all of the eigenvalues and their respective eigenvectors are used for classification. Here, the eigenvalues are stored in descending order and the first 60 (k = 60) eigenvalues and their respective eigenvectors are used for classification. The value k = 60 was chosen as a trade-off between accuracy and computation time.

VII. EXPERIMENTAL SETTINGS

A. Settings for Verification used in 1st Stage of Experiments

The skilled forgeries were not considered for training purposes. Instead, random signatures were used as negative training samples. For each signature set, an SVM was trained with 12 randomly chosen genuine signatures. The negative samples for training (random signatures) were the genuine signatures of the 149 other signature sets. Two signatures were taken from each set, so in total 149 x 2 = 298 random signatures were employed for training. For testing, the remaining 12 genuine signatures and 30 skilled forgeries of the signature set being considered were employed. The numbers of samples for training and testing for these experiments are shown in Table 3.

TABLE 3. NO. OF SIGNATURES USED PER SET IN 1ST PHASE OF VERIFICATION

            Genuine Signatures   Random Signatures   Skilled Forgeries
Training    12                   298                 n/a
Testing     12                   n/a                 30

B. Settings for Verification used in 2nd Stage of Experiments

1) Settings for Signature Script Identification

150 sets of signatures (50 sets of English, 50 sets of Hindi and 50 sets of Bangla) were used for signature script identification. 30 sets of signatures from each script were considered for training, and the remaining 20 sets were considered for testing purposes. The numbers of samples for training and testing used in the identification approach are shown in Table 4.

TABLE 4. SIGNATURE SAMPLES USED FOR SCRIPT IDENTIFICATION PHASE

            English Signatures    Hindi Signatures      Bangla Signatures
            Genuine   Forged      Genuine   Forged      Genuine   Forged
Training    720       900         720       900         720       900
Testing     480       600         480       600         480       600

2) Settings for Signature Verification after Signature Script Identification

The verification task in the second stage was explored separately for English signatures, Hindi signatures and Bangla signatures based on the identified script result. Signature samples (30 sets from each script) that were considered for training purposes in signature script identification were not used for the individual verification task. Only the correctly identified samples from the 20 sets used for testing in identification were considered for verification. For each signature set, an SVM was trained with 12 genuine signatures. The negative samples for training were 95 (19 x 5) genuine signatures of 19 other signature sets.

VIII. RESULTS AND DISCUSSION

A. Experimental Results

1) First Verification Experiments

In this stage of experimentation, 8100 (150 x 54) signatures involving English, Hindi and Bangla scripts were employed for training and testing purposes. At this operational point, the SVMs produced an AER of 20.80%, so an accuracy of 79.20% was achieved in this first mode of verification.

2) Second Verification Experiments

In this stage of verification, the signatures are identified based on their script and the identified signatures are subsequently verified separately. In the signature script identification stage, only 64-dimensional chain code features were used because a slightly better accuracy was obtained when compared to the gradient feature. The MQDF classifier was also applied to the chain code features in the script identification step, but it did not achieve a better result than SVMs in this study. For comparison, script identification results using the two different classifiers with chain code features are shown in Table 5. An accuracy of 93.08% is achieved at the script identification stage by using the SVM classifier. The accuracies for Bangla, English and Hindi are 85.19%, 95.74% and 98.33%, respectively. The confusion matrix obtained using the SVM classifier and the 64-dimensional chain code features is shown in Table 6.

TABLE 5. ACCURACY OBTAINED USING SVM AND MQDF CLASSIFIERS

Classifiers   Identification Accuracy (%)
SVMs          93.08
MQDF          82.45

TABLE 6. CONFUSION MATRIX OBTAINED USING THE CHAIN CODE FEATURE AND SVM CLASSIFIER

            Bangla   English   Hindi
Bangla      920      19        141
English     27       1034      19
Hindi       10       8         1062

Based on the outcomes of the identification phase, verification experiments subsequently followed. Verification results for the individual scripts were calculated on the samples that were correctly identified (93.08% identification rate). In this phase of experimentation, the SVMs produced an overall AER of 21.10%, 13.05% and 15.05% using English, Hindi and Bangla signatures, respectively. The overall verification accuracy obtained for the second major experiments (identification plus verification) was 83.60% (average of 78.90% for English, 86.95% for Hindi and 84.94% for Bangla).

B. Comparison of Performance

From the experimental results obtained, it was observed that the performance of signature verification in the second set of experiments (identification and verification) was encouraging compared to the signature verification accuracy from the first experiment set (verification only). Table 7 shows the accuracies attained in the first experiment set as well as the separate verification results for English, Hindi and Bangla from the second experiment set.

TABLE 7. VERIFICATION ACCURACIES RESULTING FROM DIFFERENT EXPERIMENTS

Experiment Sets   Dataset Used                 Accuracy (%)
1st experiment    English, Hindi and Bangla    79.20
2nd experiment    English                      78.90
                  Hindi                        86.95
                  Bangla                       84.94

In the second stage of verification, the overall accuracy is 83.60% (average of 78.90%, 86.95% and 84.94%), which is 4.40 percentage points (83.60 - 79.20) higher than the accuracy in the first stage. The comparison of these two accuracies is shown in Table 8.

TABLE 8. ACCURACY IN DIFFERENT PHASES OF VERIFICATION

Verification Experiment          Verification Accuracy (%)
Without Script Identification    79.20
With Script Identification       83.60

From the above table it is evident that the verification accuracy with script identification is 4.40 percentage points higher than without script identification. This increase is attributed to the application of the identification stage, and it demonstrates the importance of using identification in multi-script signature verification techniques.

C. Error Analysis

Most of the methods used for signature verification generate some erroneous results. In these experiments, a few signature samples were mis-identified in both the identification and verification stages. A few of the confusing signature samples obtained in the signature script identification stage using the SVM classifier are shown in Figures 4, 5 and 6. Three categories of confusing samples are generated by the classifier. The first category illustrates a Bangla signature sample treated as a Hindi signature sample. The second one illustrates an English signature sample treated as a Bangla signature sample, and the third one illustrates a Hindi signature sample treated as a Bangla signature sample.

Figure 4. Bangla sample treated as Hindi

Figure 5. English treated as Bangla

Figure 6. Hindi signature treated as Bangla

IX. CONCLUSIONS AND FUTURE WORK

This paper provides an investigation of the performance of a multi-script signature verification technique involving English, Hindi and Bangla off-line signatures. The novel approach used in a multi-script signature verification environment, in combination with a custom Hindi and Bangla off-line signature dataset, provides a substantial contribution to the field of signature verification. In such a verification environment, the proper utilization of a script identification technique, which substantially affects the verification accuracy, is an important step in the process. The comparatively higher verification accuracy obtained in the second experimental approach is likewise a substantial contribution. The gradient feature and chain code feature, as well as SVM and MQDF classifiers, were employed for experimentation. The idea of a multi-script signature verification approach that includes an identification phase is an important contribution to the area of signature verification. The proposed off-line multi-script signature verification scheme is a new investigation in the field of off-line signature verification. In the near future, we plan to extend our work by considering further sets of signature samples, which may include different languages/scripts.

X. ACKNOWLEDGMENTS

Thanks to my colleague Mr. Nabin Sharma for his help towards the preparation of this paper.

REFERENCES

[1] R. Plamondon and G. Lorette, "Automatic signature verification and writer identification - the state of the art", Pattern Recognition, pp. 107-131, 1989.
[2] S. Madabusi, V. Srinivas, S. Bhaskaran and M. Balasubramanian, "On-line and off-line signature verification using relative slope algorithm", in proc. International Workshop on Measurement Systems for Homeland Security, pp. 11-15, 2005.
[3] D. Impedovo and G. Pirlo, "Automatic signature verification: The state of the art", IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 38, no. 5, pp. 609-635, 2008.
[4] M. Kalera, S. Srihari and A. Xu, "Offline signature verification and identification using distance statistics", International Journal on Pattern Recognition and Artificial Intelligence, pp. 1339-1360, 2004.
[5] S. Pal, U. Pal and M. Blumenstein, "Hindi Off-line Signature Verification", in proc. International Conference on Frontiers in Handwriting Recognition, ICFHR 2012, Bari, Italy, pp. 371-376.
[6] S. Pal, A. Alaei, U. Pal and M. Blumenstein, "Multi-Script off-line signature identification", in proc. International Conference on Hybrid Intelligent Systems, pp. 236-240, 2012.
[7] S. Pal, U. Pal and M. Blumenstein, "Hindi and English off-line signature identification and verification", in proc. International Conference on Advances in Computing, pp. 905-910, 2012.
[8] B. B. Chaudhuri and U. Pal, "An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi)", in proc. International Conference on Document Analysis and Recognition, pp. 1011-1015, 1997.
[9] M. A. Ferrer, J. B. Alonso and C. M. Travieso, "Offline geometric parameters for automatic signature verification using fixed-point arithmetic", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 993-997, 2005.
[10] S. Pal, U. Pal and M. Blumenstein, "A Two-Stage Approach for English and Hindi Off-line Signature Verification", International Workshop on Emerging Aspects in Handwritten Signature Processing, 2013 (Accepted).
[11] V. Vapnik, "The Nature of Statistical Learning Theory", Springer Verlag, 1995.
[12] C. Burges, "A Tutorial on support Vector machines for pattern recognition", Data Mining and Knowledge Discovery, pp. 1-43, 1998.
[13] F. Kimura, K. Takashina, S. Tsuruoka and Y. Miyake, "Modified quadratic discriminant function and the application to Chinese character recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 149-153, 1987.
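
As an illustrative check of the script identification figures reported in Section VIII, the per-script and overall accuracies can be recomputed from the confusion matrix in Table 6. The following is a minimal Python sketch, not part of the original experiments; the function and variable names are hypothetical, and it assumes that rows of Table 6 are the true script and columns are the script assigned by the SVM.

```python
# Counts copied from Table 6. Assumption: rows = true script,
# columns = script predicted by the SVM (the paper does not state the layout).
SCRIPTS = ["Bangla", "English", "Hindi"]
CONFUSION = [
    [920, 19, 141],
    [27, 1034, 19],
    [10, 8, 1062],
]


def per_class_accuracy(matrix):
    """Percentage of each row's samples that fall on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all samples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


if __name__ == "__main__":
    for name, acc in zip(SCRIPTS, per_class_accuracy(CONFUSION)):
        print(f"{name}: {acc:.2f}%")
    print(f"Overall: {overall_accuracy(CONFUSION):.2f}%")
```

Under this assumption the sketch gives 85.19%, 95.74% and 98.33% for Bangla, English and Hindi, and about 93.09% overall, which agrees with the reported 93.08% up to rounding. Each row sums to 1080, matching the 480 genuine and 600 forged test signatures per script in Table 4.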
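
The chain code feature computation described in Section V.B can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the image is assumed to be a list of rows of 0/1 values (1 = object pixel), the direction code of a contour point is taken from its adjacent contour pixels (the paper does not specify how codes are assigned), and the 7 x 7 to 4 x 4 reduction uses a 3 x 3 binomial kernel sampled at every second block as a stand-in for the unspecified Gaussian filter. All function names are hypothetical.

```python
# Hypothetical sketch of a 64-dimensional chain code feature.
# (row, col) offsets for the four folded directions:
# 1 horizontal, 2 45-degree, 3 vertical, 4 135-degree.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]
# Assumed Gaussian-like weights (3x3 binomial kernel, sums to 16).
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def is_object(image, r, c):
    """True for an in-bounds object pixel; outside the image is background."""
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def contour_points(image):
    """Object pixels with at least one background 4-neighbour."""
    four = ((-1, 0), (1, 0), (0, -1), (0, 1))
    points = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value == 1 and not all(is_object(image, r + dr, c + dc) for dr, dc in four):
                points.add((r, c))
    return points


def block_histograms(points, grid=7):
    """grid x grid x 4 direction counts over the bounding box."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    top, left = min(rows), min(cols)
    height, width = max(rows) - top + 1, max(cols) - left + 1
    hist = [[[0] * 4 for _ in range(grid)] for _ in range(grid)]
    for r, c in points:
        br = (r - top) * grid // height
        bc = (c - left) * grid // width
        for d, (dr, dc) in enumerate(DIRECTIONS):
            if (r + dr, c + dc) in points:  # adjacent contour pixel in direction d
                hist[br][bc][d] += 1
    return hist


def downsample(hist, out=4):
    """Reduce 7x7 blocks to 4x4 by weighting around blocks 0, 2, 4 and 6."""
    grid = len(hist)
    features = []
    for i in range(out):
        for j in range(out):
            for d in range(4):
                total = 0.0
                for ki in range(3):
                    for kj in range(3):
                        r, c = 2 * i + ki - 1, 2 * j + kj - 1
                        if 0 <= r < grid and 0 <= c < grid:
                            total += KERNEL[ki][kj] * hist[r][c][d]
                features.append(total / 16.0)
    return features


def chain_code_features(image):
    """64 values in [0, 1]; all zeros if the image has no object pixels."""
    points = contour_points(image)
    if not points:
        return [0.0] * 64
    features = downsample(block_histograms(points))
    peak = max(features)
    return [f / peak for f in features] if peak > 0 else features
```

The sketch normalises by the maximum of the down-sampled values so that the final features lie between 0 and 1; a full implementation would typically trace the contour to assign direction codes and might use a different filter width.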