=Paper=
{{Paper
|id=None
|storemode=property
|title=Automated Off-Line Writer Verification Using Short Sentences and Grid Features
|pdfUrl=https://ceur-ws.org/Vol-768/Paper5.pdf
|volume=Vol-768
|dblpUrl=https://dblp.org/rec/conf/icdar/TseliosZNKE11
}}
==Automated Off-Line Writer Verification Using Short Sentences and Grid Features==
Proceedings of the 1st International Workshop on Automated Forensic Handwriting Analysis (AFHA) 2011

Konstantinos Tselios, Elias N. Zois, Member IEEE, Athanasios Nassiopoulos, Sotirios Karabetsos, Member IEEE and George Economou, Member IEEE

E. N. Zois is with the Electronics Engineering Department, Technological and Educational Institute of Athens, Agiou Spiridonos Str., 12210, Aegaleo, Greece (phone: +302105387204; fax: +302105385304; e-mail: ezois@teiath.gr).
S. Karabetsos and A. Nassiopoulos are with the Electronics Engineering Department, Technological and Educational Institute of Athens, Greece.
K. Tselios is with the CMRI, University of Bolton, United Kingdom (e-mail: kt1cmr@bolton.ac.uk).
G. Economou is with the Physics Department, University of Patras, 26500, Greece (e-mail: economou@upatras.gr).

Abstract—This work presents a feature extraction method for writer verification based on handwriting. The motivation for this work comes from the need to enhance modern security applications, mainly those focused on real-time or near-real-time processing, by implementing methods similar to those used in signature verification. In this context, we have employed a full sentence written in two languages with stable and predefined content. The novelty of this paper lies in the feature extraction algorithm, which models the connected pixel distribution along predetermined curvature and line paths of a handwritten image. The efficiency of the proposed method is evaluated with a combination of a first-stage similarity score and a continuous SVM output distribution. The experimental benchmarking of the new method against other state-of-the-art techniques found in the literature relies on ROC curves and Equal Error Rate estimation. The results support a first-hand proof of concept that the proposed feature extraction method has a powerful discriminative nature.

Index Terms—Writer Verification, Handwritten Sentences, Grid Features, ROC, EER

I. INTRODUCTION

Biometric recognition is an appealing method for keeping numerous situations, including defense and economic transactions, secure. Thus, access to important resources is granted while reducing potential vulnerability. Among other biometric features, online and offline handwriting, which is a subset of behavioral biometrics, has frequently been used for recognizing writers in either security or forensic applications [1], [2]. In recent years, writer identification and verification tasks have received considerable attention in the scientific community. A special case of writer verification uses context-based handwriting. The answer to the question "is this person who he claims to be?" is then provided by examining a predetermined text of known transcription. As stated by Siddiqi and Vincent [3], this kind of writer verification problem is similar to signature verification.

Although content-dependent approaches using well-defined semantics were used in the early years of writer recognition, there are at least three important reasons that justify the continued study of handwriting patterns other than signatures. Firstly, biometric verification schemes based on handwritten words or small sentences can potentially be used in real-world security applications, which are quickly emerging in a modern and continuously evolving mobile and Internet-based environment. Secondly, content-based retrieval systems could also benefit, since their users could query handwriting images from various corpora with similar handwriting styles [4]. Finally, an important reason emerges from the field of continuous verification [5]. By this, we mean that handwritten patterns could be used to grant access to resources not only at a person’s initial entrance, but also within a cyclic, continuous verification loop throughout the entire use of the application.

In order to explore writer verification tasks, we can test a number of algorithms on a number of well-established databases in the literature, such as IAM [6], Firemaker [7], CEDAR [8] and the Brazilian Forensic Letter database [9]. These databases carry rich handwriting information since they have a large sample size, such as 156 words and/or paragraphs. Their use might lead to awkward circumstances if issues like those described for continuous verification schemes are raised. This can easily be seen with the following example: imagine that a person has to verify himself or herself by writing an entire letter in a relatively small amount of time. To cope with this situation, an alternative would be either to use a portion of the aforementioned databases or to employ the content of one small sentence, like the one provided by a database such as HIFCD1 [10].

In this work, we present a novel feature extraction method for writer verification based on the structured exploitation of the statistical pixel directionality of handwriting. This is achieved by counting, in a probabilistic way, the occurrence of specific pixel transitions along predefined paths within two pre-confined chessboard distances. Then, the handwritten elements described by their strokes, angles and arcs are modelled by fusing, at the feature level, two- and three-step transitional probabilities. This is an extension of the work proposed in [11] for signature verification.
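To make the notion of a chessboard distance concrete, the following minimal Python sketch enumerates the grid cells that lie at chessboard (Chebyshev) distance 2 and 3 from a central pixel. The function names are illustrative assumptions and do not come from the paper; the sketch only lists candidate end points and does not reproduce the actual feature masks of the method.

```python
def chessboard_distance(p, q):
    """Chebyshev (chessboard) distance between two (row, col) pixels."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def ring(center, d):
    """All grid cells whose chessboard distance from `center` is exactly d."""
    r0, c0 = center
    return [
        (r, c)
        for r in range(r0 - d, r0 + d + 1)
        for c in range(c0 - d, c0 + d + 1)
        if chessboard_distance((r, c), center) == d
    ]


center = (0, 0)
ring2 = ring(center, 2)  # candidate end points of two-step transitions
ring3 = ring(center, 3)  # candidate end points of three-step transitions
print(len(ring2), len(ring3))
```

A ring at distance d is the border of a (2d + 1) × (2d + 1) square, so it always contains 8d cells.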
A two-stage classification scheme based on similarity measures and an SVM has been applied to the HIFCD1 corpus. The verification efficiency is evaluated by measuring the Equal Error Rate (EER) on the ROC curves, which is the point where the probability of misclassifying genuine samples equals the probability of misclassifying forgery samples. The EER is evaluated as a function of the word population. This is achieved by plotting the ROC curves each time we append a word for verification.

Finally, in order to benchmark our proposed method, comparisons are provided against recently described state-of-the-art methodologies for off-line signature verification pre-processing and feature extraction, as well as writer verification and feature extraction approaches. Within this context, we provide a feasibility study of the discriminative power of our method. This "feature benchmarking" concept can be justified by the fact that an ideal feature extraction method would make the classifier's job trivial, whereas an ideal classifier would not need a feature extractor [12]. Thus, by keeping the classifier stage fixed, feature extraction methods can be rated in a comparative way.

The rest of this work is organized as follows: Section 2 provides the database details and the description of the feature extraction algorithm. Section 3 presents the experimental verification protocol which has been applied. Section 4 presents the comparative evaluation results, while Section 5 draws the conclusions.

II. DATABASE AND FEATURE EXTRACTION PROCEDURE

A. Database Description and Pre-Processing

In order to confirm the proposed method and evaluate our approach, we have employed the HIFCD1 handwritten corpus, which has been used previously in the literature [10]. This corpus has been under re-enlistment and enrichment since its initial appearance in 2000. The database consists of two different small sentences, one written in Greek and the other in English. In addition to the first twenty persons enrolled in the past, another twenty persons have been enrolled later on, creating a temporary total set of forty persons. The database is being restructured in order to increase its size and the diversity of its biometric samples (e.g. to include iris, fingerprints, gait, signatures, face, large-scale handwritten text, etc.) to a level equivalent to that provided by modern databases like IAM [6] and BioSecure [13]. Each sentence was written by each writer 120 times. Consequently, 9600 sentences were recorded in our database, containing a total of 48000 words. Both linguistic forms of the sentences are presented in Fig. 1. The Greek language, being our native language, was used in order to maintain constant handwriting characteristics. The Greek sentence is made up of two small words of three letters, two medium-length words of seven letters and a lengthy word of eleven letters. Each word has been written in its own cell, thus making segmentation procedures trivial. For every word image of the corpus, pre-processing steps are applied in order to provide an enhanced image version with a maximized amount of utilized information. The pre-processing stage includes thresholding of the original handwritten image using Otsu’s method [14] and thinning in order to provide a one-pixel-wide handwritten trace, which is considered to be insensitive to changes in pen parameters like size, colour and style. Finally, the bounding rectangle of the image is produced. It must be pointed out that we treat the handwritten image as a whole and do not perform any character segmentation. Next, an alignment is carried out for every bounded image.

Fig. 1. HIFCD samples

This stage gathers the useful intrapersonal information from all the samples of a writer inside a region that is considered to be the one containing the most useful handwriting information [9], [11]. In this work, we have used the estimated coordinates of the centre of mass, x and y, for each image. Fig. 2 presents the above discussion graphically. In this work, the term ‘most informative window’ (MIW) of the handwritten pattern denotes the processed handwritten word sub-region, inside the bounded image, centred at the x and y parameters, while its length and width are determined empirically by trial and error.

Fig. 2. Original and pre-processed handwritten image with MIW

B. Feature Extraction

The feature extraction method maps the handwriting information, represented by the sequence of MIW words, to a feature vector which models handwriting by estimating the distribution of local features like orientation and curvature. The idea behind this originates from the simplest form of chain code. Analytically, chain code describes a set of eight sequences of two pixels and codes the succession of different orientations on the image grid. When sequences of three successive pixels are examined, line, convex and concave curvature features are generated. Since we do not utilize the features’ order of appearance, the number of corresponding features which can be defined uniquely, beginning from a central pixel to another one inside a chessboard distance equal to 2, is twenty-two (22). The enforcement of the symmetry condition limits the number of independent convex and concave features
to 11. This subset is enriched with four line features describing the fundamental line segments of slope 0, 45, 90 and 135 degrees. This 15-dimensional feature space defines the new embedding space. Furthermore, we have partitioned the MIW image into a 2 × 2 sub-window grid, and the respective outputs have been fused at the feature level by simple appending.

Following the above idea, we explore an additional feature set by measuring the pixel paths which obey the following statement: find the four-pixel connected paths while restraining the chessboard distance between the first and the fourth pixel to three and simultaneously restraining the chessboard distance between the first and the third pixel to two, ignoring the prior path selection that has taken place in the inner two-step transition. This provides a feature with a dimensionality of 28, since we do not partition the image. The final feature vector is generated by appending, in a feature-fusion way, the aforementioned two- and three-step features. Its dimensionality equals 88 (four sub-images × 15 features + one image × 28 features) and it is depicted graphically in Fig. 3. Algorithmically, a rectangular grid of 4 × 7 dimension scans every input of the MIW word sequence. This mask aligns each aforementioned pixel with the {5, 3} coordinate, thus enabling 15 potential 2-step paths and 28 3-step paths from the central pixel according to the previous discussion. Then, the paths which are included in the feature set are marked and a counter updates the corresponding features found. Finally, the feature components are normalized by their total sum in order to provide a probabilistic expression.

Fig. 3. Feature extraction methodology. Example with activated feature components (represented in yellow circles). a) Basic feature generating mask within chessboard distance of two. b) The feature mask within chessboard distance of three, irrespective of the inner, two-step path.

III. CLASSIFICATION PROTOCOL

As described in Section II, the inputs to the classification system are the training and testing feature vectors, denoted hereafter as <math>\{v_{TW}, v_{TSW}\}</math>. The training set <math>v_{TW}</math> is composed of the genuine and forgery vectors <math>\{G_{TW}, F_{TW}\}</math> of each writer <math>W_i,\ i = 1, 2, \ldots, 40</math>. The <math>G_{TW}</math> vectors model the genuine class population by means of their average value <math>\mu_{v_{G_{TW}}}</math> and standard deviation <math>\hat{\sigma}_{v_{G_{TW}}}</math>. Next, the similarity scores of the genuine training vectors are evaluated using the weighted distance of eq. (1) [12], and their pdf <math>S(v_{G_{TW}} | W_i)</math> is stored. A similar procedure, described by eq. (2), has been applied in order to derive the distribution of the similarity scores <math>S(v_{F_{TW}} | W_i)</math> for the false training samples <math>\{F_{TW}\}</math>.

<math>S(v_{G_{TW}} | W_i) = \left( \sum_{j=1}^{88} \sigma(j)^{-2}_{v_{G_{TW}}} \left( G(j)_{TW} - \mu(j)_{v_{G_{TW}}} \right)^{2} \right)^{-0.5} \qquad (1)</math>

<math>S(v_{F_{TW}} | W_i) = \left( \sum_{j=1}^{88} \sigma(j)^{-2}_{v_{F_{TW}}} \left( F(j)_{TW} - \mu(j)_{v_{F_{TW}}} \right)^{2} \right)^{-0.5} \qquad (2)</math>

Following the first stage, a two-class support vector machine is employed to map the training similarity scores to another distance space, induced by the SVM. Accordingly, the inputs to the second stage are the genuine and impostor distribution scores <math>S(v_{G_{TW}} | W_i)</math> and <math>S(v_{F_{TW}} | W_i)</math>. The output of the SVM is the continuous-valued distance of the unknown test input sample vector from the optimal separating hyperplane [24]. The mapping function is a Gaussian radial basis function kernel, selected after a number of trials.

The testing phase uses the remaining samples of the genuine and forgery sets <math>\{v_{TSW}\} = \{G_{TSW}, F_{TSW}\}</math>. Thus, for each writer, the similarity scores evaluated from the samples of the testing set are presented as input to the second-stage SVM mapping function. A negative value of the SVM output indicates that the unknown feature vector is below the optimal separating hyperplane and near the hyperplane which corresponds to the genuine class. On the other hand, a positive value denotes that the unknown input vector tends to fall towards the impostor hyperplane class [15]. Finally, the continuous SVM output models the overall distribution of both the genuine writers and the impostors. The selection of the training samples for the genuine class is accomplished using random samples with the hold-out validation method.

Evaluation of the verification efficiency of the system is accomplished with a global threshold on the overall SVM output distribution. This is achieved by providing the system’s False Acceptance Rate (FAR: samples not belonging to genuine writers, yet assigned to them) and False Rejection Rate (FRR: samples belonging to genuine writers, yet not classified as such) functions. With these two rates, the receiver operating characteristic (ROC) curves are drawn by means of their FAR/FRR plot. Then, classification performance is measured with the system Equal Error Rate (EER: the point at which FAR equals FRR).

IV. RESULTS

A. Benchmarking With Relative Feature Algorithms

We have benchmarked the proposed methodology against
three other feature extraction methods for signature verification and writer identification, which can be found in the literature. The first is a texture-based signature verification approach, provided by Vargas, Ferrer, Travieso and Alonso [16]. Secondly, we examine the performance of a shape descriptor proposed by Aguilar, Hermira, Marquez and Garcia, which is based on the use of predetermined shape masks [17]. In all cases, the pre-processing as well as the feature extraction steps have been realized as described by the respective authors. The third method uses the f1 contour direction pdf features and the f2 contour hinge features, which are part of the work proposed by Bulacu and Schomaker [18]. It is of great interest that the f2 feature is one of the most powerful descriptors for modelling handwriting. It must be noted that an appropriate pre-processing step has been carried out in order to provide the contours of the handwritten images.

B. Verification Results

According to the material presented in Section III, the genuine class has been represented with various schemes, utilizing 5, 10, 15, 20, 25 and 30 samples for the <math>\{G_{TW}\}</math> training set and 115, 110, 105, 100, 95 and 90 samples for the <math>\{G_{TSW}\}</math> testing set. On the other hand, the <math>\{F_{TW}\}</math> training set for the forgery class has been formed using one sample from each of the remaining writers, which results in 39 samples. The <math>\{F_{TSW}\}</math> samples are formed from the remaining 119 (samples/writer) × 39 writers, resulting in a total of 4641. The ROC curves, which are drawn as a function of the number of words and presented in Figs. 4-8, illustrate the classification efficiency of our method against those mentioned in the previous section. These curves have been evaluated for the last training scheme, i.e. 30 training and 90 testing samples for the <math>\{G_{TW}\}</math> and <math>\{G_{TSW}\}</math> populations. Similar results regarding the evaluation taxonomy have been obtained.

Commenting on the results, it can easily be inferred that our method provides a first-hand proof of concept of its enhanced writer verification capabilities. Another interesting observation is that the verification efficiency improves when the number of words fed to the feature stage increases, which is intuitively correct. An additional comment is that the English sentence provides a better (lower) EER than the Greek sentence, even though Greek is our native language. This might be due to the fact that the text used in the English sentence incorporates lengthier words than the Greek one. Another explanation for the improved Latin EER could be that when Greeks, or individuals whose native language is not English, are forced to write in Latin characters, their response provides less spontaneous handwritten samples. This may have introduced less writer specificity in the data, which in turn provides higher verification rates. Although the results are quite encouraging, they must be further tested on larger databases and under a number of different feature and classification schemes. The best EER rates corresponding to Figs. 4-8 are presented in tabular form in Table I.

Fig. 4. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results from one Greek word while the upper right uses a sequence of the first and second words.

Fig. 5. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results from one English word while the upper right uses a sequence of the first and second English words.

Fig. 6. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results by employing a sequence of the first three words of the Greek sentence while the upper right uses a sequence of the first four Greek words.
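The first-stage similarity score of eq. (1) in Section III can be illustrated with a short Python sketch. It assumes that the class statistics are the per-component mean and sample standard deviation of the genuine training vectors; the function names, the use of the sample (rather than population) standard deviation and the small `eps` guard against division by zero are assumptions made for this illustration and are not specified in the paper.

```python
from statistics import mean, stdev


def class_statistics(train_vectors):
    """Per-component mean and sample standard deviation of the training vectors."""
    columns = list(zip(*train_vectors))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def similarity_score(v, mu, sigma, eps=1e-12):
    """Eq. (1): inverse square root of the variance-weighted squared distance.

    `eps` is an illustrative safeguard (assumption) for zero variance or
    zero distance; it is not part of the published formula.
    """
    weighted = sum(((x - m) / (s + eps)) ** 2 for x, m, s in zip(v, mu, sigma))
    return (weighted + eps) ** -0.5


# Toy 2-dimensional example; the real feature vectors have 88 components.
genuine_train = [[0.2, 0.8], [0.4, 0.6]]
mu, sigma = class_statistics(genuine_train)
near = similarity_score([0.35, 0.65], mu, sigma)
far = similarity_score([0.5, 0.5], mu, sigma)
print(near, far)
```

Vectors close to the genuine class mean receive a larger score, because the weighted distance appears with a negative exponent.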
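The FAR, FRR and EER definitions used for the evaluation can be sketched as a simple threshold sweep. The sketch assumes, following the sign convention described for the SVM output, that lower scores indicate the genuine class, so a sample is accepted when its score does not exceed the threshold. The scores below are made-up toy values, and the function names are hypothetical.

```python
def far_frr(genuine, forgery, threshold):
    """FAR and FRR when a sample is accepted if its score is <= threshold."""
    far = sum(s <= threshold for s in forgery) / len(forgery)
    frr = sum(s > threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, forgery):
    """Sweep every observed score as a global threshold and return the
    operating point where FAR and FRR are closest, as (EER, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(forgery)):
        far, frr = far_frr(genuine, forgery, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (assumption): negative values lean towards the genuine class.
genuine_scores = [-2.0, -1.0, -0.5, 0.2]
forgery_scores = [-0.3, 0.5, 1.0, 2.0]
eer, threshold = equal_error_rate(genuine_scores, forgery_scores)
print(eer, threshold)
```

With finite samples FAR and FRR rarely coincide exactly, so the sketch reports the average of the two rates at the threshold where their gap is smallest.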
Fig. 7. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results by employing a sequence of the first three words of the English sentence while the upper right uses a sequence of the first four English words.

Fig. 8. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results by employing a sequence of the five words of the Greek sentence while the upper right uses a sequence of the five words of the English sentence.

REFERENCES

[1] R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: A comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 63-84, 2000.
[2] G. X. Tan, C. Viard-Gaudin, and A. C. Kot, "Automatic writer identification framework for online handwritten documents using character prototypes," Pattern Recognition, vol. 42, pp. 3313-3323, 2009.
[3] I. Siddiqi and N. Vincent, "Text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features," Pattern Recognition, vol. 43, pp. 3853-3865, 2010.
[4] A. Bhardwaj, A. O. Thomas, Y. Fu, and V. Govindaraju, "Retrieving handwriting styles: A content based approach to handwritten document retrieval," in Proc. International Conference on Handwriting Recognition, Kolkata, India, 2010, pp. 265-270.
[5] T. Sim, S. Zhang, R. Janakiraman, and S. Kumar, "Continuous verification using multimodal biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 687-700, 2007.
[6] U.-V. Marti and H. Bunke, "The IAM-database: An English sentence database for off-line handwriting recognition," International Journal on Document Analysis and Recognition, vol. 5, pp. 39-46, 2002.
[7] M. Bulacu and L. Schomaker, "Forensic Writer Identification: A Benchmark Data Set and a Comparison of Two Systems," Technical Report, NICI, 2000.
[8] S. N. Srihari, S.-H. Cha, H. Arora, and S. Lee, "Individuality of handwriting," Journal of Forensic Science, vol. 47, pp. 1-17, 2002.
[9] R. K. Hanusiak, L. S. Oliveira, E. Justino, and R. Sabourin, "Writer verification using texture-based features," International Journal on Document Analysis and Recognition, DOI: 10.1007/s10032-011-0166-4, 2011.
[10] E. N. Zois and V. Anastassopoulos, "Fusion of correlated decisions for writer verification," Pattern Recognition, vol. 34, pp. 47-61, 2001.
[11] E. N. Zois, K. Tselios, E. Siores, A. Nassiopoulos, and G. Economou, "Off-Line Signature Verification Using Two Step Transitional Features," in Proc. 12th IAPR Conference on Machine Vision Applications, Nara, Japan, 2011.
[12] R. O. Duda and P. E. Hart, Pattern classification. New York: John Wiley and Sons, 2001.
[13] http://biosecure.it-sudparis.eu/AB/
[14] N. Otsu, "A threshold selection method from gray-level histogram," IEEE Transactions on System, Man and Cybernetics, vol. 8, pp. 62-66, 1978.
[15] L. Hamel, "Kernel Knowledge discovery with support vector machines," Wiley, New Jersey, 2009.
[16] J. F. Vargas, M. A. Ferrer, C. M. Travieso, and J. B. Alonso, "Off-line signature verification based on grey level information using texture features," Pattern Recognition, vol. 44, pp. 375-385, 2011.
[17] J. F. Aguilar, N. A. Hermira, G. M. Marquez, and J. O. Garcia, "An off-line signature verification system based on fusion of local and global information," LNCS 3087, pp. 295-306, 2004.
[18] M. Bulacu and L. Schomaker, "Text-independent writer identification and verification using textural and allographic features," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 701-717, 2007.
{| class="wikitable"
|+ TABLE I. Classification efficiency (%) based on the Equal Error Rate derived from Figs. 4-8
! rowspan="2" | Feature extraction method
! colspan="5" | English sentence (sequence of words)
! colspan="5" | Greek sentence (sequence of words)
|-
! 1st !! 1st & 2nd !! 1st to 3rd !! 1st to 4th !! all !! 1st !! 1st & 2nd !! 1st to 3rd !! 1st to 4th !! all
|-
| Proposed work || 15.53 || 6.05 || 5.92 || 4.90 || 4.08 || 22.78 || 11.13 || 9.21 || 7.14 || 5.71
|-
| Feature proposed by [16] || 13.54 || 11.10 || 9.08 || 7.69 || 6.92 || 15.04 || 12.29 || 10.99 || 9.76 || 8.96
|-
| f1 feature proposed by [18] || 29.81 || 21.06 || 19.46 || 18.41 || 14.12 || 29.78 || 28.08 || 26.49 || 23.85 || 21.98
|-
| f2 feature proposed by [18] || 20.22 || 12.72 || 11.36 || 7.48 || 5.58 || 26.55 || 17.72 || 17.57 || 12.41 || 10.82
|-
| Feature proposed by [17] || 28.95 || 28.19 || 24.64 || 19.07 || 16.90 || 32.30 || 30.44 || 29.18 || 28.47 || 27.63
|}
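As a worked reading of Table I, the following Python sketch compares the proposed method with the best competing feature when all five words are used. The EER values are copied from the "all" columns of the table; the dictionary layout, the function name and the relative-reduction measure are illustrative choices made here, not quantities reported in the paper.

```python
# EER (%) for the full five-word sequence, copied from Table I.
eer_all_words = {
    "English": {"proposed": 4.08, "[16]": 6.92, "f1 [18]": 14.12,
                "f2 [18]": 5.58, "[17]": 16.90},
    "Greek": {"proposed": 5.71, "[16]": 8.96, "f1 [18]": 21.98,
              "f2 [18]": 10.82, "[17]": 27.63},
}


def best_competitor(row):
    """Return (name, EER) of the competing method with the lowest EER."""
    others = {k: v for k, v in row.items() if k != "proposed"}
    name = min(others, key=others.get)
    return name, others[name]


def relative_reduction(row):
    """Relative EER reduction (%) of the proposed method vs. the best competitor."""
    _, competitor_eer = best_competitor(row)
    return 100 * (competitor_eer - row["proposed"]) / competitor_eer


for sentence, row in eer_all_words.items():
    name, competitor_eer = best_competitor(row)
    print(f"{sentence}: proposed {row['proposed']:.2f}% vs {name} "
          f"{competitor_eer:.2f}% ({relative_reduction(row):.1f}% relative reduction)")
```

The same table shows that this advantage depends on the number of words: for a single word, the feature of [16] has a lower EER than the proposed method in both languages.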