=Paper= {{Paper |id=None |storemode=property |title=Automated Off-Line Writer Verification Using Short Sentences and Grid Features |pdfUrl=https://ceur-ws.org/Vol-768/Paper5.pdf |volume=Vol-768 |dblpUrl=https://dblp.org/rec/conf/icdar/TseliosZNKE11 }} ==Automated Off-Line Writer Verification Using Short Sentences and Grid Features== https://ceur-ws.org/Vol-768/Paper5.pdf
                     Proceedings of the 1st International Workshop on Automated Forensic Handwriting Analysis (AFHA) 2011                          1




Automated Off-Line Writer Verification Using Short Sentences and Grid Features

Konstantinos Tselios, Elias N. Zois, Member IEEE, Athanasios Nassiopoulos, Sotirios Karabetsos, Member IEEE and George Economou, Member IEEE

E. N. Zois is with the Electronics Engineering Department, Technological and Educational Institute of Athens, Agiou Spiridonos Str., 12210, Aegaleo, Greece (phone: +302105387204; fax: +302105385304; e-mail: ezois@teiath.gr).
S. Karabetsos and A. Nassiopoulos are with the Electronics Engineering Department, Technological and Educational Institute of Athens, Greece.
K. Tselios is with the CMRI, University of Bolton, United Kingdom (e-mail: kt1cmr@bolton.ac.uk).
G. Economou is with the Physics Department, University of Patras, 26500, Greece (e-mail: economou@upatras.gr).

Abstract: This work presents a feature extraction method for verifying writers from their handwriting. The motivation comes from the need to enhance modern security applications, mainly those requiring real-time or near real-time processing, by implementing methods similar to those used in signature verification. In this context, we have employed a full sentence, written in two languages, with stable and predefined content. The novelty of this paper lies in the feature extraction algorithm, which models the connected pixel distribution along predetermined curvature and line paths of a handwritten image. The efficiency of the proposed method is evaluated with a combination of a first-stage similarity score and a continuous SVM output distribution. The experimental benchmarking of the new method against other state-of-the-art techniques found in the literature relies on ROC curves and Equal Error Rate estimation. The results provide a first-hand proof of concept that the proposed feature extraction method has strong discriminative power.

Index Terms: Writer Verification, Handwritten Sentences, Grid Features, ROC, EER

I. INTRODUCTION

Biometric recognition is an appealing method for securing numerous activities, including defense and economic transactions. Access to important resources can thus be granted with reduced vulnerability. Among other biometric features, online and offline handwriting, which is a subset of behavioral biometrics, has frequently been used for recognizing writers in either security or forensic applications [1], [2]. In recent years, writer identification and verification tasks have received considerable attention in the scientific community. A special case of writer verification uses context-based handwriting. The answer to the question "is this person who he claims to be?" is then provided by examining a predetermined text of known transcription. As stated by Siddiqi and Vincent [3], this kind of writer verification problem is similar to signature verification.

Although content-dependent approaches using well-defined semantics were used in the early years of writer recognition, there are at least three important reasons that justify the continued study of handwriting patterns other than signatures. Firstly, biometric verification schemes based on handwritten words or small sentences can potentially be used in real-world security applications, which are quickly emerging in a modern and continuously evolving mobile and Internet-based environment. Secondly, content-based retrieval systems could also benefit, since their users could query handwriting images with similar handwriting styles from various corpora [4]. Finally, an important reason emerges from the field of continuous verification [5]. By this we mean that handwritten patterns could be used to grant access to resources not only at a person's initial entrance but also within a cyclic, continuous verification loop throughout the entire use of the application. In order to explore writer verification tasks, a number of algorithms can be tested on well-established databases from the literature, such as IAM [6], Firemaker [7], CEDAR [8] and the Brazilian Forensic Letter database [9]. These databases carry rich handwriting information since they have a large sample size, e.g. 156 words and/or paragraphs. Their use might lead to awkward circumstances when issues like those arising in continuous verification schemes are considered. This can easily be seen with the following example: imagine a person who has to verify himself or herself by writing an entire letter in a relatively small amount of time. In order to cope with this situation, an alternative would be either to use a portion of the aforementioned databases or to employ the content of one small sentence, like the one provided by a database such as HIFCD1 [10].

In this work, we present a novel feature extraction method for writer verification based on the structured exploitation of the statistical pixel directionality of handwriting. This is achieved by counting, in a probabilistic way, the occurrence of specific pixel transitions along predefined paths within two pre-confined chessboard distances. Then, the handwritten elements described by their strokes, angles and arcs are modelled by fusing, at the feature level, two-step and three-step transitional probabilities. This is an extension of the work proposed in [11] for signature verification.


A two-stage classification scheme based on similarity measures and an SVM has been applied to the HIFCD1 corpus. The verification efficiency is evaluated by measuring the Equal Error Rate (EER) on the ROC curves, which is the point where the probability of misclassifying genuine samples is equal to the probability of misclassifying forgery samples. The EER is evaluated as a function of the word population. This is achieved by plotting the ROC curves each time we append a word for verification.

Finally, in order to benchmark our proposed method, comparisons are provided against recently described state-of-the-art methodologies for off-line signature verification pre-processing and feature extraction, as well as writer verification and feature extraction approaches. Within this context, we provide a feasibility study of the discriminative power of our method. This "feature benchmarking" concept can be justified by the fact that an ideal feature extraction method would make the classifier's job trivial, whereas an ideal classifier would not need a feature extractor [12]. Thus, by keeping the classifier stage fixed, feature extraction methods can be rated in a comparative way.

The rest of this work is organized as follows: Section II provides the database details and the description of the feature extraction algorithm. Section III presents the experimental verification protocol which has been applied. Section IV presents the comparative evaluation results, while Section V draws the conclusions.

II. DATABASE AND FEATURE EXTRACTION PROCEDURE

A. Database Description and Pre-Processing

In order to confirm the proposed method and evaluate our approach, we have employed the HIFCD1 handwritten corpus, which has been used formerly in the literature [10]. This corpus has been under re-enlistment and enrichment since its initial appearance in 2000. The database consists of two different small sentences, one written in Greek and the other in English. In addition to the first twenty persons who were enrolled in the past, another twenty persons have been enrolled later on, creating a temporary total set of forty persons. This database is under restructuring in order to increase its size and the diversity of its biometric samples (e.g. to include iris, fingerprints, gait, signatures, face, large-scale handwritten text etc.) to a level equivalent to that provided by modern databases like IAM [6] and BioSecure [13]. Each sentence was written by each writer 120 times. Consequently, 9600 sentences were recorded in our database, containing a total of 48000 words. Both linguistic forms of the sentences are presented in Fig. 1. The Greek language, being our native language, was used in order to maintain constant handwriting characteristics. The Greek sentence is made up of two small words of three letters, two medium-length words of seven letters and a lengthy word of eleven letters. Each word has been written in its own cell, thus making segmentation procedures trivial. For every word image of the corpus, pre-processing steps are applied in order to provide an enhanced image version with a maximized amount of utilized information. The pre-processing stage includes thresholding of the original handwritten image using Otsu's method [14] and thinning in order to provide a one-pixel-wide handwritten trace, which is considered to be insensitive to changes in pen parameters like size, colour and style. Finally, the bounding rectangle of the image is produced. It must be pointed out that we treat the handwritten image as a whole and do not perform any character segmentation. Next, an alignment is carried out for every bounded image.

Fig. 1. HIFCD samples

This stage gathers the useful intrapersonal information from all the samples of a writer inside a region that is considered to be the one that contains the most useful handwriting information [9], [11]. In this work, we have used the estimated coordinates of the centre of mass, x and y, for each image. Fig. 2 presents the above discussion graphically. In this work, the term 'most informative window' (MIW) of the handwritten pattern denotes the processed handwritten word sub-region inside the bounded image, centred at the x and y coordinates, while its length and width are determined empirically by trial and error.

Fig. 2. Original and pre-processed handwritten image with MIW

B. Feature Extraction

The feature extraction method maps the handwriting information, represented by the sequence of MIW words, to a feature vector which models handwriting by estimating the distribution of local features like orientation and curvature. The idea behind this originates from the simplest form of chain code. Analytically, chain code describes a set of eight two-pixel sequences and codes the succession of different orientations on the image grid. When sequences of three successive pixels are examined, line, convex and concave curvature features are generated. Since we do not utilize the features' order of appearance, the corresponding features which can be defined uniquely, beginning from a central pixel to another one, inside a chessboard distance equal to 2, are twenty-two (22). The enforcement of the symmetry condition limits the number of independent convex and concave features to 11. This subset is enriched with four line features describing the fundamental line segments of slope 0, 45, 90 and 135 degrees. This 15-dimensional feature space defines the new embedding space. Furthermore, we have partitioned the MIW image into a 2 × 2 sub-window grid, and the respective outputs have been fused at the feature level by simple appending.

Following the above idea, we explore an additional feature set by measuring the pixel paths which obey the following statement: find the four-pixel connected paths, while restraining the chessboard distance between the first and the fourth pixel to three and simultaneously restraining the chessboard distance between the first and the third pixel to two, ignoring the prior path selection that has taken place in the inner two-step transition. This provides a feature with dimensionality of 28, since we do not partition the image. The final feature vector is generated by appending, in a feature fusion way, the aforementioned two-step and three-step features. Its dimensionality equals 88 (four sub-images × 15 features + one image × 28 features) and it is depicted graphically in Fig. 3. Algorithmically, a rectangular grid of 4 × 7 dimension scans every input of the MIW word sequence. This mask aligns each aforementioned pixel with the {5, 3} coordinate, thus enabling 15 potential 2-step paths and 28 3-step paths from the central pixel according to the previous discussion. Then, the paths which are included in the feature set are marked and a counter updates the corresponding features found. Finally, the feature components are normalized by their total sum in order to provide a probabilistic expression.

Fig. 3. Feature extraction methodology. Example with activated feature components (represented in yellow circles). a) Basic feature generating mask within chessboard distance of two. b) The feature mask within chessboard distance of three, irrespective of the inner, two-step path.

III. CLASSIFICATION PROTOCOL

As described in Section II, the inputs to the classification system are the training and testing feature vectors, denoted hereafter as $\{v_{Tw}, v_{TSw}\}$. The training set $v_{Tw}$ is composed of the genuine and forgery vectors $\{G_{TW}, F_{TW}\}$ of each writer $W_i$, $i = 1, 2, \ldots, 40$. The $G_{TW}$ vectors model the genuine class population by means of their average value $\mu_{v_{GTW}}$ and standard deviation $\hat{\sigma}_{v_{GTW}}$. Next, the similarity scores of the genuine training vectors are evaluated by using the weighted distance of eq. (1) [12], and their pdf $S(v_{GTW} \mid W_i)$ is stored. A similar procedure, described by eq. (2), has been applied in order to derive the distribution of the similarity scores $S(v_{FTW} \mid W_i)$ for the case of the forgery training samples $\{F_{TW}\}$.

$$S(v_{GTW} \mid W_i) = \left( \sum_{j=1}^{88} \hat{\sigma}_{v_{GTW}}^{-2}(j) \left( G_{TW}(j) - \mu_{v_{GTW}}(j) \right)^{2} \right)^{-0.5} \quad (1)$$

$$S(v_{FTW} \mid W_i) = \left( \sum_{j=1}^{88} \hat{\sigma}_{v_{FTW}}^{-2}(j) \left( F_{TW}(j) - \mu_{v_{FTW}}(j) \right)^{2} \right)^{-0.5} \quad (2)$$

Following the first stage, a two-class support vector machine is employed in order to provide a mapping of the training similarity scores to another distance space, induced by the SVM. Accordingly, the inputs to the second stage are the genuine and impostor distribution scores $S(v_{GTW} \mid W_i)$ and $S(v_{FTW} \mid W_i)$. The output of the SVM is a continuous-valued distance of the unknown test input sample vector from the optimal separating hyperplane [24]. The mapping function has been represented by a Gaussian radial basis kernel function, selected after a number of trials.

The testing phase uses the remaining samples of the genuine and forgery sets, $\{v_{TSw}\} = \{G_{TSW}, F_{TSW}\}$. Thus, for each writer, the similarity scores evaluated from the samples of the testing set are presented as input to the second-stage SVM mapping function. A negative value of the SVM output indicates that the unknown feature vector lies below the optimal separating hyperplane and near the hyperplane which corresponds to the genuine class. On the other hand, a positive value denotes that the unknown input vector tends to fall towards the impostor class hyperplane [15]. Finally, the continuous SVM output models the overall distribution of both the genuine writers and the impostor ones. The selection of the training samples for the genuine class is accomplished using random samples with the hold-out validation method.

Evaluation of the verification efficiency of the system is accomplished with the use of a global threshold on the overall SVM output distribution. This is achieved by providing the system's False Acceptance Rate (FAR: samples not belonging to genuine writers, yet assigned to them) and False Rejection Rate (FRR: samples belonging to genuine writers, yet not classified as such) functions. With these two rates, the receiver operating characteristic (ROC) curves are drawn by means of their FAR / FRR plot. Then, classification performance is measured with the system Equal Error Rate (EER: the point at which FAR equals FRR).

IV. RESULTS

A. Benchmarking With Relative Feature Algorithms

We have benchmarked the proposed methodology against three other feature extraction methods for signature verification and writer identification, which can be found in the literature. The first is a texture-based signature verification approach, which is provided by Vargas, Ferrer, Travieso and Alonso [16]. Secondly, we examine the performance of a shape descriptor proposed by Aguilar, Hermira, Marquez and Garcia, which is based on the use of predetermined shape masks [17]. In all cases, the pre-processing as well as the feature extraction steps have been realized according to the descriptions provided by the respective authors. The third method uses the f1 contour direction pdf features and the f2 contour hinge features, which are part of the work proposed by Bulacu and Schomaker [18]. It is of great interest that the f2 feature is one of the most powerful descriptors for modelling handwriting. It must be noted that an appropriate pre-processing step has been carried out in order to provide the contours of the handwritten images.

B. Verification Results

According to the material presented in Section III, representation of the genuine class has been realized with various schemes by utilizing 5, 10, 15, 20, 25 and 30 samples for the $\{G_{TW}\}$ training set and 115, 110, 105, 100, 95 and 90 samples for the $\{G_{TSW}\}$ testing set. On the other hand, the $\{F_{TW}\}$ training set for the forgery class has been formed using one sample from each of the remaining writers, which results in 39 samples. The $\{F_{TSW}\}$ samples are formed by employing the remaining 119 (samples per writer) × 39 writers, resulting in a total number of 4641. The ROC curves, which are drawn as a function of the number of words and presented in Figs. 4-8, illustrate the classification efficiency of our method against those mentioned in the previous section. These curves have been evaluated for the last training scheme, i.e. 30 training samples for the $\{G_{TW}\}$ population and 90 testing samples for the $\{G_{TSW}\}$ population. Similar results regarding the evaluation taxonomy have been obtained.

Fig. 4. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results from one Greek word while the upper right uses a sequence of the first and second words.

Fig. 5. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results from one English word while the upper right uses a sequence of the first and second English words.

Fig. 6. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results by employing a sequence of the first three words of the Greek sentence while the upper right uses a sequence of the first four Greek words.

Commenting on the results, it can be inferred that our method provides a first-hand proof of concept of its enhanced writer verification capabilities. Another interesting issue is that the verification efficiency is enhanced when the number of words fed to the feature stage increases, which is intuitively correct. An additional comment is that the English sentence provides an improved EER when compared to the Greek sentence, even though Greek is our native language. This might be due to the fact that the text used in the English sentence incorporates lengthier words than the Greek one. Another explanation for the improved Latin EER measure could be that when Greeks, or individuals who do not have English as their native language, are forced to write in Latin, their response provides less spontaneous handwritten samples. This may have introduced less writer specificity in the data, which in turn provides higher verification rates. Although the results are quite encouraging, they must be further tested on larger databases and under a number of different feature and classification schemes. The best EER rates corresponding to Figs. 4-8 are presented in tabular form in Table 1.
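As a rough illustration of the grid idea in Section II.B, the sketch below counts neighbouring foreground pixel pairs along the four line orientations (0, 45, 90 and 135 degrees) in each cell of a 2 × 2 partition of a thinned binary window, and normalises each cell histogram to sum to one. It is a deliberately simplified stand-in: it does not reproduce the 15 two-step or the 28 three-step path features of the paper, and the names `orientation_histogram` and `grid_features` are hypothetical.

```python
import numpy as np

# (row, col) offsets for the four line orientations. Opposite directions are
# folded together because the order of appearance is not used.
ORIENTATIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def orientation_histogram(window):
    """Count adjacent foreground pixel pairs per orientation, normalised to sum 1."""
    window = np.asarray(window, dtype=bool)
    rows, cols = window.shape
    counts = np.zeros(len(ORIENTATIONS))
    for k, (dr, dc) in enumerate(ORIENTATIONS.values()):
        for r in range(rows):
            for c in range(cols):
                rr, cc = r + dr, c + dc
                if (window[r, c] and 0 <= rr < rows and 0 <= cc < cols
                        and window[rr, cc]):
                    counts[k] += 1
    total = counts.sum()
    # An empty cell yields an all-zero histogram instead of dividing by zero.
    return counts / total if total > 0 else counts


def grid_features(miw, grid=(2, 2)):
    """Append the per-cell histograms of a grid partition (feature-level fusion)."""
    miw = np.asarray(miw, dtype=bool)
    row_blocks = np.array_split(np.arange(miw.shape[0]), grid[0])
    col_blocks = np.array_split(np.arange(miw.shape[1]), grid[1])
    cells = [orientation_histogram(miw[np.ix_(rb, cb)])
             for rb in row_blocks for cb in col_blocks]
    return np.concatenate(cells)
```

With a 2 × 2 grid and four orientations this returns a 16-dimensional vector; the same appending pattern would give the 4 × 15 = 60 two-step components of the paper if the full path set were implemented.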


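The first-stage score of eq. (1) in Section III can be sketched as follows: the per-component mean and standard deviation are estimated from training vectors, and the score of a vector is the inverse square root of its variance-weighted squared distance from that mean. This is only a minimal sketch under assumptions: the function names, the use of the sample standard deviation (`ddof=1`) and the small floor `eps` on the standard deviation are not taken from the paper, and the second-stage SVM is not covered here.

```python
import numpy as np


def fit_class_model(train_vectors, eps=1e-6):
    """Per-component mean and standard deviation of a set of training vectors.

    The eps floor (an assumption) avoids division by zero for components
    that never vary in the training set.
    """
    train = np.asarray(train_vectors, dtype=float)
    mu = train.mean(axis=0)
    sigma = np.maximum(train.std(axis=0, ddof=1), eps)
    return mu, sigma


def similarity_score(v, mu, sigma):
    """Weighted-distance score in the form of eq. (1): (sum_j ((v_j - mu_j) / sigma_j)^2)^(-0.5)."""
    v = np.asarray(v, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    weighted_sq_dist = np.sum(((v - mu) / sigma) ** 2)
    if weighted_sq_dist == 0.0:
        return np.inf  # the vector coincides with the class mean
    return weighted_sq_dist ** -0.5
```

Larger scores correspond to vectors closer to the class mean, so a set of such scores can be collected per writer before being passed to a second-stage classifier.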
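The FAR, FRR and EER definitions of Section III can be made concrete with a short sketch. It follows the sign convention stated there (negative classifier outputs lean towards the genuine class), accepts a sample when its output does not exceed a global threshold, and takes the EER at the threshold where FAR and FRR are closest. The function names and the choice of candidate thresholds (all observed outputs) are assumptions for illustration only.

```python
import numpy as np


def far_frr(genuine, impostor, thresholds):
    """FAR and FRR for each global threshold; a sample is accepted when output <= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.array([(impostor <= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(genuine > t).mean() for t in thresholds])    # genuine rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER and the threshold at which |FAR - FRR| is smallest."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far, frr = far_frr(genuine, impostor, thresholds)
    k = int(np.argmin(np.abs(far - frr)))
    return (far[k] + frr[k]) / 2.0, thresholds[k]
```

Plotting `far` against `frr` over the swept thresholds gives the kind of FAR / FRR curve on which the EER is read off.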

Fig. 7. ROC curves and EER of the proposed and the competitive methods. The lower left part presents the results by employing a sequence of the first three words of the English sentence while the upper right uses a sequence of the first four English words.

[2] G. X. Tan, C. Viard-Gaudin, and A. C. Kot, "Automatic writer identification framework for online handwritten documents using character prototypes," Pattern Recognition, vol. 42, pp. 3313-3323, 2009.
[3] I. Siddiqi and N. Vincent, "Text independent writer recognition using redundant writing patterns with contour-based orientation and curvature features," Pattern Recognition, vol. 43, pp. 3853-3865, 2010.
[4] A. Bhardwaj, A. O. Thomas, Y. Fu, and V. Govindaraju, "Retrieving handwriting styles: A content based approach to handwritten document retrieval," in Proc. International Conference on Handwriting Recognition, Kolkata, India, 2010, pp. 265-270.
[5] T. Sim, S. Zhang, R. Janakiraman, and S. Kumar, "Continuous verification using multimodal biometrics," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, pp. 687-700, 2007.
[6] U.-V. Marti and H. Bunke, "The IAM-database: An English sentence database for off-line handwriting recognition," International Journal on Document Analysis and Recognition, vol. 5, pp. 39-46, 2002.
[7] M. Bulacu and L. Schomaker, "Forensic Writer Identification: A Benchmark Data Set and a Comparison of Two Systems," Technical Report, NICI, 2000.
[8] S. N. Srihari, S.-H. Cha, H. Arora, and S. Lee, "Individuality of handwriting," Journal of Forensic Sciences, vol. 47, pp. 1-17, 2002.
[9] R. K. Hanusiak, L. S. Oliveira, E. Justino, and R. Sabourin, "Writer verification using texture-based features," International Journal on Document Analysis and Recognition, DOI: 10.1007/s10032-011-0166-4, 2011.
[10] E. N. Zois and V. Anastassopoulos, "Fusion of correlated decisions for writer verification," Pattern Recognition, vol. 34, pp. 47-61, 2001.
[11] E. N. Zois, K. Tselios, E. Siores, A. Nassiopoulos, and G. Economou, "Off-Line Signature Verification Using Two Step Transitional Features," in Proc. 12th IAPR Conference on Machine Vision Applications, Nara,
                                                                                          Japan, 2011.
                                                                                     [12] R. O. Duda and P. E. Hart, Pattern classification. New York: John
                                                                                          Wiley and Sons, 2001.
                                                                                     [13] http://biosecure.it-sudparis.eu/AB/
                                                                                     [14] N. Otsu, "A threshold selection method from gray-level histogram",
                                                                                          IEEE Transactions on System, Man and Cybernetics, Vol. 8, pp.62-66,
                                                                                          1978.
                                                                                     [15] Lutz Hamel: "Kernel Knowledge discovery with support vector
                                                                                          machines", Wiley, New Jersey, 2009.
                                                                                     [16] J. F. Vargas, M. A. Ferrer, C. M. Travieso, and J. B. Alonso, "Off-line
                                                                                          signature verification based on grey level information using texture
                                                                                          features", Pattern Recognition, Vol. 44, pp. 375-385, 2011.
Fig. 8. ROC curves and EER of the proposed and the competitive methods.              [17] J. F. Aguilar, N. A. Hermira, G. M. Marquez and J. O. Garcia, "An off-
The lower left part presents the results by employing a sequence of the five              line signature verification system based of local and global information",
words of the Greek sentence while the upper right uses a sequence of the five             LCNS 3087, pp.295-306, 2004.
words of the English sentence.                                                       [18] M. Bulacu and L. Schomaker, "Text-independent writer identification
                                                                                          and verification using textural and allographic features," IEEE
                               REFERENCES                                                 Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, pp.
                                                                                          701-717, 2007.
[1]   R. Plamondon and S. N. Srihari, "On-line and off-line handwriting
      recognition: A comprehensive survey," IEEE Transactions on Pattern
      Analysis and Machine Intelligence, vol. 22, pp. 63-84, 2000.

                                                  TABLE I
          CLASSIFICATION EFFICIENCY (%) BASED ON THE EQUAL ERROR RATE DERIVED FROM FIGS. 4-8

  Each cell lists the values for the sequences of words
  1st / {1st & 2nd} / {1st & 2nd & 3rd} / {1st & 2nd & 3rd & 4th} / {all}.

  Feature Extraction Method      English Sentence                         Greek Sentence
  Proposed work                  15.53 / 6.05 / 5.92 / 4.90 / 4.08        22.78 / 11.13 / 9.21 / 7.14 / 5.71
  Feature proposed by [16]       13.54 / 11.10 / 9.08 / 7.69 / 6.92       15.04 / 12.29 / 10.99 / 9.76 / 8.96
  f1 feature proposed by [18]    29.81 / 21.06 / 19.46 / 18.41 / 14.12    29.78 / 28.08 / 26.49 / 23.85 / 21.98
  f2 feature proposed by [18]    20.22 / 12.72 / 11.36 / 7.48 / 5.58      26.55 / 17.72 / 17.57 / 12.41 / 10.82
  Feature proposed by [17]       28.95 / 28.19 / 24.64 / 19.07 / 16.90    32.30 / 30.44 / 29.18 / 28.47 / 27.63
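
The values in Table I are equal error rates read from ROC curves. The sketch below shows, under stated assumptions, how such a figure can be obtained from verification scores; it is not the authors' evaluation code. It assumes similarity scores in which a higher value means "more likely the same writer", and the function name `equal_error_rate` and the toy scores are hypothetical, chosen only for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a claim is accepted when score >= threshold.
    FAR = fraction of impostor scores accepted,
    FRR = fraction of genuine scores rejected.
    Returns (eer, threshold) at the point where |FAR - FRR| is smallest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of FAR and FRR at the closest crossing point.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (hypothetical, not taken from the paper).
genuine = [0.9, 0.8, 0.7, 0.6, 0.35]
impostor = [0.1, 0.2, 0.3, 0.4, 0.65]

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {100 * eer:.2f}% at threshold {threshold}")
```

With only a handful of scores the FAR and FRR curves rarely cross exactly, so the sketch reports their mean at the closest point; with larger score sets, interpolating between neighbouring thresholds would be a reasonable refinement.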



