<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Offline Handwriting Acquisition under Controlled and Uncontrolled Conditions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Linda Alewijnse</string-name>
          <email>l.alewijnse@nfi.minvenj.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Netherlands Forensic Institute, Department of Digital Technology and Biometrics, The Hague</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper gives a description of offline handwriting acquisition under controlled and uncontrolled conditions for research purposes. The data collection task is an underestimated part of the process of developing signature verification or handwriting identification systems. There is a continuous need for new, unpublished data to train and evaluate new algorithms. Handwriting samples that make up the current publicly available databases have all been collected under controlled conditions. However, good quality data is still limited.</p>
      </abstract>
      <kwd-group>
        <kwd>offline data</kwd>
        <kwd>data collection</kwd>
        <kwd>signature verification</kwd>
        <kwd>forensic handwriting examiner</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>Signature verification is a biometric technique that shows
promise for implementation in forensic handwriting examination
in the near future. Over the past 10 years, rapid developments
have been made within the pattern recognition discipline [1].
Implementing analysis tools in forensic practice is the next
challenge. Before an automated signature verification or
handwriting identification system can be implemented, the
forensic community must be assured that the systems have been
trained, evaluated and validated under the correct
environmental conditions.</p>
      <p>Collecting and selecting handwriting samples for research
purposes is often an underestimated task. The number of
publicly available databases with handwriting is limited, so
new data must be collected regularly. Data are primarily
collected to provide information regarding a specific topic.
Therefore, data must be in accordance with the objective of
the study. The overall performance of a biometric technology
is ultimately influenced by the quality of the input data.</p>
    </sec>
    <sec id="sec-2">
      <title>A. Learning from the past</title>
      <p>
        The following example illustrates the importance of
sample design and sample selection to suit the purpose of the
study. In 2002, Srihari and colleagues [2] conducted a study to
test the principle of individuality of handwriting. Handwriting
samples were collected from 1500 individuals. The dataset
was representative of the U.S. population with respect to
gender, age, ethnicity, handedness, etc. The automated system
CEDAR-FOX was used to evaluate the handwriting, and could
identify the writer of a particular sample with 98 percent
confidence. Extrapolating these statistics to the entire U.S.
population, the authors inferred that writer identification can
be established with 96 percent confidence.
      </p>
      <p>Saks [3] commented on this study by arguing that, to test
individuality, a better sampling design would have been to
gather a representative sample of clusters of writers, with each
cluster composed of highly similar writers. Only then would the
data have been discriminative of highly similar handwriting,
and the finding would have been repeatable if the same effect
was observed between the clusters. The choice of data by Srihari
and colleagues was not adequate for testing the hypothesis that
handwriting is individual.</p>
      <p>In response to this, Durina and colleagues [4] conducted
a study in which samples of writing were obtained from 52
writers and their teachers who had been taught the same copybook
style at the same Catholic elementary school approximately four
decades earlier. The research addressed the criticism that earlier
studies on the individuality of handwriting did not include
populations from homogeneous writing communities. It
demonstrated that there is a high degree of inter-writer
variation, even in populations where the driving forces for
variation are low. Despite the limited size of the dataset,
it was better suited to investigating the uniqueness of
handwriting.</p>
    </sec>
    <sec id="sec-3">
      <title>B. Learning from each other</title>
      <p>From 2009 until 2013, various datasets containing
signatures as well as handwriting were collected by the
Netherlands Forensic Institute for the Signature Competition
(SigComp) [5]. This competition allows researchers and
practitioners from academia and industry to compare
performance on signature verification on new and unpublished
datasets. Because all participating parties in the competition
are provided with the same data, results are comparable. While
the competition provides an overview of the parties involved and
shows the performance of the available systems to the forensic
community, pattern recognition researchers are more
concerned with which features are most discriminative.
SigComp provides a platform to bridge the gap between the
two communities.</p>
      <p>Two years ago, in 2011, a group of researchers from
different fields of expertise started a discussion about how to
bridge the gap between the two communities and to identify the
challenges. Computer programmers learned how a forensic
handwriting examination is carried out, and examples of real
casework were described. Forensic scientists received an overview
of state-of-the-art automatic verification systems. Recent
advances include comparing performance by means of the Minimum
Cost of Log Likelihood Ratios [6], the task of reporting a
probabilistic output score, and the addition of disguised
signatures to new datasets. Nevertheless, much work remains
to be done to bring together researchers in the
field of automated handwriting analysis and signature
verification and experts from the forensic handwriting
examination community.</p>
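      <p>The cost of log likelihood ratios mentioned above can be illustrated with a short Python sketch. It computes the commonly used log-likelihood-ratio cost (Cllr) from two lists of likelihood ratios, one for same-writer comparisons and one for different-writer comparisons. The function and parameter names are assumptions made for illustration, and the minimum variant named in the text additionally requires an optimal recalibration of the scores, which is not shown here.</p>
      <preformat>
```python
import math


def cllr(same_writer_lrs, different_writer_lrs):
    """Cost of log likelihood ratios for two lists of likelihood ratios."""
    same = list(same_writer_lrs)
    different = list(different_writer_lrs)
    if not same or not different:
        raise ValueError("both lists must contain at least one likelihood ratio")
    if not all(lr > 0 for lr in same + different):
        raise ValueError("likelihood ratios must be positive")
    # Same-writer comparisons are penalised when the likelihood ratio is small.
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in same) / len(same)
    # Different-writer comparisons are penalised when the likelihood ratio is large.
    penalty_different = sum(math.log2(1 + lr) for lr in different) / len(different)
    return 0.5 * (penalty_same + penalty_different)
```
      </preformat>
      <p>Under this definition, a system that always reports a likelihood ratio of 1 obtains a cost of exactly 1, and lower values indicate more informative and better calibrated output.</p>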
      <p>The scope of the competition changes each year. In the end,
automated systems are meant to aid the forensic handwriting
examiner (FHE) in the examination or to serve as an objective
tool. The first competition focused on skilled forgeries. After
that, disguised signatures were added to the questioned
signatures. Last year we provided different scripts, i.e. Dutch
and Chinese signatures. The changing focus of the competition
allows developers to improve their algorithms and to benefit
from new and unpublished handwriting data.</p>
      <p>II. OBJECTIVE</p>
      <p>Three scenarios for handwriting data collection can be
distinguished: 1) samples are collected under controlled
conditions, e.g. participants write on the same make of
paper, with the same writing instrument, in a similar writing
position, etc.; 2) spontaneous writings are collected from
participants by gathering their writings from the past; and 3)
forensic handwriting samples from casework are shared, either
anonymously or through an online evaluation platform.</p>
      <sec id="sec-3-1">
        <title>Topics that are covered in this paper are:</title>
        <p>• offline and online data
• requirements of the dataset
• controlled versus uncontrolled conditions
• research data versus forensic data</p>
        <p>The first part of the paper describes the most favorable and
pragmatic approach for offline handwriting sample collection.
The second part stresses the importance of data collection
under uncontrolled conditions. Furthermore, this paper calls
for exploring the possibilities of using forensic datasets to
further develop automated systems.</p>
      </sec>
      <sec id="sec-3-2">
        <title>III. METHOD</title>
        <p>Two modalities for capturing a person’s handwriting can be
distinguished, namely offline and online. The online modality
is discussed only briefly here, because these data are not
available to the forensic handwriting examiner. Online data are
useful for biometric identification and for finding new features
or feature combinations that are most discriminative. Handwriting
examiners will be particularly interested in offline systems, and
offline data acquisition is therefore described in more detail.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A. Online data</title>
      <p>Online data collection requires an electronic writing tablet
and recording software. WACOM tablets are most often used
to collect handwriting samples, but since pen-input devices
are becoming more widespread, this might change in the short
term. The online handwriting is stored digitally as x-, y- and
z-positions as a function of time.</p>
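      <p>As a minimal sketch of how such a recording might be represented, the following Python code stores each sample as a time stamp with x-, y- and z-values and derives the pen speed in the writing plane. The class and function names are illustrative assumptions, and the units and the meaning of the z-channel depend on the tablet and the recording software.</p>
      <preformat>
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PenSample:
    t: float  # time stamp in seconds (assumed unit)
    x: float  # horizontal position in tablet units
    y: float  # vertical position in tablet units
    z: float  # third channel as stored by the tablet


def pen_speeds(samples):
    """Return the speed in the x-y plane between consecutive samples."""
    speeds = []
    for previous, current in zip(samples, samples[1:]):
        dt = current.t - previous.t
        if not dt > 0:
            raise ValueError("time stamps must be strictly increasing")
        distance = math.hypot(current.x - previous.x, current.y - previous.y)
        speeds.append(distance / dt)
    return speeds
```
      </preformat>
      <p>Dynamic features of this kind are directly available in online data, whereas in offline data they can only be inferred from the static trace.</p>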
    </sec>
    <sec id="sec-5">
      <title>B. Offline data</title>
      <p>Offline handwriting data is a representation of the
handwriting as a scanned image. It has been demonstrated
[7] that FHEs can infer dynamic information, such as writing
velocity and pen pressure, from the static trace. Writing
velocity is reflected in line quality, pen pressure differences
and blunt beginnings and endings of strokes. The pen pressure
is not useful to the examiner as an absolute measure, since it
is not only writer specific but also depends strongly on
extrinsic factors. It is only writer specific if other
conditions, such as writing surface and writing instrument, are
constant. The indentation of the paper shows the handwriting
examiner whether the ink was deposited in a natural course of
writing or by forced writing.</p>
      <p>For offline data collection all that is needed is a pen, a
piece of paper and a scanner. To aid the writer, a guiding line
or box can be used. The easiest and most practical solution is
to use an underlying sheet of paper with the lineation or boxes
printed in a black, bold line. No lineation or bounding boxes
may strike through the writing. In this way, the data is kept
‘clean’ and less effort is needed for data preparation.</p>
      <p>The requirements for a high-quality offline dataset of
handwritten samples are summed up below. A formal data
collection process is necessary as it ensures that gathered data
are both defined and accurate and that decisions based on
arguments embodied in the findings are valid [7].</p>
      <p>The first list shows which dataset requirements are
advised for training and evaluating automated systems. The
second list contains additional requirements that are important
for forensic handwriting researchers. This information is
necessary for forensic handwriting examiners to gain a better
understanding of the data used in experiments. In general, the
data must reflect the variation of handwriting in the relevant
population, and intra-writer variation must represent
reality.</p>
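      <p>Some of the pattern recognition requirements listed in this section, namely the scan resolution, the PNG format and the identification code as filename, lend themselves to an automated check. The Python sketch below assumes a hypothetical filename scheme (a writer number, ref or que for reference or questioned material, and a sequence number) and assumes that the scanner records the resolution in the pHYs chunk of the PNG file. Neither assumption comes from the datasets described here.</p>
      <preformat>
```python
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_DPI = 400
# Hypothetical identification scheme: W + writer number, ref/que, sequence number.
FILENAME_PATTERN = re.compile(r"W\d{3}_(ref|que)_\d{2}\.png")


def png_dpi(data):
    """Return (x_dpi, y_dpi) from the pHYs chunk of PNG bytes, or None if absent."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while len(data) - offset >= 8:
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        if chunk_type == b"pHYs" and len(body) == 9:
            x_per_unit, y_per_unit, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 means pixels per metre; otherwise only a ratio
                return None
            return round(x_per_unit * 0.0254), round(y_per_unit * 0.0254)
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs, if present, precedes the image data
        offset += 12 + length  # length, type and CRC fields plus the body
    return None


def check_scan(filename, data):
    """Return a list of requirement violations for one scanned sample."""
    problems = []
    if FILENAME_PATTERN.fullmatch(filename) is None:
        problems.append("filename does not follow the identification scheme")
    try:
        dpi = png_dpi(data)
    except ValueError:
        problems.append("file is not in PNG format")
        return problems
    if dpi is None:
        problems.append("resolution is not recorded in the file")
    elif MIN_DPI > min(dpi):
        problems.append("resolution is below 400 dpi")
    return problems
```
      </preformat>
      <p>An empty list returned by check_scan means that none of the three checks failed. The remaining requirements, such as the number of writers or the writer metadata, concern the collection as a whole and are not covered by this sketch.</p>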
      <sec id="sec-5-1">
        <title>Pattern recognition data requirements:</title>
        <p>• Substantial number of specimen writers
• Substantial number of simulators
• High-resolution scans of the written samples,
preferably 400 dpi
• Suitable file format, preferably PNG. This lossless
format retains image information when files are
re-opened and re-saved, and it also produces small
files without the quality loss of a GIF file
• Cropping of the image
• Identification code assigned as filename
• Compatibility with earlier collections</p>
        <p>Additional forensic requirements:</p>
        <p>• Writer sex, age, handedness, level of education, and
profession
• Cultural origin (for signatures) or copybook system
(for handwritten text)
• Substantial amount of questioned writing (e.g. half a
page of text)
• Substantial amount of reference writing (number of
reference signatures or number of lines of text)
• Specification of the conditions of forgery and/or
disguise
• Time span over which the data was collected</p>
        <p>IV. FORENSIC HANDWRITING DATA</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Collecting existing specimens</title>
      <p>One way of acquiring relevant data is to collect existing
writings. Such handwriting can consist of signatures on
agreements, receipts, cheques, passports, etc. In short, it
can comprise handwriting that is comparable to the
reference material in casework. All factors that are considered
by forensic handwriting examiners are present in the dataset:
natural variation in the writings, different surfaces, different
writing instruments, different time periods, and samples written
under different mental circumstances. Both intrinsic and
extrinsic factors are represented. Participants are not
asked to write something, but provide the researcher
with their previously written material.</p>
      <p>Ideally, forensic casework data would be used to
evaluate and validate automated systems, but legal aspects
regarding privacy form an obstacle. One possible solution for
sharing forensic samples is to provide access through an online
evaluation platform. BEAT [8], a project funded by
the European Commission under the Seventh Framework
Programme, offers such an approach. The goal of the
project is to propose a framework of standard operational
evaluations for biometric technologies. Unfortunately, it is not
yet available for forensic biometrics.</p>
      <p>Simulated data can be used in the training phase of system
development, because the ground truth of the origin is known.
The evaluation phase should at least contain case-related data.
The validation of the system, however, should be performed
entirely with real casework samples.</p>
      <p>Whereas biometric systems usually have access to
high-quality and uniform data, in forensic practice the trace
under investigation is often characterized by poor quality. This
is not represented in the currently existing handwriting
databases.</p>
      <p>Since input data determines the overall performance of the
automated system, a next step in bridging the gap between the
pattern recognition community and forensic handwriting
examiners should logically involve the use of samples that
were written under uncontrolled circumstances. The condition
of the data affects a system’s performance on the trace under
investigation and accordingly influences the strength of the
evidence.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Caligiuri</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          , “
          <article-title>The Neuroscience of Handwriting: Applications for Forensic Document Examination</article-title>
          ,” CRC Press,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>S.N.</given-names>
            <surname>Srihari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S-H</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Arora</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          , “Individuality of handwriting”,
          <source>J Forensic Sci</source>
          , vol.
          <volume>47</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>856</fpage>
          -
          <lpage>872</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Saks</surname>
          </string-name>
          ,
          <article-title>Authors' Response in the J Forensic Sci</article-title>
          , vol.
          <volume>48</volume>
          (
          <issue>4</issue>
          ),
          <year>July 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M.E.</given-names>
            <surname>Durina</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.P.</given-names>
            <surname>Caligiuri</surname>
          </string-name>
          , “
          <article-title>The Determination of Authorship from a Homogenous Group of Writers,”</article-title>
          <source>Journal of ASUDE</source>
          , vol.
          <volume>12</volume>
          , nr.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>M.I.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liwicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Alewijnse</surname>
          </string-name>
          , and W. Ohyama, “
          <article-title>ICDAR2013 Competitions on Signature Verification and Writer Identification for On- and Offline Skilled Forgeries (SigWiComp2013)</article-title>
          ,” in press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Liwicki</surname>
          </string-name>
          et al., “
          <article-title>Signature Verification Competition for Online and Offline Skilled Forgeries (SigComp2011)</article-title>
          ,” in
          <source>Document Analysis and Recognition (ICDAR), 2011 International Conference on</source>
          , pp.
          <fpage>1480</fpage>
          -
          <lpage>1484</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>D.</given-names>
            <surname>Meuwly</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.N.J.</given-names>
            <surname>Veldhuis</surname>
          </string-name>
          , “
          <article-title>Forensic biometrics: From two communities to one discipline</article-title>
          ,”
          <source>IEEE Conference publications BIOSIG 2012</source>
          , Darmstadt, Germany, pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>Sep 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Information available at www.beat-eu.org</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>