Proceedings of the 1st International Workshop on Automated Forensic Handwriting Analysis (AFHA) 2011

Forensic vs. Computing writing features as seen by Rex, the intuitive document retriever

Vlad Atanasiu

Abstract: The paper reveals that the match between script features as understood by forensic experts and by computer scientists is only superficial, and advocates the development of computational instruments tailored to the features traditionally used by the forensic community. In particular, in forensics as well as in other areas of graphonomics and among the general public, there is a demand for software that analyzes intuitive features, think “slant” or “roundness,” as opposed to analytical features, like “Fourier transform” or “entropy.” Rex, a software with such a capability, is introduced and used to explore the potential of this approach for script forensics. An investigation of the properties of script contour orientation, the feature used by Rex, is also presented.

Index Terms: script features, contour orientation, computational graphonomics, handwriting forensics.

Manuscript received August 15, 2011. V. Atanasiu was with Télécom ParisTech, Paris, France. He is now with Kabikadj, Paris, France (phone: +33 143 254 811, email: atanasiu@alum.mit.edu, website: http://alum.mit.edu/www/atanasiu/).

I. Introduction

In this paper I wish to discuss the distinction between the typical forensic and computer science writing features (Section II), introduce a software that takes their specifics into account (Section III) and investigate the behavior of the feature used by this software (Section IV). The overall goal of the paper, besides the immediate benefits derived from the individual topics, is to provide food for thought about the challenges of building software adapted to forensic applications.

II. Forensic vs. Computing writing features

Semiotics: That much forensic handwriting expertise is subjective and would profit from mathematics and computing in its quest for objectivity and replicability is publicly admitted [1], but the less advertised side of reality is that of software insisting on treating its users to feasts of mathematics and technology without actually meeting their needs [2]. At the root of this dialogue of the deaf lies, among other interesting factors of the sociology of science, the very term “writing feature.” For forensic experts a “feature” is usually intuitively comprehensible, such as “slant” [3], while for computer scientists the most powerful “features” are mathematical concepts, like “Fourier components” or “fractal dimension,” whose properties require specialized knowledge to be understood. Developing measurement software for intuitive features not only gives forensic professionals tools which they know how to handle, but also allows them to communicate about their work, an essential aspect with respect to testimony in court. Intuitive features additionally benefit the design of computer systems, improving the ergonomics of user interfaces, as exemplified in Section III.

Cognition: An interesting viewpoint on the debate over intuitive and analytic features is to consider mathematics as an evolutionary outcrop of the neural computing capacities of the brain. Intuition is evolutionary, unconscious learning through interaction with the environment, which conscious analysis supplements when novelties arise. Thus the two can be envisioned as a continuum, with mathematics progressively becoming intuitive.

Sociology: To think that the divergence of the two feature types is a function of the level of mathematical education is to overlook a fundamental distinction. Writer identification and verification are the main motives of computational handwriting forensics, and because only results count here, it can use any method, even without thorough understanding, as long as it performs better. This evolutionary, goal-focused black-box mindset is confronted by the knowledge-oriented crystal-ball attitude of traditional graphonomical research, which adds to the control tasks mentioned above a considerable interest in the handwriting ecosystem, i.e. the structures and dynamics of handwriting features across populations and the underlying factors: material, cognitive, biomechanical, sociocultural.

Linguistics: The issues with the term “feature” extend further, to a worldview that inconspicuously cloaks its users. The proposition “This font is Roman” is considered in philosophy either as the expression of a property owned by the font (objectivism) or as a property attributed to the font by an observer (subjectivism) [4], [5]. The difference is one of lifestyle: the world is there for truth to be discovered or for models to be invented. Translated to the lexical level, this is what distinguishes the terms “feature” and “descriptor,” among their numerous handwriting-related synonyms [3]. To this author “descriptor” seems more appropriate, since it doesn’t presuppose anything about the object (it just is) and it’s easier and more fun to be critical about a model than about a truth. Incidentally, while “feature” prevails in graphonomics, “descriptor” has a foothold in the wider pattern recognition community, as witnessed by a wording like “shape descriptor.”

Implications: Computer scientists have to consider, in concert with forensic experts, three issues worth mentioning because they bear on how the software presented later in the paper is to be used. The issues are the desired precision of the analysis, the definition of the features and the affordability of analyzing them in the current state of the art. I will illustrate them through two visual examples.

Precision: Fig. 1 presents three bitmap circles of various sizes for which the orientation along their contour is measured (details in Section III). Since they are circles, we would expect all orientations to be equally well represented, but due to the discrete nature of the underlying raster in which the shapes live, the distribution is biased towards the orthogonal directions: it peaks at 0 and 90 degrees ([6], [7]; for hexagonal grids see [8]). Making a model of the distortion and applying it to arbitrary orientation profiles should solve the issue, but it turns out that the distortion is shape specific. For example, a vertical line has no distortion at all, so there is no need for correction. A somewhat better choice is to increase the image resolution at capture time or afterwards, with the drawback of generating voluminous files, and knowing that often only low resolution images are available. This digital geometry problem is compounded upstream by the design of discrete Gaussian filters for orientation measurement [9], and downstream by digitization, the same physical document producing different shapes at pixel level depending on its alignment with the digital grid of the imaging system, hence affecting the replicability of results [10], [11]. A number of techniques address these issues [12]–[15], but the implications for handwriting analysis have yet to be fully explored, starting with the question of how much precision is needed for which application. High accuracy graphonomics is therefore an area open to investigation.

Fig. 1. Contour orientation profiles: Look carefully at the enlargement of the 128 pixels diameter circle and you’ll see four horizontal and vertical pixels in a row: a bitmap shape representation has more pixels in these directions than warranted by the ideal shape. The distortion decreases with object size. The ordinate values reveal that the bias is small: its amplitude is ~0.002, while for a typical written document the mean is ~0.025 and the maximum ~0.04 (see Fig. 3 and [16]–[18]).

Definition: I now discuss the slant of three Roman script characters as perceived by a human and raise the question of how this simple feature should be defined. In the case of I the slant is vertical and corresponds to the shape’s axis of equilibrium through its center of gravity: here the slant is a physical property of the object. For an O there is no way to tell how the character is oriented if the baseline is unknown: here slant is a property of the object relative to its surroundings. The slant of y can be considered upright only if we are able to identify the shape as the character “y” and are aware of the convention that this lower case letter has to be considered vertical despite its physical lean to the right: this is a case of semantic slant. A deeper examination might reveal even more criteria. In conclusion, a slant analysis algorithm implementing human expert behavior appears to be more challenging than suspected, first because of the very difficulty of defining the feature, and second because of the mix of perceptual and cultural considerations to be modeled.

Affordability: The last sentence leads to the issue of affordability: do we have the technological means to perform comprehensive slant analysis, given that it requires recognizing unconstrained handwritten characters? Since this task is not presently solved, a positive answer can be given only if we are happy with a certain degree of imprecision, the exact amount of which remains to be determined. Some of the fine computational forensic expertise that we would wish to attain is thus still out of reach.

III. Rex, the intuitive document retriever

Rationale: Written documents in databases can be retrieved by appearance by one of the following methods: visual (using a reference document), semantic (describing script features), haptic (by drawing) and exogenous (from document ecosystem metadata). Semantic retrieval is convenient because it is intuitive (it takes place via a graphical and natural-language interface), free of any preexisting model (which is not always available) and able to describe individual aspects of a script (contrary to the holistic approach of visual retrieval). The software that grew out of these considerations, called Rex, suits the demand for tools supporting forensic-specific features as described above (Fig. 2) [16]–[18].

Fig. 2. Rex screenshot: After selecting an intuitive script feature (left picture, showing also the underlying mathematical measurements and instruments), users obtain a list of documents ranked according to the quantitative value of the feature, in this case “roundness” (right picture, giving the file and writer id too). The document and a mouse-over zoom with pixels color-coded by orientation are presented, as well as the orientation profile and a hyperlink to the original document.

Technicalities: The software measures the local orientation along the writing contour, a popular computational graphonomics feature [19]. This is done by applying to the binary image of the contour an anisotropic Gaussian filter bank with one degree of radial displacement between filters. At this stage of this well-known approach two innovations are introduced, in addition to the fine-grained resolution. First, after deriving the probability density function from the frequency count of the orientations, statistical properties of the distribution are obtained. Second, it was discovered that these statistics correlate with various script features of the intuitive type, perceived as distinct from one another, such as “slant,” “roundness” or “density” (Fig. 3). To sum up, Rex behaves like a handy, multipurpose Swiss army knife.

Fig. 3. Pixels to vectors to scalars to concepts: Prospecting for intuitive writing descriptors by extracting various statistical parameters of a global measurement. The colormap of the script samples (P02-081 and L01-199 of [20]) encodes the contour orientation at each pixel location, red for example being horizontal.

Applications: The Swiss reference is not fortuitous, since the handwriting documents presently used by Rex originate in that country (IAM Handwriting Database 3.0 [20]). This shows again the surprising versatility of the tool, in that it is not only a document browser but also a teaching tool about handwriting. In addition to letting users learn about individual documents, Rex provides an insight into the make-up of a population of writers, that of the canton of Bern, from where most of the dataset writers hail (Fig. 4). The question that immediately springs to mind, “Do writers from other parts of the world have similar characteristics?”, is typical of the richness of research and pedagogical possibilities opened by such an instrument (indeed, the few Greek, Chinese and other foreigners among the contributors show scriptural characteristics apart from the Swiss majority). Although the present usage of Rex is limited to browsing a specific dataset and much development can be imagined, it is nevertheless an intriguing tool to experiment with as a testbed for other computational forensic applications.
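The measurement chain described under Technicalities (local contour orientation, then a probability density, then summary statistics) can be sketched in a few lines. This is only an illustrative approximation under stated assumptions: it uses plain finite-difference gradients rather than the anisotropic Gaussian filter bank used by Rex, the function names orientation_profile and profile_statistics are hypothetical, and the three statistics shown are generic choices, since the paper does not state which statistic maps to which intuitive feature.

```python
import numpy as np


def orientation_profile(image, bins=180):
    """Return a probability density of contour orientations over [0, 180) degrees.

    Assumption: finite-difference gradients stand in for the anisotropic
    Gaussian filter bank described in the text.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)            # derivatives along rows, then columns
    weight = np.hypot(gx, gy)            # nonzero only next to the contour
    # The contour tangent is perpendicular to the gradient; fold to [0, 180).
    theta = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
    hist, _ = np.histogram(theta, bins=bins, range=(0.0, 180.0), weights=weight)
    total = hist.sum()
    return hist / total if total > 0 else hist


def profile_statistics(pdf):
    """Summarise a profile with axial circular statistics (angles are doubled)."""
    centers_deg = (np.arange(pdf.size) + 0.5) * (180.0 / pdf.size)
    z = np.sum(pdf * np.exp(2j * np.radians(centers_deg)))
    nonzero = pdf[pdf > 0]
    return {
        # Mean orientation of the contour, in degrees within [0, 180).
        "dominant_orientation_deg": (np.degrees(np.angle(z)) / 2.0) % 180.0,
        # 0 when all orientations are equally present, 1 for a single one.
        "concentration": float(np.abs(z)),
        # Shannon entropy of the profile, in bits.
        "entropy_bits": float(-np.sum(nonzero * np.log2(nonzero))),
    }


# Usage: a vertical bar has a single contour orientation (near 90 degrees),
# while a bitmap disc spreads its contour over all orientations.
bar = np.zeros((64, 64))
bar[:, 20:30] = 1.0
bar_stats = profile_statistics(orientation_profile(bar))

yy, xx = np.mgrid[:128, :128]
disc = ((xx - 63.5) ** 2 + (yy - 63.5) ** 2 <= 40.0 ** 2).astype(float)
disc_stats = profile_statistics(orientation_profile(disc))
```

Plotting the disc profile returned by orientation_profile is one way to look at the raster bias discussed under Precision: the density of a bitmap circle is not perfectly flat.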
Fig. 4. “1539 Swiss Mountains”: Superposition of all orientation profiles of the Swiss IAM Offline Database 3.0. We might be aware that few people have a left-slanted handwriting, but this visualization makes the phenomenon visible and measurable: there are few and only small peaks on the left-hand side of the image. It is also apparent that peaks on the right are narrower, hence slanting a handwriting reduces its roundness, making it more linear: a shear transform takes place.

IV. Properties of the orientation feature

While contour orientation is a concept easy enough to grasp, it has a number of less apparent properties with implications for expertise work. They reveal why studies find that orientation is not the best performing biometric instrument [19].

Rotation: The feature is evidently not rotation invariant, meaning that the same document will have different measurement profiles depending on, for example, the skew of the paper in a scanner (Fig. 5.1–2). However, the difference is only a translation of the profile, so the bias can be corrected.

Organization: Contour orientation exhibits some unusual cases of shape invariance, all deriving from its low sensitivity to the spatial organization of pixels, which is due to the fact that, by definition, the measurement is made locally. It is thus possible to have perceptually different shapes with the same orientation profile. Fig. 5.5 demonstrates scrambling invariance.

Localization: The various pieces of information that can be read in the global orientation profile can’t be traced to specific locations in the written document. If there is, say, slant variation in a particular line, we see it in the profile, but we can’t locate the line in question, nor even know whether the variation is concentrated in one line or spread over the entire document.

Convexity: For 180° shape rotations the profiles are identical, leading to shape confusion (Fig. 5.3–4).

Neighborhood: Fig. 5.6 shows that lines and circles in certain configurations can look the same to the orientation instrument: it is unaware of the neighborhood.

Additivity: Shapes contribute linearly to profiles, facilitating combinatorial pattern simulations from primitives.

Fig. 5. Cases of confusion: (1, 2) A rotation of shapes (in blue) is equivalent to a translation of the orientation profile (in red). (3, 4) Rotation by 180° or (5) breaking up a shape doesn’t affect the orientation profile beyond quantization errors. (6) The bitmaps row shows a spiral and 10 ray bundles, each bundle being rotated by 1° with respect to its neighbor, covering the entire angular sensitivity spectrum of the measurement instrument. Despite the perceptual pattern difference (one linear, the other curly) the orientation profiles are similar, especially when seen at the scale of the writing of Fig. 3 (the differences become visible when zooming in).

V. Conclusions

I conclude by recalling that forensic and computational script features are usually not identical, that they need to be thoroughly explored to be safely used, and that public software such as Rex, introduced here, offers excellent learning opportunities.

References

[1] B. Found and D. Rogers, “The Probative Character of Forensic Document Examiners’ Identification and Elimination Opinions on Questioned Signatures,” in Proc. 13th Conf. of the Intl. Graphonomics Society, Melbourne, Australia, 2007, pp. 171–174.
[2] R.J. Verduijn, C.E. van den Heuvel, R.D. Stoel, “Forensic Requirements for Automated Handwriting Analysis Systems,” in Proc. 15th Conf. of the Intl. Graphonomics Society, Cancun, Mexico, 2011, pp. 132–135.
[3] R.A. Huber, A.M. Headrick, Handwriting Identification: Facts and Fundamentals. Boca Raton, FL: CRC, 1999, pp. 89–91.
[4] D.H. Mulder, “Objectivity,” The Internet Encyclopedia of Philosophy, September 6, 2004 [Online]. Accessible: http://www.iep.utm.edu/objectiv/ [Accessed: August 3, 2011].
[5] Ch. Swoyer and F. Orilia, “Properties,” The Stanford Encyclopedia of Philosophy, July 2011 [Online]. Accessible: http://plato.stanford.edu/entries/properties/ [Accessed: August 3, 2011].
[6] G. Gonzato, F. Mulargia and M. Ciccotti, “Measuring the fractal dimension of ideal and actual objects: implications for application in geology and geophysics,” Geophysical J. Intl., vol. 142, 2000, pp. 108–116.
[7] R. Klette and A. Rosenfeld, Digital Geometry, San Francisco, CA: Morgan Kaufmann, 2004.
[8] R.C. Staunton and N. Storey, “A Comparison Between Square and Hexagonal Sampling Methods for Pipeline Image Processing,” in SPIE Conf. Optics, Illumination, and Image Sensing for Machine Vision, Philadelphia, PA, USA, 1989, pp. 142–151.
[9] E.R. Davies, “Design of optimal Gaussian operators in small neighbourhoods,” Image and Vision Computing, vol. 5(3), 1987, pp. 199–205.
[10] B. Nagy, “An algorithm to find the number of the digitizations of discs with a fixed radius,” Electronic Notes in Discrete Mathematics, vol. 20, 2005, pp. 607–622.
[11] M.N. Huxley and J. Žunić, “Different Digitisations of Displaced Discs,” Foundations of Computational Mathematics, 2006, pp. 255–268.
[12] F. de Vieilleville and J.-O. Lachaud, “Comparison and improvement of tangent estimators on digital curves,” Pattern Recognition, vol. 42(8), 2009, pp. 1693–1707.
[13] D. Coeurjolly and R. Klette, “A Comparative Evaluation of Length Estimators of Digital Curves,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26(2), 2004, pp. 252–258.
[14] L.J. van Vliet and P.W. Verbeek, “Curvature and bending energy in digitized 2D and 3D images,” in Proc. 8th Scandinavian Conf. on Image Analysis, Tromsø, Norway, 1993, pp. 1403–1410.
[15] S.-Ch. Pei and J.-W. Horng, “Fitting digital curve using circular arcs,” Pattern Recognition, vol. 28(1), 1995, pp. 107–116.
[16] V. Atanasiu, L. Likforman-Sulem, N. Vincent, “Rex, a description-based retriever for written documents,” April 2011 [Online]. Accessible: http://glyph.telecom-paristech.fr [Accessed: June 30, 2011].
[17] V. Atanasiu, L. Likforman-Sulem, N. Vincent, “Talking Script. Retrieval of written documents by description of script features,” Gazette du Livre Medieval, to be published.
[18] V. Atanasiu, L. Likforman-Sulem, and N. Vincent, “Writer Retrieval–Exploration of a Novel Biometric Scenario Using Perceptual Features Derived from Script Orientation,” in Proc. 11th Intl. Conf. on Document Analysis and Recognition, Beijing, China, 2011, to be published.
[19] M.L. Bulacu, “Statistical pattern recognition for automatic writer identification and verification,” Ph.D. dissertation, Artif. Intell. Inst., Univ. of Groningen, The Netherlands, 2007.
[20] U. Marti and H. Bunke, “The IAM-database: an English sentence database for off-line handwriting recognition,” Intl. J. on Document Analysis and Recognition, vol. 5, 2002, pp. 39–46. Available: http://www.iam.unibe.ch/fki/databases/iam-handwriting-database

Vlad Atanasiu was born in Timişoara, Romania, 1970. He passed the entry examination to the Telecommunication Faculty, Polytechnic University, Timişoara, in 1989 and went on to switch careers thanks to the fall of the Iron Wall, earning a B.A. in Middle Eastern Studies and an M.A. in Linguistics from University of Provence, Aix-en-Provence, France, in 1994 and 1996 respectively, and a Ph.D. in Paleography, Art History and History from École pratique des Hautes Études, Paris, France, in 2003. He took classes in Cognitive Science at the Massachusetts Institute of Technology, Cambridge, USA, during post-doctoral studies, 2003–2005.

He was Intern in the Vision Science psychophysical department of Lighthouse International, New York, USA, 2004–2005; Senior Scientist for Image Processing and Geographical Information Systems as well as Coordinator of the European project “Bernstein. The Memory of Papers. Image-Based Paper Analysis and History” at the Austrian Academy of Science, Vienna, Austria, 2006–2009; and Research and Development Engineer for handwriting analysis and document retrieval at Télécom ParisTech, Paris, France, 2010–2011. He is now CEO of Kabikadj, Paris, France. He published the book Letter frequencies and their influence on Arabic calligraphy, Paris: L’Harmattan, 1999.