Oriented Local Binary Patterns for Writer Identification

Anguelos Nicolaou, Marcus Liwicki and Rolf Ingold

Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, 3012 Bern, Switzerland. Email: anguelos.nicolaou@gmail.com
Document, Image and Voice Analysis (DIVA) Group, University of Fribourg, Bde des Perolles 90, Fribourg, Switzerland. Email: firstname.lastname@unifr.ch

Abstract—In this paper we present an oriented texture feature set and apply it to the problem of offline writer identification. Our feature set is based on local binary patterns (LBP), which were broadly used for face recognition in the past. These features are inherently texture features. Thus, we approach the writer identification problem as an oriented texture recognition task and obtain remarkable results, comparable to the state of the art. Our experiments were conducted on the ICDAR 2011 and ICFHR 2012 writer identification contest datasets. On these datasets we investigate the strengths of our approach as well as its limitations.

I. INTRODUCTION

A. Local Binary Patterns

Local binary patterns (LBP) were broadly popularized in 2002 with the work of Ojala et al. [1] as a texture feature set extracted directly from grayscale images. As Ojala demonstrated, the histogram of specific binary patterns is a very powerful feature set. LBP are inherently texture features, but they have been used in a very broad range of applications in Computer Vision (CV), many of which exceed typical texture recognition tasks. In 2004, Ahonen et al. [2] successfully used LBP for face recognition. In 2007, Zhao et al. [3] extended the operator to a 2D-plus-time voxel version of LBP, called VLBP, and used it successfully for facial gesture recognition. In 2009, Wang et al. [4] combined LBP features with HOG features to address the problem of partial occlusions in human detection.

B. Writer Identification

While graphology, i.e. the detection of personality traits based on handwriting, has been associated with bad science [5] and has failed to provide experimentally sound significant results [6], handwriting style can be considered an invariant attribute of the individual. Writer identification has traditionally been performed by Forensic Document Examiners using visual examination. In recent decades there has been an effort to automate the process and codify this knowledge into automated methods. In 2005, Bensefia et al. [7] successfully used features derived from statistical analysis of graphemes, bigrams, and trigrams. In 2008, He et al. [8] used Gabor-filter-derived features, and in 2010 Du et al. [9] introduced LBP on the wavelet domain. Even though the method of Du uses LBP for feature extraction in writer identification, the similarities end there. Our method makes no assumptions specific to handwriting and treats the problem as a generic oriented binary texture classification problem. The extent to which handwriting contains invariant characteristics of the writer is an open question. While forensic document examiners have been tested in detecting disguised handwriting by Bird et al. [10], Malik et al. [11] have started to address the issue of different writing styles for automated offline writer identification systems. It remains an open question whether handwriting style can provide us with real biometric markers, invariant to the sample acquisition conditions. By preserving the generic attributes of our method, we can safely avoid addressing many complications that are specific to handwriting analysis and writer detection.

II. LBP FEATURE SET

Although writer identification seems to require scale invariant features, scale sensitive features might be suited as well. Writers tend to write at a specific size, so the scale of the texture tends to depend directly on the sampling rate. The task of writer identification is almost always performed with respect to a dataset, where the sampling rate is defined or at least known when performing feature extraction. It is feasible, and probably worth the effort, to resample all text images to a standard sampling resolution, rather than improvising a scale invariant feature set. Our feature set, as is the norm, is derived from the histogram of occurring binary patterns.
A. The LBP operator

LBP were defined in [1] as a local structural operator, operating on the periphery of a circular neighborhood. LBP are encoded as integers, which in binary notation map each sample on the periphery to a binary digit. As can be seen in Fig. 1 and (2), LBP are defined by the radius of the circular neighborhood and the number of pixels sampled on the periphery. The sampling neighborhood N_{r,b} is formally defined in (1):

    ∀n, φ : n ∈ [0..b−1] ∧ φ = 2πn/b
    N_{r,b}(I(x,y), n) = I(x + sin(φ)·r, y + cos(φ)·r)    (1)

Given a binary operator f(x1, x2) : R^2 → {0, 1}, the LBP operator is defined in (2):

    LBP_{r,b,f}(x,y) = f(N_{r,b}(I(x,y), b−1), I(x,y))·2^{b−1}
                     + f(N_{r,b}(I(x,y), b−2), I(x,y))·2^{b−2}
                     + … + f(N_{r,b}(I(x,y), 0), I(x,y))·2^0    (2)

Fig. 1: Indicative LBP operators: LBP_{1,4} (a), LBP_{1,8} (b), LBP_{1.5,8} (c), LBP_{2,8} (d), LBP_{2,12} (e), LBP_{2,16} (f), LBP_{3,8} (g), LBP_{3,16} (h). Dark green represents pixels with 100% contribution, green represents pixels with 50%, light green pixels with 25%, and black is the reference pixel.

When defined on grayscale images, LBP are obtained by thresholding each pixel on the periphery by the central pixel. Because we work on binary images as input, many more operations than greater-or-equal (thresholding) are possible as the binary operation. We generalized our definition of LBP in (2) to consider the boolean operator, marked as f, a third defining characteristic of the operator LBP_{r,b,f}, along with the radius r and the number of samples b.

We took several factors into account when selecting the appropriate LBP binary operator. Concerning the bit count, a bit count of 8 presents many benefits. Implementation-wise, the LBP transform is an image that uses one byte per pixel. Its histogram has 256 bins, providing a high feature-vector dimensionality and good discriminative properties. Additionally, a distinct LBP count of 256 guarantees highly representative sampling on relatively small surfaces of text.
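As an illustration, the sampling of (1) and the bit encoding of (2) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes nearest-pixel sampling (no interpolation) and wrap-around borders, and uses pixel equality as the binary operator, as discussed for binary input images.

```python
import numpy as np

def lbp_transform(img, radius=1.0, bits=8):
    """Generalized LBP transform of a binary image (eq. 1 and 2).

    Each output pixel encodes, as an integer, the comparison of the
    central pixel with `bits` samples on a circle of `radius`.
    """
    out = np.zeros(img.shape, dtype=np.uint8)
    for n in range(bits):
        phi = 2.0 * np.pi * n / bits                  # eq. (1)
        dx = int(round(np.sin(phi) * radius))         # nearest pixel,
        dy = int(round(np.cos(phi) * radius))         # no interpolation
        # shift the image so neighbor (dx, dy) aligns with the center
        neighbor = np.roll(np.roll(img, -dx, axis=0), -dy, axis=1)
        out |= ((neighbor == img).astype(np.uint8) << n)  # "equals" as f
    return out
```

Note that with this operator, any pixel whose whole neighborhood equals it (uniform foreground or uniform background) maps to the all-ones pattern 255.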
B. The LBP function

While LBP are traditionally derived from grayscale images, when dealing with text it is better to use binarized text images as input, thus avoiding all information coming from the text background. We considered many different binary operations and chose the binary operator "equals" (3) as f() in (2):

    f(x_center, x_periphery) = 1 if x_center = x_periphery, 0 if x_center ≠ x_periphery    (3)

"Equals" as a boolean function on an image is true for any background pixel in the peripheral neighborhood of a background pixel, true for any foreground pixel in the peripheral neighborhood of a foreground pixel, and false for everything else. When using the "equals" function as the binary function in an 8-bit-count LBP, all pixels with only foreground or only background in their neighborhood have an LBP value of 255. By suppressing (ignoring) the 255 bin, we make the LBP histogram surface invariant. All occurrences left in the histogram represent pixels on the border between foreground and background. The core of the feature set comprises the 255 histogram bins normalized to a sum of 1. This normalization renders the features derived from the histogram invariant to the number of signal pixels in the image.

C. Redundant Features

Having the normalized 255 bins of the histogram as the core of the feature set, we calculate some redundant features that amplify aspects of the LBP we consider significant for the writer identification task. Our goal is a feature set discriminative enough to work well with naive classifiers such as nearest neighbor or, even more, to classify writers by clustering the samples without any training.

The first redundant feature group we use is edge participation. We consider each pattern to have a specific probability of belonging to an edge of a specific orientation; from now on we call that probability its contribution. The sum of the occurrences of each pattern, multiplied by its contribution factor, makes up the oriented edge occurrences. In Fig. 2a all top-edge patterns can be seen along with their contributions; in Fig. 2b we see the top-left-edge patterns and their contributions, which are derived from the top-edge patterns by rotating them counter-clockwise. By rotating the contributing patterns of the top edge, we obtain the contributing patterns of all eight edge orientations. We also add the more general edge orientations horizontal, vertical, ascending, and descending as separate features, each calculated as the sum of the respective pair of edge orientations. Finally we calculate the two aggregations perpendicular and diagonal, which are the sums of horizontal and vertical, and of ascending and descending, respectively. In total we obtain 14 edge features, which we then normalize to a sum of 1. One of our aims when introducing these redundant features is to enhance characteristics that have been associated with writer identification, such as text slant.

Fig. 2: LBP edge patterns. In (a) the top-edge contributing patterns and in (b) the top-left-edge contributing patterns can be seen. Contribution: black 100%, dark gray 50%, gray 25%, and light gray 12.5%.
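The aggregation of oriented edge occurrences described above can be sketched as follows. The actual contribution table must be read off Fig. 2a, so the `top_edge_contrib` argument and the mapping of rotation steps to orientation names are illustrative assumptions:

```python
import numpy as np

def rotate_pattern(p, k, bits=8):
    """Rotate the bits of an LBP pattern left by k positions."""
    return ((p << k) | (p >> (bits - k))) & ((1 << bits) - 1)

def edge_features(hist, top_edge_contrib):
    """14 edge-participation features from a 256-bin LBP histogram.

    top_edge_contrib maps top-edge patterns to contribution factors
    (placeholders for the values of Fig. 2a); the assignment of
    rotation steps to orientations below is an assumption.
    """
    oriented = np.zeros(8)  # one score per 45-degree edge orientation
    for k in range(8):
        for pattern, weight in top_edge_contrib.items():
            oriented[k] += weight * hist[rotate_pattern(pattern, k)]
    horizontal, vertical = oriented[0] + oriented[4], oriented[2] + oriented[6]
    ascending, descending = oriented[1] + oriented[5], oriented[3] + oriented[7]
    feats = np.concatenate([oriented,
                            [horizontal, vertical, ascending, descending,
                             horizontal + vertical,        # perpendicular
                             ascending + descending]])     # diagonal
    total = feats.sum()
    return feats / total if total else feats  # normalize to a sum of 1
```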
The second redundant feature group we implemented is the rotation invariant hashes. We grouped all patterns so that each pattern in a group can be transformed into any other pattern of that group by rotation. For an 8-sample LBP, there are 36 distinct rotation invariant patterns in total [1]. Some pattern groups contain only one pattern, e.g. pattern 0, while other groups contain up to 8 patterns, such as the patterns with a single bit set: 1, 2, 4, 8, 16, 32, 64, 128. We took the number of occurrences of each group in the input image and normalized them to a sum of 1, thus providing 36 rotation invariant features. A feature group complementary to the rotation invariant patterns is what we named the rotation phase. For each group, we took the pattern with the minimum numeric value and designated it as the group-hash. The number of clockwise rotations a pattern needs in order to become its group-hash is what we call its rotation phase. By definition, the distinct phases in an LBP image are as many as the number of samples of the LBP. The frequencies of all phases, normalized to a sum of 1, provide 8 more redundant features that are complementary to the rotation invariant hashes.
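A group-hash and rotation phase can be computed by taking the minimum over all bit rotations of a pattern; assuming that a clockwise rotation corresponds to a right bit shift, a sketch could look like:

```python
def group_hash_and_phase(pattern, bits=8):
    """Rotation-invariant group hash and rotation phase of an LBP pattern.

    The group hash is the minimum value among all bit rotations of the
    pattern; the phase is the number of rotations needed to reach it.
    """
    mask = (1 << bits) - 1
    best, phase = pattern, 0
    p = pattern
    for k in range(1, bits):
        p = ((p >> 1) | (p << (bits - 1))) & mask  # rotate one step
        if p < best:
            best, phase = p, k
    return best, phase
```

Over all 256 eight-bit patterns this yields exactly 36 distinct group hashes, matching the count reported above.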
A third group of redundant features we introduced to our feature set is what we called the beta function, defined in (4), along with the bit count of every pattern:

    ∀n ∈ [1..bitcount], ∀lbp ∈ [0..2^bitcount − 1]:
    d(lbp, n) = 1 if bit n is set in lbp and bit n−1 is not set in lbp, 0 otherwise
    β(lbp) = Σ_n d(lbp, n)    (4)

When the sample count is 8, the β function has up to 5 distinct values. The histogram of the β function (5 bins) normalized to a sum of 1, and the histogram of the bit count of every pattern, likewise normalized to 1, are the last redundant feature group we defined. The β function becomes an important feature when the LBP radius is greater than the pen stroke thickness. In such situations, e.g., a β count of one indicates the ending of a line, and a β count of three or four indicates lines crossing.

Putting it all together, we have the 255 histogram bins, plus 36 rotation invariant features, plus 8 rotation phase features, plus 14 edge features, plus 5 β function features, plus 9 sample-count features, for a total of 327 features; this is the proposed feature set. The redundant features make the set well suited for naive classifiers. By setting the 255 histogram bin to 0, the feature set ignores all non-signal areas in the image. The normalization of all bins to a sum of 1, as well as the nullification of the last bin, renders our feature set invariant with respect to non-signal (white) areas.
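A direct transcription of (4), assuming 0-indexed bits so that n ranges over [1, bits−1]:

```python
def beta(lbp, bits=8):
    """Beta of eq. (4): the number of 0->1 transitions encountered while
    scanning the bits of an LBP pattern from bit 0 upward."""
    return sum(1 for n in range(1, bits)
               if (lbp >> n) & 1 and not (lbp >> (n - 1)) & 1)
```

For 8-bit patterns, β ranges over 0 to 4, giving the 5 distinct values mentioned above (the alternating pattern 0b10101010 attains the maximum of 4).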
D. The Classifier

Once we transform a given set of images into feature vectors, we can either use them in a nearest neighbor classifier or perform clustering on them. While clustering seems the more generic approach, it is constrained by the need to process all samples at the same time. Such a constraint makes the clustering approach very well suited for research purposes but hard to match to real world scenarios. The construction of the classifier consists of four steps. In the first step, we extract the image features. In the second step, we rebase the features along the principal components of a given dataset by performing principal component analysis. This step might, in a very broad sense of the term, be considered training, because our method acquires information from a given dataset. In the third step we scale the rebased features by a scaling vector, which was defined by evolutionary optimization on the train set. The optimization process is also performed on a given dataset and should likewise be considered a training stage. While not required, it makes more sense for both training steps to be performed on the same dataset. The fourth and last step is to calculate the L1 norm on the scaled and rebased feature vectors. Steps two and three can be combined into a linear operation on the feature space and in many respects should be viewed as a statistically derived heuristic matrix. Our classifier, as implemented, has two inputs, a subject dataset and a reference dataset. The output consists of a table where each row refers to a sample in the subject dataset and contains all samples of the reference dataset ranked by similarity to that sample. When benchmarking classification rates of our method, we can simply run our classifier with an annotated dataset as both subject dataset and reference dataset. In this case, the first column contains the subject sample and the second column contains the most similar sample in the dataset other than itself. The rate at which the classes in the first column agree with the classes in the second column is the nearest neighbor classification rate.

E. Scale Vector Optimisation

Describing the optimization process of the scaling vector in detail would go beyond the scope of this paper. In brief, we optimized using an evolutionary algorithm. We used as input the 125 most prominent components of the features and the id of the writer of each sample. We optimized on the ICFHR 2012 writer identification competition dataset [13], which contains 100 writers contributing 4 writing samples each. Individuals of the algorithm were modeled as vectors of continuous scaling factors, one for each feature in the feature space. The fitness function was based on the classification rate a nearest neighbor classifier obtains when the feature space is scaled by each individual. The stopping criterion was set to 2000 generations, and each generation had 20 individuals. Suitable parents were determined by the rank they obtained in their generation.
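The four-step classifier can be sketched as below; this is an illustrative reconstruction, not the authors' code. `train_transform` folds steps two and three into a single matrix (the PCA mean is omitted because a common shift cancels in pairwise L1 differences), and `rank_references` implements step four.

```python
import numpy as np

def train_transform(train_feats, scale_vector):
    """Combine PCA rebasing and feature scaling into one matrix
    (steps two and three of the classifier)."""
    centered = train_feats - train_feats.mean(axis=0)
    # principal components via SVD; the rows of vt are the components
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.diag(scale_vector) @ vt

def rank_references(subject, reference, matrix):
    """For each subject sample, rank all reference samples by L1 distance
    in the rebased and scaled feature space (step four)."""
    s = subject @ matrix.T
    r = reference @ matrix.T
    dists = np.abs(s[:, None, :] - r[None, :, :]).sum(axis=2)  # L1
    return np.argsort(dists, axis=1)  # per row: reference indices, nearest first
```

Running the classifier with the same annotated set as subject and reference, each row's first entry is the sample itself and the second entry is its nearest neighbor, as in the benchmarking setup described above.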
III. EXPERIMENTAL PROCEDURE

In order to have a proper understanding of our method's performance, its robustness, and its limitations, we conducted a series of experiments. We used two datasets: the dataset from the ICDAR 2011 writer identification contest [12], hereafter the 2011 dataset, and the dataset from the ICFHR 2012 writer identification challenge [13], hereafter the 2012 dataset. The 2011 dataset has 26 writers contributing samples in Greek, English, German, and French with 8 samples per writer. The 2012 dataset has 100 writers contributing samples in Greek and English with 4 samples per writer. While the 2011 dataset was given as the train set for the 2012 contest, we used them in the opposite manner. In order to avoid overfitting during the optimization step, we deemed the "harder" dataset, containing more classes and fewer samples per class, better suited for training.

A. Performance

As previously described, our method consists of four stages: feature extraction, principal component analysis, scaling vector optimization, and L1 distance estimation. Stages two and three require a training dataset, while stages one and four are totally independent of any data. TABLE I shows analytical scores of our method in various modalities. Apart from the nearest neighbor accuracy we also report the hard TOP-N and soft TOP-N criteria [12], [13]. The soft TOP-N criterion is the percentage of samples in the test set that have at least one sample of the same class among their N nearest neighbors. The hard TOP-N criterion is the percentage of samples in the test set that have only samples of the same class among their N nearest neighbors. In more detail, TABLE I shows various versions of our method and their performance, as well as some state of the art methods for reference. The methods Tsinghua, MCS-NUST, and Tebessa [14] are the top performing methods of the ICDAR 2011 writer identification contest. We must point out that our method had a vastly superior train set, consisting of 400 samples, and that we had access to the test set while working. Our method has two parts that were optimized on our train set, the 2012 dataset: the principal components of the train set and the scaling of the feature space. "No PC, No train" is the raw feature space without any training, just the features in an L1 nearest neighbor setup. "PC, No train" is the feature space rebased along the principal components of the train set in an L1 nearest neighbor setup. "PC, Train" is the feature space rebased along the principal components of the train set and scaled along the optimized vector in an L1 nearest neighbor setup. As we can see, our method almost reaches the overall performance of the state of the art when it incorporates the fully trained heuristics, but it also provides very good results in its untrained form.

TABLE I: Performance results. Various modalities of our method on the 2011 dataset [12], with state of the art methods for reference.

    NAME            | Nearest Neighbor | Hard Top-2 | Hard Top-3 | Hard Top-5 | Hard Top-7 | Soft Top-5 | Soft Top-10
    Tsinghua        | 99.5%            | 97.1%      | NA         | 84.1%      | 44.1%      | 100%       | 100%
    MCS-NUST        | 99.0%            | 93.3%      | NA         | 78.9%      | 38.9%      | 99.5%      | 99.5%
    Tebessa         | 98.6%            | 97.1%      | NA         | 81.3%      | 50.0%      | 100%       | 100%
    No PC, No train | 96.63%           | 87.02%     | 79.33%     | 63.94%     | 28.84%     | 98.56%     | 99.04%
    PC, No train    | 98.56%           | 91.35%     | 84.62%     | 68.27%     | 34.62%     | 98.56%     | 98.56%
    PC, Train       | 98.56%           | 95.19%     | 91.83%     | 84.13%     | 50.48%     | 99.04%     | 99.04%
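The soft and hard TOP-N criteria defined above can be computed from the classifier's ranking table as follows (a sketch; `order` is assumed to exclude each sample itself, as in the benchmarking setup described earlier):

```python
def top_n_rates(order, labels, n):
    """Soft and hard TOP-N criteria from a ranking table.

    order[i] lists sample indices ranked by similarity to sample i
    (self excluded); labels holds the class of each sample.
    Soft: at least one of the n nearest shares sample i's class.
    Hard: all of the n nearest share sample i's class.
    """
    soft = hard = 0
    for i, ranked in enumerate(order):
        neighbor_classes = [labels[j] for j in ranked[:n]]
        soft += labels[i] in neighbor_classes
        hard += all(c == labels[i] for c in neighbor_classes)
    return soft / len(order), hard / len(order)
```

With n = 1 the two criteria coincide with the nearest neighbor classification rate.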
B. Qualitative Experiments

Apart from providing a comprehensive accuracy score that is comparable to other methods, in order to describe the strengths and limitations of our method we performed a series of experiments that simulate frequently appearing distortions of the data.

1) Rotation: Text orientation is a text image characteristic that is definitely affected by the writer. Under controlled homogeneous experimental conditions of data acquisition, text orientation should depend only on the writer. Quite often in real life scenarios we have no way of knowing whether an image has been rotated, nor to what extent. One of the important characteristics of a writer identification system is robustness against moderate rotation. We address this issue with an experiment in which we try to recognize samples of a dataset with rotated versions of the database. More specifically, we took the 2012 dataset and rotated its samples in steps of 1° from −20° to 20°. We obtained our measurements by classifying the original 2012 dataset against the rotated versions. Fig. 3 demonstrates the rotation sensitivity of our method with two different measurements. The first, noted as Sample Self Recognition, is the nearest neighbor accuracy including the test sample; the Sample Self Recognition rate is by definition 100% when no rotation occurs. The second, marked as Nearest Neighbor, is the accuracy of the nearest neighbor excluding the first occurrence; Nearest Neighbor is by definition the accuracy when no rotation occurs. As can be seen in Fig. 3, our method demonstrates some rotation tolerance from −5° to +5° with sustainable accuracy rates, but performance drops significantly beyond this limit (samples rotated by more than 5° could be manually corrected during sample acquisition). It is also worth noticing that −1° and +1° rotations perform slightly worse than −2° and +2°; a possible explanation could be aliasing phenomena.

2) Downsampling: As stated previously, in most real world scenarios the sampling resolution will be known to a writer identification system, but not always controlled, as sometimes the data are acquired by external sources or at different times. We devised an experiment that demonstrates the behavior and limitations of our method with respect to resolution. We took the ICDAR 2011 writer identification dataset and rescaled it to various scales, from 100% down to 10%. As can be seen in Fig. 4, we obtained three measurements. The first, marked as Self Recognition Unscaled Sample, is the nearest neighbor accuracy when classifying the initial dataset with the subsampled dataset as a database. The second, marked as Nearest Neighbor Unscaled Sample, is the second nearest neighbor accuracy when classifying the initial dataset with the subsampled dataset as a database; we presume that the first nearest neighbor will always be the same sample at different scales and therefore disregard it for this measurement. The third, named Nearest Neighbor Scaled Sample, is the accuracy of the second nearest neighbor when classifying the scaled dataset with the scaled dataset as a database. The first two measurements describe the sensitivity of our method in comparing samples of different sampling resolutions, and therefore scales, while the third demonstrates how well our method would work on datasets of lower resolution. We should also point out that the optimization process was performed on the original resolution. As we expected and as can be seen in Fig. 4, our method has no tolerance for comparing samples from different sampling rates. We can also conclude that our method tolerates lower than standard resolutions, but benefits mostly from higher resolutions. The out-of-the-norm measurement in Nearest Neighbor Scaled Sample posed us with a puzzle; the most probable explanation is that it is related to aliasing, but it is worth investigating further.

Fig. 3: Rotation sensitivity. Fig. 4: Resolution/scale sensitivity. Fig. 5: Grapheme quantity sensitivity.
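The paper does not state how the rescaling of the downsampling experiment was implemented; a crude nearest-neighbor rescale such as the following sketch would reproduce the setup, though a real experiment would use proper interpolation to limit the aliasing effects mentioned above:

```python
import numpy as np

def rescale_binary(img, factor):
    """Crude nearest-neighbor rescaling of a binary image to `factor`
    of its original size (a stand-in for proper resampling)."""
    h, w = img.shape
    ys = (np.arange(int(h * factor)) / factor).astype(int)
    xs = (np.arange(int(w * factor)) / factor).astype(int)
    return img[np.ix_(ys, xs)]
```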
3) Removing Graphemes: A very important characteristic of writer identification methods is how much text is required to reach the claimed accuracy. We conducted an experiment to answer specifically this question. Our strategy was to create groups of datasets that vary only in the amount of signal (text) and then compare results on these datasets. As the primary dataset we took the ICDAR 2011 writer identification dataset, because it provides relatively large text samples. In order to quantify the available signal, we took the 2011 dataset and, for each image, produced 20 images retaining different amounts of connected components from the original image. Due to the very high locality of our feature set, the fact that we removed connected components instead of text lines should be negligible, and at the same time it gave us much finer control over the signal quantity. As can be seen in Fig. 5, the results are quite surprising: instead of a gradual drop in performance, the performance is unaffected down to 30% of the graphemes; below that point, performance drops linearly.

4) Writer vs Writing Style: We submitted an earlier version of our method to the SigWiComp2013 competition. The goal of the writer identification part of the competition is to measure the performance of writer identification systems when the handwriting style has been altered. A sample dataset was made available by the organizers of the competition. The dataset contained 55 writers contributing 3 text samples each, with each sample written in a different writing style. Having access to the sample dataset, we performed a simple experiment to determine whether our features encapsulate writer biometric information or simply the writing style. We separated the dataset of 165 samples into left and right halves. We then performed a pair matching of the left halves to the right halves based on nearest neighbor classification. We obtained two measurements: first, the percentage of left samples whose assigned right sample was written by the same writer (55 classes), and second, the percentage of left samples having the specific sample's complementary right half as the nearest neighbor (165 classes). The writer identification rate was 87.27%, while the specific sample recognition rate was 86.06%. By definition the writer identification rate is greater than or equal to the sample recognition rate. We performed a one-tailed t-test on the results of the 165 sample classifications and obtained a p-value of 0.3734, which by all standards makes the two recognition rates indistinguishable. This experiment indicates that for our method, any two samples written in different writing styles are equally different regardless of whether they were written by the same individual or not. From a forensic perspective, these measurements imply that our method does not distinguish between disguised writing style and natural writing style.
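The two rates of the pair-matching experiment can be computed from a ranking of right halves per left half; this is an illustrative sketch, and the variable names (`writer`, `pair`) are hypothetical:

```python
def pair_matching_rates(order, writer, pair):
    """Writer identification rate vs sample recognition rate for the
    left/right half pair-matching experiment.

    order[i] ranks right-half indices by similarity to left half i;
    writer[j] is the writer id of right half j (left half i shares
    writer[i]); pair[i] is the index of left half i's complementary
    right half.
    """
    n = len(order)
    writer_hits = sum(writer[r[0]] == writer[i] for i, r in enumerate(order))
    sample_hits = sum(r[0] == pair[i] for i, r in enumerate(order))
    return writer_hits / n, sample_hits / n
```

By construction the first rate can never be lower than the second, consistent with the inequality stated above.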
We obtained two untrained classifier (96.63%) to the state of the art (99.5%) measurements, first the percentage of left-samples having an is quite unfair towards our method. On the other hand, a assigned right-sample written by the same writer (55 classes), comparison of our trained classifier (98.56%) to the state of and second the percentage of left-samples having the spe- the art (99.5%) is a bit unfair towards the state of the art. In cific sample’s complementary right-half as the nearest neigh- the authors opinion, a fair comparison would be a lot closer to bor (165 classes). The writer identification rate was 87.27%, the trained classifier than to the untrained. The performance of while the specific sample recognition rate was 86.06%. By the untrained classifier demonstrates clearly the potency of our definition the writer identification rate is greater or equal to feature set. The qualitative experiments were not performed with forensics in mind, except for the last one, writer vs [4] X. Wang, T. X. Han, and S. Yan, “An hog-lbp human detector with writing style. In writer vs writing style we tried to determine partial occlusion handling,” in Computer Vision, 2009 IEEE 12th the extent to which our feature set can deal with disguising International Conference on. IEEE, 2009, pp. 32–39. writers; the quick answer is, no our method can not deal with [5] G. A. Dean, I. W. Kelly, D. H. Saklofske, and A. Furnham, “Graphology disguising writers. There are many subtleties in the conclusions and human judgment.” 1992. that can be drawn from the writer vs writing style experiment [6] A. Furnham, “Write and wrong: The validity of graphological analysis,” about what phenomena is that our features model. One could The Hundreth Monkey and Other Paradigms of the Paranormal, pp. even say that our method is more about texture similarity than 200–205, 1991. about writer similarity; assuming there are biometric features [7] A. Bensefia, T. Paquet, and L. 
Heutte, “Handwritten document analysis in handwriting, the proposed feature set does not seem to for automatic writer recognition,” Electronic letters on computer vision encapsulate them. From a software engineering perspective the and image analysis, vol. 5, no. 2, pp. 72–86, 2005. approach of treating writer identification as a distance metric [8] Z. He, X. You, and Y. Y. Tang, “Writer identification of chinese instead of a classifier [12] seems more efficient and modular, it handwriting documents using hidden markov tree model,” Pattern allows for simplification and standardization of benchmarking. Recognition, vol. 41, no. 4, pp. 1295 – 1307, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320307004037 The fact that the proposed features encapsulate no structural information what so ever, makes them a very good candidate [9] L. Du, X. You, H. Xu, Z. Gao, and Y. Tang, “Wavelet domain local for fusion with other feature sets. binary pattern features for writer identification,” in Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 2010, pp. 3691– 3694. ACKNOWLEDGMENT [10] C. Bird, B. Found, and D. Rogers, “Forensic document examiners skill The first author of this paper would like to thank Georgios in distinguishing between natural and disguised handwriting behaviors,” Louloudis for his precious insights on the subject of writer Journal of forensic sciences, vol. 55, no. 5, pp. 1291–1295, 2010. identification and performance evaluation. The first author [11] M. I. Malik, M. Liwicki, L. Alewijnse, W. Ohyama, M. Blumenstein, would also like to thank Muhammad Imran Malik for his and B. Found, “Signature verification and writer identification competi- effort and assistance in the participation of this method on tions for on- and offline skilled forgeries (sigwicomp2013),” in 12th Int. the SigWiComp2013 competition. Conf. on Document Analysis and Recognition, Washigton, DC, USA, 2013, p. n.A. R EFERENCES [12] G. 
Louloudis, N. Stamatopoulos, and B. Gatos, “Icdar 2011 writer identification contest,” in Document Analysis and Recognition (ICDAR), [1] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale 2011 International Conference on. IEEE, 2011, pp. 1475–1479. and rotation invariant texture classification with local binary patterns,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, [13] G. Louloudis, B. Gatos, and N. Stamatopoulos, “Icfhr 2012 com- vol. 24, no. 7, pp. 971–987, 2002. petition on writer identification challenge 1: Latin/greek documents,” [2] T. Ahonen, A. Hadid, and M. Pietikäinen, “Face recognition with local in Frontiers in Handwriting Recognition (ICFHR), 2012 International binary patterns,” in Computer Vision-ECCV 2004. Springer, 2004, pp. Conference on. IEEE, 2012, pp. 829–834. 469–481. [14] D. Chawki and S.-M. Labiba, “A texture based approach for arabic [3] G. Zhao and M. Pietikainen, “Dynamic texture recognition using local writer identification and verification,” in Machine and Web Intelligence binary patterns with an application to facial expressions,” Pattern (ICMWI), 2010 International Conference on. IEEE, 2010, pp. 115– Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 6, 120. pp. 915–928, 2007.