Pen Trace Reconstruction with Skeleton Representation of a Handwritten Text Image? Svetlana Kryzhanovskaya, Sergey Arseev, Leonid Mestetskiy Lomonosov Moscow State University, 119991 Leninskie Gory, Moscow, Russia skryzhanovskaya@yandex.ru 9413serg@gmail.com mestlm@mail.ru Abstract. This paper presents a new approach for the pen trace reconstruction task in handwritten text images. The proposed approach is based on the skeleton representation of a binary image which provides sufficient information about the pen movement trajectory during the writing. As such, the initial problem can be viewed as a problem of constructing a specific walk in the skeleton graph covering all its edges. The proposed approach consists of splitting the graph into structural elements and estimating the pen movement direction in each of them with the resulting trace being composed of the traces of these elements. Keywords: Pen Trace Reconstruction, Binary Image Skeleton, Stroke-based De- composition, Handwritten Text Recognition. 1 Introduction The pen trace reconstruction is the transformation of a given binary image containing a connected piece of handwritten text into a trace represented as a sequence of points where the pen is supposed to have been tracing over them consequently. This reconstruction is a possible approach to the wider task of offline handwriting recognition where the text is recovered from an image of a piece of handwritten text. No universal solution for this task has yet been developed and the current state of the art solutions impose major restrictions on the input and usually require a large amount of training data. The other related task is the online handwriting recognition where the input is provided as a sequence of pen positions instead of an image. This representation usually has significantly lower input dimensionality than the image and is less sensitive to noise which allows this task to be solved more efficiently. However, it requires the use of a specific hardware for the input data capture which greatly lowers the range of its practical applications. Therefore, the idea described in [7] is to reduce the offline recognition task to the online recognition task using the reconstructed pen trace from an image to simulate an online input. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons Li- cense Attribution 4.0 International (CC BY 4.0). ? Supported by Russian Foundation for Basic Research, grant #20-01-00664. 2 S. Kryzhanovskaya, S. Arseev, L. Mestetskiy The task of pen trace reconstruction in handwritten text images has a wide range of practical applications as a part of the handwriting recognition task. If the offline recog- nition task is successfully reduced to the online recognition task, more efficient and less restrictive methods can be used in tasks such as automated archive digitalization. Another example of the pen trace reconstruction applications is the task of text samples verification via comparing them to the reference. For example, in [8] the reconstructed trajectory is used for signature identification. 2 Related work Many approaches exist for the given task, however they still don’t provide the required accuracy in all cases. Most of them can be roughly split into three main groups: local, global and artificial neural network based methods. Global methods attempt to reconstruct the pen trajectory by optimizing a global quality functional. For example, in [6] the total curvature of the trace is minimized and the task is reduced to solving a travelling salesman task on a skeleton graph. The main drawback of global methods is that they are computationally expensive because they involve optimization over a large set of possibile options. In local methods, ambiguous zones of the trace are processed separately. In [8], a number of heuristic rules is proposed to solve these zones. In [3], a discrete transform of the skeleton is used with subsequent classification of its nodes. The main drawback of these methods is that they are often based on assumption that the pen trace is continuous. Recently, there has been a surge in methods using artificial neural networks. In [14], convolutional neural networks (CNN) are used for pen trace reconstruction in Chinese symbols, and in [1], the use of recurrent neural networks (RNN) is proposed to recover the trajectory in Indian scripts. In [12] RNN End-To-End framework combines with CNN for feature extraction. However, most existing methods based on neural networks are only effective for trajectory reconstruction in isolated symbols and require a large amount of training data. The main advantages of the proposed approach is that it is applicable for entire words. What is more, it combines local and global trajectory reconstruction methods and doesn’t require labeled training dataset. 3 Proposed method The main idea of the proposed method is that the binary image is transformed into a geometric graph where each node corresponds to the pen position at some point dur- ing writing and edges model the pen movement between nodes. This way, the initial problem is solved by consructing a sequence of walks covering all edges of the graph together. Since the handwriting can consist of separate strokes, the trace doesn’t neces- sary consist of a single walk. What is more, the pen can trace some parts of its trajec- tory multiple times so it is possible that an edge will be included more than once and re-traced both in the same or opposite direction. Pen Trace Reconstruction with Skeleton Representation of a Handwritten Text Image 3 3.1 Skeleton representation We use a skeleton of a binary image as a graph modeling the pen trajectory. The stages of its construction are: 1. Object border approximation with contour representation, i.e. a polygon with a minimum perimeter separating black and white pixels of the binary image. 2. Calculation of a skeleton of the contour representation as a locus of the centers of circles that are tangent to the polygon edge in two or more points, where all such circles are contained in the figure. The skeleton is then approximated as a geometric graph with the nodes corresponding to the centers of some of the circles and edges being straight lines. 3. Skeleton pruning, i.e. cutting off the shortest edges that don’t affect the overall shape significantly. Without pruning, these branches add significant noise to the pen trace model. The skeleton graph is described in greater detail in [10]. An example of a binary image with its skeleton graph can be seen on Fig. 1. If the skeleton consists of multiple connected components, each one is traced sepa- rately. The tracing algorithm for one component is presented below. Fig. 1. Contour representation (green line) and Fig. 2. Terminal nodes (in blue circles) and in- skeleton (red line) of a binary image. tersections (in yellow circles). 3.2 Stroke-based segmentation Lets define a degree 1 node of a graph as a terminal node and a node with a degree higher than 2 as an intersection. Note that removing this nodes will split the graph into subgraphs that we call branches. Examples of terminal nodes and intersections can be seen on Fig. 2. We assume that the pen can only lift from the paper in terminal nodes and intersec- tions. A stroke is a subgraph of a skeleton graph that consists of one or more branches and corresponds to a continuous fragment of the pen trace. The result of a stroke-based segmentation is a set of strokes covering all edges of the graph. It is possible for some of the strokes to have common branches (usually, in the parts of the trajectory that are traced twice). 4 S. Kryzhanovskaya, S. Arseev, L. Mestetskiy Cyclical strokes Cyclical strokes correspond to rounded graphic elements of hand- written letters. For each of the strokes, it is possible to trace it either clockwise or counter-clockwise. For example, in Russian language, cycles located below the base- line (baseline extraction methods are described in [11]) are usually traced clockwise and other cycles are traced counter-clockwise. Each cyclical stroke is a cycle in the skeleton graph. To find all cycles in an undi- rected graph we use the depth-first search described in [2]. In the set of all cycles we select a subset that satisfies the following criteria: 1. All cycle edges are covered. 2. Total length of the intersections between cycles is minimized (minimize the amount of parts traced twice). Vertical strokes Vertical strokes correspond to vertical graphic elements of handwrit- ten letters. We assume that these strokes are traced from the top down. Each such stroke is a chain in the graph that begins and ends in a terminal node. For a pair (u, v) of terminal nodes of the skeleton graph, the following symbols are introduced: 1. l is a line passing through u and v. 2. s = (u, v1 , . . . , vn , v) is a shortest path from u to v in the graph. 3. dist(l, v) is a distance from l to v. The pair (u, v) satisfies the verticality condition if 1. u.y < v.y (u – ”bottom” node, v – ”top” node); 2. The tilt of l is in range [α1 , α2 ], where α1 , α2 are parameters of the algorithm. The pair (u, v) satisfies the linearity condition if n 1X L(u, v) = dist(l, vi ) < β, β is a parameter n j=1 The chains in the graph with both ends being terminal nodes satisfying the vertical- ity and linearity conditions are designated as vertical strokes. Vertical strokes cannot have common branches with each other. Semi-vertical strokes Semi-vertical strokes are designated in a similar way as vertical strokes, but a semi-vertical stroke is a chain in a graph between the top terminal node and bottom intersection node. Semi-vertical strokes are designated after vertical strokes. Semi-vertical strokes can- not have common branches with each other and vertical strokes. Pen Trace Reconstruction with Skeleton Representation of a Handwritten Text Image 5 Simple strokes Simple strokes are chains that consist of a single branch of the skeleton graph. Simple strokes are designated after all other strokes have been found. A simple stroke cannot have common branches with other strokes. Fig. 4. Weighted metagraph and its root Fig. 3. Skeleton graph decomposition into (marked with an asterisk), spanning tree (high- strokes: cyclical (blue), vertical (green), semi- lighted with bold lines) and corresponding se- vertical (yellow) and simple (red). quence of tracing nodes (specified by num- bers). 3.3 Stroke sequence synthesis {S1 , . . . , Sn } is a set of strokes from the stroke-based segmentation. An example of a stroke-based segmentation can be seen on Fig. 3. Metagraph construction A metagraph of a skeleton graph and its stroke decomposi- tion is a weighted graph where the nodes correspond to the strokes of the initial graph and any two nodes are connected with an edge if their corresponding strokes have com- mon nodes. The weight W (Si ) of each node is a total length of the edges of the corre- sponding stroke. For skeleton graph and its decomposition into strokes on Fig. 3 a corresponding metagraph can be seen on Fig. 4. Metagraph spanning tree construction 1. Selection of a root stroke to be a starting node for a walk. Since in the Russian language words are written left to right, the stroke that contains a node with the minimum horizontal coortinate is selected as a starting stroke for the Russian lan- guage. 2. Width-first search in the metagraph, starting in the root stroke (described in [2]). The spanning tree consists of all pairs (u, v) where the algorithm processing the node u finds a new, not previously marked node v incidental to it. On Fig. 4 the root is marked with an asterisk and the spanning tree is highlighted with bold lines. 6 S. Kryzhanovskaya, S. Arseev, L. Mestetskiy Spanning tree walk For each node we calculate the weight of the ”heaviest” subtree:  W (Si ) if Si is a leaf D(Si ) = W (S ) + max D(Sj ) otherwise  i Sj ∈children(Si ) Algorithm 1 Spanning tree tracing algorithm 1: function M ETA S EARCH( Scur ) 2: if Scur is not a leaf then 3: sort Si ∈ children(Scur ) by D(Si ) in ascending order 4: for all Si ∈ children(Scur ) in this order do 5: M ETA S EARCH( Si ) 6: M ETA S EARCH(Sstart ) On Fig. 4 the sequence of trace nodes obtained by the algorithm is specified by numbers. Simple stroke direction Let (S(1) , . . . , S(n) ) be a sequence of strokes provided by the algorithm 1. If S(1) is a simple stroke, it is traced from the leftmost end node. Let S(i) be a simple node, i > 1. Find S(j) , so that: 1. j < i 2. S(j) ∩ S(i) 6= ∅ 3. ∀k : j < k < i, S(k) ∩ S(i) = ∅ Since each simple stroke is a branch, S(i) ∩ S(j) = {v}, and v is an end node of S(i) . This wayS(i) is traced starting from v. 4 Experiments For testing purposes and evaluation, the algorithm was implemented using the Python programming language. A library [9] was used to calculate the skeleton representation. For visual verification of the program, a graphical interface simulating the pen move- ment was developed. 4.1 Word level experiment Dataset and experiment conditions The method was evaluated on a set of 250 labeled binary images depicting scanned handwritten words in the Russian language. Examples of the images in the testing set can be seen on Fig. 5. For each image, a skeleton was la- beled by an assessor. The markup constsis of a sequence of the skeleton graph branches with their trace direction for the possible trajectory. Note that extra branches can appear in the skeleton not corresponding to the actual trajectory. In that case, these branches were not labeled and did not affect the quality metric. Pen Trace Reconstruction with Skeleton Representation of a Handwritten Text Image 7 Quality metrics Since it’s assumed that the pen can only lift from the paper in inter- sections and terminal nodes, all reconstruction errors fall into two categories: 1. Branch trace direction errors 2. Branch ordering errors We used the following quality metrics for evaluation: 1. The fraction of branches with the correctly chosen trace direction. 2. Levenshtein distance or the total amount of insertions, deletions, substitutions and transpositions required to turn the algorithm-produced branch ordering into the ground truth ordering performed by a human. Experiment results A total of 5940 branches were marked for 250 images. Branch direction predicted by the algorithm matched the assessor labels for 5762 branches (97 %). For branch ordering, the total Levenshtein distance to corresponding ground truth labels for all 250 samples equals 432 (average ≈ 2). These results are viewed as positive. It’s worth noting that in most cases several different handwriting styles are possible and as such, more than one trajectory can be considered to be the ground truth. In this experiment, only one possible trajectory was labeled by the assessor and the possibility for the algorithm to select another ”true” trajectory can potentially lower the metrics in relation to the actual quality. Table 1. Experiment results 1. Levenshtein Fraction of correctly distance traced branches (average) 0.97 ≈2 Fig. 5. Input images examples for word level Fig. 6. Input images examples for letter level experiment. experiment. 8 S. Kryzhanovskaya, S. Arseev, L. Mestetskiy 4.2 Letter level experiment Dataset and experiment conditions To test the ability of the proposed method to be used for online recognition, the method was evaluated on a set of 700 labeled binary images depicting various characters from English alphabet. It consists of seven classes of symbols: A, B, E, P, S, W and None, the first six containing data of the respective letter and the last containing various symbols not corresponding to any of the other classes. The dataset has 100 images for each class that were rasterized using traces gathered with the Low Power Projected Capacitive Touchpad Development Kit tool by Microchip. Some examples of the data are shown on Fig. 6. Quality metrics The main quality criterion of the reconstructed pen trace for the recog- nition task is the comparison between offline recognition, online recognition with true pen trace and offline-to-online recognition with the reconstructed pen trace. An algo- rithm implemented in [5] was selected for experimental evaluation of the pen trace reconstruction algorithm. This method uses a recurrent neural network consisting of 24 gated recurrent units with sigmoid activation function. The model was trained from scratch; images were split into training and testing sets in 4:1 proportion. The training set has been enhanced using random shifts for sequence points. The offline recogni- tion method selected for comparison was the VGG16 convolutional network described in [13] and pre-trained on the ImageNet dataset [4]. The training and testing sets split of the data has also been conducted in 4:1 proportion. Convolutional neural networks achieve good results but require a large amount of training data to learn properly. How- ever, in practical applications it’s not always possible to obtain a large marked data set relevant to the specific task. Different writing systems and different conditions require different data sets and it is often too tedious to perform a large scale data gathering and labeling for the task. The proposed method significantly reduces the complexity of the model and dimensionality of the data which enables it to produce practical results on a wide variety of different tasks without the need for long and costly data preparation. The quality metric used for comparison was the proportion of correctly classified samples. Experiment results The results of the evaluation are shown in the table. Recognition methods are marked as follows: 1. RNN-True – recurrent neural network from [5] trained on the real pen trace data. 2. RNN-Offline – same network trained on reconstructed pen trace data provided by the proposed algorithm. 3. VGG – the VGG16 convolutional network. The proposed method shows lower classification accuracy than recognition based on true trajectories, however its results are significantly higher than the offline recognition method. The low performance of the VGG16 network can be attributed to low size of the training set which, despite the help of transfer learning and data augmentation, is not enough for the complex convolutional models while the proposed method shows higher performance despite having only 80 training images for each class. Pen Trace Reconstruction with Skeleton Representation of a Handwritten Text Image 9 Table 2. Experiment results 2. Method Accuracy RNN-True 0.93 RNN-Offline 0.88 VGG 0.60 5 Conclusion This work presents a new approach to the pen trace reconstruction by a handwritten text image using its skeleton representation. The algorithm was evaluated on a set of data to measure its performance. Advantages of the proposed approach 1. The trajectory is not required to be continuous. 2. Some parts of the trajectory can be re-traced. 3. No labeled training dataset is required for word-level segmentation. 4. Can be used not only on isolated symbols but on continuous words as well. 5. When used for recognition, the method can succesfully learn on a very small la- beled dataset which enables it to be used for tasks with scarce data where convolu- tional models cannot be trained properly. Drawbacks of the proposed approach 1. Skeleton graph doesn’t always accurately represent the pen trajectory. For example, it can contain extra branches not corresponding to any part of the trace. 2. Stroke-based segmentation uses heuristics derived from language specifics. Further work in this area will include integration of the pen trace reconstruction method into online text recognition systems to enable solving the offline recognition problem using online recognition methods. References 1. Bhunia, A.K., Bhowmick, A., Bhunia, A.K.: Handwriting trajectory recovery using end-to- end deep encoder-decoder network. ICPR (2018) 2. Cormen, T., Leiserson, C., Rivest, R.: Introduction to Algorithms (2007) 3. Crispo, G., Diaz, M., Marcelli, A., Ferrer, M.: Tracking the ballistic trajectory in complex and long handwritten signatures (08 2018). https://doi.org/10.1109/ICFHR-2018.2018.00068 4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Li, F.F.: Imagenet: a large-scale hierarchical image database. pp. 248–255 (06 2009). https://doi.org/10.1109/CVPR.2009.5206848 5. Fischer, T.: (2018), https://github.com/tobiasfshr/online-handwritten-character-recognition- capacitive-sensors 6. Jaeger, S.: Recovering writing traces in off-line handwriting recognition: Using a global optimization technique. vol. 3, pp. 150 – 154 vol.3 (1996) 10 S. Kryzhanovskaya, S. Arseev, L. Mestetskiy 7. Lallican, P.M., Viard-Gaudin, C., Knerr, S.: From off-line to on-line handwriting recognition (2004) 8. Lee, S., Pan, J.C.J.: Offline tracing and representation of signatures. IEEE Trans. Systems, Man, and Cybernetics 22, 755 – 771 (1992) 9. Mestetskiy, L.: http://www.machinelearning.ru/wiki/images/d/da/SkeletonMaker C.rar 10. Mestetskiy, L.: Continuous morphology of binary images: figures, skeletons, circulars. Moscow: Fizmatlit (2009) 11. Quirós, L.: Multi-task handwritten document layout analysis (06 2018) 12. Rabhi, B., Elbaati, A., Hamdi, Y., Alimi, A.M.: Handwriting recognition based on temporal order restored by the end-to-end system. In: 2019 International Confer- ence on Document Analysis and Recognition (ICDAR). pp. 1231–1236 (Sep 2019). https://doi.org/10.1109/ICDAR.2019.00199 13. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog- nition. CoRR (2014) 14. Zhao, B., Yang, M., Tao, J.: Drawing order recovery for handwriting chinese characters. pp. 3227–3231 (2019)