Exploring few-shot text line segmentation approaches in challenging ancient manuscripts

Silvia Zottin*,1, Axel De Nardin*,1, Giuseppe Branca1, Emanuela Colombi2, Claudio Piciarelli1, Hafsa Shujat3 and Gian Luca Foresti1

1 Department of Mathematics, Computer Science and Physics, University of Udine, Via delle Scienze 206, Udine, Italy
2 Department of Humanities and Cultural Heritage, University of Udine, Vicolo Florio 2/B, Udine, Italy
3 International Islamic University Islamabad, Pakistan

* These authors contributed equally to this work.

Abstract
Text line segmentation is a critical component of document layout analysis, particularly for ancient handwritten manuscripts. Its primary goal is to accurately extract individual text lines, a step that significantly influences subsequent tasks such as optical character recognition, text transcription, and information extraction. However, segmenting text lines in historical manuscripts is particularly challenging due to irregular handwriting, faded ink, and complex layouts with overlapping lines and non-linear text flows. Additionally, the limited availability of large annotated datasets makes fully supervised learning approaches impractical for these documents. In this paper, we explore the applicability of three prominent semantic segmentation models in a few-shot learning setting, using only a small number of labeled examples per manuscript. Our results demonstrate the challenges of addressing text line segmentation in the context of scarce labeled data and point to a promising avenue for future research in document analysis for historical manuscripts.

Keywords
Text line segmentation, Few-shot learning, Document layout analysis, Digital manuscript analysis

IRCDL 2025: 21st Conference on Information and Research Science Connecting to Digital and Library Science, February 20-21 2025, Udine, Italy
silvia.zottin@uniud.it (S. Zottin); axel.denardin@uniud.it (A. De Nardin); branca.giuseppe@spes.uniud.it (G. Branca); emanuela.colombi@uniud.it (E. Colombi); claudio.piciarelli@uniud.it (C. Piciarelli); hafsashujat98@gmail.com (H. Shujat); gianluca.foresti@uniud.it (G. L. Foresti)
ORCID: 0000-0003-0820-7260 (S. Zottin); 0000-0002-0762-708X (A. De Nardin); 0000-0002-0384-6664 (E. Colombi); 0000-0001-5305-1520 (C. Piciarelli); 0000-0002-8425-6892 (G. L. Foresti)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Text line segmentation is a key step in document layout analysis, particularly for historical handwritten manuscripts. It is critical for downstream applications such as optical character recognition, text transcription, and information extraction. However, segmenting text lines in ancient manuscripts is inherently challenging due to degraded writing, irregular handwriting, complex layouts, overlapping text lines, and non-linear text flows. Additionally, the scarcity of large annotated datasets for these types of documents makes fully supervised learning approaches difficult to implement.

Few-shot learning, which focuses on training models with a limited number of annotated examples, presents a promising solution to these challenges. This approach is particularly valuable for historical manuscripts, where manually labeling large datasets is time-consuming and resource-intensive. In this paper, we explore the application of few-shot learning to text line segmentation in three challenging ancient manuscripts. Specifically, we investigate the performance of three well-known, effective semantic segmentation models, FCN [1], PSPNet [2], and DeepLabv3+ [3], in a few-shot learning setting. With this paper, we aim to highlight that this is still an unexplored area in the literature and that there is significant potential in developing few-shot techniques for text line segmentation in ancient manuscripts.

The rest of this paper is organized as follows. Section 2 reviews the main approaches in the existing literature. Section 3 describes the segmentation models employed in our experiments. Section 4 provides the details of our experiments and reports the results. Finally, in Section 5, we draw our conclusions and discuss future work.
2. Related Works

The scarcity of extensively labeled data in ancient manuscript analysis is due to the specialized expertise, significant time, and substantial financial resources required to create such datasets, especially for documents with intricate layouts. This limitation naturally motivates the development of systems that can achieve strong performance with minimal annotated data. However, the existing literature offers only a limited number of works that effectively address this challenge.

Unsupervised learning, which does not require any annotated data, has been explored as a potential solution. Most methods in this category rely on intuitive heuristic assumptions that can be translated into deterministic rules. For example, in the task of text line segmentation, a common heuristic assumption is that the document contains only horizontal lines [4, 5]. Assumptions like this one, however, drastically limit the domain of applicability of such systems.

Transfer learning provides another powerful approach to address the data scarcity problem. By leveraging pre-trained deep networks, representations learned on large, general-purpose datasets can be adapted to specific target domains with minimal labeled examples. Research in [6, 7] demonstrates that pre-training on document-related datasets consistently enhances segmentation results and accelerates convergence compared to using general-purpose datasets, underscoring the benefits of domain-specific pre-training for specialized applications. In addition, the study in [8] investigates the performance of models pre-trained on ImageNet and fine-tuned on an ancient document dataset, highlighting that the effectiveness of transfer learning versus training from scratch depends heavily on the characteristics of the target dataset. Similarly, domain-specific transfer learning has been shown to improve performance in document layout segmentation tasks.

Few-shot and one-shot learning techniques have also emerged as promising solutions for data-scarce scenarios. An example of a few-shot learning strategy is presented in [9], where the model is trained with only two labeled images per manuscript. This framework combines a novel data augmentation technique [10] with a segmentation refinement module based on a traditional local thresholding method [11], achieving results comparable to state-of-the-art supervised methods. Another few-shot technique, Deep & Syntax [12], focuses on segmenting historical handwritten registers by leveraging recurrent patterns to delineate individual records. This hybrid approach combines U-shaped neural networks with logical rules, such as filtering and text alignment, to enhance accuracy.
A more recent one-shot learning framework is introduced in [13], targeting layout segmentation of ancient Arabic documents. Despite using only one labeled page per manuscript, this method achieves state-of-the-art performance on a challenging dataset. It incorporates three main components: a semantic segmentation backbone, a dynamic instance generation module, and a segmentation refinement module. These advancements emphasize the growing interest in addressing the challenges of low-data scenarios. The significance of few-shot learning is further highlighted by the SAM challenge [14], which focuses on document layout segmentation under few-shot conditions.

Despite these developments, few-shot learning techniques have not yet been explored for text line segmentation; to the best of our knowledge, no studies have addressed this specific topic in the literature. In this paper, we aim to fill this gap by exploring few-shot learning for text line segmentation in ancient manuscripts. We present preliminary results demonstrating the feasibility of performing text line segmentation using only three labeled images per manuscript.

3. Methods

In this paper, we explore the performance of three prominent and high-performing semantic segmentation models when adopted for text line segmentation of ancient manuscripts in a few-shot setting. Specifically, we focus on three of the most popular models in the literature, each characterized by different architectural choices: FCN [1], PSPNet [2], and DeepLabv3+ [3].

3.1. Fully Convolutional Network

The Fully Convolutional Network (FCN) [1] is a pioneering architecture designed for semantic segmentation tasks. It builds upon standard convolutional neural networks by transforming them into end-to-end trainable models capable of dense prediction. In particular, fully connected layers are replaced with fully convolutional layers, preserving spatial information and allowing the network to process input images of arbitrary size. To recover the spatial resolution lost during downsampling in convolutional and pooling layers, FCN introduces upsampling layers, which use deconvolution to produce predictions at the same resolution as the input image.

FCN is characterized by an encoder-decoder structure. The encoder comprises the convolutional and pooling layers of standard classification networks, such as VGG or ResNet, and extracts hierarchical features by progressively reducing spatial resolution while capturing high-level semantic information. The decoder restores the spatial resolution of the feature maps to match the input image dimensions through upsampling layers that use learned deconvolution filters. A key innovation of FCN is the use of skip connections, which fuse feature maps from different layers: coarse semantic information from deeper layers is combined with fine spatial details from shallower layers, improving localization and yielding more accurate segmentation, especially at object boundaries.
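As a concrete illustration of these ideas, the following is a minimal PyTorch sketch of an FCN-style network, showing 1 × 1 convolutions as dense per-pixel classifiers, learned transposed-convolution upsampling, and a skip connection. The layer sizes and depths are illustrative assumptions and do not reflect the exact configuration of [1] or of the models used in our experiments.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Minimal FCN-style network: convolutional encoder, 1x1 score layers,
    learned (deconvolution) upsampling, and a single skip connection."""
    def __init__(self, num_classes=2):            # background vs. text line
        super().__init__()
        self.stage1 = nn.Sequential(               # 1/2 resolution
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(               # 1/4 resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        # 1x1 convolutions replace fully connected layers (dense scores)
        self.score1 = nn.Conv2d(32, num_classes, 1)
        self.score2 = nn.Conv2d(64, num_classes, 1)
        # learned upsampling via transposed convolutions
        self.up2 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)
        self.up1 = nn.ConvTranspose2d(num_classes, num_classes, 2, stride=2)

    def forward(self, x):
        f1 = self.stage1(x)                        # fine features, 1/2
        f2 = self.stage2(f1)                       # coarse features, 1/4
        y = self.up2(self.score2(f2))              # 1/4 -> 1/2
        y = y + self.score1(f1)                    # skip connection fusion
        return self.up1(y)                         # 1/2 -> full resolution

logits = TinyFCN()(torch.randn(1, 3, 256, 256))    # -> (1, 2, 256, 256)
```

In the full model of [1], the same pattern is applied to a deep classification backbone, with skip connections taken from multiple pooling stages.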
3.2. Pyramid Scene Parsing Network

The Pyramid Scene Parsing Network (PSPNet) [2] is a powerful model for semantic segmentation, known for its ability to capture both local and global contextual information. By employing a pyramid pooling module, PSPNet excels at understanding complex scenes with objects of varying sizes and arrangements.

PSPNet builds upon a backbone network, typically ResNet [15], which serves as the feature extractor. The encoder extracts hierarchical feature maps from the input image; a pre-trained ResNet, truncated before the fully connected layers, is commonly used. This backbone captures rich semantic features at reduced spatial resolution through convolutional and pooling operations.

The core innovation of PSPNet lies in the Pyramid Pooling Module (PPM), which aggregates contextual information from multiple spatial scales and enriches the feature representation by pooling information from different regions of the image. Feature maps from the encoder are pooled into different grid sizes (e.g., 1 × 1, 2 × 2, 3 × 3, 6 × 6), so that each pooled representation captures contextual information from increasingly larger regions, ranging from global to local. Finally, the pooled outputs are upsampled to match the resolution of the original feature map and concatenated with it, resulting in a feature map enriched with multi-scale context.

The decoder takes the enriched feature map from the PPM and refines it through convolutional layers. This process generates pixel-wise predictions for the semantic segmentation task, with an output resolution matching the input image.
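To make the PPM concrete, here is a minimal PyTorch sketch of a pyramid pooling block using the grid sizes listed above; the channel counts follow the common ResNet-based setup (2048 input channels) but are assumptions rather than our exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPoolingModule(nn.Module):
    """PPM sketch: pool the feature map at several grid sizes, project each
    pooled map with a 1x1 conv, upsample back, and concatenate with the input."""
    def __init__(self, in_channels=2048, bins=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_channels // len(bins)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),               # pool to a b x b grid
                nn.Conv2d(in_channels, branch_ch, 1, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x] + [
            F.interpolate(branch(x), size=(h, w),
                          mode="bilinear", align_corners=False)
            for branch in self.branches]
        return torch.cat(outs, dim=1)                   # 2x in_channels out

ppm = PyramidPoolingModule().eval()     # eval mode: BatchNorm on one image
feats = torch.randn(1, 2048, 32, 32)    # e.g. final ResNet feature map
print(ppm(feats).shape)                 # torch.Size([1, 4096, 32, 32])
```

The concatenated output is then reduced by further convolutions to produce the per-pixel class scores.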
3.3. DeepLabv3+

DeepLabv3+ [3] is a state-of-the-art model for semantic segmentation, designed to achieve high accuracy by effectively capturing contextual information and refining spatial details. It builds upon the DeepLabv3 [16] architecture by incorporating an encoder-decoder structure, enabling more precise boundary delineation in segmentation tasks. Below, we describe the key components and mechanisms of DeepLabv3+.

The encoder in DeepLabv3+ employs atrous (dilated) convolutions to expand the receptive field of convolutional layers without increasing the number of parameters or losing spatial resolution. Atrous convolutions allow the model to capture multi-scale context efficiently, which is crucial for recognizing objects at various scales. To further enhance multi-scale feature extraction, the encoder relies on the Atrous Spatial Pyramid Pooling (ASPP) module. ASPP applies parallel atrous convolutions with different dilation rates, capturing context at multiple resolutions. In addition to atrous convolutions, ASPP includes image-level pooling to aggregate global contextual information. The resulting feature maps are concatenated and processed through a 1 × 1 convolution layer and a batch normalization layer, producing robust multi-scale features.

While the encoder captures high-level semantic information, it often reduces spatial resolution through downsampling operations. To address this, DeepLabv3+ introduces a decoder module that refines the segmentation output by recovering spatial details. The decoder upsamples the output of the encoder using bilinear interpolation, aligning it with the spatial resolution of the lower-level feature maps extracted from earlier stages of the network. These lower-level feature maps, which retain finer spatial information, are then concatenated with the upsampled encoder output. This fused representation is further processed through convolutional layers, producing sharper and more accurate segmentation results, particularly around object boundaries.
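The ASPP mechanism can be sketched in a few lines of PyTorch. The dilation rates (6, 12, 18) and channel counts below are illustrative assumptions in the spirit of [3], not the exact values used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """ASPP-style block: parallel atrous convs at several dilation rates
    plus global (image-level) pooling, concatenated and fused by a 1x1 conv."""
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates)                       # same size, larger receptive field
        self.pool = nn.Sequential(                # image-level context
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False))
        n_branches = 2 + len(rates)
        self.project = nn.Sequential(             # fuse all branches
            nn.Conv2d(n_branches * out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [conv(x) for conv in self.atrous]
        pooled = F.interpolate(self.pool(x), size=(h, w), mode="bilinear",
                               align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

aspp = ASPP().eval()                     # eval: BatchNorm works on one image
print(aspp(torch.randn(1, 2048, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```

Note how each atrous branch keeps the spatial size of its input (padding equals the dilation rate) while enlarging the receptive field.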
4. Experiments

4.1. Dataset

The dataset used in this study is a proprietary collection developed through a collaborative effort between computer scientists and humanities scholars. It was derived by processing images from the publicly available U-DIADS-Bib dataset [17], which primarily focuses on document layout segmentation.

Our dataset consists of 105 images, divided into 35 images for each of three distinct ancient manuscripts: Latin 2, Latin 14396, and Syriaque 341. The manuscripts differ in layout structure, in the level of degradation due to preservation and aging, and in the alphabet in which they are written. Specifically, Latin 2 is a Latin-language manuscript arranged in two columns and featuring various interlinear paratexts. Similarly, Latin 14396, also in Latin, consists of text in two columns but includes diverse fonts and marginal paratexts, making its segmentation more variable. Finally, Syriaque 341 is a Syriac-language manuscript arranged in three columns; it is the most challenging of the three, as it includes vertical comments and paratexts, and its pages are highly degraded due to aging and poor preservation. These unique characteristics of each manuscript make the dataset particularly intriguing and challenging. The images of the three manuscripts were sourced from the French digital library Gallica (https://gallica.bnf.fr).

Given that the dataset was designed for use with few-shot learning techniques, 35 unique color page images were created for each manuscript, divided into three sets: 3 images for training, 10 for validation, and 15 for testing. Each page is paired with corresponding Ground Truth (GT) data, which includes two distinct and non-overlapping annotated classes: background and text lines. Figure 1 illustrates an example of the defined GT and the corresponding original image for each manuscript.

[Figure 1: An example page for each manuscript in the dataset used for training and testing, (a) Latin 2, (b) Latin 14396, and (c) Syriaque 341, with the corresponding GT masks (d-f). The background is highlighted in black, while each text line of the manuscript page is highlighted in white with pixel-level precision.]

Unlike many currently available document text line segmentation datasets, our proprietary dataset is characterized by pixel-level precision, non-overlapping elements, the absence of noise, heterogeneously oriented text lines, and a multi-column layout.

4.2. Evaluation Metrics

Evaluation involves calculating key text line semantic segmentation metrics commonly adopted in the literature [18, 19]: Pixel Intersection over Union (Pixel IU), Line Intersection over Union (Line IU), Detection Rate (DR), Recognition Accuracy (RA), and F-measure (FM).

Pixel IU and Line IU are based on the Intersection over Union (IU) metric, defined as

$$\mathrm{IU} = \frac{TP}{TP + FP + FN} \qquad (1)$$

where TP denotes True Positives, FP False Positives, and FN False Negatives. Pixel IU evaluates IU at the pixel level, measuring the accuracy of line detection: TP represents correctly detected pixels, FP extra (false) pixels, and FN missed pixels. Line IU evaluates IU at the line level, measuring how accurately whole lines are detected: TP is the number of correctly detected lines, FP the number of extra lines, and FN the number of missed lines. A threshold of 75% is applied to determine matches between predicted and ground-truth connected components: two components are considered a match if both pixel precision and recall exceed this threshold; otherwise, they are classified as FP (precision below the threshold) or FN (recall below the threshold).

Detection Rate (DR), Recognition Accuracy (RA), and F-Measure (FM) rely on the MatchScore metric. For a given image, let $R_i$ represent the points within the $i$-th detected line segment, $G_j$ the points within the $j$-th ground-truth line segment, and $T(p)$ the number of points in a set $p$. The MatchScore between a detected and a ground-truth segment is calculated as

$$\mathrm{MatchScore}(i, j) = \frac{T(G_j \cap R_i)}{T(G_j \cup R_i)} \qquad (2)$$

A region pair $(i, j)$ is considered a one-to-one match if $\mathrm{MatchScore}(i, j) \geq T_a$, where $T_a = 75\%$. Using the number of one-to-one matches $M$, along with the number of ground-truth lines $N_1$ and detected lines $N_2$, the metrics are defined as

$$\mathrm{DR} = \frac{M}{N_1}, \qquad \mathrm{RA} = \frac{M}{N_2}, \qquad \mathrm{FM} = \frac{2 \cdot \mathrm{DR} \cdot \mathrm{RA}}{\mathrm{DR} + \mathrm{RA}} \qquad (3)$$

These metrics are calculated for each manuscript individually, with the overall dataset average also reported for each metric.
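To make the metric definitions concrete, below is a small illustrative Python (NumPy) sketch of Pixel IU (Eq. 1) and of the MatchScore-based DR, RA, and FM (Eqs. 2-3). Line segments are represented as sets of pixel coordinates, and the one-to-one matching is simplified to a greedy pass, so this is a sketch of the definitions rather than the exact evaluation code used in our experiments.

```python
import numpy as np

def pixel_iu(pred, gt):
    """Pixel IU (Eq. 1) for binary masks where 1 marks text-line pixels.
    Assumes the masks are not both empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = (pred & gt).sum()           # correctly detected pixels
    fp = (pred & ~gt).sum()          # extra (false) pixels
    fn = (~pred & gt).sum()          # missed pixels
    return tp / (tp + fp + fn)

def dr_ra_fm(detected, ground_truth, t_a=0.75):
    """DR, RA, FM (Eqs. 2-3). Each line is a set of (row, col) pixels;
    both lists are assumed non-empty."""
    matched_dets = set()
    m = 0                                         # one-to-one matches M
    for g in ground_truth:                        # G_j
        for k, r in enumerate(detected):          # R_i
            if k in matched_dets:
                continue                          # each detection used once
            score = len(g & r) / len(g | r)       # MatchScore(i, j), Eq. 2
            if score >= t_a:                      # one-to-one match
                matched_dets.add(k)
                m += 1
                break
    n1, n2 = len(ground_truth), len(detected)
    dr, ra = m / n1, m / n2
    fm = 2 * dr * ra / (dr + ra) if dr + ra > 0 else 0.0
    return dr, ra, fm
```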
4.3. Hyper-parameters setup

For training the models, the Jaccard loss function was selected. The Jaccard loss quantifies the dissimilarity between the predicted segmentation mask and the GT mask, leveraging the intersection and union of the two masks:

$$\mathcal{L}_{\mathrm{Jaccard}} = 1 - \frac{TP}{TP + FP + FN} \qquad (4)$$

where TP, FP, and FN represent true positives, false positives, and false negatives, respectively. The models were optimized using the Adam optimizer with a learning rate of $1 \times 10^{-3}$ and a weight decay of $1 \times 10^{-5}$. Training was conducted for a maximum of 100 epochs, with an early stopping criterion applied if the network showed no improvement over the last 20 epochs after completing the first 50 epochs. For all models, a ResNet50 was chosen as the backbone.
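The following PyTorch sketch summarizes this training setup (soft Jaccard loss, Adam with the stated learning rate and weight decay, and the early-stopping rule). The `validate` helper and the binary logits/mask formulation are assumptions for illustration; the actual models follow Sections 3.1-3.3 with a ResNet50 backbone.

```python
import torch

def jaccard_loss(logits, target, eps=1e-7):
    """Soft Jaccard loss (Eq. 4) on the predicted text-line probabilities."""
    prob = torch.sigmoid(logits)                  # binary formulation assumed
    inter = (prob * target).sum()
    union = prob.sum() + target.sum() - inter     # soft TP + FP + FN
    return 1.0 - inter / (union + eps)

def train(model, train_loader, val_loader, device="cuda"):
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
    best_score, best_epoch = float("-inf"), 0
    for epoch in range(100):                      # at most 100 epochs
        model.train()
        for image, mask in train_loader:
            opt.zero_grad()
            loss = jaccard_loss(model(image.to(device)), mask.to(device))
            loss.backward()
            opt.step()
        score = validate(model, val_loader)       # hypothetical helper,
        if score > best_score:                    # e.g. mean Pixel IU
            best_score, best_epoch = score, epoch
        # early stopping: after 50 epochs, stop if no improvement in 20
        if epoch >= 50 and epoch - best_epoch >= 20:
            break
    return model
```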
4.4. Results

Table 1 summarizes the performance of the three semantic segmentation models, FCN [1], PSPNet [2], and DeepLabv3+ [3], on the text line segmentation task. The results are reported both for the individual manuscripts and for the entire dataset, with averages calculated across the three manuscripts: Latin 2, Latin 14396, and Syriaque 341.

Table 1: Performance comparison of the semantic segmentation models tested on our dataset. DeepLabv3+ achieves the best full-dataset (Average) result on every metric.

| Model          | Metric   | Latin 2 | Latin 14396 | Syriaque 341 | Average |
|----------------|----------|---------|-------------|--------------|---------|
| FCN [1]        | Pixel IU | 0.513   | 0.551       | 0.183        | 0.416   |
|                | Line IU  | 0.449   | 0.493       | 0.079        | 0.340   |
|                | DR       | 0.292   | 0.437       | 0.030        | 0.253   |
|                | RA       | 0.166   | 0.269       | 0.041        | 0.159   |
|                | FM       | 0.210   | 0.329       | 0.034        | 0.191   |
| PSPNet [2]     | Pixel IU | 0.515   | 0.503       | 0.079        | 0.365   |
|                | Line IU  | 0.402   | 0.519       | 0.020        | 0.314   |
|                | DR       | 0.154   | 0.338       | 0.003        | 0.165   |
|                | RA       | 0.110   | 0.302       | 0.007        | 0.140   |
|                | FM       | 0.126   | 0.316       | 0.004        | 0.149   |
| DeepLabv3+ [3] | Pixel IU | 0.554   | 0.573       | 0.342        | 0.490   |
|                | Line IU  | 0.530   | 0.582       | 0.218        | 0.443   |
|                | DR       | 0.275   | 0.512       | 0.086        | 0.291   |
|                | RA       | 0.148   | 0.331       | 0.051        | 0.177   |
|                | FM       | 0.191   | 0.397       | 0.064        | 0.217   |

Among the evaluated models, DeepLabv3+ shows the best overall performance across all metrics when considering the average scores for the entire dataset. Specifically, it achieves the highest Pixel IU (0.490), Line IU (0.443), DR (0.291), RA (0.177), and FM (0.217). FCN exhibits competitive performance on the Latin 2 and Latin 14396 manuscripts, with an average Pixel IU of 0.416 and Line IU of 0.340; however, it struggles with the more challenging Syriaque 341 manuscript, where degradation and complex layouts significantly affect its performance. PSPNet, while demonstrating reasonable results on Latin 14396, performs less consistently overall, particularly on Syriaque 341, with an average Pixel IU of 0.365 and Line IU of 0.314. Overall, the results are notably low, which highlights the significant challenges still present in the task of text line segmentation for historical documents in a few-shot setting.

5. Conclusion

In this study, we explored few-shot learning approaches for text line segmentation in ancient manuscripts, a challenging task due to the unique characteristics of historical documents. We applied three semantic segmentation models and evaluated their performance on a proprietary dataset of ancient manuscripts, consisting of Latin and Syriac texts with varying levels of degradation and complexity. The results revealed that while DeepLabv3+ achieved the best overall performance, the scores of all models were relatively low, indicating the inherent difficulty of segmenting text lines in such challenging datasets. Despite this, the experiments demonstrate that few-shot learning can be a promising approach for text line segmentation, offering a viable solution in situations where annotated data is scarce. With this research, we highlight both the need for and the possibility of more effective methods for the segmentation of ancient manuscripts, contributing to the preservation and accessibility of cultural heritage.

Acknowledgments

Partial financial support was received from: PNRR DD 3277 del 30 dicembre 2021 (PNRR Missione 4, Componente 2, Investimento 1.5) - iNEST; the Strategic Departmental Plan on Artificial Intelligence, DMIF, University of Udine; and the Strategic Departmental Plan and the interdepartmental center AI4CH – Artificial Intelligence for Cultural Heritage of the University of Udine.

References

[1] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431–3440. doi:10.1109/CVPR.2015.7298965.
[2] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6230–6239. doi:10.1109/CVPR.2017.660.
[3] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham, 2018, pp. 833–851.
[4] Z. Li, W. Wang, Y. Chen, Y. Hao, A novel method of text line segmentation for historical document image of the uchen tibetan, Journal of Visual Communication and Image Representation 61 (2019) 23–32. URL: https://www.sciencedirect.com/science/article/pii/S1047320319300288. doi:10.1016/j.jvcir.2019.01.021.
[5] T.-N. Nguyen, J.-C. Burie, T.-L. Le, A.-V. Schweyer, An effective method for text line segmentation in historical document images, in: 2022 26th International Conference on Pattern Recognition (ICPR), IEEE, Montreal, Canada, 2022, pp. 1593–1599. URL: https://hal.science/hal-03922470. doi:10.1109/ICPR56361.2022.9956617.
[6] A. De Nardin, S. Zottin, E. Colombi, C. Piciarelli, G. L. Foresti, Is ImageNet always the best option? An overview on transfer learning strategies for document layout analysis, in: G. L. Foresti, A. Fusiello, E. Hancock (Eds.), Image Analysis and Processing - ICIAP 2023 Workshops, Springer Nature Switzerland, Cham, 2024, pp. 489–499. doi:10.1007/978-3-031-51026-7_41.
[7] A. De Nardin, S. Zottin, C. Piciarelli, G. L. Foresti, E. Colombi, In-domain versus out-of-domain transfer learning for document layout analysis, International Journal on Document Analysis and Recognition (IJDAR) (2024). URL: https://doi.org/10.1007/s10032-024-00497-4. doi:10.1007/s10032-024-00497-4.
[8] L. Studer, M. Alberti, V. Pondenkandath, P. Goktepe, T. Kolonko, A. Fischer, M. Liwicki, R. Ingold, A comprehensive study of ImageNet pre-training for historical document image analysis, in: 2019 International Conference on Document Analysis and Recognition (ICDAR), 2019, pp. 720–725. doi:10.1109/ICDAR.2019.00120.
[9] A. De Nardin, S. Zottin, M. Paier, G. L. Foresti, E. Colombi, C. Piciarelli, Efficient few-shot learning for pixel-precise handwritten document layout analysis, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, Hawaii, 2023, pp. 3680–3688. doi:10.1109/WACV56688.2023.00367.
[10] A. De Nardin, S. Zottin, M. Paier, G. L. Foresti, E. Colombi, C. Piciarelli, Dynamic instance generation for few-shot handwritten document layout segmentation (short paper), in: CEUR Workshop Proceedings, volume 3286, 2022, pp. 26–34.
[11] A. De Nardin, S. Zottin, C. Piciarelli, E. Colombi, G. L. Foresti, Few-shot pixel-precise document layout segmentation via dynamic instance generation and local thresholding, International Journal of Neural Systems 33 (2023) 2350052. doi:10.1142/S0129065723500521.
[12] S. Tarride, A. Lemaitre, B. Coüasnon, S. Tardivel, Combination of deep neural networks and logical rules for record segmentation in historical handwritten registers using few examples, International Journal on Document Analysis and Recognition (IJDAR) 24 (2021) 77–96. doi:10.1007/s10032-021-00362-8.
[13] A. De Nardin, S. Zottin, C. Piciarelli, E. Colombi, G. L. Foresti, A one-shot learning approach to document layout segmentation of ancient Arabic manuscripts, in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024, pp. 8112–8121. doi:10.1109/WACV57701.2024.00794.
[14] S. Zottin, A. De Nardin, G. L. Foresti, E. Colombi, C. Piciarelli, ICDAR 2024 competition on few-shot and many-shot layout segmentation of ancient manuscripts (SAM), in: E. H. Barney Smith, M. Liwicki, L. Peng (Eds.), Document Analysis and Recognition - ICDAR 2024, Springer Nature Switzerland, Cham, 2024, pp. 315–331.
[15] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.
[16] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, CoRR abs/1706.05587 (2017). arXiv:1706.05587.
[17] S. Zottin, A. De Nardin, E. Colombi, C. Piciarelli, F. Pavan, G. L. Foresti, U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts, Neural Computing and Applications 36 (2024) 11777–11789. URL: https://doi.org/10.1007/s00521-023-09356-5. doi:10.1007/s00521-023-09356-5.
[18] F. Simistira, M. Seuret, N. Eichenberger, A. Garz, M. Liwicki, R. Ingold, DIVA-HisDB: A precisely annotated large dataset of challenging medieval manuscripts, in: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, pp. 471–476. doi:10.1109/ICFHR.2016.0093.
[19] M. W. A. Kesiman, D. Valy, J. C. Burie, E. Paulus, M. Suryani, S. Hadi, M. Verleysen, S. Chhun, J.-M. Ogier, ICFHR 2018 competition on document image analysis tasks for southeast asian palm leaf manuscripts, in: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2018, pp. 483–488. doi:10.1109/ICFHR-2018.2018.00090.