<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Chahan</forename><surname>Vidal-Gorène</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">École Nationale des chartes</orgName>
								<orgName type="institution">Université PSL</orgName>
								<address>
									<addrLine>Centre Jean-Mabillon</addrLine>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<settlement>Calfa</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Clément</forename><surname>Salah</surname></persName>
							<email>clement.salah@unil.ch</email>
							<affiliation key="aff2">
								<orgName type="laboratory">UMR 8167</orgName>
								<orgName type="institution" key="instit1">Sorbonne Université</orgName>
								<orgName type="institution" key="instit2">Université de Lausanne (IHAR)</orgName>
								<address>
									<country>France, Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Noëmie</forename><surname>Lucas</surname></persName>
							<email>noemie.lucas@ed.ac.uk</email>
							<affiliation key="aff3">
								<orgName type="institution">University of Edinburgh</orgName>
								<address>
									<country key="GB">Scotland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Aliénor</forename><surname>Decours-Perez</surname></persName>
							<affiliation key="aff1">
								<address>
									<settlement>Calfa</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Antoine</forename><surname>Perrier</surname></persName>
							<email>antoine.perrier@cnrs.fr</email>
							<affiliation key="aff4">
								<orgName type="institution" key="instit1">CNRS</orgName>
								<orgName type="institution" key="instit2">Centre Jacques Berque</orgName>
								<address>
									<settlement>Morocco</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Arabic Maghribi Handwritten Text Recognition with RASAM 2: A Comprehensive Dataset and Benchmarking</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">AEE46C6D4BA8F228374BAEE6B7F863B3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>dataset</term>
					<term>Arabic scripts</term>
					<term>handwritten text recognition</term>
					<term>historical manuscripts</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recent advancements in handwritten text recognition (HTR) for historical documents have demonstrated high performance on cursive Arabic scripts, achieving accuracy comparable to Latin scripts. The initial RASAM dataset, focused on three Arabic Maghribi manuscripts, facilitated rapid coverage of new documents via fine-tuning. However, HTR application for Arabic scripts remains constrained due to the vast diversity in spellings, ambiguities, and languages. To overcome these challenges, we present RASAM 2, an extended dataset with 3,750 lines from 15 manuscripts in the BULAC library, showcasing various hands, layouts, and texts in Arabic Maghribi script. RASAM 2 aims to establish a new benchmark for HTR model training for both Maghribi and Oriental scripts, covering text recognition and layout analysis. Preliminary experiments using a word-based CRNN approach indicate significant model versatility, with a nearly 40% reduction in Character Error Rate (CER) across new in-domain and out-of-domain manuscripts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In 2020, the Recognition and Analysis of Scripts in Arabic Maghrebi (RASAM) dataset was introduced to analyze and recognize handwritten Arabic documents, specifically focusing on Arabic Maghribi script manuscripts. This dataset demonstrated the feasibility of applying Handwritten Text Recognition (HTR) to Arabic Maghribi scripts, aiming for error rates comparable to other non-Latin scripts. The initial dataset, RASAM 1, included 300 images from three Bibliothèque des langues et civilisations (BULAC) manuscripts copied between 1734 and 1875, achieving promising results with an in-domain Character Error Rate (CER) of 4.8%.</p><p>However, the limited scope of RASAM 1 restricted its effectiveness in recognizing out-of-domain manuscripts, even those with similar contemporary scripts and themes (see Table <ref type="table" target="#tab_0">1</ref>). To overcome these limitations, we introduce RASAM 2, an expanded dataset comprising 3,750 lines from fifteen manuscripts, encompassing a broader range of themes and handwriting styles. RASAM 2 aims to provide a comprehensive reference for training HTR models for Arabic scripts, enhancing their robustness and applicability across diverse Arabic Maghribi and Oriental texts. This paper presents the technical details of RASAM 2, its composition, and the initial results of using a new word-based Convolutional Recurrent Neural Network (CRNN) approach, which shows significant improvement in model versatility and a substantial reduction in CER for both in-domain and out-of-domain manuscripts.</p><p>Commentary: The nūn is mistaken for a qāf (in both cases, a single dot subscribed).</p><p>Commentary: The fā is confused with a bā (in both cases, a single point is subscribed).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Commentary:</head><p>The rā is confused with a wāw (more or less open and long final).</p><p>Commentary: The pair of letters bā and 'ayn were confused with anhā (the subscript point of the bā was not spotted). The final dāl is confused with a ḥā, as the two may have a similar ending.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Commentary:</head><p>The rā of rummān (pomegranates) became a wāw, the two often having very close realisations; this is possibly an example of a food word unknown to the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Commentary:</head><p>The first subscribed point is misunderstood and the ǧīm of ǧawāhir (jewels or gems) is confused with a ḥā. The unusually wide realisation of the hā is mistaken for a qāf (the dot on the line below is mistakenly equated with this line) followed by a ṣād. The rā is well understood.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">State-of-the-art datasets for Arabic scripts</head><p>The study of documents in Arabic constitutes a separate field within handwritten text recognition, and within document analysis more generally, owing to the great diversity and variability they encompass, hence the workshops dedicated to this specific issue held at the last ICDAR and ICFHR conferences. The latest developments in HTR for Arabic have however demonstrated that dedicated CRNNs make it possible to overcome the issue of text recognition for these scripts, with a CER below 5%, and even below 3% in specific cases, with little training data <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b8">9]</ref>. At this stage, these specialized models exceed the performance achieved by Transformers for Arabic, the latest results on Al-Soudani Maghrebi script reaching an average CER of 10% with a large dataset <ref type="bibr" target="#b11">[12]</ref>. Text detection is also effective on Arabic documents: for instance, the use of an FCN <ref type="bibr" target="#b7">[8]</ref> allows for good text-line detection. For the semantic classification of contents, a non-specialized U-net <ref type="bibr" target="#b14">[15]</ref> outperforms the FCN, which notably struggles to differentiate two close text regions of the same type. Several open questions remain, such as the processing of very cursive scripts, the issue of transcription and the ambiguity of diacritics, or the reading of abbreviations.</p><p>In recent years, numerous datasets have emerged in an attempt to address these different tasks.
In the case of non-historical documents, the IFN/ENIT dataset <ref type="bibr" target="#b13">[14]</ref>, focused on modern scripts and produced in a very restricted context, is an important point of reference, notably used for the automatic generation of handwritten lines <ref type="bibr" target="#b4">[5]</ref>. Although not designed for HTR purposes, the KHATT dataset offers modern scripts from 1,000 different copyists <ref type="bibr" target="#b10">[11]</ref> and is mainly intended for writer identification, as are the QUWI and LAMIS-MSHD datasets <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b3">4]</ref>.</p><p>In the case of historical documents, very specialized datasets exist, such as WAHD <ref type="bibr" target="#b0">[1]</ref>, dedicated to writer identification, or KERTAS <ref type="bibr" target="#b1">[2]</ref>, dedicated to manuscript dating. Other datasets are not specialized in a specific Arabic script, such as HADARA80P <ref type="bibr" target="#b12">[13]</ref> and VML-HD <ref type="bibr" target="#b6">[7]</ref>, notably for word spotting, RASM2018 <ref type="bibr" target="#b2">[3]</ref>, comprising scientific manuscripts from the Qatar Digital Library, or BADAM <ref type="bibr" target="#b7">[8]</ref>, focused on line detection in Arabic documents, particularly complex ones. More recently, the RASAM 1 dataset <ref type="bibr" target="#b15">[16]</ref> targets Arabic Maghribi scripts, in contrast to RASM and BADAM, which focus on Oriental scripts. It offers typical layouts and hands representative of the common Maghribi production, selected for the purpose of quickly developing HTR models usable for both research and production. The dataset has since been extended within the scope of the TARIMA project, with 120 pages manually transcribed from 28 various Arabic Maghribi sources, including lithographs. <ref type="foot" target="#foot_0">1</ref> This extension has been designed for fine-tuning tasks from RASAM 1.
For the Oriental scripts, we can also mention the Iskandar dataset from the Alexander Hackathon, focusing on 5 manuscripts of the Alexander romance in Middle Arabic. <ref type="foot" target="#foot_1">2</ref> Together, these datasets already cover a large part of the production of documents in Arabic scripts (subject to their compatibility, see Table <ref type="table" target="#tab_1">2</ref>). Although the proof of concept is successful for text recognition, the challenge today is to increase the versatility of existing models by providing a greater variety of fully annotated and transcribed documents.</p><p>See Table <ref type="table">5</ref> in the appendix for the complete list of manuscripts. The purpose of RASAM 2 is to extend the variety of cases encountered in RASAM 1, in order to provide a robust training basis for documents in Arabic scripts.</p><p>• Dataset availability (v.1.0): https://github.com/calfa-co/rasam-dataset.</p><p>• License: Apache 2.0</p><p>• Data format: pageXML with text regions and lines</p><p>• Annotation tool: Calfa Vision<ref type="foot" target="#foot_2">3</ref> <ref type="bibr" target="#b14">[15]</ref></p><p>• Ontology for annotation: SegmOnto <ref type="bibr" target="#b5">[6]</ref></p><p>• Transcription guidelines: Same as RASAM 1 (no missing hamza or diacritics added)</p><p>Methodology for data creation: The images have been randomly selected from the manuscripts to constitute a representative sample of the production, of the states of conservation, and of the handwriting quality. The images have been pre-annotated with the baseline and text region detection models trained on RASAM 1 and available within the project type "Arabic Manuscript (default)" on the annotation platform. Afterwards, the predictions have been manually checked by the participants during the hackathons.
Transcription guidelines follow RASAM 1 recommendations <ref type="bibr" target="#b15">[16]</ref>.</p><p>The dataset holds 522,371 characters (divided into 54 classes) for a total of 93,855 words (divided into 22,027 classes). The ḍammatan and classes in particular are under-represented and are likely to be less encountered, and so less recognized in a character-based approach (see below Section 4). The words waw, min/man and fī are the most represented in the dataset, with 4,398, 2,246 and 2,189 occurrences respectively; conversely, the words al-akhdūd, qaṭām and la'ād are among the least represented (a single occurrence). We retained four text regions and two annex regions for the semantic classification of contents:</p><p>• MainZone: the main text region of the document. This region can appear several times within a single page, when the text is segmented or in case of a multiple column layout;</p><p>• MainZone:title: text region located at the same level as the main text, for headings and stylized titles;</p><p>• MarginTextZone: marginal text region regardless of its location in the page;</p><p>• MarginTextZone:catchword: marginal text region corresponding to the catchwords, systematically under the main text region;</p><p>• StampZone: stamps present on the page;</p><p>• TableZone: region corresponding to a table.</p><p>A summary of the text regions distribution is given in Table <ref type="table" target="#tab_2">3</ref>.</p></div>
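<div xmlns="http://www.tei-c.org/ns/1.0"><p>The character and word class counts reported above can be reproduced from plain-text transcriptions with a few lines of Python. The following is only a minimal sketch: it assumes that words are separated by whitespace and that whitespace itself is not a character class, neither of which is specified in this paper, and the function names and the transliterated toy lines are hypothetical.</p>

```python
from collections import Counter


def class_statistics(lines):
    """Count character classes and word classes in transcribed lines.

    Assumption of this sketch: words are whitespace-separated and
    whitespace is not counted as a character class.
    """
    char_counts = Counter()
    word_counts = Counter()
    for line in lines:
        words = line.split()
        word_counts.update(words)
        char_counts.update(ch for word in words for ch in word)
    return char_counts, word_counts


def rare_classes(counts, max_occurrences=1):
    """Return the classes seen at most max_occurrences times, sorted."""
    return sorted(c for c, n in counts.items() if max_occurrences >= n)


# Toy transliterated lines, not taken from the dataset.
sample = ["qala fi al-kitab", "qala min fi"]
chars, words = class_statistics(sample)
print(len(chars), "character classes;", len(words), "word classes")
print("single-occurrence words:", rare_classes(words))
```

<p>Listing the single-occurrence classes in this way gives a quick view of which words would fall into the few-shot situation discussed in Section 4.</p></div>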
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Qualitative description</head><p>As outlined in the introduction, the aim of this new dataset is to enhance the versatility and robustness of RASAM 1 by training it on a wider variety of manuscripts in order to expand the base of its (1.) vocabulary, (2.) layouts and (3.) scripts. As a result, 15 manuscripts make up this new dataset. (2.) From the layout perspective, the RASAM 1 dataset already covered complex layouts: MS.ARA.609 integrated many tables within the body of the text and MS.ARA.1977 recorded many lines of poetry which traditionally are offset from the main text <ref type="bibr" target="#b15">[16]</ref>. The RASAM 2 dataset intends to enhance the capabilities of the model in handling complex layouts. In detail (see Figure <ref type="figure" target="#fig_0">1</ref>), the RASAM 2 dataset reinforces its capabilities in the treatment of poetry verses (MS.ARA.6), tables (MS.ARA.65) and marginal comments, whether they are aligned with the main text as in MS.ARA.1943, or rounded, or even inverted as in MS.ARA.1936. Moreover, the RASAM 2 dataset develops new skills, in particular in the identification of interlinear comments (MS.ARA.1947) or particularly stylised titles (MS.ARA.1926) as well as in the processing of more complex page layouts, notably with the presence of gap texts (MS.ARA.1960). (3.) From a strictly palaeographic point of view, the RASAM 2 dataset intends to deal with a broader variety of hands. The emphasis has been placed on three points in particular. (a.) Firstly, particular interest has been given to the use of colors within these different manuscripts. Some recent experiments conducted on the basis of RASAM 1 show that the use of colors largely hinders the models' good recognition of characters <ref type="bibr" target="#b8">[9]</ref>. 
Therefore, many manuscripts in the RASAM 2 corpus aim at providing the model with many color realizations (see MS.ARA.1926 and MS.ARA.6 supra, where blue, green, red and yellow are used in particular). (b.) Secondly, RASAM 2 intends to be able to handle different text densities. RASAM 1 was indeed based on only 3 manuscripts which, although different in terms of density <ref type="bibr" target="#b15">[16]</ref>, did not cover the multiple realizations of manuscripts in Arabic Maghribi scripts. In order to fill this gap, RASAM 2 is built on a broad continuum in terms of density, from very airy manuscripts, such as MS.ARA.1926 with fewer than ten lines per page and fewer than ten words per line, to extremely dense manuscripts, such as MS.ARA.1982 with more than forty lines per page and slightly fewer than twenty words per line, or MS.ARA.1943 with thirty-five lines per page and more than twenty words per line. (c.) Finally, RASAM 2 covers a wider range of Arabic Maghribi scripts. The model is thus built from very careful and stylized, almost calligraphic hands following the example of MS.ARA.1926 (see below <ref type="bibr" target="#b5">6)</ref> or hands that are characterized by a wide amplitude of their final tails; see in particular the realization of the final lām in the word qāla of MS.ARA.6, 1926, 1946 and 1947 (see Table <ref type="table">6</ref> in appendix). Conversely, RASAM 2 also includes very cursive and crowded scripts, as is the case for MS.ARA.1943 and MS.ARA.1982. In sum, and as schematically represented in Figure <ref type="figure" target="#fig_1">2</ref>, RASAM 2 covers a wider reality of Arabic Maghribi hands. It leads to a pre-generic model for the treatment of Arabic Maghribi scripts, far exceeding the possibilities offered by RASAM 1, which was still only a proof of concept until then.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">HTR of Arabic versatility experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Methodology</head><p>The latest developments in HTR for handwritten documents in Arabic scripts have shown that operating a word-based CRNN (where every word is considered as a different class to identify) outperforms a basic character-based CRNN (where each character is considered as a different class to identify) on documents with a steady lexicon (both in learning time and CER) <ref type="bibr" target="#b8">[9]</ref>. This approach, despite being dependent on the targeted lexicon, relies on recognizing a word in context, which appears to be a more robust approach for cursive Arabic scripts <ref type="bibr" target="#b8">[9]</ref>. We hold onto this approach, which is a variation of the one implemented for RASAM <ref type="bibr" target="#b15">[16]</ref>. Some underrepresented word classes are in a few-shot learning situation. In this case, the word-based approach is based on context for predictions, and failing that relies on character recognition.</p><p>Lucas et al. have notably demonstrated that a fine-tuning strategy limited to 10 images (160 transcribed lines on average) for the Arabic Maghribi scripts, on the basis of a RASAM-trained model, is sufficient to reach a CER below 10% and to shorten the transcription work <ref type="bibr" target="#b8">[9]</ref>.</p><p>We are taking this fine-tuning approach from the RASAM model and testing it on two samples: one in-domain sample, derived from RASAM 1 and RASAM 2, and one out-of-domain sample derived from manuscripts from Lucas et al. <ref type="bibr" target="#b8">[9]</ref> (see Figure <ref type="figure" target="#fig_2">3</ref>). The latter dataset is doubly out-of-domain, with new scripts and a new lexicon. We compare this new model with the one strictly trained on RASAM 1 (see Figure <ref type="figure" target="#fig_3">4</ref> and Table <ref type="table" target="#tab_3">4</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results</head><p>Table <ref type="table" target="#tab_3">4</ref> displays the average CER achieved by models trained on RASAM 1 and RASAM 2 in the in-domain and out-of-domain samples. Although the RASAM 1 model evaluated on its original sample remains more efficient, owing to its high specialization, the RASAM 2 model reaches a CER five times smaller on RASAM 2, and almost halves the CER obtained on out-of-domain documents. The lexical and visual diversity provided by RASAM 2, although relatively modest, allows the model to achieve an average CER comparable to state-of-the-art results obtained for Latin scripts, which benefit from significantly larger datasets (e.g., the CATMuS medieval dataset, which includes about 5 million characters).</p></div>
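<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who want to recompute such figures, the CER is conventionally defined as the character-level edit distance between prediction and ground truth, divided by the number of ground-truth characters. The sketch below follows that standard definition; the paper does not state which evaluation tool or text normalization was used, so the function names and the absence of any normalization step are assumptions.</p>

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def cer(references, hypotheses):
    """Character error rate in percent over paired ground-truth/predicted lines."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    total = sum(len(r) for r in references)
    if total == 0:
        raise ValueError("the ground truth contains no characters")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * errors / total


# Toy example echoing Table 1: the initial ra of "rumman" read as a waw.
print(round(cer(["rumman"], ["wumman"]), 2))
```

<p>Summing errors and lengths over all lines before dividing weights each line by its length; averaging per-line or per-page CER instead would give slightly different values.</p></div>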
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Out-of-domain results (Maghribi scripts)</head><p>On out-of-domain documents that belong to the same family of scripts as RASAM 1 and 2, namely the Arabic Maghribi scripts, RASAM 2 demonstrates notable efficiency, as evidenced in its application to TARIMA. Particularly noteworthy is its performance on Oriental scripts (RASM and Iskandar), where RASAM 2 not only outperforms RASAM 1 but also achieves significantly lower average CER scores (20.34 for RASM and 16.73 for Iskandar). These improved results not only enhance accuracy but also facilitate faster processing with minimal data requirements. Besides the versatility of the RASAM 2 model, Figure <ref type="figure" target="#fig_3">4</ref> also shows its robustness, with a very consistent CER per page and very little dispersion as in the case of RASAM 1. This is particularly visible on the RASAM 2 dataset, for which the RASAM 1 model (out-of-domain test) reaches a CER between 11.67% (on the manuscript BULAC.MS.ARA.1982) and 48.80% (on the manuscript BULAC.MS.ARA.9). Conversely, the CER of the RASAM 2 model ranges between 1.71% and 28.47% in the in-domain case, and between 7.26% and 26.88% in the out-of-domain case. The extreme values are therefore practically twice as small as those for RASAM 1. There nevertheless remain pages for which our new model does not immediately produce a workable outcome; for these pages, it will be necessary to adopt a fine-tuning strategy, which should be fast. <ref type="foot" target="#foot_3">4</ref> The median observed in Figure <ref type="figure" target="#fig_4">5</ref> is 27.97% for RASAM 1 for out-of-domain documents, and is reduced to 15.83% for RASAM 2, hence a 42% decrease in the error rate. In the out-of-domain case, the gap between the results of RASAM 1 and RASAM 2 is narrower. While the manuscripts BULAC.MS.ARA.1922 (31.44% vs 26.38%) and BULAC.MS.ARA.1957 (35.95% vs 26.33%) retain a very high CER, the manuscripts BULAC.MS.ARA.1944 and BULAC.MS.ARA.1929 achieve a CER of 7.67% and 10.16%, better than the CER obtained in-domain for the manuscripts previously cited.</p><p>Despite the diversity of the TARIMA corpus, with both manuscripts and lithographs, the results remain very good. This is due to the proximity between the RASAM 1 &amp; 2 datasets and the palaeographic characteristics of the TARIMA corpus, all of which are in Maghribi script.</p></div>
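<div xmlns="http://www.tei-c.org/ns/1.0"><p>When comparing two error rates, it helps to keep the relative decrease (in percent of the baseline) apart from the absolute decrease (in percentage points). The short sketch below computes both from the two out-of-domain medians quoted above; the function names are hypothetical, and because the inputs are rounded values read from the text, the relative figure it yields may differ by a point or so from the rounded percentage reported in the paper.</p>

```python
def absolute_reduction(baseline, improved):
    """Decrease of an error rate in percentage points."""
    return baseline - improved


def relative_reduction(baseline, improved):
    """Decrease of an error rate in percent of the baseline value."""
    if baseline == 0:
        raise ValueError("baseline error rate must be non-zero")
    return 100.0 * (baseline - improved) / baseline


# Out-of-domain medians quoted in the text (CER, in percent).
median_rasam1 = 27.97
median_rasam2 = 15.83

points = absolute_reduction(median_rasam1, median_rasam2)
percent = relative_reduction(median_rasam1, median_rasam2)
print(f"{points:.2f} points, {percent:.1f}% relative decrease")
```

<p>With these inputs the absolute gain is about twelve points while the relative decrease is a little above forty percent, which is why the two units should not be used interchangeably.</p></div>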
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Out-of-domain results (Oriental scripts)</head><p>RASAM 2 also demonstrates significantly enhanced efficiency when applied to Oriental manuscripts, as illustrated by its performance with RASM and Iskandar. Its versatility is particularly evident in Iskandar, where the CER remains below 30%, with an average CER ranging between 8% and 20% (Fig. <ref type="figure" target="#fig_5">4 and 5</ref>). Except for one manuscript (MS_Orient_A_02393), all CER values remain below 20% with RASAM 2. While RASM results exhibit some dispersion (albeit less than with RASAM 1), RASAM 2's performance varies across the four manuscripts comprising the RASM dataset. Its highest result is observed in Dehli_Arabic_1901 (slightly above 16%), but none exceed 25%. The disparity in out-of-domain results between RASM and Iskandar likely arises from the difference in dataset adherence to RASAM guidelines. While Iskandar follows the RASAM guidelines, the RASM dataset diverges from them, which may explain the observed gap in CER results. For example, when the scribe omitted expected diacritics on certain letters, the transcriber left the letter without them, whereas the RASAM guidelines would have added the diacritics where necessary. This suggests that with minimal fine-tuning, RASAM 2 could readily adapt to various manuscripts, regardless of their script families.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Qualitative interpretation</head><p>RASAM 2 sets a new standard for the recognition of Arabic Maghribi scripts. Figure <ref type="figure" target="#fig_4">5</ref> shows that it nevertheless produces many more errors than the average on four in-domain and out-of-domain manuscripts, leading to an increase in the CER. Observation of the manuscripts (see Figure <ref type="figure" target="#fig_6">6</ref>) reveals several situations where the CER naturally worsens.</p><p>Manuscript with vowel signs and numerous interlinear notes: This is the case of the manuscripts BULAC.MS.ARA.1936 and BULAC.MS.ARA.1957, for which we observe extensive vocalization, which is rarely present in these manuscripts. At this stage, it leads to a greater ambiguity of the forms to be recognized, but this is not insurmountable: a specialized approach from RASAM shows, for example, that 160 lines are enough with a word-based approach to reach a CER of 10.41% for the manuscript BULAC.MS.ARA.1957 <ref type="bibr" target="#b8">[9]</ref>.</p><p>Variation in line color: This is a phenomenon already observed in RASAM 1 <ref type="bibr" target="#b15">[16]</ref>, with an over-representation of colored lines among lines with high CER. MS.ARA.1947, which alternates blue and red lines (marginally present in training), is therefore penalized. Its CER drops to 6.56% without these lines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In conclusion, the RASAM 2 dataset offers a high representativeness of Arabic Maghribi scripts. The word-based model trained on this dataset obtains very high in-domain and out-of-domain accuracies, achieving a CER reduction of about 40% on new in-domain and out-of-domain manuscripts, which ensures an important coverage of Arabic Maghribi manuscript traditions. The dataset also demonstrates its versatility and can be easily fine-tuned on a new target, including Oriental scripts and new varieties of Arabic (Middle Arabic, Berber written in Arabic). In the future, we will study this transfer of RASAM models to other types of Arabic scripts, in particular Oriental ones. Additionally, we plan to conduct experiments using transformer-based models, as the critical mass of data for Arabic has now been reached, thanks to the RASAM team and all datasets produced within this scope. More generally, the datasets created in recent years around the RASAM team (TARIMA, Iskandar) have made it possible to create a set of open data decisive for the HTR of Arabic scripts.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Examples of complex layout. From left to right, first line: MS.ARA.6, MS.ARA.65, MS.ARA.1943, MS.ARA.1936; second line: MS.ARA.1947, MS.ARA.1926, MS.ARA.1960</figDesc><graphic coords="6,158.35,345.16,94.29,112.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Representativity of the cursive and dense characteristics of RASAM 2 scripts in comparison with RASAM 1. We gave each manuscript a score out of 5 to characterize the cursiveness of the writing as well as the density of the text.</figDesc><graphic coords="7,141.37,231.64,312.54,309.26" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Experiments conducted on the new dataset and comparison with the RASAM 1 and RASAM 2 models</figDesc><graphic coords="9,89.28,84.17,416.72,153.09" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Distribution of the achieved CER on the three datasets: RASAM 1 (blue) and RASAM 2 (orange)</figDesc><graphic coords="10,89.28,163.50,416.72,220.11" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5</head><label>5</label><figDesc>Figure 5 presents the average CER for each manuscript. In the in-domain case, several manuscripts have a CER of less than 5%: this is the case for the manuscripts BULAC.MS.ARA.1943 (3.43%), BULAC.MS.ARA.1977 (4.91%), BULAC.MS.ARA.1982 (3.26%), BULAC.MS.ARA.1983 (3.58%), and BULAC.MS.ARA.45b (3.20%). The BULAC.MS.ARA.1936 and BULAC.MS.ARA.1947 manuscripts, even if they largely benefit from the new model, retain a high CER, higher than 15% and up to 16.25% for BULAC.MS.ARA.1936 (compared with the 46.47% CER achieved with RASAM 1, for which it is out-of-domain).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Distribution of CERs obtained by RASAM 1 (blue) and RASAM 2 (orange) for each in-domain and out-of-domain manuscript. For the out-of-domain evaluation, red dots refer to manuscripts from Lucas et al., purple dots to those from Tarima, orange dots from RASAM, and blue dots from Iskandar.</figDesc><graphic coords="11,89.28,329.54,416.72,301.95" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Examples of complex layout. From left to right: BULAC.MS.ARA.1936 (RASAM 2 dataset, in-domain), BULAC.MS.ARA.1947 (RASAM 2 dataset, in-domain), BULAC.MS.ARA.1922 (Lucas et al., out-of-domain) and BULAC.MS.ARA.1957 (Lucas et al., out-of-domain)</figDesc><graphic coords="12,109.47,443.91,81.24,112.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Common limitations encountered with RASAM 1 and state-of-the-art HTR models of Arabic BULAC.MS.ARA.1978 GT RASAM 1 prediction</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Summary of the main existing datasets for Arabic historical documents. Different levels of annotation are offered, often partial, thus limiting data compatibility.</figDesc><table><row><cell>Dataset</cell><cell>Images</cell><cell>Focus</cell><cell>Annotation</cell><cell>Baseline</cell><cell>Region</cell><cell>Text</cell><cell>Format</cell></row><row><cell></cell><cell></cell><cell cols="2">Specialized datasets</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>WAHD</cell><cell>43,976</cell><cell>Writer identification</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>NC</cell></row><row><cell>KERTAS</cell><cell>2,502</cell><cell>Manuscript dating</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>XML</cell></row><row><cell>HADARA80p</cell><cell>80</cell><cell>Word spotting</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>XML</cell></row><row><cell>VML-HD</cell><cell>680</cell><cell>Word spotting</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>Hadara XML</cell></row><row><cell></cell><cell></cell><cell cols="3">Datasets for Page Layout Analysis and HTR</cell><cell></cell><cell></cell><cell></cell></row><row><cell>RASM2018</cell><cell>100</cell><cell>General</cell><cell>Full</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell><cell>pageXML</cell></row><row><cell>BADAM</cell><cell>400</cell><cell>Layout</cell><cell>Partial</cell><cell>yes</cell><cell>no</cell><cell>no</cell><cell>pageXML</cell></row><row><cell>RASAM 1</cell><cell>300</cell><cell>Maghribi scripts</cell><cell>Full</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell><cell>pageXML</cell></row><row><cell>TARIMA</cell><cell>120</cell><cell>Maghribi scripts</cell><cell>Full</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell><cell>pageXML</cell></row><row><cell>Iskandar</cell><cell>297</cell><cell>Oriental scripts</cell><cell>Full</cell><cell>yes</cell><cell>yes</cell><cell>yes</cell><cell>pageXML</cell></row><row><cell cols="3">3. Dataset composition</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row></table><note>3.1. Quantitative description. Summary: RASAM 2 dataset comprises 250 images from 15 different manuscripts. 3,750 lines in total have been transcribed, 250 lines by manuscript on average, regardless of the type (main text or marginal notes). It entails 5,702 annotated lines in total and focuses on Arabic Maghribi manuscripts (see Table</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Distribution of TextRegion types in RASAM 2 dataset (v1.0)</figDesc><table>
<row><cell>Manuscript</cell><cell>MainZone</cell><cell>MainZone: title</cell><cell>MarginTextZone</cell><cell>MarginTextZone: catchword</cell><cell>StampZone</cell><cell>TableZone</cell></row>
<row><cell>BULAC.MS.ARA.6</cell><cell>15</cell><cell>-</cell><cell>6</cell><cell>7</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.9</cell><cell>16</cell><cell>-</cell><cell>26</cell><cell>6</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.23</cell><cell>16</cell><cell>-</cell><cell>5</cell><cell>7</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.24</cell><cell>17</cell><cell>-</cell><cell>1</cell><cell>7</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.45b</cell><cell>16</cell><cell>-</cell><cell>46</cell><cell>7</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.65</cell><cell>13</cell><cell>-</cell><cell>6</cell><cell>6</cell><cell>-</cell><cell>1</cell></row>
<row><cell>BULAC.MS.ARA.1926</cell><cell>41</cell><cell>1</cell><cell>24</cell><cell>15</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1936</cell><cell>20</cell><cell>-</cell><cell>41</cell><cell>8</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1943</cell><cell>25</cell><cell>-</cell><cell>83</cell><cell>8</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1944</cell><cell>35</cell><cell>-</cell><cell>43</cell><cell>13</cell><cell>2</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1946</cell><cell>25</cell><cell>-</cell><cell>3</cell><cell>9</cell><cell>2</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1947</cell><cell>18</cell><cell>1</cell><cell>28</cell><cell>7</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1960</cell><cell>16</cell><cell>-</cell><cell>60</cell><cell>8</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1982</cell><cell>25</cell><cell>1</cell><cell>9</cell><cell>16</cell><cell>-</cell><cell>-</cell></row>
<row><cell>BULAC.MS.ARA.1983</cell><cell>15</cell><cell>1</cell><cell>2</cell><cell>8</cell><cell>2</cell><cell>-</cell></row>
<row><cell>TOTAL</cell><cell>313</cell><cell>4</cell><cell>383</cell><cell>132</cell><cell>6</cell><cell>1</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Comparison of CER achieved on in-domain and out-of-domain samples. The outcome of RASAM 1 on RASAM 1 is drawn from the original article.</figDesc><table>
<row><cell></cell><cell cols="2">in-domain test</cell><cell cols="5">out-of-domain test</cell></row>
<row><cell></cell><cell>RASAM 1</cell><cell>RASAM 2</cell><cell>RASAM 2</cell><cell>Lucas et al.</cell><cell>RASM</cell><cell>TARIMA</cell><cell>Iskandar</cell></row>
<row><cell>RASAM 1</cell><cell>4.8*</cell><cell>-</cell><cell>30.91</cell><cell>25.75</cell><cell>42.02</cell><cell>26.81</cell><cell>46.91</cell></row>
<row><cell>RASAM 2</cell><cell>5.50</cell><cell>6.79</cell><cell>-</cell><cell>16.38</cell><cell>20.34</cell><cell>9.70</cell><cell>16.73</cell></row>
</table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/calfa-co/tarima</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://gitlab.huma-num.fr/lipa/iskandar</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://vision.calfa.fr</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Lucas et al. reached a CER of 3.23% with a different split and a slightly redesigned architecture using a meta-word-based approach (in the context of a specialized in-domain model). That study also shows that, for the manuscript BULAC.MS.ARA.1957, the initial CER of 30.46% (RASAM 1) drops to 21.8% after fine-tuning on only 20 lines. Applied to the same manuscript (see Figure</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">), the RASAM 2 model obtains an initial CER of 25.5%<ref type="bibr" target="#b8">[9]</ref>.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was carried out within the framework of cooperation between the Research Consortium Middle-East and Muslim Worlds (GIS MOMM), the BULAC, and Calfa. It aligns with the scientific focus defined by the GIS MOMM, which prioritizes North African studies and digital humanities.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Data availability</head><list><item>RASAM 1 and 2 datasets: https://github.com/calfa-co/rasam-dataset</item><item>TARIMA dataset: https://github.com/calfa-co/tarima</item><item>Iskandar dataset: https://gitlab.huma-num.fr/lipa/iskandar</item></list></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Paleographical features of RASAM 2 dataset</head></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">WAHD: A database for writer identification of Arabic historical documents</title>
		<author>
			<persName><forename type="first">A</forename><surname>Abdelhaleem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Droby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Asi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kassis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">Al</forename><surname>Asam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>El-Sanaa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="64" to="68" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">KERTAS: dataset for automatic dating of ancient Arabic manuscripts</title>
		<author>
			<persName><forename type="first">K</forename><surname>Adam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Baig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Al-Maadeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bouridane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>El-Menshawy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Document Analysis and Recognition (IJDAR)</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="283" to="290" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">ICFHR 2018 competition on recognition of historical Arabic scientific manuscripts - RASM2018</title>
		<author>
			<persName><forename type="first">C</forename><surname>Clausner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Antonacopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>McGregor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wilson-Nunn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">16th International Conference on Frontiers in Handwriting Recognition (ICFHR)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="471" to="476" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">LAMIS-MSHD: A Multi-script Offline Handwriting Database</title>
		<author>
			<persName><forename type="first">C</forename><surname>Djeddi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gattal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Souici-Meslati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Siddiqi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chibani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">El</forename><surname>Abed</surname></persName>
		</author>
		<idno type="DOI">10.1109/icfhr.2014.23</idno>
	</analytic>
	<monogr>
		<title level="m">14th International Conference on Frontiers in Handwriting Recognition</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="93" to="97" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Generative adversarial network based adaptive data augmentation for handwritten Arabic text recognition</title>
		<author>
			<persName><forename type="first">M</forename><surname>Eltay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zidouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Elarian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PeerJ Computer Science</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page">e861</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SegmOnto: common vocabulary and practices for analysing the layout of manuscripts (and more)</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gabay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Camps</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pinche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jahan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">1st International Workshop on Computational Paleography (IWCP ICDAR</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">VML-HD: The historical Arabic documents dataset for recognition systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kassis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Abdalhaleem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Droby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Alaasam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>El-Sana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="11" to="14" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BADAM: a public dataset for baseline detection in Arabic-script manuscripts</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kiessling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S B</forename><surname>Ezra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th International Workshop on Historical Document Imaging and Processing</title>
				<meeting>the 5th International Workshop on Historical Document Imaging and Processing</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="13" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">New Results for the Text Recognition of Arabic Maghribi Manuscripts - Managing an Under-resourced Script</title>
		<author>
			<persName><forename type="first">N</forename><surname>Lucas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Salah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Vidal-Gorène</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">QUWI: An Arabic and English Handwriting Dataset for Offline Writer Identification</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Maadeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ayouby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hassaïne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Aljaam</surname></persName>
		</author>
		<idno type="DOI">10.1109/icfhr.2012.256</idno>
	</analytic>
	<monogr>
		<title level="m">2012 International Conference on Frontiers in Handwriting Recognition</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="746" to="751" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">KHATT: An open Arabic offline handwritten text database</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Mahmoud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">G</forename><surname>Al-Khatib</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alshayeb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Tanvir</forename><surname>Parvez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Märgner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Fink</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.patcog.2013.08.009</idno>
	</analytic>
	<monogr>
		<title level="j">Pattern Recognition</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1096" to="1112" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Transformer-based Model For Handwritten Recognition Arabic Words Al-soudani Maghrebi Script</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Maouloud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">O M</forename><surname>Dyla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Theoretical and Applied Information Technology</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="issue">24</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An historical handwritten Arabic dataset for segmentation-free word spotting - HADARA80P</title>
		<author>
			<persName><forename type="first">W</forename><surname>Pantke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dennhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Märgner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Fingscheidt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">14th International Conference on Frontiers in Handwriting Recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="15" to="20" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">IFN/ENIT - database of handwritten Arabic words</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pechwitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Maddouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Märgner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ellouze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Amiri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of CIFED</title>
				<meeting>CIFED</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="127" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A Modular and Automated Annotation Platform for Handwritings: Evaluation on Under-Resourced Languages</title>
		<author>
			<persName><forename type="first">C</forename><surname>Vidal-Gorène</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dupin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Decours-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Riccioli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Document Analysis and Recognition - ICDAR 2021</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Lladós</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Lopresti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Uchida</surname></persName>
		</editor>
		<imprint>
			<pubPlace>Cham</pubPlace>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="507" to="522" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">RASAM - A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi</title>
		<author>
			<persName><forename type="first">C</forename><surname>Vidal-Gorène</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lucas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Salah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Decours-Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dupin</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-86198-8_19</idno>
	</analytic>
	<monogr>
		<title level="m">Document Analysis and Recognition - ICDAR 2021 Workshops</title>
				<editor>
			<persName><forename type="first">E</forename><forename type="middle">H</forename><surname>Barney Smith</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Pal</surname></persName>
		</editor>
		<imprint>
			<pubPlace>Cham</pubPlace>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="265" to="281" />
		</imprint>
	</monogr>
</biblStruct>


				</listBibl>
			</div>
		</back>
	</text>
</TEI>
