<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transcribing History with Tesseract: A Monomodal OCR Approach in the PastReader 2025 Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jaione Macicior-Mitxelena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents the results of our participation in the PastReader 2025 shared task, which focused on the automatic transcription of historical Spanish documents. We explored a monomodal OCR approach using the open-source Tesseract engine, evaluating both its baseline performance and the effects of domain-specific fine-tuning. Experiments were conducted on a dataset of digitized newspaper pages provided by the Biblioteca Nacional de España (BNE). While Tesseract offered a solid baseline, fine-tuning with task-specific data did not yield consistent improvements, revealing challenges related to data heterogeneity, layout complexity, and OCR model generalization. Our findings highlight the limitations of monomodal approaches for noisy historical documents and suggest future directions involving layout-aware and multimodal methods.</p>
      </abstract>
      <kwd-group>
<kwd>OCR</kwd>
        <kwd>Historical Documents</kwd>
        <kwd>Digital Humanities</kwd>
        <kwd>Tesseract</kwd>
<kwd>Fine-tuning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Short Description of the Shared Task and Dataset</title>
      <p>
        The PastReader 2025 shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] focuses on the automatic transcription of historical printed texts
with the aim of improving OCR outputs on digitized Spanish newspapers and documents. This task
addresses a critical need within the Digital Humanities: the preservation and enhanced accessibility
of historical documents that present significant challenges for modern text processing systems due to
their age, print quality, and complex layout.
      </p>
<p>Although the competition is divided into two main subtasks, time constraints led the GRESEL1 team to
participate only in the second subtask, End-to-End Transcription. In this demanding subtask,
participants are required to develop systems that process scanned images of historical newspaper pages
and output accurate, structured transcriptions.</p>
      <p>
        The dataset used in this shared task is derived from the Hemeroteca Digital of the Biblioteca Nacional
de España (BNE) [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. This open-access digital archive includes historical Spanish press publications in
the public domain (https://hemerotecadigital.bne.es/hd/es/fulltext-csv).
      </p>
      <p>OCR quality varies considerably due to factors such as the condition of the original documents, the
complexity of the layout, and the OCR technology used at the time of digitization. For the purposes of
the shared task, the dataset was partitioned into training, development, and test sets, using stratified
sampling to ensure representative coverage across the collection. The training set consists of 8,959 pages
and includes scanned PDFs, the original OCR output, and manually corrected reference transcriptions.
The development set comprises 500 pages with the same structure—PDF scans, OCR text, and corrected
ground truth—allowing participants to fine-tune and validate their models prior to submission. The
test set for the End-to-End Transcription subtask contains 3,000 scanned PDF pages, without accompanying
transcriptions.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach Description</title>
      <p>
        The two experiments made use of the Tesseract OCR engine. Its open-source nature and adaptability
make it a practical choice for initial OCR tasks on historical documents. Tesseract is an OCR engine
developed by Hewlett-Packard and later maintained by Google that supports over 100 languages and
includes features such as layout analysis and recognition of various image formats. Its accessibility and
flexibility make it a popular choice for digitizing printed texts. However, Tesseract’s performance can
be challenged when dealing with historical documents that often contain noise, complex layouts, or
uncommon fonts. A benchmarking study comparing Tesseract with Amazon Textract and Google
Document AI found that server-based processors outperformed Tesseract, especially on noisy documents [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Despite these limitations, Tesseract remains a valuable tool for establishing baseline OCR performance
and serves as a useful point of comparison when evaluating more advanced or specialized OCR systems.
      </p>
      <p>
        Recent research has explored enhancing Tesseract’s capabilities for historical documents. For instance,
a study focused on 19th-century printed serials employed a combined machine learning approach,
fine-tuning Tesseract with synthesized and manually annotated data to improve OCR quality [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
This approach demonstrated that integrating monomodal OCR techniques with comprehensive layout
analysis can significantly enhance text recognition accuracy in historical documents. For instance,
Fleischhacker et al. (2024) reported substantial reductions in Character Error Rate (CER) and Word Error
Rate (WER) by combining structure detection with fine-tuned OCR processes. Similarly, the OCR4all
framework provides a semi-automatic OCR workflow that incorporates layout analysis and continuous
model training, leading to improved OCR performance in historical printings [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Despite these advancements in OCR methods that combine text recognition with layout analysis,
our approach in this shared task focused solely on monomodal, text-based techniques. This decision
was driven by the absence of preprocessed layout information in the dataset and time constraints
that limited the feasibility of integrating comprehensive layout-aware processing. Consequently, our
experiments prioritized transcription performance based only on textual content. Initially, Tesseract
was applied in its baseline configuration to establish a performance benchmark. The model was then
fine-tuned using the annotated training data provided by the task organizers, in an effort to improve
accuracy by adapting to the dataset’s specific characteristics.</p>
      <p>These strategies, ranging from baseline to fine-tuned OCR models, represent a progression from
general-purpose to more task-specific solutions. The following sections provide a detailed description
of each approach. Future work will explore the incorporation of layout-aware techniques to further
enhance OCR performance in complex historical documents.</p>
<p>Tesseract, as mentioned earlier, is an open-source OCR engine (https://github.com/tesseract-ocr/tesseract). The installation was performed
on a Windows-based system, but it can easily be done on other operating systems. Fine-tuning
was implemented using the tesstrain training toolkit, which is compatible with the LSTM-based
recognition architecture introduced in Tesseract 4.0 and maintained in later versions.</p>
<p>Two experimental runs were conducted to assess Tesseract’s performance under different
configurations. In the first run, GRESEL1_run2, the default pre-trained Tesseract model was applied directly
to the test set without any additional training or adaptation. This approach simulates a zero-shot or
“off-the-shelf” application scenario, which is useful for establishing a baseline. The model was executed
with standard parameters, and no preprocessing was applied beyond resizing to ensure compatibility
with Tesseract’s input expectations. In the second run, GRESEL1_run3, the baseline was fine-tuned on
the provided training dataset to tailor it more closely to the characteristics of the historical documents.
The purpose of this fine-tuning was to determine whether training on similar materials could improve
recognition accuracy—especially on degraded, low-contrast, or typographically irregular documents
that typically reduce Tesseract’s performance.</p>
      <p>This two-step process was not only useful for benchmarking the Tesseract model but also served as a
practical diagnostic tool for assessing the quality of the dataset itself. By observing the kinds of errors
the model made—both before and after fine-tuning—it was possible to identify problematic samples,
inconsistencies in labeling, and areas where document quality or transcription standards might affect
downstream OCR performance. These insights were valuable for both improving model training and
better understanding the dataset’s challenges and limitations.</p>
<p>The fine-tuning was conducted using version 5.5.0 of Tesseract, installed with training support
enabled (by selecting the “Install training tools” option during setup). The official Windows installer
with training tools was downloaded from the Tesseract GitHub releases page. All scripts and auxiliary
tools were developed and executed in a Windows environment. The tesstrain toolkit was cloned from
the official GitHub repository (https://github.com/tesseract-ocr/tesstrain.git) to provide the core
infrastructure for training. A base Spanish model (spa.traineddata) was manually placed in a custom
tessdata/ directory and used as the initialization point for transfer learning. To streamline the process,
a Python script (Finetune_tesseract.py) was developed to automate the full fine-tuning pipeline. The
structure of the working directory before fine-tuning is shown on the left of Figure 1.</p>
<p>WORKING DIRECTORY STRUCTURE (Figure 1)</p>
<p>Before fine-tuning:
PastReader/
|-- train/
|   |-- pdf/                        (original PDF documents)
|   |__ ocr/                        (transcripts, .txt)
|-- finetuning/
|   |-- spa.traineddata             (base model)
|   |__ spa_custom.traineddata      (fine-tuned)</p>
<p>After fine-tuning:
PastReader/
|-- train/
|   |-- pdf/
|   |-- ocr/
|   |-- tif/                        (TIFF images, .tif, and box files)
|   |__ lstmf/                      (intermediate LSTM training files)
|-- finetuning/
|   |-- spa.traineddata
|   |__ finetuned_model.traineddata</p>
      <p>The fine-tuning process began with transcription alignment, where all corrected text transcriptions
were renamed to adhere to the .gt.txt naming convention required by Tesseract for supervised learning.
This step ensures that each transcription is correctly paired with its corresponding image during
training. Next, image conversion was carried out. The original PDF files were transformed into TIFF
format since TIFF images are preferable for Tesseract OCR as they provide a rasterised representation
at approximately 300 DPI, which is optimal for recognition and compatible with Tesseract’s training
requirements. Following this, training file generation was performed. Bounding box annotation
files (.box) were created using Tesseract in training mode, capturing the spatial alignment between
characters and their positions in the image. Subsequently, LSTM-compatible training data files (.lstmf)
were generated by running Tesseract with the --psm 6 (assumes a single block of text) and --oem 1 (uses the
LSTM OCR engine) settings. These .lstmf files encapsulate the image-text pairings needed for training
the recurrent neural network.</p>
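<p>As an illustration of the steps just described, the following Python sketch reproduces the data-preparation stage. It is a simplified reconstruction, not the author’s actual Finetune_tesseract.py script: the directory names follow Figure 1, the pdftoppm utility (part of Poppler) is assumed to be available for PDF rasterisation, and lstmbox and lstm.train are the standard Tesseract training configuration names.</p>
<p>import subprocess
from pathlib import Path

TRAIN = Path("PastReader/train")

# 1) Copy corrected transcriptions under the .gt.txt naming convention,
#    so each image/transcription pair shares a base name.
(TRAIN / "tif").mkdir(parents=True, exist_ok=True)
for txt in (TRAIN / "ocr").glob("*.txt"):
    target = TRAIN / "tif" / (txt.stem + ".gt.txt")
    target.write_text(txt.read_text(encoding="utf-8"), encoding="utf-8")

# 2) Rasterise each PDF page to a ~300 DPI TIFF image.
for pdf in (TRAIN / "pdf").glob("*.pdf"):
    subprocess.run(["pdftoppm", "-tiff", "-r", "300",
                    str(pdf), str(TRAIN / "tif" / pdf.stem)], check=True)

# 3) Generate .box and .lstmf training files with Tesseract in training mode.
for tif in (TRAIN / "tif").glob("*.tif"):
    base = str(tif.with_suffix(""))
    subprocess.run(["tesseract", str(tif), base,
                    "--psm", "6", "--oem", "1", "lstmbox"], check=True)
    subprocess.run(["tesseract", str(tif), base,
                    "--psm", "6", "--oem", "1", "lstm.train"], check=True)</p>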
      <p>For dataset compilation, an index file named list.txt was automatically generated. This file contained
the absolute paths to all .lstmf files and served as an input manifest required by the lstmtraining
binary to locate and load the training data. The model training phase was then initiated using the
lstmtraining tool. The process began from an extracted LSTM file (spa.lstm), derived from the base
model spa.traineddata. Training was conducted iteratively, with model checkpoints saved at regular
intervals to monitor progress and facilitate recovery if needed. Finally, in the model finalisation step,
the trained model weights were packaged into a usable .traineddata format using the combine_tessdata
utility. The resulting file (finetuned_model.traineddata) represents the final fine-tuned model, ready for
inference on historical document images. With this process, the updated structure of the project after
fine-tuning is as shown on the right of Figure 1.</p>
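<p>A minimal sketch of the training and finalisation commands, again expressed as Python subprocess calls, is shown below. Paths, the checkpoint name, and the iteration count are illustrative; the finalisation step here uses lstmtraining with --stop_training, one common way to produce the packaged .traineddata file in the tesstrain workflow (the combine_tessdata utility mentioned above provides equivalent packaging functionality).</p>
<p>import subprocess

# Extract the LSTM weights (spa.lstm) from the base Spanish model.
subprocess.run(["combine_tessdata", "-e",
                "finetuning/spa.traineddata", "finetuning/spa.lstm"], check=True)

# Iterative fine-tuning; checkpoints are saved under the --model_output prefix.
subprocess.run(["lstmtraining",
                "--continue_from", "finetuning/spa.lstm",
                "--traineddata", "finetuning/spa.traineddata",
                "--train_listfile", "train/list.txt",
                "--model_output", "finetuning/checkpoints/finetuned",
                "--max_iterations", "10000"], check=True)

# Finalisation: freeze the best checkpoint into a deployable model file.
subprocess.run(["lstmtraining", "--stop_training",
                "--continue_from", "finetuning/checkpoints/finetuned_checkpoint",
                "--traineddata", "finetuning/spa.traineddata",
                "--model_output", "finetuning/finetuned_model.traineddata"], check=True)</p>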
<p>To study the effect of dataset size on model performance, multiple fine-tuning experiments were
conducted using incrementally larger subsets of the training data: 100, 1,000, 2,000 samples, and the
complete dataset. For each configuration, a separate model was fine-tuned following the same procedure
described earlier. The resulting models were then evaluated on a held-out development set to measure
changes in recognition accuracy and to analyze the trade-off between increased training time and
performance improvements.</p>
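<p>The subset experiments can be driven by a loop of the following shape (a sketch: file locations and manifest names are assumptions, and the per-subset lstmtraining call is elided):</p>
<p>import random
from pathlib import Path

# All .lstmf samples produced during data preparation.
samples = sorted(Path("PastReader/train/tif").glob("*.lstmf"))

for size in (100, 1000, 2000, len(samples)):
    subset = random.sample(samples, size)
    manifest = Path(f"PastReader/train/list_{size}.txt")
    # Each manifest plays the role of list.txt for one subset size.
    manifest.write_text("\n".join(str(p.resolve()) for p in subset),
                        encoding="utf-8")
    # ...then run lstmtraining with --train_listfile pointing at this manifest
    # and a distinct --model_output prefix for each fine-tuned variant.</p>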
      <p>Regarding the inference process, it was applied uniformly across all models—both the baseline and
each fine-tuned variant—to ensure a fair comparison. Since Tesseract requires image input, all evaluation
data originally in PDF format was converted to high-resolution TIFF images, which are known to yield
better OCR accuracy; these TIFF files served as the input for the inference stage.
Tesseract was executed via its command-line interface using the same syntax for all models. The following
generic command was applied: tesseract &lt;input_image.tif&gt; &lt;output_file&gt; -l &lt;model_name&gt; --psm 6. This
command runs Tesseract using the specified language model (the model’s name without the extension
.traineddata), with page segmentation mode 6 (--psm 6), which assumes a uniform block of text. This
setting was selected based on empirical recommendations and its suitability for the layout of historical
documents used in the dataset.</p>
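<p>Applied in batch over the converted development images, the inference stage reduces to a loop like the following sketch (directory names and the model list are assumptions; --tessdata-dir points Tesseract at the directory holding the .traineddata files):</p>
<p>import subprocess
from pathlib import Path

TESSDATA_DIR = "finetuning"            # directory holding the .traineddata files
MODELS = ["spa", "finetuned_model"]    # baseline and fine-tuned variant

for model in MODELS:
    out_dir = Path("output") / model
    out_dir.mkdir(parents=True, exist_ok=True)
    for tif in sorted(Path("dev/tif").glob("*.tif")):
        # Tesseract appends .txt to the output base name by itself.
        subprocess.run(["tesseract", str(tif), str(out_dir / tif.stem),
                        "-l", model, "--psm", "6",
                        "--tessdata-dir", TESSDATA_DIR], check=True)</p>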
      <p>Turning now to a comparison between the performance of both Tesseract runs, Figure 2 reveals
a surprising trend: the baseline model (i.e., the default pre-trained model without any fine-tuning)
consistently achieved the best performance across key evaluation metrics. It should be noted that these
results do not correspond to the test dataset, as the training partition was used for fine-tuning and the
development set for evaluation. All fine-tuned models—regardless of the number of training samples
used—produced nearly identical results, none of which surpassed the baseline we established with the
non-fine-tuned model. This outcome suggests that the fine-tuning process, as implemented, did not lead
to meaningful performance gains and may point to limitations in the training data quality, the amount
of domain-specific signal, or the need for more aggressive preprocessing or augmentation strategies.
Further investigation would be required to identify the cause of this plateau and improve the efficacy of
the fine-tuning approach.</p>
      <p>Several factors may help explain this outcome. First, the training data itself may have been too
heterogeneous, encompassing a wide range of fonts, layouts, languages, and document qualities (e.g.,
scanned pages with noise, low contrast, or skewed alignment). Such variability can make it difficult
for the model to generalise effectively, especially when trained on small or inconsistent subsets. In
addition, historical documents often contain typography and page structures that deviate significantly
from the patterns learned by Tesseract’s pre-trained LSTM model, which is optimised for contemporary
printed text.</p>
      <p>Another possible limitation is the alignment quality of the training pairs. Even small mismatches
between the text transcriptions and the visual data can introduce noise during training, weakening the
model’s ability to learn accurate character-level associations. Furthermore, Tesseract’s architecture is
sensitive to input resolution and layout assumptions; deviations from its expected formats can lead to
suboptimal performance unless carefully corrected during preprocessing. While fine-tuning Tesseract
theoretically allows for domain-specific adaptation, the lack of performance gains in this study indicates
that the effectiveness of this approach is heavily dependent on the consistency and quality of the
training data. Future work may benefit from more targeted preprocessing, tighter control over dataset
variability, or the use of alternative OCR models better suited to historical or visually degraded texts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Analysis of results</title>
      <sec id="sec-4-1">
        <title>4.1. Validation set: qualitative insight</title>
        <p>To gain deeper insight into the behavior of the Tesseract models, an initial qualitative analysis was
conducted based on the results of applying inference to the validation set. As a first step, it was necessary
to identify the most representative samples—specifically, those for which the transcriptions yielded the
poorest results. Given that the evaluation involved seven distinct metrics, the ten lowest-scoring files
for each metric were initially selected. Upon comparison, it was observed that certain files appeared
in multiple “top 10” worst-performing lists. Since this analysis was performed manually, attention
was focused on the files that occurred in more than four of these lists. Because these files consistently
showed poor performance across several metrics, they are assumed to be particularly informative for
identifying common failure patterns and understanding the limitations of the models. Figures 3
and 4 show some of the most problematic cases for the baseline model and the fine-tuned model
(respectively).</p>
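<p>The selection of these recurrent worst performers amounts to counting, for each file, how many per-metric “top 10 worst” lists it appears in. A minimal sketch of that bookkeeping is shown below; the assignment of file IDs to metrics is purely illustrative toy data, not the actual evaluation output.</p>
<p>from collections import Counter

# Hypothetical input: for each metric, the lowest-scoring file IDs on the
# development set (truncated to a few IDs per metric for illustration).
worst_by_metric = {
    "wer":         ["9100", "9113", "9171", "9382"],
    "levenshtein": ["9100", "9330", "9088", "9113"],
    "ned":         ["9171", "9100", "9382", "9330"],
    "bleu":        ["9113", "9100", "9088", "9171"],
    "rouge_l":     ["9100", "9113", "9330", "9382"],
}

# Count how many "top 10 worst" lists each file appears in, then keep the
# files recurring in more than four lists, mirroring the manual analysis.
counts = Counter(fid for ids in worst_by_metric.values() for fid in ids)
recurrent_failures = sorted(fid for fid, n in counts.items() if n > 4)
print(recurrent_failures)  # ['9100'] with this toy input</p>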
        <p>Six files—9100, 9113, 9171, 9382, 9330, and 9088—were identified as among the worst-performing
cases for both the baseline and fine-tuned models. Analysis of these files reveals several consistent
failure patterns. The models tend to perform poorly on images that:
• contain little or no text (e.g., 9100, 9113, 9171),
• exhibit low contrast (e.g., 9330, 9088),
• feature non-traditional layouts resembling diagrams or schematics (e.g., 9382),
• include decorative elements, as seen in file 9111.</p>
        <p>
          Notably, while the baseline model already struggles with such inputs, the fine-tuned model
additionally fails to accurately transcribe pages with decorative elements. Combining the
information from both images and text would probably benefit transcription (e.g., 9100, 9113, 9171) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], [
          <xref ref-type="bibr" rid="ref9">9</xref>
], but this
work is beyond the scope of this paper at present.
        </p>
        <p>Although this analysis could be extended and refined through the examination of a larger sample,
the present findings offer a preliminary understanding of the models’ limitations. Such qualitative
assessment is essential for informing future improvements to model performance.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Quantitative analysis</title>
<p>In order to understand Table 2, a short description of each of our runs is provided below in Table 1.</p>
<p>Table 1: Description of the submitted runs.
Run ID          Description
GRESEL1_run2    Using Tesseract without fine-tuning; serves as a baseline.
GRESEL1_run3    Using the fine-tuned version of Tesseract for inference.</p>
<p>Table 2 presents the evaluation outcomes of the shared task, as provided by the organising committee.
Although the initial plan involved six evaluation metrics—Word Error Rate (WER), Sentence Error
Rate (SER), Levenshtein Distance, Normalised Edit Distance (NED), BLEU, and ROUGE—not all were
ultimately adopted as expected. SER appears to have been excluded without explanation, whereas
ROUGE was expanded into four distinct variants: ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-LSum.
WER and Levenshtein Distance were retained, bringing the total number of metrics used to eight. This
expanded metric set enables a multifaceted assessment of OCR output quality.</p>
        <p>The evaluation framework combines both form-based and content-based approaches. On the one
hand, character-level accuracy is addressed through metrics such as Levenshtein Distance and NED,
which account for the number and nature of required edits, alongside WER, which reflects deviations at
the lexical level. These measures prioritise exact textual correspondence, with lower scores indicating
better alignment with the reference. On the other hand, semantic similarity and fluency are captured
through BLEU and the various ROUGE metrics. BLEU evaluates precision in n-gram overlap, while
ROUGE-1 and ROUGE-2 measure recall of unigrams and bigrams, respectively. ROUGE-L and
ROUGE-LSum focus on the longest common subsequence and overall summary-level coherence. In contrast to
form-based metrics, higher scores in these semantic metrics reflect improved retention of meaning and
structural integrity.</p>
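<p>To make the form-based metrics concrete, the following minimal sketch computes Levenshtein Distance, NED, and WER for a single hypothesis/reference pair. It is our own illustration of the standard definitions, not the organisers’ evaluation code.</p>
<p>def edit_distance(a, b):
    # Dynamic-programming Levenshtein distance over two sequences
    # (characters for Levenshtein/NED, word tokens for WER).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def ned(hyp, ref):
    # Normalised Edit Distance: character edits / length of the longer string.
    return edit_distance(hyp, ref) / max(len(hyp), len(ref), 1)

def wer(hyp, ref):
    # Word Error Rate: word-level edits / number of reference words.
    return edit_distance(hyp.split(), ref.split()) / max(len(ref.split()), 1)

ref = "la gaceta de madrid"
hyp = "Ia gaceta dc madrid"
print(edit_distance(hyp, ref), ned(hyp, ref), wer(hyp, ref))
# 2 character edits, NED of about 0.105, WER = 0.5 (2 of 4 words wrong)</p>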
        <p>Collectively, this comprehensive metric suite enables an in-depth analysis of OCR system performance,
encompassing both strict textual fidelity and broader linguistic adequacy. Figure 5 complements the
table by offering a visual representation of these results.</p>
<p>[Table 2: evaluation results per metric for OCRTITS_run1, GRESEL1_run3, GRESEL1_run2, and the shared-task BASELINE.]</p>
        <p>Table 2 presents a generally strong performance across all submitted runs, with most configurations
surpassing the baseline across various evaluation metrics. This indicates a solid level of robustness and
effectiveness in the approaches tested, suggesting that even the least performing configurations meet a
reasonable threshold of OCR quality.</p>
        <p>The Tesseract-based configurations— GRESEL1_run2 and GRESEL1_run3—exhibit performance
patterns that align with prior results obtained during training-phase experiments (refer to Figure 2).
Notably, the non-fine-tuned model (GRESEL1_run2) outperformed the fine-tuned counterpart in six
out of eight evaluated metrics, showing broader robustness across both surface-level and semantic
evaluations. In contrast, the fine-tuned model (GRESEL1_run3) attained higher scores in only two
precision-focused metrics: Levenshtein distance, where it secured the second-best overall score, and
normalized edit distance (NED). These results point toward a potential overfitting effect during the
fine-tuning process, wherein the model may have adapted too closely to specific features of the training data
and failed to generalize effectively to unseen test samples. Nonetheless, performance gaps in semantic
metrics such as ROUGE-L (0.8180 for GRESEL1_run3 vs. 0.8256 for GRESEL1_run2) and BLEU (0.6220
vs. 0.6229) remain relatively minor, indicating that fine-tuning did not drastically compromise the
model’s semantic coherence. Rather, it maintained competitive performance, albeit without delivering
consistent improvements.</p>
        <p>In parallel, our investigation into the environmental impact of model deployment revealed a modest
increase in emissions associated with the fine-tuned Tesseract model, as depicted in Figure 6. Several
plausible factors could underlie this discrepancy. Fine-tuning typically introduces increased
computational overhead, whether due to subtle internal architectural shifts, longer inference times per input, or
heightened memory requirements. Additionally, variations in preprocessing pipelines—such as the use
of higher-resolution inputs or more complex image transformations—can amplify the computational
burden. Although the observed increase in emissions was relatively small, it nonetheless reflects a
broader trade-off inherent in machine learning systems: the tension between maximizing model
accuracy and maintaining energy efficiency. These findings emphasize the importance of evaluating
sustainability metrics alongside traditional performance measures, particularly when deploying models
at scale or in resource-constrained environments.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This paper presented our participation in a challenging OCR shared task using a monomodal approach.
Despite the complexities posed by the dataset’s diverse and heterogeneous nature, all our submissions
yielded satisfactory results when compared to the baseline.</p>
<p>While Tesseract—particularly in its default configuration—offers a reasonable benchmark for OCR
tasks involving historical documents, it lacks the accuracy and robustness demonstrated by more recent,
domain-specific systems. Nevertheless, the comparison between Tesseract’s baseline and fine-tuned
versions proved valuable for assessing the benefits of domain adaptation and for understanding the
dataset’s inherent challenges.</p>
      <p>This endeavor provided meaningful insights and learning opportunities through our experiments.
Looking ahead, we plan to evaluate similarly scaled multimodal models to benchmark performance
across different approaches. We aim to further improve our results in OCR tasks by continuing to
explore novel strategies and methodologies.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author utilized ChatGPT-4o for grammar and spelling correction,
paraphrasing, and rewording. Additionally, Microsoft Copilot was employed for formatting support,
including LaTeX commands, image labeling, and table creation. ChatGPT-4o was also used to generate
the charts shown in Figures 5 and 6. All content produced with the assistance of these tools was
subsequently reviewed and edited by the author, who takes full responsibility for the final publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia Serrano</surname>
          </string-name>
          ,
<string-name>
  <given-names>A.</given-names>
  <surname>Menta Garuz</surname>
</string-name>
          ,
          <article-title>La inteligencia artificial en las humanidades digitales: dos experiencias con corpus digitales</article-title>
          ,
          <source>Revista de Humanidades Digitales</source>
          <volume>7</volume>
          (
          <year>2022</year>
          )
          <fpage>19</fpage>
          -
          <lpage>39</lpage>
. URL: https://revistas.uned.es/index.php/RHD/article/view/30928. doi:10.5944/rhd.vol.7.2022.30928.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
<string-name>
  <given-names>J. Á.</given-names>
  <surname>González-Barba</surname>
</string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Chiruzzo</surname>
</string-name>
,
<string-name>
  <given-names>S. M.</given-names>
  <surname>Jiménez-Zafra</surname>
</string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sánchez-Nogales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Expósito-Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Collado-Montañez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cabrera-de Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Cantero-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ortuño-Casanova</surname>
          </string-name>
          ,
          <article-title>Overview of pastreader shared task in iberlef 2025: Transcribing texts from the past</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
<string-name>
  <given-names>E.</given-names>
  <surname>Sánchez Nogales</surname>
</string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Expósito Álvarez</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Ureña López</surname>
</string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Collado-Montañez</surname>
          </string-name>
          , I. Cabrera de Castro,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Cantero Romero</surname>
          </string-name>
          ,
<string-name>
  <given-names>A.</given-names>
  <surname>García Serrano</surname>
</string-name>
,
<string-name>
  <given-names>R.</given-names>
  <surname>Ortuño Casanova</surname>
</string-name>
,
<string-name>
  <given-names>Y. A.</given-names>
  <surname>Torterolo Orta</surname>
</string-name>
,
<source>PastReader 2025</source>
[Data set], https://doi.org/10.5281/zenodo.15084265,
<year>2025</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hegghammer</surname>
          </string-name>
          ,
<article-title>OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment</article-title>
          ,
          <source>Journal of Computational Social Science</source>
          <volume>5</volume>
          (
          <year>2022</year>
          )
          <fpage>861</fpage>
          -
          <lpage>882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fleischhacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Goederle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kern</surname>
          </string-name>
          ,
<article-title>Improving OCR quality in 19th century historical documents using a combined machine learning based approach</article-title>
          ,
          <source>arXiv preprint arXiv:2401.07787</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Reul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Christ</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hartelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Balbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wehner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Springmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Grundig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Büttner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Puppe</surname>
          </string-name>
          ,
<article-title>OCR4all: an open-source tool providing a (semi-)automatic OCR workflow for historical printings</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>4853</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Garcia-Arias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Serrano</surname>
          </string-name>
,
<article-title>Creación de un modelo de descripciones de imágenes especializado en arqueología griega (en prensa)</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>García-Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Benavent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Granados</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Goñi-Menoyo</surname>
          </string-name>
          ,
<article-title>Some results using different approaches to merge visual and text-based features in CLEF'08 photo collection</article-title>
          , in: C. Peters,
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kurimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peñas</surname>
          </string-name>
          , V. Petras (Eds.),
          <source>Evaluating Systems for Multilingual and Multimodal Information Access</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>568</fpage>
          -
          <lpage>571</lpage>
. doi:10.1007/978-3-642-04447-2_69.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>