<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>In Codice Ratio: OCR of Handwritten Latin Documents using Deep Convolutional Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Donatella Firmani</string-name>
          <email>donatella.firmani@uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Merialdo</string-name>
          <email>merialdo@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Nieddu</string-name>
<email>ema.nieddu@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Scardapane</string-name>
          <email>simone.scardapane@uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Roma Tre University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sapienza University</institution>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>16</lpage>
      <abstract>
<p>Automatic transcription of historical handwritten documents is a challenging research problem, generally requiring expensive transcriptions by expert paleographers. In Codice Ratio is an end-to-end architecture that instead requires limited labeling effort; its aim is the automatic transcription of a portion of the Vatican Secret Archives (one of the largest historical libraries in the world). In this paper, we describe the design of our OCR component for Latin characters. To this end, we first annotated a large corpus of Latin characters with a custom crowdsourcing platform. Leveraging recent progress in deep learning, we designed and trained a deep convolutional network achieving an overall accuracy of 96% over the entire dataset, which is one of the highest results reported in the literature so far. Our training data are publicly available.</p>
      </abstract>
      <kwd-group>
        <kwd>deep convolutional neural networks</kwd>
        <kwd>handwritten text recognition</kwd>
        <kwd>optical character recognition</kwd>
        <kwd>medieval documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Historical documents are an essential source of knowledge concerning past
cultures and societies [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Until recently, the main bottleneck was the availability
of large collections of historical documents in digital form. Today, many
historical archives have instead begun a full digitization of their assets, including the
Bibliothèque Nationale de France (http://gallica.bnf.fr/) and the Vatican
Apostolic Library (http://www.digitavaticana.org/). Due to
the cost (and time) required for manual transcription of these documents, and
the sheer size of the collections, the challenge has become the design of fully
automatic solutions for their transcription in computer-readable form. While
impressive results have been achieved for printed historical documents [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ],
successfully transcribing handwritten documents remains a challenging task due to
a variety of reasons, including irregularities in writing, ligatures and
abbreviations, errors in transcription, and so forth (see the discussion in Section 2).
      </p>
      <p>
        In Codice Ratio is an interdisciplinary project involving Humanities and
Engineering departments from Roma Tre University, as well as the Vatican Secret
Archives, aiming at the complete transcription of the Vatican Registers, a corpus
of more than 18000 pages contained in the Vatican Secret Archives, with
minimal labeling effort. The Vatican Secret Archives is one of the largest
historical libraries in the world, containing more than 85 linear kilometres of shelving.
Interestingly, ‘secret’ does not stand for confidential, but rather denotes the
archives as the private property of the Pope. The corpus comprises the official
correspondence of the Roman Curia produced in the 13th century, including letters
and opinions on legal questions, addressed to and from kings and sovereigns, as
well as many political and religious institutions throughout Europe. Never having
been transcribed in the past, these documents are of unprecedented historical
relevance, and could shed light on that crucial period. A preliminary description
of the system appeared in [
of the system appeared in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Our contribution. In this paper, we describe the design of a novel
component for optical character recognition (OCR) of the Latin characters extracted
from the text. Building a corpus for this task is extremely challenging due to
the complexity of segmenting the characters and reading ancient fonts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. For
this project, we implemented a custom crowdsourcing platform, employing more
than a hundred high-school students to manually label the dataset. After a data
augmentation process, the result was an inexpensive, high-quality
dataset of 23000 characters. Following recent progress in deep learning [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we
designed a deep convolutional neural network (CNN) for the classification step.
In recent years, deep CNNs have become the de facto standard for complex
OCR problems [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Our trained deep CNN achieves an overall accuracy of 96%
on an independent test set, which is one of the highest results obtained in the
literature so far. The aim of this paper is to show the effectiveness of the classification
step, and the evaluation of the pipeline in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is out of our current scope.
Structure of the paper. The rest of the paper is structured as follows. After
discussing related projects in Section 2, we detail the construction of our
annotated dataset in Section 3, and the design (and training) of the CNN in Section 4.
We experimentally evaluate the network in Section 5, before discussing future
work in Section 6.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Due to the many challenges involved in a fully automatic transcription of
historical handwritten documents, many researchers have in recent years focused on
solving easier sub-problems, most notably keyword spotting [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, as
more and more libraries and archives worldwide digitize their collections, great
effort is being put into the creation of full-fledged transcription systems [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
One of the largest efforts to this end was the EU-funded tranScriptorium
project [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which resulted, among other things, in the transcription of a relatively
large corpus of Dutch handwritten documents from the 15th century. Several
competitions have been organized on the datasets released by the
tranScriptorium project [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. State-of-the-art algorithms from these challenges generally
follow a segmentation-free approach, where it is not necessary to individually
segment each character (segmenting and recognizing a character are two heavily
interdependent processes, a problem known as Sayre’s paradox [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). While this removes one of the hardest steps in the
process, it requires full-text transcriptions of the training corpus, in
turn requiring expensive labeling by paleographers expert in the
period under consideration. To overcome this limitation and reduce the training
costs, In Codice Ratio focuses on character-level classification, allowing us to
collect a large corpus of annotated data using a cheap crowdsourcing procedure.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Collection</title>
      <p>The dataset is collected from high-resolution (300 dpi, 2136 × 2697 pixels) scans
of 30 pages coming from register 12 of Pope Honorius III. All pages are in the
so-called Caroline minuscule script, which spread in Western Europe during
Charlemagne’s reign and became a standard under the Holy Roman Empire. Compared
to similar scripts, writing in Caroline minuscule is relatively regular and has
fewer ligatures. A sample text is shown in Fig. 1a.</p>
      <p>
        All pages are pre-processed according to the workflow in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], by first
removing the background, splitting the text into lines, and then extracting tentative
character segmentations as shown in Fig. 1b. Each tentative character is then
fed to the OCR system, built on top of a deep CNN, described in the next
section. A further sub-system based on a Hidden Markov Model is then in charge
of selecting the most probable word transcription starting from all the possible
segmentations of the word. In this paper we focus on the design of the OCR
system, and we refer to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for a more accurate description of the first and third
steps.
      </p>
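      <p>For illustration only, the following minimal sketch shows how the first two
pre-processing steps (background removal and line splitting) might look in Python
with OpenCV. The actual pipeline is the one described in [1]; all function names,
parameters and thresholds below are our own assumptions, not the project's.</p>
      <preformat>
# Illustrative sketch of background removal and line splitting.
# The real pipeline is described in [1]; names and thresholds are ours.
import cv2
import numpy as np

def remove_background(page_gray):
    """Binarize a grayscale page scan with Otsu thresholding (ink -> white)."""
    _, binary = cv2.threshold(page_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary

def split_lines(binary, min_ink=5):
    """Split a binarized page into text lines via the horizontal ink profile."""
    profile = binary.sum(axis=1) / 255   # number of ink pixels per row
    is_text = profile > min_ink          # rows that contain writing
    lines, start = [], None
    for y, on in enumerate(is_text):
        if on and start is None:
            start = y
        elif not on and start is not None:
            lines.append(binary[start:y])
            start = None
    if start is not None:
        lines.append(binary[start:])
    return lines
      </preformat>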
      <p>
        Character classes. We take into account minuscule characters of the Latin
alphabet, yielding initially 19 classes (a, b, c, d, e, f, g, h, i, l, m, n, o, p, q, r, s,
t, u) plus one special non-character class ⊗. Since our dataset includes multiple
versions of the characters “d” and “s”, we split class d into two classes (d1 and d2),
and class s into three (s1, s2 and s3). The different character shapes and the
corresponding labels are shown in Fig. 2. In total we have 23 classes: 22
character classes and the special non-character class ⊗.
      </p>
      <p>Crowdsourcing. To collect annotations on the segmentations of the manuscript
words, a custom crowdsourcing platform was developed. We enrolled 120
high-school students in the city of Rome, who did the labeling as part of a
work-related learning program. The task was simple: given positive and
negative examples of a target character, each student was required to select all
matching images from a grid appearing on the platform. In Fig. 3, we show a
screenshot of a task.</p>
      <sec id="sec-3-1">
        <title>Each task consists of 40 images,</title>
        <p>arranged in a grid, each with its own
check-box. Every time the check-box
is marked, the image receives a vote.</p>
        <p>Image labels correspond to the most
voted characters, among those with
at most 3 votes.4 If there is no such
character, the image is labelled with
a special non-character class,
denoting a wrong segmentation.</p>
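      <p>As a minimal sketch, the labelling rule just described can be expressed as
follows; the vote threshold follows the rule above, while the function and variable
names are ours.</p>
      <preformat>
# Sketch of the crowdsourced labelling rule (names are ours).
# `votes` maps each candidate character to the number of check-box
# marks the image received for that character.
MIN_VOTES = 3
NON_CHARACTER = "⊗"

def label_image(votes):
    """Most-voted character with at least MIN_VOTES votes, else ⊗."""
    eligible = {c: v for c, v in votes.items() if v >= MIN_VOTES}
    if not eligible:
        return NON_CHARACTER
    # The paper reports that no ties were observed in practice.
    return max(eligible, key=eligible.get)

print(label_image({"a": 5, "o": 2}))  # "a"
print(label_image({"a": 1, "o": 2}))  # "⊗"
      </preformat>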
      <p>[Fig. 3: Sample screen of our platform.]</p>
      <p>Characters with fewer than 1K examples were augmented to match the
required quantity and balance the training set. The augmentation process
involves slight random rotation, zooming, shearing, and shifting, both vertical
and horizontal. Before training, all image values are normalized to the range
[0, 1]. The final dataset comprises 23K examples evenly split among the 23
classes, and is available online at http://www.dia.uniroma3.it/db/icr/.</p>
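      <p>A sketch of this augmentation step using the Keras ImageDataGenerator is
shown below. The paper does not report the exact transformation ranges, so the
values here are illustrative assumptions.</p>
      <preformat>
# Illustrative augmentation pipeline; the transformation ranges are
# our assumptions, not the values used in the project.
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=5,          # slight random rotation (degrees)
    zoom_range=0.1,            # random zooming
    shear_range=0.1,           # random shearing
    width_shift_range=0.1,     # horizontal shifting
    height_shift_range=0.1,    # vertical shifting
    rescale=1.0 / 255,         # normalize pixel values to [0, 1]
)

# x: (n_samples, 56, 56, 1) array of character images (dummy data here).
x = np.random.randint(0, 256, size=(100, 56, 56, 1)).astype("float32")
augmented_batch = next(augmenter.flow(x, batch_size=32))
      </preformat>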
    </sec>
    <sec id="sec-4">
      <title>Network Architecture</title>
      <p>
        Our deep CNN takes as input 56 × 56 single-channel images, which are binarized
before training. The input is then propagated through 8 adaptable layers, whose
design is inspired by similar networks that have recently achieved state-of-the-art
results in modern OCR [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. First, we apply a convolutional layer having 42 filters
of size 5 × 5 and stride 1. Second, the output of the convolutional layer is
fed to a rectified linear unit (ReLU) nonlinearity applied element-wise:

g(s) = \max\{0, s\}. (1)
      </p>
      <p>The output of the ReLU is down-sampled using a max-pooling operation with
stride 2 × 2 to reduce the number of adaptable parameters. The previous three
operations (convolution, nonlinearity, and max-pooling) are repeated another
two times, using 28 filters for the convolutional layer instead of 42. The output
of the last convolutional layer is then flattened and fed through a fully connected
layer with 100 neurons and ReLU nonlinearities, and a final output layer with
a softmax activation function to output a probability distribution over the 23
classes.</p>
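      <p>A Keras sketch of this architecture follows. The paper states the 5 × 5 filter
size only for the first convolution, so we assume the same size for the other two;
the padding is also our assumption. The 50% dropout and L2 regularization
described next are already included.</p>
      <preformat>
# Keras sketch of the CNN described above. The 5x5 filter size is
# stated only for the first convolution; we assume it for the others,
# as well as "same" padding. Dropout is placed before each
# nonlinearity, following the text.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout, Activation)
from tensorflow.keras.regularizers import l2

def build_model(n_classes=23, weight_decay=0.001):
    model = Sequential()
    model.add(Conv2D(42, (5, 5), strides=1, padding="same",
                     kernel_regularizer=l2(weight_decay),
                     input_shape=(56, 56, 1)))
    model.add(Dropout(0.5))                # dropout before the nonlinearity
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    for _ in range(2):                     # two more blocks with 28 filters
        model.add(Conv2D(28, (5, 5), strides=1, padding="same",
                         kernel_regularizer=l2(weight_decay)))
        model.add(Dropout(0.5))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
    model.add(Flatten())
    model.add(Dense(100, kernel_regularizer=l2(weight_decay)))
    model.add(Dropout(0.5))
    model.add(Activation("relu"))
    model.add(Dense(n_classes, activation="softmax"))
    return model
      </preformat>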
      <p>
        In order to prevent overfitting, we apply 50% dropout during training [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
before each of the nonlinearities. We minimize a regularized cross-entropy loss
given by:
      </p>
      <p>
        J(\mathbf{w}) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{i,k} \log(\hat{y}_{i,k}) + \lambda \lVert \mathbf{w} \rVert^2, (2)

where N is the number of examples in the training dataset, K = 23 is the
number of classes, y_{i,k} is the correct output for the kth class on the ith input,
\hat{y}_{i,k} is the predicted output of the network, \mathbf{w} is the vector of adaptable
parameters of the network, and λ &gt; 0 is a regularization factor. The
regularization factor is set to λ = 0.001 by a grid search over an exponentially
spaced range of values, computing the accuracy on a held-out validation set of 2500
examples taken from the original training set. This validation set is also used to
select a stopping point for the optimization procedure. We minimize (2) using the
Adam algorithm [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] on randomly sampled mini-batches of 128 elements until the
validation accuracy stops improving (200 epochs), using default hyperparameters
as in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The final network is then tested on a further independent test set of
2300 examples.
      </p>
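      <p>The training procedure can be sketched as follows; the data arrays are
assumed to be already loaded and one-hot encoded, and the early-stopping patience
is our choice.</p>
      <preformat>
# Training sketch: Adam with default hyperparameters as in [9],
# mini-batches of 128, early stopping on the held-out validation set.
# x_train/y_train, x_val/y_val (2500 examples) and x_test/y_test
# (2300 examples) are assumed to be loaded; labels are one-hot.
from tensorflow.keras.callbacks import EarlyStopping

model = build_model()                      # sketch from above
model.compile(optimizer="adam",
              loss="categorical_crossentropy",   # plus the L2 terms in (2)
              metrics=["accuracy"])

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=128,
          epochs=200,
          callbacks=[EarlyStopping(monitor="val_accuracy",
                                   patience=10,       # our choice
                                   restore_best_weights=True)])

test_loss, test_acc = model.evaluate(x_test, y_test)
      </preformat>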
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>Overall accuracy reached 96%, while per-class precision, recall and F1-measure
are reported in Fig. 5 (support is always 100). The confusion matrix is shown in
Fig. 4. Some typical errors are the following.
• Characters “f” and “s1” are easily confused, due to their similar shapes.
Specifically, ≈ 8% of “s1” are labelled as “f”, and ≈ 14% of “f” as “s1”.
• Images not containing any character are sometimes mis-classified as actual
characters, mainly as “m”. Specifically, ≈ 10% of “not-character” are labelled
as “m”, and ≈ 15% are labelled as some other character.
For comparison purposes, a simple logistic regression classifier on the same
dataset achieves an average of 80% precision and 79% recall.</p>
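      <p>Per-class precision, recall, F1 and the confusion matrix can be computed with
scikit-learn, as in the sketch below; x_test and y_test are the held-out test data
from the previous section.</p>
      <preformat>
# Computing per-class metrics and the confusion matrix (cf. Figs. 4 and 5).
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print(classification_report(y_true, y_pred))   # per-class P/R/F1
cm = confusion_matrix(y_true, y_pred)
# Row-normalizing exposes error rates such as "f" labelled as "s1".
cm_rates = cm / cm.sum(axis=1, keepdims=True)
      </preformat>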
      <p>Convolution visualization. We show in Fig. 6a the
effect of the filters learned by our network at the
first layer. Specifically, we show the result of
convolving a sample input image with the first-layer
filters, after the activation function (blues are positive
values). Visually inspecting activation outputs is
useful for debugging purposes. In the figure, the effect
of edge and lighting detection filters is clearly visible.</p>
      <sec id="sec-5-1">
        <title>Gradient Ascent Given the filters learned by our</title>
        <p>
          network, we now perform gradient ascent over the
input image (initially random) and maximize the output
of each filter, separately. This is a common step to
visualize what the network has learnt to recognize [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>Intuitively, we generate synthetic images that
maximize the activation of the filters of each layer,
including the output layer. In deep CNNs, the first
layers usually detect simple features, and the features
become more complex and abstract as the layers go
deeper. The result of this experiment for a sample of
our filters is shown in Fig. 6b. The figure suggests that
the first layer of our network is in charge of detecting
edges, while the second layer exhibits more complex,
geometrical patterns. Finally, the third and deepest
layer seems to detect whole character strokes.</p>
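        <p>A minimal sketch of this visualization in TensorFlow follows: starting from
a random image, we repeatedly step along the gradient that increases the mean
activation of a chosen filter. The step size, number of steps, and gradient
normalization are our choices.</p>
        <preformat>
# Gradient-ascent visualization sketch (cf. [16]); hyperparameters ours.
import tensorflow as tf

def visualize_filter(model, layer_name, filter_index,
                     steps=100, step_size=1.0):
    """Synthesize an input that maximizes one filter's mean activation."""
    layer = model.get_layer(layer_name)
    extractor = tf.keras.Model(model.input, layer.output)
    image = tf.Variable(tf.random.uniform((1, 56, 56, 1)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = extractor(image)
            loss = tf.reduce_mean(activation[..., filter_index])
        grads = tape.gradient(loss, image)
        grads = tf.math.l2_normalize(grads)   # stabilize the step size
        image.assign_add(step_size * grads)
    return image.numpy().squeeze()
        </preformat>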
        <table-wrap id="fig5">
          <label>Fig. 5</label>
          <caption>
            <p>Per-class results.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Class</th><th>Prec.</th><th>Rec.</th><th>F1</th></tr>
            </thead>
            <tbody>
              <tr><td>a</td><td>0.98</td><td>0.99</td><td>0.99</td></tr>
              <tr><td>b</td><td>0.98</td><td>0.97</td><td>0.97</td></tr>
              <tr><td>c</td><td>0.95</td><td>1.00</td><td>0.98</td></tr>
              <tr><td>d1</td><td>0.97</td><td>0.98</td><td>0.98</td></tr>
              <tr><td>d2</td><td>0.92</td><td>0.98</td><td>0.95</td></tr>
              <tr><td>e</td><td>0.99</td><td>0.98</td><td>0.98</td></tr>
              <tr><td>f</td><td>0.89</td><td>0.85</td><td>0.87</td></tr>
              <tr><td>g</td><td>0.97</td><td>0.99</td><td>0.98</td></tr>
              <tr><td>h</td><td>0.96</td><td>0.97</td><td>0.97</td></tr>
              <tr><td>i</td><td>0.98</td><td>0.96</td><td>0.97</td></tr>
              <tr><td>l</td><td>0.96</td><td>0.99</td><td>0.98</td></tr>
              <tr><td>m</td><td>0.91</td><td>0.99</td><td>0.95</td></tr>
              <tr><td>n</td><td>0.99</td><td>1.00</td><td>1.00</td></tr>
              <tr><td>o</td><td>0.98</td><td>0.91</td><td>0.94</td></tr>
              <tr><td>p</td><td>0.98</td><td>0.97</td><td>0.97</td></tr>
              <tr><td>q</td><td>1.00</td><td>0.99</td><td>0.99</td></tr>
              <tr><td>r</td><td>0.94</td><td>0.96</td><td>0.95</td></tr>
              <tr><td>s1</td><td>0.86</td><td>0.90</td><td>0.88</td></tr>
              <tr><td>s2</td><td>0.99</td><td>0.94</td><td>0.96</td></tr>
              <tr><td>s3</td><td>0.95</td><td>1.00</td><td>0.98</td></tr>
              <tr><td>t</td><td>0.94</td><td>0.97</td><td>0.96</td></tr>
              <tr><td>u</td><td>1.00</td><td>1.00</td><td>1.00</td></tr>
              <tr><td>⊗</td><td>0.95</td><td>0.74</td><td>0.83</td></tr>
              <tr><td>avg</td><td>0.96</td><td>0.96</td><td>0.96</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>[Fig. 6: (a) convolution outputs after the activation function; (b) synthetic inputs maximizing filter activations in the 1st, 2nd and 3rd layers.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>
In this paper, we have described the collection of a large corpus of annotated
Latin characters, and the design of a novel deep convolutional network for the
classification step. The described system is a key component in the In Codice
Ratio project, whose aim is to fully transcribe a large corpus of documents
contained in the Vatican Secret Archives [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Some preliminary results with the
entire system have shown that the framework is able to reach a word-error rate
of around 80% on the pages under consideration. Thorough evaluation of the
entire system (including the segmentation step) is ongoing work.
        </p>
        <p>
          Future work will require the design of a fully differentiable system to
replace the currently hand-tuned segmentation step. Indeed, some authors
have recently proposed the use of recurrent networks to process the entire text
sequentially [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ]. While these methods still require the annotation of the entire text,
the annotations can be noisy, and the results obtained are generally better than
those of related systems based on hidden Markov models.
        </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank Debora Benedetto, Elena Bernardi and Riccardo Cecere for their help
with the pre-processing steps and the crowd-sourcing application. Finally, we
are indebted to all the teachers and students of Liceo Keplero and Liceo Montale
who joined the work-related learning program and carried out all the labeling effort.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ammirati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maiorino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Nieddu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rossi</surname>
          </string-name>
          .
          <article-title>In Codice Ratio: Scalable transcription of historical handwritten documents</article-title>
          .
          <source>In 25th Italian Symposium on Advanced Database Systems (SEBD)</source>
          ,
          <year>2017</year>
          . To Appear.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. D. Cireşan and U. Meier.
          <article-title>Multi-column deep neural networks for offline handwritten Chinese character classification</article-title>
          .
          <source>In 2015 International Joint Conference on Neural Networks (IJCNN)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. D. C. Cireşan, U. Meier,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Deep, big, simple neural nets for handwritten digit recognition</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>22</volume>
          (
          <issue>12</issue>
          ):
          <fpage>3207</fpage>
          -
          <lpage>3220</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          .
          <article-title>Handwriting recognition in historical documents</article-title>
          .
          <source>PhD thesis</source>
          , Universität Bern,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          , E. Indermühle, H. Bunke, G. Viehhauser, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stolz</surname>
          </string-name>
          .
          <article-title>Ground truth creation for handwriting recognition in historical documents</article-title>
          .
          <source>In 9th IAPR International Workshop on Document Analysis Systems (DAS)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . ACM,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wuthrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liwicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Frinken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          , G. Viehhauser, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Stolz</surname>
          </string-name>
          .
          <article-title>Automatic transcription of handwritten medieval documents</article-title>
          .
          <source>In 15th IEEE International Conference on Virtual Systems and Multimedia (VSMM)</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>142</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>V.</given-names>
            <surname>Frinken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Manmatha</surname>
          </string-name>
          .
          <article-title>Adapting BLSTM neural network based keyword spotting trained on modern data to historical documents</article-title>
          .
          <source>In 2010 International Conference On Frontiers in Handwriting Recognition (ICFHR)</source>
          , pages
          <fpage>352</fpage>
          -
          <lpage>357</lpage>
          . IEEE,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <source>Deep Learning</source>
          . MIT Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>D.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In 3rd International Conference for Learning Representations (ICLR)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. J.-B. Michel, Y. K. Shen, A. P. Aiden, A. Veres, M. K. Gray,
          J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, S. Pinker,
          M. A. Nowak, and E. L. Aiden.
          <article-title>Quantitative analysis of culture using millions of digitized books</article-title>
          .
          <source>Science</source>
          ,
          <volume>331</volume>
          (
          <issue>6014</issue>
          ):
          <fpage>176</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. M. Rusiñol, D. Aldavert, R. Toledo, and J. Lladós.
          <article-title>Efficient segmentation-free keyword spotting in historical document collections</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>48</volume>
          (
          <issue>2</issue>
          ):
          <fpage>545</fpage>
          -
          <lpage>555</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Depuydt</surname>
          </string-name>
          , and J. de Does.
          <article-title>Handwritten text recognition for historical documents in the tranScriptorium project</article-title>
          .
          <source>In Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage</source>
          , pages
          <fpage>111</fpage>
          -
          <lpage>117</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Toselli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>ICFHR2014 competition on handwritten text recognition on tranScriptorium datasets (HTRtS)</article-title>
          .
          <source>In 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR)</source>
          , pages
          <fpage>785</fpage>
          -
          <lpage>790</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Sayre</surname>
          </string-name>
          .
          <article-title>Machine recognition of handwritten words: A project report</article-title>
          .
          <source>Pattern Recognition</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>213</fpage>
          -
          <lpage>228</lpage>
          ,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>U.</given-names>
            <surname>Springmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Najock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Morgenroth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gotscharek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Fink</surname>
          </string-name>
          .
          <article-title>OCR of historical printings of Latin texts: problems, prospects, progress</article-title>
          .
          <source>In ACM First International Conference on Digital Access to Textual Cultural Heritage (DATeCH)</source>
          , pages
          <fpage>71</fpage>
          -
          <lpage>75</lpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Zeiler</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          .
          <article-title>Visualizing and understanding convolutional networks</article-title>
          .
          <source>In European conference on computer vision</source>
          , pages
          <fpage>818</fpage>
          -
          <lpage>833</lpage>
          . Springer, Cham,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>