<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A high-performance word recognition system for the biological field notes of the Natuurkundige Commissie</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mahya Ameryan</string-name>
          <email>mahya.ameryan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lambert Schomaker</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence, Faculty of Science and Engineering, University of Groningen</institution>
          ,
          <addr-line>Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <fpage>92</fpage>
      <lpage>103</lpage>
      <abstract>
        <p>In this research, a high word-recognition accuracy was achieved using an e-Science friendly deep learning method on a highly multilingual data set. Deep learning requires large training sets. Therefore, we use an auxiliary data set in addition to the target data set, which is derived from the collection of the Natuurkundige Commissie, years 1820-1850. The auxiliary historical data set is from another writer (van Oort). The method concerns a compact ensemble of Convolutional Bidirectional Long Short-Term Memory neural networks. A dual-state word-beam search combined with an adequate label-coding scheme is used for decoding the connectionist temporal classification layer. Our approach increased the recognition accuracy of the words that a recognizer has never seen, i.e., out-of-vocabulary (OOV) words, by 3.5 percentage points. The use of extraneous training data increased the performance on in-vocabulary words by 1 pp. The network architectures in an ensemble are generated randomly and autonomously such that our system can be deployed in an e-Science server. The OOV capability allows scholars to search for words that did not exist in the original training set.</p>
      </abstract>
      <kwd-group>
        <kwd>Historical handwriting recognition</kwd>
        <kwd>Convolutional Bidirectional Long Short-Term Memory (CNN-BiLSTM)</kwd>
        <kwd>E-Science server</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Historical manuscripts are an important aspect of cultural heritage [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
Extracting information from them by e-Science servers would be helpful for scholars and
the general public. An e-Science server is the application of computationally
intensive modern methods for data collection, preparation, experimentation, result
dissemination, and long-term maintenance, e.g., Monk [
        <xref ref-type="bibr" rid="ref15 ref16 ref19 ref20 ref6">19, 20, 6, 15, 16</xref>
        ]. The
Monk e-Science server is a live, web-driven search engine for character and word
recognition, annotation, and retrieval. It contains various handwritten historical
and contemporary manuscripts in numerous languages: Chinese, Dutch, Thai,
Arabic, English, German, and Persian. Additionally, intricate machine-printed
documents such as Egyptian hieroglyphs and Fraktur are available in the Monk
e-Science server.
      </p>
      <p>
        Ideally, the use of deep learning methods should be beneficial in an e-Science
server context. Deep learning paradigms, especially Long Short-Term Memory
(LSTM [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), have shown superior performance in solving handwriting-recognition
problems. However, these methods have two drawbacks. First, they demand large
training sets. Second, the design of optimal neural architectures requires human
supervision [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], which is in contradiction with the principle of autonomy in
artificial intelligence. Therefore, the plain application of such a fine-tuned
LSTM-based recognizer is not feasible in e-Science servers. In this research, we use a
very recent, proven homogeneous ensemble of end-to-end trainable Convolutional
Bidirectional LSTMs, conveniently applicable on e-Science servers. The target
data set is derived from the 17,000-page manuscript of biological field notes of the
Dutch Natuurkundige Commissie (NC) in the Indonesian Archipelago between
the years 1820 and 1850. Although the authors attempted to record their knowledge and
observations in a disciplined way, along with fascinating drawings, this systematicity
does not extend to the handwritten script; e.g., allographs differ depending on
the serial position in a word. Namely, a letter at the beginning or middle
of a word is mostly well-formed, but it can have an oscillatory slur as
an ending character. Fig. 1 shows different allographs for the letters 'e', 'm',
'a', and 'n' at the ends of the sample words (a) Aerme, (b) humerum, (c) javanica,
and (d) fanden. The samples are derived from the MkS data set. Furthermore, the
ground-truth label in each data set is presented in a particular way, due to the
designers' handcrafting. As an example, in the ground-truth labels of a standard
benchmark Arabic data set, characters or ligatures are separated by a token ('|')
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Moreover, there are specific requirements for each recognizer method. The
introduction of new form-based alphabets or feature-extraction approaches has
addressed this issue [
        <xref ref-type="bibr" rid="ref10 ref4">10, 4</xref>
        ]. As a consequence, we considered it necessary
to examine whether the labeling systematics of Latin-based handwritten scripts has an
effect on recognition performance. A lot of effort has been put into labeling the
massive target collection, which has resulted in nearly 10k labeled word images.
In deep learning, this is not enough data to obtain optimal recognition accuracy.
Therefore, we are faced with the question of whether another data set that has
textual and stylistic similarities to the target data set can be useful. An auxiliary
data set is extracted from the handwritten diaries of Pieter van Oort [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], written
in a very neat way, contrary to the NC collection. Still, this text comes from
the same historical period and the same geographical context. It is a limitation
if a recognizer can only recognize the lexical words from the training set. We
investigate the performance on lexical test words that were never seen by
the recognizer. Another question is whether it is possible to design an optimized
recognizer, less dependent on the presence of a machine-learning expert, by
generating neural networks autonomously within an e-Science server.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        In current literature, good results are obtained using LSTM networks, however,
at the expense of a tremendous amount of training, using large ensembles of
up to a thousand neural networks [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In our work, we aim at reaching similar
performance using a more compact approach, with a small ensemble. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
an ensemble is composed of eight recognizers: four architectures of a recurrent
neural network (RNN); a grapheme-based MLP-HMM; and two different variants of
a context-dependent sliding window based on GMM-HMM. Such a heterogeneous
architecture demands a lot of engineering effort. Furthermore, in this study,
we use the dual-state word-beam search (DSWBS) CTC decoder [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. This
method distinguishes two states in the word-search process: a word state and a
non-word state. Each character can be either a word character or a
non-word character. A search beam is an evolving list of the most likely word candidates. A
search beam's temporal evolution is based on its state. A beam in the word state
can be extended by entering a word character. Entering a non-word character
brings the system to the non-word state. A beam in the non-word state is
extended by entering a non-word character, while entering a word character ends
the beam and brings the system to the word state. This word character is the
beginning of a new beam. This state information can be used in conjunction
with a prefix table to trace the best possible (partial) word hypotheses, from an
uncertain begin state until an uncertain end-of-sequence state.
      </p>
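The two-state beam dynamics described above can be sketched in a few lines. This is an illustrative simplification: the character set and the `extend` helper are our own hypothetical shorthand, not the DSWBS implementation of [14].

```python
# Illustrative sketch of the dual-state word-beam search (DSWBS) state logic.
# WORD_CHARS and the Beam tuple are hypothetical simplifications.

WORD_CHARS = set("abcdefghijklmnopqrstuvwxyz")  # word characters;
# everything else (digits, punctuation, space) counts as non-word.

def next_state(state, ch):
    """Return the beam state after appending character ch.

    A word character keeps (or puts) the beam in the word state; a
    non-word character moves it to the non-word state, as in the text.
    """
    return "word" if ch in WORD_CHARS else "non-word"

def extend(beam, ch):
    """Extend a (text, state) beam with one character.

    Entering a word character while in the non-word state ends the old
    beam: the word character becomes the start of a new beam.
    """
    text, state = beam
    new_state = next_state(state, ch)
    if state == "non-word" and new_state == "word":
        return (ch, "word")          # start of a new word beam
    return (text + ch, new_state)    # normal extension

beam = ("", "non-word")
for ch in "the, cat":
    beam = extend(beam, ch)
print(beam)  # ('cat', 'word') -- the last word hypothesis being built
```

In the full decoder, each surviving beam additionally carries a probability and is pruned against a prefix tree of lexicon words; the sketch only shows the state transitions.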
      <p>
        The label-coding scheme may have a significant effect on performance due
to the varying demands of different methods [
        <xref ref-type="bibr" rid="ref10 ref4">10, 4</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is shown that the
combination of DSWBS and a proper label-coding scheme is effective compared
to using DSWBS combined with a plain label-coding scheme. In the plain
coding scheme, only the characters in the word image appear in the corresponding
ground-truth label. It is reported that stressing the word-ending shape with an
extra token is beneficial when DSWBS is used as the CTC decoding method
(this is the 'extra-label coding scheme' [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). However, stressing the start-of-word
shapes with a token is detrimental to recognition accuracy.
      </p>
      <p>
        Given the lack of labeled data in handwriting recognition, as compared to,
e.g., speech recognition, it is important to be able to use existing labeled corpora.
Even more than in the case of speech, the style variation in handwriting often
inhibits the effective use of pre-existing data sets. To address this issue,
transfer learning has been applied. However, it may only yield a clearly improved
performance in case of style similarity, e.g., for a well-written two-author
single-language historical script [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Transfer over different historical script-style
periods remains notoriously difficult. The new element in this study is that we
will use additional labeled data from a different writer and a different document
type, but from the same historical period and a comparable colonial context.
The research question is whether it is possible to increase the performance on a
target manuscript, even if style and content are different.
      </p>
      <sec id="sec-2-1">
        <title>Method</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Pre-processing and augmentation</title>
      <p>
        The input consists of human-labeled, isolated-word grey-scale images from the
the Monk e-Science server [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ]. The pre-processing and data augmentation
are performed in three steps: random stretching/squeezing of a grey-scale image
along the width; re-sizing the image to 128 × 32 pixels; and finally, an
intensity normalization. These three steps are conducted in each epoch of the
training process, yielding different augmentations per epoch. Baseline alignment
and deslanting methods are not applied in this procedure.
      </p>
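As an illustration, the three steps could look as follows in NumPy. The stretch range, the nearest-neighbour interpolation, and the zero-mean/unit-variance normalization are our assumptions, not the paper's exact parameters.

```python
import numpy as np

def preprocess(img, rng, target_w=128, target_h=32, max_stretch=0.25):
    """Per-epoch augmentation sketch: random horizontal stretch/squeeze,
    nearest-neighbour resize to 128x32, and intensity normalization.
    The stretch range and normalization constants are assumptions."""
    h, w = img.shape
    # 1) random stretch/squeeze along the width
    factor = 1.0 + rng.uniform(-max_stretch, max_stretch)
    new_w = max(1, int(round(w * factor)))
    cols = (np.arange(new_w) * w / new_w).astype(int)
    img = img[:, cols]
    # 2) resize to target_h x target_w (nearest neighbour)
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(target_w) * img.shape[1] / target_w).astype(int)
    img = img[rows][:, cols]
    # 3) intensity normalization to zero mean, unit variance
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)

rng = np.random.default_rng(0)
word_image = rng.integers(0, 256, size=(40, 200))  # fake grey-scale word image
x = preprocess(word_image, rng)
print(x.shape)  # (32, 128)
```

Because the random stretch is drawn anew on each call, applying `preprocess` inside the training loop yields a different augmentation of the same word image in every epoch, as the text describes.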
    </sec>
    <sec id="sec-4">
      <title>Label-coding scheme</title>
      <p>
        We used a novel extra-separator label-coding scheme for words. This method
delivered an improvement of 2 to 3 percentage points in accuracy [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this
scheme, an additional unique character (the bar, '|') is concatenated to the end of
the ground-truth label. This character gives the recognizer an extra hint about the
end-of-word shape condition, which we have shown to be effective
on several data sets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
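A minimal sketch of the scheme, assuming the bar character '|' as the separator (the choice used later in the experiments):

```python
# Sketch of the extra-separator label-coding scheme: the bar character '|'
# is appended to every ground-truth label to stress the word-ending shape.

SEPARATOR = "|"  # chosen because it does not occur in the ground-truth labels

def encode(label):
    """Ground-truth label -> training label with stressed ending."""
    return label + SEPARATOR

def decode(output):
    """Strip the separator again before evaluation."""
    return output[:-1] if output.endswith(SEPARATOR) else output

print(encode("javanica"))          # javanica|
print(decode(encode("javanica")))  # javanica
```

The decoder side is trivial, which is part of the appeal: the scheme changes only the labels, not the network or the evaluation protocol.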
    </sec>
    <sec id="sec-5">
      <title>Neural Network</title>
      <p>
        We used a proven ensemble of five end-to-end trainable Convolutional
Bidirectional Long Short-Term Memory neural networks (CNN-BiLSTMs), Fig. 2 [
        <xref ref-type="bibr" rid="ref1 ref2">2,
1</xref>
        ]. Each of the five CNN-BiLSTMs consists of five front-end convolutional
layers, three BiLSTM layers, and a connectionist temporal classification (CTC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ])
layer. The architectures of the CNN-BiLSTMs in the ensemble differ only in the
number of hidden units in three of the five convolutional layers: layers 2, 3, and
4. Table 1 shows the number of hidden units for the five CNN-BiLSTMs, Ai,
i = 1,...,5.
      </p>
      <p>
        The convolutional layers in a CNN-BiLSTM are used for feature extraction.
The first convolutional layer is fed with the pixel intensities of the input
word image. Each convolutional layer has four stages: (1) a convolution operation; (2)
intensity normalization; (3) the ReLU activation function [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; (4) a max-pooling
layer. The networks use RMSProp [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] as the gradient-descent optimizer. No dropout
is applied. The batch size is 50 word images. The output of the
fifth convolutional layer is fed to the three-layer BiLSTM. Each BiLSTM layer
contains 512 hidden units.
      </p>
    </sec>
    <sec id="sec-6">
      <title>A connectionist temporal classification (CTC) layer</title>
      <p>
          There are |A| + 2 output units in the CTC layer, where |A| is the size of the
alphabet of the training lexicon labeled using the plain label-coding scheme
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The other two units are for the extra separator (the bar, '|') and a
special blank [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for CTC, which represents observing 'no label' and differs from the
space character. Dual-State Word-Beam Search (DSWBS) [<xref ref-type="bibr" rid="ref14">14</xref>] is used for CTC
decoding. In our research, DSWBS uses a prefix tree formed from a given lexicon,
without using any statistical language model.
      </p>
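The output-layer sizing can be illustrated with a toy lexicon; the helper function is hypothetical, not the actual implementation:

```python
# Illustration of the CTC output-layer sizing: |A| plain characters,
# plus one unit for the extra separator '|' and one for the CTC blank.

def ctc_output_units(lexicon):
    alphabet = set()
    for word in lexicon:
        alphabet.update(word)   # plain label-coding alphabet A
    return len(alphabet) + 2    # + separator unit + CTC blank unit

toy_lexicon = ["fanden", "humerum"]
# alphabet = {f, a, n, d, e, h, u, m, r} -> 9 characters, so 11 output units
print(ctc_output_units(toy_lexicon))  # 11
```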
    </sec>
    <sec id="sec-7">
      <title>A voting module</title>
      <p>For an input image, each of the five CNN-BiLSTMs produces a word hypothesis with
its relative likelihood value. The five hypotheses and likelihood values are the
input of the voting module. The hypotheses are divided into subsets. Afterward,
the final label of the input image is determined using three rules:
1. Plurality → choose that label.
2. Tie → choose the subset with the maximum average likelihood value.
3. Only singleton sets → choose the hypothesis with the maximum likelihood value.</p>
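The three rules can be sketched as follows (hypothetical `vote` helper, not the authors' code). Note that rules 2 and 3 coincide under a maximum-average-likelihood tie-break, since the average of a singleton subset is its single likelihood:

```python
from collections import defaultdict

def vote(hypotheses):
    """Decide the final label from (word, likelihood) pairs of the five
    networks: plurality first, then ties broken by the maximum average
    likelihood, which also covers the all-singleton case."""
    subsets = defaultdict(list)
    for word, likelihood in hypotheses:
        subsets[word].append(likelihood)
    best_size = max(len(v) for v in subsets.values())
    # keep only the largest subsets (plurality); several remain on a tie
    candidates = {w: v for w, v in subsets.items() if len(v) == best_size}
    # rules 2 and 3: maximum average likelihood among the candidates
    return max(candidates, key=lambda w: sum(candidates[w]) / len(candidates[w]))

# rule 1, plurality: 'fanden' wins with three votes
assert vote([("fanden", 0.2), ("fanden", 0.3), ("fanden", 0.1),
             ("funden", 0.9), ("landen", 0.8)]) == "fanden"
# rule 2, tie between two pairs: higher average likelihood wins
assert vote([("fanden", 0.9), ("fanden", 0.8),
             ("funden", 0.5), ("funden", 0.4), ("landen", 0.3)]) == "fanden"
# rule 3, only singletons: maximum likelihood wins
assert vote([("a", 0.1), ("b", 0.7), ("c", 0.5), ("d", 0.2), ("e", 0.3)]) == "b"
```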
      <sec id="sec-7-1">
        <title>Data sets</title>
        <p>
          The MkS data set is derived from 17,000 pages of biological field notes of the
Natuurkundige Commissie in the Indonesian Archipelago between the years
1820 and 1850 [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. The manuscript is stored in the Naturalis Biodiversity
Center in Leiden, the Netherlands. Thus far, 950 pages have been sparsely labeled in this project
(NNM001001033-7). The manuscript contains a wide variety of languages, including
German, Latin, Dutch, Malay, Greek, and French. After the random data split
for 5-fold cross-validation, the resulting proportion of out-of-vocabulary (OOV)
words, i.e., words in the test set not appearing in the training set, is 31.9%
(case-sensitive) and 29.5% (case-insensitive count). An in-vocabulary (INV) word
exists both in the test set and in the training set. The daily routine of the committee
is described in the field diary of Pieter van Oort, one of the major draftsmen
and collectors [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. The data sets used in this paper are summarized in Table 2.
        </p>
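The OOV bookkeeping described above amounts to a set-membership test against the training lexicon. A toy sketch (the word lists are invented, not MkS data):

```python
# Out-of-vocabulary (OOV) rate: the fraction of test words that do not
# occur in the training lexicon, optionally case-insensitive.

def oov_rate(train_words, test_words, case_sensitive=True):
    if not case_sensitive:
        train_words = [w.lower() for w in train_words]
        test_words = [w.lower() for w in test_words]
    lexicon = set(train_words)
    oov = [w for w in test_words if w not in lexicon]
    return len(oov) / len(test_words)

train = ["Humerum", "javanica", "fanden"]
test = ["humerum", "javanica", "Aerme", "fanden"]
print(oov_rate(train, test))                        # 0.5  (case-sensitive)
print(oov_rate(train, test, case_sensitive=False))  # 0.25
```

As in the text, the case-insensitive count is lower, because capitalization variants of a training word no longer count as unseen.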
        <p>
          The implementation of the network is based on the TensorFlow framework.
The bar character ('|') is selected as the stressed-ending sign because it is not
present in the ground-truth labels of the MkS and Van Oort data sets. The
evaluations were performed in a case-insensitive manner. We conducted 5-fold
cross-validation experiments. There are two training sets in our experiments: T1
and T2. The T1 training set exclusively contains images from three folds of
MkS. T2 contains all word images of the van Oort data set plus the images from
three folds of the MkS data set. The proportion of OOV words in the test set
equals 29.4% when T1 is the training set [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and 26.9% when T2 is used as the
training set.
        </p>
        <p>
          Table 3 shows a comparison of word-recognition accuracy (%) of single
recognizers (Table 1), and of the ensemble, separate for the training sets T1 and
T2. Results are presented of average word accuracy and its standard deviation
(av sd). Two CTC decoder variants are compared: lexicon-free best-path (BP
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) and dual-state word-beam search (DSWBS [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]). A complete lexicon for this
task is fed to DSWBS, consisting of a list of all words occurring in the data sets
(Table 2). Two label-coding schemes (Plain vs. Extra-separator) are compared
when the training set is T2.
Fig. 3: A comparison of the word-recognition accuracy of using the MkS data set
as the training set (T1 [<xref ref-type="bibr" rid="ref2">2</xref>]) and using the MkS and van Oort data sets as the
training set (T2), for in-vocabulary (INV) and out-of-vocabulary (OOV) words.
Using the additional training data yields improved accuracy (p &lt; 0.05).
Best path vs. dual-state word-beam search: Results show that using a lexicon-based search method in the CTC layer
significantly increases the word-recognition accuracy (by more than 28 pp), as expected.
Single network vs. ensemble: The ensemble voting improves the performance,
and its effect is stronger on the weaker method: 8 pp in the case of BP, and 3
pp in the case of DSWBS. The resolution of ties increases the performance by 1.3
pp when the DSWBS CTC decoder is used, and by 3.3 pp in the case of the best-path
CTC decoder, when the training set is the combined set T2. The five folds vs.
the five architectures: There is no particular information for an architecture
in one of the folds (Chi-squared test, N.S., p &gt;&gt; 0.05). The five
architectures: The architectures within the ensemble do not differ (Chi-squared test,
N.S., p &gt;&gt; 0.05).
        </p>
        <p>Fig. 4: Comparison of the word-recognition accuracy (%) of the ensembles when
using T1 and T2 as training sets, per word length, for (a) in-vocabulary (INV) and
(b) out-of-vocabulary (OOV) words. The continuous lines indicate
the proportion of words with a particular word length in the training sets.</p>
        <p>The T1 vs. T2 training sets: Using the van Oort data set along with the MkS data set
(T2) increases the word-recognition performance by 2 to 3 pp (Chi-squared test,
p &lt; 0.05, significant). The extra-separator label-coding scheme vs. the
plain label-coding scheme: Using the extra-separator label-coding scheme
increased the performance when DSWBS is used (Chi-squared test, p &lt;&lt; 0.05,
significant).</p>
        <p>Since the OOV rate differs when T1 and T2 are the training sets, we consider the
words that do not appear in the lexicon of either training set as the 'real' OOV
words. Correspondingly, the words that exist in the lexicon of both training sets
are INV words. Therefore, 26.9% of the test set is counted as OOV words,
and 70.6% of the test set concerns INV words. Fig. 3 shows a comparison
of the word-recognition accuracy (%) of the ensemble when T1 and T2 are used as
the training set, for OOV and INV words separately. The figure illustrates that
using T2 increases the performance by 1 pp on the INV words, to 91.7%, and by 3.7 pp
on the OOV words, to 82.8%. The reliable recognition of OOV words is important
because it allows scholars in the humanities to search for a word that does
not exist in the training set. We scrutinized the performance of the ensembles
in recognizing the INV and OOV words per word length, and we conduct a
comparison when T1 and T2 are used as the training sets in Fig. 4. These figures
also show the proportion of words with a particular word length in both training
sets. Fig. 4(a) shows that using the T2 data set is slightly beneficial for INV
words whose word lengths are up to 8 characters. Fig. 4(b) shows that using
the van Oort data set along with the MkS data set (T2) increases the performance
of the ensemble for out-of-vocabulary words of 1 to 8 characters in length.
For longer words, the effect is not clear.</p>
        <p>Fig. 5 shows the word-recognition accuracy of test words achieved by the
five networks (Table 1) using 5-fold cross-validation and the T2 training set.
The X-axis shows the number of instances per word class, sorted in the
order of occurrence in the training set. In an e-Science server, the X-axis roughly
corresponds to time, starting with just a few instances per word class and
increasing as more labels enter the learning system. The blue circles represent the
in-vocabulary (INV) word-test classes. The dark red circle shows the average
word-recognition accuracy of the out-of-vocabulary (OOV) words. The average
recognition of OOV words is near 80% for a single recognizer when there is
not a single example of them in the training set. There are several 'threads' in the
figure. This is due to the quantized number of samples per word in a test set.
For instance, for a word in the lexicon with three instances in the test set, the
accuracies can only be 0%, 33%, 66%, or 100%. The easy words are in the stripe
at an accuracy of 100% and can be recognized well with just a few
instances in the training set. At the other end, there are difficult words in the
stripe at 0%, which are still not recognized after a dozen training examples.
This density plot gives a more realistic view of the origins of the performance,
compared to a single average word accuracy computed over all words and
all instances.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Discussion and conclusion</title>
        <p>
          In this study, high word-recognition accuracy was achieved using a compact
ensemble of ve end-to-end convolutional neural networks. The method is
applicable to e-Science servers where human intervention for hand-crafting the
hyperparameters needs to be minimal. The ensemble uses Plurality voting, with
special provision for ties. A proper label-coding scheme is applied. At the
decoding stage, i.e., the CTC layer, a search method is applied which uses a pre x
tree for a given lexicon, without using any statistical language model. The given
intrinsic lexicon consists of all the words which appear in the target and
auxiliary data sets. DSWBS shows only a slight drop of 0.4 pp in performance when
using a large external lexicon, e.g., 30 times larger [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and including the intrinsic
word list. These results should give an indication for practical applications where
the intrinsic lexicon is not known and a large external (public) lexicon is used.
The target data set (MkS) is extracted from a historical collection that belonged
to the Natuurkundige Commissie in the Indonesian Archipelago between the years
1820 and 1850. MkS (T1) is highly multilingual, in multiple styles, often in sloppy
handwriting. We used an auxiliary data set, the van Oort handwritten diary, to
boost the performance. This data set, unlike MkS, is written very neatly, but
belongs to the same historical period and geographical context.
        </p>
        <p>The combination of MkS and van Oort (T2) in training increased the
performance of the ensemble by 3.5 pp, to 89%, of which 1.3 pp is the effect of the resolution
of ties. The effect is clearer (+4 pp) on the test words which the recognizer
has never seen, i.e., the out-of-vocabulary (OOV) words. The comparison of the two
CTC decoder methods (with and without lexicon) confirms the significant effect
of lexicon application (+25 pp). The obtained high OOV word-recognition
accuracy gives the user the ability to search for terms that are not in the training set.
The use of T2 notably increased the performance for OOV words with a length
of 1 to 8 characters. However, this effect is not apparent for longer OOV words.
Reporting average accuracy hides the differences over word classes. We provided
a detailed view of the word accuracy per class as a function of the number of
examples. There are INV words which demand only a few
instances in a training set to yield 100% accuracy. These are the easy words. On
the contrary, for difficult words, the performance does not clearly increase when
the number of instances for such a word class increases in the training set. Some
words do not reach 100% accuracy despite having more than 100 examples in the
training data. The small increase of 1 percentage point (pp) in the performance
of the system on the in-vocabulary words when using auxiliary data may indicate
that the number of word samples was probably already large enough. However,
the 3.7 pp improvement of the performance on out-of-vocabulary words due to
the auxiliary training data reveals that, for generalization, the system benefits from
such data, even if it is partially different in style and content.</p>
        <p>
          The proposed system can be deployed in an e-Science server such as the
Monk e-Science server [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ]: The network architectures in an ensemble are
generated randomly and autonomously. The approach also allows scholars to
search for words that did not exist in the training set that they used.
        </p>
        <p>The good results of this study are due to: (a) a limited-size ensemble of LSTM
networks using effective plurality voting; (b) an adapted label-coding scheme
stressing word-ending shapes; (c) the use of dual-state word-beam search
with a prefix lexicon; and (d) the use of related but dissimilar handwriting in
the training process. For future work, we intend to develop a handwritten line
recognizer by extending our word-recognition approach.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ameryan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification</article-title>
          (
          <year>2019</year>
          ), https://arxiv.org/abs/1912.03223.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ameryan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Improving the robustness of LSTMs for word classification using stressed word endings in dual-state word-beam search</article-title>
          .
          <source>In: 17th Int. Conf. Frontiers in Handwriting Recognition</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dutta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krishnan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathew</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jawahar</surname>
            ,
            <given-names>C.V.</given-names>
          </string-name>
          :
          <article-title>Improving CNN-RNNhybrid networks for handwriting recognition</article-title>
          .
          <source>In: 2018 16th Int. Conf. Frontiers in Handwriting Recognition (ICFHR)</source>
          . pp.
          <fpage>80</fpage>
          -
          <lpage>85</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riesen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bunke</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Graph similarity features for HMM-based handwriting recognition in historical documents</article-title>
          .
          <source>In: 12th Int. Conf. on Frontiers in Handwriting Recognition</source>
          . pp.
          <fpage>253</fpage>
          -
          <lpage>258</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks</article-title>
          .
          <source>In: Proc. 23rd Int. Conf. Mach. Learn</source>
          . pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samara</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burgers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Image-based historical manuscript dating using contour and stroke fragments</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>58</volume>
          ,
          <fpage>159</fpage>
          -
          <lpage>171</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput</source>
          .
          <volume>9</volume>
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Klaver</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Inseparable friends in life and death: the life and work of Heinrich Kuhl (1797-1821) and Johan Conrad van Hasselt (1797-1823)</article-title>
          . Barkhuis, Groningen
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mees</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Achterberg</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Vogelkundig onderzoek op Nieuw Guinea in 1828</article-title>
          .
          <source>Zoologische Bijdragen</source>
          <volume>40</volume>
          ,
          <fpage>3</fpage>
          -
          <lpage>64</lpage>
          (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Menasri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheriet</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Augustin</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Shape-based alphabet for off-line Arabic handwriting recognition</article-title>
          .
          <source>Ninth Int. Conf. Document Analysis and Recognition (ICDAR</source>
          <year>2007</year>
          )
          <volume>2</volume>
          ,
          <fpage>969</fpage>
          -
          <lpage>973</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Menasri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Louradour</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bianne-Bernard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kermorvant</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The A2iA French handwriting recognition system at the Rimes-ICDAR2011 competition</article-title>
          . In:
          <string-name>
            <surname>Viard-Gaudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R</given-names>
          </string-name>
          . (eds.)
          <article-title>Document Recognition and Retrieval XIX</article-title>
          . vol.
          <volume>8297</volume>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>270</lpage>
          . International Society for Optics and Photonics, SPIE
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In: Proc. the 27th Int. Conf. Machine Learning</source>
          . pp.
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pechwitz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maddouri</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Märgner</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellouze</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amiri</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>IFN/ENIT database of handwritten Arabic words</article-title>
          .
          <source>In: 7th Colloque Int. Francophone sur l'Ecrit et le Document</source>
          (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Scheidl</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fiel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sablatnig</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Word beam search: A connectionist temporal classification decoding algorithm</article-title>
          .
          <source>In: The Int. Conf. Frontiers of Handwriting Recognition (ICFHR)</source>
          . pp.
          <fpage>253</fpage>
          -
          <lpage>258</lpage>
          . IEEE Computer Society (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>A large-scale field test on word-image classification in large historical document collections using a traditional and two deep-learning methods</article-title>
          .
          <source>ArXiv</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Lifelong learning for text retrieval and recognition in historical handwritten document collections</article-title>
          . In:
          <source>Handwritten Historical Document Analysis, Recognition, and Retrieval - State of the Art and Future Trends. World Scientific</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Stuner</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chatelain</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paquet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tieleman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude</article-title>
          .
          <source>COURSERA: Neural Netw. Mach. Learn. 4</source>
          ,
          <fpage>26</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Van der Zant</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haak</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Handwritten-word spotting using biologically inspired features</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>30</volume>
          ,
          <fpage>1945</fpage>
          -
          <lpage>1957</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Van der Zant</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zinger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Schie</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Where are the search engines for handwritten documents?</article-title>
          <source>Interdisciplinary Science Reviews</source>
          <volume>34</volume>
          (
          <issue>2-3</issue>
          ),
          <fpage>224</fpage>
          -
          <lpage>235</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Voigtlaender</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doetsch</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ney</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Handwriting recognition with large multidimensional long short-term memory recurrent neural networks</article-title>
          .
          <source>In: 15th Int. Conf. Frontiers in Handwriting Recognition</source>
          . pp.
          <fpage>228</fpage>
          -
          <lpage>233</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ameryan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolstencroft</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stork</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heerlien</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schomaker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Towards a digital infrastructure for illustrated handwritten archives</article-title>
          . In: Ioannides, M. (ed.)
          <article-title>Final Conf. of the Marie Sklodowska-Curie Initial Training Network for Digital Cultural Heritage</article-title>
          , Olimje, Slovenia, May 23-25,
          <year>2017</year>
          , LNCS, vol.
          <volume>10605</volume>
          ,
          <fpage>155</fpage>
          -
          <lpage>166</lpage>
          . Springer International Publishing (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Collecting colonial nature: European naturalists and the Netherlands Indies in the early nineteenth century</article-title>
          .
          <source>Low Countries Historical Review 134(3)</source>
          ,
          <fpage>72</fpage>
          -
          <lpage>95</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>