<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CITlab ARGUS for Keyword Search in Historical Handwritten Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobias Strauß</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Grüning</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gundram Leifert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roger Labahn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CITlab, Institute of Mathematics, University of Rostock</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe CITlab's recognition system for the Handwritten Scanned Document Retrieval Task 2016 attached to the CLEF 2016 conference held in Évora, Portugal, 5-8 September 2016 (see [9]). The task is to locate positions that match a given query - consisting of possibly more than one keyword - in a number of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) trained by connectionist temporal classification (CTC). The software modules behind that, as well as the basic utility technologies, are essentially powered by PLANET's ARGUS framework for intelligent text recognition and image processing.</p>
      </abstract>
      <kwd-group>
        <kwd>MDRNN</kwd>
        <kwd>LeakyLP cells</kwd>
        <kwd>CTC</kwd>
        <kwd>handwriting recognition</kwd>
        <kwd>neural network</kwd>
        <kwd>keyword spotting</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>of our ongoing text recognition projects and is extensively based upon PLANET’s
ARGUS software modules and the respective framework for development, testing
and training.</p>
      <p>Task. The Handwritten Scanned Document Retrieval Task 2016 aims at
advanced keyword spotting. Besides ordinary keyword search, the competition
comprises the detection of multiple word queries consisting of possibly
hyphenated keywords within sections. The writings used for this task are unpublished
manuscripts of Jeremy Bentham, an English philosopher and reformer of the
18th century.</p>
      <p>
        The goal is to detect queries in a “segment”, which is defined as six consecutive
lines. A segment contains a query if all keywords appear in the correct order.
Two consecutive segments overlap in 5 lines. This means a match of a query
may appear in up to 6 segments, depending on the difference between the
indices of the first and last matching line. A detailed description of
this task and its results can be found in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
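      <p>The segment logic above can be sketched as follows (a hypothetical helper, not part of the organizers’ toolkit):</p>

```python
def containing_segments(first_line, last_line, num_lines):
    # Segment i covers lines i..i+5; consecutive segments overlap in 5 lines.
    # A match spanning first_line..last_line (0-based) lies in segment i
    # iff i is at most first_line and i+5 is at least last_line.
    lo = max(0, last_line - 5)
    hi = min(first_line, num_lines - 6)
    return list(range(lo, hi + 1))

# A one-line match appears in up to 6 segments, a 4-line match in only 3:
assert len(containing_segments(10, 10, 100)) == 6
assert len(containing_segments(10, 13, 100)) == 3
```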
    </sec>
    <sec id="sec-2">
      <title>2 System Description</title>
      <sec id="sec-2-1">
        <title>2.1 Basic Scheme</title>
        <p>
          For the general approach, we may briefly refer to previous CITlab system
descriptions [
          <xref ref-type="bibr" rid="ref3 ref4 ref6 ref8">4,3,8,6</xref>
          ] because the overall scheme has essentially not been changed.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 From Baseline to Polygon</title>
        <p>This section briefly describes an algorithm to calculate polygons surrounding
the text lines given their baselines. Since for the test set (see Table 1) only
baselines are provided, such an algorithm is mandatory: the recognition
system requires a cropped text line as input.</p>
        <p>
          The baseline to polygon algorithm basically follows [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The idea is that given
a medial seam (which is, roughly speaking, a polyline following the main body of
the text line) separating seams are calculated by optimizing an appropriate cost
function using dynamic programming (see [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]). Here, the cost function penalizes a
separating seam for crossing regions with high Sobel values and for its distance
to the medial seam. The Sobel values are calculated by convolving the input
image with the Sobel operator.
        </p>
        <p>Using the given baseline directly as medial seam leads to insufficient results: in
Fig. 1a the provided baseline does not even touch the text, and as a consequence
the calculated separating seams do not contain the text at all. Hence, an
optimal shift is calculated for each baseline such that the sum of Sobel values
on the shifted baseline is maximal. Fig. 1b depicts the effect of this approach.</p>
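        <p>A minimal sketch of this shift search, assuming a precomputed Sobel magnitude image and a baseline given as (x, y) points (the function name and the shift bound are illustrative):</p>

```python
import numpy as np

def best_vertical_shift(sobel, baseline, max_shift=20):
    # Try vertical shifts of the baseline and keep the one maximizing the
    # sum of Sobel magnitudes sampled along the shifted polyline.
    h = sobel.shape[0]
    def score(dy):
        return sum(sobel[min(max(y + dy, 0), h - 1), x] for x, y in baseline)
    return max(range(-max_shift, max_shift + 1), key=score)
```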
        <p>There are surrounding polygons given for the training and development set
(Table 1). Since they look quite different from the polygons calculated using the
described algorithm, we did not train on the given surrounding polygons. These
polygons were only used to calculate baselines, which in turn were used as input for the
baseline to polygon algorithm. This approach ensures homogeneity of training,
development and test data.</p>
        <p>Fig. 1: (a) Medial seam (baseline) with resulting separating seams; (b) Medial
seam (translated baseline) with resulting separating seams.</p>
      </sec>
      <sec id="sec-2-2-1">
        <title>2.3 Preprocessing</title>
        <p>The cropped text line images are preprocessed by
– image normalization: contrast enhancement (no binarization), size;
– writing normalization: line bends, line skew, script slant.</p>
        <p>Then, images are further unified by CITlab’s proprietary writing normalization:
The writing’s main body is placed in the center part of an image of fixed 96px
height. While the length-height ratio of the main body stays untouched, the
ascenders and descenders are squashed to focus the network’s attention on the
more informative main body. These images are the input for the feature
generation.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.4 Feature Generation</title>
        <p>
          The feature generation works like a convolutional filter with complex coefficients:
the input image is converted into a set of feature maps that contain local frequency
information. Let X ∈ [0, 1]^(u×v) be an image of width u and height v. Let ω ∈ R+
be a frequency, θ ∈ [0, 2π) be an angle and r0 ∈ R+ be a window radius. The
complex convolutional kernel around the centre (0, 0) is defined by
f(r) = (1/2)·(1 + cos(πr/2)) if r &lt; 2, and f(r) = 0 else,
g(x, y) := g_{ω,θ,r0}(x, y) = f(√(x² + y²) / r0) · exp(iω(x·cos(θ) + y·sin(θ))).
The frequency feature b(x, y) at point (x, y) is then calculated by
b(x, y) = Σ_{(i,j)∈Z²} X_{i,j} · g(i − x, j − y).
        </p>
        <p>The advantage of these frequency features is their robustness against shifts and
noise in the input image. In this application we use the parameters ω ∈ {π/4, π/2},
θ ∈ {0, π/4, π/2, 3π/4} and r0 ≈ 4. These 8 feature images (2 frequencies and 4 angles)
are the input for the MDRNN.</p>
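        <p>A numpy sketch of this feature generation, under our reading of the kernel (the raised-cosine window is applied at r = √(x² + y²)/r0; names and the kernel radius are illustrative):</p>

```python
import numpy as np

def frequency_kernel(omega, theta, r0=4.0, radius=8):
    # g(i, j) = f(sqrt(i^2 + j^2)/r0) * exp(1j*omega*(i*cos(theta) + j*sin(theta)))
    # with raised-cosine window f(r) = 0.5*(1 + cos(pi*r/2)) for r below 2, else 0.
    di, dj = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r = np.sqrt(di**2 + dj**2) / r0
    window = 0.5 * (1.0 + np.cos(np.pi * np.minimum(r, 2.0) / 2.0))  # 0 beyond r = 2
    wave = np.exp(1j * omega * (di * np.cos(theta) + dj * np.sin(theta)))
    return window * wave

def feature_map(image, omega, theta, r0=4.0, radius=8):
    # b(x, y) = sum over (i, j) of X[i, j] * g(i - x, j - y), zero-padded borders.
    g = frequency_kernel(omega, theta, r0, radius)
    h, w = image.shape
    padded = np.pad(image, radius)
    out = np.zeros((h, w), dtype=complex)
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            out += padded[radius + dx:radius + dx + h,
                          radius + dy:radius + dy + w] * g[radius + dx, radius + dy]
    return out

# 2 frequencies x 4 angles give the 8 complex feature images fed to the MDRNN.
```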
      </sec>
      <sec id="sec-2-4">
        <title>2.5 Recurrent Neural Network</title>
        <p>
          The resulting features are fed into a so-called Sequential Processing Recurrent
Neural Network (SPRNN). The SPRNN has 3 hidden layers with 355,210
trainable weights. The first and third layers are multidimensional and multidirectional
recurrent layers. To reduce the computational complexity and increase the ability
to generalize, these recurrent layers are connected through a feedforward layer.
Instead of LSTM cells, the MDRNN uses LeakyLP cells [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The SPRNN’s output then consists of a certain number of vectors. This
number is related to the line length because every vector contains information
about a particular image position. More precisely, the entries are understood
as the estimate of the probabilities of every alphabet character at the position
under consideration. Hence, the vector lengths all equal the alphabet size, and
putting all vectors together leads to the so-called confidence matrix “ConfMat”.
This is the intrinsic recognition result which will subsequently be used for the
decoding.</p>
        <p>Note further that, for the Handwritten Scanned Document Retrieval Task 2016,
we worked with an alphabet containing
– all digits, lowercase and uppercase letters of the ISO basic Latin alphabet;
– the special characters /&amp;£§+-\_.,:;!?’"=[]() and ␣, whereby different types of
quotation marks and hyphens were mapped to one of the respective symbols.
Finally, the above alphabet is augmented by an artificial not-a-character symbol,
CITlab’s NaC (in the literature also called blank, no-symbol or no-label). In
particular, it may be used to detect character boundaries because, generally
speaking, our SPRNNs emit high NaC confidences in uncertain situations.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.6 Training Data</title>
        <p>
          The composition of the data set provided by the competition organizers is
summarized in Table 1. (Polygons exist for the test set as well, but they are not
accurate enough to use.) The network is trained using an extension of Nesterov’s
Accelerated Gradient Descent with learning rate 5e-4 and momentum 0.9. For each
training epoch, we choose a random subset of 10,000 lines from the training set.
The first 19 epochs were trained using the original images with a fixed learning
rate. For 3 epochs we added noise to the preprocessing parameters and network
activations. For 19 additional epochs we set the learning rate to 5e-5 and added
degradation (pixel noise, blur, cross-outs, ...) to the images.
        </p>
        <p>
          Word Matchings. The neural network’s output, the ConfMat, consists of
confidences y_{t,l} for any label l and position t, where the labels are the characters
and the NaC. The confidences are positive and sum to 1 for fixed position t.
Thus, they can be interpreted as the conditional probability of label l at position
t given the input image X. The number of positions is typically greater than the
length of the decoded words, such that different label sequences decode to the same
word. Following the original notation of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], let F be the function mapping a
sequence of labels to a sequence of characters by merging consecutive identical
labels and deleting NaCs. Instead of calculating the CTC probability of a string s,
P(s|X) = Σ_{π∈F⁻¹(s)} Π_{t=1}^{T} y_{t,π_t},
we calculate the maximum probability P(π*(s)|X) of any path collapsing to s,
which we call the path probability of s in the following (this corresponds to the
Viterbi approximation for HMMs):
P(π*(s)|X) = max_{π∈F⁻¹(s)} Π_{t=1}^{T} y_{t,π_t}.
        </p>
        <p>In the following, z denotes a single keyword. Since the ConfMat could contain
more than one word, the path probability of z must be calculated on a specific
submatrix:
P_{s:e}(π*(z)|X) = max_{π∈F⁻¹(z)} Π_{t=s}^{e} y_{t,π_t},
where s and e are the start and end positions of the submatrix within the
ConfMat. Since y_{t,l} &lt; 1 for any t and l, the path probability typically decreases
as e − s increases. Thus, we accept a keyword z ranging from position s to e
of a certain ConfMat if the path probability relative to the number of positions,
P_{s:e}(π*(z)|X)/(e − s + 1), is higher than a certain threshold.</p>
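        <p>The path probability can be computed with the standard CTC Viterbi recursion over the NaC-interleaved extended label sequence; a minimal sketch, assuming the NaC occupies index 0 of the ConfMat alphabet:</p>

```python
import numpy as np

NAC = 0  # assumed index of the NaC label in the ConfMat alphabet

def path_probability(confmat, labels):
    # Max probability of a single path collapsing to `labels` under F
    # (Viterbi approximation of CTC); confmat has shape (T, alphabet_size).
    ext = [NAC]
    for l in labels:
        ext += [l, NAC]          # NaC z1 NaC z2 ... NaC
    T, K = confmat.shape[0], len(ext)
    v = np.zeros((T, K))
    v[0, 0] = confmat[0, ext[0]]
    v[0, 1] = confmat[0, ext[1]]
    for t in range(1, T):
        for k in range(K):
            best = v[t - 1, k]
            if k >= 1:
                best = max(best, v[t - 1, k - 1])
            # Skip transitions may jump over a NaC, but not merge equal labels.
            if k >= 2 and ext[k] != NAC and ext[k] != ext[k - 2]:
                best = max(best, v[t - 1, k - 2])
            v[t, k] = best * confmat[t, ext[k]]
    # A valid path may end on the last label or on the trailing NaC.
    return max(v[T - 1, K - 1], v[T - 1, K - 2])
```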
        <p>
          To ensure that the match is not only part of a larger word, the word has
to be separated by spaces, parentheses, hyphens etc. if it does not appear at
the beginning or the end of a line. This pattern can be described by a regular
expression: (.*[␣(-])? keyword ([␣)-].*)? . This search is accomplished using
the decoder described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The result is a path π of length T aligned to the
ConfMat positions. The indices s and e are determined by the subpath πs:e
corresponding to the keyword (i.e. F(πs:e) = z). Because of F, s and e are still
not uniquely determined, since e.g. NaCs will be deleted. To avoid ambiguity, we use
the greediest subpath. Again, if the probability of the separators does not exceed
a certain threshold, the match is assumed to be part of a larger word and is
rejected.
        </p>
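        <p>As a string-level illustration of the separator pattern (the actual search runs on ConfMats with the decoder of [7]; this Python analogue is only a sketch):</p>

```python
import re

def boundary_pattern(keyword):
    # (.*[␣(-])? keyword ([␣)-].*)? : the keyword must be flanked by a space,
    # parenthesis or hyphen, or appear at the beginning/end of the line.
    left, right = r"[ (\-]", r"[ )\-]"
    return re.compile("^(?:.*" + left + ")?" + re.escape(keyword)
                      + "(?:" + right + ".*)?$")

assert boundary_pattern("law").match("the law) stands")
assert not boundary_pattern("law").match("lawful")
```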
        <p>Multiple word queries are treated by searching the keywords individually.</p>
        <p>Incorporating Hyphens. The strategy above obviously does not work for
hyphenated words, since the keyword is spread over two ConfMats. To also handle
hyphens, we search for pairs of consecutive ConfMats where the first ConfMat
likely ends with a hyphen symbol, and extract the submatrices containing the last
word of the first matrix and the first word of the second matrix. Both submatrices
are combined, and the new ConfMat is added to the list of all ConfMats, such that
the above strategy can be used to search for hyphenated words in those combined
matrices.</p>
        <p>
          To find ConfMats containing hyphenations, we simply search for
hyphens at the line end, again using the RegEx-Decoder from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The used regular
expression is (.*␣)?[A-Za-z]+␣?-␣? which extracts the first part of the
hyphenation and the hyphen at once.
        </p>
        <p>Score. Many of the word matches are false positives and will influence the result
in a negative way. The four evaluation measures penalize false positives with a
high score more than false positives with a relatively low score. Therefore, a
“good” score is crucial for a good evaluation.</p>
        <p>The path probability Ps:e(π∗(z)|X ) is an obvious score. This probability
reflects the maximum confidence that the input contains the word z.</p>
        <p>In our experience, the ability to learn dependencies between characters
depends highly on the training data. If many variations appear in the trained
character sequences, the network’s output will depend only weakly on character
transitions. Thus, the network will not be able to predict word priors. It rather
predicts the character sequence as accurately as possible. To also incorporate word
priors, we borrow some basic ideas from domain adaptation. The source domain
S is the domain learned by the neural network, which includes those weak
dependencies on character sequences. The target domain T reflects the correct word
statistics. For the sake of simplicity, assume for the moment that all input images
X reflect single word snippets. The only assumption of the derivation below is
that PS(X|z) = PT(X|z), which basically means that the fonts are the same.
The beauty of this approach is that the word distribution of training and test
data may differ. By Bayes’ law, we know
PT(z|X) = PS(z|X)·pS(X)·PT(z) / (PS(z)·pT(X)) = (1/N)·(PT(z)/PS(z))·PS(z|X),   (1)
where
N = Σ_{z′} (PT(z′)/PS(z′))·PS(z′|X).</p>
        <p>In principle, PT(z) could be any language model. In this task, we simply use
a word unigram. The source prior PS(z) is a character transition probability
learned by the neural network. We estimate PS(z) in three different ways:
– PS(z) = PT(z), thus the target posterior is equal to the source posterior times
the normalization. We refer to this prior scheme as abs.
– PS(z) ∝ 1, thus only the prior is used. (It would be statistically more reasonable
to model the character/label probability by some constant c such that PS(z) = c^|z|
or PS(z) = c^T.) We refer to this prior scheme as prior.
– PS(z) ∝ Π_{i=1}^{|z|} P(z_i)^c, where P(z_i) is the character prior and 0 &lt; c ≤ 1.
(In the submitted system, the character priors are estimated on the training set
and c = 0.5.) We refer to this prior scheme as da.</p>
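        <p>Eq. (1) amounts to a simple reweighting; a sketch over a finite candidate vocabulary (dictionary names are illustrative):</p>

```python
def rescore(source_posteriors, target_prior, source_prior):
    # Scale each source posterior P_S(z|X) by the prior ratio P_T(z)/P_S(z)
    # and renormalize by N, cf. eq. (1).
    scaled = {z: p * target_prior[z] / source_prior[z]
              for z, p in source_posteriors.items()}
    n = sum(scaled.values())  # normalization constant N
    return {z: p / n for z, p in scaled.items()}
```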
        <p>In both of the latter schemes, prior and da, the prior probabilities are estimated
only up to a constant factor 1/N′, which is basically the reciprocal of the sum of
the estimates of PS(z) over the finite set of all words z. We integrate N′ into N
for both schemes.</p>
        <p>If the written characters of one word do not influence those of another, it
is reasonable to reestimate the word probability Ps:e(z |X ) within the positions
from s to e of the ConfMat according to eq. (1). Then, the normalization for
prior and da consists of the finite set of all word and part of word sequences
fitting in this submatrix. In the same way, we reestimate the path probability
Ps:e(π∗(z )|X ).</p>
        <p>To sort a set of words according to their probability on a specific submatrix of
a given ConfMat, there is no need to calculate the normalization 1/N. The
normalization is only crucial for comparing probabilities of different submatrices. To
analyze the impact of the normalization, we submitted results with (normed) and
without (unnormed) normalization. Typically, the vocabulary only represents a
small part of all words of the considered language. Thus, it is impossible to sum
over all these feasible words. Words not contained in the vocabulary are called
out-of-vocabulary (OOV) words. Usually, the normalization constant N has to
be approximated. (In our experiments, we sum up the 10 most likely vocabulary
matches plus an additional OOV term if the best string, i.e. the raw output, is not
contained in those matches.)</p>
        <p>Typically, our posterior probabilities are calculated using the path
probability. To investigate the impact of using the path probability as an approximation
of the CTC probability, we submitted comparable systems for both source posterior
probabilities. Using the CTC scheme, we only use the path probability to calculate s
and e. In all the above equations, we substitute the path probability P(π*(z)|X)
by the CTC probability P(z|X). We refer to these posterior schemes as path and
ctc, respectively.</p>
      </sec>
      <sec id="sec-2-6">
        <title>2.7 Combining Keywords for Multiple Word Queries</title>
        <p>For single word queries, we are already done. The matches can be saved for six
consecutive segments. For multiple word queries, all keywords have to be detected
in a certain segment, and the order in which they appear has to be the same as in
the query.</p>
        <p>There is one score for each query, so the scores of the matches of multiple
word queries have to be merged into one score. We tested the minimum, the
arithmetic mean and the geometric mean of the scores of the matches. In our tests,
the arithmetic and geometric means yielded almost the same error rate, while the
minimum of all scores yielded significantly higher error rates. We worked with the
geometric mean.</p>
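        <p>Merging the per-keyword match scores of one query by the geometric mean can be sketched as:</p>

```python
from math import prod

def query_score(match_scores):
    # Geometric mean of the individual keyword match scores of a query.
    return prod(match_scores) ** (1.0 / len(match_scores))
```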
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Results</title>
      <p>This section reports the results of the Handwritten Scanned Document Retrieval
Task 2016. The experiments are designed to combine the decoding components
described in Sect. 2:
– posterior probability (path and ctc)
– prior probability (abs, prior and da)
– normalization (normed and unnormed)
We were restricted to submitting only 10 systems. Since we usually use the
probability of the most likely path instead of CTC probabilities, we skipped the
two decoding schemes which use the CTC posterior and are not normalized at the
same time.</p>
      <p>We are especially interested in the scores at segment level. The scores at box
level highly depend on the precise detection of a bounding box of a keyword.
Since this is out of our scope, we concentrate our investigations on the segment
score. To get an impression of the differences, we refer to Table 4 in Appendix A
showing the same results as Table 2 only on box level. Appendix A also contains
the results of subsets of the keyword queries such as hyphenated or OOV words
for both the development and the test set.
[Tables 2 and 3: gAP, mAP, gNDCG and mNDCG for the normed and unnormed
variants of all submitted decoding schemes (source posteriors path and ctc; source
priors abs, prior and da) on the development and the test set, respectively.]</p>
      <p>Additionally, we submitted search results obtained by a neural network trained
with additional external data. Unfortunately, we accidentally also used data from
the HTRtS15 training set, which overlaps with the development data set. The
resulting network yields improved results on the development set, but the other
additional training data seems to fit the test data poorly and thus confused the
network, so the recognition rates on the test set decrease.</p>
      <p>Normalization. Tables 2 and 3 show the impact of the normalization on
the four different measures. Normalizing the probabilities typically improves the
recognition score except for one configuration: If the source prior is equal to the
target prior (i.e. abs). Then the normalization can be counterproductive if the
data is different from the trained data. The network is trained to optimize the
unnormalized CTC probability. So it is not surprising that the system works well
if we use only the source posterior probability as score. If the network output
gets blurry (because of e.g. untrained writing styles), alternative results become
more likely compared to the proposed result. The normalization value will grow
for keywords which have a small edit distance to many other vocabulary words.
Thus, if the network is not able to make clear decisions, the normalization value
will depend much more on the keyword’s position in the vocabulary. Even for
the development set – where the data seems to fit the training data well – the
gain from the normalization is not significant. Thus, the normalization can be
omitted using the abs decoding scheme.</p>
      <p>If source and target prior differ, the posterior scale changes depending on the
word. Thus for different keywords, the scores are not comparable anymore. The
normalization maps the scores into the same range. Therefore, normalization
increases the recognition rate by around 7 gAP points for the prior and da schemes
on the test set (Table 3).</p>
      <p>All other tables show the same behavior. Therefore, we omit the rows with
the unnormalized decoding schemes in Tables 4-8.</p>
      <p>Path vs. CTC Probability. The network is trained to optimize the CTC
posterior likelihood. Thus, it is not surprising that the CTC probability is
typically slightly better (by less than 0.6 gAP points on the test set, Table 3) than
the path probability, except for a few experimental setups: the box level gAP on the
development set (Table 4) and the gAP on the development set restricted to broken
words (Table 5). A query match may contain additional false keyword matches
although the query match on segment level is correct. These additional false
matches are penalized by the gAP on box level. Since the CTC probability is
typically higher than the path probability and the rejection threshold stays
constant, there are more additional false keyword matches within a query match.
So the error increases for the CTC probability.</p>
      <p>Finally, the gap between path and CTC posterior probability is small for all
experiments. The path probability also preserves the relation between the source
prior and normalization decoding schemes. Thus, the path probability is a good
approximation.</p>
      <p>Priors. This evaluation is even less clear-cut than the one above. The results not
only depend on the experimental setup (i.e. the table considered) but also
highly depend on the measure. Considering the development set (Table 2), the
prior scheme works slightly better (by less than 0.6 gAP points) than abs and da.
The mAP measure puts more weight on infrequent words. Thus, the abs
decoding scheme works better than the prior decoding scheme, which naturally
favors frequent words.</p>
      <p>Compared to the development set, the results on the test set gain more
from including prior knowledge since the posterior probabilities are less reliable.
In particular, if the gAP value is measured, the da scheme yields better results
(by more than 3 gAP points) than the others. Measuring the mAP, the prior
decoding scheme is slightly better (by less than 0.2 mAP points compared to da).</p>
      <p>In Table 8, the da scheme yields the lowest error rates independent of the
measure. This may indicate that the OOV prior could be improved. The current
estimation of an OOV prior is constant for all OOV words. For future research,
we plan to investigate a more sophisticated OOV prior, such as a character
n-gram of small order.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>In this paper we present the fundamental concepts of our systems submitted to
the Handwritten Scanned Document Retrieval Task 2016 attached to the CLEF
in 2016. We submitted 10 systems comparing different rescoring strategies.</p>
      <p>Unfortunately, there is no winning rescoring strategy. Normalization almost
always improves the score and typically the score is slightly better when using the
CTC posterior probability compared to the probability of the most likely path.
Nevertheless, the probability of the most likely path is a good approximation to
the CTC probability. Using domain adaptation to switch from the learned source
domain to the target domain, we scale the posterior probability by the ratio of
target prior to source prior. Fixing the target prior (as unigram probability), we
vary the source prior. For all three tested source priors there are setups where
that specific way of calculating the prior is preferable. This might indicate that
the estimated prior does not fit the prior learned by the neural network.</p>
      <sec id="sec-4-1">
        <title>Acknowledgement</title>
        <p>First of all, the CITlab team really wishes to express its great gratitude to
our long-term technology &amp; development partner PLANET intelligent systems
GmbH (Raben Steinfeld &amp; Rostock, Germany) for the extremely valuable,
ongoing support in every aspect of this work. Participating in Handwritten Scanned
Document Retrieval Task 2016 would not have been possible without that! In
particular, we continued using PLANET’s software world which was developed
and essentially improved in various common CITlab–PLANET projects over
previous years.</p>
        <p>From PLANET’s side, our activities were essentially supported by Jesper
Kleinjohann and Richard Schwark, whom we especially thank for ongoing, very
helpful discussions and their continuous development support.</p>
        <p>Being part of our current research &amp; development collaboration project, the
development work was funded by grant no. KF2622304SS3
(Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik
Deutschland (BMWi). The contest application has been adapted while
working in the EU Horizon 2020 project READ – Recognition and Enrichment of
Archival Documents (official no. 674943).</p>
        <p>Finally, we are indebted to the competition organizers from the PRHLT group
at UPV – in particular Mauricio Villegas – for setting up this evaluation and the
contest as well as the entire tranScriptorium project for providing all the data.</p>
        <p>[Tables 4-8 (Appendix A): gAP, mAP, gNDCG and mNDCG for the abs,
prior and da source priors combined with the path and ctc source posteriors.]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arvanitopoulos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Susstrunk</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Seam carving for text line extraction on color and grayscale historical manuscripts</article-title>
          .
          <source>In: Frontiers in Handwriting Recognition (ICFHR)</source>
          ,
          <year>2014</year>
          14th International Conference on. pp.
          <fpage>726</fpage>
          -
          <lpage>731</lpage>
          . IEEE (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on Machine learning</source>
          . pp.
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          . ACM (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grüning</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauß</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
          </string-name>
          , R.:
          <article-title>CITlab ARGUS for historical data tables: Description of CITlab's system for the ANWRESH-2014 Word Recognition task</article-title>
          .
          <source>Technical Report</source>
          <year>2014</year>
          /1,
          <string-name>
            <given-names>Universität</given-names>
            <surname>Rostock</surname>
          </string-name>
          (
          <year>Apr 2014</year>
          ), http://arXiv.org/abs/1412.6012
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauß</surname>
          </string-name>
          , T.:
          <article-title>CITlab ARGUS for arabic handwriting: Description of CITlab's system for the OpenHaRT 2013 Document Image Recognition task</article-title>
          .
          <source>In: Proceedings of the NIST 2013 OpenHaRT Workshop [Online]</source>
          (Aug
          <year>2013</year>
          ), http://arxiv.org/abs/1412.6061, available: http://www.nist.gov/itl/iad/mig/hart2013_wrkshp.cfm
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauß</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grüning</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Cells in multidimensional recurrent neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1412.2620</source>
          (
          <year>2014</year>
          ), submitted to Journal of Machine Learning Research
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauß</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grüning</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>CITlab ARGUS for historical handwritten documents - Description of CITlab's System for the HTRtS 2015 Task : Handwritten Text Recognition on the tranScriptorium Dataset</article-title>
          .
          <source>Technical Report</source>
          ,
          <institution>Universität Rostock</institution>
          (Apr
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Strauß</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grüning</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Regular expressions for decoding of neural network outputs</article-title>
          .
          <source>Neural Networks</source>
          <volume>79</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          (
          <year>2016</year>
          ), http://www.sciencedirect.com/science/article/pii/S0893608016000447
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Strauß</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grüning</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leifert</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labahn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>CITlab ARGUS for historical handwritten documents: Description of CITlab's system for the HTRtS 2014 Handwritten Text Recognition task</article-title>
          .
          <source>Technical Report</source>
          <year>2014</year>
          /2,
          <institution>Universität Rostock</institution>
          (
          <year>Apr 2014</year>
          ), http://arxiv.org/abs/1412.3949
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bromuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramisa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dellandrea</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaizauskas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolajczyk</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puigcerver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toselli</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>General Overview of ImageCLEF at the CLEF 2016 Labs</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          , Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puigcerver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toselli</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
          </string-name>
          , E.:
          <article-title>Overview of the ImageCLEF 2016 Handwritten Scanned Document Retrieval Task</article-title>
          .
          <source>In: CLEF2016 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Évora, Portugal (September 5-8,
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>