<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Creating an Atlas over Handwritten Script Signs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anders Hast</string-name>
          <email>anders.hast@it.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lasse Martensson</string-name>
          <email>lasse.martensson@su.se</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekta Vats</string-name>
          <email>ekta.vats@it.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaela Heil</string-name>
          <email>raphaela.heil@it.uu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Technology, Uppsala University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Swedish Language and Multilingualism, Stockholm University</institution>
        </aff>
      </contrib-group>
      <fpage>175</fpage>
      <lpage>180</lpage>
      <abstract>
        <p>This work proposes a framework for the interactive visualization of script characteristics as they appear in handwritten letters. The aim is to lay the foundations for creating a comprehensive atlas over letter forms extracted from a large collection of handwritten documents, with minimal human guidance. The visualization builds on the atlas metaphor and uses the t-SNE method to create island-like clusters that can be explored with the proposed visualization framework. By changing a scale parameter, the user can investigate the dataset at different levels, i.e. with clusters of different sizes.</p>
      </abstract>
      <kwd-group>
        <kwd>Handwritten script</kwd>
        <kwd>atlas</kwd>
        <kwd>visualization</kwd>
        <kwd>t-SNE</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Handwritten script contains an enormous amount of variation in form. In
normally executed handwritten script, each individual instance of a script sign, each
graph, is to some extent unique, even those produced by the same scribe in the
same document. However, this variation contains certain regularities, and some
graphs display a larger degree of similarity than others. Even though each graph
produced by a specific scribe is unique, as stated, the graphs can still be assumed
to be similar enough that a reader can often recognize them as being
produced by the same person. These similarities can furthermore be manifested on
different levels. One is the individual level, i.e. similarities between graphs produced
by the same person, but one can also account for similarities manifested
in script produced by different persons who received the same
training and were active during the same period. The latter category of
similarities becomes more relevant the further back in time we go. During the Middle
Ages, certain script types existed, such as Carolingian Minuscule and the
different Gothic variants (Textualis, Cursiva, Hybrida, etc.). Time and place
largely determined which styles a scribe learned, but the script of each
scribe also had individual features, unique to that individual.</p>
      <p>In this paper, we address the problem of identifying handwritten script signs
that display similarities within a large set of handwritten documents. The documents
used here are modern and consist of handwritten digits; this set has been chosen mainly for the
purpose of evaluating the method as such, and the next step will be to apply
the method to other material. This is described further below, and the present
investigation should be seen as a proof of concept rather than as a concluded
task. The documents investigated here were produced by different scribes,
and our aim is to let the computer cluster graphs that share characteristics in
form, even when they were produced by different scribes. What is being
identified in the dataset is thus classes of graphs that share certain features
in form. The ultimate goal of this research track is to lay the foundations for
creating an atlas over script signs extracted from a large set of handwritten
documents, where the script signs have been clustered into classes on the basis
of their similarity to each other.</p>
      <p>It should be noted that the clustering process works on two levels of likeness.
Firstly, the computer registers the similarity in the basic shape of the
signs, in our case the different digits. This means that, for instance, ‘0’, ‘1’, ‘2’,
etc. are divided into basic clusters on the basis of their basic form. Secondly, the
separate instances of each digit are clustered on the basis of their more detailed
form, so that, e.g., those instances of ‘7’ that share characteristics in form are
clustered together. This is demonstrated further below.</p>
    </sec>
    <sec id="sec-2">
      <title>Visualization framework</title>
      <p>
        To begin with, the proposed approach allows an end-user to select one or
several letters of interest, to be searched for in the handwritten document collection,
using a simple drag-and-drop gesture. Typically, the user-marked bounding box
captures more letters than intended, and background noise in the
document makes the user interaction more challenging. Therefore, an approach
based on two band-pass filters [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is employed for automatic background
noise removal. Inspired by our previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the algorithm adjusts the
user-marked rectangle based on the user's input, deciding which letters are
actually intended to be selected. The extent of the letters of interest is then located, so that
the letters are tightly enclosed by the bounding box and are
free of noise. This semi-automatic process enables faster collection of handwritten
letters for creating a comprehensive atlas over a variety of letter forms.
      </p>
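      <p>The rectangle adjustment step can be illustrated with a deliberately simplified sketch. It is not the band-pass filtering of [2] or the algorithm of [3]; it assumes a grey-level crop with dark ink on light paper scaled to [0, 1], a single global threshold controlled by a hypothetical parameter <italic>k</italic>, and a hypothetical minimum ink count per row and column (<italic>min_ink</italic>) so that isolated speckles are ignored before the box is shrunk to the remaining ink.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def tighten_box(crop, k=1.0, min_ink=2):
    """Shrink a user-marked crop to the smallest box around its ink.

    Simplified stand-in: `crop` is a 2-D grey-level array in [0, 1] with dark
    ink on light paper; `k` and `min_ink` are hypothetical tuning parameters.
    Returns ((row0, row1, col0, col1), binary_glyph), or None if no ink is found.
    """
    g = np.asarray(crop, dtype=float)
    # Global threshold: pixels clearly darker than the page average count as ink.
    ink = (g.mean() - k * g.std()) > g
    # Rows/columns with fewer than `min_ink` ink pixels are treated as speckle
    # noise and ignored when locating the extent of the letters.
    rows = np.flatnonzero(ink.sum(axis=1) >= min_ink)
    cols = np.flatnonzero(ink.sum(axis=0) >= min_ink)
    if rows.size == 0 or cols.size == 0:
        return None
    r0, r1 = int(rows[0]), int(rows[-1]) + 1
    c0, c1 = int(cols[0]), int(cols[-1]) + 1
    return (r0, r1, c0, c1), ink[r0:r1, c0:c1]
```
      </preformat>
      <p>The box uses half-open row and column ranges, so <monospace>crop[r0:r1, c0:c1]</monospace> recovers the tightened region from the original crop.</p>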
      <p>
        As a proof of concept, the first iteration of the atlas will be created on the
basis of the MNIST database [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which consists of 70,000 grey-level images (28×28
pixels) of handwritten digits (0–9), collected from more than 500 writers. This
dataset serves well as a first use case for evaluating different clustering,
classification and visualization techniques, owing to its limited grapheme count
and the high variance in writing styles that results from the large number of writers.
Once a working prototype has been established on this basis, the system can be
extended to more complex use cases, such as a larger number of graphemes (i.e.
whole alphabets) and a higher variance between writing styles.
      </p>
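      <p>MNIST is distributed in the simple IDX binary format: a big-endian header (magic number, item count and, for images, rows and columns) followed by one unsigned byte per pixel or label. The following sketch decodes such files with the Python standard library and NumPy; the function names are our own and not part of any MNIST tooling.</p>
      <preformat preformat-type="code">
```python
import struct
import numpy as np

IDX_IMAGES_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
IDX_LABELS_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension

def parse_idx_images(data: bytes) -> np.ndarray:
    """Decode an IDX image file into an (n, rows, cols) uint8 array."""
    magic, n, rows, cols = struct.unpack(">IIII", data[:16])  # big-endian header
    if magic != IDX_IMAGES_MAGIC:
        raise ValueError(f"not an IDX image file (magic={magic})")
    pixels = np.frombuffer(data, dtype=np.uint8, count=n * rows * cols, offset=16)
    return pixels.reshape(n, rows, cols)

def parse_idx_labels(data: bytes) -> np.ndarray:
    """Decode an IDX label file into a length-n uint8 array of digits 0-9."""
    magic, n = struct.unpack(">II", data[:8])
    if magic != IDX_LABELS_MAGIC:
        raise ValueError(f"not an IDX label file (magic={magic})")
    return np.frombuffer(data, dtype=np.uint8, count=n, offset=8)
```
      </preformat>
      <p>The official files are gzip-compressed, so, assuming a local copy, <monospace>parse_idx_images(gzip.open("train-images-idx3-ubyte.gz").read())[:10000]</monospace> is one way to obtain a 10,000-image subset of the training set.</p>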
      <p>
        We used 10,000 images from the MNIST training set and computed
histograms of oriented gradients (HOG) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for each sample. HOG features
describe the distribution of gradient directions (i.e. changes in colour or intensity in an
image) and encode images as vectors with significantly fewer dimensions. This
dimensionality reduction, together with their previous use in the context of text
recognition (e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), makes HOG features attractive for our application.
      </p>
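      <p>To make the HOG step concrete, the following is a simplified descriptor in the spirit of [4], written with NumPy only. The cell size (7×7 pixels), 9 unsigned orientation bins and 2×2-cell block normalization are illustrative assumptions, since the settings used here are not stated; it also omits the bin interpolation of the original method. A complete library implementation is available as <monospace>skimage.feature.hog</monospace> in scikit-image.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def hog_features(img, cell=7, bins=9, block=2, eps=1e-6):
    """Simplified HOG descriptor (illustrative parameters).

    Unsigned gradient orientations, magnitude-weighted histograms per
    cell-by-cell patch, and L2 normalisation over block-by-block cells.
    No bin interpolation or Gaussian window weighting.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in degrees
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    # One orientation histogram per cell, weighted by gradient magnitude.
    n_r, n_c = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_r, n_c, bins))
    for i in range(n_r):
        for j in range(n_c):
            rs = slice(i * cell, (i + 1) * cell)
            cs = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(idx[rs, cs].ravel(),
                                     weights=mag[rs, cs].ravel(),
                                     minlength=bins)

    # Overlapping blocks of cells, each L2-normalised, concatenated.
    feats = []
    for i in range(n_r - block + 1):
        for j in range(n_c - block + 1):
            v = hist[i:i + block, j:j + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)
```
      </preformat>
      <p>With these settings a 28×28 digit is described by 3×3 blocks of 2×2 cells with 9 bins each, i.e. 324 values instead of 784 pixel intensities.</p>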
      <p>
        Following the conversion of images to HOG features, t-distributed
stochastic neighbor embedding (t-SNE) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is applied to the data. This machine learning
technique reduces the number of dimensions further (to 2D or 3D), while
maintaining a representation of the similarities among datapoints (i.e. similar points are
placed close to each other and dissimilar points further apart). Visualizing the
transformed datapoints in 2D (or 3D) yields a clustering, an
example of which is shown in Fig. 1 for the aforementioned MNIST samples.
Ten fairly distinct island-like clusters are formed, where each point represents
a digit from 0–9. Typically, some instances end up in the wrong cluster, since
they are written in such a way that t-SNE cannot distinguish their feature
representations from those of another digit. Some people write ‘1’ the way others would write ‘7’, etc.
In other words, some digits are written quite sloppily, and even a human cannot
always tell which digit they actually represent. The same can be expected for
some letter graphemes, such as ‘c’ and ‘e’.
      </p>
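      <p>For reference, the quantities that t-SNE optimizes [6] are summarized below, writing x_i for the HOG vector of the i-th image, y_i for its position in the 2D map and N for the number of images. Each bandwidth σ_i is found by binary search so that the perplexity 2^H(P_i) of the conditional distribution P_i equals a user-chosen value. The heavy-tailed Student-t kernel in q_ij allows moderately dissimilar points to be placed far apart in the map, which is what produces well-separated, island-like groups.</p>
      <preformat preformat-type="code">
```latex
p_{j \mid i} = \frac{\exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert \mathbf{y}_k - \mathbf{y}_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\frac{\partial C}{\partial \mathbf{y}_i}
  = 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(\mathbf{y}_i - \mathbf{y}_j\right)
    \left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1}
```
      </preformat>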
      <p>An in-depth account of the technical details of the proposed approach is beyond
the scope of this paper, as the method is still at an experimental stage and
under development. To give an overview, we process the points resulting from
t-SNE in order to extract clusters at a certain level of detail, or
scale, defined by the user. This makes it possible to look at different parts of the
emerging “islands”, and thereby also at different clusterings of the digits/graphemes,
depending on the desired scale. The result of varying the level of detail is shown
in Fig. 2. Each “island” is coloured from blue to
yellow according to the density of the points in the cluster, and blue river-like
curves mark the borders between the obtained clusters. Heat-maps are
also created by aggregating all digits in each cluster and displaying their average
with a colour scale that indicates the so-called “heat”, going from blue
(low) via green and orange to yellow (high), just as for the “islands”. The more
yellow a heat-map is, the more similar the digits in the cluster in question are.</p>
      <p>Fig. 2: Visualization of the MNIST dataset. (a) The “islands” clustered on a rather fine
scale. (b) Heat-maps of the images in each cluster; note how the same digit
appears in different shapes. (c) The same data visualized using a coarser
scale, generating fewer clusters. (d) The resulting 11 clusters. In (d), the 11 clusters
capture the 10 digits, with ‘1’ appearing twice due to its elongated cluster,
which in turn results from the two main ways of writing the digit. The number
above each image in (b) and (d) gives the number of digits overlaid in the
heat-map. Figure best viewed in color.</p>
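      <p>Since the cluster extraction is not detailed here, the following is only a hypothetical illustration of what scale-dependent extraction from a 2D t-SNE map can look like, using a swapped-in technique rather than the method behind Fig. 2. It bins the embedded points on a grid, smooths the counts with a Gaussian whose width <italic>sigma</italic> (in grid cells) plays the role of the user-set scale, and assigns every point to the density peak reached by hill-climbing. The basins of attraction then act as “islands”, and their boundaries correspond loosely to the river-like borders. The names <monospace>density_clusters</monospace>, <monospace>sigma</monospace> and <monospace>grid</monospace> are our own.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def _gauss_kernel(sigma, max_radius):
    # Truncated, normalised 1-D Gaussian; the radius is capped so that
    # np.convolve's "same" mode keeps the grid size unchanged.
    r = min(int(np.ceil(3 * sigma)), max_radius)
    t = np.arange(-r, r + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def density_clusters(points, sigma, grid=64):
    """Hypothetical scale-dependent clustering of a 2-D embedding.

    `sigma` is the smoothing scale in grid cells: larger values merge nearby
    density peaks and give fewer, larger islands. Returns one label per point.
    """
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    span = np.maximum(pts.max(axis=0) - lo, 1e-12)
    # Grid cell of every point, and a 2-D histogram of point counts.
    idx = np.minimum((pts - lo) / span * grid, grid - 1).astype(int)
    hist = np.zeros((grid, grid))
    np.add.at(hist, (idx[:, 0], idx[:, 1]), 1.0)

    # Density estimate: separable Gaussian smoothing of the histogram.
    k = _gauss_kernel(sigma, (grid - 1) // 2)
    dens = np.apply_along_axis(np.convolve, 0, hist, k, mode="same")
    dens = np.apply_along_axis(np.convolve, 1, dens, k, mode="same")

    # Each cell points to its highest 8-neighbour, or to itself if it is a peak.
    padded = np.pad(dens, 1, constant_values=-np.inf)
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]
    stack = np.stack([padded[1 + di:1 + di + grid, 1 + dj:1 + dj + grid]
                      for di, dj in offsets])
    best = stack.argmax(axis=0)
    di = np.array([o[0] for o in offsets])[best]
    dj = np.array([o[1] for o in offsets])[best]
    ii, jj = np.indices((grid, grid))
    parent = ((ii + di) * grid + (jj + dj)).ravel()

    # Pointer jumping: follow the uphill pointers until every cell
    # points directly at the peak of its basin.
    while True:
        nxt = parent[parent]
        if np.array_equal(nxt, parent):
            break
        parent = nxt

    # Points whose cells share a density peak form one cluster.
    peaks = parent[idx[:, 0] * grid + idx[:, 1]]
    _, labels = np.unique(peaks, return_inverse=True)
    return labels
```
      </preformat>
      <p>Increasing <italic>sigma</italic> merges neighbouring peaks, giving fewer and larger clusters, which mirrors the change from a fine to a coarse scale between Fig. 2(a) and Fig. 2(c).</p>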
      <p>By selecting one of the “islands”, the user can examine it further. In Fig. 3,
all digits annotated as ‘1’ have been selected. The shape of the island changes, since only
a subset of all feature vectors is used. The main clusters are shown
together with two cells of the heat-map: one that contains quite different-looking
variants of the same digit, and one that depicts quite similar-looking digits.</p>
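      <p>The heat-maps and the comparison of heat-map cells can be approximated as follows, assuming grey-level images of equal size, each with a cluster label (for instance obtained from a clustering of the t-SNE map). <monospace>cluster_heatmaps</monospace> averages the images of each cluster and records how many were overlaid, i.e. the number printed above each heat-map. <monospace>cluster_spread</monospace> is a hypothetical homogeneity score, the mean per-pixel standard deviation, which would be expected to be lower for a cell of similar-looking digits such as cell #12 than for a cell such as #7.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def cluster_heatmaps(images, labels):
    """Map each cluster label to (mean image, number of images averaged)."""
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for c in np.unique(labels):
        members = images[labels == c]
        out[int(c)] = (members.mean(axis=0), len(members))
    return out

def cluster_spread(images, labels):
    """Hypothetical homogeneity score: mean per-pixel std within each cluster.

    0.0 means all images in the cluster are identical; larger values mean
    the digits in that heat-map cell look more different from each other.
    """
    images = np.asarray(images, dtype=float)
    labels = np.asarray(labels)
    return {int(c): float(images[labels == c].std(axis=0).mean())
            for c in np.unique(labels)}
```
      </preformat>
      <p>Rendering each mean image with a blue-to-yellow colour map gives the heat-maps described above: a pixel is yellow when most digits in the cell have ink there.</p>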
    </sec>
    <sec id="sec-3">
      <title>Conclusion and future work</title>
      <p>This paper presented a visualization framework as a proof of concept, which needs
to be investigated further in future work. The aim is to build an interactive
visualization tool capable of handling user interaction effectively.</p>
      <p>We intend to apply the proposed approach to medieval Swedish
handwritten documents, with the aim of creating an atlas of the script in this
geographic area from the appearance of written records
(i.e. the 12th century) until the end of the Middle Ages (i.e. approximately the
1520s). Compiling such a catalogue is a very comprehensive yet crucial task, which will be
investigated in depth in future work. The details of the classification and of the
structuring of the atlas, however, remain to be worked out.</p>
      <p>Fig. 3: (a) The shape of the “island” becomes somewhat different when it is not affected by the
other digits/clusters. (b) Heat-map of ‘1’, showing the main shapes of that digit.
(c) All digits in cell #7 (marked in red in (b)), showing that they are
quite different. (d) All digits in cell #12 (marked in green in
(b)), showing that they are indeed quite similar.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>MNIST handwritten digit database</article-title>
          . AT&amp;T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist 2 (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vats</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hast</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Automatic document image binarization using Bayesian optimization</article-title>
          .
          <source>In: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing</source>
          , ACM (
          <year>2017</year>
          )
          <fpage>89</fpage>
          –
          <lpage>94</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Vats</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hast</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>On-the-fly historical handwritten text annotation</article-title>
          .
          <source>In: Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on</source>
          . Volume
          <volume>8</volume>
          , IEEE (
          <year>2017</year>
          )
          <fpage>10</fpage>
          –
          <lpage>14</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dalal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Triggs</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Histograms of oriented gradients for human detection</article-title>
          .
          <source>In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05)</source>
          . Volume
          <volume>1</volume>
          . (June
          <year>2005</year>
          )
          <fpage>886</fpage>
          –
          <lpage>893</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Almazán</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fornés</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lladós</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valveny</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A coarse-to-fine approach for handwritten word spotting in large scale historical documents collection</article-title>
          .
          <source>In: 2012 International Conference on Frontiers in Handwriting Recognition</source>
          . (Sept
          <year>2012</year>
          )
          <fpage>455</fpage>
          –
          <lpage>460</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Maaten</surname>
            ,
            <given-names>L.v.d.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Visualizing data using t-SNE</article-title>
          .
          <source>Journal of Machine Learning Research 9(Nov)</source>
          (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          –
          <lpage>2605</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>