<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recommendation Systems in Mathematical Character Recognition</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, The University of Western Ontario, London, Ontario</institution>
          ,
          <country country="CA">Canada</country>
          <addr-line>N6A 5B7</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In handwritten text there are usually several accepted styles for forming each character. We hypothesize that in the handwriting of individuals there is a correlation among the styles used for characters, and that these correlations may be used to anticipate which styles particular writers will use for symbols that have not yet been seen. This approach may prove useful in the setting of mathematical handwriting recognition, where there are many symbols and it would be onerous to require writers to provide samples of every one in order to personalize handwriting recognition. We describe preliminary experiments using ideas from the area of recommendation systems to predict which styles writers will likely use for symbols they have not yet written. The experiments demonstrate that writers tend to use only a small fraction of the possible combinations of character writing styles, and there are correlations among the styles used for symbols.</p>
      </abstract>
      <kwd-group>
        <kwd>Mathematical handwriting recognition</kwd>
        <kwd>Recommendation systems</kwd>
        <kwd>Character classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Writing style has long been taken to be a personal characteristic of an
individual. Certain specific forms, such as signatures, have been used as a primary form
of authentication for centuries. Conversely, writing style has also been used to
narrow down or even determine document authorship when the writer is not known.
We also observe that the general shape of handwritten characters may look
similar among groups of individuals, especially those with a similar background,
e.g. locale or period of education. We are interested in online recognition of
handwritten mathematics and are currently working on improving recognition
of individual characters. Earlier, we developed a cloud-based handwriting
recognition framework that allows a user to share training data among devices [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As
a side benefit to the developers, it facilitates access to an extensive amount of
training data that can be indexed by different characteristics of the writer. Each
new user is assigned a default training dataset. The dataset contains samples
that represent different character styles (defined in Section 2) of the same
symbol, some of which are likely to be similar to the handwriting of the new user.
However, the samples that represent character styles different from those of the
new user make the training dataset noisy and may cause misclassification.
      </p>
      <p>
        In our approach to classification, a character is represented by the coefficients
of an approximation of trace curves with orthogonal polynomials [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Recognition
is based on computing the distance to convex hulls of nearest neighbours
in the space of coefficients of approximation of symbol strokes. Typically, the
method does not require many training samples to discriminate a class. However,
because there is a large number of classes in handwritten mathematics, the
training dataset may contain tens of thousands of characters. Therefore, any
form of automated or semi-automated training can be a valuable asset in this
environment.
      </p>
      <p>
        We are motivated by the wide and successful usage of recommendation
systems on the Internet that are designed to recommend products to consumers,
based on their purchasing history and the history of individuals with similar
behaviour [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this work, we investigate similarity of character styles with
respect to the writers who provided them and similarity of writers with respect
to their styles. We also develop a method for semi-automated training of the
recognizer by proposing character styles that are likely to be applicable to the
handwriting of the new user, based on the styles that the user has already
provided and the styles of writers with similar handwriting. This theory is based
on the assumption that if a group of users writes some characters in the same
style, it is likely that they will write certain other characters in the same style
as well. An example is shown in Figure 1. This assumption is supported by an
experiment we sketch in this paper.
      </p>
      <p>The remainder of the article is organized as follows. In Section 2 we define
some basic concepts and explain the organization of the test dataset. Section 3
describes the types of handwriting similarity in which we are interested and how
we might use them to predict character styles. Section 4 presents the
experimental evaluation. Section 5 gives an example of how this information can be used.
Section 6 gives some conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Definitions and Organization of Data</title>
      <p>In discussing similarity of handwriting we need to distinguish between various
notions, such as the similarity of individual symbols versus entire writing
repertoires. We therefore introduce a few definitions to ensure clarity.</p>
      <p>A character, symbol, or class represents a single- or multi-stroke handwritten
letter that may include an accent, e.g. "a", "1", etc.</p>
      <p>A style or character style refers to the way in which one character is written. For
our purposes, this is given by the class together with the direction and order of the strokes
in which the sample has been written. Theoretically, the number of possible styles
for a single character class of <inline-formula><tex-math>k</tex-math></inline-formula> strokes is <inline-formula><tex-math>2^k k!</tex-math></inline-formula>
(<inline-formula><tex-math>k!</tex-math></inline-formula> stroke orders times <inline-formula><tex-math>2^k</tex-math></inline-formula> choices of stroke direction), while in practice this number is
not more than 3, even for samples with a relatively large number of strokes.</p>
      <p>
        A writing style is a collection of character styles for a set of characters. It may be
viewed as a set of (character, character style) pairs. We may refer to an author's
writing style to mean all the character styles observed from that author. This
definition is similar to the concept of handwriting style investigated in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>A sample is a handwritten sample of one character provided by a user (test
sample) or available in the dataset (training sample).</p>
      <p>[Figure 2: (a) the characters C1, C2, ..., Ck with their character styles; (b) the (character, character style) pairs provided by users.]</p>
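      <p>As a quick check of the theoretical style count, the following minimal sketch enumerates every combination of stroke order and stroke direction for a character of k strokes and compares the total with the closed form. The function names and the encoding of a direction as 0 or 1 are assumptions made for illustration only; they are not part of the recognizer.</p>
      <preformat>
```python
from itertools import permutations, product
from math import factorial


def enumerate_styles(k):
    """List every (stroke order, stroke directions) pair for k strokes."""
    styles = []
    for order in permutations(range(k)):               # k! stroke orders
        for directions in product((0, 1), repeat=k):   # 2^k direction choices
            styles.append((order, directions))
    return styles


def style_count(k):
    """Closed form for the theoretical number of styles: 2^k * k!."""
    return 2 ** k * factorial(k)


for k in (1, 2, 3):
    print(k, len(enumerate_styles(k)), style_count(k))
```
      </preformat>
      <p>The enumeration grows quickly with k, which makes the observation that at most 3 styles occur in practice all the more useful for training.</p>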
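      <p>The dataset for our experiments has the following structure. There is an
alphabet of characters <inline-formula><tex-math>C</tex-math></inline-formula>, with each character <inline-formula><tex-math>C_i \in C</tex-math></inline-formula> having a set <inline-formula><tex-math>S_i</tex-math></inline-formula> of
corresponding character styles, as shown in Figure 2(a). There is also a set of users <inline-formula><tex-math>U</tex-math></inline-formula>.
For each user <inline-formula><tex-math>U^j \in U</tex-math></inline-formula> there is a set of characters <inline-formula><tex-math>C^j \subseteq C</tex-math></inline-formula> of interest to that user.
For each character <inline-formula><tex-math>C_k^j \in C^j</tex-math></inline-formula> there is a style <inline-formula><tex-math>S_k^j \in S_k</tex-math></inline-formula> from the set of styles with
which the user writes this symbol. Each character style represents a collection
of samples, i.e. the actual handwritten symbols from the user input, as shown in Figure 2(b).</p>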
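      <p>The structure described above can be sketched as a small in-memory container. This is only an illustration under stated assumptions: the class name StyleDataset, its attribute names, and the choice of one style per user and character are hypothetical and do not describe the storage actually used by the framework.</p>
      <preformat>
```python
from collections import defaultdict


class StyleDataset:
    """Hypothetical container mirroring the alphabet, user, and sample structure."""

    def __init__(self):
        # character class -> set of style identifiers (alphabet with styles)
        self.styles_of_class = defaultdict(set)
        # user -> {character class -> style identifier used by that user}
        self.style_of_user = defaultdict(dict)
        # (user, character class, style) -> list of raw samples
        self.samples = defaultdict(list)

    def add_sample(self, user, char_class, style, sample):
        """Record one handwritten sample and the style it belongs to."""
        self.styles_of_class[char_class].add(style)
        self.style_of_user[user][char_class] = style
        self.samples[(user, char_class, style)].append(sample)

    def writing_style(self, user):
        """Writing style of a user as a set of (character, character style) pairs."""
        return set(self.style_of_user.get(user, {}).items())


dataset = StyleDataset()
dataset.add_sample("u1", "a", "a-style-1", [(0, 0), (1, 1)])
dataset.add_sample("u2", "a", "a-style-2", [(1, 1), (0, 0)])
print(dataset.writing_style("u1"))
```
      </preformat>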
    </sec>
    <sec id="sec-3">
      <title>User-Style Similarity and Character Style Prediction</title>
      <p>
        Collaborative filtering recommendation algorithms are typically divided into two
categories, as described in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: item-based and user-based
algorithms. Similarly, we investigate character style similarity and writer similarity in our
dataset. Further, we propose a method for predicting character styles that are
likely to be applicable to the writer.
      </p>
      <p>Style-Based Similarity. We propose the following measure to estimate the
similarity of character styles. Consider two styles <inline-formula><tex-math>S_i</tex-math></inline-formula> and <inline-formula><tex-math>S_j</tex-math></inline-formula>, <inline-formula><tex-math>i \neq j</tex-math></inline-formula>, that
belong to classes <inline-formula><tex-math>C_i</tex-math></inline-formula> and <inline-formula><tex-math>C_j</tex-math></inline-formula> respectively. Then the style-based similarity between
<inline-formula><tex-math>S_i</tex-math></inline-formula> and <inline-formula><tex-math>S_j</tex-math></inline-formula> is computed as the ratio of the number of authors who have written
the classes <inline-formula><tex-math>C_i</tex-math></inline-formula> and <inline-formula><tex-math>C_j</tex-math></inline-formula> in styles <inline-formula><tex-math>S_i</tex-math></inline-formula> and <inline-formula><tex-math>S_j</tex-math></inline-formula> respectively to the total number of writers
who provided samples for both classes <inline-formula><tex-math>C_i</tex-math></inline-formula> and <inline-formula><tex-math>C_j</tex-math></inline-formula>. This may be computed as shown
in Algorithm 1.</p>
      <preformat>Algorithm 1 StyleSimilarity()
Input: S_i, S_j: character styles of which to compute similarity
Output: the similarity measure

A_i  ← list of authors who wrote character C_i in style S_i
A_j  ← list of authors who wrote character C_j in style S_j
A_i' ← list of authors who wrote character C_i in any style
A_j' ← list of authors who wrote character C_j in any style
c ← 0
t ← 0
for all a in A_i do
    if a ∈ A_j then
        c ← c + 1, t ← t + 1, A_j ← A_j \ {a}
    else
        if a ∈ A_j' then t ← t + 1 end if
    end if
    A_i' ← A_i' \ {a}
end for
for all a in A_j do
    if a ∈ A_i' then t ← t + 1, A_i' ← A_i' \ {a} end if
end for
for all a in A_i' do
    if a ∈ A_j' then t ← t + 1 end if
end for
if t = 0 then
    return null {The dataset does not contain authors to compute the similarity between the given character styles.}
else
    return c/t
end if</preformat>
      <p>User-Based Similarity. In analogy with the style similarity, the user similarity
measures the ratio of the number of classes written in the same character style
to the total number of common classes provided by two authors.</p>
      <p>It helps to determine whether for a given user there are other individuals
who have similar writing styles and to suggest the character styles available
from those individuals to the given user.</p>
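      <p>Both similarity measures can be sketched compactly if each writer is represented as a mapping from character class to the style that writer uses. The sketch below uses set-style counting rather than the list bookkeeping of Algorithm 1, so it is an equivalent formulation of the stated ratios and not a transcription of the algorithm. The function names, the dictionary layout, and the toy data are assumptions for illustration.</p>
      <preformat>
```python
def style_similarity(writers, class_i, style_i, class_j, style_j):
    """Share of writers of both classes who use style_i and style_j together."""
    both = [w for w in writers.values() if class_i in w and class_j in w]
    if not both:
        return None  # no author provided samples for both classes
    matching = sum(
        1 for w in both if w[class_i] == style_i and w[class_j] == style_j
    )
    return matching / len(both)


def user_similarity(writers, a, b):
    """Share of the classes common to writers a and b written in the same style."""
    common = set(writers[a]).intersection(writers[b])
    if not common:
        return None  # the two writers share no classes
    same = sum(1 for c in common if writers[a][c] == writers[b][c])
    return same / len(common)


# Toy data: writer -> {character class -> style identifier}
writers = {
    "w1": {"a": "a1", "x": "x1", "7": "7a"},
    "w2": {"a": "a1", "x": "x1"},
    "w3": {"a": "a2", "x": "x1", "7": "7a"},
    "w4": {"a": "a1"},
}
print(style_similarity(writers, "a", "a1", "x", "x1"))
print(user_similarity(writers, "w1", "w3"))
```
      </preformat>
      <p>In the toy data, three writers provide both "a" and "x" and two of them use the pair (a1, x1), so the style similarity is 2/3; writers w1 and w3 agree on two of their three common classes.</p>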
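      <p>Prediction of Character Style. Let <inline-formula><tex-math>P(S_0 \mid S_1, S_2, \ldots, S_n)</tex-math></inline-formula> be the conditional
probability that the character <inline-formula><tex-math>C_0</tex-math></inline-formula> is written in style <inline-formula><tex-math>S_0</tex-math></inline-formula>, given that the user has provided
character styles <inline-formula><tex-math>S_1, S_2, \ldots, S_n</tex-math></inline-formula>. Then for a given symbol, the character style that
is suggested to the user at the training phase can be found as
        <disp-formula>
          <label>(1)</label>
          <tex-math>\arg\max_{S_0 \in S} P(S_0 \mid S_1, S_2, \ldots, S_n)</tex-math>
        </disp-formula>
where <inline-formula><tex-math>S</tex-math></inline-formula> is the set of character styles with which the subject symbol can be
written. It can be computed with the chain rule</p>
      <p>
        <disp-formula>
          <tex-math>P\left(\cap_{k=1}^{n} S_k\right) = \prod_{k=1}^{n} P\left(S_k \mid \cap_{l=1}^{k-1} S_l\right).</tex-math>
        </disp-formula>
      </p>
      <p>The probability that the user writes <inline-formula><tex-math>n</tex-math></inline-formula> given character styles is <inline-formula><tex-math>P\left(\cap_{k=1}^{n} S_k\right)</tex-math></inline-formula>.
It is computed as the ratio of the number of authors who write each of the classes
in the corresponding character style to the total number of authors who provided
samples for all of the corresponding characters.</p>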
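      <p>The count-based estimate can be sketched as follows. As a simplifying assumption that is not stated in the text, the conditional probability is estimated directly among the authors who use all of the known styles and who also provided the target class, which keeps the estimate between 0 and 1 when some authors have not written every character. The function names, the dictionary layout, and the toy data are hypothetical.</p>
      <preformat>
```python
def conditional_probability(writers, target_class, target_style, known):
    """Estimate P(S0 | S1..Sn) as a ratio of author counts.

    known maps each already observed character class to the style the user
    provided for it.
    """
    # Authors who wrote the target class and agree with every known style.
    support = [
        w for w in writers.values()
        if target_class in w and all(w.get(c) == s for c, s in known.items())
    ]
    if not support:
        return None  # the dataset gives no evidence for this history
    hits = sum(1 for w in support if w[target_class] == target_style)
    return hits / len(support)


def predict_style(writers, target_class, known):
    """Return the candidate style with the largest estimated probability."""
    candidates = {w[target_class] for w in writers.values() if target_class in w}
    scored = {}
    for style in candidates:
        p = conditional_probability(writers, target_class, style, known)
        if p is not None:
            scored[style] = p
    if not scored:
        return None
    return max(scored, key=scored.get)


# Toy data: writer -> {character class -> style identifier}
writers = {
    "w1": {"a": "a1", "x": "x1", "7": "7a"},
    "w2": {"a": "a1", "x": "x1", "7": "7a"},
    "w3": {"a": "a2", "x": "x2", "7": "7b"},
    "w4": {"a": "a1", "x": "x1", "7": "7b"},
}
history = {"a": "a1", "x": "x1"}
print(predict_style(writers, "7", history))
```
      </preformat>
      <p>With the history (a1, x1), three toy writers match and two of them write "7" in style 7a, so 7a is the suggested style.</p>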
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>
        In this section we present experimental results. The dataset used for testing
consisted of 50,703 individual handwritten characters in 242 classes, including
Latin and Greek letters as well as mathematical symbols, to take into account
different forms and styles, as described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Further, each sample is labeled
with its style and the author who provided the sample. There are 369 writers in
total.
      </p>
      <p>The results for style similarity are shown in Figure 3, which
gives the portion of pairs of character styles with similarity greater than or equal
to a given value. The similarity was computed for all pairs of
styles in the collection. The portion is the ratio of the number of
pairs of styles with similarity greater than or equal to the given value to the
total number of pairs of styles.</p>
      <p>Writer similarity is presented in Figure 4. It shows the portion of pairs of authors
with similarity greater than or equal to a given value. The similarity was
computed for all pairs of authors in the dataset. As
with the style similarity, the portion is the ratio of the
number of pairs of authors with similarity greater than or equal to the given
value to the total number of pairs of authors.</p>
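      <p>The portion curves described for both similarity measures can be computed from a list of pairwise similarities as sketched below. The function name, the decision to skip pairs for which no similarity could be computed, and the sample values are assumptions for illustration; they are not the measured data.</p>
      <preformat>
```python
def portion_at_least(similarities, thresholds):
    """For each threshold, the share of pairs with similarity >= threshold."""
    # Pairs without a defined similarity (None) are left out of the total.
    values = [s for s in similarities if s is not None]
    if not values:
        return []
    return [sum(1 for s in values if s >= t) / len(values) for t in thresholds]


# Hypothetical pairwise similarities, including one undefined pair.
pair_similarities = [0.0, 0.25, 0.5, 0.5, 1.0, None]
print(portion_at_least(pair_similarities, [0.0, 0.5, 1.0]))
```
      </preformat>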
      <p>To estimate the accuracy of character style prediction, the
experimental runs were organized as follows. For each author, we randomized the list of
character styles that the author provided. Then, for each style in the random
list, we computed the conditional probability that the corresponding character is
written in the given style. Figure 5 presents the average prediction accuracy among
all writers as a function of the number of character styles <inline-formula><tex-math>n</tex-math></inline-formula> available from the
author. From the results we conclude that once an author has provided more
than 10 styles, we can predict with high accuracy which character
styles the author will use for other symbols.</p>
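      <p>One possible reading of this protocol is sketched below. Several details are assumptions, since the text does not fix them: each author is held out from the reference data, a prediction counts as correct when the most frequent style among matching authors equals the author's true style, and steps for which no matching author exists are skipped. All names and the toy data are hypothetical.</p>
      <preformat>
```python
import random
from collections import defaultdict


def predict_style(writers, target_class, known):
    """Most frequent style of target_class among authors matching the history."""
    counts = defaultdict(int)
    for w in writers.values():
        if target_class in w and all(w.get(c) == s for c, s in known.items()):
            counts[w[target_class]] += 1
    if not counts:
        return None
    return max(counts, key=counts.get)


def accuracy_by_history(writers, seed=0):
    """Average prediction accuracy as a function of the number n of known styles."""
    rng = random.Random(seed)
    hits = defaultdict(int)
    trials = defaultdict(int)
    for author, styles in writers.items():
        # Hold the author out so the prediction relies on other writers only.
        others = {u: w for u, w in writers.items() if u != author}
        items = list(styles.items())
        rng.shuffle(items)  # randomized list of the author's character styles
        known = {}
        for n, (char_class, true_style) in enumerate(items):
            guess = predict_style(others, char_class, known)
            if guess is not None:
                trials[n] += 1
                hits[n] += int(guess == true_style)
            known[char_class] = true_style
    return {n: hits[n] / trials[n] for n in sorted(trials)}


# Toy data: two groups of writers with internally consistent styles.
group_1 = {"a": "a1", "x": "x1", "7": "7a"}
group_2 = {"a": "a2", "x": "x2", "7": "7b"}
writers = {"w%d" % i: dict(group_1) for i in range(1, 4)}
writers.update({"w%d" % i: dict(group_2) for i in range(4, 7)})
print(accuracy_by_history(writers))
```
      </preformat>
      <p>In the toy data the first prediction for every author is wrong, because without any history the held-out author's own group is in the minority, while a single known style is enough to identify the group afterwards. Conditioning on an exact match of all known styles can leave no matching authors in a real dataset, which is why such steps are skipped here.</p>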
    </sec>
    <sec id="sec-5">
      <title>Use Case: Training a Math Character Recognizer</title>
      <p>
        We now describe an application of the style recommendation algorithm.
Consider an application for training a recognizer, developed in our framework for
pen-based multi-user online collaboration in mathematical domains [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This
application, a screenshot of which is shown in Figure 6, is implemented as an extension
of the framework. The extension is designed to collect and organize the training
samples into character styles, symbols and catalogs, as explained in Section 2.
The training application can be improved by asking the user to
select among the styles suggested by the algorithm presented in this paper. Using
the idea of style recommendation, the application can suggest
styles and corresponding samples to a user, based on the history of styles that
the user has provided, and the UI can be adjusted accordingly. This can speed up the
training of a classifier, because new writers can simply accept the character styles
that represent their handwriting and use samples from those styles to train the
recognizer.
      </p>
      <p>In concrete terms, our mathematical handwriting database contains 242 classes,
and for best results 20 or 30 training samples per class are required. Although authors
may use general, writer-independent recognition, some will want specialized,
writer-specific training. With 242 classes, an author who wishes to have writer-specific
recognition would have to give on the order of 5000 to 7000 samples
(242 × 20 = 4840 to 242 × 30 = 7260), which is
more than most users would be willing to provide. Using the recommendation
approach described here, a user's style could be detected without having to do full
training.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We explained the structure of the training dataset used in our recognition
framework. We also briefly described the application for training the classifier. We
presented preliminary results on the applicability of ideas from recommendation
systems to the recognition of handwritten mathematical characters. In particular, we
performed experiments to estimate the similarity of character styles with
respect to the writers who provided them, as well as the similarity of writers
with respect to their writing styles. Further, we proposed a method for
semi-automated training of the classifier that can be used to enhance the described
training application. The empirical evaluation demonstrates that about 95%
accuracy in predicting character styles from the writing style of an author can
be achieved given 10 character styles from the user.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ansari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Essegaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Internet Recommendation Systems</article-title>
          .
          <source>Journal of Marketing Research</source>
          <volume>37</volume>
          (
          <issue>3</issue>
          ),
          <fpage>363</fpage>
          –
          <lpage>375</lpage>
          (Aug
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Crettez</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          :
          <article-title>A set of handwriting families: style recognition</article-title>
          .
          In:
          <source>Proceedings of the Third International Conference on Document Analysis and Recognition</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>489</fpage>
          –
          <lpage>494</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Golubitsky</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watt</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Distance-based classification of handwritten symbols</article-title>
          .
          <source>International J. Document Analysis and Recognition</source>
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <fpage>133</fpage>
          –
          <lpage>146</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazalov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watt</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>A streaming digital ink framework for multi-party collaboration</article-title>
          .
          In:
          <source>Proceedings of the 11th international conference on Intelligent Computer Mathematics</source>
          , pp.
          <fpage>81</fpage>
          –
          <lpage>95</lpage>
          . CICM'12, Springer-Verlag, Berlin, Heidelberg (
          <year>2012</year>
          ), http://dx.doi.org/10.1007/978-3-642-31374-5_6
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mazalov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watt</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Writing on clouds</article-title>
          .
          In:
          <source>Proceedings of the 11th international conference on Intelligent Computer Mathematics</source>
          , pp.
          <fpage>402</fpage>
          –
          <lpage>416</lpage>
          . CICM'12, Springer-Verlag, Berlin, Heidelberg (
          <year>2012</year>
          ), http://dx.doi.org/10.1007/978-3-642-31374-5_27
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Papagelis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plexousakis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents</article-title>
          .
          <source>Eng. Appl. Artif. Intell.</source>
          <volume>18</volume>
          (
          <issue>7</issue>
          ),
          <fpage>781</fpage>
          –
          <lpage>789</lpage>
          (Oct
          <year>2005</year>
          ), http://dx.doi.org/10.1016/j.engappai.2005.06.010
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>