<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Recommendation Systems in Mathematical Character Recognition</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Vadim</forename><surname>Mazalov</surname></persName>
							<email>vmazalov@uwo.ca</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">The University of Western Ontario</orgName>
								<address>
									<postCode>N6A 5B7</postCode>
									<settlement>London</settlement>
									<region>Ontario</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stephen</forename><forename type="middle">M</forename><surname>Watt</surname></persName>
							<email>stephen.watt@uwo.ca</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">The University of Western Ontario</orgName>
								<address>
									<postCode>N6A 5B7</postCode>
									<settlement>London</settlement>
									<region>Ontario</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Recommendation Systems in Mathematical Character Recognition</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">1E35C0DB23360AB00658CC1942D0D385</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Mathematical handwriting recognition</term>
					<term>Recommendation systems</term>
					<term>Character classification</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In handwritten text there are usually several accepted styles for forming each character. We hypothesize that in the handwriting of individuals there is a correlation among the styles used for characters, and that these correlations may be used to anticipate which styles particular writers will use for symbols that have not yet been seen. This approach may prove useful in the setting of mathematical handwriting recognition, where there are many symbols and it would be onerous to require writers to provide samples of every one in order to personalize handwriting recognition. We describe preliminary experiments using ideas from the area of recommendation systems to predict which styles writers will likely use for symbols they have not yet written. The experiments demonstrate that writers tend to use only a small fraction of the possible combinations of character writing styles, and there are correlations among the styles used for symbols.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Writing style has long been taken to be a personal characteristic of an individual. Certain specific forms, such as signatures, have been used as a primary form of authentication for centuries. Conversely, writing style has also been used to narrow down or even determine document authorship when the writer is not known. We also observe that the general shape of handwritten characters may look similar among groups of individuals, especially those with a similar background, e.g. locale or period of education. We are interested in online recognition of handwritten mathematics and are currently working on improving recognition of individual characters. Earlier, we developed a cloud-based handwriting recognition framework that allows a user to share training data among devices <ref type="bibr" target="#b4">[5]</ref>. As a side benefit to the developers, it facilitates access to an extensive amount of training data that can be indexed by different characteristics of the writer. Each new user is assigned a default training dataset. The dataset contains samples that represent different character styles (defined in Section 2) of the same symbol, some of which are likely to be similar to the handwriting of the new user. However, the samples that represent character styles different from those of the new user make the training dataset noisy and may cause misclassification.</p><p>In our approach to classification, a character is represented by the coefficients of an approximation of trace curves with orthogonal polynomials <ref type="bibr" target="#b2">[3]</ref>. Recognition is based on computation of the distance to convex hulls of nearest neighbours in the space of coefficients of approximation of symbol strokes. Typically, the method does not require many training samples to discriminate a class. 
However, because there is a large number of classes in handwritten mathematics, the training dataset may contain tens of thousands of characters. Therefore, any form of automated or semi-automated training can be a valuable asset in this environment.</p><p>We are motivated by the wide and successful use of Internet recommendation systems, which recommend products to consumers based on their purchasing history and the history of individuals with similar behaviour <ref type="bibr" target="#b0">[1]</ref>. In this work, we investigate similarity of character styles with respect to the writers who provided them and similarity of writers with respect to their styles. We also develop a method for semi-automated training of the recognizer by proposing character styles that are likely to be applicable to the handwriting of the new user, based on the styles that the user has already provided and the styles of writers with similar handwriting. This approach rests on the assumption that if a group of users writes some characters in the same style, it is likely that they will write certain other characters in the same style as well. An example is shown in Figure <ref type="figure">1</ref>. This assumption is supported by an experiment we sketch in this paper.</p><p>The remainder of the article is organized as follows. In Section 2 we define some basic concepts and explain the organization of the test dataset. Section 3 describes the types of handwriting similarity of interest to us and how they might be used to predict character styles. Section 4 presents the experimental evaluation. Section 5 gives an example of how this information can be used. Section 6 concludes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Definitions and Organization of Data</head><p>In discussing similarity of handwriting we need to distinguish between various notions, such as the similarity of individual symbols versus that of entire writing repertoires. We therefore introduce a few definitions to ensure clarity. A character, symbol, or class is a single- or multi-stroke handwritten letter that may include an accent, e.g. "a", "1", "Σ", etc. A style or character style refers to the way in which one character is written. For our purposes, this is given by the class together with the direction and order of the strokes in which the sample has been written. Theoretically, the number of possible styles for a single class of $k$ strokes is $2^k k!$ (each stroke can be drawn in either direction, and the strokes can be drawn in any order), while in practice this number is not more than 3, even for samples with a relatively large number of strokes. A writing style is a collection of character styles for a set of characters. It may be viewed as a set of (character, character style) pairs. We may refer to an author's writing style to mean all the character styles observed from that author. This definition is similar to the concept of handwriting style investigated in <ref type="bibr" target="#b1">[2]</ref>. A sample is a handwritten sample of one character provided by a user (test sample) or available in the dataset (training sample).</p><p>Fig. 1. An example of characters written in a similar style: (a) "9" and "a" are written clockwise, and (b) "a" and "9" are written counterclockwise.</p><formula xml:id="formula_1">(a) $C \to \{C_1 \to \{S_1^1, S_1^2, \ldots\},\ C_2 \to \{S_2^1, S_2^2, \ldots\},\ \ldots\}$ (b) $C^k \to \{(C_1^k, S_1^k),\ (C_2^k, S_2^k),\ \ldots\}$</formula><p>The dataset for our experiments has the following structure. There is an alphabet of characters $C$, with each character $C_i \in C$ having a set $S_i$ of corresponding character styles, as shown in Figure <ref type="figure" target="#fig_0">2</ref>(a). There is also a set of users $U$.
For each user $U_j \in U$ there is a set of characters $C^j \subset C$ of interest to that user. For each character $C^j_k \in C^j$ there is a style $S^j_k \in S_k$, taken from the set of styles for that character, with which the user writes this symbol. Each character style represents a collection of samples, the actual handwritten symbols from the user input, as shown in Figure <ref type="figure" target="#fig_0">2(b)</ref>.</p></div>
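<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the structure described in this section, and not the implementation used in the recognition framework, the alphabet with its per-character style sets and the user profiles could be modeled as follows. The names <code>Alphabet</code>, <code>UserProfile</code> and <code>max_styles</code>, the style labels, and the placeholder sample are all hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import factorial


def max_styles(k: int) -> int:
    """Theoretical number of styles for a character of k strokes: 2^k * k!."""
    return 2 ** k * factorial(k)


@dataclass
class Alphabet:
    # Maps each character C_i to its set S_i of known character styles.
    styles: dict[str, set[str]] = field(default_factory=dict)

    def add_style(self, character: str, style: str) -> None:
        self.styles.setdefault(character, set()).add(style)


@dataclass
class UserProfile:
    user_id: str
    # Maps each character of interest to the style this user writes it in.
    style_of: dict[str, str] = field(default_factory=dict)
    # Samples collected per (character, style) pair; ink data is left abstract.
    samples: dict[tuple[str, str], list[object]] = field(default_factory=dict)

    def add_sample(self, alphabet: Alphabet, character: str,
                   style: str, sample: object) -> None:
        # The style must come from the set of styles known for the character.
        if style not in alphabet.styles.get(character, set()):
            raise ValueError(f"unknown style {style!r} for {character!r}")
        self.style_of[character] = style
        self.samples.setdefault((character, style), []).append(sample)


alphabet = Alphabet()
for ch in ("9", "a"):
    alphabet.add_style(ch, "clockwise")
    alphabet.add_style(ch, "counterclockwise")

user = UserProfile("u1")
user.add_sample(alphabet, "9", "clockwise", sample="ink-trace-placeholder")
print(max_styles(2), user.style_of)
```
]]></code></p>
<p>The sketch stores one style per character and user, as in the definition above; a writer who uses several styles for one symbol would need a set-valued mapping instead.</p></div>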
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">User-Style Similarity and Character Style Prediction</head><p>Collaborative filtering recommendation algorithms are typically divided into two categories, item-based and user-based algorithms, as described in <ref type="bibr" target="#b5">[6]</ref>. By analogy, we investigate character style similarity and writer similarity in our dataset. Further, we propose a method for predicting the character styles that are likely to be applicable to a writer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Style-Based Similarity</head><p>We propose the following measure to estimate the similarity of character styles. Consider two styles $S_i$ and $S_j$, $i \neq j$, that belong to classes $C_i$ and $C_j$ respectively. The style-based similarity between $S_i$ and $S_j$ is the ratio of the number of authors who have written class $C_i$ in style $S_i$ and class $C_j$ in style $S_j$ to the total number of writers who provided samples for classes $C_i$ and $C_j$. This may be computed as shown in Algorithm 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 1 StyleSimilarity()</head><p>Input: S_i, S_j, the character styles whose similarity is to be computed.</p><p>Output: the similarity measure.</p><formula xml:id="formula_2">A_i ← list of authors who wrote character C_i in style S_i
A_j ← list of authors who wrote character C_j in style S_j
A_i^0 ← list of authors who wrote character C_i in any style
A_j^0 ← list of authors who wrote character C_j in any style
c ← 0
t ← 0
for all a in A_i do
    if a ∈ A_j then
        c ← c + 1, t ← t + 1, A_j ← A_j \ a
    else
        if a ∈ A_j^0 then
            t ← t + 1
        end if
    end if
    A_i^0 ← A_i^0 \ a
end for
for all a in A_j do
    if a ∈ A_j^0 then
        t ← t + 1, A_i^0 ← A_i^0 \ a
    end if
end for
for all a in A_i^0 do
    if a ∈ A_j^0 then
        t ← t + 1
    end if
end for
if t = 0 then
    return null {The dataset does not contain authors to compute the similarity between the given character styles.}
else
    return c/t
end if</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>User-Based Similarity</head><p>In analogy with the style similarity, the user similarity is the ratio of the number of classes that two authors write in the same character style to the total number of common classes provided by the two authors. It helps to determine whether, for a given user, there are other individuals with similar writing styles, and to suggest the character styles available from those individuals to the given user.</p></div>
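<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two similarity measures can be sketched compactly with dictionary-based profiles. This is an illustration under stated assumptions, not a transcription of Algorithm 1: the denominator of the style similarity is taken from the prose definition and read as the number of authors who provided both classes, and the toy data, the style labels and the function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
# Hypothetical toy data: profiles[author][character] = style label.
profiles = {
    "w1": {"9": "cw", "a": "cw", "x": "two-stroke"},
    "w2": {"9": "cw", "a": "cw"},
    "w3": {"9": "ccw", "a": "ccw", "x": "two-stroke"},
    "w4": {"9": "cw"},
}


def style_similarity(profiles, char_i, style_i, char_j, style_j):
    """Share of authors who write char_i in style_i and char_j in style_j,
    among authors who provided both characters (assumed denominator)."""
    both = [p for p in profiles.values() if char_i in p and char_j in p]
    if not both:
        return None  # no author in the dataset wrote both characters
    matching = sum(
        1 for p in both if p[char_i] == style_i and p[char_j] == style_j
    )
    return matching / len(both)


def user_similarity(profile_a, profile_b):
    """Share of common classes that two authors write in the same style."""
    common = set(profile_a).intersection(profile_b)
    if not common:
        return None  # the two authors share no classes
    same = sum(1 for c in common if profile_a[c] == profile_b[c])
    return same / len(common)


print(style_similarity(profiles, "9", "cw", "a", "cw"))
print(user_similarity(profiles["w1"], profiles["w3"]))
```
]]></code></p>
<p>Both functions return <code>None</code> when the dataset offers no basis for comparison, mirroring the null result of Algorithm 1.</p></div>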
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Prediction of Character Style</head><p>Let $P(S_0 \mid S_1, S_2, \ldots, S_n)$ be the conditional probability that the character $C_0$ is written in style $S_0$ given that the user has provided character styles $S_1, S_2, \ldots, S_n$. Then for a given symbol, the character style that is suggested to the user at the training phase can be found as</p><formula>$\arg\max_{S_0 \in S} P(S_0 \mid S_1, S_2, \ldots, S_n)$</formula><p>where $S$ is the set of character styles with which the subject symbol can be written. It can be computed with the chain rule</p><formula xml:id="formula_3">$P(\cap_{k=1}^{n} S_k) = \prod_{k=1}^{n} P(S_k \mid \cap_{j=1}^{k-1} S_j)$</formula><p>The probability that the user writes $n$ given character styles can be given as</p><formula xml:id="formula_4">$P(\cap_{k=1}^{n} S_k)$</formula><p>and computed as the ratio of the number of authors who write each of the classes in the corresponding character style to the total number of authors who provided samples for all of the corresponding characters.</p></div>
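<div xmlns="http://www.tei-c.org/ns/1.0"><p>A counting-based sketch of the prediction step is given below. It assumes that the conditional probability is estimated over a single population, namely the authors who provided the target character and who match every style already known for the user, so that the estimate stays between 0 and 1. That choice, the toy data and the function names are assumptions of this sketch rather than details taken from the experiments.</p>
<p><code lang="python"><![CDATA[
```python
# Hypothetical toy data: profiles[author][character] = style label.
profiles = {
    "w1": {"9": "cw", "a": "cw", "x": "two-stroke"},
    "w2": {"9": "cw", "a": "cw"},
    "w3": {"9": "ccw", "a": "ccw", "x": "two-stroke"},
    "w4": {"9": "cw"},
}


def conditional_probability(profiles, known, character, style):
    """Estimate P(style of character | known styles) by counting authors."""
    # Authors who provided the target character and match every known style.
    matching = [
        p for p in profiles.values()
        if character in p and all(p.get(c) == s for c, s in known.items())
    ]
    if not matching:
        return None  # the dataset holds no evidence for this combination
    return sum(1 for p in matching if p[character] == style) / len(matching)


def predict_style(profiles, known, character, candidates):
    """Return (style, probability) with the highest conditional probability."""
    scored = []
    for s in candidates:
        p = conditional_probability(profiles, known, character, s)
        if p is not None:
            scored.append((p, s))
    if not scored:
        return None, None
    p, s = max(scored)
    return s, p


print(predict_style(profiles, {"9": "cw"}, "a", ["cw", "ccw"]))
```
]]></code></p>
<p>When no author matches the known styles the sketch returns no suggestion; a practical system might instead fall back to a subset of the known styles.</p></div>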
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experimental Evaluation</head><p>In this section we present experimental results. The dataset used for testing consisted of 50,703 individual handwritten characters in 242 classes, including Latin and Greek letters as well as mathematical symbols, to take into account different forms and styles, as described in <ref type="bibr" target="#b2">[3]</ref>. Further, each sample is labeled with its style and the author who provided the sample. There are 369 writers in total.</p><p>The results for style similarity are shown in Figure <ref type="figure">3</ref>, which gives the portion of pairs of character styles with similarity greater than or equal to a given value. The similarity was computed between all pairs of styles in the collection. The portion is the ratio of the number of pairs of styles with similarity greater than or equal to the given value to the total number of pairs of styles.</p><p>Writer similarity is presented in Figure <ref type="figure">4</ref>. It shows the portion of pairs of authors with similarity greater than or equal to a given value. The similarity was computed between all pairs of authors in the dataset. As described for the style similarity, the portion is the ratio of the number of pairs of authors with similarity greater than or equal to the given value to the total number of pairs of authors.</p><p>To estimate the accuracy of character style prediction, the experimental runs were organized as follows. For each author, we randomized the list of character styles that the author provided. Then, for each style in the random list, we computed the conditional probability that the corresponding character is written in the given style. Figure <ref type="figure" target="#fig_2">5</ref> presents the average prediction accuracy among all writers depending on the number of character styles $n$ available from the author.
From the results we can conclude that once an author has provided more than 10 styles, we can predict with high accuracy which character styles the author will use for other symbols.</p></div>
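<div xmlns="http://www.tei-c.org/ns/1.0"><p>The curves reported for style and writer similarity are complementary cumulative portions, which can be computed as in the sketch below. The function name, the thresholds and the similarity values are illustrative assumptions and do not reproduce the experimental data; pairs without a defined similarity are skipped, which is also an assumption.</p>
<p><code lang="python"><![CDATA[
```python
def portion_at_least(similarities, thresholds):
    """For each threshold, the share of pairs with similarity >= threshold."""
    # Pairs for which no similarity could be computed are left out.
    values = [s for s in similarities if s is not None]
    if not values:
        return {}
    return {
        t: sum(1 for s in values if s >= t) / len(values)
        for t in thresholds
    }


# Hypothetical pairwise similarities, one entry per pair of styles or writers.
pair_similarities = [0.0, 0.25, 0.5, 0.5, 1.0, None]
curve = portion_at_least(pair_similarities, [0.0, 0.5, 1.0])
print(curve)
```
]]></code></p>
<p>Plotting the returned portions against the thresholds gives a curve of the kind described for Figures 3 and 4.</p></div>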
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Use Case: Training a Math Character Recognizer</head><p>We now describe an application of the style recommendation algorithm. Consider an application for training a recognizer, developed in our framework for pen-based multi-user online collaboration in mathematical domains <ref type="bibr" target="#b3">[4]</ref>. This application, a screenshot of which is shown in Figure <ref type="figure" target="#fig_3">6</ref>, is implemented as an extension of the framework. The extension is designed to collect and organize the training samples into character styles, symbols and catalogs, as explained in Section 2. The training application can be improved by asking the user to select among the styles suggested by the algorithm presented in this paper. Using the idea of style recommendation, the application can suggest styles and corresponding samples to a user, based on the history of styles that the user has provided, and the UI can be adjusted accordingly. This can speed up the training of a classifier, because new writers can simply accept the character styles that represent their handwriting and use samples from those styles to train the recognizer.</p><p>In concrete terms, our mathematical handwriting database contains 242 classes, and for best results 20 to 30 training samples per class are required. Although authors may use general, writer-independent recognition, some will want specialized, writer-specific training. With 242 classes, an author who wishes writer-specific recognition would have to provide on the order of 5000 to 7000 samples, which is more than most users would be willing to do. Using the recommendation approach described here, a user's style could be detected without full training.</p></div>
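<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sample counts quoted above follow from a simple multiplication, shown here as a short check. The variable names are hypothetical, and the figure of 20 to 30 samples is read as a per-class requirement.</p>
<p><code lang="python"><![CDATA[
```python
classes = 242                # classes in the handwriting database
samples_per_class = (20, 30)  # samples needed per class for best results

# Total samples a writer would provide for full writer-specific training.
low, high = (classes * n for n in samples_per_class)
print(low, high)
```
]]></code></p>
<p>The products, 4840 and 7260, match the stated order of 5000 to 7000 samples.</p></div>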
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>We explained the structure of the training dataset used in our recognition framework and briefly described the application for training the classifier. We presented preliminary results on the applicability of ideas from recommendation systems to the recognition of handwritten mathematical characters. In particular, we performed experiments to estimate the similarity of character styles with respect to the writers who provided them, as well as the similarity of writers with respect to their writing styles. Further, we proposed a method for semi-automated training of the classifier that can be used to enhance the described training application. The empirical evaluation demonstrates that about 95% accuracy in predicting character styles from the writing style of an author can be achieved given 10 character styles from the user.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. (a) The structure of the dataset, (b) The structure of the user profile.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Portion of pairs of character styles with similarity ≥ a given value</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. The character style prediction accuracy</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. The training application</figDesc><graphic coords="7,187.21,116.83,240.94,217.90" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Internet Recommendation Systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ansari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Essegaier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kohli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Marketing Research</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="363" to="375" />
			<date type="published" when="2000-08">Aug 2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A set of handwriting families: style recognition</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Crettez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Third International Conference on Document Analysis and Recognition</title>
				<meeting>the Third International Conference on Document Analysis and Recognition</meeting>
		<imprint>
			<date type="published" when="1995">1995</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="489" to="494" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Distance-based classification of handwritten symbols</title>
		<author>
			<persName><forename type="first">O</forename><surname>Golubitsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Watt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International J. Document Analysis and Recognition</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="133" to="146" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A streaming digital ink framework for multi-party collaboration</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Mazalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Watt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-31374-5_6</idno>
		<ptr target="http://dx.doi.org/10.1007/978-3-642-31374-5_6" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th international conference on Intelligent Computer Mathematics</title>
				<meeting>the 11th international conference on Intelligent Computer Mathematics<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="81" to="95" />
		</imprint>
	</monogr>
	<note>CICM&apos;12</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Writing on clouds</title>
		<author>
			<persName><forename type="first">V</forename><surname>Mazalov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Watt</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-31374-5_27</idno>
		<ptr target="http://dx.doi.org/10.1007/978-3-642-31374-5_27" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 11th international conference on Intelligent Computer Mathematics</title>
				<meeting>the 11th international conference on Intelligent Computer Mathematics<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="402" to="416" />
		</imprint>
	</monogr>
	<note>CICM&apos;12</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Qualitative analysis of user-based and item-based prediction algorithms for recommendation agents</title>
		<author>
			<persName><forename type="first">M</forename><surname>Papagelis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Plexousakis</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.engappai.2005.06.010</idno>
		<ptr target="http://dx.doi.org/10.1016/j.engappai.2005.06.010" />
	</analytic>
	<monogr>
		<title level="j">Eng. Appl. Artif. Intell</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="781" to="789" />
			<date type="published" when="2005-10">Oct 2005</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
