=Paper=
{{Paper
|id=None
|storemode=property
|title=Web application for recognition of mathematical formulas
|pdfUrl=https://ceur-ws.org/Vol-788/paper7.pdf
|volume=Vol-788
|dblpUrl=https://dblp.org/rec/conf/itat/StriaP11
}}
==Web application for recognition of mathematical formulas==
https://ceur-ws.org/Vol-788/paper7.pdf
Web application for recognition of mathematical formulas?
Jan Stria1 and Daniel Průša2
1
Charles University, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
stria.jan@gmail.com
2
Czech Technical University, Center for Machine Perception
Karlovo nám. 13, 121 35 Prague 2, Czech Republic
prusapa1@cmp.felk.cvut.cz
Abstract. We present a system for on-line mathematical singer and V. Hlavac in [15]. It is a general framework,
formulas recognition. The main principles of the method suitable for recognition of objects exhibiting a rich
we applied for recognition are explained. It is based on the structure. It has been also applied for musical
structural construction paradigm and utilizes a sort of two- scores [14] and electric circuits [5]. The power of the
dimensional grammars called coordinate grammars. Gram- structural construction is in driving symbols segmen-
mar productions are used to model spatial relationships
tation by the structural analysis, avoiding thus er-
among mathematical symbols. The system itself has been
rors that could be normally done during a standalone
developed as a web application in HTML5. We give details
on its client-server architecture and user interface. We also segmentation. Some of the known methods solve this
discuss advantages of the chosen approach. by error recovery techniques, but such techniques are
more complicated and less natural.
Our work is incremental. We started with the sim-
1 Introduction plistic subset of formulas and the proof-of-concept re-
cognition system implemented in Java [10, 11]. Later
Mathematical formulas recognition is a task of grow- we adopted the system to on-line inputs written on
ing importance. There are two domains in which it a tablet and we also extended the set of supported
can be applied. The first one covers scanned math- formulas [12].
ematical texts (books or notes) we want to convert This paper reports our recent results. We focus
into an electronic form automatically. We speak about on two topics. Principles of the developed recogni-
off-line formulas recognition. Another situation arises tion method are outlined in Section 2. We introduce
when we need to support entering formulas into ap- a revised formalism of the previously used two-dimen-
plications in a natural way, i.e. by drawing them by sional grammars – so called coordinate grammars. The
hand. This can be done on a tablet or simply using grammars model relationships among formula sym-
a mouse. In such a case, we do not recognize raster bols. The system itself has been completely rewritten
images but rather sets of strokes and we speak about in C# and includes several improvements. The new
on-line recognition. priority was also make the system accessible by a web
Several approaches to formulas recognition have application in which the user draws a formula and re-
been described in the literature [20, 8, 6]. A tax- ceives the output in TeX or MathML format. The web
onomy of the methods can be found in [2]. application architecture, implementation in HTML5
Only few published papers offer a publicly avail- and user interface design are described in Section 3.
able implementation for evaluation. We mention
three known applications giving good results. Infty
2 Method proposed for formulas
project [17, 16] targets primarily off-line recognition,
however, it includes a module for on-line recognition recognition
as well. xMathJournal [27] is a commercial software
The method consists of two phases which are com-
product which serves as a sophisticated calculator for
mon in pattern recognition – elementary symbols de-
handwritten mathematics. Microsoft Windows 7 pro-
tection and structural analysis. In our case, the first
vides a simple tool called Math Input Panel. It allows to
phase does not make any final decision what the ele-
draw a formula, recognize it, copy the result into the
mentary symbols are. Instead of it, only candidates to
clipboard and paste it to other programs (e.g. Micro-
such symbols are selected. It is completely up to the
soft Word).
structural analysis to decide then, which of the candi-
The approach we chose is motivated by the struc-
dates are really parts of the formula and what is their
tural construction paradigm presented by M.I. Schle-
meaning in the formula structure.
?
The authors were supported by the Grant Agency of the We have chosen grammar based structural analy-
Czech Republic under project P103/10/0783. sis, since it allows to express the syntax of the whole
48 Jan Stria, Daniel Průša
formula and fits thus well to the structural construc- 2
1
tion pattern. Non-grammar based approaches usually
recognize local configurations among symbols only and 3
transform them to graphs. We introduce a (2D) coor-
dinate grammar which can be viewed as an extension symbol variable V num. 6 frac. line minus square root
of the grammar described by R. Anderson in his sem- stroke(s) 1 3 2 2 1, 2
inal work [1]. The productions have context-free form
and are assigned by spatial constraints on the right- Fig. 1. Elementary symbol candidates: The input is a se-
quence of three strokes, each denoted by a number in the
hand side elements. The parsing is a bottom-up process
circle. The table lists subgroups of strokes recognized as
(not the top-down one like in [1]), since it allows bet- a symbol.
ter control the number of derived elements and reduce
them by pruning. Moreover, our formalism is stochas- knowledge of the formula structure. An example of
tic and penalty oriented – each derivation is assigned possible candidates is given in Figure 1.
by a real number, which determines its quality. In general, there are 2|S| subsets of S. It is obvious
The grammar is designed to recognize structures that not all these subsets have the potential to repre-
formed of terminal elements freely located in a plane. sent a symbol. For performance reasons, we propose
This is something different comparing to picture lan- several strategies that select and evaluate only some
guages [13] consisting of rectangular grids over finite of the subsets:
alphabets. We have to deal with the time complex-
1. A reasonable restriction is to limit the number
ity of parsing. In general, it can grow exponentially in
of strokes¡ that¢form one symbol: |S 0 | ≤ K. This
the number of elementary symbols. The well known
leaves O |S|K subsets to be evaluated by the
Cocke-Younger-Kasami algorithm ¡ [4,
¢ 19] recognizes OCRtool. K = 4 should fully comply with all the
context-free languages in time O n3 and can be gen-
symbols in VT .
eralized to a sort of 2D context-free grammars [15].
2. Construct a graph G where vertices correspond to
It both cases, the algorithm can rely on the sequen-
strokes and two vertices are connected iff the two
tial ordering of characters in strings or pictures. When
related strokes are ,,close enough” (various metrics
working with coordinate grammars, we are forced to
can be applied: the smallest distance between two
restrict the set of sentential forms allowed to be de-
points, the greatest distance, the average distance,
rived during the process to some sort of continuous
etc.). Consider only those S 0 ⊆ S, where |S 0 | ≤ K
areas. The task how to parse 2D structures in a plane
and all vertices in the graph that correspond to
effectively has been studied by others, especially by P.
strokes in S 0 are located in the same component
Viola and E. Miller [9, 7]. Some of the techniques we
of connectivity.
employ in our method coincide with their proposals. 3. Label each edge in G by the distance between the
related strokes. Find minimum spanning tree T .
2.1 Elementary symbols detection Search for subsets S 0 using T instead of G.
4. If we assume the user does not make any correc-
By a stroke s we mean a finite, non-empty sequence tions in the already written symbols, we can even
of points in a plane, i.e. s = ((x1 , y1 ), . . . , (xk , yk )), consider only the subsets of consecutive strokes
where xi , yi ∈ R, k ≥ 1. S 0 = {si , si+1 , . . . , si+k }, where k < K.
Let S = (s1 , . . . , sn ) be a sequence of strokes en-
0
tered to the system by the user. For i < j, we assume If the OCRtool finds a match of a subset S to some
si has been entered before sj . Moreover, we assume symbol, it provides a record consisting of the following
the following condition is fulfilled: items:
– recognized symbol identifier l ∈ VT
– one stroke is not a part of two or more different
– a penalty p expressing reliability of the recognition
symbols in the input formula
(p ∈ R+ , a lower value implies a bigger confidence)
The goal of the elementary symbols detection phase is – metrics M of the recognized symbol (a record stor-
to find each subset S 0 ⊆ S which, taken separately, is ing base line and mean line given relative within
recognized by an Optical Character Recognition tool the bounding box, see Figure 2)
(OCRtool) as an elementary symbol from a set of We denote tuple (S 0 , l, p, M ) as a terminal unit.
known symbols denoted VT (alphabet letters, num-
bers, mathematical operators, Greek letters, etc.).
2.2 Coordinate grammars
Note that there can be even more interpretations for
0
one subset S , the OCRtool can return several hypoth- Before we can describe how mathematical relation-
esis for it. The detection is performed without any ships are modeled by a coordinate grammar, we need
Formulas recognition 49
ascent
mean line
base line
descent
Fig. 2. Character metrics.
to define a structure to represent partial derivations Fig. 3. A spatial constraint evaluation example: Two con-
(an analogy to the sentential form). We denote this straining rectangles C1 and C2 are computed having the
structure a labeled group of strokes (over an input se- size and position given relative to symbol + which is the
quence of strokes S, a set of elementary symbols – ter- leading symbol in group S2 . The circled point in S1 , resp.
minals VT and a set of non-terminals VN ). It is a tuple S3 (so called a reference point) is required to be located
(S1 , l1 , p1 , T ) such that S1 ⊆ S, S1 6= ∅, l1 ∈ VN ∪ VT , in the C1 , resp. C2 . The constraint is fulfilled. Symbol +
p1 ∈ R+ and T = (S2 , l2 , p2 , M ) is a terminal unit, becomes the leading symbol of the newly derived group of
strokes.
where S2 ⊆ S1 and l2 ∈ VT . The interpretation is as
follows:
the mutual positions of leading symbols and bound-
– there is a sequence of derivations which result in ing boxes of labeled groups of strokes corresponding
assigning l1 to S1 to Si ’s. Figure 3 shows a constraint function evalua-
– penalty p expresses a confidence in the derivations tion example for the production that models a binary
– T determines so called leading symbol in S1 ; it is operation of the form:
a terminal recognized by the OCRtool as l2 , con-
sidered as the main (root) symbol among other BinOperation → Expression ¯ BinOperator ¯ Term.
symbols in S1 ; the leading symbol is determined The parsing algorithm starts with terminal units
by applied grammar productions produced by the terminals detection phase (they are
A coordinate grammar G is a tuple (VT , VN , I, P ), transformed to labeled groups of strokes first). Larger
where I ∈ VN is the initial non-terminal and P is groups of strokes are incrementally derived then. The
a set of productions of the form input is recognized iff (S, I, p, T ) is among the deriva-
tions. When there are more such groups, the resulting
N → A1 ¯ A2 ¯ . . . ¯ Ak , (1) one is the group with the lowest penalty. More details
on the parsing process can be found in [12].
where N ∈ VN and each Ai ∈ {VT ∪ VN }. Moreover,
each production is assigned by three functions: spatial
2.3 Notes on method implementation
constraint σ, penalty π and leading symbol selector µ.
To explain their meaning and define their parameters One of the weaknesses in the former Java implemen-
and functional values, let us consider k labeled groups tation was the quality of OCR results. We used own
of strokes S1 , . . . , Sk such that implementation based on the elastic matching tech-
nique [18, 3]. The OCRtool in the current version is
Si = (Si , li , pi , Ti ). a combination of two methods. We benefit partially
from the OCR provided by Microsoft .NET API which
Production (1) can be applied to derive
is quite robust. However, it does not support recog-
à k ! nition of all mathematical symbols, thus we are still
[
S= Si , N, p, T forced to maintain and use our OCR to detect certain
i=1 symbols. Nevertheless, we have made additional im-
provements to it and the overall OCRtool accuracy is
iff the following conditions are fulfilled: better now.
– Si ’s are pairwise disjunct The form of σ and π is designed based on a sta-
– ∀i ∈ {1, . . . , k} : li = Ai tistical model. Mutual placements of labeled groups of
– σ(S1 , . . . , Sk ) = true strokes are expressed by a normal distribution which
Pk of parameters are extracted from the training data for
– p = π(S1 , . . . , Sk ) + i=1 pi
– T = µ(T1 , . . . , Tk ) each production. This is another progress comparing
to Java version where we set up constraints manually.
Boolean function σ determines whether the produc- Grammar productions support constructs as sub-
tion can be applied at all, π penalizes the application scripts, superscripts, brackets, common unary and bi-
of the production and µ selects the resulting leading nary operators, fractions, integrals, sums, exponentia-
symbols. In the implementation, σ is strongly based on tion, square roots, functions and binomial coefficients.
50 Jan Stria, Daniel Průša
The related syntax is expressed by 200 productions web browsers found in Android and iOS operating sys-
stored in a text file. A sample from the file follows. tems. However despite of some optimizations editing
of formulas in these mobile web browsers in not al-
BinOperation->BinOp|Expression@L|Term@R ways as smooth as in desktop browsers. It is caused by
BinOp->[+] a poorer performance of mobile devices which is not
BinOp->[-] sufficient for a frequent rendering of formulas. But as
Fraction->[line]|Expression@T|Expression@B the performance of these mobile devices grows rapidly
Power->Factor|Expression@TR there is a well-founded reason to believe this will go
Root->[root]|Expression@TL|Expression@I better soon.
Our application for formulas recognition can be
Each line represents one production. On the left- started using immediately. This is especially impor-
hand side of the production, there is a source nonter- tant in that case when the potential user has only
minal symbol. On the right-hand side, there is a mix- several formulas to recognize and installing a new soft-
ture of nonterminal and terminal target symbols sep- ware would cost more time than writing the formula
arated by vertical lines. To distinguish the nonter- in TeX of MathML format by a hand. We have also
minal and terminal symbols, we enclose the termi- tried to make the user interface simple and friendly so
nals in brackets. The first of the target symbols is usage of our application should be intuitive enough.
a mandatory leading symbol. Potential following sym- The recognition engine itself runs at the server so it
bols are always denoted by special characters deter- can be easily updated. It is especially important in the
mining their spatial relation to the leading target sym- development phase when we can continuously provide
bol (e.g. @L stays for left which means that the appro- enhanced versions of the recognizer. We can also eas-
priate symbol is located on the left with respect to the ily collect all formulas supplied by users for a further
leading target symbol). analysis which helps us to improve the recognition.
The parameters defining σ, π and µ more precisely The disadvantage of choosing an implementation
are saved in a separate data file. as a web application is a permanent need of an Inter-
net connection to work with it. In addition this con-
3 Web application nection should have a low latency to ensure a true
real-time recognition during a formula insertion. On
We have developed two web applications, both utiliz- the other hand, high throughput of the connection is
ing the well known client-server architecture. The first not demanded as not much data is transferred between
application serves for sample mathematical formulas the client and the server.
collection. It allows to insert a mathematical formula
together with a name of the writer and store it at 3.1 jQueryInk widget
the server. Formulas collected by this application are
then used to train and test our recognizer. The second For both the data collection application and the recog-
application provides a functionality of the recognizer nition application the core part of the web user in-
itself. It continuously recognizes the formula during terface is a canvas that allows to insert mathemati-
its insertion and provides the recognition result to the cal formulas in the form of strokes. It supports writ-
user. ing new strokes, selecting and editing previously writ-
Both these applications comprise an user inter- ten groups of strokes, erasing strokes or clearing the
face running in a web browser at the client side that whole canvas. The component representing the canvas
communicates with web services at the server side. was developed in HTML5 and Javascript as a jQuery
The web interface was developed in HTML5 [22] and UI [24] widget. It is called jQueryInk [23] and it is pub-
Javascript. The web services were written in C# pro- licly available for download as an open-source project.
gramming language using WCF (Windows Communi- The jQueryInk widget internally uses a