Web application for recognition of mathematical formulas?

                                             Jan Stria1 and Daniel Průša2
                     1
                         Charles University, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
                                                  stria.jan@gmail.com
                               2
                                 Czech Technical University, Center for Machine Perception
                                   Karlovo nám. 13, 121 35 Prague 2, Czech Republic
                                               prusapa1@cmp.felk.cvut.cz

Abstract. We present a system for on-line mathematical          singer and V. Hlavac in [15]. It is a general framework,
formulas recognition. The main principles of the method         suitable for recognition of objects exhibiting a rich
we applied for recognition are explained. It is based on the    structure. It has been also applied for musical
structural construction paradigm and utilizes a sort of two-    scores [14] and electric circuits [5]. The power of the
dimensional grammars called coordinate grammars. Gram-          structural construction is in driving symbols segmen-
mar productions are used to model spatial relationships
                                                                tation by the structural analysis, avoiding thus er-
among mathematical symbols. The system itself has been
                                                                rors that could be normally done during a standalone
developed as a web application in HTML5. We give details
on its client-server architecture and user interface. We also   segmentation. Some of the known methods solve this
discuss advantages of the chosen approach.                      by error recovery techniques, but such techniques are
                                                                more complicated and less natural.
                                                                    Our work is incremental. We started with the sim-
1     Introduction                                              plistic subset of formulas and the proof-of-concept re-
                                                                cognition system implemented in Java [10, 11]. Later
Mathematical formulas recognition is a task of grow-            we adopted the system to on-line inputs written on
ing importance. There are two domains in which it               a tablet and we also extended the set of supported
can be applied. The first one covers scanned math-              formulas [12].
ematical texts (books or notes) we want to convert                  This paper reports our recent results. We focus
into an electronic form automatically. We speak about           on two topics. Principles of the developed recogni-
off-line formulas recognition. Another situation arises         tion method are outlined in Section 2. We introduce
when we need to support entering formulas into ap-              a revised formalism of the previously used two-dimen-
plications in a natural way, i.e. by drawing them by            sional grammars – so called coordinate grammars. The
hand. This can be done on a tablet or simply using              grammars model relationships among formula sym-
a mouse. In such a case, we do not recognize raster             bols. The system itself has been completely rewritten
images but rather sets of strokes and we speak about            in C# and includes several improvements. The new
on-line recognition.                                            priority was also make the system accessible by a web
    Several approaches to formulas recognition have             application in which the user draws a formula and re-
been described in the literature [20, 8, 6]. A tax-             ceives the output in TeX or MathML format. The web
onomy of the methods can be found in [2].                       application architecture, implementation in HTML5
Only few published papers offer a publicly avail-               and user interface design are described in Section 3.
able implementation for evaluation. We mention
three known applications giving good results. Infty
                                                                2   Method proposed for formulas
project [17, 16] targets primarily off-line recognition,
however, it includes a module for on-line recognition               recognition
as well. xMathJournal [27] is a commercial software
                                                                The method consists of two phases which are com-
product which serves as a sophisticated calculator for
                                                                mon in pattern recognition – elementary symbols de-
handwritten mathematics. Microsoft Windows 7 pro-
                                                                tection and structural analysis. In our case, the first
vides a simple tool called Math Input Panel. It allows to
                                                                phase does not make any final decision what the ele-
draw a formula, recognize it, copy the result into the
                                                                mentary symbols are. Instead of it, only candidates to
clipboard and paste it to other programs (e.g. Micro-
                                                                such symbols are selected. It is completely up to the
soft Word).
                                                                structural analysis to decide then, which of the candi-
    The approach we chose is motivated by the struc-
                                                                dates are really parts of the formula and what is their
tural construction paradigm presented by M.I. Schle-
                                                                meaning in the formula structure.
?
    The authors were supported by the Grant Agency of the           We have chosen grammar based structural analy-
    Czech Republic under project P103/10/0783.                  sis, since it allows to express the syntax of the whole
48      Jan Stria, Daniel Průša

formula and fits thus well to the structural construc-                                                  2
                                                                                 1
tion pattern. Non-grammar based approaches usually
recognize local configurations among symbols only and                                               3
transform them to graphs. We introduce a (2D) coor-
dinate grammar which can be viewed as an extension             symbol variable V num. 6 frac. line minus square root
of the grammar described by R. Anderson in his sem-            stroke(s)   1       3        2        2      1, 2
inal work [1]. The productions have context-free form
and are assigned by spatial constraints on the right-         Fig. 1. Elementary symbol candidates: The input is a se-
                                                              quence of three strokes, each denoted by a number in the
hand side elements. The parsing is a bottom-up process
                                                              circle. The table lists subgroups of strokes recognized as
(not the top-down one like in [1]), since it allows bet-      a symbol.
ter control the number of derived elements and reduce
them by pruning. Moreover, our formalism is stochas-          knowledge of the formula structure. An example of
tic and penalty oriented – each derivation is assigned        possible candidates is given in Figure 1.
by a real number, which determines its quality.                   In general, there are 2|S| subsets of S. It is obvious
    The grammar is designed to recognize structures           that not all these subsets have the potential to repre-
formed of terminal elements freely located in a plane.        sent a symbol. For performance reasons, we propose
This is something different comparing to picture lan-         several strategies that select and evaluate only some
guages [13] consisting of rectangular grids over finite       of the subsets:
alphabets. We have to deal with the time complex-
                                                               1. A reasonable restriction is to limit the number
ity of parsing. In general, it can grow exponentially in
                                                                  of strokes¡ that¢form one symbol: |S 0 | ≤ K. This
the number of elementary symbols. The well known
                                                                  leaves O |S|K subsets to be evaluated by the
Cocke-Younger-Kasami algorithm      ¡ [4,
                                       ¢ 19] recognizes           OCRtool. K = 4 should fully comply with all the
context-free languages in time O n3 and can be gen-
                                                                  symbols in VT .
eralized to a sort of 2D context-free grammars [15].
                                                               2. Construct a graph G where vertices correspond to
It both cases, the algorithm can rely on the sequen-
                                                                  strokes and two vertices are connected iff the two
tial ordering of characters in strings or pictures. When
                                                                  related strokes are ,,close enough” (various metrics
working with coordinate grammars, we are forced to
                                                                  can be applied: the smallest distance between two
restrict the set of sentential forms allowed to be de-
                                                                  points, the greatest distance, the average distance,
rived during the process to some sort of continuous
                                                                  etc.). Consider only those S 0 ⊆ S, where |S 0 | ≤ K
areas. The task how to parse 2D structures in a plane
                                                                  and all vertices in the graph that correspond to
effectively has been studied by others, especially by P.
                                                                  strokes in S 0 are located in the same component
Viola and E. Miller [9, 7]. Some of the techniques we
                                                                  of connectivity.
employ in our method coincide with their proposals.            3. Label each edge in G by the distance between the
                                                                  related strokes. Find minimum spanning tree T .
2.1   Elementary symbols detection                                Search for subsets S 0 using T instead of G.
                                                               4. If we assume the user does not make any correc-
By a stroke s we mean a finite, non-empty sequence                tions in the already written symbols, we can even
of points in a plane, i.e. s = ((x1 , y1 ), . . . , (xk , yk )),  consider only the subsets of consecutive strokes
where xi , yi ∈ R, k ≥ 1.                                         S 0 = {si , si+1 , . . . , si+k }, where k < K.
    Let S = (s1 , . . . , sn ) be a sequence of strokes en-
                                                                                                                 0
tered to the system by the user. For i < j, we assume If the OCRtool finds a match of a subset S to some
si has been entered before sj . Moreover, we assume symbol, it provides a record consisting of the following
the following condition is fulfilled:                            items:
                                                                   – recognized symbol identifier l ∈ VT
  – one stroke is not a part of two or more different
                                                                   – a penalty p expressing reliability of the recognition
     symbols in the input formula
                                                                     (p ∈ R+ , a lower value implies a bigger confidence)
The goal of the elementary symbols detection phase is              – metrics M of the recognized symbol (a record stor-
to find each subset S 0 ⊆ S which, taken separately, is              ing base line and mean line given relative within
recognized by an Optical Character Recognition tool                  the bounding box, see Figure 2)
(OCRtool) as an elementary symbol from a set of                      We denote tuple (S 0 , l, p, M ) as a terminal unit.
known symbols denoted VT (alphabet letters, num-
bers, mathematical operators, Greek letters, etc.).
                                                                 2.2 Coordinate grammars
Note that there can be even more interpretations for
              0
one subset S , the OCRtool can return several hypoth- Before we can describe how mathematical relation-
esis for it. The detection is performed without any ships are modeled by a coordinate grammar, we need
                                                                                             Formulas recognition        49

                 ascent

               mean line
               base line
                descent


                Fig. 2. Character metrics.


to define a structure to represent partial derivations          Fig. 3. A spatial constraint evaluation example: Two con-
(an analogy to the sentential form). We denote this             straining rectangles C1 and C2 are computed having the
structure a labeled group of strokes (over an input se-         size and position given relative to symbol + which is the
quence of strokes S, a set of elementary symbols – ter-         leading symbol in group S2 . The circled point in S1 , resp.
minals VT and a set of non-terminals VN ). It is a tuple        S3 (so called a reference point) is required to be located
(S1 , l1 , p1 , T ) such that S1 ⊆ S, S1 6= ∅, l1 ∈ VN ∪ VT ,   in the C1 , resp. C2 . The constraint is fulfilled. Symbol +
p1 ∈ R+ and T = (S2 , l2 , p2 , M ) is a terminal unit,         becomes the leading symbol of the newly derived group of
                                                                strokes.
where S2 ⊆ S1 and l2 ∈ VT . The interpretation is as
follows:
                                                                the mutual positions of leading symbols and bound-
 – there is a sequence of derivations which result in           ing boxes of labeled groups of strokes corresponding
   assigning l1 to S1                                           to Si ’s. Figure 3 shows a constraint function evalua-
 – penalty p expresses a confidence in the derivations          tion example for the production that models a binary
 – T determines so called leading symbol in S1 ; it is          operation of the form:
   a terminal recognized by the OCRtool as l2 , con-
   sidered as the main (root) symbol among other                BinOperation → Expression ¯ BinOperator ¯ Term.
   symbols in S1 ; the leading symbol is determined         The parsing algorithm starts with terminal units
   by applied grammar productions                       produced by the terminals detection phase (they are
A coordinate grammar G is a tuple (VT , VN , I, P ), transformed to labeled groups of strokes first). Larger
where I ∈ VN is the initial non-terminal and P is groups of strokes are incrementally derived then. The
a set of productions of the form                        input is recognized iff (S, I, p, T ) is among the deriva-
                                                        tions. When there are more such groups, the resulting
                   N → A1 ¯ A2 ¯ . . . ¯ Ak ,       (1) one is the group with the lowest penalty. More details
                                                        on the parsing process can be found in [12].
where N ∈ VN and each Ai ∈ {VT ∪ VN }. Moreover,
each production is assigned by three functions: spatial
                                                        2.3 Notes on method implementation
constraint σ, penalty π and leading symbol selector µ.
To explain their meaning and define their parameters One of the weaknesses in the former Java implemen-
and functional values, let us consider k labeled groups tation was the quality of OCR results. We used own
of strokes S1 , . . . , Sk such that                    implementation based on the elastic matching tech-
                                                        nique [18, 3]. The OCRtool in the current version is
                        Si = (Si , li , pi , Ti ).      a combination of two methods. We benefit partially
                                                        from the OCR provided by Microsoft .NET API which
Production (1) can be applied to derive
                                                        is quite robust. However, it does not support recog-
                           Ã k                    !     nition of all mathematical symbols, thus we are still
                             [
                     S=         Si , N, p, T            forced to maintain and use our OCR to detect certain
                            i=1                         symbols. Nevertheless, we have made additional im-
                                                        provements to it and the overall OCRtool accuracy is
iff the following conditions are fulfilled:             better now.
  – Si ’s are pairwise disjunct                             The form of σ and π is designed based on a sta-
  – ∀i ∈ {1, . . . , k} : li = Ai                       tistical model. Mutual placements of labeled groups of
  – σ(S1 , . . . , Sk ) = true                          strokes  are expressed by a normal distribution which
                              Pk                        of parameters  are extracted from the training data for
  – p = π(S1 , . . . , Sk ) + i=1 pi
  – T = µ(T1 , . . . , Tk )                             each  production.  This is another progress comparing
                                                        to Java version where we set up constraints manually.
Boolean function σ determines whether the produc-           Grammar productions support constructs as sub-
tion can be applied at all, π penalizes the application scripts, superscripts, brackets, common unary and bi-
of the production and µ selects the resulting leading nary operators, fractions, integrals, sums, exponentia-
symbols. In the implementation, σ is strongly based on tion, square roots, functions and binomial coefficients.
50     Jan Stria, Daniel Průša

The related syntax is expressed by 200 productions web browsers found in Android and iOS operating sys-
stored in a text file. A sample from the file follows.   tems. However despite of some optimizations editing
                                                         of formulas in these mobile web browsers in not al-
BinOperation->BinOp|Expression@L|Term@R                  ways as smooth as in desktop browsers. It is caused by
BinOp->[+]                                               a poorer performance of mobile devices which is not
BinOp->[-]                                               sufficient for a frequent rendering of formulas. But as
Fraction->[line]|Expression@T|Expression@B               the performance of these mobile devices grows rapidly
Power->Factor|Expression@TR                              there is a well-founded reason to believe this will go
Root->[root]|Expression@TL|Expression@I                  better soon.
                                                             Our application for formulas recognition can be
    Each line represents one production. On the left- started using immediately. This is especially impor-
hand side of the production, there is a source nonter- tant in that case when the potential user has only
minal symbol. On the right-hand side, there is a mix- several formulas to recognize and installing a new soft-
ture of nonterminal and terminal target symbols sep- ware would cost more time than writing the formula
arated by vertical lines. To distinguish the nonter- in TeX of MathML format by a hand. We have also
minal and terminal symbols, we enclose the termi- tried to make the user interface simple and friendly so
nals in brackets. The first of the target symbols is usage of our application should be intuitive enough.
a mandatory leading symbol. Potential following sym-         The recognition engine itself runs at the server so it
bols are always denoted by special characters deter- can be easily updated. It is especially important in the
mining their spatial relation to the leading target sym- development phase when we can continuously provide
bol (e.g. @L stays for left which means that the appro- enhanced versions of the recognizer. We can also eas-
priate symbol is located on the left with respect to the ily collect all formulas supplied by users for a further
leading target symbol).                                  analysis which helps us to improve the recognition.
    The parameters defining σ, π and µ more precisely        The disadvantage of choosing an implementation
are saved in a separate data file.                       as a web application is a permanent need of an Inter-
                                                         net connection to work with it. In addition this con-
3 Web application                                        nection should have a low latency to ensure a true
                                                         real-time recognition during a formula insertion. On
We have developed two web applications, both utiliz- the other hand, high throughput of the connection is
ing the well known client-server architecture. The first not demanded as not much data is transferred between
application serves for sample mathematical formulas the client and the server.
collection. It allows to insert a mathematical formula
together with a name of the writer and store it at         3.1   jQueryInk widget
the server. Formulas collected by this application are
then used to train and test our recognizer. The second     For both the data collection application and the recog-
application provides a functionality of the recognizer     nition application the core part of the web user in-
itself. It continuously recognizes the formula during      terface is a canvas that allows to insert mathemati-
its insertion and provides the recognition result to the   cal formulas in the form of strokes. It supports writ-
user.                                                      ing new strokes, selecting and editing previously writ-
     Both these applications comprise an user inter-       ten groups of strokes, erasing strokes or clearing the
face running in a web browser at the client side that      whole canvas. The component representing the canvas
communicates with web services at the server side.         was developed in HTML5 and Javascript as a jQuery
The web interface was developed in HTML5 [22] and          UI [24] widget. It is called jQueryInk [23] and it is pub-
Javascript. The web services were written in C# pro-       licly available for download as an open-source project.
gramming language using WCF (Windows Communi-                  The jQueryInk widget internally uses a <canvas>
cation Foundation) technology and they run on Mi-          element introduced by HTML5. It has a specified
crosoft Windows Server 2008 R2. The client and the         width and height and provides a Javascript interface to
server communicate via Ajax. The main reasons for          access and modify its surface dynamically. The inter-
choosing such an architecture were accessibility and       face comprises functions for drawing and transforma-
easy maintenance.                                          tion of 2D shapes and bitmaps. These functions are
     Both applications are accessible from an every        used by our widget to render strokes and additional
modern web browser supporting HTML5. They were             annotations (as selection polygon). The jQueryInk
successfully tested in the latest versions of Internet     widget can be easily created over a specified <div>
Explorer, Mozilla Firefox, Google Chrome, Apple Sa-        element by creating a jQuery object representing the
fari and Opera. They also work in the default mobile       element and calling ink() method on it.
                                                                                       Formulas recognition      51

<div id="myInk"></div>                                     many vertices of a polyline approximating the stroke
$("#myInk").ink( /* settings */ );                         lie inside the selection polygon. To speed up these
                                                           tests, results from the previous selection polygon tests
    Calling the ink() method creates a new <canvas>        are reused. Finally, when at least 2/3 of stroke approx-
element inside the <div> element and initializes a new     imating vertices lie in the selection polygon, the stroke
jQueryInk object with the settings specified by the        is selected. The described algorithm allows the user to
method parameters. These settings can be changed           draw a shape of a selection region freely. This is im-
later by calling the ink() method again. The most          portant because using a simple rectangular selection
important setting is a mode in which the widget is         region would not be sufficient due to a spatial com-
operating. Depending on this mode strokes are writ-        plexity of the formula. The described optimizations
ten, erased or selected while the mouse interacts with     ensure then a smooth selection.
the <canvas> element. The operating mode can be set
independently for left and right mouse buttons. The
settings parameters also specify handler functions for     3.2   Data collection
events generated by the widget which are fired when
something important happens as writing a new stroke,       The data collection application employs the described
erasing a stroke etc. Depending on the usage of the in-    jQueryInk widget on the client side to enable insertion
troduced jQueryInk widget, these handler functions         of a mathematical formula. The user is also asked to fill
can react on the events properly.                          his name to help us identify who wrote which formula.
    The jQueryInk widget also defines supporting data      Then the formula can be sent and stored at the server.
structures representing strokes, their collections etc.    This is archieved by serializing objects representing
These data can be accessed either directly through         strokes to a JSON format and sending them via an
the widgets interface or as a parameters of the pre-       Ajax request. The receiver of this request is a WCF
viously described events. The handler functions can        service which stores the strokes together with the filled
utilize the data, e.g. send it to the recognizer located   username and IP address of the sender.
on the server.                                                 We have already collected approximately 200 for-
    As it has been stated, our jQueryInk widget can        mulas. To use these data to train and test our rec-
operate in three modes:                                    ognizer we have to assign some semantics to them at
                                                           first. Unfortunately as far as we know there is no stan-
Writing: When it is in the writing mode and the user       dard file format for storing handwritten mathematical
is writing a stroke, our widget has to repeatedly render   formulas with annotations denoting their meaning. So
the stroke. However, because rendering of all strokes      we have developed our own XML file format based
would be time consuming, only added points are ren-        on the Presentation MathML [28] which is commonly
dered. After finishing writing the stroke the whole can-   used for representing math in web pages and other
vas is rendered once again to avoid artifacts.             documents. The first part of our annotated MathML
                                                           file comprises coordinates and timestamps of points
Erasing: In the erasing mode there are two meth-           forming individual handwritten strokes. Each of these
ods how to determine which strokes should be erased        strokes is also assigned an unique numeric identifier
when the user moves a mouse. The first method serves       that can be referenced in the second part of the doc-
for testing short strokes comprising only a few points     ument that comprises MathML description of the for-
and having a small bounding rectangle. These strokes       mula. To allow referencing strokes we have extended
are tested for proximity to a mouse path. The second       MathML by tags denoting single symbols and having
method tests longer strokes by checking their intersec-    identifiers of appropriate strokes as their attributes.
tion with a mouse path. We have to incorporate both            Before the MathML description of the formula can
these methods because its quite difficult for the user     be annotated by identifiers of strokes, the MathML
to intersect short strokes such as dots.                   notation itself has to be created. To accomplish that
                                                           we have developed a fairly simple desktop application.
Selection: When in the selection mode the closed path      It loads the strokes as they were stored at the server
of a mouse is continuously approximated by a poly-         and passes them to the Math Input Control [25] which
gon which determines the selection region. Also the        is a COM control in Microsoft Windows 7. This con-
strokes themselves are approximated by polylines to        trol is a panel on which surface the formula can be
reduce the count of their vertices. Each time the selec-   written by hand. The control continuously tries to rec-
tion polygon changes, its bounding rectangle its com-      ognize the formula and shows the recognition result.
puted. This rectangle is then tested for intersection      Simply said, the control does almost the same thing
with bounding rectangles of all strokes to determine       as our recognition application. The recognition result
potentially selected strokes. It is determined, how        can be accessed as a string containing Presentation
52      Jan Stria, Daniel Průša

MathML notation. Unfortunately, the strokes to be
recognized cannot be passed programmatically to the
control. Thus we developed an algorithm that utilizes
coordinates of the loaded strokes to move a mouse cur-
sor over the control simulating a real user interaction.
    Besides obtaining a meaning of the handwritten
formula in a MathML, the presented mechanism can
also be used for a comparison of our recognizer with
the Math Input Control. Although their recognizer
performs quite well its not definitely perfect. But when
used by a real user it allows to erase and rewrite arbi-
trary strokes or assign a meaning to a group of strokes
by a hand. So when the recognition of automatically
written strokes fails we input it to the panel by hand      Fig. 4. User interface of the recognizer: Shows the selec-
using the mentioned editing possibilities to always ob-     tion (number 1) and the editing of a meaning (variable q).
tain their correct description in MathML notation.          Result of the recognition is at the top.
    Once we have the MathML notation of the formula,
we process it by adding special tags for each symbol
and we can start annotating it. It has to be done by        the TeX format, the MathJax always converts it to an
a hand as the Math Input Control gives us only one          image that is then shown by the browser.
MathML string describing the whole formula. Using               Our goal was to develop a truly interactive appli-
our application, each symbol included in the formula        cation for formulas recognition. We wanted the user
has to be selected and bound with the appropriate           to see partial results of a recognition as he inserts the
symbol tag.                                                 formula. We also wanted to provide a possibility of se-
    The data collection application is available at [21].   lecting a group of strokes and determining its meaning.
                                                            This can help the recognizer a lot when a part of the
                                                            formula is not recognized correctly, e.g. one of the sym-
3.3   Data recognition                                      bols is recognized wrong. Then strokes forming this
                                                            symbol can be selected and the recognizer proposes all
The web interface of the formulas is shown in the           alternative meanings of that group of strokes. Assum-
figure 4. It also utilizes presented jQueryInk widget       ing that the correct meaning is present among these
for strokes insertion. Besides that it also comprises       alternatives, the user can choose it. Then the correct
a field showing the recognition result which is a for-      meaning is assigned to that group and the recognizer
mula represented in the Presentation MathML for-            has to respect it in a following recognition process.
mat. Although MathML is intended for incorporat-                To achieve the described interactivity, the
ing mathematics in web pages, its support in the cur-       Javascript client and the WCF service running at the
rent browsers is quite poor. Among the most used web        server have to communicate a lot. Figure 5 gives us
browsers only Mozilla Firefox and Opera support it.         a closer look at that communication and it also shows
Moreover, even these two browsers do not render ev-         how the user affects it. Each time the user enters the
erything correctly. Remaining browsers (Internet            web interface, a new asynchronous Ajax request is sent
Explorer, Google Chrome and Apple Safari) still do          to the service. The service generates a unique session
not support it at all. However, the MathML support          id and initializes a new instance of the recognizer for
is planned in future versions of all mentioned browsers.    this id. The session id is then sent back to the client
    To ensure a correct rendering of the recognized         for a further communication.
formulas in all current browsers, we incorporated the           Meanwhile the user could start inserting the for-
MathJax library [26]. It is written in Javascript and       mula, so it has to be checked whether he performed
it serves for a correct displaying of mathematics in all    some actions that need to be sent to the server. These
major web browsers. It can display mathematics in           actions comprehend insertion of a new stroke, erasing
both the MathML and TeX format. While displaying            an existing stroke, clearing the whole canvas, selection
mathematics in the MathML format, it automatically          of a group of strokes or determining a meaning of the
detects whether the browser supports the MathML na-         previously selected group of strokes. Every time one of
tively. If so, it lets the browser to render the MathML.    these actions is performed by the user, it is added to
If the browser lacks a MathML support, it converts          a queue of actions waiting to be sent to the server. We
the MathML to an image and lets the browser display         use this queue to ensure that there is always only one
this image instead. While displaying mathematics in         request from the client to the server being processed.
                                                                                          Formulas recognition       53

                                                                  Unfortunately, sending an Ajax request when
                                                              a web page is being left or a web browser is being
                                                              closed is not fully reliable. We have adopted an an-
                                                              other mechanism for releasing unused resources. The
                                                              service periodically checks all the sessions and if it
                                                              finds a session without any requests for some time, it
                                                              releases the recognizer allocated for it. This can lead to
                                                              releasing a recognizer for a session that has not been
                                                              accessed for a long time but has not truly ended at
                                                              the client side. When a request is received from such
                                                              a session, a new recognizer for a specified session id is
                                                              created and the client is asked to send all its strokes
                                                              to restore an original state of the recognizer.


Fig. 5. Client-server architecture of the recognizer: Shows
                                                              4    Conclusions
how user actions are handled by the client script and how
the client script communicates with the service running on    We have described a grammar based approach
a server.                                                     to mathematical formulas recognition. Our previous
                                                              results [11, 12] show that the proposed method is prac-
                                                              tical and can be implemented effectively. The imple-
When the session is initialized and the queue of actions      mentation is responsive and gives acceptable results.
is non-empty or every time a new action is performed          It is true that some OCR errors usually occur during
and there is no request being processed, the queue of         the recognition and even the structural analysis does
actions is emptied and a new request containing all           not help to prevent them all. We have addressed this
actions from the queue is sent to the server.                 problem in the new user interface which allows to cor-
    The request with the recent actions is sent as an         rect OCR results interactively.
asynchronous Ajax request so the user can keep editing            We have redesigned the architecture of the whole
the formula meanwhile. The request always contains            system and realized it as a web application. This gives
a forementioned session id so the service could pass          new possibilities for further improvements. Our prior-
the actions to the appropriate recognizer. The recog-         ity now is to assemble a sufficiently large set of formu-
nizer updates its set of strokes and provides the best        las, which will include samples from all common areas
estimation of what these strokes mean. When there             of mathematics, written by different users. We want
is an action denoting selection of a group of strokes,        to analyze such data and use it for tuning parameters
the recognizer also offers all possible meanings of these     of the grammar productions. We also plan to publish
strokes to allow the user to determine their meaning.         a database of annotated formulas to be used by others
And finally, when there is an action denoting a mean-         for testing purposes. We have not found any set of on-
ing determination of some group of strokes, the recog-        line formulas publicly available. There are only off-line
nizer uses it.                                                formulas [16] or samples of single on-line characters,
    After that, a response containing a recognition re-       so this kind of contribution could be valuable for the
sult is sent back to the client. The response can also        community.
contain possible meanings of a selected group of
strokes if the request comprised a strokes selection ac-
tion. The client script receives the response and shows References
the recognition result. It can also show possible mean-
ings of a previously selected group of strokes to let the
user to select the correct meaning. Then the queue of 1. R. Anderson: Syntax-directed recognition of hand-
                                                            printed two-dimensional mathematics. In Interac-
actions is checked and if it is non-empty a new request
                                                            tive Systems for Experimental Applied Mathematics,
is generated. This goes over and over.
                                                            pp. 436–459, London, 1968. Academic Press.
    Besides the request for a session initialization and 2. K.F. Chan and D.Y. Yeung: Mathematical expression
the request containing a queue of performed actions,        recognition: a survey. In IJDAR, vol. 3, pp. 3–15, 2000.
there is also the third one. It is sent synchronously 3. S. Hellkvist: On-line character recognition on small
when the user leaves our web interface and it notifies      hand-held terminals using elastic matching. Master’s
the service that the sessions ends and the recognizer       Thesis, Royal Institute of Technology, Department of
can be released.                                            Numerical Analysis and Computing Science, 1999.
54      Jan Stria, Daniel Průša

 4. T. Kasami: An efficient recognition and syntax analy-      20. R. Zanibbi, D. Blostein, and J.R. Cordy: Recogniz-
    sis algorithm for context-free languages. Scientific Re-       ing mathematical expressions using tree transforma-
    port AFCLR-65-758, Air Force Cambridge Research                tion. IEEE Transactions on Pattern Analysis and Ma-
    Laboratory, Bedford, Mass., USA, 1965.                         chine Intelligence, 24(11), 2002, 1455–1467.
 5. V.M. Kiyko: Recognition of objects in images of paper      21. Data collection apllication.
    based line drawings. In Third International Conference         http://mfr.aspone.cz/DataCollector.html.
    on Document Analysis and Recognition, pp. 970–973,         22. HTML5 working draft.
    Montreal, Canada, 1995.                                        http://www.w3.org/TR/html5.
 6. S. Lavirotte and L. Pottier: Mathematical formula          23. jQueryInk widget.
    recognition using graph grammar. In Proceedings of             http://plugins.jquery.com/project/Ink.
    the SPIE 1998, vol.3305, p p.44–52, San Jose, CA,          24. jQuery UI library. http://jqueryui.com.
    1998.                                                      25. Math Input Control reference.
 7. P. Liang, M. Narasimhan, M. Shilman, and P. Vi-                http://msdn.microsoft.com/en-us/library/
    ola : Efficient geometric algorithms for parsing in two        dd317324.aspx.
    dimensions. International Conference on Document           26. MathJax library. http://www.mathjax.org.
    Analysis and Recognition, pp. 1172–1177, 2005.             27. MathJournal 2.1.
 8. N. Matsakis: Recognition of handwritten mathematical           http://www.xthink.com/MathJournal.html.
    expressions. Master’s Thesis, Massachusetts Institute      28. MathML2 recommendation.
    of Technology, Cambridge, MA, May 1999.                        http://www.w3.org/TR/MathML2.
 9. E.G. Miller and P.A. Viola: Ambiguity and constraint
    in mathematical expression recognition. In AAAI
    ’98/IAAI ’98, pp. 784–791. American Association for
    Artificial Intelligence, 1998.
10. D. Průša and V. Hlaváč: 2D context-free grammars:
    Mathematical formulae recognition. In J. Holub and
    J. Zdarek, (eds), Proceedings of the Prague Stringol-
    ogy Conference, pp. 77–89, Czech Technical University
    in Prague, Czech Republic, 2006.
11. D. Průša and V. Hlaváč: Mathematical formulae recog-
    nition using 2d grammars. In Proceedings of the
    9th International Conference on Document Analysis
    and Recognition, vol. II, pp. 849–853, Curitiba, Brazil,
    2007.
12. D. Průša and V. Hlaváč: Structural construction for
    on-line mathematical formulae recognition. In Pro-
    ceedings of the Iberoamerican Conference on Pattern
    Recognition, pp. 317–324. Springer Verlag, September
    2008.
13. A. Rosenfeld: Picture languages - formal models of
    picture recognition. Academic Press, New York, 1979.
14. B. Savchynsky, M.I. Schlesinger, and M.O. Anochina:
    Parsing and recognition of printed notes. In Proceed-
    ings of the Conference Control Systems and Com-
    puters, pp. 30–38, Kiev, Ukraine, 2003. in Russian,
    preprint in English available.
15. M.I. Schlesinger and V. Hlaváč: Ten lectures on sta-
    tistical and structural pattern recognition. Vol. 24 of
    Computational Imaging and Vision. Kluwer Academic
    Publishers, Dordrecht, The Netherlands, 2002.
16. M. Suzuki: Infty project.
    http://www.inftyproject.org/en/index.html.
17. M. Suzuki, F. Tamari, R. Fukuda, S. Uchida, and
    T. Kanahori: Infty: an integrated ocr system for math-
    ematical documents. In DocEng ’03: Proceedings of
    the 2003 ACM Symposium on Document Engineering,
    pp. 95–104, New York, NY, USA, 2003. ACM.
18. C.C. Tappert: Cursive script recognition by elastic
    matching. IBM J. Res. Dev., 26, November 1982, 765–
    771.
19. D.H. Younger: Recognition of context-free languages in
    time n3 . Information and Control, 10, 1967, 189–208.