=Paper=
{{Paper
|id=Vol-2718/paper09
|storemode=property
|title=Solvers for Mathematical Word Problems in Czech
|pdfUrl=https://ceur-ws.org/Vol-2718/paper09.pdf
|volume=Vol-2718
|authors=Jan Kadlec,Daniel Průša
|dblpUrl=https://dblp.org/rec/conf/itat/KadlecP20
}}
==Solvers for Mathematical Word Problems in Czech==
Jan Kadlec, Daniel Průša
Czech Technical University, Faculty of Electrical Engineering
Karlovo nám. 13, 121 35 Prague 2, Czech Republic
{kadlej24,prusa}@fel.cvut.cz

Abstract: We study the automatic evaluation of mathematical word problems, a task that belongs to natural language processing and has become popular in recent years. Since all the methods published so far were developed for inputs in English, our goal is to review them and propose solutions able to cope with inputs in Czech. We face the question of whether a competitive accuracy can be achieved for a natural language with flexible word order, under the assumption that only a relatively small dataset of training and testing data is available. We propose and evaluate two methods. One relies on rule-based processing of dependency trees computed by UDPipe. The other builds on machine learning: it transforms word problems into numeric vectors and trains an SVM to classify them. We also show that it improves in combination with a search for predefined sequences of words and word classes, achieving 75% accuracy on our dataset of 500 Czech word problems.

===1 Introduction===

Research in the automatic evaluation of mathematical word problems nowadays has good practical motivation: natural language is the easiest way for many people to express themselves. Hence, it is useful to have methods that analyze assignments described in words, transform them into an internal representation, calculate results, and present them in a descriptive form. This is exactly what computational knowledge engines like WolframAlpha (https://wolframalpha.com) aim at. Mathematical word problems can be seen as one of the most complicated text assignments. Moreover, the complexity of translating a mathematical text into symbolic math scales with the difficulty of the math involved in the problem's solution. Hence, from a scientific point of view, word problems represent a very suitable domain for research.

The interest in automatic evaluation of word problems dates back to 1963, when it was identified in [4] as one of the important challenges for artificial intelligence in the upcoming decades. A first attempt to implement an automatic solver was made by Bobrow within his thesis [2]. Development after this initial study was slow until 2014, when Kushman et al. published a solution based on semantic interpretation and information extraction [9]. Several other works then followed.

The survey [12] suggests that three different principles for solving word problems automatically can be distinguished. The simplest approach works with patterns, i.e., sequences of words, where each pattern represents one type of word problem. Classification is done by detecting the sequences in inputs. The approach is used, e.g., by WolframAlpha. It might be problematic to use it for languages with flexible word order.

The second approach relies on syntactic analysis. Syntactic dependencies among words are used to extract information from a given word problem. A disadvantage might be the dependence on linguistic tools. The system ARIS [5] is an example of this type of solver.

The most efficient approach is to apply machine learning. Kushman et al. [9] use machine learning together with information extraction and semantic interpretation; the role of the machine learning part is to estimate the parameters of a probabilistic model. Huang et al. [6] apply neural networks and reinforcement learning. Their method works without explicit information extraction and is even able to solve types of word problems that were not present in the training dataset. On the other hand, the method sometimes fails by generating numbers not involved in the input instance.

We can compare the accuracy of these methods since both were evaluated on the dataset Alg514 [9], consisting of 514 word problems. The method from [9] achieved 70% accuracy while the method from [6] achieved 82.5%. However, it is a known fact that the accuracy of all solvers drops significantly when they are applied to instances from a larger and much more diverse dataset. One such dataset, Dolphin18K, consisting of 18,000 word problems, was introduced in [7]. None of the solvers evaluated in [7] and [6] was able to solve correctly more than 34% of the instances from this dataset. This shows that solving math word problems automatically is a really challenging task.

All the systems discussed above were developed for word problems in English. It is therefore natural to ask how the underlying principles would work for inputs in other languages, especially if the language is in some sense more difficult to analyze, for example, due to a more flexible word order.

In this paper, we present a study of methods suitable for processing word problems in Czech, but we also present some new linguistically independent ideas. As no suitable annotated data existed so far, it was necessary to build a new dataset. We have collected 500 word problems intended for the first three grades of elementary school. These word problems can be solved by substituting numbers from the problem into an arithmetic expression. Hence, the problems are not as complex as some of those in Alg514 and Dolphin18K, whose solution may require assembling and solving systems of equations. Nevertheless, we consider our dataset sufficient for an initial study, which can serve as a step towards methods processing more complex word problems. Furthermore, a solver working reliably on the considered domain might be appreciated by the youngest pupils, especially if it can provide them with a step-by-step explanation of how to solve a given problem. This should sufficiently justify the focus of our research.

We touch on all three principles discussed above and propose two solvers. The first one is based on a support vector machine (SVM) classifier and is further extended by incorporating the pattern-based classification to handle some of the easier inputs. The advantage of this solver is that it does not require any knowledge of semantics (unlike the system from [5]). The accuracy achieved on the created dataset is 75%. The second solution builds on syntactic analysis. It first calls UDPipe [11] to create syntactic dependency trees for all sentences, then it traverses the trees, tries to retrieve information on the logical entities encoded in the problem, processes the question part, and computes the result. Unlike the first solution, the extracted information could be used to output details on how to solve the problem. On the other hand, its accuracy is much worse than that of the first solution.

The collected dataset and the implementation of the proposed methods are freely available at https://github.com/hkad98/problem_solver.

The content of the paper is structured as follows. Section 2 describes our dataset of 500 word problems in Czech. The next three sections describe and evaluate the proposed principles: Section 3 discusses how to extract and use patterns, Section 4 explains the use of SVM, and Section 5 focuses on retrieving information from dependency trees produced by UDPipe. Finally, Section 6 summarizes the achieved results, gives additional thoughts on the presented methods, and outlines possible future work.
===2 WP500CZ dataset===

We have created a dataset called WP500CZ comprising 500 word problems taken from the schoolbook [10]. We have further divided the whole dataset into training and testing subsets, denoted WP500CZ-train and WP500CZ-test, respectively, each comprising 250 word problems. The word problems were transformed into electronic form partially by using OCR, but manual editing and several corrections of the OCR outputs were also needed. Each word problem was annotated with the expected answer (a non-negative integer) and also with the arithmetic expression used to derive the answer.

Let us list four samples from the dataset, together with their English translations:

W1: Jana má 4 pastelky nové a 3 pastelky staré. Kolik pastelek má Jana?
(Jane has 4 new crayons and 3 old crayons. How many crayons does Jane have?)

W2: Pavel má 120 Kč. Ota má 80 Kč. Kolik korun mají dohromady?
(Paul has 120 CZK. Otto has 80 CZK. How many Czech korunas do they have together?)

W3: Petr přečetl 9 knih. Eva přečetla 6 krát více knih. Kolik knih Eva přečetla?
(Peter read 9 books. Eve read 6 times more books. How many books did Eve read?)

W4: Maminka koupila 20 tvarohových koláčků a 15 makových koláčků. Děti 8 koláčů snědly. Kolik koláčů zůstalo?
(Mom bought 20 cheesecakes and 15 poppy seed cakes. The children ate 8 cakes. How many cakes are left?)

Word problem W1 is represented in a text file as follows.

<pre>
Jana má 4 pastelky nové a 3 pastelky staré. Kolik pastelek má Jana? | 7 | NUM1 + NUM2
</pre>

As can be seen, there are three parts separated by pipe characters. The first part is the word problem, the second part is the answer, and the third part is the expression solving the problem. Variables in the expression are strings of the form NUMx, where x is the order of the referenced numeric value in the assignment (a small parsing sketch is given at the end of this section).

Note that this approach has some limitations. For example, it does not allow referencing numbers expressed by words, such as day (has 24 hours) or June (has 30 days). However, our dataset does not contain word problems relying on such translations of words to numbers.

A word problem itself consists of two parts – a word problem assignment (WPA-part) and a question (QUE-part). Each word problem in WP500CZ fulfills that its QUE-part is a simple sentence asking for one detail, and the expression giving the answer is composed exclusively of addition, subtraction, multiplication, and division operations.

Task 1. Given a word problem W, the goal of automatic word problem evaluation is to find an expression E solving W.

Remark 1. We say that W is of type E.
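To make the annotation format and Task 1 concrete, here is a minimal Python sketch (ours, not part of the released implementation; the helper name solve_annotated is made up) that parses one dataset line and checks the annotated expression against the annotated answer:

<pre>
import re

def solve_annotated(line):
    """Split a WP500CZ line 'problem | answer | expression' and evaluate
    the expression after substituting the numbers from the problem text."""
    problem, answer, expression = (part.strip() for part in line.split("|"))
    numbers = re.findall(r"\d+", problem)        # values of NUM1, NUM2, ... in order
    filled = re.sub(r"NUM(\d+)",
                    lambda m: numbers[int(m.group(1)) - 1],
                    expression)
    return problem, int(answer), eval(filled)    # filled contains only digits/operators

line = ("Jana má 4 pastelky nové a 3 pastelky staré. "
        "Kolik pastelek má Jana? | 7 | NUM1 + NUM2")
problem, answer, result = solve_annotated(line)
assert result == answer == 7
</pre>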
[Figure 1: Histogram of word problem types in WP500CZ.]

The types of word problems in WP500CZ and their frequencies are depicted in Figure 1. As can be seen, approximately one-third of the instances are of type NUM1 + NUM2, one-third are of type NUM1 - NUM2, and the remaining instances are of 11 different types (note that these types include NUM2 - NUM1, which is considered a type different from NUM1 - NUM2). This corresponds to the distribution of word problem types in [10].

===3 Pattern-based classification===

In this section, we describe a classification of word problems based on patterns extracted from a training dataset. This approach is able to handle a subset of instances reliably (we will show that the patterns extracted from WP500CZ-train apply to 54% of the instances of WP500CZ-test). For this reason, we later combine the method with machine learning.

A pattern is a sequence of word lemmas and word classes. Patterns are extracted from WPA-parts and QUE-parts separately, and we call them WPA-patterns and QUE-patterns, respectively. A given word problem W matches a WPA-pattern (QUE-pattern) P if the WPA-part (QUE-part) of W contains a subsequence of words whose lemmas/word classes are in one-to-one correspondence with the elements of P, preserving the order of the elements.

To illustrate the form of patterns, we give two examples of WPA-patterns followed by two examples of QUE-patterns.

<pre>
NOUN a NUM krát málo NOUN | NUM1 / NUM2
(NOUN and NUM times little NOUN)

být o NUM NOUN vysoký | NUM1 + NUM2
(to be NUM NOUN high)

kolik NOUN zbýt NOUN v NOUN | NUM1 - NUM2
(how many NOUN to be left NOUN in NOUN)

kolik NOUN NOUN zbýt | NUM1 - NUM2
(how many NOUN NOUN to be left)
</pre>

Note that each line contains a pattern separated by the pipe character from the expression used for classification when the pattern is matched. In addition, NOUN and NUM in the pattern part represent word classes (noun and numeral).

We propose an algorithm which obtains the patterns automatically. Let us describe how it extracts, e.g., WPA-patterns. Let S1, ..., Sk be the partition of the training dataset by word problem types (i.e., each Si consists of word problems of the same type). Maps P1, ..., Pk are allocated to collect the found patterns (stored as keys) and their numbers of occurrences (stored as values). All these maps are initially empty. In the end, each Pi contains the patterns extracted from Si.

The algorithm has three phases. First, it processes the subsets Si one by one. For each Si, it iterates through all pairs U, V ∈ Si, U ≠ V. Let u and v be the WPA-parts of U and V, respectively. Moreover, for a sequence of words x = (x1, ..., xs), let ℓ(x) be the sequence (x'1, ..., x's) where

<pre>
x'i = NOUN             if xi is a noun,
      ADJ              if xi is an adjective,
      NUM              if xi is a numeral,
      the lemma of xi  otherwise.
</pre>

A longest common subsequence of ℓ(u) and ℓ(v), denoted Pu,v, is found. If Pi does not yet contain Pu,v, the sequence is inserted into Pi and its number of occurrences is set to 1. If Pu,v is already in Pi, its number of occurrences is incremented.

In the second phase, the algorithm discards sequences that were not detected frequently enough. It iterates through the sets Pi and deletes from Pi each sequence P whose number of occurrences does not exceed τ · |Pi|, where τ ∈ (0, 1) is a suitable constant.

Finally, the third phase of the algorithm discards every sequence P which is shared by two distinct sets Pi, Pj.
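The first two phases can be sketched in a few lines of Python. This is our illustration, not the authors' implementation: tokens are assumed to be (word class, lemma) pairs already resolved via the vocabulary of Section 3.1, and the threshold value tau = 0.5 is only illustrative (the paper leaves the constant open). Phase 3, which discards patterns shared by two types, is omitted for brevity.

<pre>
from collections import Counter
from itertools import combinations

def ell(part):
    """The l-mapping: keep the word class for nouns, adjectives and
    numerals, otherwise keep the lemma."""
    return tuple(cls if cls in ("NOUN", "ADJ", "NUM") else lemma
                 for cls, lemma in part)

def lcs(a, b):
    """A longest common subsequence via the standard O(|a|*|b|) dynamic
    program [3]; table cells store the subsequences themselves."""
    best = [[()] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            best[i + 1][j + 1] = (best[i][j] + (x,) if x == y else
                                  max(best[i + 1][j], best[i][j + 1], key=len))
    return best[-1][-1]

def extract_patterns(subset, tau=0.5):
    """Phases 1 and 2 for one subset S_i: count the LCS of every pair of
    parts, then drop patterns occurring at most tau * |P_i| times."""
    counts = Counter(lcs(ell(u), ell(v)) for u, v in combinations(subset, 2))
    return {p for p, c in counts.items() if c > tau * len(counts)}
</pre>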
In terms of time complexity, the first phase is the most demanding. Let us assume that the number of words in the WPA-part as well as in the QUE-part of the considered word problems is at most N. Moreover, let S = S1 ∪ ... ∪ Sk. Since the basic dynamic programming algorithm finding the longest common subsequence of two sequences of length at most N works in O(N²) time [3], the time complexity of the first phase, denoted T1(S), satisfies

<pre>
T1(S) = O( Σ_{i=1}^{k} |Si|² · N² ) = O( N² · Σ_{i=1}^{k} |Si|² ) = O( N² · |S|² ).
</pre>

When the WPA-patterns are available, they are ordered into a sequence P_WPA = P1, ..., Pn which fulfills that whenever a pattern Pj is a proper subsequence of a pattern Pi, then j > i. Analogously, the extracted QUE-patterns are ordered into a sequence P_QUE with the same property. Given a word problem W with WPA-part u and QUE-part v, the pattern-based classification iterates through the elements of P_QUE until it finds a pattern that matches v, or the end of P_QUE is reached. If a matching QUE-pattern is found, the classification result is the expression assigned to it. Otherwise, the algorithm iterates through P_WPA and tries to find a pattern matching u, which again determines the classification result. No result is returned if neither a QUE-pattern nor a WPA-pattern matches (a sketch is given at the end of this subsection).

====3.1 Experiments====

The described procedures work with lemmatized inputs. We thus used the corpus SYN2015 [8] to build a vocabulary where keys are words extracted from the corpus and values are pairs consisting of a word class and a lemma. The ambiguity of some words is resolved by taking their frequency into account. The built vocabulary has about 1 million items.

The proposed pattern extraction algorithm was implemented in Python 3. It returned 28 WPA-patterns and 8 QUE-patterns for the instances in WP500CZ-train. The running time was 4.9 s on a notebook equipped with an Intel(R) Core(TM) i5-5250U CPU @ 1.60 GHz, 8 GB RAM, and macOS.

Table 1 shows the efficiency of the subsequent pattern-based classification. Three possible classification outcomes are distinguished – a correct classification, an incorrect classification, and no result, when the classification algorithm does not return any expression.

<pre>
                correct   incorrect   no result
WP500CZ-train      85         10         155
WP500CZ-test      107          8         135
</pre>
Table 1: The accuracy of pattern matching on WP500CZ.

The collected patterns were detected in nearly one-half of the word problems in WP500CZ-test (115 out of 250). In those cases, the reliability of the classification was quite high, achieving 93% accuracy (107 correct classifications). Interestingly, the accuracy was better than on the training data; however, fewer instances matched a pattern.
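The matching and classification procedure described above can be sketched as follows (our illustration; parts are assumed to be already mapped by the ell function of the previous sketch, and the pattern lists are assumed to be precomputed and ordered as required):

<pre>
def matches(pattern, mapped_part):
    """True iff `pattern` occurs as an order-preserving subsequence of the
    l-mapped token sequence of a WPA- or QUE-part."""
    tokens = iter(mapped_part)
    return all(any(t == p for t in tokens) for p in pattern)

def classify(mapped_wpa, mapped_que, que_patterns, wpa_patterns):
    """Pattern-based classification: QUE-patterns are tried first, then
    WPA-patterns.  Each item is a (pattern, expression) pair; the lists are
    ordered so that no pattern is preceded by one of its proper
    subsequences.  Returns None when nothing matches."""
    for pattern, expression in que_patterns:
        if matches(pattern, mapped_que):
            return expression
    for pattern, expression in wpa_patterns:
        if matches(pattern, mapped_wpa):
            return expression
    return None
</pre>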
===4 Machine learning===

This section proposes a solver based on machine learning. The main idea is to come up with a suitable transformation of a word problem into a numeric vector of fixed dimension (a so-called feature vector) and to train a multi-class support vector machine (SVM) [1] to classify the inputs by their types (expressions). In the experimental part, we also show that it is worthwhile to combine the results produced by the SVM with the pattern matching from Section 3.

We decided to use an SVM because it is a classifier able to cope with high-dimensional data and is not prone to overfitting. Flexibility is achieved through the choice of a kernel, which makes it possible to linearly separate non-linearly separable data by mapping them into a higher dimension. It also does not require a large dataset for training, unlike neural networks.

Our method of representing word problems by numeric vectors is semantic-independent. It is based on histograms of words occurring in word problems of the same type. For this purpose we define

1. a frequency function ω(w, p, E) which, for a word w in its canonical form, p ∈ {WPA, QUE}, and a word problem type E, equals the number of word problems of type E in the training dataset that contain a word of the canonical form w in the p-part, and

2. a weight function α(c, p) which, for a word class c and p ∈ {WPA, QUE}, equals an integer determining the importance of word class c in the classification process.

The values of the frequency function are calculated from the training dataset. For example, we obtained the following values for the verb mít (to have) and the adverb dohromady (together) from WP500CZ-train:

<pre>
ω(mít, WPA, NUM1 + NUM2) = 27          (1)
ω(mít, QUE, NUM1 + NUM2) = 28          (2)
ω(mít, WPA, NUM1 - NUM2) = 25          (3)
ω(mít, QUE, NUM1 - NUM2) = 11          (4)
ω(dohromady, WPA, NUM1 + NUM2) = 0     (5)
ω(dohromady, QUE, NUM1 + NUM2) = 6     (6)
ω(dohromady, WPA, NUM1 - NUM2) = 0     (7)
ω(dohromady, QUE, NUM1 - NUM2) = 0     (8)
</pre>

The values of the weight function were learned by a genetic algorithm, taking the SVM accuracy as the objective function.

When the functions ω and α are known, we transform a word problem W into a numeric vector x using the following algorithm.

1. The components of x are indexed by the types of word problems in the training dataset (thus the vector dimension coincides with the number of SVM classes), and all of them are initially set to 0.

2. The words of W are taken one by one. Let the current word have lemma w and word class c, and let it come from the p-part of W. For each word problem type E, it is checked whether ω(w, p, E) is defined and the expression E evaluates to a non-negative integer after substituting the numeric values from W into E (i.e., E is feasible for W). If so, the E-component of x is increased by α(c, p) · ω(w, p, E) (see the sketch below).
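The following sketch (ours; the names feasible and feature_vector are made up) mirrors the two steps above, including the feasibility check; the short usage block reproduces the worked example for W1 that follows.

<pre>
import re

def feasible(expression, numbers):
    """True iff the expression yields a non-negative integer after the
    numbers of the word problem are substituted for NUM1, NUM2, ..."""
    try:
        value = eval(re.sub(r"NUM(\d+)",
                            lambda m: str(numbers[int(m.group(1)) - 1]),
                            expression))
    except (IndexError, ZeroDivisionError):
        return False        # missing referenced number or division by zero
    return value >= 0 and value == int(value)

def feature_vector(words, numbers, types, omega, alpha):
    """Histogram-based feature vector: `words` is a list of
    (lemma, word_class, part) triples with part in {"WPA", "QUE"};
    `omega` maps (lemma, part, type) to frequencies and `alpha` maps
    (word_class, part) to the learned weights (default 1)."""
    x = dict.fromkeys(types, 0)
    for lemma, cls, part in words:
        for E in types:
            freq = omega.get((lemma, part, E))
            if freq is not None and feasible(E, numbers):
                x[E] += alpha.get((cls, part), 1) * freq
    return x

# Reproducing the demonstration below with the frequencies (1)-(8)
# (zero entries omitted) and alpha = 1 everywhere:
omega = {("mít", "WPA", "NUM1 + NUM2"): 27, ("mít", "QUE", "NUM1 + NUM2"): 28,
         ("mít", "WPA", "NUM1 - NUM2"): 25, ("mít", "QUE", "NUM1 - NUM2"): 11,
         ("dohromady", "QUE", "NUM1 + NUM2"): 6}
words = [("mít", "VERB", "WPA"), ("mít", "VERB", "QUE"),
         ("dohromady", "ADV", "QUE")]
print(feature_vector(words, [4, 3],
                     ["NUM1 + NUM2", "NUM1 - NUM2"], omega, {}))
# -> {'NUM1 + NUM2': 61, 'NUM1 - NUM2': 36}
</pre>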
Let us demonstrate how the frequencies (1)-(8) are used to calculate the numeric vector for word problem W1 from Section 2. For simplicity, assume that the word problems in the training dataset are only of types NUM1 + NUM2 and NUM1 - NUM2, and that α(c, p) = 1 for all c and p (i.e., we ignore the weight function).

First, we observe that both considered expressions evaluate to a positive integer when the numbers 4 and 3 from W1 are substituted for NUM1 and NUM2, respectively; hence both expressions are feasible for W1. Since the verb mít is in the WPA-part as well as the QUE-part of W1, it contributes frequencies (1) and (2) to the vector component NUM1 + NUM2 and, analogously, frequencies (3) and (4) to the component NUM1 - NUM2. The adverb dohromady appears only in the QUE-part of W1, hence it contributes frequency (6) to the component NUM1 + NUM2 (note that frequency (7) is zero, hence it does not contribute to the component NUM1 - NUM2). Summing the listed contributions, we obtain the vector

<pre>
{NUM1 + NUM2 : 61, NUM1 - NUM2 : 36}.
</pre>

Remark 2. It is important to notice that the proposed form of the vectors suggests using a simple classifier which only finds the component of maximum value and returns the expression assigned to it. We consider this strategy in the experimental part, where it is shown to perform considerably worse than the SVM.

As the testing data might have a different distribution than the training data, we normalize the vectors before passing them to the SVM. In the SVM training stage, the normalization modifies the training set vectors so that the mean value is 0 and the variance is 1. The same transformation is applied to each input vector before it is classified.

Figure 2 shows a visualization of an SVM trained for word problems of types NUM1 + NUM2 and NUM1 - NUM2.

[Figure 2: An SVM trained on normalized 2-dimensional vectors obtained for several word problems of the NUM1 + NUM2 and NUM1 - NUM2 types. The circled samples are the support vectors; the input displayed in red represents the normalized vector of an input word problem to be classified.]

====4.1 Experiments====

In the next paragraphs we use the following abbreviations for the proposed solvers:

• sSVM is the SVM-based solver,
• sMAX is the classifier suggested by Remark 2,
• sSVM+PAT is sSVM combined with the pattern matching, and
• sMAX+PAT is sMAX combined with the pattern matching.

The combined solvers sSVM+PAT and sMAX+PAT try to apply the pattern matching first. If it does not return a result, then sSVM or sMAX, respectively, is called to complete the classification (a sketch follows at the end of this subsection).

We used the scikit-learn library (https://scikit-learn.org/stable/) to implement sSVM. The radial basis function kernel was chosen as the best performing one among the tested kernels.

The accuracy of the solvers can be compared in Tables 2 and 3. The combined solver sSVM+PAT was the best solver on the testing dataset. In addition, the strategy of the sMAX solver is inferior to the SVM classification.

<pre>
                 sSVM    sMAX
WP500CZ-train   87.6%   58.0%
WP500CZ-test    61.1%   51.0%
</pre>
Table 2: The accuracy of sSVM and sMAX.

<pre>
                sSVM+PAT   sMAX+PAT
WP500CZ-train     86.8%      69.2%
WP500CZ-test      74.9%      66.1%
</pre>
Table 3: The accuracy of the combined solvers.

All instances of WP500CZ were evaluated in 3.66 s by sSVM+PAT and in 3.15 s by sMAX+PAT (the running environment was identical to the one described in Subsection 3.1).

Figures 3 and 4 show the distributions of errors per word problem type for sSVM+PAT (note that only word problem types present in the training dataset are considered).

[Figure 3: The accuracy of sSVM+PAT on WP500CZ-train, shown as counts of correct and wrong classifications per word problem type.]

[Figure 4: The accuracy of sSVM+PAT on WP500CZ-test, shown as counts of correct and wrong classifications per word problem type.]
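A sketch of the sSVM+PAT pipeline using scikit-learn, with toy stand-in training data; it reuses the classify function from the sketch in Section 3. The normalization corresponds to scikit-learn's StandardScaler and the RBF kernel matches the paper's choice; everything else (data values, vector dimension) is illustrative.

<pre>
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-ins for the WP500CZ-train feature vectors and type labels.
X_train = np.array([[61, 36], [55, 30], [12, 44], [9, 57]], dtype=float)
y_train = np.array(["NUM1 + NUM2", "NUM1 + NUM2",
                    "NUM1 - NUM2", "NUM1 - NUM2"])

scaler = StandardScaler().fit(X_train)             # zero mean, unit variance
svm = SVC(kernel="rbf")                            # RBF kernel, as in the paper
svm.fit(scaler.transform(X_train), y_train)

def solve_combined(mapped_wpa, mapped_que, vector,
                   que_patterns, wpa_patterns):
    """sSVM+PAT: the pattern classifier of Section 3 is tried first and the
    SVM completes the classification when no pattern matches."""
    expression = classify(mapped_wpa, mapped_que, que_patterns, wpa_patterns)
    if expression is None:
        expression = svm.predict(scaler.transform([vector]))[0]
    return expression
</pre>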
===5 Syntactic analysis===

The solvers presented so far work as classifiers, but they are not suitable for producing a step-by-step explanation of how to derive the solution to a given word problem. Here we give a high-level description of a method based on syntactic analysis, which creates an internal representation of the input problem. Although our current implementation of this method lags behind the SVM-based solver, we think that there is room for improvement, and it is thus worth analyzing the causes of errors.

The method uses UDPipe (with the Universal Dependencies 2.5 Models 2019-12-06) to process the sentences of a given word problem. UDPipe returns information on word classes, sentence elements, and their dependencies in the CoNLL-U format (https://universaldependencies.org/format.html). This information allows us to construct a dependency tree.

We apply several algorithms working over the tree to retrieve structured information on the entities encoded in the assignment. For each number N, it is extracted which entity E1 the number refers to and which entity E2 is the owner of E1. Let us demonstrate this for the sentence:

Alenka si koupila 5 čokoládových bonbónů.
(Alice bought 5 chocolate candies.)

Figure 5 shows the dependency tree constructed from the data returned by UDPipe.

[Figure 5: A dependency tree example. The root is the verb koupila (VERB); its children are Alenka (NOUN, nsubj), si (PRON, obl), bonbónů (NOUN, obj), and the final punctuation (PUNCT, punct); bonbónů has the children 5 (NUM, nummod:gov) and čokoládových (ADJ, amod).]

The structure that is supposed to be extracted from the tree is visualized in Figure 6.

[Figure 6: Extracted information. The number 5 refers to the entity bonbónů (candies), modified by čokoládových (chocolate), whose owner is Alenka (Alice).]

A given word problem is solved in three phases designed to handle a variety of assignment formulations:

1. The preprocessing phase splits compound sentences and evaluates each part separately. It also replaces phrases like "3 times more than" by numbers, i.e., if applicable, it computes intermediate results and passes them to the next phases.

2. The structure creation phase connects numbers and entities. This is done by traversing the dependency tree from the nodes representing numbers (a toy sketch follows this list). When a structure is extracted, a vocabulary of verbs that represent a subtraction is used to adjust the sign of the number (i.e., unlike in the previous methods, verb semantics is taken into account here).

3. The evaluation phase checks the question part and guesses which entity E the word problem queries for. The answer is derived from the extracted structures that relate to E.
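To make the CoNLL-U input and the number-to-entity extraction of Figures 5 and 6 concrete, here is a toy sketch. The CoNLL-U rows are written by hand to approximate Figure 5 (the real system obtains them from UDPipe), and taking the sentence subject as the owner is a deliberate simplification of the authors' traversal rules.

<pre>
def parse_conllu(block):
    """Minimal CoNLL-U reader keeping ID, FORM, UPOS, HEAD and DEPREL."""
    rows = {}
    for line in block.splitlines():
        if line and not line.startswith("#"):
            cols = line.split("\t")
            rows[int(cols[0])] = (cols[1], cols[3], int(cols[6]), cols[7])
    return rows

def number_structures(rows):
    """For every numeral, pair it with the entity it counts (its head) and
    with the owner, here approximated by the sentence subject (nsubj)."""
    owner = next((form for form, _, _, rel in rows.values()
                  if rel == "nsubj"), None)
    return [{"number": form, "entity": rows[head][0], "owner": owner}
            for form, upos, head, _ in rows.values()
            if upos == "NUM" and head in rows]

# Hand-written rows approximating Figure 5 (ID, FORM, LEMMA, UPOS, XPOS,
# FEATS, HEAD, DEPREL, DEPS, MISC).
ROWS = [
    ("1", "Alenka", "Alenka", "NOUN", "_", "_", "3", "nsubj", "_", "_"),
    ("2", "si", "se", "PRON", "_", "_", "3", "obl", "_", "_"),
    ("3", "koupila", "koupit", "VERB", "_", "_", "0", "root", "_", "_"),
    ("4", "5", "5", "NUM", "_", "_", "6", "nummod:gov", "_", "_"),
    ("5", "čokoládových", "čokoládový", "ADJ", "_", "_", "6", "amod", "_", "_"),
    ("6", "bonbónů", "bonbón", "NOUN", "_", "_", "3", "obj", "_", "_"),
    ("7", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"),
]
conllu = "\n".join("\t".join(row) for row in ROWS)
print(number_structures(parse_conllu(conllu)))
# -> [{'number': '5', 'entity': 'bonbónů', 'owner': 'Alenka'}]
</pre>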
====5.1 Experiments====

To tune the method and to analyze the errors encountered during testing, we considered only a subset of WP500CZ consisting of 150 word problem instances, denoted here as WP150CZ. The reason is that analyzing the error types required a manual approach; it would therefore be too laborious to work with the entire dataset.

The method achieved 76% accuracy on the training part of WP150CZ (formed of 75 instances) and 67.1% accuracy on the testing part. However, the performance was much worse when the method was evaluated against all instances of WP500CZ (which took 23.5 s), solving only 41% of the word problems correctly. This indicates that WP500CZ is more diverse than WP150CZ and that the rules designed based on WP150CZ are not general enough.

Table 4 shows a detailed analysis of the errors made by the method on WP150CZ. The analysis reveals that UDPipe itself is a non-negligible cause of errors, since it often returns partially incorrect outputs and our method is not able to recover from these errors.

<pre>
        UDPipe   preproc.   creation   eval.   total
train      7        3          5         3      18
test       9        5          9         1      24
</pre>
Table 4: Errors by type made by the solver on WP150CZ.

===6 Conclusion===

Our paper is an initial study of the automatic evaluation of math word problems in Czech. We have reviewed the state-of-the-art methods for inputs in English and proposed three solvers, based on pattern matching, SVM, and syntactic analysis. We have also contributed a dataset of 500 annotated word problems in Czech.

The main findings and new ideas of our work can be summarized as follows.

• Pattern extraction is a less efficient approach when used as a standalone solver for word problems entered in a language with flexible word order; it is applicable only to a portion of the inputs. However, we showed that it can be used well in combination with other solvers, since a matched pattern usually results in a correct classification. Another success is that we managed to implement a fully automatic pattern extraction.

• A semantic-independent representation of word problems, considering only the frequencies of words, turned out to be sufficient to obtain a relatively accurate classifier. The trained SVM performed better than the straightforward sMAX classifier. This might be interpreted as meaning that the SVM was able to find nontrivial relationships in the used feature vectors.

• We have tested the suitability of existing linguistic tools for the studied task. The method based on syntactic analysis processed dependency trees produced by UDPipe. The method was not able to recover from errors occasionally made by UDPipe. Improving the accuracy of UDPipe is thus expected to result in a higher accuracy of the method.

The accuracy of sSVM+PAT on WP500CZ (74.9%) was not too far from the accuracy of the state-of-the-art solvers on Alg514 (70% and 82.5% for the methods from [9] and [6], respectively); it must, however, be said that the complexity of some instances in Alg514 is out of the scope of our methods. A fairer comparison with the state of the art would require creating an English version of WP500CZ, or adapting the state-of-the-art solvers to inputs in Czech.

It must also be noted that WP500CZ is a small dataset (however, as we explained in the introduction, the problem has not been studied for non-English languages by others, and there are no publicly available datasets of word problems for, e.g., Slavic languages). A worse performance can be expected if sSVM+PAT is applied to larger and more diverse datasets. A broad expansion of the dataset and the creation of multilingual versions can therefore be identified as the goal of our further work. Besides enabling much more thorough testing, it would also allow us to apply deep learning, which is expected to improve the quality of the results. In addition to our future plans, we also hope that the presented research can inspire others to consider non-English inputs for the studied problem.

===Acknowledgment===

We thank the anonymous reviewers whose suggestions helped improve this paper. Our work was supported by the Czech Science Foundation grant no. 19-21198S.

===References===

[1] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin, Heidelberg, 2006.

[2] Daniel G. Bobrow. Natural language input for a computer problem solving system. Technical report, USA, 1964.

[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. The MIT Press, 3rd edition, 2009.

[4] Edward A. Feigenbaum and Julian Feldman. Computers and Thought. McGraw-Hill, Inc., USA, 1963.

[5] Mohammad Javad Hosseini, Hannaneh Hajishirzi, Oren Etzioni, and Nate Kushman. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 523–533, Doha, Qatar, October 2014. Association for Computational Linguistics.
[6] Danqing Huang, Jing Liu, Chin-Yew Lin, and Jian Yin. Neural math word problem solver with reinforcement learning. In Proceedings of the 27th International Conference on Computational Linguistics, pages 213–223, Santa Fe, New Mexico, USA, August 2018. Association for Computational Linguistics.

[7] Danqing Huang, Shuming Shi, Chin-Yew Lin, Jian Yin, and Wei-Ying Ma. How well do computers solve math word problems? Large-scale dataset construction and evaluation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 887–896, Berlin, Germany, August 2016. Association for Computational Linguistics.

[8] Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Dominika Kováříková, Tomáš Jelínek, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, and Adrian Zasina. SYN2015: representative corpus of written Czech, 2015. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[9] Nate Kushman, Yoav Artzi, Luke Zettlemoyer, and Regina Barzilay. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 271–281, Baltimore, Maryland, June 2014. Association for Computational Linguistics.

[10] Marie Reischigová. Matematika na základní a obecné škole ve slovních úlohách. Pansofia, 1996. In Czech.

[11] Milan Straka and Jana Straková. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 88–99, Vancouver, Canada, August 2017. Association for Computational Linguistics.

[12] Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, and Heng Tao Shen. The gap of semantic parsing: A survey on automatic math word problem solvers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.