<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mathematical Word Problem Solution Evaluation via Data Preprocessing Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrii D. Nikolaiev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatoliy V. Anisimov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>4d, Glushkova str., Kyiv, 8300</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>94</fpage>
      <lpage>103</lpage>
      <abstract>
<p>This article gives an overview of methods for processing mathematical text problems and of the corresponding domain datasets. A new approach to evaluating MWP solutions is proposed that can use more context around the problem, and it is described how the method can be applied to MWP similarity and to the automated grading of mathematical solutions. Personalized learning has always been in high demand, but quality learning requires an expert: a person or a system for self-control. There are several ways to estimate the level of knowledge; the most common is multiple-choice testing. Modern mathematical platforms are limited in the kinds of answers they can accept and are mostly capable of accepting only answers to MCQ-type (Multiple Choice Question) questions, where the student chooses between correct and incorrect options from the proposed ones. Although this approach provides an opportunity to compare student results objectively, it is not adequate for estimating comprehension of a subject; concept inventories are more indicative. Unfortunately, there are no significant automatic solvers and evaluators for mathematical word problems. This severely limits the possibilities for learning new mathematical concepts and causes the loss of detailed feedback over the learning process. It is therefore important to be able to accept detailed solutions to mathematical problems automatically. The problem is particularly challenging because a wide semantic gap remains between human-readable words and machine-understandable logic. It is also hard to provide relevant feedback about a given solution. That is why we need to dive into several well-known NLP problems connected with the interpretation of mathematical text. 
One of them is designing an automatic solver for mathematical word problems (MWP), work that started back in the 1960s and that has become more popular in recent years due to increased scientific interest in artificial intelligence (AI). Formalization systems can also be applied to solving and understanding mathematical problems (e.g., the proof assistants LEAN [1] and SAD [2]). A proof assistant is a piece of software that provides a language for defining objects, specifying properties of these objects, and proving that these specifications hold. The system checks that these proofs are correct down to their logical foundation. These tools are often used to verify the correctness of programs, but they can also be used for abstract mathematics: in a formalization, all definitions are precisely specified and all proofs are virtually guaranteed to be correct. The International Mathematical Olympiad Grand Challenge was announced in 2020 [3, 4]. The challenge is about creating an AI that can win a gold medal at the IMO. To remove ambiguity of the scoring rules, teams were proposed</p>
      </abstract>
      <kwd-group>
        <kwd>math word problem</kwd>
        <kwd>natural language processing</kwd>
        <kwd>machine learning</kwd>
        <kwd>artificial intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>the formal-to-formal (F2F) variant of the IMO: the AI receives a formal representation of the problem (in
the LEAN Theorem Prover) and is required to emit a formal (i.e., machine-checkable) proof. However,
this way is less practical because of the complexity of describing the problem statement and checking the
correctness of a given solution via such systems.</p>
      <p>The remainder of the paper is organized as follows. The paper starts with a review of AWP solvers in
Section 2, followed by the problem statement and the main steps of the method implementation in Section 3.
Since the dataset deserves special attention, it is summarized, together with the associated ideas for dataset
extension, in Section 4. The general algorithm of the proposed method is described in Section 5; that section
also describes related ideas about math problem similarity and variants of the method's usage in real-case
situations. The paper concludes with the list of references in the final section.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>MWP is a common math problem type used for building solvers. There are several categories of MWP:
Arithmetic Word Problem (AWP), Equation Set (ESS), and Geometric Word Problem (GWP).</p>
      <p>We will take a closer look at AWP-type problems, in which, given the textual condition of the problem,
the task is to compose a corresponding equation and obtain the answer. The problem is represented
by a sequence of words 〈w0, w1, …, wn〉; some of them are the quantities q0, q1, …, qm mentioned in
the text, together with the unknown variable x. The main goal is to present the text problem in the form of the
corresponding equation E, which is linear in this case. Among the basic arithmetic operations there are only
{+, −, ×, ÷}.</p>
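As a small illustration of this representation (our own sketch; function and variable names are assumptions, not part of any cited solver), the quantities q0, q1, … can be pulled out of the word sequence with a regular expression, while composing the equation E itself remains the actual solver's task:

```python
import re

def extract_quantities(problem: str) -> list:
    """Extract the numeric quantities q0, q1, ... mentioned in the text."""
    return [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", problem)]

problem = ("Misha found 221 seashells and 35 starfish on the beach. "
           "He gave 101 of the seashells to Katya. "
           "How many seashells does Misha now have?")
print(extract_quantities(problem))  # [221.0, 35.0, 101.0]
```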
      <sec id="sec-2-1">
        <title>Misha found 221 seashells and 35 starfish on the beach. He gave 101 of the seashells to Katya. How many seashells does Misha now have? OUTPUT EQUATION: x = 221 + 35 – 101</title>
        <p>The different approaches that have been invested in solving math word problems can be categorized
into four main efforts: (1) statistics-based, (2) tree-based, (3) deep learning-based, and (4) rule-based
methods; the rule-based methods were used in the early approaches and we will skip them here.</p>
        <p>Some approaches used a combination of the above categories.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2.1 Statistics-based Approach</title>
      <p>
        The statistics-based approach tries to identify the question entities, their values, and the desired operators
that need to be evaluated in order to achieve the correct answer. The identifications are obtained with
common machine learning methods. Hosseini et al. (2014) suggested solving arithmetic word problems
which include addition and subtraction operations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The problem text is split into parts where each
represents a transition between two world states. The quantities of the entities in each such transition are
updated or observed, and the predicted solution is inferred by changing the world states until the end of the
transitions is reached. Mitra &amp; Baral (2016) used supervised learning to specify the formula that should
be applied to generate the appropriate equation and the relevant variables [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Liang et al. presented similar work using log-linear
models with handcrafted feature engineering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
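The world-state idea above can be sketched in a few lines (a toy illustration under our own naming, not the implementation of [5]): each transition updates an entity's quantity, and the answer is read off the final state.

```python
def run_transitions(initial: dict, transitions: list) -> dict:
    """Apply (entity, delta) updates to a world state mapping entity -> count."""
    state = dict(initial)
    for entity, delta in transitions:
        state[entity] = state.get(entity, 0) + delta
    return state

# "Misha found 221 seashells ... He gave 101 of the seashells to Katya."
final = run_transitions({}, [("seashell", 221), ("starfish", 35), ("seashell", -101)])
print(final["seashell"])  # 120
```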
    </sec>
    <sec id="sec-4">
      <title>2.2 The Tree-based Approach</title>
      <p>
        The tree-based approach focuses on representing the problem in a hierarchical manner. The hierarchy is
represented by a unique tree called a binary expression tree. An expression tree can be evaluated by applying
the operator at the root to the values obtained by recursively evaluating the left and right subtrees.
Koncel-Kedziorski et al. (2015) suggested a system named ALGES which generates a tree over the space of all
valid expression trees, given a math word problem with a single-equation formula as the answer [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It uses
integer linear programming and maximum-likelihood estimator. Roy &amp; Roth (2016) presented the
Expression Tree method [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It uniquely decomposes the math problem into multiple classification
problems and then composes a monotone expression tree, which defines a collection of simple prediction
problems, each determining the lowest common ancestor operation between a pair of quantities mentioned
in the problem. Roy &amp; Roth (2017) defined a new structure named Unit Dependency Graph (UNITDEP),
an annotated graph with vertices for each of the quantities appearing in the problem and edges representing
the relationship between two quantities. The graph is annotated by classifiers for node labeling and edge
properties annotating [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
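The recursive evaluation rule above can be written down directly (a minimal sketch; the node structure and names are ours, not those of ALGES or the Expression Tree method):

```python
from dataclasses import dataclass
from typing import Optional

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

@dataclass
class Node:
    value: str                      # an operator symbol or a numeric literal
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def evaluate(node: Node) -> float:
    """Apply the operator at the root to the recursively evaluated subtrees."""
    if node.left is None and node.right is None:
        return float(node.value)    # leaf: a quantity from the problem text
    return OPS[node.value](evaluate(node.left), evaluate(node.right))

# x = 221 - 101: an expression tree for the seashell problem
tree = Node("-", Node("221"), Node("101"))
print(evaluate(tree))  # 120.0
```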
    </sec>
    <sec id="sec-5">
      <title>2.3 Deep Learning-based Approach</title>
      <p>
        In the deep learning approach, MathDQN by Wang et al., (2018) is a customized form of the general
deep reinforcement learning framework. They defined actions, states and a reward function, and used a
feed-forward network [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The Seq2Seq model proposed by Wang et al. (2017) includes an encoder (a
GRU unit) and a decoder (an LSTM unit) producing an equation template [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. T-RNN (Wang et al., 2019)
also extends the Seq2Seq model. It encodes the quantities using a Bi-LSTM and a self-attention network. It
uses an RNN to construct a tree-structured template with inferred numbers as the leaf nodes and unknown
operators as the inner nodes. The tree-structured template is designed to reduce the size of the template space
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Amini et al. (2019) suggested the Sequence2Program model for multiple-choice math word problems.
It uses an encoder-decoder neural network that maps word problems to a set of feasible operation programs.
The results of the executed operation program are matched against the given multiple-choice options for a
particular problem [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>3. Problem statement</title>
      <p>The aim of this work is to propose an algorithm that is able to identify the significant points in a given
solution of a mathematical text problem and provide a verdict.</p>
      <p>As input, the algorithm takes a solution sample for the given problem statement. The algorithm outputs
a solution score (0 or 1) and the part of the solution that led to the decision (keywords or crucial
points for qualitative estimation).</p>
    </sec>
    <sec id="sec-7">
      <title>3.1 Method implementation main steps</title>
      <p>In order to implement the method, it is important to complete all the steps below:
1. Algorithm. Machine learning methods could be useful for the problem. The approach uses n-gram
extraction as the main source of comparison between solutions, which could be inaccurate. At the same time,
using phrase embeddings to resolve cases such as paraphrases is promising.</p>
      <p>2. Dataset. One of the most important steps in the method implementation is to create a dataset of the
Problem-Solution-Result type. It could help in understanding how solutions are evaluated in future work.</p>
      <p>3. Evaluation metrics. Evaluation metrics must be developed for the given task. This is a vital part, as
the task is non-standard and it is not obvious how to compare different systems: the same equation can
be formulated in different ways. While the F1 score works for standard MWPs, there is no such metric for
the estimation of detailed solutions.</p>
      <p>4. Methods leaderboard. A leaderboard is highly convenient and presentable for comparison
with other methods in future studies. Adding the scores/results of the described methods and their
implementation versions should help in evaluating the different methods provided.</p>
      <p>5. Module implementation. The module needs to be connected to a database so that it can upload a new
solution and evaluate it against the known verdicts. The module could also bring useful
insights into what an “ideal” solution could look like. Therefore, the module itself could be used as a generator
of problem solutions and for finding similarities between problem statements.</p>
      <p>6. Experiments and discussion. After the module implementation, we need to test the proposed
method, evaluate it, and compare it with well-known approaches using NLP methods. The module could also be
tested in real-case situations, as it could be used in the education field for the corresponding math classes in
automation processes.</p>
      <p>This paper provides a first version of the algorithm via machine learning methods, an overview of the
datasets, and some basic ideas of how the proposed algorithm could be used for MWP similarity and in
real-case situations.</p>
    </sec>
    <sec id="sec-8">
      <title>4. Datasets</title>
      <p>To provide a solution for a given type of problem, we could obtain the semantic part via some NLP models.
There is a bunch of datasets used for providing solutions to MWPs.</p>
      <p>
        For the experiments before 2016, the commonly used datasets were [
        <xref ref-type="bibr" rid="ref15">15, 16</xref>
        ]:
• Alg514 (Kushman et al., 2014): the dataset is crawled from algebra.com, a crowd-sourced tutoring
website, and contains only 514 linear algebra problems with 28 equation templates.
      <p>• AI2 (Hosseini et al., 2014): 395 single-step or multi-step arithmetic word problems for
third, fourth, and fifth graders. It involves problems that can be solved with only addition and
subtraction. The dataset is harvested from two websites: math-aids.com and ixl.com.</p>
      <p>• Dolphin1878 (Shi et al., 2015): includes 1,878 number word problems with 1,183 equation
templates, obtained from algebra.com and Yahoo! Answers.</p>
      <p>• DRAW (Upadhyay and Chang, 2016): contains 1,000 algebra word problems from algebra.com,
each annotated with linear equations.</p>
      <p>• SingleEQ (Koncel-Kedziorski et al., 2015): the dataset contains both single-step and
multi-step arithmetic problems and is a mixture of problems from a number of sources, including
math-aids.com, k5learning.com, ixl.com, and a subset of the data from AI2. Each problem involves the operators of
multiplication, division, subtraction, and addition over non-negative rational numbers.</p>
      <p>• Dolphin18K: contains over 18,000 annotated math word problems. It is constructed by
semi-automatically extracting problems, equation systems, and answers from community question-answering
(CQA) web pages. The source data leveraged are the (question, answer, text) pairs in the math category
of Yahoo! Answers.
• MAWPS: another testbed for arithmetic word problems with one unknown variable in the
question. Its objective is to compile a dataset of varying complexity from different websites. Operationally,
it combines the published word problem datasets used in AI2 and some others. There are 2,373 questions
in the harvested dataset [17].</p>
      <p>
• In 2017, Wang et al. presented the Math23K dataset. The dataset contains Chinese math word
problems for elementary school students and is crawled from multiple online education websites [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>The latest datasets are:
• Ape210K (2020): 210K Chinese elementary-school-level math problems [18].</p>
      <p>• In 2021, Arkil et al. used the models Graph2Tree with RoBERTa, GTS with RoBERTa, LSTM Seq2Seq
with RoBERTa, and Transformer with RoBERTa on the data of the presented SVAMP dataset, which contains
1,000 elementary-school-level equation-building problems created by applying carefully chosen
variations to examples sampled from existing datasets [19].</p>
      <p>• In 2021, Dan et al. used a GPT-3 model on the MATH dataset, which was proposed in that work, consists of
12,500 problems from high school math competitions, and also contains a solution for each
problem [20], together with the AMPS pretraining corpus, which consists of Khan Academy and Mathematica data.
AMPS has over 100,000 Khan Academy problems with step-by-step solutions in LaTeX. It also contains
over 5 million problems generated using Mathematica scripts, based on 100 hand-designed modules
covering topics such as conic sections, div grad and curl, KL divergence, eigenvalues, polyhedra, and
Diophantine equations. In total, AMPS contains 23 GB of problems and solutions.</p>
      <p>Yet none of the datasets contains one or more alternative solutions for every problem sample. Therefore,
in order to run experiments for the future method, it is essential to create a Problem-Solution-Result dataset.
Fortunately, we already have the first such corpora (such as AMPS), and we can expect even bigger corpora: for
instance, in 2015 it was mentioned that Baidu had collected over 950 million questions and solutions in
its database for K-21 [16]. Additionally, for our dataset we could use the information from some well-known
sites and databases where problems are stored with the solutions referring to them, e.g.:</p>
      <p>• Use some of the fully solved shortlists of recent IMO competitions, which are available on the official
IMO website, while the solutions of all shortlisted problems from 1959-2009 are available in The IMO
Compendium [21].</p>
      <p>• Export and use anonymized data from messenger chat histories, which are used to process problem
solutions in the corresponding mathematical classes, as more schools have switched to the online format.</p>
    </sec>
    <sec id="sec-9">
      <title>5. Solution Preprocessing Approach</title>
    </sec>
    <sec id="sec-10">
      <title>5.1 Solutions Context for Semantic Extraction</title>
      <p>It is difficult to understand the semantics behind the problem statement alone. So, in order to simplify the
task, we can use more context around the problem.</p>
      <p>As the context, we could use other solutions (both incorrect and correct ones) and the estimation scores
for them.</p>
    </sec>
    <sec id="sec-11">
      <title>5.2 The Algorithm for Solution Evaluation</title>
      <p>For the given text problem P0, we have M solutions S = 〈s1, s2, …, sM〉; each solution si can be presented
as its own sequence of words 〈w0, w1, …, wn〉, some of which are the quantities q0, q1, …, qm. To every
solution corresponds a binary result R: 0 refers to an incorrect solution and 1 to a correct
one. We use one of the mentioned models to extract a vector of keywords (or phrases, together the main features
for the given problem P0) K = 〈k0, k1, …, kt〉 from all the solution data. We train our model on the correlation
between K and R. Every keyword can be positive or negative depending on its correlation with R. Then
for each keyword kj we obtain the F1-score F1(kj) of how often the specific keyword resulted in the correct
evaluation of a given solution, plus a sign factor cj which is 1 in the case of a positive keyword and −1 for a
negative keyword. The final vector K, sorted by the key F1(kj), together with the signed scores F, forms the
set of parameters of our evaluator E: 〈K, F〉.</p>
      <p>Note that it is important to give greater penalties to solutions which contain negative keywords with
high F1-scores, since correct solutions should not include any mistakes.</p>
      <p>On the given problem set we can now generate an evaluator E = 〈K, F〉 for each problem.</p>
    </sec>
    <sec id="sec-12">
      <title>5.3 MWP Similarity</title>
      <p>Therefore, we need to calculate all permutations of the word “AAABB”
&lt;…&gt;. The answer is
SOLVER OUTPUT: 5! / (3! · 2!)
↔</p>
      <sec id="sec-12-1">
        <title>SIMILAR MATH WORD PROBLEM B</title>
      </sec>
      <sec id="sec-12-2">
        <title>We have 5 letters: three letters “A” and two letters “B”.</title>
        <p>How many different 5-letter words could be created by using all the given letters?</p>
      <sec id="sec-12-3">
        <title>OUTPUT SOLUTION:</title>
        <p>Imagine if we have 5 different letters and the same question. Then the answer would be 5!.</p>
      </sec>
      <sec id="sec-12-4">
        <title>But we must not count the repetitions.</title>
      </sec>
      <sec id="sec-12-5">
        <title>Therefore, the answer needs to be divided</title>
        <p>by the number of permutations among the three
letters “A” and the two letters “B”, which is 3!
and 2! respectively. So, the answer is
5! / (3! · 2!).</p>
      </sec>
      <sec id="sec-12-6">
        <title>SOLVER OUTPUT:</title>
        <p>5! / (3! · 2!)</p>
        <p>It could be hard to understand problem similarity via the statement alone. Instead, it is proposed to find
the relation between the evaluators E of the problems. Hence, to understand the similarity between two given
problems and their solutions, we need to compare their keyword vectors K plus their parameters F.</p>
        <p>Solution similarity helps with semantic problem understanding and also helps in classifying the problems
into smaller classes.</p>
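One possible way (our sketch, with assumed dictionary representations) to compare two problems through their evaluator parameters is cosine similarity over keyword-to-signed-F1 maps:

```python
from math import sqrt

def similarity(eval_a: dict, eval_b: dict) -> float:
    """Cosine similarity between two keyword-to-signed-F1 maps."""
    keys = set(eval_a).union(eval_b)
    dot = sum(eval_a.get(k, 0.0) * eval_b.get(k, 0.0) for k in keys)
    na = sqrt(sum(v * v for v in eval_a.values()))
    nb = sqrt(sum(v * v for v in eval_b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical parameters for the two permutation problems above
pa = {"permutations": 0.9, "divide": 0.8, "5!": 0.7}
pb = {"permutations": 0.8, "divide": 0.9, "letters": 0.4}
print(round(similarity(pa, pb), 2))  # 0.81
```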
      </sec>
    </sec>
    <sec id="sec-13">
      <title>5.4 Solutions Preprocessing for MWP</title>
      <p>As far as MWP is mostly about composing appropriate equations based on the text in order to provide the
correct answer, the solution preprocessing approach cannot be used directly.</p>
      <p>However, the solution context could be applied to MWP. For instance, in the given example there was an
incorrect interpretation of the statement.
An example of a correctly solved arithmetic word problem:</p>
      <sec id="sec-13-1">
        <title>SIMILAR MATH WORD PROBLEM</title>
      </sec>
      <sec id="sec-13-2">
        <title>Misha found 221 seashells and 35 starfish on the beach. He gave 101 of the seashells to Katia. How many seashells does Misha now have?</title>
        <p>↔</p>
      </sec>
      <sec id="sec-13-3">
        <title>Vasya counted 10 red cars and 3 green cars while he was walking home. Then he saw 5 more red cars from the window. How many red cars did he see?</title>
        <p>Such irrelevant data can be detected via MWP similarity (5.3). Over a big amount of data, for the given
example it would not be difficult to see that “35 starfish” is irrelevant, because the similar solutions do not
include it, as the problem question does not contain the word “starfish”.</p>
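The filtering idea can be sketched as a toy heuristic (ours, far cruder than the comparison with similar solutions described above): keep a quantity only if the word right after it also occurs in the problem question, which drops the “35 starfish” above.

```python
import re

def relevant_quantities(problem: str, question: str) -> list:
    """Toy heuristic: keep a quantity only if the word right after it
    also occurs in the question text."""
    q_words = set(re.findall(r"\w+", question.lower()))
    return [(int(n), w) for n, w in re.findall(r"(\d+)\s+(\w+)", problem)
            if w.lower() in q_words]

problem = ("Misha found 221 seashells and 35 starfish on the beach. "
           "He gave 101 seashells to Katya.")
print(relevant_quantities(problem, "How many seashells does Misha now have?"))
# [(221, 'seashells'), (101, 'seashells')]
```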
      </sec>
    </sec>
    <sec id="sec-14">
      <title>5.5 Method usage perspectives for real-case situations</title>
      <p>Such a system could be effectively applied to the automated checking of schoolchildren's mathematical
solutions. In order to use the system in real-case situations, it is also important to use some recognition
techniques and teach our method to distinguish where the problem statement and the solution are placed. After
we get the text, we can process it through the mentioned method and obtain the evaluation results.</p>
      <p>Of course, for the given task it is important to obtain high-quality data, which would be difficult with
respect to pupils. However, the practical usage of the system is without a doubt highly promising.</p>
    </sec>
    <sec id="sec-15">
      <title>6. Conclusion</title>
      <p>An overview of well-known methods for MWP and the corresponding domain datasets was given. Also, a
new approach to evaluating MWP solutions was proposed, and an idea of how the module could be
used for MWP similarity was described. The proposed method could be used for the automated grading of
mathematical solutions.</p>
      <p>The latest approaches to MWP have reached a high degree of accuracy [22, 23, 24]; however, on
some other datasets (such as SVAMP) the majority of models cannot provide significant results. That is
because of the semantic gap: natural language texts invariably assume some knowledge implicitly or contain
information noise. Humans know the relevant information, but a computer reasoning from texts must be
given it explicitly. Filling these information gaps is a serious challenge; representation and acquisition of
the necessary background knowledge are very hard AI problems.</p>
      <p>Therefore, there is a great need for more detailed datasets containing not only problems with correct
answers, but also suggested solutions. With the help of the given algorithm, using the preprocessing
approach on the problem sets, our method could use more context around the problem and increase accuracy
on MWP and on even more difficult (such as mathematical olympiad) problems.</p>
    </sec>
    <sec id="sec-16">
      <title>7. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>The Lean theorem prover</article-title>
          . URL https://leanprover.github.io/about/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Verchinine</surname>
            ,
            <given-names>Konstantin</given-names>
          </string-name>
          &amp; Lyaletski, Alexander &amp; Paskevich, Andrey &amp; Anisimov,
          <string-name>
            <surname>A..</surname>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>On Correctness of Mathematical Texts from a Logical and Practical Point of View</article-title>
          ..
          <volume>583</volume>
          -
          <fpage>598</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>IMO Grand Challenge</article-title>
          . URL https://imo-grand-challenge.github.io/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <article-title>The International Mathematical Olympiad (IMO)</article-title>
          . URL https://www.imo-official.org/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hosseini</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kushman</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>Learning to solve arithmetic word problems with verb categorization</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pp.
          <fpage>523</fpage>
          -
          <lpage>533</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Mitra</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Baral</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>Learning to use formulas to solve simple arithmetic problems</article-title>
          .
          <source>In Proceedings of the 54thAnnual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pp.
          <fpage>2144</fpage>
          -
          <lpage>2153</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>C.-C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>Y.-S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.-C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Su</surname>
          </string-name>
          , K.-Y.
          <article-title>A meaning-based statistical english math word problem solver</article-title>
          .
          <source>arXiv preprint arXiv:1803.06064</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Koncel-Kedziorski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabharwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ang</surname>
            ,
            <given-names>S. D.</given-names>
          </string-name>
          <article-title>Parsing algebraic word problems into equations</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>3</volume>
          :
          <fpage>585</fpage>
          -
          <lpage>597</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Solving general arithmetic word problems</article-title>
          .
          <source>arXiv preprint arXiv:1608.01413</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>Unit dependency graph and its application to arithmetic word problem solving</article-title>
          .
          <source>In Thirty-First AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          <article-title>MathDQN: Solving arithmetic word problems via deep reinforcement learning</article-title>
          .
          <source>In Thirty-Second AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Deep neural solver for math word problems</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , pp.
          <fpage>845</fpage>
          -
          <lpage>854</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          <article-title>Template-based math word problem solvers with recursive neural networks</article-title>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Amini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabriel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koncel-Kedziorski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>MathQA: Towards interpretable math word problem solving with operation-based formalisms</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.-Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>W.-Y.</given-names>
          </string-name>
          <article-title>How well do computers solve math word problems? Large-scale dataset construction and evaluation</article-title>
          .
          <source>In Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>