<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Graphs for Mathematics Word Problems based on Mathematics Terminology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Rogers Jeffrey Leo John</string-name>
          <email>rl2689@columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas S. McTavish</string-name>
          <email>tom.mctavish@pearson.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rebecca J. Passonneau</string-name>
          <email>becky@ccls.columbia.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Computational Learning Systems, Columbia University</institution>
          ,
          <addr-line>New York, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Center for Digital Data, Analytics &amp; Adaptive Learning</institution>
          ,
          <addr-line>Pearson, Austin, TX</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a graph-based approach to discover and extend semantic relationships found in a mathematics curriculum to more general network structures that can illuminate relationships within the instructional material. Using words representative of a secondary-level mathematics curriculum that we identified in separate work, we constructed two similarity networks of word problems in a mathematics textbook, and used analogous random walks over the two networks to discover patterns. The two graph walks provide similar global views of problem similarity within and across chapters, but are affected differently by the number of math words in a problem and by math word frequency.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        As a means to automatically discover relationships among
learning objects and to reveal their knowledge components,
we demonstrate the use of direct similarity metrics and
random graph walks to relate exercises in a mathematics
curriculum. We first apply a standard cosine similarity measure
between pairs of exercises, based on bag-of-words vectors
consisting of math terms that we identified in separate work
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Then, to extract less explicit relationships between
exercises, we randomly walk a graph using the cosine similarity
as edge weights. We also recast the problem as a bipartite
graph with exercises on one side and words on the other,
providing an edge when an exercise contains the math word.
We contrast these two different types of random walks and
find somewhat similar results, which lends confidence to the
analysis. The bipartite graph walks, however, are more
sensitive to differences in word frequency. Casting measures of
similarity as graphs and performing random walks on them
affords more nuanced ways of relating objects, which can be
used to build more granular domain models for analysis of
prerequisites, instructional design, and adaptive learning.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        Random walks over graphs have been used extensively to
measure text similarity. Applications include similarity of
web pages [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and other documents [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], citations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
passages [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], person names in email [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and so on. More
recently, general methods that link graph walks with external
resources like WordNet have been developed to produce a
single system that handles semantic similarity for words,
sentences or text [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Very little work compares walks over
graphs of the same content, where the graphs have different
structure. We create two different kinds of graphs for
mathematics word problems and compare the results. We
find that the global results are very similar, which is good
evidence for the general approach, and we find differences in
detail that suggest further investigation could lead to
customizable methods, depending on needs.
      </p>
      <p>
        An initiative where elementary science and math tests are a
driver for artificial intelligence has led to work on knowledge
extraction from textbooks. Berant et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] create a system
to perform domain-specific deep semantic analysis of 48
paragraphs from a biology textbook for question answering.
Extracted relations serve as a knowledge base against which
to answer questions, and answering a question is treated as
finding a proof. A shallow approach to knowledge extraction
from a fourth grade science curriculum is taken in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and
the knowledge base is extended through dialog with users
until a path in the knowledge network can be found that
supports a known answer. In the math domain, Kushman et
al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] generate a global representation of algebra problems
in order to solve them by extracting relations from sentences
and aligning them. Seo et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] study text and diagrams
together in order to understand the diagrams better through
textual cues. We are concerned with alignment of content
across rather than within problems, and our objective is
finer-grained analysis of curricula.
      </p>
      <p>Figure 1: A sample word problem. Two machines produce the same type of widget. Machine
A produces W widgets, X of which are damaged. Machine B
produces Y widgets, Z of which are damaged. The fraction of
damaged widgets for Machine A is X/W or (simplified fraction).
The fraction of damaged widgets for Machine B is Z/Y or
(simplified fraction). Write each fraction as a decimal and a
percent. Use pencil and paper. Select a small percent that
would allow for a small number of damaged widgets. Find
the number of widgets by which each machine exceeded the
acceptable number of widgets.</p>
      <p>
        Other work that addresses knowledge representation from
text includes ontology learning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which often focuses on
the acquisition of sets of facts from text [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. There has
been some work on linking lexical resources like WordNet
or FrameNet to formal ontologies [
        <xref ref-type="bibr" rid="ref13 ref17">17, 13</xref>
        ], which could
provide a foundation for reasoning over facts extracted from
text. We find one work that applies relation mining to
e-learning: Simko and Bielikova [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] apply automated relation
mining to extract relations to support e-course authoring in
the domain of teaching functional programming. Li et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] apply k-means clustering to a combination of problem
features and student performance features, and propose that the
clusters correspond to Knowledge Components [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. METHODS 3.1 Data</title>
      <p>We used 1800 exercises from 17 chapters of a Grade 7
mathematics curriculum. Most are word problems, as illustrated in
Figure 1. They can incorporate images, tables, and graphs,
but for our analysis, we use only the text. The
vocabulary of the resulting text consists of 3,500 distinct words.
We construct graphs where math exercises are the nodes, or
in a bipartite graph, math exercises are the left side nodes
and words are the right side nodes. Our initial focus is on
exercise similarity due to similarity of the math skills that
exercises tap into, and we use mathematics terminology as
an indirect proxy of skills a problem draws upon.</p>
    </sec>
    <sec id="sec-4">
      <title>3.2 Math Terminology</title>
      <p>The text of the word problems includes ordinary language
expressions unrelated to the mathematics curriculum, such
as the nouns machines and widgets in the problem shown in
Figure 1, or the verbs produces and damaged. For our purposes,
mathematics terminology consists of words that express
concepts that are needed for the mathematical competence
the curriculum addresses. To identify these terms, we
developed annotation guidelines for human annotators who label
words in their contexts of use, and assessed the reliability of
annotation by these guidelines. Words can be used in the
math texts sometimes in a math sense and sometimes in a
non-math sense. Annotators were instructed to label terms
based on the most frequent usage.</p>
      <p>
        Using a chance-adjusted agreement coefficient in [-1, 1] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
reliability among three annotators was 0.81, representing
high agreement. All the non-stop words were then labeled
by a trained annotator. We developed a supervised machine
learning approach to classify vocabulary into math and
non-math words [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that can be applied to new mathematics
curricula. For the text used here, there were 577 math terms.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3.3 Random Walks in Graphs</title>
      <p>A random walk on a graph starts at a given node and steps
with random probability to a neighboring node. The same
random decision process is employed at this and every
subsequent node until a termination criterion is met. Each time
a node is visited, it is counted. Open random walks require
that the start and end nodes differ. Traversal methods
may employ a bias to navigate toward or away from certain
neighbors through edge weights or other graph attributes.
In a graph G = (V, E) with nodes V and edges E, a
random walk that begins at v_x and ends at v_y can be denoted as
(v_x, ..., v_y). By performing many random walks, the
fraction of times the node v_y is visited converges to the
probability of target v_y being visited given the start node v_x,
which can be expressed as P(v_y | v_x) under the conditions of
the walk. In the case of a random walk length of 1, P(v_y | v_x)
will simply measure the probability of v_y being selected as
an adjacent node to v_x.</p>
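      <p>The estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name estimate_visit_probabilities, the dictionary-of-dictionaries graph layout, and the toy graph are assumptions introduced here, and the sketch counts every node reached after leaving the start node, which is only one reading of how visits are counted.</p>
      <preformat>
```python
import random
from collections import Counter


def estimate_visit_probabilities(graph, start, steps, n_walks, seed=0):
    """Estimate P(v_y | v_x) as the fraction of visits that land on each node.

    graph maps a node to {neighbor: edge weight} (hypothetical layout).
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(n_walks):
        node = start
        for _ in range(steps):
            neighbors = list(graph[node])
            if not neighbors:
                break  # dead end: no outgoing edges to follow
            weights = [graph[node][nbr] for nbr in neighbors]
            # Step to a neighbor with probability proportional to edge weight.
            node = rng.choices(neighbors, weights=weights, k=1)[0]
            visits[node] += 1
    total = sum(visits.values())
    return {node: count / total for node, count in visits.items()}


# Toy weighted graph; node names and weights are illustrative only.
toy_graph = {
    "a": {"b": 1.0, "c": 3.0},
    "b": {"a": 1.0},
    "c": {"a": 3.0},
}
one_step = estimate_visit_probabilities(toy_graph, "a", steps=1, n_walks=10000)
```
</preformat>
      <p>With a walk length of 1, the estimates approach the normalized edge weights out of the start node (0.25 for b and 0.75 for c in the toy graph), which corresponds to the one-step case described above.</p>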
    </sec>
    <sec id="sec-6">
      <title>3.4 Cosine Similarity Graph</title>
      <p>Math exercises are represented as bag-of-words vectors with
boolean values to indicate whether a given math term is
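present. Cosine similarity quantifies the angle between
two vectors t and e, and is given by their dot product divided by the product of their norms:
        <disp-formula>
          <label>(1)</label>
          <tex-math>\cos(t, e) = \frac{t \cdot e}{\|t\| \, \|e\|} = \frac{\sum_{i=1}^{n} t_i e_i}{\sqrt{\sum_{i=1}^{n} (t_i)^2} \, \sqrt{\sum_{i=1}^{n} (e_i)^2}}</tex-math>
        </disp-formula>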
Similarity values of 1 indicate that both the vectors are the
same whereas a value of zero indicates orthogonality between
the two vectors. Pairwise cosine similarities for all 1800
exercises were computed, yielding a cosine similarity matrix
Mcos. The matrix corresponds to a graph where non-zero
cosine similarities are edge weights between exercises.
In a graph walk, the probability that a node v_y will be
reached in one step from a node v_x is given by the
product of the degree centrality of v_x and the normalized edge
weight (v_x, v_y). With each exercise as a starting node, we
performed 100,000 random walks on the cosine-similarity
graph, stepping with probability proportional to the
outgoing cosine similarity weights. To measure second degrees of
separation, we took two steps in each walk.</p>
      <p>For two math vectors considered as the sets A and B, cosine
similarity can be conceptualized in terms of the intersection
set C = A ∩ B and the set differences A \ B and B \ A. Cosine
similarity is high when |C| ≫ |A \ B| and |C| ≫ |B \ A|.
The degree of a node affects the probability of traversing
any edge from that node. The two factors that affect the
degree centrality of a start node are the document frequencies
of its math words, and the total number of math words.
Here, document frequency (df) is the normalized number of
exercises a word occurs in. A high-df math word in a
problem increases its degree centrality because there will be more
problems it can share words with, resulting in non-zero
cosine values and therefore edges. The number of math words
in a problem also increases its degree centrality.</p>
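      <p>As a rough illustration of this section, the sketch below computes cosine similarity for boolean term vectors stored as sets, builds the graph of non-zero similarities, and normalizes the outgoing weights of a node into one-step probabilities. The function names, the exercise identifiers, and the math terms are hypothetical, and the plain weight normalization used here is an assumption that may differ in detail from the study's walk.</p>
      <preformat>
```python
import math


def cosine_boolean(terms_a, terms_b):
    """Cosine similarity of two boolean term vectors, given as sets of math terms."""
    if not terms_a or not terms_b:
        return 0.0
    shared = len(terms_a.intersection(terms_b))  # size of the shared set = dot product
    return shared / math.sqrt(len(terms_a) * len(terms_b))


def cosine_graph(exercises):
    """Return {exercise: {neighbor: weight}}, keeping only non-zero similarities."""
    graph = {ex: {} for ex in exercises}
    ids = sorted(exercises)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            weight = cosine_boolean(exercises[a], exercises[b])
            if weight:
                graph[a][b] = weight
                graph[b][a] = weight
    return graph


def step_probabilities(graph, node):
    """Normalize the outgoing weights of a node into one-step probabilities."""
    total = sum(graph[node].values())
    return {nbr: w / total for nbr, w in graph[node].items()}


# Hypothetical exercises and math terms, for illustration only.
exercises = {
    "ex1": {"fraction", "decimal", "percent"},
    "ex2": {"fraction", "percent"},
    "ex3": {"fraction", "ratio", "rate", "unit"},
    "ex4": {"angle"},
}
graph = cosine_graph(exercises)
probs_from_ex1 = step_probabilities(graph, "ex1")
```
</preformat>
      <p>In the toy data, ex1 and ex2 share two terms and differ in only one, so their weight is 2/sqrt(6), about 0.82. The high-frequency term fraction gives ex3 an edge to both of them, while ex4 shares no term with any other exercise and is left without edges.</p>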
    </sec>
    <sec id="sec-7">
      <title>3.5 Bipartite exercise and word graph</title>
      <p>The set of exercises V_e are the left-side nodes and the math
words V_w are the right-side nodes in the undirected bipartite
graph G = (V_e, V_w, E), where an edge exists between the exercise node v_{e,x} and
the word node v_{w,i} if exercise x contains the math word i.</p>
      <p>We performed open random walks on this graph to measure
similarity between nodes. To measure the similarity of
exercises, we walk in even steps: a step to a connected word
followed by a step back to one of the exercises that shares
that word. The degrees of separation between vertices on
the same side of the graph (e.g., exercise-to-exercise) will be
l/2, where l is the length of the walk. In this paper, we
explored first and second degrees of separation, so our bipartite
graph walks had a length of 4.
Because exercise nodes are connected via word nodes, we
interpret the fraction of node visits as a similarity measure
between the source node and any node visited. We performed
100,000 random walks from each node. Exercise-to-exercise
similarity can be visualized as square matrices with source
nodes in the rows and target nodes in the columns. To
factor out the times a source may have been selected as one of
the targets, we set the diagonal of the matrix to zero. We
then normalized across the rows so that we could interpret
the distribution across the row as a probability distribution
to all other nodes for that source node.</p>
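      <p>The procedure above can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: the function name bipartite_similarity, the toy exercises, and the choice to count every exercise node reached at an even step are assumptions introduced here, and the default of 1,000 walks is far smaller than the 100,000 walks used in the paper.</p>
      <preformat>
```python
import random


def bipartite_similarity(exercise_words, walk_length=4, n_walks=1000, seed=0):
    """Row-normalized exercise-to-exercise visit fractions from bipartite walks."""
    rng = random.Random(seed)
    # Sorted lists make the sampling reproducible for a fixed seed.
    words_of = {ex: sorted(words) for ex, words in exercise_words.items()}
    exercises_of = {}
    for ex in sorted(words_of):
        for word in words_of[ex]:
            exercises_of.setdefault(word, []).append(ex)
    ids = sorted(words_of)
    matrix = {src: {dst: 0.0 for dst in ids} for src in ids}
    for src in ids:
        if not words_of[src]:
            continue  # an exercise without math words has no edges
        for _ in range(n_walks):
            node = src
            for _ in range(walk_length // 2):
                word = rng.choice(words_of[node])      # step from exercise to word
                node = rng.choice(exercises_of[word])  # step from word to exercise
                matrix[src][node] += 1.0
    for src in ids:
        matrix[src][src] = 0.0  # factor out returns to the source
        total = sum(matrix[src].values())
        if total:
            for dst in ids:
                matrix[src][dst] /= total  # each row becomes a distribution
    return matrix


# Hypothetical exercises and math terms, for illustration only.
toy = {
    "ex1": {"fraction", "percent"},
    "ex2": {"fraction"},
    "ex3": {"percent", "ratio"},
    "ex4": {"angle"},
}
m_bp = bipartite_similarity(toy)
```
</preformat>
      <p>In the toy data, ex2 and ex3 share no math word, yet their entry in m_bp is non-zero because ex1 links them. This is the kind of second-degree relationship that a walk of length 4 is meant to surface.</p>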
    </sec>
    <sec id="sec-8">
      <title>4. RESULTS</title>
      <p>We compare the three measures of similarity between
exercises: 1) cosine similarity, 2) random walks using cosine
similarity as edge weights, and 3) random walks along a
bipartite graph of exercises and words.</p>
    </sec>
    <sec id="sec-9">
      <title>4.1 Exercise-to-Exercise Similarity</title>
      <p>We describe exercise-to-exercise similarity with square
matrices where each exercise is represented as a row-column. A
number of features of the measures are embedded in Figure
2, which shows heatmaps of color values for pairs of exercises
in chapter 6 for each matrix. We find that within chapters
and especially within sections of those chapters, there is a
high degree of similarity between exercises regardless of the
measure. This demonstrates that exercises within sections and
chapters share a common vocabulary. We can see that Mcos
has more extreme values than Mcosrw; as explained below,
it has both more zero cosine values, and more very high
values. This is most likely because Mcosrw, from doing the
walk, picks up exercises that are another degree of
separation away. When the row of the matrix is normalized to
capture the distribution of the source node, the otherwise
high values from Mcos are tempered in the Mcosrw matrix.
This shift to a large number of lower scores is shown in the
bottom panel of Figure 3. Mbp and Mcosrw are very similar,
but Mbp generally has a wider dynamic range.</p>
    </sec>
    <sec id="sec-10">
      <title>4.2 Comparison of the Graph Walks</title>
      <p>Table 1 provides summary statistics for cosine similarity and
the two random walks for all pairs of problems (N=3,250,809).
The cosine matrix is very sparse, as shown by the median
value of 0. Of the two random walk similarities, rwcos has
a lower standard deviation around the mean, but otherwise
the two random walks produce similar distributions.
The similarity values given by cosine and the cosine random
walk will increasingly differ the more that the start problem
has relatively higher degree centrality due either to more
words or higher frequency of words in exercises (df). For
reference, the word that occurs most frequently, number,
has a df of 0.42, and the second most frequent occurs in
only 15% of the exercises. Fifty-eight nodes have no edges
(0 degree), the most frequent number of edges is 170, and
the maximum is 1,706. Table 2 gives the summary statistics
for df, number of math words, and degree centrality.
Inspection of the data shows that for pairs of problems in
the two chapters for our case study, if the cosine
similarity between a pair is high (≥ 0.75), the similarity values
for rwcos tend to go down as the number of shared words
increases from 3 to between 5 and 7. For the rwbp, the
opposite trend occurs, where the similarity goes up as the
number of words increases. This difference helps account for
an observed divergence in the two graph walks for sections
5 and 6 of Chapter 6.</p>
      <p>Table 3 illustrates two pairs of problems from section 5 that
have high cosine similarities, and relatively higher rwbp
similarities (greater than the rw means of 0.0055) and relatively
lower rwcos (lower than the rw means). The reverse pattern
is seen for two pairs of problems from section 6 that have
high cosine similarities. These problems have higher than
average rwcos and lower than average rwbp. What
differentiates the two pairs of problems is that the section 5
problems have a relatively large number of words in common:
14 for the first pair, 12 for the second pair. In both pairs,
some of the words have relatively high document frequency.
As discussed above, these two properties increase the degree
centrality of the start node of a step in the rwcos graph, and
thus lower the probability of hitting each of the start node's
one-degree neighbors. This effect propagates along the two
steps of the walk. For the rwbp graph, however, as the
number of shared math words for a pair of problems increases,
the number of paths from one to the other also increases,
thus raising the probability of the traversal. This effect also
propagates through a two-step walk. In contrast to the
section 5 problems, the two section 6 problems have relatively
fewer words in common: 3 for both pairs.</p>
      <p>For problem pairs where the cosine similarity is between
0.40 and 0.60, the mean similarity from rwbp is 30% higher
than for rwcos when the number of math words in
common is 3 (0.0033 versus 0.0043), 80% higher when the number
of math words in common is 6 (0.0024 versus 0.0045), and
three times as high when the number of math words in common
is 9 (0.0023 versus 0.0068). For problem pairs where the
cosine similarity is less than 0.20, the two walks produce
very similar results. The average similarity values for the
bipartite walk are about 20% higher, and the maximum
values are higher, but the two walks produce similar means,
independent of the lengths of the common word vectors, or
the total number of math words.</p>
      <p>Since we normalized the matrices across rows, which are
the source nodes, differences between the bipartite matrix,
Mbp, and the cosine matrices implied that the degree of the
target node had a greater impact on the variability in the
bipartite matrix. To measure the impact of the edge degree
on the target nodes, we considered the column sum for those
targets that had 1 edge, those that had 2, etc. up to 20
edges. The results are summarized in Figure 4. As can be
seen, the column sum varies linearly by the number of target
edges in the bipartite matrix, whereas the cosine matrices
do not. We found that the cube root of the column sum in Mbp
approaches the distribution of column sums of the cosine
matrices, which is also shown in Figure 4.</p>
    </sec>
    <sec id="sec-11">
      <title>5. CONCLUSION</title>
      <p>Visualization of the three similarity matrices shows they
reveal the same overall patterns, thus each is confirmed by
the others. However, the bipartite walk was the most
sensitive to word frequency across exercises, and the number
of words in problems. With our goal of automatically
discovering knowledge components and identifying their
relationships, the random walk that stepped in proportion to
its cosine similarity performed best. It was able to discover
second-degree relationships that seem reasonable when we
inspect those matches by eye. Future work will test these
relationships with student performance data. We should find,
for example, that if two exercises are conceptually similar,
then student outcomes should also be similar and learning
curves should reveal shared knowledge components. In this
respect, such automatically constructed knowledge graphs
can create more refined domain models that intelligent
tutoring systems and robust assessments can be built upon.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Janssen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Milios</surname>
          </string-name>
          .
          <article-title>Characterizing and mining the citation graph of the computer science literature</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>6</issue>
          ):
          <fpage>664</fpage>
          –
          <lpage>678</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Berant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vander Linden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Modeling biological processes for reading comprehension</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1499</fpage>
          –
          <lpage>1510</lpage>
          , Doha, Qatar, October
          <year>2014</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hartung</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Racioppa</surname>
          </string-name>
          .
          <article-title>Ontology-based information extraction and integration from heterogeneous data sources</article-title>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Betteridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kisiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Settles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Hruschka Jr.</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
          <article-title>Toward an architecture for never-ending language learning</article-title>
          .
          <source>In Proceedings of the 24th Conference on Artificial Intelligence (AAAI)</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>1306</fpage>
          –
          <lpage>1313</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <article-title>LexRank: Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>J. Artif. Int. Res.</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>457</fpage>
          –
          <lpage>479</lpage>
          , Dec.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          .
          <article-title>Learning knowledge graphs for question answering through conversational dialog</article-title>
          .
          <source>In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Denver, CO, May-June
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R. J. L.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Passonneau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. S.</given-names>
            <surname>McTavish</surname>
          </string-name>
          .
          <article-title>Semantic similarity graphs of mathematics word problems: Can terminology detection help</article-title>
          ?
          <source>In Proceedings of the Eighth International Conference on Educational Data Mining</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Corbett</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Perfetti</surname>
          </string-name>
          .
          <article-title>The knowledge-learning-instruction (KLI) framework: Toward bridging the science-practice chasm to enhance robust student learning</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>36</volume>
          (
          <issue>5</issue>
          ):
          <fpage>757</fpage>
          –
          <lpage>798</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          .
          <article-title>Content analysis: An introduction to its methodology</article-title>
          .
          <source>Sage Publications</source>
          , Beverly Hills, CA,
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Barzilay</surname>
          </string-name>
          .
          <article-title>Learning to automatically solve algebra word problems</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>271</fpage>
          –
          <lpage>281</lpage>
          , Baltimore, Maryland, June
          <year>2014</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          .
          <article-title>Discovering student models with a clustering algorithm using problem content</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Educational Data Mining</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Minkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>Contextual search and name disambiguation in email using graphs</article-title>
          .
          <source>In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '06</source>
          , pages
          <fpage>27</fpage>
          –
          <lpage>34</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>I.</given-names>
            <surname>Niles</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pease</surname>
          </string-name>
          .
          <article-title>Mapping WordNet to the SUMO ontology</article-title>
          .
          <source>In Proceedings of the IEEE International Knowledge Engineering Conference</source>
          , pages
          <fpage>23</fpage>
          –
          <lpage>26</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Otterbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Erkan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <article-title>Biased LexRank: Passage retrieval using random walks with question-based priors</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>42</fpage>
          –
          <lpage>54</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Winograd</surname>
          </string-name>
          .
          <article-title>The PageRank citation ranking: Bringing order to the web</article-title>
          .
          <source>Technical Report 1999-66</source>
          ,
          Stanford InfoLab
          ,
          November
          <year>1999</year>
          .
          Previous number = SIDL-WP-1999-0120.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Pilehvar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurgens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Align, disambiguate and walk: A unified approach for measuring semantic similarity</article-title>
          .
          <source>In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1341</fpage>
          –
          <lpage>1351</lpage>
          . Association for Computational Linguistics,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
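          <string-name>
            <given-names>J.</given-names>
            <surname>Scheffczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pease</surname>
          </string-name>
          , and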
          <string-name>
            <given-names>M.</given-names>
            <surname>Ellsworth</surname>
          </string-name>
          .
          <article-title>Linking FrameNet to the Suggested Upper Merged Ontology</article-title>
          . In B. Bennett and C. Fellbaum, editors,
          <source>Formal Ontology in Information Systems</source>
          , pages
          <fpage>289</fpage>
          –. IOS Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Diagram understanding in geometry questions</article-title>
          .
          <source>In Proceedings of the 28th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Simko</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bielikova</surname>
          </string-name>
          .
          <article-title>Automatic concept relationships discovery for an adaptive e-course</article-title>
          .
          <source>In Proceedings of the Second International Conference on Educational Data Mining (EDM)</source>
          , pages
          <fpage>171</fpage>
          –
          <lpage>179</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>