<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dowsing for Math Answers with Tangent-L</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yin Ki NG</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dallas J. Fraser</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Besat Kassaie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Labahn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirette S. Marzouk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Wm. Tompa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>David R. Cheriton School of Computer Science, University of Waterloo</institution>
          ,
          <addr-line>Waterloo, ON</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>N2L 3G1</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DigitalEd</institution>
          ,
          <addr-line>630 Weber Street North, Suite 100, Waterloo, ON</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>N2V 2N2</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>We present our application of the math-aware search engine Tangent-L to the ARQMath Community Question Answering (CQA) task. Our approach performs well, placing in the top three positions out of all 23 submissions, including the baseline runs. Tangent-L, built on the text search platform Lucene, handles math formulae by first converting a formula's Presentation MathML representation into a Symbol Layout Tree, followed by extraction of math tuples from the tree that serve as search terms. It applies BM25+ ranking to all math tuples and natural language terms in a document during searching. For the CQA task, we index all question-answer pairs in the Math Stack Exchange corpus. At query time, we first convert a topic question into a bag of formulae and keywords that serves as a formal query. We then execute the queries using Tangent-L to find the best matches. Finally, we re-rank the matches by a regression model that was trained on metadata attributes from the corpus. Our primary run produces an nDCG′ value of 0.278 and MAP′ value of 0.063, where these are two common measures of quality for ranked retrieval. However, our best performance, an nDCG′ value of 0.345 and MAP′ value of 0.139, is achieved by an alternate run without re-ranking. Follow-up experiments help to explain which aspects of our approach lead to our success.</p>
      </abstract>
      <kwd-group>
        <kwd>Community Question Answering (CQA)</kwd>
        <kwd>Mathematical Information Retrieval (MathIR)</kwd>
        <kwd>Symbol Layout Tree</kwd>
        <kwd>Lucene</kwd>
        <kwd>Mathematics Stack Exchange</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>Mathematical Information Retrieval (MathIR) focuses on using mathematical
formulae and terminology to search and retrieve documents that include
mathematical content [11, 22]. MathIR is important because content expressed in
formal mathematics and formulae is often crucial and non-negligible in STEM
papers [27]. In the recent decade, the MathIR research community has been
growing and developing ever-improved math-aware search systems (e.g. [7, 10, 13,
14, 23–28, 30, 32]). Most of these efforts have been encouraged through a series
of MathIR evaluation workshops at NTCIR-10 [2], NTCIR-11 [3], and
NTCIR-12 [29]. The workshops have provided corpora derived from arXiv and Wikipedia
for traditional ad-hoc retrieval tasks and formula search tasks, and the data and
tasks have since served as benchmarks for the research community.</p>
      <p>The ARQMath Lab (ARQMath) [16, 31] is the first Community Question
Answering (CQA) task with questions involving math data, using collections
from the Math Stack Exchange (MSE),3 a math question answering site. The
training corpus contains approximately 1.1 million mathematical questions and
1.4 million answers, covering MSE threads from the year 2010 to 2018. Like the
NTCIR workshops that preceded it, ARQMath is centered around an evaluation exercise
that aims to advance math-aware search and the semantic analysis of
mathematical notation and texts. The main task of ARQMath is the answer retrieval task,
in which participating systems need to find answers to a set of mathematical
questions among previously posted answers on MSE. A secondary task considers
matching relevant formulae drawn from the mathematical questions from the
same collection. Participating teams were asked to submit up to five runs for
either or both tasks, and selected results received relevance assessments from
human evaluators.</p>
      <p>Related tasks to the ARQMath Lab task have been held previously: a recent
math question answering task was held as part of SemEval-2019 [12],
following CQA challenge series held at SemEval-2015 [18], SemEval-2016 [19], and
SemEval-2017 [17]. The math question answering task at SemEval-2019
considered a math question set that was derived from Math SAT practice exams. This
task was different from the ARQMath CQA task, since the data does not involve
question-answer threads from a community forum, and the task targeted
identification of one uniquely correct answer by multiple-choice selection or by
numerical computation, instead of retrieving all relevant answers from a corpus. On
the other hand, the earlier CQA challenge series at SemEval involved
question-comment threads from the Qatar Living Forum, which is a data collection that is
similar to the MSE collection. This CQA challenge series, however, differs from
the ARQMath CQA task in that the questions are not necessarily mathematical,
and the task objective is answer-ranking instead of answer retrieval from a
corpus. Besides SemEval tasks, related tasks under the question-answering context
were also held previously at TREC, CLEF, and NTCIR ([1], [20]), but the data
involved was not drawn from the mathematical domain and the data did not
follow a community-forum structure in general.</p>
      <p>Our team of MathDowsers4 participated in the ARQMath CQA task, with an
approach based on the Tangent-L system, a math-aware search engine proposed
by Fraser et al. [9]. Tangent-L is a traditional math-aware query system
developed after NTCIR-12, using the data provided for all three NTCIR math search
workshops, and appeared to be competitive with the systems participating in
those workshops. We wished to determine whether it could, in fact, perform well
against other traditional math-aware query systems and whether a traditional
math-aware query system could compete with modern machine-learning
approaches that might be adopted by other workshop participants in a
question-answering task.
3 https://math.stackexchange.com
4 Waterloo researchers dowsing for math (Fig. 1. Dowsing for math.)</p>
      <p>Our experiments included five submitted runs that were designed to address
the following research objectives:
RQ1 What is an effective way to convert each mathematical question (expressed
in mathematical natural language) into a formal query consisting of keywords
and formulae?
RQ2 Should keywords or math formulae be assigned heavier weights in a query?
RQ3 What is the effect of a re-ranking algorithm that makes use of metadata?</p>
      <p>We present an overview of Tangent-L in Section 2. In Section 3, we describe
our approach to CQA and provide details on how we retrieve and rank answer
matches for a mathematical question from the MSE corpus with the use of
Tangent-L. The submitted runs and the results are discussed in Section 4. In
Section 5, we present conclusions and propose future work.
</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of Tangent-L</title>
      <p>The Tangent-L system is the core component of our submission. It is a traditional
query system built on the popular Lucene [4] text search platform, adding
methods adapted from the Tangent-3 math search system [30] so that it is capable of
matching documents based on queries with keywords and formulae.</p>
      <p>Tangent-L handles formulae by converting them into a bag of “math terms”
to be matched in the same way that natural language terms are handled by
Lucene. More specifically, Tangent-L takes as input a formula in Presentation
MathML [5] format5 and converts it into a symbol layout tree (SLT) [30] where
nodes represent the math symbols and edges represent spatial relationships
between these symbols (Figure 2). Thereafter, this tree-like representation is
traversed to extract a set of features, or “math tuples,” of four types to capture
local characteristics of a math formula as depicted in Figure 3. In preparation
for search, the math tuples replace the formula itself in the document and are
then considered by Tangent-L as if each were a term in the text to be matched.</p>
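      <p>As an illustration, tuple extraction from an SLT can be sketched in Python. The nested-tuple SLT encoding and the edge labels below ("n" for the next symbol on the writing line, "a" for a superscript) are hypothetical stand-ins, and only a symbol-pair style of tuple is shown; Tangent-L extracts math tuples of four types (Figure 3).</p>

```python
def symbol_pairs(slt):
    """Emit (parent symbol, child symbol, edge label) triples from a toy SLT.

    The SLT is encoded as a nested tuple: (symbol, {edge_label: child_slt}).
    This is a simplified stand-in for Tangent-L's traversal; the real system
    derives the tree from Presentation MathML and extracts four tuple types.
    """
    sym, edges = slt
    out = []
    for label, child in edges.items():
        out.append((sym, child[0], label))   # record the local symbol pair
        out.extend(symbol_pairs(child))      # recurse into the subtree
    return out

# x^2 + 1: "x" has a superscript "2" and is followed on the line by "+", then "1"
slt = ("x", {"a": ("2", {}), "n": ("+", {"n": ("1", {})})})
pairs = symbol_pairs(slt)
```

      <p>For this toy tree, the extracted pairs are ("x", "2", "a"), ("x", "+", "n"), and ("+", "1", "n"); in the indexed document, such tuples replace the formula itself.</p>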
      <p>After formula conversion, Tangent-L applies BM25+ ranking [15] to all the
terms in a document. Specifically, given a collection of documents D containing
jDj documents and a query q consisting of a set of query terms, the score for a
document d 2 D is given by</p>
      <p>BM25+(q, d) = Σ_{w∈q} [ ((k + 1) · tf_{w,d}) / (k · (1.0 − b + b · |d| / Δ) + tf_{w,d}) + δ ] · log((|D| + 1) / |D_w|)   (1)</p>
      <p>5 A formula in LaTeX representation can be converted into MathML by using
LaTeXML (https://dlmf.nist.gov/LaTeXML/).</p>
      <p>where k, b, and δ are constants (following common practice, chosen to be 1.2,
0.75, and 1, respectively); tf_{w,d} is the number of occurrences of term w in
document d; |d| is the total number of terms in document d; Δ = (Σ_{d∈D} |d|) / |D|
is the average document length; and |D_w| is the number of documents in D
containing term w. This formula is easily applied to a bag of query terms: if a
term is repeated in the query, the corresponding score for that term is simply
accumulated multiple times. To allow math tuples to be given a weight that differs
from natural language terms, we assign weights to query terms as follows:</p>
      <p>BM25w+(q_t ∪ q_m, d) = BM25+(q_t, d) + α · BM25+(q_m, d)   (2)
where q_t is the set of keywords in a query, q_m is the set of math feature tuples in
that query, and α is a parameter to adjust the relative weight applied to math
tuples.</p>
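      <p>Equations 1 and 2 can be sketched as follows; the in-memory term lists and the collection-statistics dictionary are simplified stand-ins for what Lucene derives from its postings lists.</p>

```python
import math

def bm25_plus(query_terms, doc_terms, stats, k=1.2, b=0.75, delta=1.0):
    """Equation 1: BM25+ score of one document for a bag of query terms.

    stats holds hypothetical collection statistics: "N" documents,
    average document length "avgdl", and document frequencies "df".
    """
    score = 0.0
    dl = len(doc_terms)
    for w in query_terms:
        tf = doc_terms.count(w)
        if tf == 0:
            continue  # terms absent from the document contribute nothing
        idf = math.log((stats["N"] + 1) / stats["df"][w])
        norm = k * (1.0 - b + b * dl / stats["avgdl"]) + tf
        score += ((k + 1) * tf / norm + delta) * idf
    return score

def bm25_weighted(text_terms, math_terms, doc_terms, stats, alpha=0.5):
    """Equation 2: weight math tuples relative to keywords by alpha."""
    return (bm25_plus(text_terms, doc_terms, stats)
            + alpha * bm25_plus(math_terms, doc_terms, stats))
```

      <p>Repeated query terms simply accumulate their per-term score again, matching the bag-of-terms treatment described above.</p>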
      <p>In the NTCIR-12 arXiv Main task benchmark, where queries are composed of
formulae and keywords, the Tangent-L system gives a comparable performance
to other MathIR systems [9]. We are interested in determining whether and how
Tangent-L could be adapted to address the ARQMath CQA task.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>After indexing the corpus, we adopt a three-phase architecture to return answer
matches for a mathematical question:
1. Conversion: Transform the input (a mathematical question posed on MSE)
into a well-formulated query consisting of a “bag” of formulae and keywords.
2. Searching: Use Tangent-L to execute the formal query to find the best
matches against the indexed corpus (MSE question-answer pairs).
3. Re-ranking: Re-order the best matches by considering additional metadata
(such as votes and tags) associated with question-answer pairs.
</p>
      <p>Conversion: Extracting Formulae and Keywords from Questions.
For the CQA task, participants are given 98 real-world mathematical questions
selected from MSE posts in 2019.6 Each mathematical question is a topic that
contains: (1) the topic-ID, (2) the title for this topic, (3) the question (body text)
for this topic, and (4) the tags for this topic. The title and body text are free-text
fields describing a question in mathematical natural language (as opposed to
formal logic, for example), and the tags indicate the question’s academic areas.</p>
      <p>We adopt the following automated mechanism to extract a list of topic
formulae and keywords:
Topic formulae. Formulae within the topic’s title and body text are extracted.
All formulae within the title are selected as topic formulae. Formulae within the
body text are selected only if they are neither single variables (e.g., n or i) nor
isolated numbers (e.g., 1 or 25).</p>
      <p>Topic keywords. Keyword selection is summarized in Algorithm 1. Each of the
topic’s tags is selected as one of the topic keywords. For the topic’s title and
body text, we first tokenize the text (Algorithm 2) to obtain a list of potential
word tokens. A token is then selected as a keyword if it is not a stopword
(Algorithm 4) and either it contains a hyphen (such as “Euler-Totient” or
“Cesàro-Stolz”) (Algorithm 3) or its stem appears on a pre-constructed list of
mathematical stems.</p>
      <p>The mathematical stem list is created in a pre-processing step by
automatically extracting terms from two sources: (1) tags from the indexed MSE dataset
and (2) titles from English Wikipedia articles comprising the corpus for the
NTCIR-12 MathIR Wikipedia task [29]. For the former source, each tag present
in the ARQMath MSE corpus is tokenized, stemmed, and added to the stem list.
For example, two stems (“totient” and “function”) are added to the list for the
tag “totient-function.” For the latter, we first collect the HTML filenames of all
articles used in the NTCIR-12 MathIR Wikipedia corpus, where each filename
reflects the corresponding Wikipedia article’s title. The filenames are then
transformed by removing file extensions and replacing underscores and parentheses
with spaces. Each hyphenated term is expanded to include all components as
well as the hyphenated term itself. The resulting cleaned text strings are
tokenized and stemmed, with punctuation removed, and the resulting stems are added
to the stem list. Thus, for example, four stems (“exponenti,” “logarithm,”
“distribut,” and “exponential-logarithm”) are added for the filename
Exponential-logarithmic_distribution.html. The resulting mathematical stem list comprises
about 1200 stems from MSE tags and about 21,000 stems from Wikipedia article
titles.
6 In addition, three extra questions are provided as samples with annotated answers.</p>
      <sec id="sec-3-1">
        <title>Algorithm 1: ExtractKeywords(topic)</title>
        <p>for tags in topic do
  split input by ',' to obtain a list of tags;
  foreach tag do
    add tag to keyword list;
    if hyphen in tag then
      further split by '-' to obtain a list of parts;
      foreach part do
        add part to keyword list;
for title, body text in topic do
  foreach token in Tokenize(input) do
    if not IsStopword(token) and (IsPreserved(token) or OnMathList(Stem(token))) then
      append token to keyword list;</p>
      </sec>
      <sec id="sec-3-2">
        <title>Algorithm 2: Tokenize(input)</title>
        <p>split the input by space into substrings;
foreach substring do
  if hyphen in substring then
    add substring to the token list;
replace '-' with space throughout input;
foreach token in Treebank-Tokenization(input) do
  add token to the token list;</p>
      </sec>
      <sec id="sec-3-3">
        <title>Algorithm 3: IsPreserved(token)</title>
        <p>return ('-' in token) or ('–' in token)</p>
      </sec>
      <sec id="sec-3-4">
        <title>Algorithm 4: IsStopword(token)</title>
        <p>return (token in stopword set provided by the NLTK library) or (token
contains only a single character) or (token is a numeric string)</p>
        <p>When selecting keywords with this procedure, we use the Treebank tokenizer,
the Porter stemmer, and a list of English stop-words provided by the Python
NLTK library.7 Using this approach, on average 8 topic formulae and 38 topic
keywords were extracted for the CQA task (Table 1). The complete list of topic
formulae and keywords can be found in the Appendix.
We use the Tangent-L system to retrieve answers that match the extracted
keywords and formulae.</p>
        <p>Indexing. We build the indexed corpus with question-answer pairs. Each
indexed unit includes an MSE answer along with the content of its associated
question.8 For each question-answer pair, we extract the following content from
the corpus XML files:</p>
        <p>From the answer: the body text and the number of votes (from Posts.xml);
From the associated question: the title, body text, and tags of the question
(from Posts.xml), plus comments associated with the question (from
Comments.xml). Additionally, we include the titles of all related and duplicate
posts for this question (from PostLinks.xml), having first converted all one-way
links between posts into two-way links.</p>
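        <p>A minimal sketch of pairing answers with their questions from a Posts.xml file; the attribute names follow Stack Exchange data-dump conventions and are assumptions, and link and comment handling is omitted.</p>

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def load_qa_pairs(posts_xml_path):
    """Pair each answer with its question from a Stack Exchange Posts.xml dump.

    Assumes the dump convention: PostTypeId "1" marks a question row and
    "2" an answer row linked to its question by ParentId.
    """
    questions, answers = {}, defaultdict(list)
    for _, row in ET.iterparse(posts_xml_path, events=("end",)):
        if row.tag != "row":
            continue
        if row.get("PostTypeId") == "1":
            questions[row.get("Id")] = {
                "title": row.get("Title", ""),
                "body": row.get("Body", ""),
                "tags": row.get("Tags", ""),
            }
        elif row.get("PostTypeId") == "2":
            answers[row.get("ParentId")].append(
                {"body": row.get("Body", ""), "votes": int(row.get("Score", "0"))}
            )
        row.clear()  # free memory while streaming the large dump
    return [(q, a) for qid, q in questions.items() for a in answers.get(qid, [])]
```

        <p>Each returned pair corresponds to one indexing unit: an answer together with the content of its associated question.</p>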
        <p>All formulae within the text are replaced by their Presentation MathML form
using the formula mapping in the TSV files provided with the corpus data.
An HTML file containing the final extracted content is then assembled as an
indexing unit (Figure 4). The indexable version of the MSE corpus includes a
total of 1,445,488 documents, indexed by Tangent-L in preparation for search.
Searching. Searching the corpus is a straightforward application of Tangent-L
for the converted topics. The list of keywords and formulae (in MathML)
is passed to the search engine. Tangent-L then converts the formulae to math
tuples and uses corresponding postings lists in the index to compute BM25+
scores (Equation 2), weighted by a value α that depends on the experimental
setup for the run.
7 https://www.nltk.org/
8 As many of us were admonished in school: “You should always include the question
as part of your answer.”
&lt;body&gt;
&lt;div class="row" id="question header"&gt;</p>
        <p>&lt;h1&gt; The cow in the field problem (intersecting circular areas) &lt;/h1&gt;
&lt;div class="question"&gt;
&lt;div class="question" data questionid="#QID#" id="question"&gt;
&lt;div class="post text"&gt;
&lt;p&gt;What length of rope should be used to tie a cow to an &lt;strong&gt;exterior fence
post&lt;/strong&gt; of a &lt;em&gt;circular&lt;/em&gt; field so that the cow ....</p>
        <p>&lt;/div&gt;
&lt;/div&gt;
&lt;div class="post taglist"&gt;</p>
        <p>&lt;span class="post tag"&gt; geometry &lt;/span&gt; &lt;hr&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="row" id="answers"&gt;
&lt;div class="answer" data answerid="62"&gt; &lt;div class="post text"&gt; &lt;p&gt;So, the area
of the field is &lt;span class="math container" id=795 fid=795&gt;&lt;?xml version="1.0"
encoding="UTF 8"?&gt;&lt;math xmlns="http://www.w3.org/1998/Math/MathML" alttext=
"\pi␣r^{2}" display="block"&gt; &lt;mrow&gt; &lt;mi&gt;$\pi$&lt;/mi&gt; &lt;mo&gt;~&lt;/mo&gt; &lt;msup&gt; &lt;mi&gt;r&lt;/mi&gt;
&lt;mn&gt;2&lt;/mn&gt; &lt;/msup&gt; &lt;/mrow&gt;&lt;/math&gt;&lt;/span&gt; and you want the cow to be able to
graze an area equal to half of that.&lt;/p&gt; &lt;p&gt;All you need to do is set up the equation ....
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="row" id="question comments"&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt; I'm guessing that most fence posts are at the edge of a field, which
makes this a far more interesting problem. &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; ....</p>
        <p>&lt;/tr&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="row" id="duplicated"&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt; A confusing word problem related to geometry (circles) &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;
&lt;td&gt; Is it possible to express the area of the intersection of 2 circles as a
closed form expression? &lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;div class="row" id="related"&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt; Find the area where dog can roam &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; A goat tied to a
corner of a rectangle &lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/body&gt;</p>
        <p>The CQA task is made more challenging by the potential to use semantic
information relating candidate answers to topic questions. For instance, an answer
should be ranked higher in position if it answers the topic question correctly.
However, a correct answer does not necessarily have the highest score based
merely on matching keywords and formulae by similarity.</p>
        <p>As such, we are motivated to explore a re-ranking algorithm applied to the
results returned from the search engine. To this end, we wish to use information
from the complete MSE corpus to build a model that reflects how valuable a
potential answer might be. We hypothesize, with the law of simplicity in mind,
that a linear function of the following four variables might serve well:
1. similarity: The similarity of keywords and formulae between the target
answer (including its associated question) and the topic query is clearly an
important component. This is captured by the search score returned from
the Tangent-L system.
2. tags: The number of overlapping tags between the question associated with
the answer and the topic query reflects how well the answer matches the
query’s academic area(s).
3. votes: The fraction of votes received by the answer when posted with its
associated question reflects the community’s belief in the answer’s value.
4. reputation: The reputation of the author who wrote an answer implies
the trustworthiness of that answer. Intuitively, a good answer comes from
an author with a good reputation. This can be computed from the user
reputation score, the number of user up-votes, and the number of user
downvotes.</p>
        <p>The remaining problem is to determine what coefficient values to use when
linearly combining these inputs. For this we need a training set that includes
relevance assessments, which are not available as part of the corpus.
Mock Relevance Assessments. As a substitute for assessed relevance, we
can build a training set of queries from questions already in the corpus: we
hypothesize that relevant answers include those that were actually provided for
those questions as well as those provided for duplicate and related questions.</p>
        <p>We use tags, related posts, and duplicate posts of a question, and the number
of votes for each answer to calculate mock relevance assessments for answers to
the training topics, based on the following two observations:
1. Considering the question associated with a target answer, the more
“on-topic” that question is to the query, the more relevant the target answer is
likely to be for that query. A corpus question and a query are related if they
have overlapping tags, but are more related if one is marked as a related
post of the other, and still more related if it is marked as a duplicate post
of the other (or is, in fact, the same question).
2. If two potential answers are associated with the same question, the one with
more votes should be preferred.</p>
        <p>With these assumptions, an assessment value can be computed as follows:
– Integral assessment value.</p>
        <p>1. An answer gets an assessment value of 2 if the question associated with
that answer is a duplicate post of the query question, or if the answer
comes from the same question.
2. An answer gets an assessment value of 1 if the question associated with
that answer is a related post of the query question.
3. Otherwise, the answer gets an integral assessment value of 0.
– Fractional assessment value.</p>
        <p>1. Assuming that all votes for answers are non-negative, an answer gets a
normalized fractional assessment value between 0 and 1, calculated as the
fraction of its votes to the sum of votes of all answers for the associated
question, provided that the associated question is the same as, a duplicate of,
or a related post to the query question, or that it has any overlapping tags
with the query question.</p>
        <p>2. Otherwise, the answer gets a fractional assessment value of 0.</p>
        <p>The final mock relevance assessment is the sum of the integral and fractional
assessment values and, when all answers have non-negative votes, ranges from 0
to 3 (matching the assessment range expected in the CQA task):</p>
        <sec id="sec-3-4-1">
          <title>Score and Meaning</title>
          <p>0: irrelevant; 0 .. 1: tags overlap; 1 .. 2: posts are related;
2 .. 3: posts are identical or duplicates.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>Negative Votes</title>
          <p>In fact, some answers in the corpus have negative votes (down-votes). In
this case, the fractional assessment value is adjusted as follows: for a thread
containing any answers with negative votes, the fraction’s denominator is the
absolute value of the total negative votes plus the sum of the positive votes
within the thread. If an answer has positive votes, the fraction’s numerator is
the absolute value of the total negative votes plus the number of votes for the
answer; otherwise, the fraction’s numerator is the (negative) value of the votes.
For example, suppose a thread contains three answers whose votes are -2, 2, and
6, respectively. The fractional assessment values for these answers are -0.2, 0.4,
and 0.8, respectively. Thus the mock relevance assessment penalizes answers with
negative votes.</p>
          <p>Linear Regression Formula. To determine the coefficients for the linear
regression model, we first generate 1300 training topics from the MSE collection.
We use the Tangent-L system with α = 1.0 (in Equation 2) to retrieve the top
10,000 answers for each valid topic, resulting in a total of 12,970,000 answers.
Next, we generate the mock relevance assessments for these answers and
associate these assessments with the values for similarity, tags, votes, and reputation
for those answers, as discussed above. These 12,970,000 tuples then serve as
training data for the linear regression model. Example items of the training data
are shown in Table 2 (where * denotes the original answer vote and † the
denominator of the fractional assessment value), and the trained coefficients are
shown in Table 3.</p>
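          <p>The assessment computation can be sketched as follows. The integral mapping follows the rules above directly; for the fractional value, the prose is ambiguous about the denominator, so the sketch adopts one self-consistent reading chosen to reproduce the worked example (a thread whose votes sum to zero is left undefined, as in the paper).</p>

```python
def integral_assessment(relation):
    """relation between the answer's question and the query question:
    "same", "duplicate", "related", or "none"."""
    return {"same": 2, "duplicate": 2, "related": 1}.get(relation, 0)

def fractional_assessments(votes):
    """Fractional assessment values for all answers in one thread.

    Reading assumed: denominator = |total negative votes| + sum of positive
    votes; a positive answer's numerator is its votes plus |total negative
    votes|, a negative answer's numerator is its (negative) votes.
    """
    neg_total = -sum(min(v, 0) for v in votes)   # |sum of negative votes|
    pos_total = sum(max(v, 0) for v in votes)
    denom = neg_total + pos_total
    return [(v + neg_total if v > 0 else v) / denom for v in votes]
```

          <p>With votes of -2, 2, and 6, this yields -0.2, 0.4, and 0.8, matching the worked example; with all-non-negative votes it reduces to each answer's share of the thread's votes.</p>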
          <p>We validate the trained linear regression model by first applying the model
to predict a relevance score for the top 1000 retrieved answers of a separate set of
another 100 topics, and then re-ranking the answers according to the predicted
mock relevance score. Finally, we adopt the normalized Discounted Cumulative
Gain (nDCG), which measures the gain of a document based on its position
in the result list and its graded relevance. We compare the nDCG value of the
results for those 100 topics according to the mock relevance assessment before
and after re-ranking. This simple model gives a slight improvement in nDCG
after the re-ranking (from 0.3192 to 0.3318).</p>
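          <p>The fit-then-re-rank step can be sketched as follows. The paper trains with the scikit-learn library; this sketch uses numpy least squares as a dependency-light stand-in, and the feature rows (similarity, tag overlap, vote fraction, reputation) and mock relevance targets are fabricated for illustration.</p>

```python
import numpy as np

# Hypothetical training rows: (similarity, tag_overlap, vote_fraction, reputation)
X = np.array([
    [12.0, 2, 0.8, 0.9],
    [ 8.5, 1, 0.4, 0.5],
    [ 3.0, 0, 0.0, 0.2],
    [10.0, 2, 0.6, 0.7],
])
y = np.array([2.8, 1.4, 0.0, 2.1])  # hypothetical mock relevance assessments

# Fit a linear model with intercept via least squares
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(features):
    """Predicted mock relevance for one (similarity, tags, votes, reputation) row."""
    return float(np.dot(np.append(features, 1.0), coef))

# Re-rank retrieved answers by predicted mock relevance, descending
answers = [("ans1", [12.0, 2, 0.8, 0.9]), ("ans2", [3.0, 0, 0.0, 0.2])]
reranked = sorted(answers, key=lambda a: predict(a[1]), reverse=True)
```

          <p>In the real pipeline, the features come from the 12,970,000 training tuples described above, and re-ranking is applied to the top 1000 retrieved answers per topic.</p>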
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>Description of Runs
The MathDowsers submitted a primary run and four alternate runs, each run
returning at most 1000 answer matches for all 98 topics. The primary run is
designed as a combination of our hypotheses for the best configuration for all
research objectives. The alternate runs are designed to test these hypotheses: the
setup for each of them is the same as the primary run, except for a single aspect
that is associated with a testing hypothesis, as described here and summarized
in Table 4.</p>
      <sec id="sec-4-1">
        <title>RQ1 Topic Conversion</title>
        <p>In the primary run, we use the auto-extracted topic keywords and formulae
described in Section 3.1 and listed in the Appendix as the query input to the
search engine. To compare the effectiveness of our extraction algorithm with
human understanding of the questions, an alternate run takes as input lists
of up to five topic keywords and up to two formulae per topic, all manually
chosen by the lab organizers after reading each question and made available
to all participants. This alternate run, which includes “translated” in its label,
is therefore a manual run, in contrast to all other runs that are “automatic.”</p>
      </sec>
      <sec id="sec-4-2">
        <title>RQ2 Formula Weight</title>
        <p>We test the weight of math formulae in a query by tuning Tangent-L’s
parameter α (Equation 2). In the primary run, α = 0.5, which means math
terms are given half the weight of keywords in a query. The reasoning behind
this choice is that each formula generates many terms, one for each feature
extracted, and previous experiments showed some benefit in reducing the
weight accordingly [9]. Two alternate runs operate with α = 0.2 (even less
weight on formulae) and α = 1.0 (Tangent-L’s default setting), respectively.</p>
      </sec>
      <sec id="sec-4-3">
        <title>RQ3 Re-ranking</title>
        <p>In the primary run, we re-rank the results from the search engine, using the
model described in Section 3.3. To evaluate the effectiveness of the model,
an alternate run has no re-ranking.
Indexing. Indexing is done on an Ubuntu 16.04.6 LTS server, with two Intel
Xeon E5-2699 V4 processors (22 cores / 44 threads, 2.20 GHz each), 1024 GB
RAM, and 8 TB of disk space (on a USB3 external hard disk). The size of the
document corpus is 24.3 GB.</p>
        <p>Tangent-L requires 5.0GB of storage on the hard drive and approximately 6
hours to index all documents with parallel processing.</p>
        <p>Searching and Re-ranking. Training and testing the model for re-ranking is
done on a Linux Mint 19.1 machine, with an Intel Core i5-8250U Processor (4
cores 8 threads, up to 3.40 GHz), 24GB RAM and 512GB disk space.9</p>
        <p>Model training using the Python scikit-learn library10 takes less than 30
seconds, and re-ranking for all 98 topics requires around 3 seconds per run.</p>
        <p>Searching is executed on this same Mint machine, and retrieval time statistics
for Tangent-L are reported in Table 5.
alpha05 † 13.3 0.669 (A.67) / 0.775 (A.94) 63.4 (A.76) / 48.7 (A.28)
alpha02 13.3 0.661 (A.67) / 0.850 (A.94) 59.1 (A.76) / 49.5 (A.28)
alpha10 13.1 0.616 (A.67) / 0.784 (A.83) 54.0 (A.76) / 48.7 (A.28)
alpha05-translated 5.3 0.247 (A.99) / 0.291 (A.94) 32.8 (A.11) / 25.0 (A.67)
alpha05-noReRank 13.3 0.669 (A.67) / 0.775 (A.94) 63.4 (A.76) / 48.7 (A.28)
† The run alpha05 does not, in fact, take any additional retrieval time, since it
merely re-ranks the retrieved items from the alpha05-noReRank run.
The primary measure for the task is the Normalized Discounted Cumulative
Gain (nDCG) with unjudged documents removed (thus nDCG′). Following the
organizers’ practice [31], we also measure Mean Average Precision (MAP) with
unjudged documents removed (MAP′) and Precision at top-10 matches (P@10).
Additionally, we calculate the bpref measure, which other researchers have found
useful, and the count for Unjudged Answers within the top-k retrieved answers
for each topic.
9 A NVIDIA GeForce MX150 graphics card with 2GB on-card RAM is available on
the machine, but it was not used for the experiments.
10 https://scikit-learn.org</p>
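        <p>The condensed-list evaluation idea can be sketched in Python. The function below is a simplified illustration of nDCG′ (unjudged documents dropped before scoring), not the lab's official evaluation script; the ideal gain is truncated to the condensed list length as a simplifying assumption.</p>

```python
import math

def ndcg_prime(ranking, qrels):
    """nDCG': drop unjudged documents from the ranking, then compute
    standard nDCG over the remaining judged documents.
    `qrels` maps doc id -> graded relevance; unjudged docs are absent."""
    judged = [doc for doc in ranking if doc in qrels]
    gains = [qrels[doc] for doc in judged]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(qrels.values(), reverse=True)[: len(judged)]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```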
        <p>The lab organizers obtained relevance assessments for 77 of the 98 topics.
With respect to the primary measure, MathDowsers placed in the top three
positions among all 23 submissions, including the five baseline runs [31]. The retrieval
performance for the MathDowsers’ submissions is summarized in Table 6, along
with the performance of baseline systems provided by the lab organizers. One
of the five baselines is an unrealizable model and the other four are traditional
text or math-aware query systems adapted for the task. They are described by
the lab organizers as follows [31]:
Linked MSE posts: an unrealizable model “built from duplicate post links
from 2019 in the MSE collection (which were not available to participants).
This baseline returns all answer posts from 2018 or earlier that were in
threads from 2019 or earlier that MSE moderators had marked as duplicating
the question post in a topic. The posts are sorted in descending order by their
vote scores.”
Approach0: “the ECIR 2020 version of the Approach0 text + math search
engine [32], using queries manually created by the third and fourth authors.
This baseline was not available in time to contribute to the judgement pools
and thus was scored post hoc.” Formulae are represented by operator trees
(OPT) where internal nodes are operators and leaves are operands, and
formulae are matched efficiently by finding structurally identical sub-trees
with dynamic pruning.</p>
        <p>TF-IDF + Tangent-S: “a linear combination of TF-IDF and Tangent-S
results [see below]. To create this combination, first the relevance scores from
both systems were normalized between 0 and 1 using min-max
normalization, and then the two normalized scores were combined using an unweighted
average.”
TF-IDF: “a TF-IDF (term frequency–inverse document frequency) model from
the Terrier platform [21]... Formulae are represented using their LaTeX string...</p>
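        <p>The score combination described for the TF-IDF + Tangent-S baseline above can be sketched as follows; the document IDs and raw scores are illustrative, and documents missing from one list are assumed to contribute 0 there.</p>

```python
def min_max(scores):
    """Scale a {doc: score} dict to [0, 1] via min-max normalization."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def combine(a, b):
    """Unweighted average of two min-max-normalized score lists."""
    na, nb = min_max(a), min_max(b)
    docs = set(na) | set(nb)
    return {d: (na.get(d, 0.0) + nb.get(d, 0.0)) / 2 for d in docs}
```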
        <p>The TF-IDF baseline used default parameters in Terrier.”
Tangent-S: like Tangent-L, an extension from the Tangent-3 math search
system [30]: “a formula search engine using SLT and OPT formula
representations [6]. One formula was selected from each Task 1 question title if possible;
if there was no formula in the title, then one formula was instead chosen from
the question’s body. If there were multiple formulae in the selected field, the
formula with the largest number of nodes in its SLT representation was
chosen... (Tangent-S) retrieves formulae independently for each
representation, and then linearly combines SLT and OPT scoring vectors for retrieved
formulae [6]. For ARQMath, we used the average weight vector from cross
validation results obtained on the NTCIR-12 formula retrieval task.”
From the top half of Table 6, the first observation is that the primary run,
with our presumed best configuration, performs better than only one of our
alternate runs. The second observation is that lowering the weight placed on math
terms (α = 0.2) improves the performance, and using the default weight (α =
1.0) hurts the performance. Thirdly, the alternate run using manually extracted
formulae and keywords performed better than the primary run. Finally, the
alternate run without re-ranking achieves the best performance in all evaluation
measures. This best submission also performs well with respect to all other
evaluation measures, with the best (but unrealizable) baseline system, Linked MSE
posts, being the only submission to perform better with respect to those other
measures.</p>
        <p>In retrospect, it appears as if all aspects of our primary run leave room for
improvement. In order to explore these observations more closely, we execute
several additional runs designed after the conclusion of the formal experiment
(post-experiment). As detailed in Table 7, these runs examine the performance
of our system without re-ranking for additional values of α and with automatic
or manual choice of keywords and formulae. We can now summarize our insights
from all of Table 6 with respect to our research objectives.</p>
        <p>The effect of re-ranking. Comparing the results from the submissions and
the post-experiment runs, we see that our re-ranking design was detrimental
to the performance (RQ3). A consistent drop of at least 0.06 in nDCG′ can
be observed for runs after re-ranking (e.g., alpha02 vs. alpha02-noReRank). A
similar deterioration can be observed in other evaluation measures as well.
The effect of α. Considering runs without re-ranking, we confirm that the
performance gradually improves as the weight for math formulae in a query
decreases. When we decrease α from 1.0 to 0.1, a gain of 0.06 in nDCG′ is
achieved for runs with auto-extracted topic conversion (alpha10-noReRank vs.
alpha01-noReRank). Similarly, a gain of 0.03 in nDCG′ is achieved for runs with
manual topic conversion (alpha10-trans-noR vs. alpha01-trans-noR). Gains are
also observed using other evaluation measures.</p>
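        <p>The weighting idea behind α can be illustrated with a small sketch. The function below is an assumption about how a weighted keyword/math combination might look, not Tangent-L's internal scoring.</p>

```python
def query_score(keyword_scores, math_scores, alpha):
    """Combine per-document keyword and math-term scores, down-weighting
    math terms by alpha (alpha = 0.1 gives math terms one-tenth the
    weight of keywords). Simplified illustration only."""
    docs = set(keyword_scores) | set(math_scores)
    return {d: keyword_scores.get(d, 0.0)
               + alpha * math_scores.get(d, 0.0)
            for d in docs}
```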
        <p>Unfortunately, when α is set to a much smaller value, namely 0.01, the overall
performance is questionable since the results of different evaluation measures are
contradictory. We observe that for the run with auto-extracted topic conversion
(alpha001-noReRank), all measures are at their lowest compared to runs with
other choices of α. However, for the run with manual topic conversion
(alpha001-trans-noR), nDCG′ and P@10 are still at their lowest, but MAP′ is the
second-best, and bpref is at its highest among all other choices of α.</p>
        <p>Table 8 shows that with α = 0.01, unjudged answer coverage at top-k for
alpha001-noReRank is extraordinarily high (at least 80%). Similarly, among the
manual runs, alpha001-trans-noR also has by far the highest unjudged answer
coverage for various values of k across all choices of α. The large percentage
of unjudged answers implies that the evaluation of these two runs might not
actually be informative.</p>
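        <p>The unjudged-answer coverage used in this analysis can be computed straightforwardly; a minimal sketch (with illustrative inputs) is:</p>

```python
def unjudged_at_k(ranking, judged_docs, k):
    """Fraction of the top-k retrieved answers that received no
    relevance judgement for this topic."""
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc not in judged_docs) / len(top)
```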
        <p>We conclude that keywords should be assigned heavier weights in a query
(RQ2). Good performance is achieved when math terms are given one-tenth of
the weight of keywords in a query (α = 0.1). We are unable to determine the
effect when the weight of math formulae is set even smaller.</p>
        <p>The effect of topic conversion. Table 6 shows that with high α values (0.5
and 1.0), runs with manual topic conversion consistently perform better than
runs with auto-extracted topic conversion in all evaluation measures. However,
with lower α values (0.1 and 0.2), runs with auto-extracted topic conversion
perform better for P@10. When α = 0.1, which we have just concluded to be
the best setting for α, the performance of the two runs with respect to the nDCG′
and bpref measures becomes essentially indistinguishable. We conclude that the
proposed method to convert a topic into a query composed of keywords and
formulae is competitive with human ability to select search terms (RQ1). We
discuss this observation further in Section 5.</p>
        <p>
To gain a deeper insight into the behaviour of our best-performing submission
run (alpha05-noReRank ), we examine its performance within each topic category,
as determined by the lab organizers over three aspects. Dependency shows to
what extent a topic depends on the text, formula, or both. Topic Type gives a
broad categorization of whether a topic asks about a computation, a concept,
or a proof. Difficulty approximates how hard a topic is to answer, using three
levels of difficulty: easy, medium, and hard. The breakdown counts of the 77 test
topics by category are shown in Table 9.</p>
        <p>Table 10 tabulates the performance by category for our best-performing
submission run vs. the best-performing (but unrealizable) baseline. It shows that our
system has weaker performance than the ideal with respect to the MAP′, P@10,
and bpref measures in all categories. However, with respect to the nDCG′
measure, its better overall performance results from its performance in some
particular categories.</p>
        <p>In terms of dependency, our system has strong performance for topics that
rely heavily on formulae. We attribute this strength to Tangent-L, the
underlying MathIR system that is competitive with other state-of-the-art systems for
formula retrieval [9]. We further conclude that, in spite of our earlier
conclusion that performance improves for lower values of α, setting α = 0—assigning
no weight to math terms in a query—would be a poor design decision for our
system.</p>
        <p>In terms of topic type, we observe that our system is strong at Computation-type
and Proof-type topics, but is particularly weak at Concept-type topics
when compared with the other two types of topics. With further inspection, we
see that none of the Concept-type topics have a Formula-dependency, as shown
in Table 11, which might be the reason why our system does not perform as well
in that category.</p>
        <p>Our system excels at all three levels of difficulty: Easy, Medium, and Hard. We
attribute this performance to the even distribution of a significant number of
topics relying on formulae (with either a Formula-dependency or Both-dependency)
among the three levels, as observed in Table 11.</p>
        <p>Similar conclusions can also be made when comparing the nDCG′ of our
best-performing submission with that of other baselines, as shown in Table 12.</p>
        <p>Finally, we observe that our system might be tuned to boost performance to
accommodate its weakness if the type of category were known in advance. For
instance, in Table 13 we observe that the overall nDCG′ value can be slightly
improved if our trained re-ranking model is applied to Concept-type topics only.
This boosting effect is only clearly distinguishable for the setting α = 1.0
and not effective for our recommended setting of α = 0.1.11 This leads us to
speculate that some advantage might be gained by building a system that adapts
to the type of query being posed, instead of seeking a one-size-fits-all solution.
11 This effect might be partially attributable to the fact that the re-ranking model is
trained over answers retrieved by Tangent-L under the setting α = 1.0, as described
in Section 3.3.
(Runs in Table 13: alpha10 (α = 1.0), alpha05 (α = 0.5), alpha02 (α = 0.2),
alpha01 (α = 0.1).)</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>We conclude that a traditional math-aware query system remains a viable option
for addressing a CQA task specialized in the mathematical domain. In
particular, Tangent-L is a math-aware search engine well-suited to retrieve answers to
many computational and proof-like questions in the presence of math formulae.
We hypothesize that part of the success for all five MathDowsers runs results
from the search engine having indexed question-answer pairs instead of answers
only, thereby providing a context for evaluating the suitability of each answer in
serving as an answer to newly posed topic questions.</p>
      <p>Nevertheless, several of our initial experimental design decisions turn out
to be somewhat disappointing. In the remainder of this section, we share our
thoughts regarding room for improvement.</p>
      <p>Improvements for re-ranking. The first improvement is with respect to the
design and training of the re-ranking model described in Section 3.3. Several
avenues are open:
1. We realize in retrospect that we should normalize the scores returned by
Tangent-L within each topic: as for many search engines, the scores for one
query are not comparable to the scores for another query. However, we have
since found that even with proper normalization, the linear regression model
does not produce an overall valuable re-ranking.
2. The mock relevance scores used for training the model might not be
indicative of assessed relevance, therefore giving a poor model for re-ranking. For
future work, now that there are actual assessments from the ARQMath Lab,
we could use those for training and avoid the need for mock scores. This
approach could be tested by conducting some cross-validation studies.
3. Perhaps using linear regression is not appropriate for this application. An
alternate approach is to first transform (mock or actual) assessments into a set
of discrete scores, and then treat these scores as categories to be predicted by
a classification model, such as a Support Vector Machine. Other approaches
to model CQA-type features should also be investigated, including those that
have been shown to be successful in the SemEval CQA Challenge series [17–
19].</p>
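      <p>A minimal sketch of the classification alternative in item 3 follows, assuming hypothetical per-answer features (the feature choice and score discretization are illustrative assumptions, not a tested design).</p>

```python
# Hedged sketch: discretize (mock or assessed) relevance into integer
# classes 0-3, then predict the class of each candidate answer.
from sklearn.svm import SVC

# Illustrative features: [engine score, vote count, answerer reputation].
X_train = [
    [12.1, 5, 300],
    [10.4, 2, 120],
    [8.7, 0, 15],
    [11.0, 7, 950],
    [6.0, 1, 40],
]
y_train = [2, 1, 0, 3, 0]  # discrete relevance categories

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
labels = clf.predict([[9.0, 4, 250], [5.5, 0, 10]])
```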
      <p>Improvements for topic conversion. Another area for improvement is with
respect to the design of the auto extraction algorithm described in Section 3.1.
Although the use of our automatic extraction algorithm performs comparably
to manual topic conversion, we have not considered constraining the number of
keywords and formulae in the algorithm. With a widely fluctuating formula-to-
keyword ratio across topics, the performance of our extraction algorithm might
be hindered by using any fixed value of α for all queries. Preliminary investigations
have been unable to establish a correlation between the best value for α and the
ratio of the number of keyword terms to the number of terms extracted from
math formulae [8], but this new benchmark might provide additional insights
into how to choose a value for α “on the fly” that depends on the number of
keyword terms and math terms.</p>
      <p>Alternatively, constraining the maximum number of keywords and formulae
and the sizes of formulae extracted from a topic description might be a better
approach, since it would also constrain the number of keyword and math terms.
However, this poses a new question of determining the best maximum limit. We
believe that research related to Automatic Term Extraction (ATE) in technical
domains, or in mathematical domains, might provide valuable insights into our
problem.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We gratefully acknowledge financial support from the Waterloo-Huawei Joint
Innovation Lab and from NSERC, the Natural Science and Engineering Research
Council of Canada in support of our project entitled “Searching Documents with
Text and Mathematical Content Using a Pen Based Interface.” Prof. Gordon
Cormack kindly let us use one of his research machines for indexing the corpus.
We also thank the ARQMath Lab organizers (including, notably, Behrooz
Mansouri) for the time and effort expended on developing the idea for the Lab and
submitting the proposal to CLEF, as well as preparing the corpus, the
questions, the manual translation of the topic questions into formulae and keywords,
and the relevance assessments. Andrew Kane and anonymous reviewers from
other participating lab teams made several valuable suggestions for improving
our presentation after reading a draft version of this paper.</p>
    </sec>
    <sec id="sec-7">
      <title>Appendix</title>
      <p>Below is the list of formulas and keywords converted from the given topics using
the method described in Section 3.1. Among the 98 topics submitted for each run,
77 were (partially) assessed for the CQA task. The topics with no assessments
are A.2, A.6, A.22, A.25, A.34, A.46, A.57, A.64, A.70, A.71, A.73, A.76, A.81,
A.82, A.84, A.91, A.92, A.94, A.95, A.97, and A.100, marked with a dagger (†)
below.</p>
      <p>Topic List of formulas and keywords
A.1
[c, f (x) = xx22++2xx++cc , [ 1; 13 ], f (x) = xx22++2xx++cc , f (x), [ 1; 13 ], Finding,
value, range, rational, function, does, contain, calculate, range, rational,
function, reverse, across, this, find, value, range, does, contain, functions]
[f 0(x) = f (x + 1), ddfx = f (x + 1), Solving, differential, equations,
form, solve, differential, equations, form, ordinary, differential,
equations, ordinary-differential-equations]
[p5, 10 10, p5, 10 10, p5, f (x), [a; b], f (a), f (b), Approximation,
correct, resolve, problem, Find, approximation, correct, using, bisection,
has, placed, function, sure, go, function, Mathematica, given,
calculations, function, interval, opposite, signs, tolerance, number, iterations,
numerical, methods, algorithms, bisection, numerical-methods]
[Pn</p>
      <p>
        k=0 nk k, n2n 1, compute, this, combinatoric, sum, sum, know,
result, know, does, one, even, sum, like, this, has, binomial, coefficients,
combinatorics, number, theory, summation, proof, explanation,
number-theory, proof-explanation]
[P ((
        <xref ref-type="bibr" rid="ref2">2</xref>
        )j(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )) = P ((
        <xref ref-type="bibr" rid="ref2">2</xref>
        )\(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )) = P ((
        <xref ref-type="bibr" rid="ref2">2</xref>
        ))P ((
        <xref ref-type="bibr" rid="ref1">1</xref>
        )) = P ((
        <xref ref-type="bibr" rid="ref2">2</xref>
        )), family, has, two, Given,
      </p>
      <p>
        P ((
        <xref ref-type="bibr" rid="ref1">1</xref>
        )) P ((
        <xref ref-type="bibr" rid="ref1">1</xref>
        ))
one, boy, probability, boys, family, has, two, Given, one, boy,
probability, boys, was, this, question, using, conditional, probability, event, first,
boy, event, second, probability, second, boy, given, first, boys, formula,
since, second, boy, does, depend, first, detailed, solution, correct,
probability, proof, verification, conditional, probability, proof-verification,
conditional-probability]
[5133 mod 8:, 5n mod 8 = 5, 5133 mod 8 = 5, calculate, mod, number,
big, exponent, find, noticed, lead, say, know, prove, prove, this, case,
find, solution, algebra, precalculus, arithmetic, algebra-precalculus]
[ 1111000 1 , 1110 1, 1110 1 = x (mod 100), 112 21 (mod 100),
(112)2 (
        <xref ref-type="bibr" rid="ref21">21</xref>
        )2 (mod 100), 114 441 (mod 100), 114 41
(mod 100), (114)2 (41)2 (mod 100), 118 1681 (mod 100), 118
81 (mod 100), 118 112 (81 21) (mod 100), 1110 1701
(mod 100) =) 1110 1 (mod 100), 1110 1 (
        <xref ref-type="bibr" rid="ref11">1 1</xref>
        ) (mod 100) =)
1110 1 0 (mod 100), x = 0, 1110 1, Finding, remainder,
using, modulus, divided, solve, term, mod, tried, mod, mod, mod, mod,
mod, mod, mod, mod, mod, mod, mod, mod, value, divisible, this,
approach, long, time, competitive, exam, math, contest, without, using,
process, determine, remainder, above, problem, very, helpful, advance,
elementary, number, theory, modular, arithmetic, divisibility,
alternative, proof, elementary-number-theory, modular-arithmetic,
alternative-proof]
[limn!1 qn (27()3nn()n!!)3 , limn!1 qn (27()3nn()n!!)3 , finding, value, Finding,
value, try, solve, help, limits]
[PnN=0 nxn, Pin=0 i2 = (n2+n)6(2n+1) , this, series, need, write, series,
form, does, involve, summation, notation, example, Does, anyone, idea,
this, multiple, ways, including, using, generating, functions, sequences,
series, sequences-and-series]
[R01 sixnax , R01 sixnax , Find, values, improper, integral, sin, converges, Find,
values, improper, integral, sin, converges, expand, using, series,
expansion, improper, integrals, improper-integrals]
A.12
A.13
A.14
A.15
A.16
A.17
[R3, u = (a; b; c), v = (d; e; f ), R2, R3, u = (a; b), v = (d; e), R2, R3,
(u; v) = ( f (u); g(v) ), R2, R3, R2, (u; v) = (2u cos v; u sin v),
cross, product, dimensions, math, book, using, states, cross, product,
two, vectors, defined, direction, resultant, determined, curling, fingers,
vector, pointing, direction, cross, product, cross, product, defined, Is,
degenerate, case, cross, product, like, this, type, determinant, instance,
parameterization, needed, calculate, examples, book, calculating,
determinate, cos, sin, multivariable, calculus, vectors, multivariable-calculus]
[(1 + ip3)1=2, Finding, roots, complex, number, was, solving, practice,
problems, this, question, It, find, roots, sketch, linear, algebra, complex,
numbers, polar, coordinates, linear-algebra, complex-numbers,
polar-coordinates]
[Rab f (x)dx + Rff((ab)) f 1(x)dx ?, Rab f (x)dx + Rff((ab)) f 1(x)dx ?, bf (b)
af (a), expression, expression, answer, answer, calculus]
[y = xy0 + 12 (y0)2, 12 y0(2x + y0) = y, x2 + y = t, Help, solving, first, order,
differential, equation, first-order, first, order, differential, equation, this,
find, way, solve, use, derivate, idea, first-order, ordinary, differential,
equations, ordinary-differential-equations]
[Pn
      </p>
      <p>i=1 ixi 1, 1+2x+3x2 +4x3 +5x4 +:::+nxn 1 +:::, x 6= 1; jxj &lt; 1, Sn,
S2 = 1 + x + x2 + x3 + x4 + :::, d(S2) = S1, jxj &lt; 1, S2, 11 xxn = 1 1 x , S1 =
dx
d(S2) = d( 1 1x ) = (1 1x)2 , Derive, sum, series, need, find, partial, sums,
dx dx
finally, sum, tried, series, source, series, sum, geometric, progression,
this, answer, sequences, series, convergence, summation, power, series,
sequences-and-series, power-series]
[R01 ln(1+x1)+lxn(1 x) dx, R01 ln(1+x1)+lxn(1 x) dx, I(a; b) =
R01 ln(1 ax1+)lxn(1+bx) dx, d2dIa(dab;b) , Finding, Calculate, My, try, Let,
compute, happy, see, ideas, order, kill, this, integral, integration,
sequences, series, definite, integrals, closed, form, sequences-and-series,
definite-integrals, closed-form]
[Rx1=0 sinx(x) , eziz , Rx1=0 sinx(x) , f (z) = eziz , = 1 + R + 2 + ,
1(t) = t; t 2 [i ; iR], R(t) = Reit; t 2 [ 2 ; 2 ], 2(t) = t; t 2 [ iR; i ],
(t) = eit; t 2 [ 2 ; 2 ], sinx(x) , R f = i , ! 0, R R f = 0, R ! 1,
Calculate, sin, function, calculate, sin, function, using, closed, path, use,
sin, even, function, has, anti, derivative, integral, closed, path,
managed, show, show, Help, complex, analysis, improper, integrals,
complex-analysis, improper-integrals]
A.18
[limn!1
, limn!1
,
9</p>
      <p>
        n
log 2 + n log , 2e , 4e , 2=e, Evaluate, Evaluate, using, Stolz, know,
n + 1
many, question, like, this, solve, using, Stolz, method, log, applied, Stolz,
log, log, answer, answer, help, Edit, On, log, log, log, log, log, log, log,
log, log, log, Cesáro-Stolz, Cesáro-Stolz, Cesáro-Stolz,, sequences, series,
sequences-and-series]
[p4 1, p4 1, 74 1, 24 3 5 2, 114 1, 24 3 5 61, 24 3 5,
p4 1, (p2 + 1)(p 1)(p + 1), 24, 16n + x, Greatest, common, factor, was,
find, greatest, common, factor, primes, First, value, has, divisors, has,
divisors, has, prove, divisible, even, integers, know, prove, divisibility,
since, check, numbers, prove, divisibility, assigning, divisibility, greatest,
common, divisor, greatest-common-divisor]
[n 2 N n f41g, (n) = 40, n 2 N, (n) = 40, , n = 41, n0s, Calculate,
looking, Euler, Totient, Function, found, one, namely, calculate,
EulerTotient, totient, function, totient-function]
[999:::9 , 999:::9 , 999:::9 x(mod100), 0 x 100, 999:::9 (nine9s) =
9a, 9a(mod100), a(mod (100)), (100) = 40, a = b(mod40),
99:::9
(eight9s) = 9b, 9b(mod40), b(mod (40)),
(40) = 16, b =
c(mod16), 999:::9 (seven9s) = 9c, 9c(mod16), c(modphi(
        <xref ref-type="bibr" rid="ref16">16</xref>
        )), (
        <xref ref-type="bibr" rid="ref16">16</xref>
        ) = 8,
c(mod8), 9 = 1(mod8), c = 1(mod8), Finding, last, two, digits, nine,
continuing, learning, modular, arithmetic, confused, this, question, Find,
last, two, digits, nine, phi, function, used, this, problem, far, this,
In, order, know, need, know, In, order, know, need, know, In, order,
know, need, know, need, find, like, made, along, way, lot, back, order,
value, last, two, anyone, help, this, number, theory, modular, arithmetic,
number-theory, modular-arithmetic]
A.22† [d; d + 1; d + 2::: = N , N , d; d + 1; d + 2:::, N = 30, Ans = 3, d1 =
4; d2 = 6; d3 = 8, d1 : 4 + 5 + 6 + 7 + 8 = 30, (P(d + n) P(d 1)),
Find, number, satisfies, challenge, cat, trip, walk, way, sum, given, many,
ways, Example, Edit, way, see, many, subsets, way, sequences, series,
number, theory, sequences-and-series, number-theory]
A.23 [27 38 52 711, 23 34 5, x = 23 34 5x, find, product, two, integers,
greatest, common, divisor, least, common, multiple, this, question, help,
solve, tried, assuming, Gcd, product, factors, prime, numbers,
greatest, common, divisor, least, common, multiple, prime-numbers,
greatest-common-divisor, least-common-multiple]
A.24 [p2i 1?, p2i 1?, 2i 1 = (a+bi)2, a2 +2abi b2, a2 b2 = 1, 2ab =
2, a2 = b 2, b 2 b2 = 1, b4 + 1 = 1, b4 = 2, b = p42, p2i 1, Is,
this, only, way, evaluate, work, solve, way, algebra, precalculus,
algebra-precalculus]
A.29
A.30
A.31
A.25† [P (x2 + 1) = (P (x))2 + 1, P (0), P (x2 + 1) = (P (x))2 + 1, P (x),
P (0) = 0, P (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) = 1, P (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) = 2, P (
        <xref ref-type="bibr" rid="ref5">5</xref>
        ) = 5, P (
        <xref ref-type="bibr" rid="ref26">26</xref>
        ) = 26, P (677) = 677,
P (x) = x, y = P (x), y = x, P (0) = 2, P (x) = (x2 + 1)2 + 1,
P (0) = 3, limx!1 logx P (x), P (x), P (0), P (x), polynomial,
polynomial, Let, points, log, does, converge, So, values, makes, polynomial,
polynomials]
A.26 [R01 sinx x dx:, R0x sixnx dx, solve, indefinite, integral, using, Taylor, series,
trying, show, integral, convergent, sin, My, first, taylor, series, sin, next,
step, real, analysis, calculus, integration, taylor, expansion, riemann,
integration, real-analysis, taylor-expansion, riemann-integration]
A.27 [e3i =2, e i = 1, e3 i=2, (e i)3=2, (p 1)3 = i3 = i, p(
        <xref ref-type="bibr" rid="ref1">1</xref>
        )3 = p 1 =
i, value, solving, value, know, confused, right, answer, evaluate, two,
possible, answers, this, one, correct, answer, going, complex, numbers,
exponentiation, complex-numbers]
A.28 [sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) = a+cpb , a + b + c, sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) = a+cpb , a + b + c, sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ), xz ,
x = a + pb; z = c, y = qc2 (a + pb)2, cos(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) = yz = pc2 (ca+pb)2 ,
b = (c sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) a)2 = c2 sin2(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) 2ac sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) + a2, sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) = 1+4 p5 ,
A; B; C, A sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        )2 + B sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) + C = 0, sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ), Ax2 + Bx + C, a =
      </p>
      <p>
        B; b = B2 4AC; c = 2A, sin(
        <xref ref-type="bibr" rid="ref18">18</xref>
        ) = ( 1 + p5)=4, sin, sin, form,
sin, right, triangle, sides, front, corner, angle, degrees, hypotenuse, find,
cos, found, sin, sin, sin, solution, says, sin, intuition, find, sin, sin, sin,
root, Totally, This, question, prove, part, solution, algebra, precalculus,
trigonometry, euclidean, geometry, contest, math, algebra-precalculus,
euclidean-geometry, contest-math]
[3 + 2i, 4i, i = p 1, 15i = 0, 15 p11 = 0 0 = 0;, 51x = 0,
5 x = 0 0 = 0:, Dividing, Complex, Numbers, Infinity, My,
PreCalcu1 1
lus, properties, limits, before, test, stated, real, number, divided, infinity,
equals, This, whether, complex, number, divided, infinity, equal, This,
completely, was, theoretical, calculation, knowing, calculated, complex,
number, since, using, properties, utilized, real, numbers, state, since,
Is, this, theoretical, calculation, correct, concept, this, algebra,
precalculus, limits, complex, numbers, infinity, algebra-precalculus,
complex-numbers]
[a3 +b3 +c3 3abc, a3 +b3 +c3 3abc, (a) 1, (b) 0, (c) 1, (d) 2, a; b, ex,
Find, binomial, theorem, Find, help, solve, this, added, It, expansion,
know, use, binomial, theorem, binomial-theorem]
      </p>
      <p>
A.32 [Empty(x) () 6 9y(y 2 x), Empty(x) () 6 9y(y 2 x), definitions,
axioms, very, elementary, definition, first, order, logical, example, say,
Define, Is, definition, axiom, call, definitional, this, one, place, predicate,
symbol, Empty, new, among, listed, primitives, say, Zermelo, has, only,
identity, membership, primitive, So, stating, definitions, effect, stating,
axioms, characterizing, primitive, definitional, axioms, complete,
reference, specified, set, symbols, Is, correct, case, why, call, axiom, state,
mean, why, say, example, Definitional, axiom, terminology, definition,
first, order, logic, axioms, first-order-logic]
A.33 [f (t; x), @@3t3f , @@3xf3 , Physical, meaning, significance, third, derivative,
function, Given, physical, quantity, represented, function, meaning, third,
derivative, physics]
A.34† [a; b &gt; 0, a " b, a ### b, a b, a #n b, n 3, a ## b = a + 1, b + 1,
a "n b, n 0, a ## b = b + 1, a # b = a + b 1, a &gt; b, a # b, a ### b,
a &gt; b, a ## b = b + 1 + ab ;, a ### b, Extending, Knuth, non,
positive, values, up-arrow/hyperoperations, non-positive, So, idea, extend,
Knuth, arrow, notation, included, zero, negative, It, normally, defined,
basically, hyperoperation, sequence, only, try, go, backwards, trivial,
extension, letting, arrows, represent, negative, arrows, why, heck,
coming, expression, does, Alternatively, way, defining, zero, arrows, does,
So, question, Is, extension, Knuth, arrow, notation, exists, Edit, this,
question, initially, was, correct, So, example, extension, modified,
question, An, extension, define, satisfies, recursive, definition, Edit, turns,
correct, this, imply, example, close, need, exception, case, does, show,
evaluating, need, defined, try, extend, instance, abuse, case, allowed, let,
finding, intractable, result, up-arrow, up-arrow, hyperoperation,
ackermann, function, ackermann-function]
A.35 [R ex2 dx, R e2xdx, does, function, antiderivative, know, this, question,
sound, why, ca, write, does, antiderivative, In, light, this, question,
sufficient, conditions, function, need, careful, examination, function, say,
does, antiderivative, way, see, function, right, say, does, antiderivative,
integration]
A.36 [:P , true, :P ! A1 ! ::: ! An ! P , :P ! P () :(:P )_P ()
P , :P , :P , :P , Proof, contradiction, status, initial, assumption, proof,
complete, First, like, say, looked, answers, specific, question, found,
existing, question, Say, need, prove, statement, method, Assuming, holds,
using, list, statements, proven, hold, derived, proof, arrive, list, proven,
statements, was, proofs, contain, lines, contradiction, proves, initial,
assumption, was, holds, initial, assumption, proven, FALSE, why, sure,
derived, holds, particular, holds, On, hand, derived, assumption, explain,
why, this, type, argument, used, logic, proof, writing, proof-writing]
A.38
A.39
A.40
A.41
[f g = g f , f 1 6= g, f 6= id 6= g, f g = g f ?, f (x) = 2x, g(x) = 3x,
f : R r f1g ! R, f (x) = 2x=(1 x), g(x) = f 1 g f = 2 +g(g1(21x2xxx) ) ;, g0,
gn+1 7! f 1 gn f , g0, Non, trivial, examples, real, valued, functions,
inverses, identity, linear, Examples, trivial, this, given, function, one,
go, example, function, commutes, this, similar, fixed, point, iteration,
defined, need, function, function, finding, fixed, point, possible, strategy,
ca, find, real-valued, real, analysis, functional, analysis, functions,
real-analysis, functional-analysis]
[$a, b$, $q, r : a = bq + r$, Choose, $q : qb \le a$, Uses, Axiom, Choice,
first, year, maths, student, drift, years, ZFC, axioms, first, time, college,
stuff, far, nowhere, near, ZFC, terms, happened, use, axiom, choice,
time, module, even, know, name, example, proof, non, negative, integers,
exist, integers, known, restrictions, proof, like, this, largest, integer, Is,
axiom, choice, allows, this, simple, important, step, couple, questions,
name, simple, proofs, theorems, results, etc, axiom, choice, essential,
read, has, long, topic, dispute, mathematicians, even, people, accept,
alternative, axiomatic, systems, work, equally, well, without, needing,
first-year, non-negative, set, theory, set-theory]
[$2018^{2019}$, $2019^{2018}$, $2019 \log(2018)$, $2018 \log(2019)$, $\log 2019 &gt; \log 2018$,
$2019^{2018}$, know, value, logs, sides, log, log, know, log, log, does,
this, mean, one, algebra, precalculus, logarithms, algebra-precalculus]
[$a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n =$, $f(x+y) = f(x) + f(y)$, $f(cx) = cf(x)$,
meaning, term, linear, called, linear, equation, represents, equation, line,
dimensional, So, linear, comes, word, line, higher, power, graph,
function, straight, called, linear, differential, equation, derivatives, power,
equal, similar, above, definition, linear, function, called, linear, In, this,
definition, linearity, function, does, word, linear, means, does, relate,
straight, line, Finally, does, term, linear, means, case, linear, vector,
spaces, reference, straight, line, So, whether, linear, word, used,
different, contexts, Does, different, meaning, different, situation, Or, linearity,
refers, relation, straight, line, At, Least, explain, linearity, function,
linear, vector, space, relate, equation, line, linear, algebra, linear-algebra]
[$n \ge m$, $\sum_{r=1}^{n} (-1)^{n-r} \binom{n}{r} r^m$, Confusion, find, number, onto,
functions, two, sets, given, In, book, given, two, finite, sets, containing,
elements, number, onto, functions, well, ca, combinations, combinatorics,
functions, combinations]
[$\{0, 1, 2, 3, \dots\}$, $f = 0$, $\int_0^1 f^2(x)\,dx = 0$, $&gt; 0$, $M = \sup_{x \in [0,1]} |f(x)|$, $[0, 1]$,
$\int_0^1 x^k f(x)\,dx = 0$, $k \in \{0, 1, 2, 3, \dots\}$, $f = 0$, $[0, 1]$, $\int_0^1 x^k f(x)\,dx = 0$,
$k \in \{0, 1, 2, 3, \dots\}$, $f = 0$, $f = 0$ a.e., Lebesgue, integrable, function,
satisfies, prove, indefinitely, differentiable, real, valued, function, satisfies,
prove, My, prove, this, assertion, prove, approximation, theorem, find,
polynomial, sup, It, need, condition, indefinitely, differentiable,
continuous, hold, My, question, Lebesgue, integrable, real, valued, function,
satisfies, prove, except, set, measure, TRUE, Riemann, integrable, real,
valued, function, satisfies, prove, except, set, measure, EDIT, prove,
prove, fourier, coefficient, equal, cos, cos, proof, complete, real, analysis,
real-analysis]
A.48
A.49
A.50
A.51
[$p$, $0 &lt; r &lt; p-1$, $q$, $rq \equiv 1 \bmod p$, $0 &lt; r &lt; p-1$, $rq \equiv 1 \bmod p$,
$rq - 1 = kp$, $qr - kp = 1$, $\sum_{r=1}^{p-2} r = \frac{(p-2)(p-1)}{2} = \frac{p^2 - 3p}{2} + 1 \equiv 1 \bmod p$,
$(p-1)! \equiv -1 \bmod p$, $(p-1)! + 2 \equiv 1 \bmod p$, $(p+1)$, $(p+1)(p-1)! \equiv -(p+1) \bmod p$,
$(p+1)(p-1)! \equiv -1 \bmod p$, Prove, given, prime,
exists, mod, Prove, given, prime, exists, mod, only, taken, one, number,
theory, course, years, this, popped, computer, science, class, was,
assuming, this, proof, elementary, since, current, class, algorithm, cours,
basic, tried, look, couple, approaches, reverse, engineer, arrive,
conclusion, need, little, manipulation, looks, ca, see, sum, mod, looks, good,
know, final, Wilson, Lagrange, vaguely, this, theorem, was, looking, old,
book, was, see, arrived, prime, mod, multiplier, built, factorial,
expression, was, adding, side, mod, dead, end, sure, was, multiplying, mod,
results, mod, multiple, sure, valid, elementary, number, theory, proof,
verification, elementary-number-theory, proof-verification]
[$x, y \ge 0$, $(x+y)^k \ge x^k + y^k$, $k \ge 1$, $x, y \ge 0$, $(x+y)^k \ge x^k + y^k$, $k \in \mathbb{R}_{\ge 1}$,
$x \le x+y$, $x^k \le (x+y)^k$, $k \ge 1$, $y^k \le (x+y)^k$, $x^k + y^k \le 2(x+y)^k$, $x, y \ge 0$,
$f(k) = (x+y)^k$, $g(k) = x^k + y^k$, $(x+y)^r \ge x^r + y^r$, $r \in \mathbb{Z}^+$, $k = 1$,
$(x+y)^r &gt; x^r + y^r$, $r \ge 2$, $m \notin \mathbb{Z}$, $f(m) = g(m)$, $k = 1$, $(x+y)^r &gt; x^r + y^r$,
$r \ge 2$, $k = 1$, showing, trying, show, So, far, tried, things, since, sides,
inequality, positive, inequality, hold, Adding, inequalities, this, close,
course, Alternatively, was, fix, let, let, show, using, binomial, theorem,
show, intersect, only, better, proving, statement, since, positive, integers,
real, number, property, exist, since, violate, only, continuity, positive,
integers, course, this, dependent, showing, intersect, only, running, low,
real, analysis, functions, inequality, real-analysis]
[$\binom{2n}{n} = \sum_{k=0}^{n} \binom{n}{k}^2$, $(1+x)^{2n} = [(1+x)^n]^2$, $x^n$, $\binom{2n}{n}$, $\sum_{k=0}^{n} \binom{n}{k}^2$,
Is, simple, combinatoric, interpretation, this, identity, across, exercise,
prove, identity, exercise, It, use, prove, identity, expressions, identity,
coefficients, expansions, expressions, course, number, was, whether,
equivalent, counting, interpretation, It, clear, number, ways, half, elements,
set, this, possible, interpret, equivalently, equivalent-counting,
combinatorics, binomial, coefficients, binomial-coefficients]
[$\sum \frac{1}{n(2+\cos n)}$, $\sum \frac{1}{n(2+\cos n)}$, $\epsilon$, $A$, $\forall n \in A, |2 + \cos n| \le 1 + \epsilon$, $(n \in A \iff -1 \le \cos n \le -1 + \epsilon)$, $A$, Divergent, series, cos, Show, cos, divergent, My,
main, problem, infinitely, small, positive, real, number, define, set, cos,
cos, divergence, come, sum, idea, handle, this, real, analysis, integration,
sequences, series, analysis, real-analysis, sequences-and-series]
[$\sum_{r=0}^{n} \binom{n+r}{r} \frac{1}{2^r} = 2^n$, $x^n$, Sum, series, binomial, coefficients, Prove, try,
coefficient, solve, binomial, coefficients, binomial-coefficients]
A.52 [$\forall n \in \mathbb{N}$, $\exists m \in \mathbb{N}$, $m &gt; n$, $m$, $n_1, n_2, \dots, n_k \in \mathbb{N}$, $n = n_1 n_2 \cdots n_k + 1$,
$n_1, n_2, \dots, n_k$, $a, b \in \mathbb{N}$, $\mathbb{Z}^+$, $a = qb + r$, $0 \le r &lt; q$, $k \in \mathbb{N}$, $n_1, n_2, \dots, n_k \ge 1$,
$\forall i$, $n_i \nmid n = n_1 n_2 \cdots n_k + 1$, $\exists n \in \mathbb{N}$, $\forall m \in \mathbb{N}$, $m \le n$, Prove, prime, two,
parts, Prove, least, divisible, numbers, Prove, truth, negation, leads, Use,
theorem, exist, unique, quotient, remainder, part, given, show, set, sure,
go, part, know, negation, prime, sure, proof, theorem, proof, writing,
prime, numbers, proof-writing, prime-numbers]
A.53 [$g \in G$, $gh = 1$, $hg = 1$, $AB = 1 \Rightarrow BA = 1$?, $AB = 1 \Rightarrow BA = 1$,
Show, one, sided, inverse, square, matrix, TRUE, inverse, one-sided,
know, group, element, does, mean, In, case, matrices, linear, maps,
vector, spaces, TRUE, This, happens, square, matrices, case, even, form,
group, multiplication, restrict, square, matrices, simple, proof, this,
avoids, chasing, entries, makes, use, simply, vector, space, structure,
linear, transformations, In, prove, this, this, imply, group, one, sided,
two, sided, inverses, has, infinite, since, finite, group, finite, dimensional,
representation, one-sided(but, two-sided), linear, algebra, group, theory,
linear-algebra, group-theory]
A.54 [$P(\mathbb{N}) = \{S \mid S \subseteq \mathbb{N}\}$, $P(\mathbb{N}) = \{S \mid S \subseteq \mathbb{N}\}$, using, diagonal,
argument, show, uncountable, tips, solutions, this, one, using, diagonal,
argument, show, uncountable, discrete, mathematics, elementary, set,
theory, discrete-mathematics, elementary-set-theory]
A.55 [$\frac{1}{\sqrt{-1}} = \sqrt{-1}$, $\sqrt{-1}$, $(\sqrt{-1})^2 = -1$, $\frac{1}{\sqrt{-1}} = \sqrt{\frac{1}{-1}} = \sqrt{-1}$, calculation,
set, new, number, property, write, know, this, correct, result, missing,
complex, numbers, definition, complex-numbers]
A.56 [$\exists p\,(p \text{ is prime}) \to \forall x\,(x \text{ is prime})$, curious, logical, formula, involving,
prime, numbers, Let, set, natural, Is, formula, TRUE, FALSE, know,
answer, this, question, shortest, way, arrive, conclusion, using,
deduction, system, logic, first, order, logic, first-order-logic]
A.57y [$\mathbb{R}^n$, $f : B \to \mathbb{R}^m$, $f^{-1}$, $f(B)$, $f(X)$, $\mathbb{R}^n$, $f : X \to \mathbb{R}^m$, $f^{-1} : f(X) \mapsto X$,
$f(X)$, Preimage, continuous, one, one, function, connected, domain,
continuous, one-to-one, know, given, compact, subset, continuous, injective,
one, one, function, continuous, This, TRUE, know, image, connected,
subset, connected, continuous, let, connected, non, compact, subset,
continuous, injective, one, one, trying, counterexample, mapping,
continuous, parametrized, example, advance, (one-to-one), (non-compact),
(one-to-one), real, analysis, general, topology, analysis, continuity,
metric, spaces, real-analysis, general-topology, metric-spaces]
A.58 [$3\arcsin\frac{1}{4} + \arccos\frac{11}{16} = \frac{\pi}{2}$, $3\arcsin\frac{1}{4} + \arccos\frac{11}{16} = \frac{\pi}{2}$, Prove, arcsin,
help, this, exercise, know, prove, answer, fully, Exercise, Prove, arcsin,
trigonometry]
A.59 [$\sum_{d \mid n} \varphi(d) = n$, $\varphi(n)$, $\sum_{d \mid n} \varphi(d) = n$, $n = \prod_{k=1}^{m} p_k^{\alpha_k}$, $d = \prod_{k=1}^{m} p_k^{\beta_k}$,
$0 \le \beta_k \le \alpha_k$, Multiple, proofs, looking, multiple, proofs, statement, denotes,
Euler, totient, one, unique, factorisation, theorem, align, group,
theory, number, theory, alternative, proof, big, list, group-theory,
number-theory, alternative-proof, big-list]
A.60 [$a_n = \left(1 - \frac{1}{\sqrt{2}}\right) \cdots \left(1 - \frac{1}{\sqrt{n+1}}\right)$, $n \ge 1$, $\lim_{n \to \infty} a_n$, p1, $a_n$, $(0.293)(0.423)(0.5)(0.553)(0.622)(0.647)(0.667)\cdots$, (D), $a_n$, Limiting,
value, sequence, tends, infinity, Let, equals, does, exist, equals, equals,
My, Approach, particular, direction, procedure, find, value, tends, So,
tried, like, this, simple, way, substitute, values, trying, find, limiting,
value, So, value, tending, option, tried, like, this, find, value, converges,
tends, infinity, help, procedure, solve, this, question, calculus, sequences,
series, limits, products, sequences-and-series]
A.61 [$i, j \in \mathbb{N}$, $n = 3i + 5j$, $n \ge 8$, $i, j \in \mathbb{N}$, $n = 3i + 5j$, $n \ge 8$,
$n = 8 \implies 8 = 3 \cdot 1 + 5 \cdot 1$, $n = 9 \implies 9 = 3 \cdot 3 + 5 \cdot 0$, $n = 10 \implies 10 = 3 \cdot 0 + 5 \cdot 2$, $n = h$,
$n = h + 1$, $k + 1 = 3i + 5j$, exists, Prove, exists, hard, time, this, exercise,
trying, prove, induction, Basis, step, Induction, step, TRUE, TRUE,
So, know, proving, elementary, number, theory, discrete, mathematics,
induction, diophantine, equations, elementary-number-theory,
discrete-mathematics, diophantine-equations]
A.62 [$\mathbb{Q}$, $\mathbb{Z}$, $|\mathbb{Q}| = |\mathbb{Z}|$, $\mathbb{Q}$, $\mathbb{Z} \times \mathbb{Z}$, $|\mathbb{Z} \times \mathbb{Z}| = |\mathbb{Z}|$, Prove, cardinality, set,
rational, numbers, set, integers, equal, learned, cardinality, discrete, class,
days, this, This, confusing, entirely, sure, even, full, question, Let,
denote, set, rational, numbers, denote, set, Prove, saying, element,
element, know, prove, bijection, even, prove, help, discrete, mathematics,
discrete-mathematics]
A.63 [gcd, lcm, 2, $n_1, n_2$, $\mathrm{lcm}(n_1, n_2) = \frac{n_1 n_2}{\gcd(n_1, n_2)}$, $n_1, n_2, n_3, \dots, n_r$, gcd,
positive, integers, two, positive, integers, relationship, greatest,
common, divisor, least, common, multiple, given, gcd, set, positive,
integers, does, relationship, hold, Is, TRUE, like, this, prove, handle, proof,
explanation, greatest, common, divisor, least, common, multiple,
proof-explanation, greatest-common-divisor, least-common-multiple]
A.64y [$f : [a, b] \to \mathbb{R}$, $f([a, b]) \subseteq [a, b]$, $c \in [a, b]$, $f(c) = c$, $f(a) = a$, $f(b) = b$,
$f(a) = a$, $f(b) = b$, $[a, b]$, $f(a) &gt; a$, $f(b) &lt; b$, $f(a) &gt; a$, $f(b) &lt; b$,
$x, y \in [a, b]$, $f(a) = x$, $f(b) = y$, $f[a, b] = [x, y]$, $[x, y] \subseteq [a, b]$, $[x, y]$, $c \in [a, b]$,
$f(c) = c$, continuous, Prove, exists, point, satisfying, continuous,
Prove, exists, point, satisfying, left, show, well, assume, Since, values,
this, assuming, So, far, Assume, Let, means, Notice, Since, continuous,
exists, equal, equal, This, assume, Intermediate, Value, Theorem, real,
analysis, continuity, proof, explanation, real-analysis, proof-explanation]
A.65 [e 2 t 2 e21t2, $t \ge 0$, e 2 t 2 e21t2 1, $t \ge 0$, ln, (1), t et 12,
$x \le e^x - 1$, $x \ge 0$, show, show, Applying, sides, yields, equivalent,
So, show, this, calculus, inequality, exponential, function,
exponential-function]
A.66 [$x, h \in \mathbb{R}^d$, $A \in \mathbb{R}^{d \times d}$, $(x^T A h)^T = h^T A^T x$, $x, h \in \mathbb{R}^d$, $A \in \mathbb{R}^{d \times d}$,
$(x^T A h)^T = h^T A^T x$, possible, possible, linear, algebra, transpose,
linear-algebra]
A.67 [$k \times k$, $k \times l$, $l \times l$, Combination, matrixes, matrix, matrix, matrix, prove,
matrix, elements, equal, know, rules, calculating, determinants, know,
this, question, calculus, determinant]
A.68 [$a^n + 1$, $a + 1$, $n$, $a^n + 1$, $a + 1$, $1$, $n \in \mathbb{N}$, $2k + 1$,
$a^{2(k+1)+1} + 1 = a^{2k+3} + 1 = a^3 \cdot a^{2k} + 1 = (a^3 + 1)a^{2k} - a^{2k} + 1$, $a^{2k}$, $a^n + 1$, $a + 1$,
Prove, divisible, odd, Prove, divisible, odd, know, Since, odd, rewrite,
assume, holds, prove, holds, next, sure, Since, means, exponential, term,
even, cant, use, divisible, odd, polynomials, induction, divisibility]
A.69 [$\binom{s}{s} + \binom{s+1}{s} + \cdots + \binom{n}{s} = \binom{n+1}{s+1}$, $n \ge s$, $0 \le s \le n$, Induction, two,
variable, parameters, So, was, assigned, this, problem, tried, professor,
explanations, My, professor, saying, statement, need, prove, formula,
correct, need, use, induction, variables, sure, help, combinatorics]
A.70y [$\sum_{j=0}^{N-1} \cos\frac{(2j+1)\pi}{2N} = 0$, $l \in \mathbb{Z}$, $N \in \mathbb{N}$, $\sum_{j=0}^{N-1} \cos\frac{l(2j+1)\pi}{2N} = 0$,
Proving, cos, Let, need, prove, cos, tried, use, Euler, formula, sum, first,
terms, geometric, serie, ideas, trigonometry, summation]
A.71y [$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$, $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$,
$LHS = 1^2$, $RHS = \frac{(1+1)(2+1)}{6} = \frac{2 \cdot 3}{6} = 1$, $LHS_p = 1^2 + 2^2 + \cdots + p^2$,
$RHS_p = \frac{p(p+1)(2p+1)}{6}$, $LHS_{p+1} = 1^2 + 2^2 + \cdots + p^2 + (p+1)^2$,
$RHS_{p+1} = \frac{(p+1)((p+1)+1)(2(p+1)+1)}{6}$, $RHS_{p+1} = RHS_p + (p+1)^2$,
$RHS_{p+1} = \frac{p(p+1)(2p+1)}{6} + (p+1)^2$, $RHS_{p+1} = \frac{(p+1)((p+1)+1)(2(p+1)+1)}{6} = \frac{p(p+1)(2p+1)}{6} + (p+1)^2$,
Show, induction, Show, induction, My, Case,
Case, Case, show, this, induction, need, show, So, need, rewrite, equal,
Anyone, see, Or, solution, induction]
A.72 [$X = Y$, $X \in Y$, $X = Y$, $X \in Y$, Is, possible, Is, possible, set, equal, set,
set, element, set, set, theory, axioms, set-theory]
A.73y [$\binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}$, $\binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}$,
$(1+x)^n(1+x)^n = (1+x)^{2n}$, $x^n$, $\binom{2n}{n}$, $\binom{n}{0} + \binom{n}{1}x + \cdots + \binom{n}{r}x^r + \cdots + \binom{n}{n}x^n$,
$x^n$, $x^i$, $x^{n-i}$, $x^n$, $\binom{n}{n-r} = \binom{n}{r}$, $\binom{n}{0}^2 + \binom{n}{1}^2 + \cdots + \binom{n}{n}^2 = \binom{2n}{n}$,
$x^n$, $x^n$, $x^n$, Help, proof, proof, required,
made, binomial, expose, was, forward, questions, exposing, see,
question, marks, like, this, one, points, quite, numeration, This, Prove, use,
equality, call, result, proved, finding, coefficient, terms, this, equality,
binomial, theorem, left, hand, side, this, equation, product, two, factors,
equal, factors, multiply, term, term, first, factor, has, term, second,
factor, has, coefficients, Since, summation, equal, So, left, hand, side,
equation, coefficient, expand, right, hand, side, equation, find, coefficient,
left, hand, side, equation, prove, equal, In, This, was, My, one, heck,
does, this, equation, come, know, equation, come, requested, prove,
different, equality, two, why, solution, first, equation, finding, coefficients,
one, made, see, one, three, why, coefficient, left, hand, side, equation,
made, coefficient, right, hand, side, this, equation, well, prove, original,
equation, one, was, prove, first, place, know, many, guys, help, long,
post, long, (?-n), (?-1), (?-2), left-hand, right-hand, (?-3), left-hand,
(?-1), (?-2), (?-3), right-hand, discrete, mathematics, binomial,
coefficients, binomial, theorem, discrete-mathematics, binomial-coefficients,
binomial-theorem]
A.79
      </p>
      <p>
A.74 [$f : (0, \infty) \to \mathbb{R}$, $f(x) = x + \frac{1}{x}$, $[2, \infty)$, $f : (0, \infty) \to \mathbb{R}$, $f(x) = x + \frac{1}{x}$,
$[2, \infty)$, $x = 1$, $f(1) = 2$, $[2, \infty)$, Show, image, function, interval,
Show, image, function, interval, So, show, function, interval, functions,
elementary, set, theory, elementary-set-theory]
A.75 [$m$, $\lim_{u \to \infty} \frac{u^m}{e^u} = 0$, $\lim_{u \to \infty} \frac{u^m}{e^u} = 0$, $e^u &gt; \frac{u^{m+1}}{(m+1)!}$, Prove, integer,
show, integer, Looking, solutions, sure, this, logical, step, real, analysis,
calculus, limits, real-analysis]
A.76y [$\mathbb{Z}$, $a + b\mathbb{Z} = \{z \in \mathbb{Z} \mid z = a + bk \text{ for some } k \in \mathbb{Z}\}$, $a \in \mathbb{Z}$, $b \in \mathbb{Z} \setminus \{0\}$,
$\{a_i + b_i\mathbb{Z} \mid i \in \mathbb{N}\}$, $\bigcup_{i \in \mathbb{N}} (a_i + b_i\mathbb{Z}) = \mathbb{Z}$, $I \subseteq \mathbb{N}$, $\bigcup_{i \in I} (a_i + b_i\mathbb{Z}) = \mathbb{Z}$,
$\{p_k\} = \{2, 3, 5, \dots\}$, $\ell_1, \ell_2$, $\ell_1 = \ell_2 = 5$, $\{0 + p_{k_j}\mathbb{Z}\}$, $j = 1, \dots, n$,
$p &gt; \max\{p_{k_1}, \dots, p_{k_n}\}$, $p \notin \bigcup_{1 \le j \le n} (0 + p_{k_j}\mathbb{Z})$, $p \notin (-1 + 5\mathbb{Z}) \cup (1 + 5\mathbb{Z})$,
$5 \nmid p - 1$, $5 \nmid p + 1$, Covering, arithmetic, progressions, solving, problems,
old, exam, topology, translated, problem, algebraic, terms, problem, Let,
collection, sets, satisfying, Show, whether, always, possible, extract,
finite, lot, elementary, algebra, Let, set, construct, non, negative, integers,
instance, pick, finite, sub, collection, assume, prime, max, run, This,
course, possible, prime, last, this, approach, means, prove, infinitely,
many, primes, ending, like, thing, prove, simple, problem, like, Surely,
way, solving, this, EDIT, interested, solution, topology, whether,
solution, like, solution, works, non-negative, sub-collection, general,
topology, elementary, number, theory, general-topology,
elementary-number-theory]
A.77 [$(-1)(-1) = 1$, $(-1)(-1) = 1$, $1 \cdot 1 = 1$, $1 \cdot x = x$, Show, relation,
consequence, distributive, law, Show, relation, consequence, distributive,
This, question, first, problem, Theory, Andre, point, tried, using, help,
work, elementary, number, theory, elementary-number-theory]
A.78 training data
      </p>
      <p>A.80 [$\left|\frac{e^{-ixu} - 1}{u}\right| \le |x|$, $u \neq 0$, Inequality, complex, exponential, Rudin,
Real, Complex, Analysis, uses, this, proof, near, chapter, real, Why, this,
TRUE, Edit, real, inequality, exponential, function, fourier, transform,
exponential-function, fourier-transform]
[$\mathbb{N}$, $\emptyset, \{1\}, \{2\}, \{1, 2\}, \{3\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}, \{4\}, \dots$, $\{1\}$, $\{2\}$,
$[n] = \{1, 2, \dots, n\}$, $[n]$, $2^n$, Why, does, this, proof, set, finite, subsets,
countable, set, work, set, subsets, found, this, proof, thread, found, simple,
answers, sort, formula, like, trying, way, see, set, finite, subsets,
countable, probably, list, elements, set, one, coming, first, next, shows, set,
pattern, see, least, In, words, first, comes, comes, time, new, integer, list,
subsets, contain, ones, contain, showed, subsets, show, first, elements,
this, applies, finite, subsets, cant, why, apply, set, subsets, continue, this,
scheme, assume, way, infinity, quite, help, analysis, elementary, set,
theory, proof, explanation, elementary-set-theory, proof-explanation]
A.81y [$n \in \mathbb{N}$, $\{1, \dots, n\}$, $M \neq \emptyset$, $x \in M$, $f(1)$, $f(1) := x$, $\{1\}$, $\{1, \dots, n\}$,
$M \setminus \{f(1), \dots, f(n)\}$, $x \in M \setminus \{f(1), \dots, f(n)\}$, $g(1), \dots, g(n+1)$,
$g(1) := f(1), \dots, g(n) := f(n)$, $g(n+1) := x$, $\{1, \dots, n+1\}$, $n_1$, $h(n_1)$,
$\{1, \dots, n_1\}$, $g(n_1)$, $h(n_1)$, $(1, g(1)), \dots, (n_1, g(n_1))$, $h(n_2)$, $n_2 \le n_1$,
$g(n_2)$, $g(n_2)$, $h(n_2)$, $h(n_3)$, $n_3 &gt; n_1$, $(n_1 + 1, g(n_1 + 1)), \dots, (n_3, g(n_3))$,
$g(n_3)$, $h(n_3)$, $h(n)$, $n \in \mathbb{N}$, $h : \mathbb{N} \to M$, $h : \mathbb{N} \to M$, $h : \mathbb{N} \to M$,
infinite, set, contains, countable, Why, proof, Axiom, Choice, Let, infinite,
Proposition, exists, injection, Since, exists, Define, injection, exists,
injection, Since, infinite, set, empty, So, exists, Define, injection, Let,
arbitrary, natural, calculate, injection, Proposition, return, value, pairs,
calculate, search, database, value, database, return, value, calculate, pairs,
database, Proposition, return, value, calculate, above, injection, Why,
proof, man, know, injection, man, know, injection, elementary, set,
theory, axiom, choice, elementary-set-theory, axiom-of-choice]
      </p>
      <p>
A.82y [$f(x)$, $[0, 2\pi]$, $f(x)$, $[0, 2\pi]$, $A = \int_0^{2\pi} f(x)\,dx$, $g(x)$, $f(x) = g(x)\cos(x)$,
$f(x)$, $A = \int_0^{2\pi} g(x)\cos(x)\,dx$, $t = \sin(x)$, $dt = \cos(x)\,dx$, $x = \arcsin(t)$,
$t = \sin(x)$, $\sin(0) = 0$, $2\pi$, $\sin(2\pi) = 0$, $A = \int_0^0 g(\arcsin(t))\,dt$, $A = 0$,
definite, integrals, evaluate, using, periodic, functions, know, reasoning,
know, went, discuss, this, Maths, even, find, Let, assuming, function,
continuous, has, antiderivative, interval, Let, area, curve, interval,
exist, function, cos, Substituting, value, cos, Using, substitution, Let, sin,
cos, arcsin, Changing, limits, sin, sin, sin, Substituting, definite,
integral, arcsin, Definite, Integral, lower, upper, bounds, So, help, calculus,
integration, trigonometry, definite, integrals, inverse, function,
definite-integrals, inverse-function]
A.83 [$a_n \le M$, Is, sequence, sums, inverse, natural, numbers, bounded,
reading, Infinite, Sequences, right, trivial, matter, determine,
boundedness, mind, sequence, before, learn, sequence, know, sequence,
bounded, above, number, calculus, sequences, series, harmonic,
numbers, sequences-and-series, harmonic-numbers]
A.84y [$4, x$, $\mathbb{Z}[x]$, $I = \langle p, x \rangle$, $\mathbb{Z}[x]$, $I = \langle p, x \rangle$, $\mathbb{Z}[x]$, $4, x$, $\mathbb{Z}[x]$, Is, ideal,
generated, principal, ideal, principal, ideal, My, question, Is, principal,
ideal, prime, ideal, generated, principal, ideal, abstract, algebra, ring,
theory, ideals, principal, ideal, domains, abstract-algebra, ring-theory,
principal-ideal-domains]
[$N$, $(+1)$, $1/2$, $1/2$, $p_n = \frac{1}{2} p_{n-1}$, $p_n$, $p_N = 1$, $p_{N-1} = 2$, $p_0 = 2^N$,
Expected, number, steps, bug, reach, position, bug, time, position, At,
step, bug, moves, right, step, probability, returns, origin, probability,
expected, number, steps, this, bug, reach, position, tried, first, find,
possibility, this, bug, reaches, number, steps, recurrence, equation, find,
possibility, bug, position, reach, boundary, condition, see, does, make, sense,
sort, value, probability, first, number, expected, steps, sure, recurrence,
equation, markov, chains, random, walk, markov-chains, random-walk]
[$\sum_{k=0}^{n} k \binom{n}{k} = O(2^n \log^3 n)$?, Is, TRUE, log, Problem, Is, TRUE, log,
My, solution, this, upper, bound, way, large, ca, find, solution,
combinatorics, elementary, number, theory, discrete, mathematics,
elementary-number-theory, discrete-mathematics]
A.87 [$\forall n \in \mathbb{N} : \left(\sum_{i=1}^{n} a_i\right)\left(\sum_{i=1}^{n} \frac{1}{a_i}\right) \ge n^2$, $a_i$, $\forall i \in \mathbb{N} : a_i \in \mathbb{R}^+$, $n = 1$,
$n = 2$, $n = 3$, $a, b \in \mathbb{R}^+$, $ab = 1$, $a + b \ge 2$, $n = 3$, $a, b, c \in \mathbb{R}^+$,
$\frac{a}{b} + \frac{b}{a} \ge 2$, $\frac{a}{c} + \frac{c}{a} \ge 2$, $\frac{c}{b} + \frac{b}{c} \ge 2$, $P(1)$, $P(n)$, Is, TRUE, positive,
TRUE, prove, this, holds, using, lemma, Lemma, Let, example, case,
proven, like, this, Let, lemma, sure, generalized, version, natural, ca,
come, counterexample, try, prove, induction, Let, Base, case, Inductive,
hypothesis, assume, Inductive, step, know, Is, this, inequality, TRUE,
prove, anyone, show, counterexample, algebra, precalculus, inequality,
algebra-precalculus]
A.88 [$x^4 + 10x^2 + 1$, $\mathbb{Z}[x]$, $x^4 + 10x^2 + 1$, $\mathbb{Z}[x]$, Is, polynomial, reducible,
Is, polynomial, reducible, abstract, algebra, ring, theory, field,
theory, irreducible, polynomials, abstract-algebra, ring-theory, field-theory,
irreducible-polynomials]
A.89 [$A^2 + B^2 = C^2 + D^2$, $A, B, C, D$, Parametrization, pythagorean, like,
equation, pythagorean-like, Is, known, complete, parametrization,
Diophantine, equation, positive, rational, numbers, equivalently, integers,
number, theory, diophantine, equations, number-theory,
diophantine-equations]
A.90 [$n \times n$, $n \times n$, $A^{-1}$, $A^{-1}A = I_n \wedge AA^{-1} = I_n$ (1), $I_n$, $n \times n$, $A^{-1}A = I_n$,
$AA^{-1} \neq I_n$, Question, definition, Inverse, matrix, definition, matrix,
inverse, matrix, property, identity, cases, way, around, making, FALSE,
statement, linear, algebra, matrices, linear-algebra]
A.91y [$\mathbb{R} \to \mathbb{R}$, $x^2$, Continuous, function, reaches, value, range, times, Is,
continuous, function, reaches, possible, values, value, range, times, example,
perfect, was, question, almost, certain, functions, knows, bunch, real,
analysis, calculus, real-analysis]
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agichtein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinter</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Overview of the medical question answering task at TREC 2017 LiveQA</article-title>
          . In:
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 26th Text REtrieval Conference</source>
          , TREC
          <year>2017</year>
          .
          <source>NIST Special Publication</source>
          , vol.
          <volume>500-324</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aizawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohlhase</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>NTCIR-10 Math Pilot task overview</article-title>
          .
          <source>In: Proceedings of the 10th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-10)</source>
          . pp.
          <fpage>654</fpage>
          -
          <lpage>661</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aizawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohlhase</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schubotz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : NTCIR-11 Math-
          <article-title>2 task overview</article-title>
          .
          <source>In: Proceedings of the 11th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-11)</source>
          . pp.
          <fpage>88</fpage>
          -
          <lpage>98</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Białecki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muir</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ingersoll</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Apache Lucene 4</article-title>
          . In: SIGIR 2012 Workshop on Open Source Information Retrieval. pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Carlisle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ion</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miner</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Mathematical markup language (MathML) version 3.0 2nd edition</article-title>
          .
          <source>W3C recommendation (Apr</source>
          <year>2014</year>
          ), http://www.w3.org/TR/2014/REC-MathML3-20140410/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Davila</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Layout and Semantics: Combining Representations for Mathematical Formula Search</article-title>
          . ACM Reference (
          <year>2017</year>
          ). https://doi.org/10.1145/3077136.3080748
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Davila</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompa</surname>
            ,
            <given-names>F.W.</given-names>
          </string-name>
          :
          <article-title>Tangent-3 at the NTCIR-12 MathIR task</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>338</fpage>
          -
          <lpage>345</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fraser</surname>
          </string-name>
          , D.J.:
          <article-title>Math Information Retrieval using a Text Search Engine</article-title>
          .
          <source>Master's thesis</source>
          , Cheriton School of Computer Science, University of Waterloo (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fraser</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompa</surname>
            ,
            <given-names>F.W.</given-names>
          </string-name>
          :
          <article-title>Choosing math features for BM25 ranking with Tangent-L</article-title>
          .
          <source>In: Proceedings of the 18th ACM Symposium on Document Engineering (DocEng</source>
          <year>2018</year>
          ). pp.
          <volume>17</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          :
          <fpage>10</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>The math retrieval system of ICST for NTCIR-12 mathir task</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>318</fpage>
          -
          <lpage>322</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Guidi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sacerdoti Coen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A survey on retrieval of mathematical knowledge</article-title>
          .
          <source>Mathematics in Computer Science</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <fpage>409</fpage>
          -
          <lpage>427</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hopkins</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Bras</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrescu-Prahova</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanovsky</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koncel-Kedziorski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>SemEval-2019 task 10: Math question answering</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval2019)</source>
          . pp.
          <fpage>893</fpage>
          -
          <lpage>899</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kristianto</surname>
            ,
            <given-names>G.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topić</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aizawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Combining effectively math expressions and textual keywords in Math IR</article-title>
          .
          <source>In: Proceedings of the 3rd International Workshop on Digitization and E-Inclusion in Mathematics and Science 2016 (DEIMS2016)</source>
          . pp.
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kristianto</surname>
            ,
            <given-names>G.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topic</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aizawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>MCAT math retrieval system for NTCIR-12 mathir task</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>323</fpage>
          -
          <lpage>330</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lv</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Lower-bounding term frequency normalization</article-title>
          .
          <source>In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM'11)</source>
          . pp.
          <fpage>7</fpage>
          -
          <lpage>16</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mansouri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oard</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Finding old answers to new math questions: The ARQMath lab at CLEF 2020</article-title>
          . In: Advances in Information Retrieval,
          <source>Proceedings of the 42nd European Conference on IR Research (ECIR 2020). Lecture Notes in Computer Science</source>
          , vol.
          <volume>12036</volume>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>571</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoogeveen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mubarak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verspoor</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>SemEval-2017 task 3: Community question answering</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>27</fpage>
          -
          <lpage>48</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magdy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glass</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Randeree</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>SemEval-2015 task 3: Answer selection in community question answering</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval2015)</source>
          . pp.
          <fpage>269</fpage>
          -
          <lpage>281</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Nakov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Màrquez</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magdy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mubarak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freihat</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glass</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Randeree</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>SemEval-2016 task 3: Community question answering</article-title>
          .
          <source>In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval2016)</source>
          . pp.
          <fpage>525</fpage>
          -
          <lpage>545</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Olvera-Lobo</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutiérrez-Artacho</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Question answering track evaluation in TREC, CLEF and NTCIR</article-title>
          .
          <source>In: Advances in Intelligent Systems and Computing</source>
          . vol.
          <volume>353</volume>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amati</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plachouras</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , D.:
          <article-title>Terrier information retrieval platform</article-title>
          .
          <source>In: Advances in Information Retrieval, Proceedings of the 27th European Conference on IR Research (ECIR 2005). Lecture Notes in Computer Science</source>
          , vol.
          <volume>3408</volume>
          , pp.
          <fpage>517</fpage>
          -
          <lpage>519</lpage>
          . Springer (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Pineau</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          :
          <article-title>Math-aware search engines: Physics applications and overview</article-title>
          .
          <source>CoRR abs/1609.03457</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Růžička</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Líška</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Math indexer and searcher under the hood: Fine-tuning query expansion and unification strategies</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>331</fpage>
          -
          <lpage>337</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Schubotz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meuschke</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gipp</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Exploring the one-brain barrier: A manual contribution to the NTCIR-12 mathir task</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>309</fpage>
          -
          <lpage>317</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Schubotz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Youssef</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohl</surname>
            ,
            <given-names>H.S.</given-names>
          </string-name>
          :
          <article-title>Challenges of mathematical information retrieval in the NTCIR-12 math Wikipedia task</article-title>
          .
          <source>In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2015)</source>
          . pp.
          <fpage>951</fpage>
          -
          <lpage>954</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Líška</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>The art of mathematics retrieval</article-title>
          .
          <source>In: Proceedings of the 11th ACM Symposium on Document Engineering (DocEng 2011)</source>
          . pp.
          <fpage>57</fpage>
          -
          <lpage>60</lpage>
          . ACM Press, New York, New York, USA (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novotný</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ayetiran</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupták</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Štefánik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Quo Vadis, math information retrieval</article-title>
          .
          <source>Tech. rep.</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Thanda</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singla</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prakash</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A document retrieval system for math queries</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>346</fpage>
          -
          <lpage>353</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aizawa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohlhase</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ounis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topić</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davila</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>NTCIR-12 MathIR task overview</article-title>
          .
          <source>In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-12)</source>
          . pp.
          <fpage>299</fpage>
          -
          <lpage>308</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davila</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tompa</surname>
            ,
            <given-names>F.W.</given-names>
          </string-name>
          :
          <article-title>Multi-stage math formula search: Using appearance-based similarity metrics at scale</article-title>
          .
          <source>In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016)</source>
          . pp.
          <fpage>145</fpage>
          -
          <lpage>154</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mansouri</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of ARQMath 2020: CLEF Lab on answer retrieval for questions on math</article-title>
          .
          <source>In: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohatgi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zanibbi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Accelerating substructure similarity search for formula retrieval</article-title>
          .
          <source>In: Advances in Information Retrieval, Proceedings of the 42nd European Conference on IR Research (ECIR 2020)</source>
          . pp.
          <fpage>714</fpage>
          -
          <lpage>727</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>