<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dowsing for Answers to Math Questions: Ongoing Viability of Traditional MathIR</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yin Ki Ng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dallas J. Fraser</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Besat Kassaie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Wm. Tompa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>David R. Cheriton School of Computer Science, University of Waterloo</institution>
          ,
          <addr-line>Waterloo, ON</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>N2L 3G1</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Knowledgehook Inc</institution>
          ,
          <addr-line>151 Charles St W, Kitchener, ON</addr-line>
          ,
          <country country="CA">Canada</country>
          ,
          <addr-line>N2G 1H6</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>We present our application of the math-aware search engine Tangent-L to the 2021 ARQMath Lab. This is a continuation of our MathDowsers submissions to last year’s Lab, where we produced the best Task 1 participant run. Since then, we have improved the search engine’s formula retrieval power by considering additional math features in the ranking function. This year, we also explore two approaches to incorporating proximity when evaluating the suitability of a document to be considered a match to a query. For the 2021 ARQMath Lab, our primary run in Task 1 produces an nDCG′ value of 0.434, which is nearly five points higher than that produced by the second-best participant run. An unsubmitted run, which corrects the setup of the primary run and preserves duplicate keyword terms during query term extraction, produces an even higher nDCG′ of 0.462. Meanwhile, our primary run in Task 2 produces an nDCG′ value of 0.552, which is the best automatic run and is comparable to the best participant run, a manual run from the Approach0 team. The success of our runs continues to demonstrate that a traditional math information retrieval system remains a viable option for Community Question Answering specialized in the mathematical domain and for in-context formula retrieval.</p>
      </abstract>
      <kwd-group>
        <kwd>Community Question Answering (CQA)</kwd>
        <kwd>Mathematical Information Retrieval (MathIR)</kwd>
        <kwd>Symbol Layout Tree (SLT)</kwd>
        <kwd>Mathematics Stack Exchange (MSE)</kwd>
        <kwd>ARQMath Lab</kwd>
        <kwd>Tangent-L</kwd>
        <kwd>formula matching</kwd>
        <kwd>proximity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The ARQMath Labs [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>] present a CQA Task with questions involving math data. The Labs use a collection of questions and
answers from MSE between 2010 and 2018 consisting of approximately 1.1 million question-posts
and 1.4 million answer-posts. In this Lab series, Task 1 is the CQA Task in which participants
are asked to return potential answers to unseen mathematical questions from among existing
answer-posts in the collection. The closely related Task 2 considers formula retrieval in-context, in which
formulas within questions serve as queries for matching relevant formulas from question-posts
and answer-posts in the same collection.</p>
      <p>
        In ARQMath-1, the Waterloo team of MathDowsers (Figure 1) participated in Task 1, and
our best run achieved an nDCG′ value of 0.345 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which outperformed other participating
systems [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>
        ]. Our approach was a three-stage Mathematics Information Retrieval (MathIR)
system centered around the use of a math-aware search engine, Tangent-L [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: first, topics
of mathematical questions were automatically transformed into formal queries consisting of
keywords and formulas; then the formal queries were executed by Tangent-L against a corpus of MSE
question-answer pairs; finally, results were re-ranked based on a linear regression model
trained on CQA metadata using mock relevance assessments. Submissions were made based on
different configurations in each stage of the system, and the best run was produced without
re-ranking, demonstrating the success of a traditional math-aware query system in addressing a
CQA task specialized in the mathematical domain.
      </p>
      <p>
        For ARQMath-2, we participate again as the MathDowsers team for Task 1 and (for the first
time) Task 2, with the goal of continuing to explore the potential of a traditional math-aware query
system for both tasks. In particular, we are interested in further developing the formula
matching capability of our core math-aware search engine Tangent-L, given the satisfactory
performance observed over formula-dependent questions in ARQMath-1 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. With the
enhanced Tangent-L, we then refine our system for Task 1 and develop two baseline
approaches for Task 2.
      </p>
      <p>Our refinement is successful, with our primary run for Task 1 continuing to be the best
participant run with respect to the primary measure nDCG′.3 Our primary run for Task 2
turns out to be the most effective automatic run, essentially indistinguishable from the best
participant run, a manual run from the Approach0 team. In this paper, we present:
• an updated Tangent-L with several avenues that improve its formula matching capability,
• a refinement of our system for mathematical answer retrieval with respect to query
conversion and searching with Tangent-L,
• two related approaches that are motivated by proximity,
• for in-context formula retrieval, two simple baselines based on our developed system,
• performance results for both Task 1 and Task 2 in ARQMath-2.</p>
      <p>3 Normalized Discounted Cumulative Gain (nDCG) with unjudged documents removed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Improving Formula Matching with Tangent-L</title>
      <p>
        Tangent-L is the cornerstone of our system for the tasks. It is a traditional math-aware query
system built on the popular Lucene text search platform [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. During both index time and search
time, it converts each formula into a bag of math tokens that capture local characteristics of the
formula’s Symbol Layout Tree (SLT) representation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], so that mathematical documents can
be matched against a query through text tokens and converted math tokens using a weighted
BM25+ ranking [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        The basic math tokens used by Tangent-L and the approach to weighting text against math
tokens are described elsewhere [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In this section, we describe improvements tested in this
year’s Lab.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Repeated Symbols</title>
        <p>Repetitions of symbols are commonplace in a formula; for instance, $x$ repeats in the formula
$x^2 + x^3 + y$, as does the operator $+$. Ideally, a search for either $x - x$ or $x^6 - x^3 + y$ could
match that formula because of the pattern of repetitions for $x$, and a search for $a^2 + a^3 + 5$
could also match because of the repeated symbol $+$.</p>
        <p>With this motivation, a new type of token—repetition tokens—is introduced into Tangent-L’s
formula representation to capture this characteristic. Repetition tokens are generated based on
the relative positions of the repeated symbols in the formula’s SLT representation. For every
pair of repeated symbols:
1. if the pair of repeated symbols reside on the same root-to-leaf path of the SLT (that is,
one is an ancestor of the other), then a repetition token {symbol, } is generated, where 
represents the path between the repeated symbols;
2. otherwise, a repetition token {symbol, 1, 2} is generated where 1 and 2 represent the
paths from the closest common ancestor in the SLT to each repeated symbol.</p>
        <p>If a symbol repeats $n$ times, where $n > 1$, then $\binom{n}{2}$ repetition tokens are generated for that symbol
following the above procedure. For each of these tokens, an additional “location” token is
generated with the augmentation of the path traversing from the root to the closest common
ancestor of the pair. As such, a total of $2 \cdot \binom{n}{2}$ repetition tokens are generated and indexed.</p>
        <p>Table 1 shows the repetition tokens that would be indexed for the formula $x^2 + x^3 + y$ in
Figure 2.</p>
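        <p>To make the token-generation procedure concrete, here is a minimal Python sketch under simplifying assumptions of our own: each SLT occurrence of a symbol is described by a root-to-node path string of edge labels, and the function and path encoding are illustrative rather than Tangent-L’s actual implementation.</p>
        <preformat>
from itertools import combinations
from os.path import commonprefix

def repetition_tokens(symbol_paths):
    """Illustrative sketch of Section 2.1: symbol_paths maps each symbol to the
    root-to-node paths of its occurrences in the SLT, each path being a string
    of edge labels (e.g., 'n' = next, 'a' = above)."""
    tokens = []
    for symbol, paths in symbol_paths.items():
        for p1, p2 in combinations(paths, 2):      # one token pair per repetition
            prefix = commonprefix([p1, p2])        # path to the closest common ancestor
            s1, s2 = p1[len(prefix):], p2[len(prefix):]
            if not s1 or not s2:                   # case 1: same root-to-leaf path
                tokens.append((symbol, s1 or s2))
            else:                                  # case 2: diverging occurrences
                tokens.append((symbol, s1, s2))
            tokens.append(tokens[-1] + (prefix,))  # the additional "location" token
    return tokens

# The two x's in x^2 + x^3 + y, with hypothetical root-to-node paths:
print(repetition_tokens({'x': ['', 'nn']}))  # [('x', 'nn'), ('x', 'nn', '')]
        </preformat>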
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Revised Ranking Formula</title>
        <p>With the introduction of repetition tokens, Tangent-L now generates three token types: text
tokens, regular math tokens, and repetition tokens from documents or queries containing
mathematical expressions. During a search, Tangent-L applies BM25+ ranking to the query
terms and the document terms, using custom weights for each class of token as described here.</p>
        <p>Let $T$ be the set of text tokens, $M$ be the set of regular math tokens, and $R$ be the set of
repetition tokens generated for the query terms. Let $D$ be a document represented by the set of
all its indexed tokens. Then the revised ranking formula with the repetition tokens is:
$$\mathrm{BM25w^+}(T \cup M \cup R,\, D) = \frac{\alpha \cdot \big(\beta \cdot \mathrm{BM25^+}(R, D) + (1 - \beta) \cdot \mathrm{BM25^+}(M, D)\big) + (1 - \alpha) \cdot \mathrm{BM25^+}(T, D)}{\max(\alpha,\, 1 - \alpha)} \qquad (1)$$
where $\alpha$ and $\beta$ are parameters ranging from 0 to 1. The value of $\alpha$ balances the weight of math
features against keyword features, while the value of $\beta$ balances the weight of repetitions within
math formulas against other math features. Both parameters can be tuned based on the target
dataset.</p>
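        <p>As a minimal illustration (the function name and interface are our own; the production scoring runs inside Lucene), the three per-class BM25+ scores can be combined as in Equation 1:</p>
        <preformat>
def bm25w_plus(score_T, score_M, score_R, alpha=0.25, beta=0.1):
    """Combine per-token-class BM25+ scores as in Equation 1; the default
    alpha and beta are the primary-run values reported in Table 5."""
    math_part = beta * score_R + (1 - beta) * score_M
    return (alpha * math_part + (1 - alpha) * score_T) / max(alpha, 1 - alpha)
        </preformat>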
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Formula Normalization</title>
        <p>Mathematical expressions can be rewritten in numerous ways without altering their meaning.
For example, $a + b$ matches $b + a$ semantically because of the commutative law. To
accommodate such variability and increase recall, we equip Tangent-L with the ability to generate
similar math features for two formulas with the same semantics.</p>
        <p>We consider the following five classes of semantic matches:
1. Commutativity: $a + b$ should match $b + a$
2. Symmetry: $a = b$ should match $b = a$
3. Alternative Notation: $a \times b$ should match $ab$, and $a \ngtr b$ should match $a \le b$
4. Operator Unification: $a \prec b$ should match $a &lt; b$
5. Inequality Equivalence: $a \ge b$ should match $b \le a$
and simple adjustments are applied to Tangent-L’s regular math tokens to support these semantic
matches.</p>
        <p>The adjustments to handle the first two classes, Commutativity and Symmetry, are similar.
Recall that originally Tangent-L generates a math token for each pair of adjacent symbols with
their order preserved. For example, two math tokens $(a, +, \rightarrow)$ and $(+, b, \rightarrow)$ are generated
for the expression $a + b$, and two different math tokens $(b, +, \rightarrow)$ and $(+, a, \rightarrow)$ are generated
for the expression $b + a$. In order for an exact match to take place for the two expressions,
a simple adjustment to the math tokens is to ignore the order of a pair of adjacent symbols
whenever commutative operators or symmetric relations are involved. With this approach, both
expressions $a + b$ and $b + a$ generate the same pair of math tokens, $(+, a, \rightarrow)$ and $(+, b,
\rightarrow)$, so that an exact match is made possible.4</p>
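        <p>A minimal sketch of this order-insensitive adjustment, with an illustrative (not exhaustive) set of commutative operators and symmetric relations of our own choosing:</p>
        <preformat>
COMMUTATIVE = {'+', '*', '='}   # illustrative subset only

def adjusted_token(first, second, edge='→'):
    """Canonicalize one adjacent-symbol math token so that, e.g., a+b and b+a
    yield identical tokens: (a,+,→) and (+,a,→) both become (+,a,→)."""
    if first in COMMUTATIVE or second in COMMUTATIVE:
        first, second = sorted((first, second))  # a fixed canonical order
    return (first, second, edge)

assert adjusted_token('a', '+') == adjusted_token('+', 'a') == ('+', 'a', '→')
        </preformat>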
        <p>The next two classes, Alternative Notation and Operator Unification, can be easily accommodated
by choosing a canonical symbol for each equivalence class of operators and consistently
using only the canonical symbols in any math tokens generated as features.</p>
        <p>The final class, Inequality Equivalence, can be handled by choosing a canonical symbol (for
instance, choosing the symbol “≤” in preference to “≥”) and then reversing the operands
whenever necessary during math token generation.5</p>
        <p>For each of these five classes of semantic matches, Tangent-L provides a separate flag to
control whether or not the class is to be supported, so that only those deemed to be advantageous
are applied when math tokens are generated.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Data Cleansing</title>
        <p>For the ARQMath dataset, the original LaTeX formulas from the Math Stack Exchange collections
are wrapped within an identifiable block (a span tag with class="math-container" and
an id identifier), and the corresponding Presentation MathML representations are provided as
separate files. Since the input to Tangent-L includes formulas encoded in Presentation MathML,
its formula matching ability is hindered when the quality of the MathML representation is
poor or when conversions from LaTeX are missing.</p>
        <p>4 Our simple implementation suffers from the fact that math tokens handle only a pair of adjacent symbols at
a time. For a longer expression, such as $a + b \times 5$, the overly simplistic approach generates the same set of math
tokens as the expression $b + a \times 5$, failing to consider the priority of operators. Nevertheless, we have chosen to
take this approach because correct treatment requires that the math formulas be parsed properly, which is difficult
to achieve when the input of Tangent-L—Presentation MathML—captures layout only.</p>
        <p>5 Similar to commutative operations and symmetric relations, the reversal of operands is implemented
simplistically over a pair of adjacent symbols at a time. Thus the generated set of math tokens might equally well
represent a semantically distinct formula.</p>
        <p>
          Thanks to the efforts of the Lab organizers, coverage of the Presentation MathML for
detected formulas has increased from 92% for ARQMath-1 to over 99% for ARQMath-2 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
However, further cleansing is still beneficial, and we improve our data preparation for search as follows.
        </p>
        <p>Correcting Conversion Errors: The provided Presentation MathML, generated from the LaTeX
representation using LaTeXML6, contains conversion errors for formulas including either the
less-than “&lt;” or greater-than “&gt;” operator. In particular, when a LaTeX formula contains
the operator “&lt;”, it is first encoded as “&amp;lt;”, but then erroneously escaped again to
form “&amp;amp;lt;”. This results in an erroneous encoding in Presentation MathML, as
shown in Table 2.</p>
        <p>As part of our data preparation, Presentation MathML encodings with doubly-escaped
representations of “&lt;” and “&gt;” are recognized with regular expression matching and
replaced by our own converted representations, improving 869,074 (∼3%) formulas.</p>
        <p>Providing Missing Formula Identifiers: Approximately 10% of the annotated formulas
in the postings are not correctly and completely captured, many missing their unique
formula identifiers, as shown in Figure 3. In such cases, our program is unable to locate
their Presentation MathML representations in the file provided by the Lab organizers.
Formulas such as those from Figure 3 are recognized as much as possible through regular
expression matching for text within $ and $$ blocks. These are then checked against
the formula file provided by the Lab organizers to reverse-trace their formula-ids. As a
result, our program is able to capture over 99% of the formulas, including the 10% that
are improperly represented in math-container blocks without ids.</p>
        <p>6 https://dlmf.nist.gov/LaTeXML</p>
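        <p>A minimal sketch of the double-escaping repair; the helper name and the exact regular expression are our own simplification of the production cleanup:</p>
        <preformat>
import re

DOUBLE_ESCAPED = re.compile(r'&amp;(lt|gt);')   # matches "&amp;lt;" and "&amp;gt;"

def fix_double_escapes(mathml: str) -> str:
    """Collapse a doubly-escaped less-than or greater-than back to a single escape."""
    return DOUBLE_ESCAPED.sub(r'&\1;', mathml)

assert fix_double_escapes('x &amp;lt; y') == 'x &lt; y'
        </preformat>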
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task 1: Finding Answers to Math Questions</title>
      <p>In Task 1, participants are given mathematical questions selected from MSE posts from either
year 2019 (for ARQMath-1) or year 2020 (for ARQMath-2). Each question is formatted as a
topic that contains a unique identifier, the title, the question body text, and the tags. Participant
systems are asked to return the top-1000 potential answer-posts for each of the topics from the
MSE collection.</p>
      <p>
        For ARQMath-2, we continue to use the three-stage system adopted for ARQMath-1 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
Stage 1 Conversion: Transform the input (a mathematical question posed on MSE) into a
well-formulated query consisting of a bag of formulas and keywords.
      </p>
      <p>Stage 2 Searching: Use Tangent-L, the math-aware search engine, to execute the formal query
to find the best matches against an indexed document corpus created from the collection.
Stage 3 Re-ranking: Re-order the best matches with a run-specific re-ranking model.</p>
      <p>In this section, we describe various modifications we wished to explore. We first validate the
benefits of each modification using the ARQMath-1 benchmark, and then we test them using
the ARQMath-2 benchmark.</p>
      <sec id="sec-3-1">
        <title>3.1. Conversion: Fine-tuning Keyword Extraction from Formulas</title>
        <p>
          For ARQMath-1, the automated mechanism we designed to extract query keywords and
formulas from the task topics was shown to be competitive with the human ability to select
search terms [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], as it produces results that are comparable to the manual set of query terms
selected by the Lab organizers. For ARQMath-2, we fine-tune this automated mechanism, using
the ARQMath-1 benchmark for validation, as follows:
1. Keywords within a formula representation are intentionally7 retained and extracted, as a
drop in nDCG′ occurs if they are removed. For example, “mod” is a crucial keyword for
topic A.7—Finding out the remainder of $\frac{11^{10} - 1}{100}$ using modulus—but this word is present
within a formula representation only and not anywhere else in the text. Similarly, “sin”,
“cos”, and “tan” can be extracted from \sin, \cos, and \tan in formula representations after
punctuation is removed (a minimal sketch of such extraction follows below).
2. Every term extracted by the automated mechanism should become part of the query, and
its weight should be boosted naturally if it repeats.8 On the other hand, restricting the
number of keywords and formulas extracted by the mechanism (as we had hypothesized
to be a possible improvement last year) does not show an improved result.
        </p>
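          <p>A minimal sketch of this kind of keyword extraction from a formula representation (illustrative only; the actual mechanism is more involved):</p>
          <preformat>
import re

WORD = re.compile(r'[A-Za-z]{2,}')   # runs of two or more letters

def formula_keywords(latex: str):
    """Pull keyword-like terms such as 'mod', 'sin', or 'cos' out of a LaTeX
    formula string (a sketch of Section 3.1, item 1)."""
    return [w.lower() for w in WORD.findall(latex.replace('\\', ' '))]

print(formula_keywords(r'\sin^2 x + \cos^2 x \equiv 1'))  # ['sin', 'cos', 'equiv']
          </preformat>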
        <p>After fine-tuning the automated mechanism, results obtained for the ARQMath-1 benchmark
consistently outperform those obtained with the manual set of query terms, validating the
potential of this mechanism.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Searching: Enriching the Document Corpus</title>
        <p>For ARQMath-2, we continue to use question-answer pairs as the indexing units for the document
corpus, since performance on the ARQMath-1 benchmark worsens if the content of the
associated question is dropped and only the text from each answer is indexed. In addition to the
fields included for ARQMath-1, comments9 associated with answers are also included. As a
result, more formulas and more text words are available for matching. Figure 4 illustrates the
fields indexed as part of each question-answer pair.</p>
        <p>7 Keywords were not intended to be extracted from within formula representations in the original design for
ARQMath-1, but this turned out to be a valuable “mistake” that helped boost performance.</p>
        <p>8 In the submission for ARQMath-1, duplicate terms were extracted, but their weights were not boosted
accordingly because of an oversight in our implementation.</p>
        <p>9 When extracting the comments, the file Comments.V.1.0.xml is used instead of the more recently released
Comments.V.1.2.xml because the former contains approximately three times as many comments as the latter. Note,
however, that the former file contains more “noise” that requires cleansing, as discussed in Section 2.4.</p>
        <table-wrap id="tbl-4">
          <label>Table 4</label>
          <caption><p>Relative differences of average proximity measures between highly relevant (HR), relevant (R), partially relevant (PR), and non-relevant (NR) math answers, where $\Delta(x, y) = (prox(x) - prox(y)) \,/\, (0.5\,(prox(x) + prox(y)))$.</p></caption>
          <table>
            <thead>
              <tr><th>Proximity measure</th><th>Δ(HR,R)</th><th>Δ(R,PR)</th><th>Δ(PR,NR)</th><th>Δ(R,NR)</th><th>Δ(HR,NR)</th></tr>
            </thead>
            <tbody>
              <tr><td>Span</td><td>7%</td><td>8%</td><td>3%</td><td>10%</td><td>18%</td></tr>
              <tr><td>Span-NormByDocLen</td><td>0%</td><td>1%</td><td>5%</td><td>5%</td><td>5%</td></tr>
              <tr><td>Normalized-Span</td><td>-5%</td><td>-6%</td><td>-62%</td><td>-67%</td><td>-72%</td></tr>
              <tr><td>Normalized-Span-NormByDocLen</td><td>-20%</td><td>-13%</td><td>-64%</td><td>-76%</td><td>-92%</td></tr>
              <tr><td>Min-Span</td><td>9%</td><td>7%</td><td>6%</td><td>13%</td><td>21%</td></tr>
              <tr><td>Min-Span-NormByDocLen</td><td>-1%</td><td>2%</td><td>8%</td><td>11%</td><td>10%</td></tr>
              <tr><td>Normalized-Min-Span</td><td>2%</td><td>1%</td><td>-39%</td><td>-38%</td><td>-36%</td></tr>
              <tr><td>Normalized-Min-Span-NormByDocLen</td><td>-11%</td><td>-3%</td><td>-40%</td><td>-43%</td><td>-53%</td></tr>
              <tr><td>Min-Distance</td><td>1%</td><td>-2%</td><td>-89%</td><td>-90%</td><td>-89%</td></tr>
              <tr><td>Min-Distance-NormByDocLen</td><td>-10%</td><td>-9%</td><td>-104%</td><td>-111%</td><td>-117%</td></tr>
              <tr><td>Ave-Distance</td><td>4%</td><td>3%</td><td>-16%</td><td>-14%</td><td>-10%</td></tr>
              <tr><td>Ave-Distance-NormByDocLen</td><td>-7%</td><td>-2%</td><td>-15%</td><td>-17%</td><td>-24%</td></tr>
              <tr><td>Max-Distance</td><td>9%</td><td>7%</td><td>6%</td><td>13%</td><td>21%</td></tr>
              <tr><td>Max-Distance-NormByDocLen</td><td>-1%</td><td>2%</td><td>9%</td><td>11%</td><td>10%</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Re-ranking: Proximity</title>
        <p>Whereas in ARQMath-1 we attempted re-ranking the retrieved answers from Tangent-L based on
CQA metadata, for ARQMath-2 we investigate the possibility of re-ranking based on proximity.
Proximity is a measure of the distance between matched query terms, as detailed in Table 3, and it
can be a strong signal for document relevancy.</p>
        <p>
          Following the experimental design used by Tao and Zhai [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we measure the average
proximity of search terms for highly relevant, relevant, partially relevant, and non-relevant
documents in the ARQMath-1 benchmark. The experimental results are shown in Table 4. We
observe strong signals from several measures that distinguish relevance in the correct order,
particularly for Normalized-Span, which correctly orders all four
levels of relevancy (a smaller normalized span indicating a higher level of relevancy) without
the need to be normalized by document length.
        </p>
      <p>Motivated by this finding, for ARQMath-2 we attempt re-ranking of the answers retrieved by
Tangent-L in increasing order of normalized span, breaking ties by decreasing BM25+ score
returned from Tangent-L.</p>
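      <p>A minimal sketch of this re-ranking, assuming one plausible reading of normalized span (the width of the window covering all matched query-term occurrences, divided by the number of occurrences, in the spirit of Tao and Zhai); the data shapes are our own:</p>
      <preformat>
def normalized_span(positions):
    """Proximity for one document: covering-window width over the matched
    query-term positions, normalized by the number of matched occurrences."""
    if not positions:
        return 0.0
    return (max(positions) - min(positions) + 1) / len(positions)

def proximity_rerank(results):
    """Re-rank (doc_id, bm25_score, matched_positions) triples by increasing
    normalized span, breaking ties by decreasing BM25+ score (Section 3.3)."""
    return sorted(results, key=lambda r: (normalized_span(r[2]), -r[1]))
      </preformat>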
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Matching Formulas Holistically</title>
        <p>Formula matching within Tangent-L is based on comparing a set of math tokens from the
query to those from each document (Equation 1). If we index a document that has multiple
formulas, math tokens generated from all the formulas within the document are considered as a
single unordered bag of terms. However, given the strong signal of proximity playing a role in
document relevancy (Table 4), we hypothesize that matching each formula as a whole within a
document, instead of matching math tokens irrespective of the formulas they come from (which
might be scattered across a document), could produce a better result.10 As such, as a post-experiment we design a holistic
formula search as follows:</p>
        <p>At preparation time, we first pre-build a formula corpus for Tangent-L that indexes all visually
distinct formulas in the MSE dataset, each as a separate document with the formula’s visual-id
serving as a key. We define the formula similarity between two formulas to be the normalized
BM25+ score for one formula when the other formula acts as a query. When indexing the
question-answer corpus, rather than replacing each formula within the document by the set of
math tokens generated for that formula, we represent each formula by a single holistic formula
token that contains the formula’s visual-id (that is, its key from the formula corpus). At query
time, we first search for each query formula in the formula corpus and then replace the formula
text in the query by the keys of the top-$k$ most similar formulas, thus changing the query to
search for those visual-ids (as well as whatever keywords are also part of the query, of course).
Finally, the ranking formula for documents is revised to weight each match of a formula id by
its formula similarity with respect to the original query formula.</p>
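        <p>A minimal sketch of the query-side rewriting, where formula_search is a hypothetical wrapper over the formula-corpus index returning (visual-id, score) pairs:</p>
        <preformat>
def rewrite_query(keywords, query_formulas, formula_search, k=300):
    """Replace each query formula by the visual-id keys of its top-k most
    similar corpus formulas (Section 3.4); keyword terms pass through."""
    tokens = list(keywords)
    for q in query_formulas:
        tokens.extend(vid for vid, _score in formula_search(q, k))
    return tokens
        </preformat>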
        <p>In the following subsections, we describe these ideas in greater detail.</p>
        <sec id="sec-3-4-1">
          <title>3.4.1. Formula Corpus</title>
          <p>The formula corpus is built by extracting all visually distinct formulas from the document corpus
described in Section 3.2—including formulas found within questions, answers, and comment
posts. Each formula in this corpus is associated with the formula’s visual-id, which serves
as a key. The resulting corpus contains 8,595,899 out of 9,329,274 (∼92%) visually distinct
formulas and is indexed by Tangent-L under the setup described in Section 2, each formula
being considered as a document.</p>
          <p>10 Note, however, that this ignores proximity among keywords and between keywords and formulas.</p>
        </sec>
        <sec id="sec-3-4-2">
          <title>3.4.2. Normalized Formula Similarity</title>
          <p>We define “formula similarity” as follows: Let $q$ be an arbitrary formula used as a query, $F$
be the set of formulas in the formula corpus, and $f \in F$. Let $\mathrm{RawScore}(q, f)$ represent the
score obtained for formula $f$ when the query is $q$, using the following definition:</p>
          <p>$\mathrm{RawScore}(q, f) = (1 - \beta) \cdot \mathrm{BM25^+}(M_q, f) + \beta \cdot \mathrm{BM25^+}(R_q, f)$ (2)</p>
          <p>where $M_q$ is the set of regular math tokens and $R_q$ is the set of repetition tokens in a query
formula $q$. As in Equation 1, $0 \le \beta \le 1$ balances the weight of repetition tokens against
regular math tokens.</p>
          <p>The Normalized Formula Similarity of $f$ with respect to $q$ is:</p>
          <p>$\mathrm{NFS}(q, f) = \mathrm{RawScore}(q, f) \,/\, \max_{f' \in F} \mathrm{RawScore}(q, f')$ (3)</p>
          <p>The value of $\mathrm{NFS}(q, f)$ is in the range [0, 1] and represents how well the query formula $q$ is
matched by $f$ relative to other formulas within the formula corpus.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>3.4.3. Holistic Formula Token</title>
          <p>A holistic formula token is a placeholder token that incorporates the formula’s visual-id.
Formulas in a question-answer document are replaced by their holistic formula tokens only, so
that when searching the question-answer corpus, formulas can only be matched as a whole.</p>
        </sec>
        <sec id="sec-3-4-4">
          <title>3.4.4. Ranking for Holistic Search</title>
          <p>Let $T$ be the set of keyword tokens and $Q$ be the set of query formulas. Let $q \in Q$ be a query
formula and let $K(q)$ be the set of keys for the top-$k$ most similar formulas with respect to $q$,
determined by Normalized Formula Similarity. Let $D$ be a document represented by the set of
all its indexed tokens.</p>
          <p>When searching the document corpus, we adopt the following variant of BM25+:</p>
          <p>$\mathrm{BM25w^+}(T \cup Q,\, D) = (1 - \alpha) \cdot \mathrm{BM25^+}(T, D) + \alpha \cdot \mathrm{BM25^+}(Q, D)$ (4)</p>
          <p>where, as in Equation 1, $0 \le \alpha \le 1$ balances math features against keyword features,11 and</p>
          <p>$\mathrm{BM25^+}(Q, D) = \sum_{q \in Q} \; \sum_{v \in (D \cap K(q))} \mathrm{NFS}(q, v) \cdot \left( \frac{(k_1 + 1)\,\mathrm{tf}_v}{k_1 \left(1.0 - b + b \frac{|D|}{avgdl}\right) + \mathrm{tf}_v} + \delta \right) \cdot \log \left( \frac{|C| + 1}{|C_v|} \right)$ (5)</p>
          <p>11 As usual for BM25+ [<xref ref-type="bibr" rid="ref15">15</xref>], $k_1$, $b$, and $\delta$ are constants (following common practice, chosen to be 1.2, 0.75, and
1, respectively); $\mathrm{tf}_v$ is the number of occurrences of formula $v$ in $D$; $|D|$ is the total number of terms in $D$; $avgdl =
\sum_{D' \in C} |D'| \,/\, |C|$ is the average document length over the corpus $C$; and $|C_v|$ is the number of documents in $C$ containing formula $v$.</p>
        </sec>
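        <p>A minimal sketch of Equation 5, with argument shapes and names of our own choosing (Tangent-L implements this scoring inside Lucene):</p>
        <preformat>
import math

def bm25_plus_holistic(query_top_k, doc_tf, doc_len, avgdl, num_docs, df,
                       nfs, k1=1.2, b=0.75, delta=1.0):
    """Score one document D against the query formulas as in Equation 5.
    query_top_k: q -> visual-ids in K(q); doc_tf: visual-id -> tf in D;
    df: visual-id -> document frequency; nfs: (q, visual-id) -> NFS(q, v)."""
    score = 0.0
    for q, top_ids in query_top_k.items():
        for v in top_ids:
            tf = doc_tf.get(v, 0)
            if tf == 0:                      # v does not occur in D
                continue
            sat = (k1 + 1) * tf / (k1 * (1 - b + b * doc_len / avgdl) + tf)
            score += nfs[(q, v)] * (sat + delta) * math.log((num_docs + 1) / df[v])
    return score
        </preformat>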
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Task 1: Runs and Result</title>
        <p>Parameter settings are chosen based on testing with the ARQMath-1 benchmark. For
ARQMath-2, we prepared four automatic runs:</p>
        <sec id="sec-3-5-1">
          <title>Repetition tokens are adopted.</title>
          <p>In Equation 1,  = 0.25 and  = 0.1.</p>
          <p>Only semantic matches of Commutativity is supported.</p>
          <p>Recognition of Presentation MathML is improved.</p>
          <p>Comments from answers are added to the indexing unit.</p>
          <p>Keywords within a formula representation are retained.</p>
          <p>Query terms are (unintentionally) de-duplicated.
primary: A submitted run with most of the presumably best setup, based on tests on the</p>
          <p>ARQMath-1 benchmark, as described in Table 5.
proximityReRank: A submitted run based on Section 3.3. This uses the same setup as the
primary run, but the top-1000 matches are subsequently re-ranked by proximity, using
normalized span as the proximity measure.
holisticSearch: A post-experiment run that matches formulas holistically based on
Section 3.4. When searching in the formula corpus,  is set to 0.1 in Equation 2 and when
searching in the document corpus,  is set to 0.5 in Equation 4 and  is set to be 300.
duplicateTerms: A post-experiment run sharing the same setup as the primary run, except
that duplicate query terms are preserved as described in Section 3.1.</p>
        <p>The results of these runs for ARQMath-2 are shown in Table 6, together with the baseline
runs and our submissions from last year over the ARQMath-1 benchmark. In general, after
parameter selection based on the ARQMath-1 benchmark, our updated system produces results
that are a significant improvement over those from last year’s system on the
ARQMath-1 topics. For instance, our primary setup evaluated over the ARQMath-1 benchmark
achieves an nDCG′ score of 0.433, nearly a 10-point gain over the nDCG′ score of
0.345 produced by our best participant run (alpha05-noR) last year.</p>
          <p>
            This parameter selection based on the ARQMath-1 benchmark helps our updated system
achieve equally good results for the new set of math topics in ARQMath-2. Our primary
run produces an nDCG′ of 0.434, which remains the best run among all participants [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. The
unsubmitted run duplicateTerms, which corrects an oversight in the primary run and therefore
reflects our intended “best” setup, scores even higher, with an nDCG′ of 0.462.
          </p>
          <p>
            The duplicateTerms run also has the highest values for the ARQMath-2 benchmark in all
other evaluation measures, with the exception of P′@10 for the baseline run Linked MSE posts
(which uses human-built links that were not available to participating teams [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]). With a closer
look at the effectiveness breakdown by topic category in Table 7, we observe that this run has a
strong performance for Formula-dependent topics, Proof-like topics, and topics of Low-level
difficulty. In spite of a different set of math topics being evaluated, these observed strengths are
similar to the observed strengths of our best participant run last year [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>On the other hand, our submitted alternative run proximityReRank, which tries to re-rank the
results using the proximity signal Normalized-Span, does not perform well. For the ARQMath-1
benchmark, this run shows a 6-point loss compared to the primary run (0.373 vs 0.433), and the
loss grows to nearly 10 points in ARQMath-2 (0.335 vs 0.434), indicating unsatisfactory
re-ranking. It seems that even for a measure that shows a strong signal for proximity in Table 4,
the separation among documents based on proximity might be inadequate to reflect relevance.</p>
          <p>Finally, our unsubmitted run holisticSearch, an approach also motivated by proximity,
performs fairly well. Compared to the primary run, the nDCG′ score for the holisticSearch
run shows a 3-point loss over the ARQMath-1 benchmark (0.405 vs 0.433) and similarly a
2-point loss in ARQMath-2 (0.414 vs 0.434). Notably, this run outperforms all other runs
submitted by participants in ARQMath-2 and outperforms our primary run in the P′@10 and
bpref measures. However, this run is outperformed by the unsubmitted duplicateTerms run in
all evaluation measures (nearly a 5-point loss, 0.414 vs 0.462, for nDCG′), suggesting room
for improvement in this approach.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Task 2: In-context Formula Retrieval</title>
      <p>For Task 2, participants are asked to retrieve the top matching formulas, together with their
associated posts, for each topic formula chosen from the set of topics used for Task 1. Relevancy
of a retrieved formula is evaluated in context: both the associated post of a retrieved formula and
the associated topic content of the topic formula are presented to the assessors for evaluation.
Assessments are then aggregated so that each visually distinct formula is judged to be relevant
if any of the corresponding formula occurrences are deemed to be relevant. The performance of
a system is then determined by its performance with respect to visually distinct formulas only.</p>
      <p>For ARQMath-2, we propose two simple approaches that re-use two major components
created for Task 1:
1. the Formula Corpus of all visually distinct formulas, as described in Section 3.4.1;
2. the results from Task 1 Answer-Ranking of the top 10,000 answer-posts for each topic,
run with the primary setup as detailed in Table 5.</p>
      <p>The rest of this section describes our two approaches built on these components.</p>
      <sec id="sec-4-1">
        <title>4.1. Formula-centric: Selecting Visually Matching Formulas</title>
        <p>The first straightforward approach is formula-centric, relying on Tangent-L’s internal formula
matching capability to find the matching formulas. To create a list of matching formulas for
a topic, we first search for matches to the topic formula in the formula corpus of all visually
distinct formulas. This gives us a ranking $\mathcal{R}$ of visually distinct formulas. We then expand
each element of $\mathcal{R}$ with its set of formula occurrences: formulas that have the same visual-id
but appear in different posts.12 We refer to a set of formula occurrences having the same
visual-id as a visual group. The selection of formula occurrences to return is then governed by
the rank of their associated posts in the answer retrieval task (a sketch of this selection follows). In particular:
1. Formulas within the same visual group are ranked in the same order as the ranking of
their associated posts in Task 1 for the corresponding topic. If the associated posts of
formulas are question-posts that are not associated with any answer from Task 1, the
formulas are assigned the lowest ranking. Finally, the lexical order of formula-ids is used
to break ties.
2. For each of the top-20 visually distinct formulas in $\mathcal{R}$, we select the top five formulas
from its visual group (or all formulas in the visual group if there are fewer than five); for
the remainder, we select the top formula only (if any have associated question or answer
posts).
3. Sequentially considering the formulas in $\mathcal{R}$ in order, selected formula occurrences from
each visual group are appended to the final list of matching formulas until 1000 formula
occurrences are selected in total.</p>
        <p>12 Only question-posts and answer-posts are of concern in the task, so any returned formulas from
comment-posts are ignored.</p>
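        <p>A minimal sketch of this selection logic, with data structures that are our own illustrative assumptions:</p>
        <preformat>
def select_occurrences(ranking, visual_groups, limit=1000,
                       top_groups=20, per_group=5):
    """Walk the formula-corpus ranking; take up to five occurrences from each
    of the first 20 visual groups and one from each of the rest, until 1000
    formula occurrences are selected (Section 4.1). visual_groups maps a
    visual-id to its occurrences, already ordered by the Task 1 post ranking."""
    selected = []
    for rank, vid in enumerate(ranking):
        take = 1 if rank >= top_groups else per_group
        selected.extend(visual_groups.get(vid, [])[:take])
        if len(selected) >= limit:
            break
    return selected[:limit]
        </preformat>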
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Document-centric: Screening Formulas from Matched Documents</title>
        <p>The second straightforward approach is document-centric, relying more on the results from
the answer retrieval task. Based on the answer ranking from Task 1, the final list of matching
formula occurrences is selected from the answers as follows:
1. For each matched answer-post for the corresponding topic in Task 1, we retrieve its
question-answer document from the document corpus. If the document contains only
one formula, that formula is selected. Otherwise, each formula from the document is
mapped to its visual group, and its Normalized Formula Similarity (Equation 3) with respect
to the topic formula is computed using $\beta = 0.1$ in Equation 2 (but see below). Formulas
having a score less than a threshold of 0.8 are screened out, and the rest are preserved
and ranked accordingly.
2. Following the original answer ranking, preserved formulas from each question-answer
pair are appended to the final list until 1000 formulas are selected in total.</p>
        <p>Formulas in an answer-post might correspond to visually distinct formulas anywhere in
the formula corpus, but it is highly inefficient to compute the Normalized Formula Similarity for
every formula in the formula corpus, which would require retrieving over 8.5 million RawScores
using Tangent-L. Therefore, for each topic, formulas in answer-posts that are not within the
top 10,000 most similar formulas to the query formula are assigned a score of 0 and are therefore
screened out, as sketched below.</p>
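        <p>A minimal sketch of the document-centric screening, again with our own illustrative data structures:</p>
        <preformat>
def screen_formulas(ranked_answers, doc_formulas, nfs_top,
                    threshold=0.8, limit=1000):
    """Select formula occurrences following the Task 1 answer ranking
    (Section 4.2). doc_formulas: document -> formulas it contains; nfs_top:
    formula -> Normalized Formula Similarity, pre-computed for the 10,000
    most similar formulas only (formulas outside that list score 0)."""
    selected = []
    for doc in ranked_answers:
        formulas = doc_formulas[doc]
        if len(formulas) == 1:               # a lone formula is kept outright
            kept = list(formulas)
        else:
            kept = sorted((f for f in formulas if nfs_top.get(f, 0.0) >= threshold),
                          key=lambda f: -nfs_top[f])
        selected.extend(kept)
        if len(selected) >= limit:
            break
    return selected[:limit]
        </preformat>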
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Task 2: Runs and Result</title>
        <p>For ARQMath-2, we include two automatic runs:
formulaBase: A submitted run selecting visually matching formulas as in Section 4.1;
docBase: A submitted run selecting formulas from matched documents as in Section 4.2.
The results of both runs in ARQMath-2 are shown in Table 8 (table legend: ¶ submitted primary run; * submitted alternate run; M manual run; † using H+M binarization), together with the baseline run
and the best participant runs for the ARQMath-1 and ARQMath-2 benchmarks. Our primary
run formulaBase, with parameter selection based on the ARQMath-1 benchmark, achieves
performance very close to that of the best participant run Tangent-CFTED produced by the DPRL
team last year (0.562 vs 0.563). However, on the ARQMath-1 benchmark, it does not perform as
well as the ltrall run submitted this year by the DPRL team, having a 17-point loss on nDCG′
over the same set of math topics (0.562 vs 0.735).</p>
        <p>On the ARQMath-2 benchmark, however, with a new set of math topics, our primary run
formulaBase performs approximately as well, with an nDCG′ score of 0.552. This score is the
best among all automatic runs, and it is almost indistinguishable from the best participant run
P300 from the Approach0 team, which is a manual run. Notably, on the ARQMath-2 benchmark,
it outperforms the ltrall run from the DPRL team by over 10 points (0.552 vs 0.445).</p>
        <p>On the other hand, our alternative run docBase does not perform as well as expected. For
the ARQMath-1 benchmark, this run shows nearly a 16-point loss with respect to our primary
run (0.404 vs 0.562) and nearly a 12-point loss (0.433 vs 0.552) for ARQMath-2, in terms of
nDCG′. This run also achieves lower scores in all other evaluation measures, suggesting that
simply selecting formulas from matching documents does not work well.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Efficiency</title>
      <p>The machines used for our experiments have the following specifications:</p>
      <p>Machine A: An Ubuntu 20.04.1 LTS server with an AMD EPYC™ 7502P
processor (32 cores, 64 threads, 2.50GHz), 512GB RAM, and 3.5TB of disk space.</p>
      <p>Machine B: A Linux Mint 19.1 server with an Intel Core i5-8250U processor
(4 cores, 8 threads, up to 3.40GHz), 24GB RAM, and 512GB of disk space.</p>
      <p>All indexing was performed on Machine A, for the Document Corpus (Section 3.2), the
Formula Corpus (Section 3.4.1), and the Document Corpus for holistic search (Section 3.4).
Note that the reported data and index sizes show the values given by the du command on Linux, which
measures disk space usage based on blocks; thus the many small documents in the formula
corpus require much more disk space than might be expected. (In fact, the total size of the data
in the formula corpus is only 9.2 GB.)</p>
      <p>Runs for ARQMath-2 were executed on Machine B. Average, minimum, and maximum query
times per topic were measured for each Task 1 run (primary, holisticSearch, and duplicateTerms)
and for Task 2 (pre-computing the Answer-Ranking, followed by the formulaBase and docBase runs).</p>
      <p>The proximityReRank run uses Machine A to re-rank the output from the primary run, thus
requiring first the time shown for the primary run on Machine B and then an additional 8 hours
to re-rank all topics on Machine A.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Further Work</title>
      <p>We conclude that a traditional math-aware search system continues to be an efficient and
effective approach to tackling the CQA task, as demonstrated by producing the best participant
run in Task 1 again this year. In particular, a significant boost in effectiveness for Task 1 can be
observed on both years’ math topics after parameter selection based on tests on the ARQMath-1
benchmark. The best result is achieved through several improvements to the formula
matching capability of Tangent-L, demonstrating the competitiveness of this math-aware search
engine in handling text and mathematical notation together.</p>
      <p>We also develop a simple but strong baseline for the in-context formula retrieval task. Being
the best automatic run and competitive with the best participant run, our formula-centric run
demonstrates again the strong formula matching ability of Tangent-L.</p>
      <p>Nevertheless, several aspects of our runs again turn out to be somewhat disappointing. In
the CQA task, we explore the incorporation of proximity in two approaches, and the results do
not improve effectiveness over using a bag-of-terms approach:</p>
      <p>
        Proximity Re-Ranking: Re-ranking based on proximity is unsatisfactory, despite some
proximity differences being observed based on the relevancy of judged documents. Perhaps
proximity is a more important measure when the BM25+ score is low, and therefore it
needs to be incorporated into the initial retrieval [
        <xref ref-type="bibr" rid="ref14 ref16">14, 16</xref>
        ] rather than used for re-ranking.
Alternatively, despite the percentage differences observed, the actual differences might
be too small to serve as a reliable signal of relevance.
      </p>
      <p>Matching Formulas Holistically: The proposed method to match formulas holistically
shows some promise but does not perform as well as matching based on math tokens.
Perhaps Equation 5 can be improved to make better use of the formula similarity scores
returned from the formula corpus. Improvements here might also provide insights into
further improving our formula-centric approach in Task 2.</p>
      <p>Additionally, our proposed document-centric baseline for the in-context formula retrieval
task, which selects formulas from top matching math answers, does not perform as well as
expected given our strong result in the answer retrieval task. Investigation into the distribution
of matching formulas among the top relevant answers might be helpful in further exploring
this simple tactic for the task.</p>
      <p>All in all, while our updated system with Tangent-L continues to excel in both tasks, there
is still considerable room for improvement in how we might use the document relevancy signals
observed from the ARQMath-1 benchmark to propose new approaches that further
improve effectiveness. In retrospect, the approaches that we attempted through re-ranking did
not benefit sufficiently from the raw signals obtained from the ARQMath-1 benchmark. With
the additional new evaluation data available from the ARQMath-2 benchmark, we expect to
gain better insights, and we are excited to continue exploring question answering for the
mathematical domain.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research has been funded by the Waterloo-Huawei Joint Innovation Lab and NSERC, the
Natural Science and Engineering Research Council of Canada. The NTCIR Math-IR dataset
used for earlier benchmarks and as a source of relevant keywords was made available through
an agreement with the National Institute of Informatics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>Overview of ARQMath-2 (2021): Second CLEF lab on answer retrieval for questions on math</article-title>
          ,
          <source>in: CLEF 2021</source>
          , volume
          <volume>12880</volume>
          <source>of LNCS</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <article-title>Overview of ARQMath 2020 (updated working notes version): CLEF lab on answer retrieval for questions on math</article-title>
          ,
          <source>in: CLEF</source>
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y. K.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kassaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Labahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Marzouk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. W.</given-names>
            <surname>Tompa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Dowsing for math answers with Tangent-L</article-title>
          ,
          <source>in: CLEF 2020</source>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <article-title>DPRL Systems in the CLEF 2020 ARQMath Lab</article-title>
          ,
          <source>in: CLEF</source>
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Novotný</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Štefánik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lupták</surname>
          </string-name>
          ,
          <article-title>Three is Better than One: Ensembling Math Information Retrieval Systems</article-title>
          ,
          <source>in: CLEF</source>
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rohatgi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          ,
          <article-title>PSU at CLEF-2020 ARQMath Track: Unsupervised Re-ranking using Pretraining</article-title>
          ,
          <source>in: CLEF</source>
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostendorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , ARQMath Lab:
          <article-title>An Incubator for Semantic Formula Search in zbMATH Open?</article-title>
          ,
          <source>in: CLEF</source>
          <year>2020</year>
          , volume
          <volume>2696</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. W.</given-names>
            <surname>Tompa</surname>
          </string-name>
          ,
          <article-title>Choosing math features for BM25 ranking with Tangent-L, in</article-title>
          :
          <source>DocEng</source>
          <year>2018</year>
          ,
          <year>2018</year>
          , pp.
          <volume>17</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          :
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y. K.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Fraser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kassaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. W.</given-names>
            <surname>Tompa</surname>
          </string-name>
          ,
          <article-title>Dowsing for math answers</article-title>
          ,
          <source>in: CLEF</source>
          <year>2021</year>
          , volume
          <volume>12880</volume>
          <source>of LNCS</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Białecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Muir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ingersoll</surname>
          </string-name>
          ,
          <article-title>Apache Lucene 4</article-title>
          , in: SIGIR 2012 Workshop on Open Source Information Retrieval,
          <year>2012</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blostein</surname>
          </string-name>
          ,
          <article-title>Recognition and retrieval of mathematical expressions</article-title>
          ,
          <source>Int. J. Document Anal. Recognit</source>
          .
          <volume>15</volume>
          (
          <year>2012</year>
          )
          <fpage>331</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>Lower-bounding term frequency normalization</article-title>
          ,
          <source>in: CIKM'11</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          ,
          <article-title>Advancing math-aware search: The ARQMath-2 lab at CLEF 2021</article-title>
          , in:
          <source>ECIR</source>
          <year>2021</year>
          , volume
          <volume>12657</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>631</fpage>
          -
          <lpage>638</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <article-title>An exploration of proximity measures in information retrieval</article-title>
          ,
          <source>in: SIGIR</source>
          <year>2007</year>
          ,
          <year>2007</year>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          ,
          <article-title>The probabilistic relevance framework: BM25 and beyond</article-title>
          ,
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>3</volume>
          (
          <year>2009</year>
          )
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rasolofo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Savoy</surname>
          </string-name>
          ,
          <article-title>Term proximity scoring for keyword-based retrieval systems</article-title>
          , in: Advances in Information Retrieval,
          <source>Proceedings of the 27th European Conference on IR Research (ECIR</source>
          <year>2003</year>
          ), volume
          <volume>2633</volume>
          , Springer,
          <year>2003</year>
          , pp.
          <fpage>207</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>