<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Philipp Scharpf</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Schubotz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Greiner-Petter</string-name>
          <email>andre.greiner-petter@zbmath.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Malte Ostendorff</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Olaf Teschke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Bela Gipp</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Konstanz</institution>
          ,
          <addr-line>Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Wuppertal</institution>
          ,
          <addr-line>Wuppertal</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>22</volume>
      <fpage>22</fpage>
      <lpage>25</lpage>
      <abstract>
<p>The zbMATH database contains more than 4 million bibliographic entries. We aim to provide easy access to these entries. Therefore, we maintain different index structures, including a formula index. To optimize the findability of the entries in our database, we continuously investigate new approaches to satisfy the information needs of our users. We believe that the findings from the ARQMath evaluation will generate new insights into which index structures are most suitable to satisfy mathematical information needs. Search engines, recommender systems, plagiarism checking software, and many other added-value services acting on databases such as the arXiv and zbMATH need to combine natural and formula language. One initial approach to address this challenge is to enrich the mostly unstructured document data via Entity Linking. The ARQMath Task at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange (MSE) to existing ones that were already answered by the community. To deeply understand MSE information needs, answer types, and formula types, we performed manual runs for Tasks 1 and 2. Furthermore, for Task 2 we explored several formula retrieval methods: fuzzy string search, k-nearest neighbors, and our recently introduced approach to retrieve Mathematical Objects of Interest (MOI) with textual search queries. The task results show that neither our automated methods nor our manual runs achieved good scores in the competition. However, the perceived quality of the hits returned by the MOI search particularly motivates us to conduct further research on MOI.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Mathematical Information Retrieval</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Semantic Search</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Mathematical Objects of Interest</kwd>
        <kwd>ARQMath Lab</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
In 2013, the first prototype of formula search in zbMATH was announced [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which has since become an integral part of the zbMATH interface. At the beginning of 2021, zbMATH will transform its business model from a subscription-based service to a publicly funded open service. In this context, we evaluate novel approaches to include mathematical formulae as first-class citizens in our mathematical information retrieval infrastructure. Besides the standard search that targets abstract, review, and publication metadata, zbMATH also traces incoming links from the Question Answering platform MathOverflow and provides backlinks from scientific articles to MathOverflow posts mentioning the publication [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We hypothesize that federating information from zbMATH and MathOverflow will enhance the zbMATH search experience significantly. The ARQMath Lab at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange to existing ones that were already answered by the community [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Using question postings from a test collection (extracted by the ARQMath organizers from an MSE Internet Archive snapshot1 up to 2018) as queries, the goal is to retrieve relevant answer posts containing both text and at least one formula. The test collection created for the task is intended to serve researchers as a benchmark for mathematical retrieval tasks that involve both natural and mathematical language. The ARQMath Lab consists of two separate subtasks. Task 1 – Answers poses the challenge to retrieve relevant community answer posts given a question from Math Stack Exchange (MSE). Task 2 – Formulas poses the challenge to retrieve relevant formulae from question and answer posts. Specifically, the aim of Task 1 is to find old answers to new questions in order to speed up the community answering process. The aim of Task 2 is to find a ranked list of relevant formulae in old questions and answers that match a query formula from the new question. This task design is a good fit for our research interest, since the information needs are related. Moreover, MathOverflow and math.stackexchange use the same data format, which enables us to reuse software developed during this competition and to transform it into production software later on. On the other hand, the mathematical level of questions on Math Stack Exchange is less sophisticated, and thus not all relevance rankings might be suitable for our use case. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution.
      </p>
    </sec>
    <sec id="sec-2">
      <title>ARQMath Lab</title>
      <p>
The ARQMath lab was motivated by the fact that Mansouri et al. discovered “that 20% of the mathematical queries in general-purpose search engines were expressed as well-formed questions” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
]. Furthermore, with the increasing public interest in Community Question Answering sites such as MSE2 and MathOverflow3, it will be beneficial to develop computational methods to support human answerers. In particular, the “time-to-answer” should be shortened by linking to related answers already provided on the platform, which can potentially lead to the answer more quickly. This will be of great help, since the question is usually urgent and related – sometimes even exactly matching – existing answers are available. However, the task is challenging because both
1 https://archive.org/download/stackexchange
2 https://math.stackexchange.com
3 https://mathoverflow.net
questions and answers can be a combination of natural and mathematical language,
involving words and formulae. The ARQMath lab at CLEF 2020 will be the first in a three-year sequence through which the organizers “aim to push the state of the art in evaluation design for math-aware IR” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The task starts with the domain of mathematics
involving formula language. The goal is to later extend the task to other domains (e.g.,
chemistry or biology), which employ other types of special notation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Math Stack Exchange</title>
      <p>
        Stack Exchange is an online platform with a host of Q&amp;A forums [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The Stack
Exchange network consists of 177 Q&amp;A communities including Stack Overflow, which
claims to be “the largest, most trusted online community for developers to learn and
share their knowledge”2. The different topic sites include Q&amp;A on computer issues,
math, physics, photography, etc. Users can rank questions and answers by voting them
up or down according to their quality assessment. Stack Exchange makes its content publicly available in XML format under a Creative Commons license [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. The Math Stack Exchange collection for the ARQMath lab tasks comprises Q&amp;A postings extracted from data dumps from the Internet Archive4. Currently, over 1 million questions are included [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Related Work</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Mathematical Question Answering</title>
      <p>
        Already in 1974, Smith [
        <xref ref-type="bibr" rid="ref5">5</xref>
] describes a project investigating the understanding of natural language by computers. He develops a theoretical model of natural language processing (NLP) and implements his theory algorithmically. Specifically, he chooses the domain of elementary mathematics to construct a Q&amp;A system for unrestricted natural language input. However, for a long time afterwards, there was little interest and progress in the field of mathematical question answering. In 2012, Nguyen et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
] present a math-aware search engine for a math question answering system. Their system handles both textual keywords and mathematical expressions. The math feature extraction is designed to encode the semantics of math expressions via a Finite State Machine model. They tested their approach against three classical information retrieval strategies on math documents crawled from Math Overflow, claiming to outperform them by more than 9%. In 2017, Bhattacharya et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
] publish a survey of question answering for math and science problems. They explore the current achievements towards the goal of making computers smart enough to pass math and science tests. They conclude by claiming that “the smartest AI could not pass high school”. In 2018, Gunawan et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
present an Indonesian question answering system for solving arithmetic word problems
using pattern matching. Their approach is integrated into a physical humanoid robot.
For auditive communication with the robot, the user’s Indonesian question must be
      </p>
      <sec id="sec-4-1">
        <title>4 https://archive.org</title>
        <p>
translated into English text. They employ NLP using the NLTK toolkit5, specifically co-referencing, question parsing, and preprocessing. They conclude by claiming that the Q&amp;A system achieves an accuracy between 80% and 100%. However, they state that the response time is rather slow, averaging more than one minute. Also in 2018, Schubotz et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
] present MathQA6, an open-source math-aware question answering system based on Ask Platypus7. The system returns a single mathematical formula for a natural language question posed in English or Hindi. The formulae are fetched from the open knowledge-base Wikidata8. With numeric values for constants loaded from Wikidata, the user can perform computations using the retrieved formula. It is claimed that the system outperforms a popular computational mathematical knowledge-engine by 13%. In 2019, Hopkins et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
] report on the SemEval 2019 task on math question answering. They derived a question set from Math SAT practice exams, including 2778 training questions and 1082 test questions. According to their study, the top system correctly answered 45% of the test questions, with a random-guessing baseline at 17%. Beyond the domain of math Q&amp;A, Pineau [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and Abdi et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] present first approaches to
answer questions on physics.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Mathematical Document Subject Class Classification</title>
      <p>
        For open-domain question redirection, it is beneficial to classify a given mathematical
question by its domain, e.g. geometry, calculus, set theory, physics, etc. There have
been several approaches to perform categorization or subject class classification for
mathematical documents. In 2017, Suzuki and Fujii [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] test classification methods on
collections built from MathOverflow9 and the arXiv10 paper preprint repository. The user tags include both keywords for math concepts and categories from the Mathematical Subject Classification (MSC) 201011 top- and second-level subjects. In 2020,
Scharpf et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
] investigate how combining encodings of natural and mathematical language affects the classification and clustering of documents with mathematical content. They employ sets of documents, sections, and abstracts from the arXiv10, labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of text and formulae and evaluate the performance and runtimes of selected classification and clustering algorithms. Also in 2020, Schubotz et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
] explore whether it is feasible to automatically assign a coarse-grained primary classification using the MSC scheme with multi-class classification algorithms. They claim to achieve a precision of 81% for the automatic article classification. We conclude that for math Q&amp;A systems, the classification needs to be performed at the sentence level. If MSE questions contain several sentences, the problem could potentially also be framed as an abstract classification problem.
5 https://www.nltk.org
6 http://mathqa.wmflabs.org
7 https://askplatyp.us
8 https://www.wikidata.org
9 https://mathoverflow.net
10 https://arxiv.org
11 http://msc2010.org
      </p>
    </sec>
    <sec id="sec-6">
      <title>Connecting Natural and Mathematical Language</title>
      <p>
For mathematical question answering, mathematical information needs to be connected to natural language queries. Yang &amp; Ko [
        <xref ref-type="bibr" rid="ref15">15</xref>
] present a search engine for formulae in MathML12 using a plain word query. Mansouri et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
] investigate how queries for mathematical concepts are performed in search engines. They conclude “that math search sessions are typically longer and less successful than general search sessions”. For non-mathematical queries, search engines like Google13 or DuckDuckGo14 already provide entity cards with a short encyclopedic description of the searched concept [
        <xref ref-type="bibr" rid="ref16">16</xref>
]. For mathematical concepts, however, there is an urgent need to connect a natural language query to a formula representing the keyword. Dmello [
        <xref ref-type="bibr" rid="ref16">16</xref>
] proposes integrating entity cards into the math-aware search interface MathSeer15. Scharpf et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
] propose a Formula Concept Retrieval challenge for Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR) tasks. They present first machine-learning-based approaches for retrieving formula concepts from the NTCIR 11/12 arXiv dataset16.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Semantic Annotations</title>
      <p>
To connect mathematical formulae and symbols to natural language keywords, semantic annotations are an effective means. So far, only a few annotation systems are available for mathematical documents. Dumitru et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
] present a browser-based annotation tool (the “KAT system”) for linguistic/semantic annotations in structured (XHTML5) documents. Scharpf et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
] present “AnnoMathTeX”, a recommender system for formula and identifier annotation of Wikipedia articles using Wikidata17 QID item tags. The annotations can be integrated into the MathML markup using MathML Wikidata Content Dictionaries18 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref22">22</xref>
].
      </p>
      <sec id="sec-7-1">
        <title>Summary of Our Approach</title>
<p>We tackle the ARQMath lab tasks (Task 1 – answer retrieval, Task 2 – formula retrieval) using manual run selection benchmarking. To this end, we create, populate, and employ a Wiki19 with pages for normal (Task 1) and formula (Task 2) topics. The main objective of our experiments was to explore methods that enable automatic answer assignment recommendations for question postings on Mathematics Stack Exchange (MSE). We tested the following approaches or methods: 1) manual run annotation using
12 https://www.w3.org/TR/MathML3
13 https://www.google.com
14 https://duckduckgo.com
15 https://www.cs.rit.edu/~dprl/mathseer
16 http://ntcir-math.nii.ac.jp
17 https://www.wikidata.org
18 https://www.openmath.org
19 https://arq20.formulasearchengine.com
Google and MSE search, 2) formula TF-IDF or Doc2vec20 encodings [23] using the Python libraries Scikit-learn21 [24] and Gensim22 [25], 3) fuzzy string comparison or matching using rapidfuzz23, 4) the k-nearest neighbors algorithm, and 5) discovery of Mathematical Objects of Interest (MOI) with textual search queries [26].
As a result, we obtained the IDs of relevant MSE answers for each query in the sample of Task 1, and a ranked list of the most relevant formulae for each query in the sample of Task 2 (if available). Finally, we analyzed our results using a manual consistency and quality check.</p>
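As a minimal illustration of approach 3), fuzzy string matching ranks pool formulae by string similarity to a query formula. The sketch below uses Python's standard-library difflib as a stand-in for rapidfuzz (which computes comparable ratios much faster); the formulae and IDs are invented examples, not data from the task.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # 0-100 similarity between two LaTeX strings (stand-in for
    # rapidfuzz's ratio; rapidfuzz would be the faster choice).
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def rank_candidates(query, pool, top_k=3):
    # Rank pool formulae (formula_id -> LaTeX string) by similarity.
    scored = [(fid, similarity(query, latex)) for fid, latex in pool.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Invented example pool of formula LaTeX strings.
pool = {
    "f1": r"\int_0^1 x^2 \, dx",
    "f2": r"\sum_{n=1}^\infty \frac{1}{n^2}",
    "f3": r"\int_0^1 x^3 \, dx",
}
ranking = rank_candidates(r"\int_0^1 x^2 dx", pool)
```

The same ranking skeleton applies when the similarity function is swapped for a vector distance, as in the kNN approach.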
      </sec>
      <sec id="sec-7-2">
        <title>Workflow of Our Approach</title>
<p>The workflow of our approach is illustrated in Fig. 1. It can be logically divided into three stages: 1) the creation of a Wiki with pages for normal and formula topics, 2) methods to tackle Task 1, and 3) methods to tackle Task 2.</p>
        <p>Fig. 1 outlines the three stages:</p>
        <p>Wiki: retrieval of URLs using Google and MSE search; creation of the Wiki at arq20.formulasearchengine.com; creation of Wiki pages for normal and formula topics.</p>
        <p>Task 1: insertion of links to math.stackexchange.com/questions/xxx on the Wiki page; manual run selection of the most suitable answer; insertion of links to https://math.stackexchange.com/a/xxx as “relevant answers” property on the Wikidata item for normal topics.</p>
        <p>Task 2: manual run selection of the most suitable formula(e); LaTeX string as “defining formula” property, a subproperty of “relevant answers”, on the Wikidata item for formula topics.</p>
        <p>
The initial preparation step for our approach to tackle Tasks 1 and 2 was to create, populate, and employ a MediaWiki environment connected to a Mathoid [27] rendering
20 Also known as “Paragraph Vectors”, as introduced in [23].
21 https://scikit-learn.org
22 https://radimrehurek.com/gensim
23 https://github.com/maxbachmann/rapidfuzz
service with pages for normal and formula topics. For each query, there is a Wikibase item with the following properties: ‘math-stackexchange-category’ (P10), ‘topic-id’ (P12), ‘post-type’ (P9), ‘math stackexchange post id’ (P5), and ‘relevant answers’ (P14). Having set up the Wiki, we manually retrieved the question URLs using Google and MSE search and inserted them as values for the ‘math stackexchange post id’ on the respective question pages. Unfortunately, by doing so, some new post-2019 post IDs were entered because we did not check the date carefully enough. The ‘math-stackexchange-category’ values were automatically retrieved from the question tags. The ‘topic-id’ (e.g., A.50) was transferred from the task dataset, and the ‘post-type’ was set to “Question”. Unfortunately, as we discovered later, the use of Google and MSE search led to results outside the task dataset. This means that the answer that was accepted as the best answer by the questioner was often not included in the task dataset. However, our aim was to establish the “correct” answer as a semantic reference in our MediaWiki.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Populate Topic Answers (Task 1)</title>
<p>The first part of our experimental pipeline was a manual run selection of the most suitable answer from the MSE question posting page (preferably the one selected by the questioner, if available). Subsequently, we inserted links to the answers, i.e., math.stackexchange.com/a/xxx, into the ‘relevant answers’ property of the query item on the normal topics page.</p>
    </sec>
    <sec id="sec-9">
      <title>Populate Formula Answers (Task 2)</title>
<p>The second part of our experimental pipeline was a manual run selection of the most suitable formula per question or answer. The chosen formula was considered to answer the given question as concisely as possible. Thus, we interpreted Task 2 as having to find formula answers to the question, and not only similar formulae. We inserted the extracted LaTeX string into the ‘defining formula’ property, as a subproperty of ‘relevant answers’, on the Wikidata item for formula topics.</p>
    </sec>
    <sec id="sec-10">
      <title>Preparing Data for Experiments and Submission</title>
<p>After having populated our Wiki database, we used a SPARQL query (Fig. 2) to get an overview of its content. The query fetches all Wikidata question items, displaying their ‘topic-id’ (e.g., A.1 or B.1), ‘post-id’ (e.g., 3063081), and the formula LaTeX string. With the list of normal and formula topic insertions, we performed a quality check, correcting wrong or missing values.
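The actual query from Fig. 2 is not reproduced here; the following is a hypothetical reconstruction based on the property IDs listed above (P12 for ‘topic-id’, P5 for the MSE post ID, P14 for ‘relevant answers’), sketched as a Python string. The prefixes and property paths of the project Wikibase are assumptions.

```python
# Hypothetical reconstruction of the overview query (Fig. 2); the exact
# prefixes and property paths of the project Wikibase are assumptions.
OVERVIEW_QUERY = """
SELECT ?topic ?topicId ?postId ?relevantAnswer WHERE {
  ?topic wdt:P12 ?topicId .                     # 'topic-id', e.g. A.1 or B.1
  ?topic wdt:P5  ?postId .                      # 'math stackexchange post id'
  OPTIONAL { ?topic wdt:P14 ?relevantAnswer . } # 'relevant answers'
}
"""
```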
The previously developed MOI search engine [26] allows us to search for meaningful mathematical expressions with a given textual search query. This workflow can be used to solve Task 2, but it requires some substantial updates. Essentially, Task 2 requests relevant formula IDs for a given input formula ID. Each formula ID is mapped to the corresponding post ID. Hence, we can take the entire post of a formula ID as the input for our MOI search engine. However, there are two main problems with the existing approach: (i) the MOI search engine was developed and tested only to search for keywords; thus, entering entire posts at once may harm the accuracy, and (ii) every retrieved MOI is by design a subexpression and thus probably has no designated formula ID. To overcome these issues, we need to understand the current system. The MOI search system retrieves MOIs in two steps. The first step retrieves relevant documents from an elasticsearch24 instance for the input query. Hence, we first indexed all ARQMath posts in elasticsearch. To index the content of each post appropriately, we set up the standard English stemmer, stopword filtering, HTML stripping (filters out HTML tags but preserves the content of each tag), and enabled ASCII folding (converts alphabetic, numeric, and symbolic characters to their ASCII equivalents, e.g., ‘á’ is replaced by ‘a’). For the search query, we used the standard match query system but boosted every mathematical expression in the input. This tells elasticsearch to focus more on the math expressions in a search query than on the actual text. With this setup, we overcome the mentioned issue (i) and can search for relevant posts by entering the entire content of a post. In the second step of the MOI search engine, the engine
24 https://www.elastic.co
disassembles all formulae in the retrieved documents and calculates the mBM25 score [26] for each of these subexpressions (MOI):</p>
      <p>s(t, d) := ((k + 1) IDF(t) ITF(t, d) TF(t, d)) / (max_{t′ ∈ d} TF(t′, d) + k (1 − b + b (|d| / AVGDL)(c(t) / AVGC)))</p>
      <p>mBM25(t, D) := max_{d ∈ D} s(t, d),
where mBM25(t, D) is a modified version of the BM25 relevance score [28], with D the entire ARQMath corpus, IDF(t) the inverse document frequency of the term t, TF(t, d) the term frequency of the term t in the document d ∈ D, ITF(t, d) the inverse term frequency (calculated the same way as IDF(t) but on the document level for the document d), AVGDL the average document length of D, and AVGC the average complexity of D (see [26] for a more detailed description). The top-scored expressions are returned. The mBM25 score requires the global term and document frequencies of every subexpression. Hence, we first calculated these global values for every subexpression of every formula in the ARQMath dataset. Table 1 shows the statistics of this MOI database in comparison to the previously generated databases for arXiv and zbMATH. A document in ARQMath is a post from MSE. The dataset only includes MathML representations. The complexity of a formula is the maximum depth of the Presentation MathML representation of the formula. As Table 1 shows, the ARQMath database can be interpreted as a hybrid between the full research papers in arXiv and the relatively short review discussions in zbMATH (mainly containing reviews of mathematical articles).
Machine: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores / 8 threads); RAM: 32 GB 2133 MHz; Disk: 1 TB SSD; required disk space: 7.8 GB (posts) + 3 GB (MOIs) = 10.8 GB</p>
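The indexing and boosted-query setup described earlier can be sketched as follows. The analyzer composition and field names are illustrative assumptions, not the authors' actual elasticsearch mapping; only the general idea (English stemming, stopword filtering, HTML stripping, ASCII folding, and boosting math tokens in a match query) comes from the text.

```python
# Sketch of index settings: English stemming, stopword filtering, HTML
# stripping, ASCII folding (analyzer/field names are assumptions).
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "post_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem", "asciifolding"],
                }
            }
        }
    }
}

def build_boosted_query(text_tokens, math_tokens, boost=2.0):
    # Match query that weights mathematical expressions higher than text.
    should = [{"match": {"content": {"query": t}}} for t in text_tokens]
    should += [{"match": {"content": {"query": m, "boost": boost}}}
               for m in math_tokens]
    return {"query": {"bool": {"should": should}}}

query = build_boosted_query(["convergence", "series"], ["\\sum 1/n^2"])
```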
<p>Runtime: 6.0 s / query (average over all queries).
Considering that every formula in the ARQMath dataset has its own ID and the system needs to preserve the ID during computation, we would need to attach the ID to every generated MOI. However, this would result in a massive overload. For example, a single common identifier appears 7.6 million times in ARQMath and would thus have millions of different formula IDs. The entire ARQMath dataset has 16.8 million unique MOIs. Handling this number of different IDs is impractical. Hence, we chose a different approach to obtain the formula IDs for every MOI. Since the search engine retrieves the relevant documents first, we only need to consider formula IDs that exist in these retrieved documents. To achieve this, we attached the formula IDs to every post in the elasticsearch database rather than to the MOIs themselves. A single document in elasticsearch now contains the post ID, the textual content, and a list of MOIs with local term frequencies (how often the MOI appears in the corresponding post) and formula IDs. Note that most MOIs still have multiple formula IDs, since a subexpression may appear multiple times in a single post, but the number of different IDs is reduced drastically. Since the IDs are now attached to each post but are not used in the search query, the performance of retrieving relevant documents from elasticsearch stays the same. With this approach, we may calculate multiple different mBM25 scores for a single formula ID, since a single unique formula ID can be attached to multiple MOIs. To calculate the final score for a formula ID, we computed the average of all mBM25 scores for that formula ID. For example, suppose we retrieve the document with the ID 2759760. This post contains the formula with ID 25466124, a fraction with denominator 6, which would be disassembled into its subexpressions: the numerator, the 6, and the fraction itself. Hence, we would calculate three mBM25 scores for the fraction. The average of these scores would be the score for the formula ID.</p>
<p>We used this updated MOI search engine to retrieve results for Task 2. Note that the approach might be a bit unorthodox, since the MOI search engine takes the entire post of the given formula ID rather than the formula ID alone. We interpreted Task 2 as retrieving answer formulae for a given question formula, rather than retrieving visually or semantically similar formulae. Based on this interpretation, it makes sense to use the entire post of a formula ID to search for relevant answers. In other words, we interpreted Task 2 as an extension and math-specific version of Task 1. In summary, the key steps of the MOI search engine to solve Task 2 were the following:</p>
      <p>1. Take the entire post of the given formula ID.</p>
      <p>2. Search for posts similar to the retrieved post in step 1.</p>
      <p>3. Extract all MOIs from all retrieved posts in step 2.</p>
      <p>4. Calculate mBM25 scores for all MOIs of step 3.</p>
      <p>5. Group the MOIs by their associated formula IDs (every formula ID now has multiple mBM25 scores).</p>
      <p>6. Average the mBM25 scores for each formula ID.</p>
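The six-step procedure above can be sketched end-to-end as follows. Retrieval and scoring are stubbed with toy in-memory functions (the real system uses elasticsearch and the mBM25 score), and all post/formula IDs are invented.

```python
from collections import defaultdict

# Toy corpus: post_id -> text and contained formula IDs (all invented).
POSTS = {
    "q1": {"text": "integrate x^2 over the unit interval", "formula_ids": {"f1"}},
    "a1": {"text": "integrate x^2 to get x^3/3", "formula_ids": {"f2", "f3"}},
}
FORMULA_TO_POST = {"f1": "q1", "f2": "a1", "f3": "a1"}

def retrieve_similar_posts(post_text):
    # Step 2 (stub): the real system queries elasticsearch with the full post.
    first_word = post_text.split()[0]
    return [pid for pid, post in POSTS.items() if first_word in post["text"]]

def extract_mois(post_id):
    # Step 3 (stub): the real system disassembles each formula into MOIs.
    return [(fid, f"moi-of-{fid}") for fid in sorted(POSTS[post_id]["formula_ids"])]

def mbm25(moi):
    # Step 4 (stub): the real system computes the mBM25 score of the MOI.
    return float(len(moi))

def rank_formulae(query_formula_id):
    post = POSTS[FORMULA_TO_POST[query_formula_id]]      # step 1
    scores = defaultdict(list)                           # step 5 grouping
    for pid in retrieve_similar_posts(post["text"]):     # step 2
        for fid, moi in extract_mois(pid):               # step 3
            scores[fid].append(mbm25(moi))               # step 4
    return {fid: sum(s) / len(s) for fid, s in scores.items()}  # step 6

result = rank_formulae("f1")
```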
<p>For Task 2, we retrieved 107,476 MOIs. We used the provided annotation dataset to evaluate the retrieved results. For a better comparison, we calculated the nDCG′ (nDCG-prime) score, as the task organizers did [29]. Note that nDCG′ removes unjudged documents before calculating the score. Since these were post-experiment calculations, there is not much correlation between the retrieved MOI documents and the judged formula IDs. We found 179 formula IDs that were retrieved by our MOI engine and contained a judgment by the annotators of the ARQMath task. Based on these 179 judgments, we obtained an nDCG′ value of 0.374, which is in the midrange compared to the other competitors.</p>
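For reference, nDCG′ differs from plain nDCG only in that unjudged documents are dropped from the ranking before scoring. A minimal sketch, assuming graded relevance judgments and an illustrative cut-off parameter:

```python
import math

def ndcg_prime(ranking, judgments, p=10):
    # nDCG': drop unjudged documents from the ranking, then compute
    # nDCG@p against the judgments (dict: doc_id -> relevance grade).
    judged = [doc for doc in ranking if doc in judgments]
    gains = [judgments[doc] for doc in judged[:p]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(judgments.values(), reverse=True)[:p]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

With this definition, a run that ranks all judged documents in ideal order scores 1.0 even if unjudged documents are interleaved.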
    </sec>
    <sec id="sec-11">
      <title>Data Integration of Query and Pool Formulae</title>
<p>We tested two other approaches for Task 2: formula pool retrieval via k-nearest neighbors and fuzzy string matching. For both methods, we first needed to integrate the pool of formulae (the task dataset) with our query set, consisting of the formulae which we ‘manually’ chose from the candidate answers to be a formula answer to the question asked.</p>
      <p>[Figure: pipeline overview for Task 2, covering data integration of query and pool
formulae, kNN retrieval, and fuzzy string candidate retrieval. The depicted steps are: load
the TSV files for query and pool formulae; retrieve the formula symbols (identifiers,
operators) from the MathML tags ('ci', 'mi', 'co', 'mo') together with the formula LaTeX
string; integrate all formulae with IDs and save the dictionary to a Python Pickle file;
encode the formula LaTeX strings via TF-IDF and Doc2Vec; retrieve distances and
k-nearest formula candidates via the kNN algorithm; calculate pairwise fuzzy string
partial ratios (matching percentages); and rank the percentages for each formula to
identify the closest candidates.]</p>
      <p>
The properties are retrieved from the task dataset TSV files. For the identifier and
operator lists, the symbols are retrieved from the MathML string. For the query formulae,
the search tags are '&lt;mi&gt;' and '&lt;mo&gt;', and for the pool formulae, '&lt;ci&gt;' and '&lt;co&gt;' for
identifiers and operators, respectively. The formula LaTeX string is retrieved from the
'alttext' attribute of the '&lt;math&gt;' tag. Finally, the formula dictionary is serialized to a
pickle file. It is utilized in the following steps (formula encoding, kNN, and fuzzy string
similarity retrieval).</p>
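      <p>The extraction step can be sketched with Python's standard library; the tag handling follows the description above, while the file layout and dictionary structure are assumptions for illustration:</p>

```python
import pickle
import xml.etree.ElementTree as ET

def parse_formula(mathml, id_tag="mi", op_tag="mo"):
    """Extract identifiers, operators, and the LaTeX string ('alttext')
    from one MathML formula; pass id_tag='ci', op_tag='co' for pool formulae."""
    root = ET.fromstring(mathml)
    local = lambda tag: tag.split("}")[-1]  # drop an XML namespace prefix if present
    return {
        "identifiers": [el.text for el in root.iter() if local(el.tag) == id_tag],
        "operators": [el.text for el in root.iter() if local(el.tag) == op_tag],
        "latex": root.get("alttext"),
    }

mathml = '<math alttext="x+1"><mi>x</mi><mo>+</mo><mn>1</mn></math>'
formulae = {"q1": parse_formula(mathml)}  # formula ID -> properties
blob = pickle.dumps(formulae)             # serialize the dictionary
print(pickle.loads(blob)["q1"]["latex"])  # x+1
```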
    </sec>
    <sec id="sec-12">
      <title>Formula LaTeX String Encoding via TF-IDF and Doc2Vec</title>
      <p>Having retrieved the LaTeX formula from the MathML string, it is encoded by jointly
feeding its identifier and operator tokens (UTF-8) into the TfidfVectorizer from the
Python package Scikit-learn [24] and the Doc2Vec encoder from Gensim [25]. For the
TfidfVectorizer, an ngram range of (1,1) is used. The Doc2Vec distributed bag of words
(PV-DBOW) model is trained for 10 iterations.</p>
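      <p>To keep the example dependency-free, the following minimal TF-IDF over identifier/operator token lists illustrates the encoding idea (unigrams only, mirroring an ngram range of (1,1)); it is a sketch, not a reproduction of Scikit-learn's TfidfVectorizer or Gensim's Doc2Vec:</p>

```python
import math
from collections import Counter

def tfidf_encode(docs):
    """Minimal TF-IDF over pre-tokenized documents (unigrams only).
    Returns one sparse vector (dict: token -> weight) per document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({tok: (cnt / len(doc)) * math.log(n / df[tok])
                        for tok, cnt in tf.items()})
    return vectors

# identifier/operator tokens of two formulae, e.g. "x+1" and "x+y"
vecs = tfidf_encode([["x", "+", "1"], ["x", "+", "y"]])
print(vecs[0]["1"] > vecs[0]["x"])  # True: tokens unique to one formula weigh more
```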
    </sec>
    <sec id="sec-13">
      <title>Formula Pool Retrieval via K-Nearest-Neighbors</title>
      <p>The two resulting formula-encoding vector spaces are subsequently fed into a
NearestNeighbors algorithm from Scikit-learn. In Table 3, some illustrative examples of the
top 3 results are displayed. In all cases, the retrieved formulae are structurally similar,
sometimes equivalent, sometimes even “visually” identical. Once the formula encodings
have been generated, the kNN method is very fast compared to classical text matching,
since the vector computations can be carried out faster than text processing.
Apart from the NearestNeighbors prediction using TF-IDF and Doc2Vec encoded
LaTeX formula strings, we also tested fuzzy string matching to retrieve similar formulae.
For each ‘manually’ selected query formula, we calculated the fuzzy partial ratio
similarity with all pool formulae and ranked them by descending overlap. The top 10
candidates were then submitted. Compared to the kNN approach, the fuzzy string
search has the advantage of not requiring an encoding index. Thus, new formula
instances can easily be added without retraining the vector encodings of the
whole corpus.</p>
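      <p>Both retrieval variants can be sketched without external dependencies, with cosine-based nearest neighbors over sparse encodings and a windowed difflib ratio standing in for the fuzzy partial ratio (the actual runs used Scikit-learn's NearestNeighbors and a fuzzy string matching library):</p>

```python
import difflib

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def knn(query_vec, pool_vecs, k=3):
    """Return the k pool formula IDs nearest to the query encoding."""
    ranked = sorted(pool_vecs, key=lambda fid: cosine(query_vec, pool_vecs[fid]),
                    reverse=True)
    return ranked[:k]

def partial_ratio(query, candidate):
    """Best match ratio of the shorter string against any equally long
    window of the longer one (a rough stand-in for a fuzzy partial ratio)."""
    short, long_ = sorted((query, candidate), key=len)
    return max(difflib.SequenceMatcher(None, short, long_[i:i + len(short)]).ratio()
               for i in range(len(long_) - len(short) + 1))

pool = {"f1": {"x": 1.0, "+": 1.0}, "f2": {"\\sin": 1.0}}
print(knn({"x": 1.0}, pool, k=1))       # ['f1']
print(partial_ratio("x+1", "y = x+1"))  # 1.0, since 'x+1' occurs verbatim
```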
      <sec id="sec-13-1">
        <title>Classification of Question and Answer Types</title>
        <p>To assess the relative relevance of the specific question, answer, and formula types, we
carried out a human multi-label classification for each set respectively. Our approach
was inductive, meaning that we did not specify the classes upfront but observed them
while examining the questions, answers, and formulae as they occurred.</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Example Questions and Answers</title>
      <p>To illustrate our classification operation mode, we will first give some examples.
In question A.1, the user asks to find the value of a parameter contained within a
function, given an interval constraint. We classified this question with the label “calculate /
compute / find value”. Our manually selected answer25 for A.1 was labeled “numeric
value / fraction” and “inequality”.</p>
      <p>In question A.50, the user asks whether a series containing a fraction of powers and a
trigonometric function converges or diverges. We classified this question with the
labels “power / exponential / logarithmic”, “trigonometry”, and “sequence / summation”.
Our manually selected formula for B.50, an inequality involving powers, was labeled
“inequality” and “powers / exponentials / logarithms”.
25
https://math.stackexchange.com/questions/3062860/finding-value-of-c-such-that-the-rangeof-the-rational-function-fx-frac/3063081#3063081</p>
    </sec>
    <sec id="sec-15">
      <title>Question Types</title>
      <p>We labeled the question types as shown in Table 4.</p>
      <p>[Table 4: question-type labels, including complex numbers, parameter, and probability,
with the associated question IDs.]</p>
      <p>
The occurrence statistics of the individual question types are shown in Fig. 5. Apparently,
the major part of the questions involved “sets” of numbers. This is partly caused
by the set symbols for natural numbers ℕ or rational numbers ℚ appearing frequently
in definitions that are included in the question. The second-highest ranked label is
“function”. This is not surprising considering that functions are a heavily used notion
or concept in mathematics. To obtain this label, it was sufficient that a function
identifier appears in the question. The third-highest ranked label is “solve equations –
algebraic or differential”. In many cases, provided enough information, the question can be
answered by using a computer algebra system (CAS) connected to the question
answering engine.
Classifying the question subject classes, we see that almost all questions are pure
mathematics, except A.33, which is from the math-stackexchange category physics. Employing
subject class classifications can help to redirect questions and reduce the answer
space. Open-domain QA systems can then be modularized into distinct closed-domain
parts that handle different QA types differently. For example, a geometry question such
as “What is the surface area of a sphere?” can be parsed and answered differently than
an algebraic question such as “How to solve x + 1 = 2?”. While the former could be
passed to a database containing properties of geometric objects, the latter could be
passed to a computer algebra system. On the other hand, physics questions often rely
heavily on the semantics of identifier names. As an example, the question “What is the
relationship between mass and energy?” should yield formulae such as E = mc² or
E = ½mv². Without having annotated identifier names contained within the
formulae, the question cannot be answered.</p>
    </sec>
    <sec id="sec-16">
      <title>Answer Types</title>
      <p>We labeled our manually retrieved answer types as shown in Table 5.</p>
      <p>[Table 5: answer-type labels (value / fraction, probability, binomial, pow / exp / log,
interval, set, inequality, differential, integral, trigonometry, function, algebraic
transformation, vector / matrix, logic, modulus, limes, derivative, cases, complex numbers)
with the associated answer IDs.]</p>
      <p>The occurrence statistics of the individual answer types are shown in Fig. 6. As for the
question types, “set” is still the most frequent label. However, “function” is here only
ranked fourth. The label “algebraic transformation” is ranked second. Some of the
transformations can be done using computer algebra systems. Apparently, the answer
and question categories differ. This means, for example, that given a short question, the
potentially longer answer (proof or other) can involve more categories.</p>
      <sec id="sec-16-1">
        <title>Formula Types</title>
        <p>We labeled the formula types as shown in Table 6. The occurrence statistics of the
individual formula types are shown in Fig. 7. Algebraic transformations and functions
are still ranked high. All in all, the most frequent question, answer, and formula types
involve sets, sequences, sums, powers, exponentials, logarithms, trigonometry functions,
inequalities, and algebraic transformations or equation solving. In the future, one could
explore whether the question classification label enhances answer retrieval.</p>
        <sec id="sec-16-1-1">
          <title>Discussion of Challenges</title>
          <p>Table 7 shows the results of our submission in the ARQMath lab. For Task 1, the
reported nDCG′ score for our manual run is outstandingly low. Hence, we tried to
investigate the reasons for this low score. We identified one critical issue in our manual run.
We have linked the posts from the ARQMath dataset with the real posts on MSE, which
makes it easier to crawl for relevant answers manually. However, this approach leads
to the problem that some of our reported answers do not exist in the ARQMath dataset.
Nonetheless, the nDCG′ removes non-judged documents prior to evaluation. Hence, a
relatively high number of answers that do not exist in the dataset should not harm our
score dramatically. We can report an nDCG′ score of 0.504 for our submitted run. This
is significantly higher than the score reported in the ARQMath results paper [29]. We
calculated the nDCG′ score as formulated in [30] and [31]:</p>
          <p>nDCG′p = DCG′p / IDCG′p,</p>
          <p>where</p>
          <p>DCG′p = ∑i=1..p (2^reli − 1) / log₂(i + 1) and IDCG′p = ∑i=1..|RELp| (2^reli − 1) / log₂(i + 1),</p>
          <p>and reli is the given relevance score for the i-th element, and RELp is the list of relevant
documents ordered by their relevance up to position p. In other words, the nDCG′p score
is the DCG′p score divided by the DCG′p score for the ideal order of relevant hits. The
nDCG′p is calculated for every query in the test set. The overall score is therefore
calculated as the mean value of nDCG′p over all queries.</p>
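          <p>The prime variant of the metric can be computed directly from these definitions; the sketch below assumes unjudged documents have already been removed from the ranked list:</p>

```python
from math import log2

def dcg_prime(rels):
    """DCG' over judged documents only: rels holds the relevance scores
    rel_i of the ranked hits, i = 1..p."""
    return sum((2 ** rel - 1) / log2(i + 1) for i, rel in enumerate(rels, start=1))

def ndcg_prime(rels):
    """nDCG' = DCG' divided by the DCG' of the ideal (descending) ordering."""
    ideal = dcg_prime(sorted(rels, reverse=True))
    return dcg_prime(rels) / ideal if ideal else 0.0

# a manual run reporting a single judged answer (p = 1) always scores 1.0
print(ndcg_prime([2]))      # 1.0
print(ndcg_prime([1, 3]))   # < 1.0: the ideal order would be [3, 1]
```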
          <p>We identified two possible issues that could explain the mismatch between our
calculated score and the reported one. The nDCG′p score is calculated for a fixed number p
of retrieved top hits. If p is larger than the number of retrieved documents, it would
reduce the score. We assume that most contestants reported a list of relevant hits for
each query. Since we performed a manual run, we only reported the actual answer. This
means, for our reported answers, it only makes sense to set p = 1.</p>
          <p>Moreover, we did not report valid answers for some queries (in case the answer ID did
not exist in the dataset, we had no valid answer at all for that particular query). If
these queries were considered when calculating the mean nDCG′p over all queries, it
would also explain a significantly lower score. The nDCG′p is designed to not take
unjudged documents into account. Similarly, it makes sense to ignore queries with no
returned answers when calculating the overall nDCG′p over all queries. Following these
rules, we calculated an nDCG′p of 0.504 for our manual run. Table 10 in the Appendix
shows the results of our DCG′1 and IDCG′1 scores for all queries of Task 1 for which
we retrieved answers in our manual run and which were ranked by the ARQMath reviewers.
The final average score for nDCG′1 is 0.504.</p>
          <p>
            In addition to the problematic score calculation, we found incomprehensible relevance
scores on multiple occasions. A possible reason for this is the subjectiveness of
relevance. While we found the reported answers highly relevant, the annotators provided a
relevance score of 0. Table 8 summarizes the identified problematic annotations. In
five out of nine of these cases, our reported answers were marked as correct by the
questioner at MSE (last column in Table 8) but annotated as non-relevant by the
ARQMath annotator. This seems to indicate that the relevance scores for ARQMath
Tasks 1 and 2 are very subjective, even though the reported Kappa coefficient for
inter-annotator agreement was reasonably high at around 0.34.
[Table 8 excerpt: post IDs 2146297, 311354, 893752, each annotated with relevance 0;
marked correct by the questioner: yes, yes, no.]
In the process of manual annotation and answer retrieval, we noticed several challenges
for IR systems. First, the question and answer features are obviously very heterogeneous
data types (text and formulae). It remains to be explored how to combine both in a
suitable way. Recent studies [32] investigated the impact of different encoding
combinations on the classification accuracy and cluster purity on the NTCIR-11/12 arXiv
dataset [33]. They called out for a “formula encoding challenge” to exploit the formula
information for machine learning tasks. A successful encoding should, e.g., improve
the text classification accuracy. The aim is motivated by the observation that there is
little correlation between text and formula similarity, at least using the cosine measure
on tf-idf and doc2vec encodings. We need to somehow connect text and math, such that
there is a synergy between their semantics. In the case of the mathematical question
answering task, this could be achieved by transforming the mathematical formula
elements to textual entities. Consider for example the ARQMath Task question A.29. The
question asks for a recipe to divide complex numbers by infinity (title: “Dividing Complex
Numbers by Infinity”). For this question, we manually retrieved the formula (a + bi)/∞ = 0
from the answer that was selected by the questioner on MSE. One way to connect the
question to possible answer formulae would be to annotate both textual elements. Table
9 shows how linking to items of the semantic knowledge-base Wikidata [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ], [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] can
provide a connection via the joint QIDs Q1226939, Q11567, and Q205. A joint semantic
vector representation of both the title text and the formula could then be a concatenation
of the Wikidata item embeddings, as proposed in [34].
          </p>
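          <p>Such a joint representation could be assembled as follows; the embedding values below are invented for illustration, and real item embeddings (e.g., PyTorch-BigGraph vectors as in [34]) would be high-dimensional:</p>

```python
# toy Wikidata item embeddings (2-dimensional, values invented)
embeddings = {
    "Q1226939": [0.1, 0.9],  # division
    "Q11567":   [0.7, 0.2],  # complex number
    "Q205":     [0.4, 0.4],  # infinity
}

def joint_representation(text_qids, formula_qids):
    """Concatenate the item embeddings of the QIDs linked in the
    question title and in the candidate answer formula."""
    vector = []
    for qid in text_qids + formula_qids:
        vector.extend(embeddings[qid])
    return vector

# question A.29: title and formula share Q1226939 (division) and Q205 (infinity)
v = joint_representation(["Q1226939", "Q205"], ["Q1226939", "Q11567", "Q205"])
print(len(v))  # 10 = 5 linked items x 2 dimensions
```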
          <p>[Table 9, question text annotations: “Dividing”: “division” (Q1226939); “Adding”:
“addition” (Q32043); “Infinity”: “infinity” (Q205).]</p>
          <p>
            [Table 9, formula answer annotations: the formula parts are linked to “division”
(Q1226939), “addition” (Q32043), “complex number” (Q11567), “real number” (Q12916),
“complex number” (Q9165172), and “infinity” (Q205).]
This example illustrates how linking Formula Concepts [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] can be very
beneficial for mathematical question answering (on MSE, arXiv, Wikipedia, etc.). However,
this requires the semantic annotation of textual and formula elements, which can be
done, e.g., using the “AnnoMathTeX”26 system [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] hosted by Wikimedia. In the
future, we should be able to automatically link text and formula entities to Wikidata items
and Wikipedia articles. It remains a challenging problem for mathematical formula
entity linking to exhaustively and unambiguously identify the important semantic parts of
a formula. In the future, annotation guidelines should be developed to tackle this
problem.
For Task 2, we used the MOI search engine to retrieve relevant mathematical
expressions from the dataset. Since the MOI engine does not handle entire mathematical
expressions by itself but disassembles formulae into their subexpressions, the concept of
linking retrieved MOIs back to a formula ID was challenging. Furthermore, the
approach we used to calculate the formula ID of an MOI has some drawbacks. First, the
MOI engine retrieves relevant documents from Elasticsearch with a textual search
query. In the second step, the MOIs are scored based on the retrieved documents. Thus,
the retrieved MOIs (and the corresponding formula IDs) are only as good as the
documents retrieved in the first step. When the retrieved documents are not relevant, none of
the retrieved MOIs can be relevant. Hence, the search results are quite sensitive to the
settings that were used to retrieve relevant documents. Nonetheless, the approach
performed reasonably well compared to the results of other competitors, with an nDCG′p
score of 0.374.
          </p>
        </sec>
        <sec id="sec-16-1-2">
          <title>Outlook and Future Work</title>
          <p>
            We are excited to employ our approaches and the approaches of other task participants
to retrieve relevant formulae on zbMATH datasets. However, as discussed before, we
are uncertain whether the computed performance numbers are a suitable indicator to
predict the usefulness of the approaches to zbMATH users. We will, therefore, consider
suggesting a mathematical literature retrieval task in the future. However, as a prerequisite,
we see the need to research math-specific deterministic evaluation metrics that eliminate
task-specific human annotators in the loop. In contrast, we believe that objectively
verifiable or almost provable semantic enhancement techniques can significantly benefit
from a human review. While relevance (to an information need) is not yet a
well-established term among working mathematicians, definitions, equivalences, examples,
substitutions, theorems, and proofs are well established. While formal mathematics is not
(yet) able to automatically map mathematical named entities to formal concepts,
working mathematicians are generally able to create such a mapping with a very high
inter-reviewer agreement. Therefore, we aim to explore how employing our
“AnnoMathTeX” formula annotation recommender system [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] on MSE questions and answers
can promote answer retrieval.
26 annomathtex.wmflabs.org
To summarize the marginal results from our contribution, the kNN method can be
employed as a fast search engine, provided formulae are indexed as vector encodings. The
fuzzy string search is slower but has the advantage that no index is needed. As for MOI,
the retrieved results are less strictly tied to existing expressions since it considers all
subexpressions in an entire dataset. This helps to extract meaningful expressions rather
than exact matches.
          </p>
        </sec>
        <sec id="sec-16-1-3">
          <title>Acknowledgments</title>
          <p>This work was supported by the German Research Foundation (DFG grant GI-1259-1).
Joint Conference on Digital Libraries, Fort Worth Texas USA, May 2018, pp.
233–242, doi: 10.1145/3197026.3197058.
[23] Q. Le and T. Mikolov, “Distributed Representations of Sentences and
Documents,” Proceedings of the ICML Conference 2014, p. 9.
[24] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of</p>
          <p>Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[25] R. Řehůřek and P. Sojka, Software Framework for Topic Modelling with Large</p>
          <p>Corpora. University of Malta, 2010.
[26] A. Greiner-Petter et al., “Discovering Mathematical Objects of Interest—A Study
of Mathematical Notations,” in Proceedings of The Web Conference 2020, Taipei
Taiwan, Apr. 2020, pp. 1445–1456, doi: 10.1145/3366423.3380218.
[27] M. Schubotz and G. Wicke, “Mathoid: Robust, Scalable, Fast and Accessible Math
Rendering for Wikipedia,” in Intelligent Computer Mathematics - International
Conference, CICM 2014, Coimbra, Portugal, July 7-11, 2014. Proceedings, 2014,
vol. 8543, pp. 224–235, doi: 10/ggv8pz.
[28] S. Robertson and H. Zaragoza, “The Probabilistic Relevance Framework: BM25
and Beyond,” Found. Trends Inf. Retr., vol. 3, no. 4, pp. 333–389, Apr. 2009, doi:
10.1561/1500000019.
[29] R. Zanibbi, D. W. Oard, A. Agarwal, and B. Mansouri, “Overview of ARQMath
2020: CLEF Lab on Answer Retrieval for Questions on Math,” p. 25.
[30] K. Järvelin and J. Kekäläinen, “Cumulated gain-based evaluation of IR
techniques,” ACM Trans. Inf. Syst., vol. 20, no. 4, pp. 422–446, Oct. 2002, doi:
10.1145/582415.582418.
[31] C. Burges et al., “Learning to rank using gradient descent,” in Proceedings of the
22nd international conference on Machine learning, Bonn, Germany, Aug. 2005,
pp. 89–96, doi: 10.1145/1102351.1102363.
[32] P. Scharpf, M. Schubotz, A. Youssef, F. Hamborg, N. Meuschke, and B. Gipp,
“Classification and Clustering of arXiv Documents, Sections, and Abstracts,
Comparing Encodings of Natural and Mathematical Language,” Proceedings of the
JCDL Conference 2020, May 2020, doi: 10.1145/3383583.3398529.
[33] R. Zanibbi, A. Aizawa, and M. Kohlhase, “NTCIR-12 MathIR Task Overview,”
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access
Technologies 2016, p. 10.
[34] A. Lerer et al., “PyTorch-BigGraph: A Large-scale Graph Embedding System,”
Proceedings of the MLSys Conference 2019, Apr. 2019, Accessed: Jul. 16, 2020.
[Online]. Available: http://arxiv.org/abs/1903.12287.</p>
        </sec>
        <sec id="sec-16-1-4">
          <title>Appendix</title>
          <p>Table 10 shows the results for our DCG′1 and IDCG′1 scores for all queries of Task 1
for which we retrieved answers in our manual run and which were ranked by the ARQMath
reviewers. The final average nDCG′1 score is 0.504. The metrics rel_1 and REL_1 refer to
the formulae in Section 6.</p>
          <p>[Table 10: topic IDs A.12–A.93 with their post IDs, relevance and best-relevance
scores, and the per-query nDCG′1 values.]</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Müller</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Teschke</surname>
          </string-name>
          , “
          <article-title>Full text formula search in zbMATH,”</article-title>
          <source>Eur. Math. Soc. Newsl</source>
          , vol.
          <volume>102</volume>
          , p.
          <fpage>51</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          , “Finding Old Answers to New Math Questions: The
          <source>ARQMath Lab at CLEF</source>
          <year>2020</year>
          ,” in
          <source>Advances in Information Retrieval</source>
          , vol.
          <volume>12036</volume>
          ,
          <string-name>
            <surname>J. M. Jose</surname>
            , E. Yilmaz,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Magalhães</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Castells</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          <string-name>
            <surname>Silva</surname>
            , and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Martins</surname>
          </string-name>
          , Eds. Cham: Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>571</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          , “
          <source>Characterizing Searches for Mathematical Concepts</source>
          ,
          <source>” in 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , Champaign, IL, USA, Jun.
          <year>2019</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          , doi: 10.1109/JCDL.
          <year>2019</year>
          .
          <volume>00019</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Karbasian</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Johri</surname>
          </string-name>
          , “
          <article-title>Insights for Curriculum Development: Identifying Emerging Data Science Topics through Analysis of Q&amp;A Communities,”</article-title>
          <source>in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, Portland OR USA</source>
          ,
          <year>Feb</year>
          .
          <year>2020</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>198</lpage>
          , doi: 10.1145/3328778.3366817.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Smith</surname>
          </string-name>
          , “
          <article-title>A Question-Answering System for Elementary Mathematics</article-title>
          ,” Apr.
          <year>1974</year>
          , Accessed: Jun.
          <volume>22</volume>
          ,
          <year>2020</year>
          . [Online]. Available: https://eric.ed.gov/?id=
          <fpage>ED093703</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hui</surname>
          </string-name>
          , “
          <article-title>A math-aware search engine for math question answering system</article-title>
          ,”
          <source>in Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12</source>
          ,
          <string-name>
            <surname>Maui</surname>
          </string-name>
          , Hawaii, USA,
          <year>2012</year>
          , p.
          <volume>724</volume>
          ,
          <issue>doi</issue>
          : 10.1145/2396761.2396854.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          , “
          <article-title>A Survey of Question Answering for Math and</article-title>
          Science Problem,” Computing Research Repository (CoRR),
          <source>May</source>
          <year>2017</year>
          , Accessed: Jun.
          <volume>08</volume>
          ,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/1705.04530.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. A. S.</given-names>
            <surname>Gunawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Mulyono</surname>
          </string-name>
          , and W. Budiharto, “
          <article-title>Indonesian Question Answering System for Solving Arithmetic Word Problems on Intelligent Humanoid Robot,” Procedia Computer Science</article-title>
          , vol.
          <volume>135</volume>
          , pp.
          <fpage>719</fpage>
          -
          <lpage>726</lpage>
          ,
          <year>2018</year>
          , doi: 10.1016/j.procs.
          <year>2018</year>
          .
          <volume>08</volume>
          .213.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dudhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hamborg</surname>
          </string-name>
          , and B. Gipp, “Introducing MathQA --
          <source>A Math-Aware Question Answering System,” Information Discovery and Delivery</source>
          , vol.
          <volume>46</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>224</lpage>
          , Nov.
          <year>2018</year>
          , doi: 10.1108/IDD-06
          <string-name>
            <surname>-</surname>
          </string-name>
          2018-0022.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hopkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Le</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Petrescu-Prahova</surname>
          </string-name>
          , G. Stanovsky,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Koncel-Kedziorski</surname>
          </string-name>
          , “
          <fpage>SemEval</fpage>
          -2019
          <source>Task 10: Math Question Answering,” in Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , Minneapolis, Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>893</fpage>
          -
          <lpage>899</lpage>
          , doi: 10.18653/v1/S19-2153.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Pineau</surname>
          </string-name>
          , “
          <article-title>Math-Aware Search Engines: Physics Applications and Overview</article-title>
          ,”
          <source>Computing Research Repository (CoRR)</source>
          , Sep.
          <year>2016</year>
          , Accessed: Jun. 21,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/1609.03457.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Idris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , “
          <article-title>QAPD: an ontology-based question answering system in the physics domain</article-title>
          ,”
          <source>Soft Comput</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>230</lpage>
          , Jan.
          <year>2018</year>
          , doi: 10.1007/s00500-016-2328-2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Fujii</surname>
          </string-name>
          , “
          <article-title>Mathematical Document Categorization with Structure of Mathematical Expressions</article-title>
          ,” in
          <source>2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , Toronto, ON, Canada, Jun.
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , doi: 10.1109/JCDL.2017.7991566.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kühnemund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels</article-title>
          ,”
          <source>Proceedings of the CICM Conference 2020</source>
          , May
          <year>2020</year>
          , Accessed: Jun. 21,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/2005.12099.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ko</surname>
          </string-name>
          , “
          <article-title>Mathematical Formula Search using Natural Language Queries</article-title>
          ,”
          <source>AECE</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>104</lpage>
          ,
          <year>2014</year>
          , doi: 10.4316/AECE.2014.04015.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dmello</surname>
          </string-name>
          , “
          <article-title>Representing Mathematical Concepts Associated With Formulas Using Math Entity Cards</article-title>
          ,”
          <source>Rochester Institute of Technology (RIT) Scholar Works</source>
          , p.
          <fpage>167</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Cohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Towards Formula Concept Discovery and Recognition,”</article-title>
          <source>Proceedings of the 4th BIRNDL Workshop at the 42nd ACM SIGIR Conference</source>
          <year>2019</year>
          , p.
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Dumitru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ginev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Merticariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wiesing</surname>
          </string-name>
          , “
          <article-title>System Description: KAT an Annotation Tool for STEM Documents,”</article-title>
          <source>Proceedings of the CICM Conference</source>
          <year>2016</year>
          , p.
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mackerracher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>AnnoMathTeX - a formula identifier annotation recommender system for STEM documents</article-title>
          ,”
          <source>in Proceedings of the 13th ACM Conference on Recommender Systems</source>
          , Copenhagen Denmark, Sep.
          <year>2019</year>
          , pp.
          <fpage>532</fpage>
          -
          <lpage>533</lpage>
          , doi: 10.1145/3298689.3347042.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Representing Mathematical Formulae in Content MathML using Wikidata,”</article-title>
          <source>Proceedings of the 3rd BIRNDL Workshop at the 41st ACM SIGIR Conference</source>
          <year>2018</year>
          , p.
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          , “
          <article-title>Generating OpenMath Content Dictionaries from Wikidata,”</article-title>
          <source>Proceedings of the CICM Conference</source>
          <year>2018</year>
          , p.
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meuschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Cohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context,”</article-title>
          <source>in Proceedings of the 18th ACM/IEEE on</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>