<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ARQMath Lab: An Incubator for Semantic Formula Search in zbMATH Open?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Philipp Scharpf</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Schubotz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Greiner-Petter</string-name>
          <email>andre.greiner-petter@zbmath.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Malte Ostendorff</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Olaf Teschke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Bela Gipp</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Konstanz</institution>
          ,
          <addr-line>Konstanz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Wuppertal</institution>
          ,
          <addr-line>Wuppertal</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>22</volume>
      <fpage>22</fpage>
      <lpage>25</lpage>
      <abstract>
<p>The zbMATH database contains more than 4 million bibliographic entries. We aim to provide easy access to these entries. Therefore, we maintain different index structures, including a formula index. To optimize the findability of the entries in our database, we continuously investigate new approaches to satisfy the information needs of our users. We believe that the findings from the ARQMath evaluation will generate new insights into which index structures are most suitable to satisfy mathematical information needs. Search engines, recommender systems, plagiarism checking software, and many other added-value services acting on databases such as the arXiv and zbMATH need to combine natural and formula language. One initial approach to address this challenge is to enrich the mostly unstructured document data via Entity Linking. The ARQMath Task at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange (MSE) to existing ones that were already answered by the community. To deeply understand MSE information needs, answer types, and formula types, we performed manual runs for Tasks 1 and 2. Furthermore, for Task 2 we explored several formula retrieval methods: fuzzy string search, k-nearest neighbors, and our recently introduced approach to retrieve Mathematical Objects of Interest (MOI) with textual search queries. The task results show that neither our automated methods nor our manual runs achieved good scores in the competition. However, the perceived quality of the hits returned by the MOI search particularly motivates us to conduct further research on MOI.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Mathematical Information Retrieval</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Semantic Search</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Mathematical Objects of Interest</kwd>
        <kwd>ARQMath Lab</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
In 2013, the first prototype of formula search in zbMATH was announced [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which has since become an integral part of the zbMATH interface. At the beginning of 2021, zbMATH will transform its business model from a subscription-based service to a publicly funded open service. In this context, we evaluate novel approaches to include mathematical formulae as first-class citizens in our mathematical information retrieval infrastructure. Besides the standard search that targets abstract, review, and publication metadata, zbMATH also traces incoming links from the Question Answering platform MathOverflow and provides backlinks from scientific articles to MathOverflow posts mentioning the publication [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We hypothesize that federating information from zbMATH and MathOverflow will enhance the zbMATH search experience significantly. The ARQMath Lab at CLEF 2020 aims to tackle the problem of linking newly posted questions from Math Stack Exchange to existing ones that were already answered by the community [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Using question postings from a test collection (extracted by the ARQMath organizers from an MSE Internet Archive snapshot1 up to 2018) as queries, the goal is to retrieve relevant answer posts containing both text and at least one formula. The test collection created for the task is intended to serve researchers as a benchmark for mathematical retrieval tasks that involve both natural and mathematical language. The ARQMath Lab consists of two separate subtasks. Task 1 – Answers poses the challenge to retrieve relevant community answer posts given a question from Math Stack Exchange (MSE). Task 2 – Formulas poses the challenge to retrieve relevant formulae from question and answer posts. Specifically, the aim of Task 1 is to find old answers to new questions in order to speed up the community answering process. The aim of Task 2 is to find a ranked list of relevant formulae in old questions and answers that match a query formula from the new question. This task design is a good fit for our research interest, since the information needs are related. Moreover, MathOverflow and math.stackexchange use the same data format, which enables us to reuse software developed during this competition and to transform it into production software later on. On the other hand, the mathematical level of questions on Math Stack Exchange is less sophisticated, and thus not all relevance rankings might be suitable for our use case. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution.
      </p>
    </sec>
    <sec id="sec-2">
      <title>ARQMath Lab</title>
      <p>
The ARQMath lab was motivated by the fact that Mansouri et al. discovered “that 20% of the mathematical queries in general-purpose search engines were expressed as well-formed questions” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
]. Furthermore, with the increasing public interest in Community Question Answering sites such as MSE2 and MathOverflow3, it will be beneficial to develop computational methods to support human answerers. In particular, the “time-to-answer” should be shortened by linking to related answers already provided on the platform, which can potentially lead to the answer more quickly. This will be of great help, since the question is usually urgent and related – sometimes even exactly matching – existing answers are available. However, the task is challenging because both
1 https://archive.org/download/stackexchange
2 https://math.stackexchange.com
3 https://mathoverflow.net
questions and answers can be a combination of natural and mathematical language,
involving words and formulae. The ARQMath lab at CLEF 2020 will be the first in a three-year sequence through which the organizers “aim to push the state of the art in evaluation design for math-aware IR” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The task starts with the domain of mathematics
involving formula language. The goal is to later extend the task to other domains (e.g.,
chemistry or biology), which employ other types of special notation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Math Stack Exchange</title>
      <p>
        Stack Exchange is an online platform with a host of Q&amp;A forums [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The Stack
Exchange network consists of 177 Q&amp;A communities including Stack Overflow, which
claims to be “the largest, most trusted online community for developers to learn and
share their knowledge”2. The different topic sites include Q&amp;A on computer issues,
math, physics, photography, etc. Users can rank questions and answers by voting them
up or down according to their quality assessment. Stack Exchange makes its content publicly available in XML format under a Creative Commons license [
        <xref ref-type="bibr" rid="ref4">4</xref>
]. The Math Stack Exchange collection for the ARQMath lab tasks comprises Q&amp;A postings extracted from data dumps from the Internet Archive4. Currently, over 1 million questions are included [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Related Work</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Mathematical Question Answering</title>
      <p>
        Already in 1974, Smith [
        <xref ref-type="bibr" rid="ref5">5</xref>
] describes a project investigating the understanding of natural language by computers. He develops a theoretical model of natural language processing (NLP) and implements his theory algorithmically. Specifically, he chooses the domain of elementary mathematics to construct a Q&amp;A system for unrestricted natural language input. However, for a long time afterwards, there was little interest and progress in the field of mathematical question answering. In 2012, Nguyen et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
] present a math-aware search engine for a math question answering system. Their system handles both textual keywords and mathematical expressions. The math feature extraction is designed to encode the semantics of math expressions via a Finite State Machine model. They tested their approach against three classical information retrieval strategies on math documents crawled from Math Overflow, claiming to outperform them by more than 9%. In 2017, Bhattacharya et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
] publish a survey of question answering for math and science problems. They explore the current achievements towards the goal of making computers smart enough to pass math and science tests. They conclude by claiming that “the smartest AI could not pass high school”. In 2018, Gunawan et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
present an Indonesian question answering system for solving arithmetic word problems
using pattern matching. Their approach is integrated into a physical humanoid robot.
For auditive communication with the robot, the user’s Indonesian question must be
      </p>
      <sec id="sec-4-1">
        <title>4 https://archive.org</title>
        <p>
translated into English text. They employ NLP using the NLTK toolkit5, specifically co-referencing, question parsing, and preprocessing. They conclude by claiming that the Q&amp;A system achieves an accuracy between 80% and 100%. However, they state that the response time is rather slow, averaging more than one minute. Also in 2018, Schubotz et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
] present MathQA6, an open-source math-aware question answering system based on Ask Platypus7. The system returns a single mathematical formula for a natural language question posed in English or Hindi. The formulae are fetched from the open knowledge-base Wikidata8. With numeric values for constants loaded from Wikidata, the user can perform computations using the retrieved formula. It is claimed that the system outperforms a popular computational mathematical knowledge-engine by 13%. In 2019, Hopkins et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
] report on the SemEval 2019 task on math question answering. They derived a question set from Math SAT practice exams, including 2778 training questions and 1082 test questions. According to their study, the top system correctly answered 45% of the test questions, with a random-guessing baseline at 17%. Beyond the domain of math Q&amp;A, Pineau [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and Abdi et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] present first approaches to
answer questions on physics.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Mathematical Document Subject Class Classification</title>
      <p>
        For open-domain question redirection, it is beneficial to classify a given mathematical
question by its domain, e.g. geometry, calculus, set theory, physics, etc. There have
been several approaches to perform categorization or subject class classification for
mathematical documents. In 2017, Suzuki and Fujii [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] test classification methods on
collections built from MathOverflow9 and the arXiv10 paper preprint repository. The user tags include both keywords for math concepts and categories from the Mathematical Subject Classification (MSC) 201011 top- and second-level subjects. In 2020,
Scharpf et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
] investigate how combining encodings of natural and mathematical language affects the classification and clustering of documents with mathematical content. They employ sets of documents, sections, and abstracts from the arXiv10, labeled by their subject class (mathematics, computer science, physics, etc.) to compare different encodings of text and formulae and evaluate the performance and runtimes of selected classification and clustering algorithms. Also in 2020, Schubotz et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
] explore whether it is feasible to automatically assign a coarse-grained primary classification using the MSC scheme with multi-class classification algorithms. They claim to achieve a precision of 81% for the automatic article classification. We conclude that for math Q&amp;A systems, the classification needs to be performed at the sentence level. If MSE questions contain several sentences, the problem could potentially also be framed as an abstract classification problem.
5 https://www.nltk.org
6 http://mathqa.wmflabs.org
7 https://askplatyp.us
8 https://www.wikidata.org
9 https://mathoverflow.net
10 https://arxiv.org
11 http://msc2010.org
      </p>
    </sec>
    <sec id="sec-6">
      <title>Connecting Natural and Mathematical Language</title>
      <p>
For mathematical question answering, mathematical information needs to be connected to natural language queries. Yang &amp; Ko [
        <xref ref-type="bibr" rid="ref15">15</xref>
] present a search engine for formulae in MathML12 using a plain word query. Mansouri et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
] investigate how queries for mathematical concepts are performed in search engines. They conclude “that math search sessions are typically longer and less successful than general search sessions”. For non-mathematical queries, search engines like Google13 or DuckDuckGo14 already provide entity cards with a short encyclopedic description of the searched concept [
        <xref ref-type="bibr" rid="ref16">16</xref>
]. For mathematical concepts, however, there is an urgent need to connect a natural language query to a formula representing the keyword. Dmello [
        <xref ref-type="bibr" rid="ref16">16</xref>
] proposes integrating entity cards into the math-aware search interface MathSeer15. Scharpf et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
] propose a Formula Concept Retrieval challenge for Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR) tasks. They present first machine-learning-based approaches for retrieving formula concepts from the NTCIR 11/12 arXiv dataset16.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Semantic Annotations</title>
      <p>
To connect mathematical formulae and symbols to natural language keywords, semantic annotations are an effective means. So far, only a few annotation systems are available for mathematical documents. Dumitru et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
] present a browser-based annotation tool (the “KAT system”) for linguistic/semantic annotations in structured (XHTML5) documents. Scharpf et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
] present “AnnoMathTeX”, a recommender system for formula and identifier annotation of Wikipedia articles using Wikidata17 QID item tags. The annotations can be integrated into the MathML markup using MathML Wikidata Content Dictionaries18 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref22">22</xref>
].
      </p>
      <sec id="sec-7-1">
        <title>Summary of Our Approach</title>
<p>We tackle the ARQMath lab tasks (Task 1 – answer retrieval, Task 2 – formula retrieval) using manual run selection benchmarking. To this end, we create, populate, and employ a Wiki19 with pages for normal (Task 1) and formula (Task 2) topics. The main objective of our experiments was to explore methods that enable automatic answer assignment recommendations for question postings on Mathematics Stack Exchange (MSE). We tested the following approaches or methods: 1) manual run annotation using
12 https://www.w3.org/TR/MathML3
13 https://www.google.com
14 https://duckduckgo.com
15 https://www.cs.rit.edu/~dprl/mathseer
16 http://ntcir-math.nii.ac.jp
17 https://www.wikidata.org
18 https://www.openmath.org
19 https://arq20.formulasearchengine.com
Google and MSE search, 2) formula TF-IDF or Doc2vec20 encodings [23] using the Python libraries Scikit-learn21 [24] and Gensim22 [25], 3) fuzzy string comparison or matching using rapidfuzz23, 4) the k-nearest neighbors algorithm, and 5) discovery of Mathematical Objects of Interest (MOI) with textual search queries [26].
As a result, we obtained the IDs of relevant MSE answers for each query in the sample of Task 1, and a ranked list of the most relevant formulae for each query in the sample of Task 2 (if available). Finally, we analyzed our results using a manual consistency and quality check.</p>
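As a minimal illustration of approach 3), fuzzy string matching ranks pool formulae by string similarity to a query formula. The sketch below uses Python's standard-library difflib as a stand-in for rapidfuzz (which computes comparable ratios much faster); the formulae and IDs are invented examples, not data from the task.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # 0-100 similarity between two LaTeX strings (stand-in for
    # rapidfuzz's ratio; rapidfuzz would be the faster choice).
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def rank_candidates(query, pool, top_k=3):
    # Rank pool formulae (formula_id -> LaTeX string) by similarity.
    scored = [(fid, similarity(query, latex)) for fid, latex in pool.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Invented example pool of formula LaTeX strings.
pool = {
    "f1": r"\int_0^1 x^2 \, dx",
    "f2": r"\sum_{n=1}^\infty \frac{1}{n^2}",
    "f3": r"\int_0^1 x^3 \, dx",
}
ranking = rank_candidates(r"\int_0^1 x^2 dx", pool)
```

The same ranking skeleton applies when the similarity function is swapped for a vector distance, as in the kNN approach.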
      </sec>
      <sec id="sec-7-2">
        <title>Workflow of Our Approach</title>
<p>The workflow of our approach is illustrated in Fig. 1. It can be logically divided into three stages: 1) the creation of a Wiki with pages for normal and formula topics, 2) methods to tackle Task 1, and 3) methods to tackle Task 2.</p>
        <p>Fig. 1 outlines the three stages:</p>
        <p>Wiki: retrieval of URLs using Google and MSE search; creation of the Wiki at arq20.formulasearchengine.com; creation of Wiki pages for normal and formula topics.</p>
        <p>Task 1: insertion of links to math.stackexchange.com/questions/xxx on the Wiki page; manual run selection of the most suitable answer; insertion of links to https://math.stackexchange.com/a/xxx as “relevant answers” property on the Wikidata item for normal topics.</p>
        <p>Task 2: manual run selection of the most suitable formula(e); LaTeX string as “defining formula” property, a subproperty of “relevant answers”, on the Wikidata item for formula topics.</p>
        <p>
The initial preparation step for our approach to tackle Tasks 1 and 2 was to create, populate, and employ a MediaWiki environment connected to a Mathoid [27] rendering
20 Also known as “Paragraph Vectors”, as introduced in [23].
21 https://scikit-learn.org
22 https://radimrehurek.com/gensim
23 https://github.com/maxbachmann/rapidfuzz
service with pages for normal and formula topics. For each query, there is a Wikibase item with the following properties: ‘math-stackexchange-category’ (P10), ‘topic-id’ (P12), ‘post-type’ (P9), ‘math stackexchange post id’ (P5), and ‘relevant answers’ (P14). Having set up the Wiki, we manually retrieved the question URLs using Google and MSE search and inserted them as values for the ‘math stackexchange post id’ on the respective question pages. Unfortunately, by doing so, some new post-2019 post IDs were entered because we did not check the date carefully enough. The ‘math-stackexchange-category’ values were automatically retrieved from the question tags. The ‘topic-id’ (e.g., A.50) was transferred from the task dataset, and the ‘post-type’ was set to “Question”. Unfortunately, as we discovered later, the use of Google and MSE search led to results outside the task dataset. This means that the answer that was accepted as the best answer by the questioner was often not included in the task dataset. However, our aim was to establish the “correct” answer as a semantic reference in our MediaWiki.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Populate Topic Answers (Task 1)</title>
<p>The first part of our experimental pipeline was a manual run selection of the most suitable answer from the MSE question posting page (preferably the one selected by the questioner, if available). Subsequently, we inserted links to the answers, i.e., math.stackexchange.com/a/xxx, into the ‘relevant answers’ property of the query item on the normal topics page.</p>
    </sec>
    <sec id="sec-9">
      <title>Populate Formula Answers (Task 2)</title>
<p>The second part of our experimental pipeline was a manual run selection of the most suitable formula per question or answer. The chosen formula was considered to answer the given question as concisely as possible. Thus, we interpreted Task 2 as having to find formula answers to the question, and not only similar formulae. We inserted the extracted LaTeX string into the ‘defining formula’ property, as a subproperty of ‘relevant answers’, on the Wikidata item for formula topics.</p>
    </sec>
    <sec id="sec-10">
      <title>Preparing Data for Experiments and Submission</title>
<p>After having populated our Wiki database, we used a SPARQL query (Fig. 2) to get an overview of its content. The query fetches all Wikidata question items, displaying their ‘topic-id’ (e.g., A.1 or B.1), ‘post-id’ (e.g., 3063081), and the formula LaTeX string. With the list of normal and formula topic insertions, we performed a quality check, correcting wrong or missing values.
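The actual query from Fig. 2 is not reproduced here; the following is a hypothetical reconstruction based on the property IDs listed above (P12 for ‘topic-id’, P5 for the MSE post ID, P14 for ‘relevant answers’), sketched as a Python string. The prefixes and property paths of the project Wikibase are assumptions.

```python
# Hypothetical reconstruction of the overview query (Fig. 2); the exact
# prefixes and property paths of the project Wikibase are assumptions.
OVERVIEW_QUERY = """
SELECT ?topic ?topicId ?postId ?relevantAnswer WHERE {
  ?topic wdt:P12 ?topicId .                     # 'topic-id', e.g. A.1 or B.1
  ?topic wdt:P5  ?postId .                      # 'math stackexchange post id'
  OPTIONAL { ?topic wdt:P14 ?relevantAnswer . } # 'relevant answers'
}
"""
```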
The previously developed MOI search engine [26] allows us to search for meaningful mathematical expressions with a given textual search query. This workflow can be used to solve Task 2, but it requires some substantial updates. Essentially, Task 2 requests relevant formula IDs for a given input formula ID. Each formula ID is mapped to the corresponding post ID. Hence, we can take the entire post of a formula ID as the input for our MOI search engine. However, there are two main problems with the existing approach: (i) the MOI search engine was developed and tested only to search for keywords; thus, entering entire posts at once may harm the accuracy, and (ii) every retrieved MOI is by design a subexpression and thus probably has no designated formula ID. To overcome these issues, we need to understand the current system. The MOI search system retrieves MOIs in two steps. The first step retrieves relevant documents from an elasticsearch24 instance for the input query. Hence, we first indexed all ARQMath posts in elasticsearch. To index the content of each post appropriately, we set up the standard English stemmer, stopword filtering, HTML stripping (filters out HTML tags but preserves the content of each tag), and enabled ASCII folding (converts alphabetic, numeric, and symbolic characters to their ASCII equivalents, e.g., ‘á’ is replaced by ‘a’). For the search query, we used the standard match query system but boosted every mathematical expression in the input. This tells elasticsearch to focus more on the math expressions in a search query than on the actual text. With this setup, we overcome the mentioned issue (i) and can search for relevant posts by entering the entire content of a post. In the second step of the MOI search engine, the engine
24 https://www.elastic.co
disassembles all formulae in the retrieved documents and calculates the mBM25 score [26] for each of these subexpressions (MOI):</p>
      <p>s(t, d) := ((k + 1) IDF(t) ITF(t, d) TF(t, d)) / (max_{t′ ∈ d} TF(t′, d) + k (1 − b + b (|d| / AVGDL)(c(t) / AVGC)))</p>
      <p>mBM25(t, D) := max_{d ∈ D} s(t, d),
where mBM25(t, D) is a modified version of the BM25 relevance score [28], with D the entire ARQMath corpus, IDF(t) the inverse document frequency of the term t, TF(t, d) the term frequency of the term t in the document d ∈ D, ITF(t, d) the inverse term frequency (calculated the same way as IDF(t) but on the document level for the document d), AVGDL the average document length of D, and AVGC the average complexity of D (see [26] for a more detailed description). The top-scored expressions are returned. The mBM25 score requires the global term and document frequencies of every subexpression. Hence, we first calculated these global values for every subexpression of every formula in the ARQMath dataset. Table 1 shows the statistics of this MOI database in comparison to the previously generated databases for arXiv and zbMATH. A document in ARQMath is a post from MSE. The dataset only includes MathML representations. The complexity of a formula is the maximum depth of the Presentation MathML representation of the formula. As Table 1 shows, the ARQMath database can be interpreted as a hybrid between the full research papers in arXiv and the relatively short review discussions in zbMATH (mainly containing reviews of mathematical articles).
Machine: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (4 cores / 8 threads); RAM: 32 GB 2133 MHz; Disk: 1 TB SSD; required disk space: 7.8 GB (posts) + 3 GB (MOIs) = 10.8 GB</p>
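The indexing and boosted-query setup described earlier can be sketched as follows. The analyzer composition and field names are illustrative assumptions, not the authors' actual elasticsearch mapping; only the general idea (English stemming, stopword filtering, HTML stripping, ASCII folding, and boosting math tokens in a match query) comes from the text.

```python
# Sketch of index settings: English stemming, stopword filtering, HTML
# stripping, ASCII folding (analyzer/field names are assumptions).
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "post_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem", "asciifolding"],
                }
            }
        }
    }
}

def build_boosted_query(text_tokens, math_tokens, boost=2.0):
    # Match query that weights mathematical expressions higher than text.
    should = [{"match": {"content": {"query": t}}} for t in text_tokens]
    should += [{"match": {"content": {"query": m, "boost": boost}}}
               for m in math_tokens]
    return {"query": {"bool": {"should": should}}}

query = build_boosted_query(["convergence", "series"], ["\\sum 1/n^2"])
```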
<p>Runtime: 6.0 s / query (average over all queries).
Considering that every formula in the ARQMath dataset has its own ID and the system needs to preserve the ID during computation, we would need to attach the ID to every generated MOI. However, this would result in a massive overload. For example, a single common identifier appears 7.6 million times in ARQMath and would thus have millions of different formula IDs. The entire ARQMath dataset has 16.8 million unique MOIs. Handling this number of different IDs is impractical. Hence, we chose a different approach to obtain the formula IDs for every MOI. Since the search engine retrieves the relevant documents first, we only need to consider formula IDs that exist in these retrieved documents. To achieve this, we attached the formula IDs to every post in the elasticsearch database rather than to the MOIs themselves. A single document in elasticsearch now contains the post ID, the textual content, and a list of MOIs with local term frequencies (how often the MOI appears in the corresponding post) and formula IDs. Note that most MOIs still have multiple formula IDs, since a subexpression may appear multiple times in a single post, but the number of different IDs is reduced drastically. Since the IDs are now attached to each post but are not used in the search query, the performance of retrieving relevant documents from elasticsearch stays the same. With this approach, we may calculate multiple different mBM25 scores for a single formula ID, since a single unique formula ID can be attached to multiple MOIs. To calculate the final score for a formula ID, we computed the average of all mBM25 scores for that formula ID. For example, suppose we retrieve the document with the ID 2759760. This post contains the formula with ID 25466124, a fraction with denominator 6, which would be disassembled into its subexpressions: the numerator, the 6, and the fraction itself. Hence, we would calculate three mBM25 scores for the fraction. The average of these scores would be the score for the formula ID.</p>
<p>We used this updated MOI search engine to retrieve results for Task 2. Note that the approach might be a bit unorthodox, since the MOI search engine takes the entire post of the given formula ID rather than the formula ID alone. We interpreted Task 2 as retrieving answer formulae for a given question formula, rather than retrieving visually or semantically similar formulae. Based on this interpretation, it makes sense to use the entire post of a formula ID to search for relevant answers. In other words, we interpreted Task 2 as an extension and math-specific version of Task 1. In summary, the key steps of the MOI search engine to solve Task 2 were the following:</p>
      <p>1. Take the entire post of the given formula ID.</p>
      <p>2. Search for posts similar to the retrieved post in step 1.</p>
      <p>3. Extract all MOIs from all retrieved posts in step 2.</p>
      <p>4. Calculate mBM25 scores for all MOIs of step 3.</p>
      <p>5. Group the MOIs by their associated formula IDs (every formula ID now has multiple mBM25 scores).</p>
      <p>6. Average the mBM25 scores for each formula ID.</p>
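The six-step procedure above can be sketched end-to-end as follows. Retrieval and scoring are stubbed with toy in-memory functions (the real system uses elasticsearch and the mBM25 score), and all post/formula IDs are invented.

```python
from collections import defaultdict

# Toy corpus: post_id -> text and contained formula IDs (all invented).
POSTS = {
    "q1": {"text": "integrate x^2 over the unit interval", "formula_ids": {"f1"}},
    "a1": {"text": "integrate x^2 to get x^3/3", "formula_ids": {"f2", "f3"}},
}
FORMULA_TO_POST = {"f1": "q1", "f2": "a1", "f3": "a1"}

def retrieve_similar_posts(post_text):
    # Step 2 (stub): the real system queries elasticsearch with the full post.
    first_word = post_text.split()[0]
    return [pid for pid, post in POSTS.items() if first_word in post["text"]]

def extract_mois(post_id):
    # Step 3 (stub): the real system disassembles each formula into MOIs.
    return [(fid, f"moi-of-{fid}") for fid in sorted(POSTS[post_id]["formula_ids"])]

def mbm25(moi):
    # Step 4 (stub): the real system computes the mBM25 score of the MOI.
    return float(len(moi))

def rank_formulae(query_formula_id):
    post = POSTS[FORMULA_TO_POST[query_formula_id]]      # step 1
    scores = defaultdict(list)                           # step 5 grouping
    for pid in retrieve_similar_posts(post["text"]):     # step 2
        for fid, moi in extract_mois(pid):               # step 3
            scores[fid].append(mbm25(moi))               # step 4
    return {fid: sum(s) / len(s) for fid, s in scores.items()}  # step 6

result = rank_formulae("f1")
```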
<p>For Task 2, we retrieved 107,476 MOIs. We used the provided annotation dataset to evaluate the retrieved results. For a better comparison, we calculated the nDCG′ (nDCG-prime) score, as the task organizers did [29]. Note that nDCG′ removes unjudged documents before calculating the score. Since these were post-experiment calculations, there is not much correlation between the retrieved MOI documents and the judged formula IDs. We found 179 formula IDs that were retrieved by our MOI engine and contained a judgment by the annotators of the ARQMath task. Based on these 179 judgments, we obtained an nDCG′ value of 0.374, which is in the midrange compared to the other competitors.</p>
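For reference, nDCG′ differs from plain nDCG only in that unjudged documents are dropped from the ranking before scoring. A minimal sketch, assuming graded relevance judgments and an illustrative cut-off parameter:

```python
import math

def ndcg_prime(ranking, judgments, p=10):
    # nDCG': drop unjudged documents from the ranking, then compute
    # nDCG@p against the judgments (dict: doc_id -> relevance grade).
    judged = [doc for doc in ranking if doc in judgments]
    gains = [judgments[doc] for doc in judged[:p]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(judgments.values(), reverse=True)[:p]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

With this definition, a run that ranks all judged documents in ideal order scores 1.0 even if unjudged documents are interleaved.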
    </sec>
    <sec id="sec-11">
      <title>Data Integration of Query and Pool Formulae</title>
<p>We tested two other approaches for Task 2: formula pool retrieval via k-nearest neighbors and fuzzy string matching. For both methods, we first needed to integrate the pool of formulae (the task dataset) with our query set, consisting of the formulae which we ‘manually’ chose from the candidate answers to be a formula answer to the question asked.</p>
      <p>[Figure: pipeline overview for Task 2, covering data integration of query and pool
formulae, kNN retrieval, and fuzzy string candidate retrieval. The depicted steps are: load
the TSV files for query and pool formulae; retrieve the formula symbols (identifiers,
operators) from the MathML tags ('ci', 'mi', 'co', 'mo') together with the formula LaTeX
string; integrate all formulae with IDs and save the dictionary to a Python Pickle file;
encode the formula LaTeX strings via TF-IDF and Doc2Vec; retrieve distances and
k-nearest formula candidates via the kNN algorithm; calculate pairwise fuzzy string
partial ratios (matching percentages); and rank the percentages for each formula to
identify the closest candidates.]</p>
      <p>
The properties are retrieved from the task dataset TSV files. For the identifier and
operator lists, the symbols are retrieved from the MathML string. For the query formulae,
the search tags are '&lt;mi&gt;' and '&lt;mo&gt;', and for the pool formulae, '&lt;ci&gt;' and '&lt;co&gt;' for
identifiers and operators, respectively. The formula LaTeX string is retrieved from the
'alttext' attribute of the '&lt;math&gt;' tag. Finally, the formula dictionary is serialized to a
pickle file. It is utilized in the following steps (formula encoding, kNN, and fuzzy string
similarity retrieval).</p>
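      <p>The extraction step can be sketched with Python's standard library; the tag handling follows the description above, while the file layout and dictionary structure are assumptions for illustration:</p>

```python
import pickle
import xml.etree.ElementTree as ET

def parse_formula(mathml, id_tag="mi", op_tag="mo"):
    """Extract identifiers, operators, and the LaTeX string ('alttext')
    from one MathML formula; pass id_tag='ci', op_tag='co' for pool formulae."""
    root = ET.fromstring(mathml)
    local = lambda tag: tag.split("}")[-1]  # drop an XML namespace prefix if present
    return {
        "identifiers": [el.text for el in root.iter() if local(el.tag) == id_tag],
        "operators": [el.text for el in root.iter() if local(el.tag) == op_tag],
        "latex": root.get("alttext"),
    }

mathml = '<math alttext="x+1"><mi>x</mi><mo>+</mo><mn>1</mn></math>'
formulae = {"q1": parse_formula(mathml)}  # formula ID -> properties
blob = pickle.dumps(formulae)             # serialize the dictionary
print(pickle.loads(blob)["q1"]["latex"])  # x+1
```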
    </sec>
    <sec id="sec-12">
      <title>Formula LaTeX String Encoding via TF-IDF and Doc2Vec</title>
      <p>Having retrieved the LaTeX formula from the MathML string, it is encoded by jointly
feeding its identifier and operator tokens (UTF-8) into the TfidfVectorizer from the
Python package Scikit-learn [24] and the Doc2Vec encoder from Gensim [25]. For the
TfidfVectorizer, an ngram range of (1,1) is used. The Doc2Vec distributed bag of words
(PV-DBOW) model is trained for 10 iterations.</p>
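      <p>To keep the example dependency-free, the following minimal TF-IDF over identifier/operator token lists illustrates the encoding idea (unigrams only, mirroring an ngram range of (1,1)); it is a sketch, not a reproduction of Scikit-learn's TfidfVectorizer or Gensim's Doc2Vec:</p>

```python
import math
from collections import Counter

def tfidf_encode(docs):
    """Minimal TF-IDF over pre-tokenized documents (unigrams only).
    Returns one sparse vector (dict: token -> weight) per document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({tok: (cnt / len(doc)) * math.log(n / df[tok])
                        for tok, cnt in tf.items()})
    return vectors

# identifier/operator tokens of two formulae, e.g. "x+1" and "x+y"
vecs = tfidf_encode([["x", "+", "1"], ["x", "+", "y"]])
print(vecs[0]["1"] > vecs[0]["x"])  # True: tokens unique to one formula weigh more
```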
    </sec>
    <sec id="sec-13">
      <title>Formula Pool Retrieval via K-Nearest-Neighbors</title>
      <p>The two resulting formula-encoding vector spaces are subsequently fed into a
NearestNeighbors algorithm from Scikit-learn. In Table 3, some illustrative examples of the
top 3 results are displayed. In all cases, the retrieved formulae are structurally similar,
sometimes equivalent, sometimes even “visually” identical. Once the formula encodings
have been generated, the kNN method is very fast compared to classical text matching,
since the vector computations can be carried out faster than text processing.
Apart from the NearestNeighbors prediction using TF-IDF and Doc2Vec encoded
LaTeX formula strings, we also tested fuzzy string matching to retrieve similar formulae.
For each ‘manually’ selected query formula, we calculated the fuzzy partial ratio
similarity with all pool formulae and ranked them by descending overlap. The top 10
candidates were then submitted. Compared to the kNN approach, the fuzzy string
search has the advantage of not requiring an encoding index. Thus, new formula
instances can easily be added without retraining the vector encodings of the
whole corpus.</p>
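      <p>Both retrieval variants can be sketched without external dependencies, with cosine-based nearest neighbors over sparse encodings and a windowed difflib ratio standing in for the fuzzy partial ratio (the actual runs used Scikit-learn's NearestNeighbors and a fuzzy string matching library):</p>

```python
import difflib

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sum(w * w for w in u.values()) ** 0.5
    nv = sum(w * w for w in v.values()) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def knn(query_vec, pool_vecs, k=3):
    """Return the k pool formula IDs nearest to the query encoding."""
    ranked = sorted(pool_vecs, key=lambda fid: cosine(query_vec, pool_vecs[fid]),
                    reverse=True)
    return ranked[:k]

def partial_ratio(query, candidate):
    """Best match ratio of the shorter string against any equally long
    window of the longer one (a rough stand-in for a fuzzy partial ratio)."""
    short, long_ = sorted((query, candidate), key=len)
    return max(difflib.SequenceMatcher(None, short, long_[i:i + len(short)]).ratio()
               for i in range(len(long_) - len(short) + 1))

pool = {"f1": {"x": 1.0, "+": 1.0}, "f2": {"\\sin": 1.0}}
print(knn({"x": 1.0}, pool, k=1))       # ['f1']
print(partial_ratio("x+1", "y = x+1"))  # 1.0, since 'x+1' occurs verbatim
```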
      <sec id="sec-13-1">
        <title>Classification of Question and Answer Types</title>
        <p>To assess the relative relevance of the specific question, answer, and formula types, we
carried out a human multi-label classification for each set respectively. Our approach
was inductive, meaning that we did not specify the classes upfront but observed them
while examining the questions, answers, and formulae as they occurred.</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>Example Questions and Answers</title>
      <p>To illustrate our classification operation mode, we will first give some examples.
In question A.1, the user asks to find the value of a parameter contained within a
function, given an interval constraint. We classified this question with the label “calculate /
compute / find value”. Our manually selected answer25 for A.1 was labeled “numeric
value / fraction” and “inequality”.</p>
      <p>In question A.50, the user asks whether a series containing a fraction of powers and a
trigonometric function converges or diverges. We classified this question with the
labels “power / exponential / logarithmic”, “trigonometry”, and “sequence / summation”.
Our manually selected formula for B.50, an inequality involving powers, was labeled
“inequality” and “powers / exponentials / logarithms”.
25
https://math.stackexchange.com/questions/3062860/finding-value-of-c-such-that-the-rangeof-the-rational-function-fx-frac/3063081#3063081</p>
    </sec>
    <sec id="sec-15">
      <title>Question Types</title>
      <p>We labeled the question types as shown in Table 4.</p>
      <p>[Table 4: question-type labels, including complex numbers, parameter, and probability,
with the associated question IDs.]</p>
      <p>
The occurrence statistics of the individual question types are shown in Fig. 5. Apparently,
the major part of the questions involved “sets” of numbers. This is partly caused
by the set symbols for natural numbers ℕ or rational numbers ℚ appearing frequently
in definitions that are included in the question. The second-highest ranked label is
“function”. This is not surprising considering that functions are a heavily used notion
or concept in mathematics. To obtain this label, it was sufficient that a function
identifier appears in the question. The third-highest ranked label is “solve equations –
algebraic or differential”. In many cases, provided enough information, the question can be
answered by using a computer algebra system (CAS) connected to the question
answering engine.
Classifying the question subject classes, we see that almost all questions are pure
mathematics, except A.33, which is from the math-stackexchange category physics. Employing
subject class classifications can help to redirect questions and reduce the answer
space. Open-domain QA systems can then be modularized into distinct closed-domain
parts that handle different QA types differently. For example, a geometry question such
as “What is the surface area of a sphere?” can be parsed and answered differently than
an algebraic question such as “How to solve x + 1 = 2?”. While the former could be
passed to a database containing properties of geometric objects, the latter could be
passed to a computer algebra system. On the other hand, physics questions often rely
heavily on the semantics of identifier names. As an example, the question “What is the
relationship between mass and energy?” should yield formulae such as E = mc² or
E = ½mv². Without having annotated identifier names contained within the
formulae, the question cannot be answered.</p>
    </sec>
    <sec id="sec-16">
      <title>Answer Types</title>
      <p>We labeled our manually retrieved answer types as shown in Table 5.</p>
      <p>[Table 5: answer-type labels (value / fraction, probability, binomial, pow / exp / log,
interval, set, inequality, differential, integral, trigonometry, function, algebraic
transformation, vector / matrix, logic, modulus, limes, derivative, cases, complex numbers)
with the associated answer IDs.]</p>
      <p>The occurrence statistics of the individual answer types are shown in Fig. 6. As for the
question types, “set” is still the most frequent label. However, “function” is here only
ranked fourth. The label “algebraic transformation” is ranked second. Some of the
transformations can be done using computer algebra systems. Apparently, the answer
and question categories differ. This means, for example, that given a short question, the
potentially longer answer (proof or other) can involve more categories.</p>
      <sec id="sec-16-1">
        <title>Formula Types</title>
        <p>We labeled the formula types as shown in Table 6. The occurrence statistics of the
individual formula types are shown in Fig. 7. Algebraic transformations and functions
are still ranked high. All in all, the most frequent question, answer, and formula types
involve sets, sequences, sums, powers, exponentials, logarithms, trigonometry functions,
inequalities, and algebraic transformations or equation solving. In the future, one could
explore whether the question classification label enhances answer retrieval.</p>
        <sec id="sec-16-1-1">
          <title>Discussion of Challenges</title>
          <p>Table 7 shows the results of our submission in the ARQMath lab. For Task 1, the
reported nDCG′ score for our manual run is outstandingly low. Hence, we tried to
investigate the reasons for this low score. We identified one critical issue in our manual run.
We have linked the posts from the ARQMath dataset with the real posts on MSE, which
makes it easier to crawl for relevant answers manually. However, this approach leads
to the problem that some of our reported answers do not exist in the ARQMath dataset.
Nonetheless, the nDCG′ removes non-judged documents prior to evaluation. Hence, a
relatively high number of answers that do not exist in the dataset should not harm our
score dramatically. We can report an nDCG′ score of 0.504 for our submitted run. This
is significantly higher than the score reported in the ARQMath results paper [29]. We
calculated the nDCG′ score as formulated in [30] and [31]:</p>
          <p>nDCG′p = DCG′p / IDCG′p,</p>
          <p>where</p>
          <p>DCG′p = ∑i=1..p (2^reli − 1) / log₂(i + 1) and IDCG′p = ∑i=1..|RELp| (2^reli − 1) / log₂(i + 1),</p>
          <p>and reli is the given relevance score for the i-th element, and RELp is the list of relevant
documents ordered by their relevance up to position p. In other words, the nDCG′p score
is the DCG′p score divided by the DCG′p score for the ideal order of relevant hits. The
nDCG′p is calculated for every query in the test set. The overall score is therefore
calculated as the mean value of nDCG′p over all queries.</p>
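          <p>The prime variant of the metric can be computed directly from these definitions; the sketch below assumes unjudged documents have already been removed from the ranked list:</p>

```python
from math import log2

def dcg_prime(rels):
    """DCG' over judged documents only: rels holds the relevance scores
    rel_i of the ranked hits, i = 1..p."""
    return sum((2 ** rel - 1) / log2(i + 1) for i, rel in enumerate(rels, start=1))

def ndcg_prime(rels):
    """nDCG' = DCG' divided by the DCG' of the ideal (descending) ordering."""
    ideal = dcg_prime(sorted(rels, reverse=True))
    return dcg_prime(rels) / ideal if ideal else 0.0

# a manual run reporting a single judged answer (p = 1) always scores 1.0
print(ndcg_prime([2]))      # 1.0
print(ndcg_prime([1, 3]))   # < 1.0: the ideal order would be [3, 1]
```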
          <p>We identified two possible issues that could explain the mismatch between our
calculated score and the reported one. The nDCG′p score is calculated for a fixed number p
of retrieved top hits. If p is larger than the number of retrieved documents, it would
reduce the score. We assume that most contestants reported a list of relevant hits for
each query. Since we performed a manual run, we only reported the actual answer. This
means, for our reported answers, it only makes sense to set p = 1.</p>
          <p>Moreover, we did not report valid answers for some queries (in case the answer ID did
not exist in the dataset, we had no valid answer at all for that particular query). If
these queries were considered when calculating the mean nDCG′p over all queries, it
would also explain a significantly lower score. The nDCG′p is designed to not take
unjudged documents into account. Similarly, it makes sense to ignore queries with no
returned answers when calculating the overall nDCG′p over all queries. Following these
rules, we calculated an nDCG′p of 0.504 for our manual run. Table 10 in the Appendix
shows the results of our DCG′1 and IDCG′1 scores for all queries of Task 1 for which
we retrieved answers in our manual run and which were ranked by the ARQMath reviewers.
The final average score for nDCG′1 is 0.504.</p>
          <p>
            In addition to the problematic score calculation, we found incomprehensible relevance
scores on multiple occasions. A possible reason for this is the subjectiveness of
relevance. While we found the reported answers highly relevant, the annotators provided a
relevance score of 0. Table 8 summarizes the identified problematic annotations. In
five out of nine of these cases, our reported answers were marked as correct by the
questioner at MSE (last column in Table 8) but annotated as non-relevant by the
ARQMath annotator. This seems to indicate that the relevance scores for ARQMath
Tasks 1 and 2 are very subjective, even though the reported Kappa coefficient for
inter-annotator agreement was reasonably high at around 0.34.
[Table 8 excerpt: post IDs 2146297, 311354, 893752, each annotated with relevance 0;
marked correct by the questioner: yes, yes, no.]
In the process of manual annotation and answer retrieval, we noticed several challenges
for IR systems. First, the question and answer features are obviously very heterogeneous
data types (text and formulae). It remains to be explored how to combine both in a
suitable way. Recent studies [32] investigated the impact of different encoding
combinations on the classification accuracy and cluster purity on the NTCIR-11/12 arXiv
dataset [33]. They called out for a “formula encoding challenge” to exploit the formula
information for machine learning tasks. A successful encoding should, e.g., improve
the text classification accuracy. The aim is motivated by the observation that there is
little correlation between text and formula similarity, at least using the cosine measure
on tf-idf and doc2vec encodings. We need to somehow connect text and math, such that
there is a synergy between their semantics. In the case of the mathematical question
answering task, this could be achieved by transforming the mathematical formula
elements to textual entities. Consider for example the ARQMath Task question A.29. The
question asks for a recipe to divide complex numbers by infinity (title: “Dividing Complex
Numbers by Infinity”). For this question, we manually retrieved the formula (a + bi)/∞ = 0
from the answer that was selected by the questioner on MSE. One way to connect the
question to possible answer formulae would be to annotate both textual elements. Table
9 shows how linking to items of the semantic knowledge-base Wikidata [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ], [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] can
provide a connection via the joint QIDs Q1226939, Q11567, and Q205. A joint semantic
vector representation of both the title text and the formula could then be a concatenation
of the Wikidata item embeddings, as proposed in [34].
          </p>
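          <p>Such a joint representation could be assembled as follows; the embedding values below are invented for illustration, and real item embeddings (e.g., PyTorch-BigGraph vectors as in [34]) would be high-dimensional:</p>

```python
# toy Wikidata item embeddings (2-dimensional, values invented)
embeddings = {
    "Q1226939": [0.1, 0.9],  # division
    "Q11567":   [0.7, 0.2],  # complex number
    "Q205":     [0.4, 0.4],  # infinity
}

def joint_representation(text_qids, formula_qids):
    """Concatenate the item embeddings of the QIDs linked in the
    question title and in the candidate answer formula."""
    vector = []
    for qid in text_qids + formula_qids:
        vector.extend(embeddings[qid])
    return vector

# question A.29: title and formula share Q1226939 (division) and Q205 (infinity)
v = joint_representation(["Q1226939", "Q205"], ["Q1226939", "Q11567", "Q205"])
print(len(v))  # 10 = 5 linked items x 2 dimensions
```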
          <p>[Table 9, question text annotations: “Dividing”: “division” (Q1226939); “Adding”:
“addition” (Q32043); “Infinity”: “infinity” (Q205).]</p>
          <p>
            [Table 9, formula answer annotations: the formula parts are linked to “division”
(Q1226939), “addition” (Q32043), “complex number” (Q11567), “real number” (Q12916),
“complex number” (Q9165172), and “infinity” (Q205).]
This example illustrates how linking Formula Concepts [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] can be very
beneficial for mathematical question answering (on MSE, arXiv, Wikipedia, etc.). However,
this requires the semantic annotation of textual and formula elements, which can be
done, e.g., using the “AnnoMathTeX”26 system [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] hosted by Wikimedia. In the
future, we should be able to automatically link text and formula entities to Wikidata items
and Wikipedia articles. It remains a challenging problem for mathematical formula
entity linking to exhaustively and unambiguously identify the important semantic parts of
a formula. In the future, annotation guidelines should be developed to tackle this
problem.
For Task 2, we used the MOI search engine to retrieve relevant mathematical
expressions from the dataset. Since the MOI engine does not handle entire mathematical
expressions by itself but disassembles formulae into their subexpressions, the concept of
linking retrieved MOIs back to a formula ID was challenging. Furthermore, the
approach we used to calculate the formula ID of an MOI has some drawbacks. First, the
MOI engine retrieves relevant documents from Elasticsearch with a textual search
query. In the second step, the MOIs are scored based on the retrieved documents. Thus,
the retrieved MOIs (and the corresponding formula IDs) are only as good as the
documents retrieved in the first step. When the retrieved documents are not relevant, none of
the retrieved MOIs can be relevant. Hence, the search results are quite sensitive to the
settings that were used to retrieve relevant documents. Nonetheless, the approach
performed reasonably well compared to the results of other competitors, with an nDCG′p
score of 0.374.
          </p>
        </sec>
        <sec id="sec-16-1-2">
          <title>Outlook and Future Work</title>
          <p>
            We are excited to employ our approaches and the approaches of other task participants
to retrieve relevant formulae on zbMATH datasets. However, as discussed before, we
are uncertain whether the computed performance numbers are a suitable indicator to
predict the usefulness of the approaches to zbMATH users. We will, therefore, consider
suggesting a mathematical literature retrieval task in the future. However, as a prerequisite,
we see the need to research math-specific deterministic evaluation metrics that eliminate
task-specific human annotators in the loop. In contrast, we believe that objectively
verifiable or almost provable semantic enhancement techniques can significantly benefit
from a human review. While relevance (to an information need) is not yet a
well-established term among working mathematicians, definitions, equivalences, examples,
substitutions, theorems, and proofs are well established. While formal mathematics is not
(yet) able to automatically map mathematical named entities to formal concepts,
working mathematicians are generally able to create such a mapping with a very high
inter-reviewer agreement. Therefore, we aim to explore how employing our
“AnnoMathTeX” formula annotation recommender system [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] on MSE questions and answers
can promote answer retrieval.
26 annomathtex.wmflabs.org
To summarize the marginal results from our contribution, the kNN method can be
employed as a fast search engine, provided formulae are indexed as vector encodings. The
fuzzy string search is slower but has the advantage that no index is needed. As for MOI,
the retrieved results are less strictly tied to existing expressions since it considers all
subexpressions in an entire dataset. This helps to extract meaningful expressions rather
than exact matches.
          </p>
        </sec>
        <sec id="sec-16-1-3">
          <title>Acknowledgments</title>
          <p>This work was supported by the German Research Foundation (DFG grant GI-1259-1).
Joint Conference on Digital Libraries, Fort Worth Texas USA, May 2018, pp.
233–242, doi: 10.1145/3197026.3197058.
[23] Q. Le and T. Mikolov, “Distributed Representations of Sentences and
Documents,” Proceedings of the ICML Conference 2014, p. 9.
[24] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of</p>
          <p>Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[25] R. Řehůřek and P. Sojka, Software Framework for Topic Modelling with Large</p>
          <p>Corpora. University of Malta, 2010.
[26] A. Greiner-Petter et al., “Discovering Mathematical Objects of Interest—A Study
of Mathematical Notations,” in Proceedings of The Web Conference 2020, Taipei
Taiwan, Apr. 2020, pp. 1445–1456, doi: 10.1145/3366423.3380218.
[27] M. Schubotz and G. Wicke, “Mathoid: Robust, Scalable, Fast and Accessible Math
Rendering for Wikipedia,” in Intelligent Computer Mathematics - International
Conference, CICM 2014, Coimbra, Portugal, July 7-11, 2014. Proceedings, 2014,
vol. 8543, pp. 224–235, doi: 10/ggv8pz.
[28] S. Robertson and H. Zaragoza, “The Probabilistic Relevance Framework: BM25
and Beyond,” Found. Trends Inf. Retr., vol. 3, no. 4, pp. 333–389, Apr. 2009, doi:
10.1561/1500000019.
[29] R. Zanibbi, D. W. Oard, A. Agarwal, and B. Mansouri, “Overview of ARQMath
2020: CLEF Lab on Answer Retrieval for Questions on Math,” p. 25.
[30] K. Järvelin and J. Kekäläinen, “Cumulated gain-based evaluation of IR
techniques,” ACM Trans. Inf. Syst., vol. 20, no. 4, pp. 422–446, Oct. 2002, doi:
10.1145/582415.582418.
[31] C. Burges et al., “Learning to rank using gradient descent,” in Proceedings of the
22nd international conference on Machine learning, Bonn, Germany, Aug. 2005,
pp. 89–96, doi: 10.1145/1102351.1102363.
[32] P. Scharpf, M. Schubotz, A. Youssef, F. Hamborg, N. Meuschke, and B. Gipp,
“Classification and Clustering of arXiv Documents, Sections, and Abstracts,
Comparing Encodings of Natural and Mathematical Language,” Proceedings of the
JCDL Conference 2020, May 2020, doi: 10.1145/3383583.3398529.
[33] R. Zanibbi, A. Aizawa, and M. Kohlhase, “NTCIR-12 MathIR Task Overview,”
Proceedings of the 12th NTCIR Conference on Evaluation of Information Access
Technologies 2016, p. 10.
[34] A. Lerer et al., “PyTorch-BigGraph: A Large-scale Graph Embedding System,”
Proceedings of the MLSys Conference 2019, Apr. 2019, Accessed: Jul. 16, 2020.
[Online]. Available: http://arxiv.org/abs/1903.12287.</p>
        </sec>
        <sec id="sec-16-1-4">
          <title>Appendix</title>
          <p>Table 10 shows the results for our DCG′1 and IDCG′1 scores for all queries of Task 1
for which we retrieved answers in our manual run and which were ranked by the ARQMath
reviewers. The final average nDCG′1 score is 0.504. The metrics rel_1 and REL_1 refer to
the formulae in Section 6.</p>
          <p>[Table 10: topic IDs A.12–A.93 with their post IDs, relevance and best-relevance
scores, and the per-query nDCG′1 values.]</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Müller</surname>
          </string-name>
          and
          <string-name>
            <given-names>O.</given-names>
            <surname>Teschke</surname>
          </string-name>
          , “
          <article-title>Full text formula search in zbMATH,”</article-title>
          <source>Eur. Math. Soc. Newsl</source>
          , vol.
          <volume>102</volume>
          , p.
          <fpage>51</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          , “Finding Old Answers to New Math Questions: The
          <source>ARQMath Lab at CLEF</source>
          <year>2020</year>
          ,” in
          <source>Advances in Information Retrieval</source>
          , vol.
          <volume>12036</volume>
          ,
          <string-name>
            <surname>J. M. Jose</surname>
            , E. Yilmaz,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Magalhães</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Castells</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          <string-name>
            <surname>Silva</surname>
            , and
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Martins</surname>
          </string-name>
          , Eds. Cham: Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>571</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanibbi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          , “
          <source>Characterizing Searches for Mathematical Concepts</source>
          ,
          <source>” in 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , Champaign, IL, USA, Jun.
          <year>2019</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          , doi: 10.1109/JCDL.
          <year>2019</year>
          .
          <volume>00019</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Karbasian</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Johri</surname>
          </string-name>
          , “
          <article-title>Insights for Curriculum Development: Identifying Emerging Data Science Topics through Analysis of Q&amp;A Communities,”</article-title>
          <source>in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, Portland OR USA</source>
          ,
          <year>Feb</year>
          .
          <year>2020</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>198</lpage>
          , doi: 10.1145/3328778.3366817.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. W.</given-names>
            <surname>Smith</surname>
          </string-name>
          , “
          <article-title>A Question-Answering System for Elementary Mathematics</article-title>
          ,” Apr.
          <year>1974</year>
          , Accessed: Jun.
          <volume>22</volume>
          ,
          <year>2020</year>
          . [Online]. Available: https://eric.ed.gov/?id=
          <fpage>ED093703</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Hui</surname>
          </string-name>
          , “
          <article-title>A math-aware search engine for math question answering system</article-title>
          ,”
          <source>in Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12</source>
          ,
          <string-name>
            <surname>Maui</surname>
          </string-name>
          , Hawaii, USA,
          <year>2012</year>
          , p.
          <volume>724</volume>
          ,
          <issue>doi</issue>
          : 10.1145/2396761.2396854.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          , “
          <article-title>A Survey of Question Answering for Math and</article-title>
          Science Problem,” Computing Research Repository (CoRR),
          <source>May</source>
          <year>2017</year>
          , Accessed: Jun.
          <volume>08</volume>
          ,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/1705.04530.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. A. S.</given-names>
            <surname>Gunawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Mulyono</surname>
          </string-name>
          , and W. Budiharto, “
          <article-title>Indonesian Question Answering System for Solving Arithmetic Word Problems on Intelligent Humanoid Robot,” Procedia Computer Science</article-title>
          , vol.
          <volume>135</volume>
          , pp.
          <fpage>719</fpage>
          -
          <lpage>726</lpage>
          ,
          <year>2018</year>
          , doi: 10.1016/j.procs.
          <year>2018</year>
          .
          <volume>08</volume>
          .213.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dudhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hamborg</surname>
          </string-name>
          , and B. Gipp, “Introducing MathQA --
          <source>A Math-Aware Question Answering System,” Information Discovery and Delivery</source>
          , vol.
          <volume>46</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>214</fpage>
          -
          <lpage>224</lpage>
          , Nov.
          <year>2018</year>
          , doi: 10.1108/IDD-06
          <string-name>
            <surname>-</surname>
          </string-name>
          2018-0022.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hopkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Le</given-names>
            <surname>Bras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Petrescu-Prahova</surname>
          </string-name>
          , G. Stanovsky,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Koncel-Kedziorski</surname>
          </string-name>
          , “
          <fpage>SemEval</fpage>
          -2019
          <source>Task 10: Math Question Answering,” in Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , Minneapolis, Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>893</fpage>
          -
          <lpage>899</lpage>
          , doi: 10.18653/v1/S19-2153.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Pineau</surname>
          </string-name>
          , “
          <article-title>Math-Aware Search Engines: Physics Applications and Overview</article-title>
          ,”
          <source>Computing Research Repository (CoRR)</source>
          , Sep.
          <year>2016</year>
          , Accessed: Jun. 21,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/1609.03457.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Idris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , “
          <article-title>QAPD: an ontology-based question answering system in the physics domain</article-title>
          ,”
          <source>Soft Comput</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>230</lpage>
          , Jan.
          <year>2018</year>
          , doi: 10.1007/s00500-016-2328-2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Fujii</surname>
          </string-name>
          , “
          <article-title>Mathematical Document Categorization with Structure of Mathematical Expressions</article-title>
          ,” in
          <source>2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , Toronto, ON, Canada, Jun.
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , doi: 10.1109/JCDL.2017.7991566.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kühnemund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>AutoMSC: Automatic Assignment of Mathematics Subject Classification Labels</article-title>
          ,”
          <source>Proceedings of the CICM Conference 2020</source>
          , May
          <year>2020</year>
          , Accessed: Jun. 21,
          <year>2020</year>
          . [Online]. Available: http://arxiv.org/abs/2005.12099.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ko</surname>
          </string-name>
          , “
          <article-title>Mathematical Formula Search using Natural Language Queries</article-title>
          ,”
          <source>AECE</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>99</fpage>
          -
          <lpage>104</lpage>
          ,
          <year>2014</year>
          , doi: 10.4316/AECE.2014.04015.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dmello</surname>
          </string-name>
          , “
          <article-title>Representing Mathematical Concepts Associated With Formulas Using Math Entity Cards</article-title>
          ,”
          <source>Rochester Institute of Technology (RIT) Scholar Works</source>
          , p.
          <fpage>167</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Cohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Towards Formula Concept Discovery and Recognition,”</article-title>
          <source>Proceedings of the 4th BIRNDL Workshop at the 42nd ACM SIGIR Conference</source>
          <year>2019</year>
          , p.
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Dumitru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ginev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kohlhase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Merticariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Wiesing</surname>
          </string-name>
          , “
          <article-title>System Description: KAT an Annotation Tool for STEM Documents,”</article-title>
          <source>Proceedings of the CICM Conference</source>
          <year>2016</year>
          , p.
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mackerracher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breitinger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>AnnoMathTeX - a formula identifier annotation recommender system for STEM documents</article-title>
          ,”
          <source>in Proceedings of the 13th ACM Conference on Recommender Systems</source>
          , Copenhagen Denmark, Sep.
          <year>2019</year>
          , pp.
          <fpage>532</fpage>
          -
          <lpage>533</lpage>
          , doi: 10.1145/3298689.3347042.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Representing Mathematical Formulae in Content MathML using Wikidata,”</article-title>
          <source>Proceedings of the 3rd BIRNDL Workshop at the 41st ACM SIGIR Conference</source>
          <year>2018</year>
          , p.
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          , “
          <article-title>Generating OpenMath Content Dictionaries from Wikidata,”</article-title>
          <source>Proceedings of the CICM Conference</source>
          <year>2018</year>
          , p.
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Meuschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Cohl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          , “
          <article-title>Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context,”</article-title>
          <source>in Proceedings of the 18th ACM/IEEE on</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>