<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Towards Disambiguation of Mathematical Terms based on Semantic Representations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shufan Jiang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mary Ann Tan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe - Leibniz Institute for Information Infrastructure</institution>
          ,
          <addr-line>Eggenstein-Leopoldshafen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>84</fpage>
      <lpage>91</lpage>
      <abstract>
<p>In mathematical literature, terms can have multiple meanings depending on context. Manual disambiguation across scholarly articles demands massive effort from mathematicians. This paper addresses the challenge of automatically determining whether two definitions of a mathematical term are semantically different. Specifically, the difficulties involved, and how contextualized textual representations can help resolve the problem, are investigated. A new dataset, MathD2, for mathematical term disambiguation is constructed from Proof Wiki's disambiguation pages. Then two approaches based on contextualized textual representations are studied: (1) supervised classification based on the embedding of the concatenated definition and title and (2) zero-shot prediction based on semantic textual similarity (STS) between definition and title. Both approaches achieve accuracy and macro F1 scores greater than 0.9 on the ground truth dataset, demonstrating the effectiveness of our methods for the automatic disambiguation of mathematical definitions. Our dataset, code, and experimental results are available here: https://github.com/sufianj/MathTermDisambiguation.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Linking</kwd>
        <kwd>Text Similarity</kwd>
        <kwd>Transformers</kwd>
        <kwd>Mathematical Definition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mathematical scholarly articles contain highly structured statements, such as axioms, theorems, and
proofs, which are not easily navigable or explorable through traditional keyword searches. Several
initiatives have emerged to enhance the discovery of mathematical definitions. Argot [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]1 is a collection
of term-definition pairs automatically extracted from mathematical papers, allowing users to retrieve all
definitions of a given term. MathMex [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] 2 is a recent search engine for mathematical definitions based
on the semantic similarity between a user’s query and the definition. Both projects show promising
usage of different word embeddings. However, Argot cannot disambiguate polysemous terms, while
MathMex cannot guarantee that the retrieved definitions accurately define the queried term. Both
highlight the need for an automatically constructed knowledge base of mathematical definitions from
scholarly articles. Such a knowledge base would enable researchers to efficiently look up terms and
index relevant mathematical statements and articles.
      </p>
      <p>
        Existing research in this area focuses on extracting mathematical definitions [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3, 4, 5, 6</xref>
        ] and identifying
the terms defined therein, known as definienda (singular: definiendum ) [
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ]. These tasks are extended
by disambiguating or linking newly extracted definition-term pairs to existing concepts in a reference
glossary, or otherwise expanding the glossary. Current work about math term disambiguation shows
promising applications of natural language processing for resolving token-level ambiguity in equations,
such as the ambiguity of “prime” (′) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and for linking formulae and identifiers (formula variables
without fixed values) in STEM papers to Wikidata [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>Disambiguating definienda is particularly challenging when identical terms for the same concept are
defined in various ways (e.g., “path”) or when polysemous terms (e.g., “block”) refer to distinct concepts
(see Table 1). A possible heuristic is that if the definienda of two definitions are linked to different
concepts in a reference knowledge base, then these two definitions are distinct. This scenario assumes
that each definition corresponds to one definiendum.</p>
      <p>
        Table 1: Definitions and their source articles.
(1) If the vertices v_0, v_1, . . . , v_n of a walk W are distinct then W is called a Path. A path with
n vertices will be denoted by P_n. P_n has length n − 1. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
(2) Let G = (V, E) be a graph. A path in a graph is a sequence of vertices such that from
each of its vertices there is an edge to the next vertex in the sequence. This is denoted by
P = (u = v_0, v_1 . . . , v_n = v), where (v_i, v_{i+1}) ∈ E for 0 ≤ i ≤ n − 1. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
(3) A block in H is a maximal set of tightly-connected hyperedges. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]
(4) A block of indices is a set of numbers B where every term depends on the same
value via division, for all indices in B. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
      </p>
      <p>For this study, Proof Wiki3 serves as the reference list. It is a crowd-sourced online collection of
mathematical proofs, including 500 disambiguation pages. Similar to Wikipedia, these disambiguation
pages list identical terms, each linking to its corresponding definition page. Each definition page includes
a unique page title, the definition, and a topic or category where the term can be found (e.g., algebra
or geometry). Specifically, the page title contains the definiendum along with its category and serves
as the identifier of the definition page within Proof Wiki (e.g. “Definition:Bilinear Form (Polynomial
Theory)” in Table 2).</p>
      <p>This work addresses the following research questions: RQ1: How well can contextualized word
embeddings help the disambiguation of mathematical terms? RQ2: Which pretraining strategies and
downstream tasks best suit this task? The main contributions of this work are:
• MathD2 - a new dataset for Mathematical Definiendum Disambiguation.
• Exploration of two different approaches demonstrating how the disambiguation task can
benefit from contextualized semantic representations.
• Experiment-supported evidence highlighting the efficiency of sentence embeddings for the
addressed disambiguation task.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The challenges posed by this task are (a) the lack of labeled datasets for equivalent mathematical
definitions, (b) the limited number of disambiguation pages, and (c) the unstructured nature of
definitions that combine mathematical notations, formulas, and general discourse [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ]. To address (a),
entity linking and sentence similarity approaches for mathematical terms are reviewed. To tackle (b)
and (c), transformer models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] are employed for their capabilities to produce rich, contextualized
representations.
      </p>
      <p>
        Contextualized representations produced by BERT (Bidirectional Encoder Representations from
Transformers) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] encode the meaning of a word according to its context. This means that polysemous
words have several, more accurate representations depending on the sentence where they appear.
BERT is pretrained on two key tasks: Masked Language Modeling (MLM), where random tokens in
a sentence are masked and predicted based on context, and Next Sentence Prediction (NSP), which
trains BERT to determine whether a sentence logically follows another. Pretraining with MLM is widely
applied for domain adaptation, especially when there is a dearth of data for finetuning [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. In
addition, finetuning BERT for specific downstream tasks and domains is straightforward. For instance,
by combining BERT’s output with a classification layer, it has been adapted for mathematical notation
prediction [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], definiendum extraction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and mathematical statement extraction [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The Natural
Language Inference (NLI) datasets [
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ] used by BERT’s NSP pretraining are related to the task at
hand. A piece of supporting evidence is AcroBERT [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], an entity linker that reuses BERT for NSP’s
pretrained weights and is finetuned to link acronyms to their long forms. AcroBERT outperforms BERT
and other domain-adapted BERT-based models.
      </p>
      <p>
        However, the nature of BERT’s pretraining tasks makes it unsuitable for measuring semantic
similarity. Sentence BERT (SBERT) 4 [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] modifies BERT’s architecture to produce semantically
meaningful sentence embeddings that can be compared using cosine-similarity. Out-of-the-box SBERT
achieves superior performance across varied classification tasks involving mathematical texts [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In
one such task, the proponents measure the similarity of SBERT embeddings between an input text and
the combination of titles and abstracts of mathematical publications in arXiv 5 and zbMATH 6 to predict
the classification code of the respective repositories. In the same vein, this study aims to evaluate the
effectiveness of semantic textual similarity in linking definitions to titles. Since BERT for NSP and
SBERT require different domain adaptation strategies [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ], this work first identifies the architecture
that performs better for the task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>Term disambiguation is formalized as an entity linking task, where the entities refer to the definition
page titles in Proof Wiki. That is, given (1) a definition and an ambiguous definiendum and (2) a
dictionary that maps the ambiguous definiendum to entities, the goal is to find the title that best
matches the definition. The proposed method is described in two steps. First, the ground truth dataset
is constructed. Second, two applicable approaches are considered.</p>
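The entity linking formulation above can be sketched as a dictionary lookup followed by a scoring step. The sketch below uses illustrative names throughout; the `score` function is a placeholder for the NSP classifier of Section 3.2 or the STS model of Section 3.3, and the trivial word-overlap scorer is for demonstration only.

```python
# Sketch of the entity linking formulation (hypothetical names; the scorer
# stands in for the actual models studied in this paper).

def disambiguate(definition, definiendum, term_to_titles, score):
    """Return the candidate title that best matches the definition."""
    candidates = term_to_titles[definiendum]  # titles sharing the ambiguous term
    return max(candidates, key=lambda title: score(definition, title))

# Toy word-overlap scorer (illustration only, not the paper's method):
def overlap_score(definition, title):
    def_words = set(w.strip(".,") for w in definition.lower().split())
    title_words = set(title.lower().replace(":", " ").replace("/", " ").split())
    return len(def_words & title_words)

term_to_titles = {
    "vertex": ["Definition:Polyhedron/Vertex", "Definition:Angle/Vertex"],
}
definition = "The point at which the arms of an angle meet is known as the vertex."
print(disambiguate(definition, "vertex", term_to_titles, overlap_score))
# → Definition:Angle/Vertex
```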
      <sec id="sec-3-1">
        <title>3.1. Construction of the MathD2 Dataset</title>
        <p>
          A dump of the whole Proof Wiki was extracted on the 9th July, 2024 using WikiTeam [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. This dump is
then parsed to get all the definition statements and titles from all definition disambiguation pages. The
extracted definitions is converted to plain text. By mapping ambiguous terms and the corresponding
definition titles is finally constructed. Some definitions might contain other definitions(e.g., the definition
of “Loop” 7), which also happens to definitions in scholarly papers. If both definition titles are mapped
to a common ambiguous term, only the nested definition and its title are kept, because otherwise the
outer definition should be mapped to two titles: its title and the one of the nested definition. Finally,
terms mapped to less than two titles are removed. Table 2 shows (definition, title) pairs extracted from
the disambiguation page of “Bilinear Form’ 8. For the finetuning in Section 3.2, the dataset is split based
on the 343 ambiguous terms at the ratio of 8:2, making a training dataset of 275 ambiguous terms with
1436 (definition, title) pairs and a test dataset of 68 ambiguous terms with 433 (definition, title) pairs. All
(definition, title) pairs from one disambiguation page are kept together in either the training or test sets,
        </p>
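        <p>The group-aware split described above can be sketched as follows. This is a minimal sketch under the stated constraint (all pairs of one disambiguation page land in the same split); function and variable names are illustrative, and the actual split is over 343 terms at a ratio of 8:2.</p>

```python
import random

def split_by_term(pairs_by_term, test_ratio=0.2, seed=0):
    """Group-aware split: all (definition, title) pairs of one
    disambiguation page (ambiguous term) stay in the same split."""
    terms = sorted(pairs_by_term)
    random.Random(seed).shuffle(terms)
    n_test = int(len(terms) * test_ratio)
    test_terms = set(terms[:n_test])
    train = {t: pairs_by_term[t] for t in terms if t not in test_terms}
    test = {t: pairs_by_term[t] for t in test_terms}
    return train, test

# Toy usage: five ambiguous terms, each with its own (definition, title) pairs.
pairs = {f"term{i}": [(f"def{i}a", f"title{i}a"), (f"def{i}b", f"title{i}b")]
         for i in range(5)}
train, test = split_by_term(pairs)
print(len(train), len(test))  # → 4 1
```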
        <p>4: https://huggingface.co/sentence-transformers/all-mpnet-base-v2
5: https://arxiv.org/
6: https://zbmath.org/
7: https://proofwiki.org/wiki/Definition:Loop_(Topology)
8: https://proofwiki.org/wiki/Definition:Bilinear_Form</p>
        <p>Figure 1: Examples of positive and negative samples.
Positive samples:
[CLS]Definition:Polyhedron/Vertex[SEP]The vertices of a polyhedron are the vertices of the polygons which constitute its faces.[SEP]
[CLS]Definition:Angle/Vertex[SEP]The point at which the arms of an angle meet is known as the vertex of that angle.[SEP]
Negative samples:
[CLS]Definition:Polyhedron/Vertex[SEP]The point at which the arms of an angle meet is known as the vertex of that angle.[SEP]
[CLS]Definition:Angle/Vertex[SEP]The vertices of a polyhedron are the vertices of the polygons which constitute its faces.[SEP]</p>
        <p>so that the generalizability of the finetuned model on unseen terms can be evaluated. In the finetuning
of Section 3.2, for each ambiguous term, two definitions and their titles are randomly selected to make
positive pairs, and the titles of two other definitions are used to make negative pairs (see example in Figure 1).
Both approaches are evaluated on the training and test datasets, except for the finetuned model.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Classification Based on One Concatenated Embedding</title>
        <p>
          Following the finetuning setup of AcroBERT [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], BERT for NSP is adapted to build a supervised
sentence pair classifier to link definitions to their page titles in Proof Wiki. Every pair of (definition,
candidate title with the matching ambiguous term in Proof Wiki) is concatenated as an input sequence.
The sequence begins with a [CLS] token, followed by a candidate title, a [SEP] token, and then the
definition, ending with [SEP]. The input sequence passes through BERT’s transformer layers. These
layers produce contextual embedding for each token in the sequence. Then, the embedding of [CLS] is
fed into a softmax classification layer, which outputs a score to judge how coherent the concatenated
sequence is. The pair with the highest score is selected as the final predicted output. First, the out-of-box
BERT for NSP serves as the baseline to see how well the pretrained natural language inference model
can describe the entailment between the titles and definitions. Then the pretrained BERT for NSP is
finetuned with the training set using a triplet loss function:
ℒ = max{0, 𝛼 − 𝑑neg + 𝑑pos}
(1)
that aims to assign higher scores to the correct titles that match the input definition while reducing the
scores of irrelevant candidates, where 𝛼 = 0.2 is the margin value, and 𝑑pos and 𝑑neg are the distances
for positive and negative pairs, respectively. This approach is implemented with PyTorch [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and
transformers [27]. A batch size of 16 and an Adam optimizer with learning rate 1e-5 are used. The learning
rate is exponentially decayed at a rate of 0.95 every 1000 steps. The model is trained with the training
dataset for 100 epochs. After each epoch, a checkpoint (copy of the current model weights) is saved.
Each checkpoint is then evaluated with the test dataset so that test data do not impact the model
weights.
        </p>
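        <p>A minimal sketch of the triplet loss in Equation (1), assuming the distances d_pos and d_neg for the positive and negative pairs have already been computed; the actual training loop (PyTorch, transformers, Adam, batch size 16) is in the released code.</p>

```python
def triplet_loss(d_pos, d_neg, margin=0.2):
    """L = max(0, margin - d_neg + d_pos): pushes the distance of the
    correct (definition, title) pair below that of an incorrect one
    by at least the margin (0.2 in the paper's setup)."""
    return max(0.0, margin - d_neg + d_pos)

# When the negative pair is already farther than the positive pair by at
# least the margin, the loss vanishes; otherwise it is positive.
print(triplet_loss(d_pos=0.1, d_neg=0.9))            # → 0.0
print(round(triplet_loss(d_pos=0.5, d_neg=0.4), 3))  # → 0.3
```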
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Zero-shot Prediction Based on Semantic Textual Similarity between Two Embeddings</title>
        <p>
          A shortcoming of the previous solution is that the NSP inference has to be run for every (definition, title)
pair mapped to an ambiguous term. Motivated to make a computationally more efficient solution, the
sentence embeddings of the definitions and titles are explored. In this setup, the sentence embedding
of the titles and the definitions only need to be calculated once. For the definition and each candidate
title with the matching ambiguous term, the title with the highest cosine similarity to the embedding
of the definition is selected as the final predicted output. To explore the potential benefits of different
pretraining corpora and related tasks, the following models are studied:
• The best-performing sentence transformers for Semantic Textual Similarity (STS) tasks for short
mathematical text as reported in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], including out-of-box SBERT [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ],
math-similarity/BertMLM_arXiv-MP-class_arXiv [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] (noted as Adapted SBERT in Table 3), and mini SBERT models
SBERT/all-MiniLM-L6-v2 [28], and SBERT/all-MiniLM-L12-v2 [28].
• Mean pooled out-of-box BERT, to compare with the pretraining of SBERT.
• Mean pooled out-of-box CC-BERT [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], a from-scratch model pretrained with MLM on
mathematical papers. This experiment studies the impact of domain-specific MLM pretraining and
domain-specific tokenization, compared to mean pooled out-of-box BERT.
Following SBERT’s default setting [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ], the mean pooling strategy is used to calculate the sentence
embeddings with out-of-box BERT and CC-BERT.
        </p>
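        <p>The zero-shot selection step can be sketched as follows: title embeddings are computed once, and each definition is linked to the candidate title with the highest cosine similarity. The embeddings below are toy vectors and the title names are illustrative; in the experiments the embeddings come from SBERT or mean pooled BERT/CC-BERT.</p>

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def link_by_sts(def_emb, title_embs):
    """Pick the candidate title whose (precomputed) embedding is most
    cosine-similar to the definition embedding."""
    return max(title_embs, key=lambda t: cosine(def_emb, title_embs[t]))

# Toy embeddings (illustrative only).
title_embs = {
    "Definition:Path (Graph Theory)": [0.9, 0.1, 0.0],
    "Definition:Path (Topology)": [0.1, 0.9, 0.0],
}
definition_embedding = [0.8, 0.2, 0.0]
print(link_by_sts(definition_embedding, title_embs))
# → Definition:Path (Graph Theory)
```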
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>
        Accuracy and the average of the F1 score for each ambiguous term (macro F1 score) are used to measure
how well both approaches can link a definition to the correct title. Table 3 shows the experimental results
of both methods. Overall, our finetuned NSP model performs best, validating AcroBERT’s set-up and
the helpfulness of BERT for NSP’s pretrained weights. Notably, the out-of-the-box SBERT demonstrated
excellent performance with much less inference time. The performance of prediction based on STS
with sentence transformers is aligned with the results of [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Given that both BERT for
NSP and SBERT are pretrained on NLI tasks [
        <xref ref-type="bibr" rid="ref15 ref23">15, 23</xref>
        ], it may be deduced that i) compared to using the
[CLS] representation of concatenated sequence, using separated sentence embeddings captures more
information for our task, and/or ii) SBERT’s pretraining on (title, abstract) pairs from S2ORC dataset [29]
helps to better understand the entailment between titles and body texts. However, the domain-adapted
SBERT model 9 that the authors of [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] finetuned with multiple tasks using titles and abstracts of
mathematical papers does not yield better results than general SBERT models. This might be due to the
model being solely trained on titles and abstracts, diminishing the model’s representational capacity for
both formulas and general text. The experiments with the mean pooled out-of-box BERT and CC-BERT
show that MLM domain-adaptation over mathematical papers slightly improves this task but is far less
efficient than the adapted SBERT, which has been pretrained with fewer data but on a better task.
      </p>
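      <p>The evaluation protocol can be sketched as follows, under one plausible reading of the metric: the F1 score is computed per ambiguous term over that term's (gold, predicted) titles and then averaged across terms (macro F1). All helper names are illustrative.</p>

```python
def f1_for_term(golds, preds):
    """Macro F1 over the candidate titles of one ambiguous term."""
    labels = sorted(set(golds) | set(preds))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(golds, preds))
        fp = sum(g != label and p == label for g, p in zip(golds, preds))
        fn = sum(g == label and p != label for g, p in zip(golds, preds))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

def evaluate(results):
    """results: {term: (gold_titles, predicted_titles)}."""
    pairs = [(g, p) for golds, preds in results.values() for g, p in zip(golds, preds)]
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    macro_f1 = sum(f1_for_term(g, p) for g, p in results.values()) / len(results)
    return accuracy, macro_f1

# Toy check: one term fully correct, one term half correct.
results = {
    "path": (["A", "B"], ["A", "B"]),
    "block": (["C", "D"], ["C", "C"]),
}
acc, f1 = evaluate(results)
print(acc)  # → 0.75
```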
      <p>Limitations: An interesting finding is that SBERT for STS and the finetuned BERT for NSP make
some common mistakes, indicating the limits of using only semantic representations. The most
common error occurs when the definition statement includes nested definitions. Another typical error is that
the predicted result is in the correct category but not the correct definiendum, mainly when the definition
contains morphemes of the predicted title or when the definition does not contain some morphemes
of the expected title. For example, the definition of “Consequence Function” starts with “Let G be
a game...” 10, and the predicted title is “Definition:Consequence (Game Theory)” 11. Thus, enhancing
sentence embeddings’ comprehension of the semantic and syntactic knowledge of mathematical definitions
is still worth investigating. Other common mistakes reveal the noise in the dataset due to automatic
scraping and LaTeX conversion of irregular Proof Wiki pages.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Future Works</title>
      <p>This work introduces a new dataset for mathematical term disambiguation built with Proof Wiki. Two
entity linking approaches have been implemented and shown to benefit from the usage of
contextualized embeddings to differentiate mathematical definitions. The experimental results demonstrated
the efficiency and effectiveness of using out-of-the-box SBERT. Further work is planned on applying
the proposed approaches to scholarly papers. In addition, the current approach is to be extended to
include document-level representation and citation information to differentiate definitions in scholarly
papers. This work also indicates the need for further study on building sentence transformers that
benefit from domain-specific MLM and task-related pretraining.</p>
      <p>9: https://huggingface.co/math-similarity/Bert-MLM_arXiv-MP-class_arXiv
10: https://proofwiki.org/wiki/Definition:Consequence_Function
11: https://proofwiki.org/wiki/Definition:Consequence_(Game_Theory)</p>
      <p>Table 3, Acc. column (row labels not recovered): 84.8, 93.8, 35.3, 37.9, 92.4, 93.3, 93.5, 52.2.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checking, paraphrasing, and rewording. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>[27] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, A. M. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://www.aclweb.org/anthology/2020.emnlp-demos.6.
[28] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, M. Zhou, MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers, Advances in Neural Information Processing Systems 33 (2020) 5776–5788.
[29] K. Lo, L. L. Wang, M. Neumann, R. Kinney, D. Weld, S2ORC: The semantic scholar open research corpus, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 4969–4983. URL: https://www.aclweb.org/anthology/2020.acl-main.447. doi:10.18653/v1/2020.acl-main.447.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Berlioz</surname>
          </string-name>
          ,
          <article-title>ArGoT: A Glossary of Terms extracted from the arXiv</article-title>
          ,
          <source>Electronic Proceedings in Theoretical Computer Science</source>
          <volume>342</volume>
          (
          <year>2021</year>
          )
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          . URL: http://arxiv.org/abs/2109.02801v1. doi:10.4204/EPTCS.342.2.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Durgin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansouri</surname>
          </string-name>
          ,
          <article-title>Mathmex: Search engine for math definitions</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>194</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Berlioz</surname>
          </string-name>
          ,
          <source>Hierarchical Representations from Large Mathematical Corpora, Ph.D. thesis</source>
          , University of Pittsburgh,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nomura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Suzuki</surname>
          </string-name>
          ,
          <article-title>Extraction of Logical Structure from Articles in Mathematics</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Asperti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bancerek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trybulec</surname>
          </string-name>
          (Eds.),
          <source>Mathematical Knowledge Management</source>
          , volume
          <volume>3119</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2004</year>
          , pp.
          <fpage>276</fpage>
          -
          <lpage>289</lpage>
          . URL: http://link.springer.com/10.1007/978-3-540-27818-4_20. doi:10.1007/978-3-540-27818-4_20. Series Title: Lecture Notes in Computer Science.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhuge</surname>
          </string-name>
          ,
          <article-title>Discovering patterns of definitions and methods from scientific documents</article-title>
          ,
          <source>arXiv preprint arXiv:2307.01216</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Vanetik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Litvak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shevchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Reznik</surname>
          </string-name>
          ,
          <article-title>Automated discovery of mathematical definitions in text</article-title>
          ,
          <source>in: Proceedings of the Twelfth Language Resources and Evaluation Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2086</fpage>
          -
          <lpage>2094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <article-title>Extracting definienda in mathematical scholarly articles with transformers</article-title>
          ,
          <source>in: Proceedings of the Second Workshop on Information Extraction from Scientific Publications</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Youssef</surname>
          </string-name>
          ,
          <source>Towards Math Terms Disambiguation Using Machine Learning</source>
          , in: F. Kamareddine, C. Sacerdoti Coen (Eds.),
          <source>Intelligent Computer Mathematics</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Scharpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <article-title>Towards explaining stem document classification using mathematical entity linking</article-title>
          ,
          <source>arXiv preprint arXiv:2109.00954</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Kalayathankal</surname>
          </string-name>
          , et al.,
          <article-title>Operations on covering numbers of certain graph classes</article-title>
          ,
          <source>arXiv preprint arXiv:1506.03251</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Perera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mizoguchi</surname>
          </string-name>
          ,
          <article-title>Bipartition of graphs based on the normalized cut and spectral methods</article-title>
          ,
          <source>arXiv preprint arXiv:1210.7253</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ergemlidze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Győri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Methuku</surname>
          </string-name>
          ,
          <article-title>3-uniform hypergraphs without a cycle of length five</article-title>
          ,
          <source>arXiv preprint arXiv:1902.06257</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kupin</surname>
          </string-name>
          ,
          <article-title>Subtraction division games</article-title>
          ,
          <source>arXiv preprint arXiv:1201.0171</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/1706.03762. arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pluvinage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <article-title>Towards extraction of theorems and proofs in scholarly articles</article-title>
          , in:
          <string-name>
            <given-names>P.</given-names>
            <surname>Healy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bilauca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonnici</surname>
          </string-name>
          (Eds.),
          <source>DocEng '21: ACM Symposium on Document Engineering 2021, Limerick, Ireland, August 24-27, 2021</source>
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>25:1</fpage>
          -
          <lpage>25:4</lpage>
          . URL: https://doi.org/10.1145/3469096.3475059. doi:10.1145/3469096.3475059.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Angarita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cormier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Orensanz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rousseaux</surname>
          </string-name>
          ,
          <article-title>ChouBERT: Pre-training French language model for crowdsensing with tweets in phytosanitary context</article-title>
          ,
          <source>in: International Conference on Research Challenges in Information Science</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>653</fpage>
          -
          <lpage>661</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Head</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          ,
          <article-title>Modeling mathematical notation semantics in academic papers</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP</source>
          <year>2021</year>
          ,
          <year>2021</year>
          , pp.
          <fpage>3102</fpage>
          -
          <lpage>3115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Brihmouche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Delemazure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gauquier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <article-title>First steps in building a knowledge base of mathematical results</article-title>
          ,
          <source>in: Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP</source>
          <year>2024</year>
          ),
          <year>2024</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Angeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Màrquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Su</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Lisbon, Portugal,
          <year>2015</year>
          , pp.
          <fpage>632</fpage>
          -
          <lpage>642</lpage>
          . URL: https://aclanthology.org/D15-1075. doi:10.18653/v1/D15-1075.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nangia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <article-title>A broad-coverage challenge corpus for sentence understanding through inference</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stent</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers),
          <source>Association for Computational Linguistics</source>
          , New Orleans, Louisiana,
          <year>2018</year>
          , pp.
          <fpage>1112</fpage>
          -
          <lpage>1122</lpage>
          . URL: https://aclanthology.org/N18-1101. doi:10.18653/v1/N18-1101.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <article-title>GLADIS: A general and large acronym disambiguation benchmark</article-title>
          ,
          <source>in: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>2073</fpage>
          -
          <lpage>2088</lpage>
          . URL: https://aclanthology.org/2023.eacl-main.152/. doi:10.18653/v1/2023.eacl-main.152.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          Association for Computational Linguistics
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          . URL: https://aclanthology.org/D19-1410. doi:10.18653/v1/D19-1410.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Steinfeldt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mihaljević</surname>
          </string-name>
          ,
          <article-title>Evaluation and domain adaptation of similarity models for short mathematical texts</article-title>
          ,
          <source>in: International Conference on Intelligent Computer Mathematics</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>WikiTeam</surname>
          </string-name>
          , Wikiteam,
          <year>2024</year>
          . URL: https://github.com/WikiTeam/wikiteam, original-date: 2014-06-25T10:18:03Z.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          , et al.,
          <article-title>Pytorch: An imperative style, high-performance deep learning library</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>