<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Mathematical Formula Representation via Tree Embeddings</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Zichao</forename><surname>Wang</surname></persName>
							<email>jzwang@rice.edu</email>
							<affiliation key="aff0">
								<orgName type="institution">Rice University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andrew</forename><surname>Lan</surname></persName>
							<email>andrewlan@cs.umass.edu</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Massachusetts Amherst</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Richard</forename><surname>Baraniuk</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Rice University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Mathematical Formula Representation via Tree Embeddings</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">200E38048304853E4C0D6C89C76B79CF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T00:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We propose a new framework for learning formula representations using tree embeddings to facilitate search and similar-content retrieval in textbooks containing mathematical (and possibly other types of) formulae. By representing each symbolic formula (such as a math equation) as an operator tree, we can explicitly capture its inherent structural and semantic properties. Our framework consists of a tree encoder that encodes a formula's operator tree into a vector and a tree decoder that generates a formula in operator tree format from a vector. To improve the quality of formula tree generation, we develop a novel tree beam search algorithm that is of independent scientific interest. We validate our framework on a formula reconstruction task and a similar formula retrieval task on a new real-world dataset of over 770k formulae collected online. Our experimental results show that our framework significantly outperforms various baselines.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Recent years have seen an increasing proliferation of mathematical language such as equations and formulae. Table <ref type="table" target="#tab_0">1</ref> shows a few examples of formulae, not only from mathematics but also from other scientific subjects, that often appear in science, technology, engineering, and math (STEM) textbooks.</p><p>A number of practical tasks have recently gained traction because of the ubiquitous presence of mathematical language. For example, one common task is similar formula retrieval, i.e., finding relevant formulae similar to a query formula (e.g., <ref type="bibr" target="#b4">[5]</ref>). This task arises in a wide range of scenarios, such as when students look for relevant math content in a textbook when doing algebra homework. Another common task is automatic formula generation, which arises in scenarios such as formula auto-completion and math summary and headline generation <ref type="bibr" target="#b23">[24]</ref>. Both tasks are labor-intensive and time-consuming when performed manually; an automatic method that tackles them would be of great benefit. In this work, we focus on mathematical language processing (MLP), which involves the formula representation problem, i.e., processing a formula into an appropriate format for downstream tasks such as similar formula retrieval and formula generation.   Existing research mostly focuses on either the retrieval or the generation task but rarely both. In terms of formula retrieval, an emerging line of research explores the idea of symbolic tree representation. Indeed, mathematical formulae are inherently hierarchical, and tree structures are appropriate for organizing the math symbols in a formula. Compared to representing a formula simply as a sequence of math symbols, the symbolic tree representation has the advantage of encoding both the semantics and the inherent hierarchical structure of a formula. 
Similar to works in natural language processing (NLP) that leverage inherent language structure (e.g., <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b15">16,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b17">18]</ref>), a number of recent works in formula retrieval exploit a formula's unique structural properties, leading to improved results <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b26">27]</ref> compared to other formula representations, e.g., <ref type="bibr" target="#b5">[6]</ref>. However, none of the aforementioned works is capable of generating a formula.</p><p>In terms of formula generation, existing research usually combines processing mathematical and natural language. For example, <ref type="bibr" target="#b22">[23]</ref> trains a topic model on scientific documents and learns the keywords (topics) of a formula. <ref type="bibr" target="#b23">[24]</ref> generates a headline from mathematical questions. However, these works treat formulae as sequences of math symbols and thus do not leverage a formula's inherent tree structure. Some other works focus on solving math problems, i.e., generating a solution for an input formula <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b14">15]</ref>. However, these works are fully supervised, i.e., they rely on large, labeled datasets that are difficult to collect. To overcome the data issue, these works design methods to artificially generate formulae. However, such synthetic data is simple and does not cover many complicated formulae that occur in practice (e.g., matrices), limiting the generalization capability of models trained on such data.</p><p>Contributions. We propose FORTE, a novel unsupervised framework for Mathematical FOrmula Representation learning via Tree Embeddings. 
Our framework fully exploits the tree structure of math formulae to enable both effective formula encoding and formula generation. FORTE consists of two key components.</p><p>First, a tree encoder encodes a formula tree into an embedding that can benefit various downstream tasks, e.g., formula retrieval. Second, a tree decoder generates a formula tree from an embedding. We also propose a novel tree beam search algorithm that extends the beam search used in sequence-to-sequence models (e.g., <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b0">1]</ref>) to improve generation quality for tree-structured data. To evaluate our framework, we have collected a dataset of over 770k formulae, to our knowledge the largest to date, from professional, real-world sources such as Wikipedia and arXiv articles. On a formula autoencoding task and a formula retrieval task, we show that our framework (sometimes significantly) outperforms existing methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The FORTE Framework</head><p>We now present our FORTE framework. We first describe the tree representation of a formula (the operator tree), which forms the foundation of our framework. We then set up the MLP problem and introduce the FORTE components: the tree encoder, the tree decoder, and tree beam search. Figures <ref type="figure" target="#fig_1">1</ref> and <ref type="figure" target="#fig_3">2</ref> together provide a high-level overview of our framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Formulae As Operator Trees</head><p>Every formula X is inherently tree structured <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b4">5]</ref> and can be represented as a symbolic operator tree (OT):</p><formula xml:id="formula_0">X = (U, &lt;), u ∈ U, U ⊆ V<label>(1)</label></formula><p>where u is a math symbol (a node in the OT), <ref type="foot" target="#foot_0">1</ref> U is the set of math symbols in the operator tree X, and V is the "vocabulary", i.e., the set of all unique math symbols in the dataset. &lt; represents the partial binary parent-child relation over the nodes u ∈ U <ref type="bibr" target="#b7">[8]</ref>. Procedure (a) in Fig. <ref type="figure" target="#fig_1">1</ref> illustrates the conversion from a formula to its OT. Intuitively, the OT organizes the math symbols in a formula, such as operators, variables, and numerical values, as nodes in an explicit, hierarchical tree structure. We choose the OT because of its intuitive interpretation and rich semantics. We emphasize that our FORTE framework is agnostic to the underlying tree representation; other tree representations such as the symbol layout tree <ref type="bibr" target="#b4">[5]</ref> can also be used.</p><p>Math Symbol Vocabulary. The size of the math symbol vocabulary V may be unbounded (e.g., every element in the real number set R, which is uncountably infinite, could be an element of V); however, most symbols rarely appear. We thus propose the following truncation method in order to work with a finite vocabulary in practice. First, we partition the vocabulary V into five disjoint sub-vocabularies according to symbol type: numeric V_num (numbers, decimals), functional V_fun (multiplication, subtraction, etc.), variable V_var, textual V_txt, and other V_o. We do so because different types of math symbols carry different semantic meanings. 
Then, we retain only the K most frequent symbols in each sub-vocabulary and convert all others to an "unknown" symbol specific to that type. This setup yields a finite vocabulary in practice while preserving the type-level semantics of infrequent symbols.</p></div>
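As an illustrative sketch (not code from the paper), the per-type vocabulary truncation described above could be implemented as follows; the type labels, the "UNK_" token convention, and the `top_k` parameter are our own naming assumptions:

```python
from collections import Counter

def build_vocabulary(symbols_with_types, top_k):
    """Keep the top_k most frequent symbols per type (sketch of Sec. 2.1).

    symbols_with_types: iterable of (symbol, type) pairs, where type is
    one of "num", "fun", "var", "txt", "o".
    """
    counts = {}  # type -> Counter over symbols of that type
    for sym, typ in symbols_with_types:
        counts.setdefault(typ, Counter())[sym] += 1

    # Retain only the top_k most frequent symbols in each sub-vocabulary.
    return {typ: {s for s, _ in ctr.most_common(top_k)}
            for typ, ctr in counts.items()}

def lookup(sym, typ, vocab):
    # Infrequent symbols collapse to an "unknown" token that still carries
    # the symbol's type, preserving type-level semantics.
    return sym if sym in vocab.get(typ, set()) else "UNK_" + typ
```

A symbol outside the retained set maps to a type-specific unknown token, so, e.g., a rare variable and a rare function symbol remain distinguishable.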
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">The MLP Problem Formulation</head><p>We set up the MLP problem as an unsupervised "autoencoding" task (see, e.g., Ch. 14 in <ref type="bibr" target="#b6">[7]</ref>), motivated by the downstream tasks that we envision our framework will perform. Specifically, our framework aims to reconstruct the input formula in its OT representation through an encoder-decoder bottleneck model design. This problem setup allows us to use the latent embedding from the encoder output for many downstream tasks, e.g., formula retrieval, and the generated formula from the decoder output for generation-related tasks.</p><p>Concretely, the training objective of our framework is</p><formula xml:id="formula_1">L(θ, φ) = − (1/N) Σ_{i=1}^{N} P_x(X_out^(i)) log f_d(f_e(X_in^(i); θ); φ),<label>(2)</label></formula><p>where N is the number of formulae, f_e is the encoder function with parameters θ, f_d is the decoder function with parameters φ, and P_x is the empirical data distribution.</p><formula xml:id="formula_2">X_in^(i) and X_out^(i)</formula><p>are the input and output representations of the i-th formula tree in our new dataset, which we will introduce in Sec. 3. We drop the data point index i in the remainder of the paper for simplicity of exposition.</p></div>
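Under a teacher-forced reading of the objective in Eq. (2), the loss reduces to an average negative log-likelihood of the target tree's nodes. A minimal sketch, assuming the decoder's predicted probabilities for the ground-truth target nodes are already available (the data layout here is our own assumption):

```python
import math

def autoencoding_loss(batch_probs):
    """Negative log-likelihood sketch of Eq. (2), averaged over a batch.

    batch_probs: for each formula i, a list of the decoder's predicted
    probabilities for the ground-truth target nodes of X_out^(i),
    obtained under teacher forcing.
    """
    total = 0.0
    for probs in batch_probs:
        # Log-likelihood of one formula tree is the sum of per-node log-probs.
        total += sum(math.log(p) for p in probs)
    return -total / len(batch_probs)  # average over the N formulae
```

A perfect decoder (probability 1 on every target node) yields zero loss; lower-probability predictions increase it.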
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Formula Tree Encoder</head><p>Our tree encoder takes a formula tree as input and outputs an embedding of this formula. The key idea is to properly encode all information underlying the formula tree. To this end, we use two methods: tree traversal, which extracts content (node) information, and tree positional encoding, which extracts structural (relative positions of nodes) information. Figure <ref type="figure" target="#fig_1">1</ref> provides an overview of our formula tree encoder.</p><p>Formula Tree Traversal and Node Embedding. To obtain the content in a formula tree, we employ tree traversal, which visits each node in the tree in a particular order and extracts its content. Process (b) in Fig. <ref type="figure" target="#fig_1">1</ref> illustrates this process. In this work, we consider traversal using depth-first search (DFS), although other traversal orders can also be used. This step returns a DFS-ordered list of nodes {u_t}_{t=1}^{T}, where t is the position of node u_t in the DFS order and T is the number of nodes in the formula tree. Each node is then represented as a trainable embedding x_t ∈ R^M with dimension M.</p><p>Tree Positional Embedding. To extract the structure of a formula tree, we propose a two-step method that first retains and then embeds the relative positions of nodes in the tree. First, we recursively encode the position of node u_t as q_t ∈ R^{ℓ_t}, where ℓ_t is the depth of node u_t in the tree. q_t is composed of the node's parent's position appended with the node's relative position among its siblings; q_t = [0] for the root node. The formula tree in Fig. <ref type="figure" target="#fig_1">1</ref> illustrates this step. For example, the position [0, 1, 1] of the numeric node "4" is composed of [0, 1], which is its parent's position, and [1], because it is the second child of its parent. 
Second, we propose a binary tree positional embedding to convert the encoded position q_t of each node to a fixed-dimensional vector p_t ∈ R^D, where</p><formula xml:id="formula_3">p_t[⌈log_2(C)⌉ j : ⌈log_2(C)⌉ (j+1)] = bin(q_t[j]), ∀ j = 0, . . . , ℓ_t.<label>(3)</label></formula><p>Here, C is the maximum degree (i.e., the maximum number of children of any node) over all trees in the dataset, ⌈•⌉ is the ceiling function, bin(•) is the binarization operator (e.g., bin(5) = 101), and p_t[a : b] selects indices a through b − 1 of the vector p_t. The resulting dimension of the tree positional embedding p_t is D = L⌈log_2(C)⌉, where L is the maximum depth over all trees in the dataset.</p><p>Formula Tree Embedding. To transform the formula tree into its embedding, we utilize an embedding function f_e : R^{(M+D)×T} → R^K, where K is the dimension of the formula tree embedding and T is the total number of nodes in the formula tree. Because M is not necessarily the same as D, we concatenate (rather than add) the node and tree positional embeddings. Concretely, the formula tree embedding is computed as</p><formula xml:id="formula_4">h = f_e({x̃_t}_{t=1}^{T}; θ), x̃_t = [x_t; p_t].<label>(4)</label></formula><p>There are many options for instantiating the embedding function f_e because, thanks to our node and tree positional embedding methods, the tree content and structure are fully preserved in the encoded input sequence. In this work, we use a gated recurrent unit network (GRU) <ref type="bibr" target="#b3">[4]</ref> for f_e, but one can freely choose other appropriate models.</p><p>Relation to Prior Work. Our tree encoder design differs from existing approaches in two regards. First, compared to <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b2">3]</ref>, which perform tree traversal during training and thus only allow a single data point per iteration, our encoder performs traversal before training, which enables mini-batch processing during training. 
As a result, our approach removes this computationally expensive traversal step from the training process and significantly speeds up training. Second, compared to <ref type="bibr" target="#b16">[17]</ref>, which uses a one-hot-style tree positional embedding, our encoder employs a binary tree positional embedding that reduces the space complexity from O(LC) to O(L⌈log_2(C)⌉). This reduction is especially significant for trees with a large maximum degree.</p></div>
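The two encoder steps above, the recursive position encoding and the binary positional embedding of Eq. (3), can be sketched as follows; the nested-tuple tree format and the function names are our own assumptions:

```python
import math

def node_positions(tree):
    """DFS over a (label, [children]) tree; returns (label, q) pairs where
    q is the recursive position of Sec. 2.3: the parent's position with the
    node's child index appended, and [0] for the root."""
    out = []
    def dfs(node, q):
        label, children = node
        out.append((label, q))
        for i, child in enumerate(children):
            dfs(child, q + [i])
    dfs(tree, [0])
    return out

def binary_positional_embedding(q, max_degree, max_depth):
    """Eq. (3) sketch: write each coordinate q[j] in ceil(log2(C)) bits
    into the j-th slot of a D = L * ceil(log2(C)) dimensional vector."""
    width = math.ceil(math.log2(max_degree))
    p = [0] * (max_depth * width)
    for j, coord in enumerate(q):
        bits = format(coord, "b").zfill(width)[-width:]  # fixed-width binary
        for k, b in enumerate(bits):
            p[j * width + k] = int(b)
    return p
```

For the paper's running example, the node at position [0, 1, 1] with C = 4 and L = 3 maps to the 6-dimensional vector [0, 0, 0, 1, 0, 1].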
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Formula Tree Decoder</head><p>The decoder takes a formula embedding vector, i.e., the output from our tree encoder, as input and generates a formula tree as output. We face two main challenges when designing the decoder. First, the terminating condition of tree generation is unclear, because the formula tree can have multiple leaf nodes, each of which terminates the generation of a single branch but not necessarily of the entire tree. Second, the order of generation, i.e., which node to generate next, is unclear, because there are at least two directions at each node: its siblings (horizontal) and its children (vertical). We now present our decoder, which tackles these two challenges by modifying the decoder target relative to the encoder input (recall that in a usual autoencoding task, the input and target are exactly the same) and by generating via tree traversal. We also present our novel tree beam search generation algorithm, which improves formula tree generation quality at test time.</p><p>Modified Decoder Target Tree. To address the challenge of the unclear generation termination condition, we propose to slightly modify the decoder target relative to the input formula tree by attaching an additional special "end" node to each node as its last child. Doing so informs the decoder when a node has no more children to generate and provides a clear generation termination signal for each node. Figure <ref type="figure" target="#fig_3">2a</ref> compares the encoder's input and the decoder's target for the same formula tree.</p><p>Generation Via Tree Traversal. To address the challenge of the unclear generation order, we propose to traverse the tree during generation in the same order as the encoder. A data structure is required to track the traversal process. Therefore, for an encoder using DFS traversal, we propose to use a stack that maintains the nodes whose generation is unfinished, in DFS order. 
In addition to the node itself, the stack also maintains the number of children and the position of each node in the stack.</p><p>During generation, we use the current node's embedding x̃_t together with the next node's position p_{t+1} to generate the next node. We do so because a node can have multiple children; therefore, we use the next node's position to inform the model which node to generate next. This is a major difference from typical sequence generation models, e.g., Transformers, in which the input position is the input token's own position. Concretely, the next node is generated as</p><formula xml:id="formula_5">u_{t+1} = argmax_{u ∈ V} softmax(f_d({x̃_s}_{s=1}^{t}; φ)),<label>(5)</label></formula><p>where s = 1, . . . , t and x̃_s = [x_s; p_{s+1}; h], i.e., we also concatenate the formula tree embedding h to the decoder's input at each generation step. The next node's position p_{s+1} is computed using the number of children and the position of the current input node; see Section 2.3 for details. When a node's generation finishes, as signaled by the generation of an "end" node, this node is removed from the stack. The "end" node itself is never added to the stack. The entire tree generation is finished when the stack is empty, i.e., when there are no more nodes to expand. We use a special "start" token to mark the beginning of the generation process. Figure <ref type="figure" target="#fig_3">2b</ref> illustrates the generation process.</p><p>Tree Beam Search for Tree Generation. The above generation process is a greedy algorithm, which may be sub-optimal. To improve tree generation quality, we propose tree beam search (TBS). The idea is to maintain a stack for each beam, which records the node generation order. During the generation process, the decoder generates the top B most probable next nodes from the current generated tree of each beam, resulting in a total of B^2 candidate trees. We then select the B most probable candidate trees to expand further. 
This process continues until B trees finish generation or until a preset maximum number of steps has been reached. TBS extends beam search in NLP (e.g., in <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b0">1]</ref>), which concerns only sequential data and cannot generate tree-structured data. Similar to beam search in NLP, by expanding the search space of candidate trees by a factor of B, TBS enables more flexible and higher-quality generation during decoding than greedy generation.</p><p>Relation to Prior Work. To our knowledge, this is the first beam search algorithm for generating tree-structured data. Our work differs from <ref type="bibr" target="#b28">[29]</ref>, which proposed a tree beam search algorithm for recommender systems that treats the entire dataset as a tree, where each node is a data point (user, item); in our work, we treat each data point (formula) as a tree. Our work also differs from <ref type="bibr" target="#b16">[17]</ref>, which specifies the number of children each node must have; this significantly constrains the generation process, unnecessarily enlarges the node vocabulary, and leads to worse generation quality. In our work, there is no constraint on the number of children each node may have, resulting in flexible and varied generated formula trees.</p></div>
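The greedy variant of the stack-based decoding loop might look like the sketch below; `next_node_fn` is a hypothetical stand-in for the trained decoder of Eq. (5), "E" marks the special "end" node, and TBS would maintain B copies of this stack, one per beam:

```python
def generate_tree(next_node_fn, max_steps=100):
    """Greedy sketch of FORTE's stack-based decoding (Sec. 2.4).

    next_node_fn(generated, position) returns the next node label given
    the (label, position) pairs generated so far and the next position to
    fill; it returns "E" when a node has no more children. Returns the
    generated tree as nested (label, [children]) pairs."""
    root_pos = [0]
    root_label = next_node_fn([], root_pos)
    root = (root_label, [])
    stack = [(root, root_pos)]               # unfinished nodes, DFS order
    generated = [(root_label, root_pos)]
    for _ in range(max_steps):
        if not stack:
            break                            # empty stack: tree is complete
        node, pos = stack[-1]
        next_pos = pos + [len(node[1])]      # position of node's next child
        label = next_node_fn(generated, next_pos)
        if label == "E":
            stack.pop()                      # "end" node: this node is done;
            continue                         # "E" itself is never pushed
        child = (label, [])
        node[1].append(child)
        generated.append((label, next_pos))
        stack.append((child, next_pos))      # descend, preserving DFS order
    return root
```

Feeding a scripted DFS node sequence with end markers, e.g. "+", "a", "E", "b", "E", "E", reconstructs the tree with root "+" and children "a" and "b".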
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments</head><p>We conduct two experiments to validate FORTE. In the first experiment, we demonstrate the advantage of FORTE over other tree- and sequence-based formula generation methods on the formula reconstruction task. In the second experiment, we demonstrate an application of FORTE in the context of the formula retrieval task and show its advantage over existing formula retrieval systems. In all experiments, our framework uses a 2-layer bidirectional GRU for the encoder and a 2-layer unidirectional GRU for the decoder.</p><p>Dataset. We collected a large real-world dataset of more than 770k formulae from a subset of articles on Wikipedia and arXiv. We extracted formulae from these articles and processed them into OT representations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Formula Reconstruction</head><p>In this experiment, we test FORTE's ability to reconstruct a formula. Because some baselines only work on binary trees <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b16">17]</ref>, we select a subset of 170k formulae whose operator trees are binary.</p><p>Baselines. We consider the following baselines: seq2seqRNN, which implements the same encoder and decoder as our framework but processes formulae as sequences of math symbols; tree2treeRNN <ref type="bibr" target="#b2">[3]</ref>, an RNN-based method capable of encoding and decoding only binary trees; and treeTransformer <ref type="bibr" target="#b16">[17]</ref>, a Transformer-based method that shows success only on binary trees. The latter two baselines were originally developed and evaluated on a task (program translation) very different from MLP. We also include four variants of our framework to evaluate the utility of (1) the binary versus the one-hot tree positional embedding and (2) TBS versus greedy search for tree generation. We construct training, validation, and test sets by splitting the 170k dataset 80%-10%-10%. We train each model 5 times for 50 epochs each and record the model with the best performance on the validation set. We then perform formula reconstruction on the test set using beam size B = 10 for the applicable methods and report the average values of the following metrics on the test set.</p><p>Evaluation Metrics. We use two groups of metrics. The first group measures the reconstruction accuracy, i.e., the percentage of generated formulae that exactly match the ground truth. We compute both ACC top-1, using only the most probable generated formula, and ACC top-5, using the five most probable generated formulae. The second group measures how much the generated formula tree differs from the ground-truth formula tree. 
We use tree edit distance (TED), which measures the distance between two trees as the minimum number of operations, including changing nodes and node connections, needed to convert one tree into the other. See <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b12">13]</ref> for an overview of the TED algorithm. We compute both TED-overall, which considers both node and connection edits, and TED-structural, which considers only connection edits.</p><p>Results. Table <ref type="table" target="#tab_2">2</ref> presents the formula reconstruction results comparing our framework with the baselines. Compared to the two tree2tree baselines, which struggle at this task, the encoder and decoder designs in FORTE enable near-perfect formula reconstruction. Compared to the seq2seqRNN baseline, FORTE shows that processing formulae as OTs improves over processing them as sequences. Moreover, the results from the four FORTE variants clearly demonstrate the benefits of the binary tree positional embedding and TBS, which improve all four metrics compared to the one-hot tree positional embedding and greedy search, respectively. We repeat this experiment on the full dataset comparing only seq2seqRNN and FORTE, since the tree-based baselines cannot process non-binary trees. FORTE achieves 85.87% versus seq2seqRNN's 84.30% on top-1 ACC and 90.30% versus seq2seqRNN's 88.52% on top-5 ACC. These results further show the benefit of representing formulae as OTs rather than as sequences.</p></div>
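The ACC top-k metrics above are simple to state precisely; a sketch, assuming each method returns its generated candidates ordered from most to least probable (e.g., the B beams from tree beam search):

```python
def reconstruction_accuracy(candidates, ground_truths, top_k=1):
    """ACC top-k sketch: the fraction of formulae whose ground-truth tree
    appears among the top_k most probable generated candidates.

    candidates: per formula, a list of generated trees ordered from most
    to least probable; ground_truths: the matching reference trees."""
    hits = sum(gt in cands[:top_k]
               for cands, gt in zip(candidates, ground_truths))
    return hits / len(ground_truths)
```

With top_k=1 this is exact-match accuracy of the single most probable generation; with top_k=5 it credits a hit anywhere in the five most probable generations.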
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Formula Retrieval</head><p>In this qualitative experiment, we evaluate FORTE's capabilities in a formula retrieval application. Given an input formula (query), a retrieval method aims to return the top relevant formulae (retrievals) from a collection of formulae. We use the entire 770k formula dataset to train our framework and then use the trained encoder to obtain an embedding for each formula. For each query, we compute the cosine similarity between its embedding and the embedding of each formula in the dataset. Finally, we choose the formulae with the highest similarity scores as the retrievals. We use the queries from the NTCIR-12 formula retrieval task <ref type="bibr" target="#b24">[25]</ref>. Table <ref type="table" target="#tab_0">1</ref> shows a few examples of the queries.</p><p>Model and Baselines. We consider three state-of-the-art baselines designed specifically for the formula retrieval task including Tangent-CFT <ref type="bibr" target="#b10">[11]</ref>, which is one of the few data-driven formula retrieval systems to date, and Tangent-S <ref type="bibr" target="#b4">[5]</ref> and Approach0 <ref type="bibr" target="#b27">[28]</ref>, both of which are based on symbolic sub-tree matching and are data independent. We train Tangent-CFT on the same dataset as FORTE.</p><p>Evaluation Metrics. Because it is difficult to algorithmically judge the relevance of a retrieval to a query, we perform a human evaluation for this task as follows. First, for each method and each query, we choose the top 25 retrieved formulae and mix them into a single pool of retrievals. Second, for each query, three human evaluators independently provide a ternary rating for each retrieval in the pool, i.e., whether the retrieval is relevant, half-relevant, or irrelevant to the query. The above evaluation procedure is consistent with <ref type="bibr" target="#b24">[25]</ref>, including the number of evaluators involved. 
We then use the mean average precision (MAP) <ref type="bibr" target="#b20">[21]</ref> and bpref <ref type="bibr" target="#b1">[2]</ref> as the evaluation metrics. Compared to other retrieval evaluation metrics, both MAP and bpref are easy to interpret and appropriate for evaluating multiple queries and for comparing multiple retrieval systems.</p><p>Results. Table <ref type="table" target="#tab_3">3</ref> presents the quantitative evaluation results, averaged over the three evaluators' scores. Both metrics indicate that our framework achieves superior performance to the other data-driven baseline, Tangent-CFT, and to one of the data-independent baselines, Tangent-S. Our method falls slightly behind Approach0. This may be because Approach0 performs explicit symbolic matching directly on subtrees, whereas we rely on more abstract vector representations of formulae, which may lose information during similarity computation. This result is consistent with the existing literature <ref type="bibr" target="#b10">[11]</ref>. We then propose a new model, FORTE-App, which combines the strengths of FORTE and Approach0. The last row of Table <ref type="table" target="#tab_3">3</ref> shows that the combined method achieves new state-of-the-art performance on the formula retrieval experiment. In particular, the bpref score implies that, on average, in FORTE's retrieved formulae, relevant formulae (as judged by the evaluators) rank higher than irrelevant ones more often than in those retrieved by the baselines.</p></div>
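The retrieval procedure, embed each formula, score by cosine similarity to the query embedding, and return the top matches, can be sketched as follows; the corpus layout is an assumption, and in practice the embeddings would come from the trained FORTE encoder:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_emb, corpus, top_n=25):
    """Rank corpus formulae by cosine similarity to the query embedding
    (sketch of the retrieval setup in Sec. 3.2). corpus maps a formula
    identifier to its encoder embedding."""
    scored = sorted(corpus.items(),
                    key=lambda item: cosine(query_emb, item[1]),
                    reverse=True)
    return [fid for fid, _ in scored[:top_n]]
```

The top 25 retrievals per query produced this way would then be pooled and rated by the human evaluators as described above.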
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions and Future Work</head><p>In this work, we propose FORTE, a novel, unsupervised mathematical language processing framework that leverages tree embeddings. By encoding formulae as operator trees, we can explicitly capture the inherent structure and semantics of a formula. We propose an encoder and a decoder capable of embedding and generating formula trees, respectively, and a novel tree beam search algorithm to improve generation quality at test time. We evaluate our framework on formula reconstruction and formula retrieval tasks and demonstrate its superior performance compared to baselines in both experiments. There are many avenues for future work. One direction is to combine our framework's dedicated capability to encode and generate formulae with state-of-the-art NLP methods to enable cross-modality applications that involve both mathematical and natural language. For example, our framework can serve as a drop-in replacement for the formula processing part of a number of existing works to potentially improve performance, e.g., in <ref type="bibr" target="#b22">[23]</ref> for joint text and math retrieval, in <ref type="bibr" target="#b23">[24]</ref> for math headline generation, in <ref type="bibr" target="#b9">[10]</ref> for grading students' math homework solutions, and in <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b8">9]</ref> for neural math reasoning. We also plan to conduct a more thorough human evaluation for relevant formula retrieval.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>N</head><label></label><figDesc>= 0.5 − log_2(frequency of this item / frequency of most common item) (physics); ^238_92 U + ^64_28 Ni → ^302_120 Ubn* → fission only (chemistry); ax^2 + bx + c = 0 (algebra)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Illustration of FORTE's encoding process of a formula.</figDesc><graphic coords="2,134.77,209.26,345.84,59.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>(a) Input and output formula tree. (b) The decoding process.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: (2a) Illustration of FORTE's input and output operator tree of the same formula. The "E" nodes represent the special "end" node attached as the last child to every node. (2b) Illustration of FORTE's decoding process at a particular time step. First, the position of the next node to be generated is computed (dark blue). Next, the next node (light blue) is generated by the decoder using the already generated nodes and positions and the newly computed position. Finally, the partial tree and the stack are updated.</figDesc><graphic coords="3,151.62,222.09,315.19,82.54" type="bitmap" /></figure>
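The decoding loop described in Fig. 2b can be sketched as follows. This is a minimal illustration of the position/stack bookkeeping only, not FORTE's actual model: `next_node` is a hypothetical stub that replays a fixed pre-order decision stream in place of the trained decoder.

```python
END = "E"  # special "end" node closing each node's child list (Fig. 2a)

def decode_tree(next_node):
    """Rebuild an operator tree from a stream of node-label decisions.

    The stack holds nodes whose child lists are still open; the position of
    the next node is always "next child of the stack top".
    """
    root = {"label": next_node(), "children": []}
    stack = [root]                     # nodes with open child lists
    while stack:
        parent = stack[-1]             # position of next node (dark blue)
        label = next_node()            # generate the next node (light blue)
        if label == END:
            stack.pop()                # close this node's child list
        else:
            child = {"label": label, "children": []}
            parent["children"].append(child)   # update the partial tree
            stack.append(child)                # and the stack
    return root

def replay(stream):
    """Stub decoder: emits a fixed stream of labels, one per call."""
    it = iter(stream)
    return lambda: next(it)

# Example: the pre-order stream for the formula a + b with end markers.
tree = decode_tree(replay(["+", "a", "E", "b", "E", "E"]))
```

Under this bookkeeping, the stream `["+", "a", "E", "b", "E", "E"]` reconstructs a root "+" with children "a" and "b"; decoding terminates exactly when the root's child list is closed.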
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>A few examples of mathematical language in our context.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>Formula reconstruction results. FORTE outperforms all other methods.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>Formula retrieval results.</figDesc><table><row><cell>Methods</cell><cell>map</cell><cell>bpref</cell></row><row><cell>Approach0</cell><cell>0.486</cell><cell>0.507</cell></row><row><cell>Tangent-S</cell><cell>0.461</cell><cell>0.472</cell></row><row><cell>TangentCFT</cell><cell>0.462</cell><cell>0.464</cell></row><row><cell>FORTE</cell><cell>0.475</cell><cell>0.485</cell></row><row><cell>FORTE-App</cell><cell>0.509</cell><cell>0.513</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We will refer to "math symbol" and "node" interchangeably depending on context.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno>arXiv e-prints</idno>
		<title level="m">Neural Machine Translation by Jointly Learning to Align and Translate</title>
				<imprint>
			<date type="published" when="2014-09">Sep 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Retrieval evaluation with incomplete information</title>
		<author>
			<persName><forename type="first">C</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Voorhees</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. ACM SIGIR Conf. Res. Develop. Info. Retrieval</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="25" to="32" />
		</imprint>
	</monogr>
	<note>SIGIR &apos;04</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Tree-to-tree neural networks for program translation</title>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Neural Info. Process. Syst.</title>
				<meeting>Intl. Conf. Neural Info. Process. Syst.</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2552" to="2562" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Learning phrase representations using RNN encoder-decoder for statistical machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Conf. Empirical Methods Natural Lang. Process</title>
				<meeting>Conf. Empirical Methods Natural Lang. Process.</meeting>
		<imprint>
			<date type="published" when="2014-10">Oct 2014</date>
			<biblScope unit="page" from="1724" to="1734" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Layout and semantics: Combining representations for mathematical formula search</title>
		<author>
			<persName><forename type="first">K</forename><surname>Davila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. ACM SIGIR Conf. Res. Develop. Info. Retrieval</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1165" to="1168" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Preliminary Exploration of Formula Embedding for Mathematical Information Retrieval: can mathematical formulae be embedded like a natural language?</title>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tang</surname></persName>
		</author>
		<idno>arXiv e-prints</idno>
		<imprint>
			<date type="published" when="2017-07">Jul 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Deep Learning</title>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Set theory -an introduction to independence proofs</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kunen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Studies in logic and the foundations of mathematics</title>
				<meeting><address><addrLine>North-Holland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1983">1983</date>
			<biblScope unit="volume">102</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning for symbolic mathematics</title>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Charton</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=S1eZYeHFDS" />
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Learn. Representations</title>
				<meeting>Intl. Conf. Learn. Representations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Mathematical language processing: Automatic grading and feedback for open response mathematical questions</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vats</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Waters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">G</forename><surname>Baraniuk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. ACM Conf. Learn. @ Scale</title>
				<meeting>ACM Conf. Learn. @ Scale</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="167" to="176" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">TangentCFT: An embedding model for mathematical formulas</title>
		<author>
			<persName><forename type="first">B</forename><surname>Mansouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rohatgi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Oard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Giles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. ACM SIGIR Conf. Res. Develop. Info. Retrieval</title>
				<meeting>Intl. ACM SIGIR Conf. Res. Develop. Info. Retrieval</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="11" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Tree-structured attention with hierarchical accumulation</title>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">P</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Joty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hoi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=HJxK5pEYvr" />
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Learn. Representations</title>
				<meeting>Intl. Conf. Learn. Representations</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Efficient computation of the tree edit distance</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pawlik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Augsten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Database Syst</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2015-03">Mar 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Tree edit distance: Robust and memory-efficient</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pawlik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Augsten</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Info. Syst</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="157" to="173" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Analysing mathematical reasoning abilities of neural models</title>
		<author>
			<persName><forename type="first">D</forename><surname>Saxton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kohli</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=H1gR5iR5FX" />
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Learn. Representations</title>
				<meeting>Intl. Conf. Learn. Representations</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Ordered neurons: Integrating tree structures into recurrent neural networks</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sordoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=B1l6qiR5F7" />
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Learn. Representations</title>
				<meeting>Intl. Conf. Learn. Representations</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Novel positional encodings to enable tree-based transformers</title>
		<author>
			<persName><forename type="first">V</forename><surname>Shiv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quirk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Neural Info. Process. Syst</title>
				<meeting>Intl. Conf. Neural Info. Process. Syst.</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="12081" to="12091" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Transformer based multi-grained attention network for aspect-based sentiment analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="211152" to="211163" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Sequence to sequence learning with neural networks</title>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Neural Info. Process. Syst.</title>
				<meeting>Intl. Conf. Neural Info. Process. Syst.</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3104" to="3112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015-02">Feb 2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Voorhees</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">K</forename><surname>Harman</surname></persName>
		</author>
		<title level="m">TREC: Experiment and evaluation in information retrieval</title>
				<imprint>
			<publisher>MIT press Cambridge</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">63</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Tree transformer: Integrating tree structures into self-attention</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Y</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Conf. Empirical Methods Natural Lang. Process. and Intl. Joint Conf. Natural Lang. Process.</title>
				<meeting>Conf. Empirical Methods Natural Lang. Process. and Intl. Joint Conf. Natural Lang. Process.</meeting>
		<imprint>
			<date type="published" when="2019-11">Nov 2019</date>
			<biblScope unit="page" from="1061" to="1070" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Yasunaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AAAI Conf. Artificial Intell.</title>
				<meeting>AAAI Conf. Artificial Intell.</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Automatic generation of headlines for online math questions</title>
		<author>
			<persName><forename type="first">K</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">L</forename><surname>Giles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. AAAI Conf. Artificial Intell.</title>
				<meeting>AAAI Conf. Artificial Intell.</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9490" to="9497" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">NTCIR-12 MathIR task overview</title>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Aizawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kohlhase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ounis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Topic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Davila</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. NTCIR Conf. Eval. Info. Access</title>
				<meeting>NTCIR Conf. Eval. Info. Access</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Recognition and retrieval of mathematical expressions</title>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Blostein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Intl. J. Document Anal. Recognit</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="331" to="357" />
			<date type="published" when="2012-12">Dec 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Accelerating substructure similarity search for formula retrieval</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rohatgi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Giles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. European Conf. Info. Retrieval</title>
				<meeting>European Conf. Info. Retrieval</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="714" to="727" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Structural similarity search for formulas using leaf-root paths in operator subtrees</title>
		<author>
			<persName><forename type="first">W</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zanibbi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. European Conf. Info. Retrieval</title>
				<editor>
			<persName><forename type="first">L</forename><surname>Azzopardi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Fuhr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Mayr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Hauff</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Hiemstra</surname></persName>
		</editor>
		<meeting>European Conf. Info. Retrieval</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="116" to="129" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Learning optimal tree models under beam search</title>
		<author>
			<persName><forename type="first">J</forename><surname>Zhuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Intl. Conf. Mach. Learn</title>
				<meeting>Intl. Conf. Mach. Learn</meeting>
		<imprint>
			<date type="published" when="2020-07">Jul 2020</date>
			<biblScope unit="volume">119</biblScope>
			<biblScope unit="page" from="13" to="18" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
