Gröbner basis computation via learning
                         Hiroshi Kera1,∗ , Yuki Ishihara2 , Tristan Vaccon3 and Kazuhiro Yokoyama4
                         1
                           Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba-shi, Chiba, 2638522, Japan
                         2
                           Nihon University, 1-8-14 Kanda Surugadai, Chiyoda-ku, Tokyo, 1018308, Japan
                         3
                           Université de Limoges; CNRS, XLIM, UMR 7252, Limoges, France
                         4
                           Rikkyo University, 3-34-1, Nishi-Ikebukuro, Toshima-ku, Tokyo, 1718501, Japan


                                     Abstract
                                     Solving a polynomial system, or computing an associated Gröbner basis, has been a fundamental task in compu-
                                     tational algebra. However, it is also known for its notorious doubly exponential time complexity in the number
                                     of variables in the worst case. This paper is the first to address the learning of Gröbner basis computation with
                                     Transformers. The training requires many pairs of a polynomial system and the associated Gröbner basis, raising
                                     two novel algebraic problems: random generation of Gröbner bases and transforming them into non-Gröbner
                                     ones, termed as backward Gröbner problem. We resolve these problems with 0-dimensional radical ideals, the
                                     ideals appearing in various applications. The experiments show that our dataset generation method is at least
                                     three orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute
                                     Gröbner bases, and Gröbner computation is learnable in a particular class.

                                     Keywords
                                     Gröbner Bases, Machine Learning, Transformer


                         1. Introduction
                         Understanding the properties of polynomial systems and solving them have been a fundamental problem
                         in computational algebra and algebraic geometry with vast applications in cryptography [1, 2], control
                         theory [3], statistics [4, 5], computer vision [6], systems biology [7], and so forth. Special sets of
                         polynomials called Gröbner bases [8] play a key role to this end. In linear algebra, the Gaussian
                         elimination simplifies or solves a system of linear equations by transforming its coefficient matrix
                         into the reduced row echelon form. Similarly, a Gröbner basis can be regarded as a reduced form
                         of a given polynomial system, and its computation is a generalization of the Gaussian elimination
                         to general polynomial systems. However, computing a Gröbner basis is known for its notoriously
                         bad computational cost in theory and practice. It is an NP-hard problem with the doubly exponential
                         worst-case time complexity in the number of variables [9, 10]. Nevertheless, because of its importance,
                         various algorithms have been proposed in computational algebra to obtain Gröbner bases in better
                         runtime. Examples include Faugère’s F4/F5 algorithms [11, 12] and M4GB [13].
                            In this study, we investigate Gröbner basis computation from a learning perspective, envisioning
                         it as a practical compromise to address large-scale polynomial system solving and understanding,
                         where mathematical algorithms are computationally intractable. The learning approach does not
                         require explicit design of computational procedures, and we only need to train a model using a large
                         amount of (non-Gröbner set, Gröbner basis) pairs. Further, if we restrict ourselves to a particular
                         class of Gröbner bases (or associated ideals), the model may internally find some patterns useful for
                         prediction. The success of learning indicates the existence of such patterns, which encourages the
                         improvement of mathematical algorithms and heuristics. Several recent studies have already addressed

                         SCSS 2024: 10th International Symposium on Symbolic Computation in Software Science, August 28–30, 2024, Tokyo, Japan
                         ∗
                             Corresponding author.
                         Envelope-Open kera@chiba-u.jp (H. Kera); ishihara.yuki@nihon-u.ac.jp (Y. Ishihara); tristan.vaccon@unilim.fr (T. Vaccon);
                         kazuhiro@rikkyo.ac.jp (K. Yokoyama)
                         GLOBE https://hkera.wordpress.com (H. Kera); https://researchmap.jp/yishihara (Y. Ishihara);
                         https://www.unilim.fr/pages_perso/tristan.vaccon/ (T. Vaccon)
                         Orcid 0000-0002-9830-0436 (H. Kera); 0000-0003-4057-3703 (Y. Ishihara); 0000-0003-4208-8349 (T. Vaccon); 0000-0002-5072-7799
                         (K. Yokoyama)
                                     © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings


                                                                                                            51
mathematical tasks via learning, particularly using Transformers [14, 15, 16]. For example, [14] showed
that Transformers can learn symbolic integration simply by observing many ( d𝑓/d𝑥 , 𝑓 ) pairs in training.
The training samples are generated by first randomly generating 𝑓 and computing its derivative d𝑓/d𝑥
and/or by the reverse process.
   However, a crucial challenge in the learning of Gröbner basis computation is that it is mathematically
unknown how to efficiently generate many (non-Gröbner set, Gröbner basis) pairs. We need an efficient
backward approach (i.e., solution-to-problem computation) because, as discussed above, the forward
approach (i.e., problem-to-solution computation) is prohibitively expensive. To this end, we frame two
problems: (i) a random generation of Gröbner bases and (ii) a backward transformation from a Gröbner
basis to an associated non-Gröbner set. To our knowledge, neither of them has been addressed in the
study of Gröbner bases because of the lack of motivations; all the efforts have been dedicated to the
forward computation from a non-Gröbner set to Gröbner basis.
   Tackling aforementioned two unexplored algebraic problems, we investigates the first learning
approach to the Gröbner computation using Transformers and experimentally show its learnability
uncovered two unexplored algebraic problems in the 0-dimensional case. Our experiments show that
the proposed dataset generation is highly efficient and faster than a baseline method by three or four
orders of magnitude. Further, we observe a learnability gap between polynomials on finite fields and
infinite fields while predicting polynomial supports are more tractable. Full version of this paper can be
found in [17].


2. New algebraic problems for dataset generation
Our notations and definitions follow [18] except that we call power products of indeterminate terms
instead of monomials. By Gröbner basis computation, we mean computation of reduced Gröbner bases.
Our goal is to realize Gröbner basis computation through learning. To this end, we need a large training
set {(𝐹𝑖 , 𝐺𝑖 )}𝑚
                𝑖=1 with finite polynomial set 𝐹𝑖 ⊂ 𝑘[𝑥1 , … , 𝑥𝑛 ] and Gröbner basis 𝐺𝑖 of ideal ⟨𝐹𝑖 ⟩. As the
computation from 𝐹𝑖 to 𝐺𝑖 is computationally expensive in general, we instead resort to backward
generation (i.e., solution-to-problem process); that is, we generate a Gröbner basis 𝐺𝑖 randomly and
transform it to non-Gröbner set 𝐹𝑖 .

Problem 2.1 (Random generation of Gröbner bases). Find a collection 𝒢 = {𝐺𝑖 }𝑚            𝑖=1 with the reduced
Gröbner basis 𝐺𝑖 ⊂ 𝑘[𝑥1 , … , 𝑥𝑛 ] of ⟨𝐺𝑖 ⟩, 𝑖 = 1, … , 𝑚. The collection should contain diverse bases, and we
need an efficient algorithm for constructing them.

Problem 2.2 (Backward Gröbner problem). Given a Gröbner basis 𝐺 ⊂ 𝑘[𝑥1 , … , 𝑥𝑛 ], find a collection
         𝜇
ℱ = {𝐹𝑖 }𝑖=1 of polynomial sets that are not Gröbner bases but ⟨𝐹𝑖 ⟩ = ⟨𝐺⟩ for 𝑖 = 1, … , 𝜇. The collection
should contain diverse sets, and we need an efficient algorithm for constructing them.

   Problems. 2.1 and 2.2 require the collections 𝒢 , ℱ to contain diverse polynomial sets. Thus, the algo-
rithms for these problems should not be deterministic but should have some controllable randomness.
   What makes the learning of Gröbner basis computation hard is that, to our knowledge, neither (i) a
random generation of Gröbner basis nor (ii) the backward transform from Gröbner basis to non-Gröbner
set has been considered in computational algebra. Its primary interest has been instead posed on
Gröbner basis computation (i.e., forward generation), and nothing motivates the random generation of
Gröbner basis nor the backward transform. Interestingly, machine learning now sheds light on them.
Formally, we address the following problems for dataset generation.
   In this paper, we tackle these problems in the case of radical 0-dimensional ideals. We first address
Prob. 2.1 using the fact that 0-dimensional radical ideals are generally in shape position.

Definition 2.3 (Shape position). Ideal 𝐼 ⊂ 𝑘[𝑥1 , … , 𝑥𝑛 ] is called in shape position if some univariate
polynomials ℎ, 𝑔1 , … , 𝑔𝑛−1 ∈ 𝑘[𝑥𝑛 ] form the reduced ≺lex -Gröbner basis of 𝐼 as follows.

                                      𝐺 = {ℎ, 𝑥1 − 𝑔1 , … , 𝑥𝑛−1 − 𝑔𝑛−1 }.                                (2.1)


                                                      52
  Particularly, 0-dimensional radical ideals are almost always in shape position if 𝑘 is an infinite field
or finite field with large field order [19, 20]. With this fact, an efficient sampling of Gröbner bases of
0-dimensional radical ideals can be realized by sampling 𝑛 polynomials in 𝑘[𝑥𝑛 ], i.e., ℎ, 𝑔1 , … , 𝑔𝑛−1 with
ℎ ≠ 0. We have to make sure that the degree of ℎ is always greater than that of 𝑔1 , … , 𝑔𝑛−1 , which is
necessary and sufficient for 𝐺 to be a reduced Gröbner basis. This approach involves efficiency and
randomness, and thus resolving Prob. 2.1. To address Prob. 2.2, we consider the following problem.
Problem 2.4. Let 𝐼 ⊂ 𝑘[𝑥1 , … , 𝑥𝑛 ] be a 0-dimensional ideal, and let 𝐺 = (𝑔1 , … , 𝑔𝑡 )⊤ ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑡 be
its ≺-Gröbner basis with respect to term order ≺.1 Find a polynomial matrix 𝐴 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑠×𝑡 giving a
non-Gröbner set 𝐹 = (𝑓1 , … , 𝑓𝑠 )⊤ = 𝐴𝐺 such that ⟨𝐹 ⟩ = ⟨𝐺⟩.
                                                                                                           𝑡
   Namely, we generate a set of polynomials 𝐹 = (𝑓1 , … , 𝑓𝑠 )⊤ from 𝐺 = (𝑔1 , … , 𝑔𝑡 )⊤ by 𝑓𝑖 = ∑𝑗=1 𝑎𝑖𝑗 𝑔𝑗 for
𝑖 = 1, … , 𝑠, where 𝑎𝑖𝑗 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ] denotes the (𝑖, 𝑗)-th entry of 𝐴. Note that ⟨𝐹 ⟩ and ⟨𝐺⟩ are generally
not identical, and the design of 𝐴 such that ⟨𝐹 ⟩ = ⟨𝐺⟩ is of our question.
   A similar question was studied without the Gröbner condition in [21, 22]. They provided an algebraic
necessary and sufficient condition for the polynomial system of 𝐹 to have a solution outside the variety
defined by 𝐺. This condition is expressed explicitly by multivariate resultants. However, strong
additional assumptions are required: 𝐴, 𝐹 , 𝐺 are homogeneous, 𝐺 is a regular sequence, and in the end,
⟨𝐹⟩ = ⟨𝐺⟩ is only satisfied up to saturation. Thus, they are not compatible with our setting and method
for Prob. 2.1. Our analysis gives the following results for the design 𝐴 to achieve ⟨𝐹 ⟩ = ⟨𝐺⟩ for the
0-dimensional case.
Theorem 2.5. Let 𝐺 = (𝑔1 , … , 𝑔𝑡 )⊤ be a Gröbner basis of a 0-dimensional ideal in 𝑘[𝑥1 , … , 𝑥𝑛 ]. Let
𝐹 = (𝑓1 , … , 𝑓𝑠 )⊤ = 𝐴𝐺 with 𝐴 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑠×𝑡 .
    1. If ⟨𝐹 ⟩ = ⟨𝐺⟩, it implies 𝑠 ≥ 𝑛.
    2. If 𝐴 has a left-inverse in 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑡×𝑠 , ⟨𝐹 ⟩ = ⟨𝐺⟩ holds.
    3. The equality ⟨𝐹 ⟩ = ⟨𝐺⟩ holds if and only if there exists a matrix 𝐵 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑡×𝑠 such that each
       row of 𝐵𝐴 − 𝐸𝑡 is a syzygy of 𝐺, where 𝐸𝑡 is the identity matrix of size 𝑡.
 We now assume ≺=≺lex and 0-dimensional ideals in shape position. Then, 𝐺 has exactly 𝑛 generators.
When 𝑠 = 𝑛, we have the following.
Proposition 2.6. For any 𝐴 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑛×𝑛 with det(𝐴) ∈ 𝑘 ∖ {0}, we have ⟨𝐹 ⟩ = ⟨𝐺⟩.
   As non-zero constant scaling does not change the ideal, we focus on 𝐴 with det(𝐴) = ±1 without
loss of generality. Such 𝐴 can be constructed using the Bruhat decomposition 𝐴 = 𝑈1 𝑃𝑈2 , where
𝑈1 , 𝑈2 ∈ ST(𝑛, 𝑘[𝑥1 , … , 𝑥𝑛 ]) are upper-triangular matrices with all-one diagonal entries (i.e., unimodular
upper-triangular matrices) and 𝑃 ∈ {0, 1}𝑛×𝑛 denotes a permutation matrix. Noting that 𝐴−1 satisfies
𝐴−1 𝐴 = 𝐸𝑛 , we have ⟨𝐴𝐺⟩ = ⟨𝐺⟩ from Thm. 2.5. Therefore, random sampling (𝑈1 , 𝑈2 , 𝑃) of unimodular
upper-triangular matrices 𝑈1 , 𝑈2 and a permutation matrix 𝑃 resolves the backward Gröbner problem for
𝑠 = 𝑛. We extend this idea to the case of 𝑠 > 𝑛 using a rectangular unimodular upper-triangular matrix
          𝑈′
𝑈2 = ( 2 ) ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑠×𝑛 , where 𝑈2′ ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ]𝑛×𝑛 is a unimodular upper-triangular matrix and
        𝑂𝑠−𝑛,𝑛
𝑂𝑠−𝑛,𝑛 ∈ 𝑘[𝑥1 , … , 𝑥𝑛 ](𝑠−𝑛)×𝑛 is the zero matrix. The permutation matrix is now 𝑃 ∈ {0, 1}𝑠×𝑠 . Our strategy
is to compute 𝐹 = 𝑈1 𝑃𝑈2 𝐺, which only requires a sampling of 𝒪(𝑠 2 ) polynomials in 𝑘[𝑥1 , … , 𝑥𝑛 ], and
𝒪(𝑛2 + 𝑠 2 )-times multiplications of polynomials.


3. Experiments
We present the efficiency of our dataset generation method and the learnability of Gröbner basis
computation. The experiments were conducted with 48-core CPUs, 768GB RAM, and NVIDIA RTX
A6000ada GPUs. Due to the space limitation, we cannot present full experimental setup. See the full
version in [17].
1
We surcharge notations to mean that the set {𝑔1 , … , 𝑔𝑡 } defined by the vector 𝐺 is a ≺-Gröbner basis.


                                                              53
Table 1
Runtime comparison (in seconds) of forward generation (F.) and backward generation (B.) of dataset 𝒟𝑛 (𝔽7 ) of
size 1,000. The forward generation used either of the three algorithms provided in SageMath with the libSingular
backend. We set a timeout limit to five seconds (added to the total runtime at every occurrence) for each Gröbner
basis computation. The numbers with † and ‡ include the timeout for more than 13 % and 24 % of the runs,
respectively.
 Method                  𝑛=2                     𝑛=3                     𝑛=4                     𝑛=5
 F. (std)                4.65                     129                    873†                    1354‡
 F. (slimgb)             4.67                     149                    712†                    1259‡
 F. (stdfglm)            5.78                    12.6                    44.2                      360
 B. (ours)               .003                    .005                    .009                     .014


Dataset generation. We constructed 12 datasets 𝒟𝑛 (𝑘) for 𝑛 ∈ {2, 3, 4, 5} and 𝑘 ∈ {𝔽7 , 𝔽31 , ℚ} and
measured the runtime of our backward generation and naive forward generation (i.e., Gröbner basis
computation). In the backward generation, we sampled Gröbner bases of ideals in shape position. In
this step, univariable polynomials were generically sampled in 𝑘[𝑥1 , … , 𝑥𝑛 ]≤5 . Next, Gröbner bases were
transformed to non-Gröbner sets based on Thm. 2.5. Random polynomials in Bruhat decomposition
(i.e., 𝑈1 and 𝑈2′ ) were sampled from 𝑘[𝑥1 , … , 𝑥𝑛 ]≤3 and restricted to monomials and binomials. For
ℚ, coefficients of all sampled polynomials were bounded as 𝑎/𝑏 with 𝑎, 𝑏 ∈ {−5, … , 5} and we only
accepted 𝐹 with coefficients such as 𝑎, 𝑏 ∈ {−100, … , 100}. This restriction is required from our machine
learning model and learning framework. For forward generation, we adopted three algorithms given
by SageMath [23] with the libSingular backend. For a fair comparison, forward generation computed
Gröbner bases of the non-Gröbner sets given by the backward generation, leading to the identical
dataset. As Tab. 1 shows, our backward generation is significant orders of magnitude faster than the
forward generation. A sharp runtime growth is observed in the forward generation as the number of
variables increases. Note that these numbers only show the runtime on 1,000 samples, while training
typically requires millions of samples. Therefore, the forward generation is almost infeasible, and the
proposed method resolves a bottleneck in the learning of Gröbner basis computation.

Learning results. We used a standard Transformer (e.g., 6 encoder/decoder layers and 8 attention
heads) and a standard training setup. The batch size was set to 16, and models were trained for 8 epochs.
Each polynomial set in the datasets is converted into a sequence using the prefix representation and
the separator tokens. To make the input sequence length manageable for vanilla Transformers, we
used simpler datasets 𝒟𝑛− (𝑘) using 𝑈1 , 𝑈2′ of a moderate density 𝜎 ∈ (0, 1]. This makes the maximum
sequence length less than 5,000. Specifically, we used 𝜎 = 1.0, 0.6, 0.3, 0.2 for 𝑛 = 2, 3, 4, 5, respectively.
The training set has one million samples, and the test set has one thousand samples. Table 2 shows
that trained Transformers successfully compute Gröbner bases with moderate/high accuracy. Not
shown here, but we found several examples in the datasets for which Transformer successfully compute
Gröbner bases significantly faster than math algorithms. The accuracy shows that the learning is
more successful on infinite field coefficients 𝑘 ∈ {ℚ, ℝ} than finite field ones 𝑘 = 𝔽𝑝 . This may be a
counter-intuitive observation because there are more possible coefficients in 𝐺 and 𝐹 for ℚ than 𝔽𝑝 .
Specifically, for 𝐺, the coefficient 𝑎/𝑏 ∈ ℚ is restricted to those with 𝑎, 𝑏 ∈ {−5, … , 5} (i.e., roughly 50
choices), and 𝑎, 𝑏 ∈ {−100, … , 100} (i.e., roughly 20,000 choices) for 𝐹. In contrast, there are only 𝑝
choices for 𝔽𝑝 . The performance even degrades for the larger order 𝑝 = 31. Interestingly, the support
accuracy shows that the terms forming the polynomial (i.e., the support of polynomial) are correctly
identified well. Thus, Transformers have difficulty determining the coefficients in finite fields. Several
studies have also reported that learning to solve a problem involving modular arithmetic may encounter
some difficulties [24, 25, 26].


                                                        54
    Table 2
    Accuracy [%] / support accuracy [%] of Gröbner basis computation by Transformer on 𝒟𝑛− (𝑘). In the
    support accuracy, two polynomials are considered identical if they consist of an identical set of terms
    (i.e., identical support), Note that the datasets for 𝑛 = 3, 4, 5 are here constructed using 𝑈1 , 𝑈2′ with
    density 𝜎 = 0.6, 0.3, 0.2, respectively.
      Ring            𝑛 = 2, 𝜎 = 1            𝑛 = 3, 𝜎 = 0.6          𝑛 = 4, 𝜎 = 0.3           𝑛 = 5, 𝜎 = 0.2
 ℚ[𝑥1 , … , 𝑥𝑛 ]       94.6 / 97.9              96.1 / 98.6             96.2 / 98.6              91.8 / 97.9
 𝔽7 [𝑥1 , … , 𝑥𝑛 ]     66.6 / 76.6              78.8 / 87.6             80.9 / 91.1              83.2 / 91.4
 𝔽31 [𝑥1 , … , 𝑥𝑛 ]    44.7 / 82.7              58.5 / 89.3             73.9 / 93.9              80.0 / 93.4


4. Conclusion
This study proposed the first learning approach to a fundamental algebraic task, the Gröbner basis
computation. While various recent studies have reported the learnability of mathematical problems by
Transformers, we addressed the first problem with nontriviality in the dataset generation. Ultimately,
the learning approach may be useful to address large-scale problems that cannot be approached by
Gröbner basis computation algorithms because of their computational complexity. Transformers can
output predictions in moderate runtime. The outputs may be incorrect, but there is a chance of obtaining
a hint of a solution, as shown in our experiments. We believe that our study reveals many interesting
open questions to achieve Gröbner basis computation learning.


Acknowledgments
This research was supported by JST ACT-X Grant Number JPMJAX23C8 and JSPS KAKENHI Grant
Number JP22K13901. Yuta Kambe (Mitsubishi Electric Information Technology R&D Center) is not
included in the authors due to a technical reason at submission.


References
 [1] G. V. Bard, Algorithms for Solving Polynomial Systems, Springer US, 2009.
 [2] T. Yasuda, X. Dahan, Y.-J. Huang, T. Takagi, K. Sakurai, MQ challenge: hardness evaluation of
     solving multivariate quadratic problems, Cryptology ePrint Archive (2015).
 [3] H. Park, G. Regensburger (Eds.), Gröbner Bases in Control Theory and Signal Processing, De
     Gruyter, 2007.
 [4] P. Diaconis, B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, The
     Annals of Statistics 26 (1998) 363 – 397.
 [5] T. Hibi, Gröbner bases. Statistics and software systems., Springer Tokyo, 2014.
 [6] H. Stewenius, Gröbner Basis Methods for Minimal Problems in Computer Vision, Ph.D. thesis,
     Mathematics (Faculty of Engineering), 2005.
 [7] R. Laubenbacher, B. Sturmfels, Computer algebra in systems biology, American Mathematical
     Monthly 116 (2009) 882–891.
 [8] B. Buchberger, Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach
     einem nulldimensionalen Polynomideal (An Algorithm for Finding the Basis Elements in the
     Residue Class Ring Modulo a Zero Dimensional Polynomial Ideal), Ph.D. thesis, Mathematical
     Institute, University of Innsbruck, Austria, 1965. English translation in J. of Symbolic Computation,
     Special Issue on Logic, Mathematics, and Computer Science: Interactions. Vol. 41, Number 3-4,
     Pages 475–511, 2006.
 [9] E. W. Mayr, A. R. Meyer, The complexity of the word problems for commutative semigroups and
     polynomial ideals, Advances in Mathematics 46 (1982) 305–329.
[10] T. W. Dubé, The structure of polynomial ideals and Gröbner bases, SIAM Journal on Computing
     19 (1990) 750–773.


                                                       55
[11] J.-C. Faugère, A new efficient algorithm for computing Gröbner bases (F4), Journal of Pure and
     Applied Algebra 139 (1999) 61–88.
[12] J.-C. Faugère, A new efficient algorithm for computing Gröbner bases without reduction to zero
     (F5), in: Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation,
     ISSAC ’02, Association for Computing Machinery, New York, NY, USA, 2002, p. 75–83.
[13] R. H. Makarim, M. Stevens, M4GB: An efficient Gröbner-basis algorithm, in: Proceedings of
     the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC’17,
     Association for Computing Machinery, New York, NY, USA, 2017, pp. 293–300.
[14] G. Lample, F. Charton, Deep learning for symbolic mathematics, in: International Conference on
     Learning Representations, 2020. URL: https://openreview.net/forum?id=S1eZYeHFDS.
[15] L. Biggio, T. Bendinelli, A. Neitz, A. Lucchi, G. Parascandolo, Neural symbolic regression that
     scales, in: Proceedings of the 38th International Conference on Machine Learning, volume 139,
     2021, pp. 936–945.
[16] F. Charton, Linear algebra with transformers, Transactions on Machine Learning Research (2022).
[17] H. Kera, Y. Ishihara, Y. Kambe, T. Vaccon, K. Yokoyama, Learning to compute gröbner bases, 2024.
     arXiv:2311.12904 .
[18] D. A. Cox, J. Little, D. O’Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational
     Algebraic Geometry and Commutative Algebra, Undergraduate Texts in Mathematics, Springer
     International Publishing, 2015.
[19] P. Gianni, T. Mora, Algebraic solution of systems of polynomial equations using Groebner bases, in:
     Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Springer Berlin Heidelberg,
     Berlin, Heidelberg, 1989, pp. 247–257.
[20] M. Noro, K. Yokoyama, A modular method to compute the rational univariate representation of
     zero-dimensional ideals, Journal of Symbolic Computation 28 (1999) 243–263.
[21] L. Busé, M. Elkadi, B. Mourrain, Resultant over the residual of a complete intersection, Journal of
     Pure and Applied Algebra 164 (2001) 35–57.
[22] L. Busé, Étude du résultant sur une variété algébrique, Theses, Université Nice Sophia Antipolis,
     2001.
[23] The Sage Developers, SageMath, the Sage Mathematics Software System (Version 10.0), 2023.
     https://www.sagemath.org .
[24] A. Power, Y. Burda, H. Edwards, I. Babuschkin, V. Misra, Grokking: Generalization beyond
     overfitting on small algorithmic datasets, arXiv abs/2201.02177 (2022).
[25] F. Charton, Can transformers learn the greatest common divisor?, arXiv abs/2308.15594 (2023).
[26] A. Gromov, Grokking modular arithmetic, arXiv abs/2301.02679 (2023).


                                                   56