Gröbner basis computation via learning

Hiroshi Kera (kera@chiba-u.jp), Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba-shi, Chiba 263-8522, Japan

Yuki Ishihara (ishihara.yuki@nihon-u.ac.jp), Nihon University, 1-8-14 Kanda Surugadai, Chiyoda-ku, Tokyo 101-8308, Japan

Tristan Vaccon (tristan.vaccon@unilim.fr), Université de Limoges, CNRS, XLIM, UMR 7252, Limoges, France

Kazuhiro Yokoyama (kazuhiro@rikkyo.ac.jp), Rikkyo University, 3-34-1 Nishi-Ikebukuro, Toshima-ku, Tokyo 171-8501, Japan

Keywords: Gröbner Bases, Machine Learning, Transformer

ORCID: 0000-0002-9830-0436 (H. Kera), 0000-0003-4057-3703 (Y. Ishihara), 0000-0003-4208-8349 (T. Vaccon), 0000-0002-5072-7799 (K. Yokoyama)

Solving a polynomial system, or computing an associated Gröbner basis, has been a fundamental task in computational algebra. However, it is also known for its notorious doubly exponential time complexity in the number of variables in the worst case. This paper is the first to address the learning of Gröbner basis computation with Transformers. The training requires many pairs of a polynomial system and the associated Gröbner basis, raising two novel algebraic problems: random generation of Gröbner bases and transforming them into non-Gröbner ones, termed the backward Gröbner problem. We resolve these problems for 0-dimensional radical ideals, the ideals appearing in various applications. The experiments show that our dataset generation method is at least three orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases, and that Gröbner basis computation is learnable in a particular class.

Introduction

Understanding the properties of polynomial systems and solving them has been a fundamental problem in computational algebra and algebraic geometry, with vast applications in cryptography [1,2], control theory [3], statistics [4,5], computer vision [6], systems biology [7], and so forth. Special sets of polynomials called Gröbner bases [8] play a key role to this end. In linear algebra, Gaussian elimination simplifies or solves a system of linear equations by transforming its coefficient matrix into reduced row echelon form. Similarly, a Gröbner basis can be regarded as a reduced form of a given polynomial system, and its computation is a generalization of Gaussian elimination to general polynomial systems. However, computing a Gröbner basis is notorious for its computational cost in theory and practice. It is an NP-hard problem with doubly exponential worst-case time complexity in the number of variables [9,10]. Nevertheless, because of its importance, various algorithms have been proposed in computational algebra to obtain Gröbner bases in better runtime. Examples include Faugère's F4/F5 algorithms [11,12] and M4GB [13].

In this study, we investigate Gröbner basis computation from a learning perspective, envisioning it as a practical compromise to address large-scale polynomial system solving and understanding, where mathematical algorithms are computationally intractable. The learning approach does not require explicit design of computational procedures, and we only need to train a model using a large number of (non-Gröbner set, Gröbner basis) pairs. Further, if we restrict ourselves to a particular class of Gröbner bases (or associated ideals), the model may internally find some patterns useful for prediction. The success of learning indicates the existence of such patterns, which encourages the improvement of mathematical algorithms and heuristics. Several recent studies have already addressed mathematical tasks via learning, particularly using Transformers [14,15,16]. For example, [14] showed that Transformers can learn symbolic integration simply by observing many $(\mathrm{d}f/\mathrm{d}x, f)$ pairs in training. The training samples are generated by first randomly generating $f$ and computing its derivative $\mathrm{d}f/\mathrm{d}x$ and/or by the reverse process.
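The solution-to-problem idea behind such training pairs can be illustrated with a minimal sketch. It is restricted to integer polynomials in one variable and is not the generator used in [14]; the function names, degree bound, and coefficient bound are assumptions made for illustration.

```python
import random

def random_polynomial(max_degree, coeff_bound, rng):
    """Coefficient list [c0, c1, ...] of a random integer polynomial in x."""
    degree = rng.randint(1, max_degree)
    coeffs = [rng.randint(-coeff_bound, coeff_bound) for _ in range(degree)]
    # Force a nonzero leading coefficient so the degree is exactly `degree`.
    nonzero = [c for c in range(-coeff_bound, coeff_bound + 1) if c != 0]
    return coeffs + [rng.choice(nonzero)]

def derivative(coeffs):
    """Differentiate c0 + c1*x + c2*x^2 + ... term by term."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def backward_pairs(num_samples, seed=0):
    """Sample the solution f first, then derive the problem df/dx from it."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_samples):
        f = random_polynomial(max_degree=4, coeff_bound=5, rng=rng)
        pairs.append((derivative(f), f))  # (problem, solution)
    return pairs
```

Differentiation is cheap while integration is the hard direction, so sampling the solution first avoids running the expensive forward computation during dataset generation.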

However, a crucial challenge in the learning of Gröbner basis computation is that it is mathematically unknown how to efficiently generate many (non-Gröbner set, Gröbner basis) pairs. We need an efficient backward approach (i.e., solution-to-problem computation) because, as discussed above, the forward approach (i.e., problem-to-solution computation) is prohibitively expensive. To this end, we frame two problems: (i) random generation of Gröbner bases and (ii) a backward transformation from a Gröbner basis to an associated non-Gröbner set. To our knowledge, neither of them has been addressed in the study of Gröbner bases for lack of motivation; all the efforts have been dedicated to the forward computation from a non-Gröbner set to a Gröbner basis.

Tackling the two aforementioned unexplored algebraic problems, we investigate the first learning approach to Gröbner basis computation using Transformers and experimentally show its learnability in the 0-dimensional case. Our experiments show that the proposed dataset generation is highly efficient and faster than a baseline method by three or four orders of magnitude. Further, we observe a learnability gap between polynomials over finite fields and infinite fields, while predicting polynomial supports is more tractable. The full version of this paper can be found in [17].

New algebraic problems for dataset generation

Our notations and definitions follow [18], except that we call power products of indeterminates terms instead of monomials. By Gröbner basis computation, we mean the computation of reduced Gröbner bases. Our goal is to realize Gröbner basis computation through learning. To this end, we need a large training set $\{(F_i, G_i)\}_{i=1}^{m}$ with a finite polynomial set $F_i \subset k[x_1, \ldots, x_n]$ and the Gröbner basis $G_i$ of the ideal $\langle F_i \rangle$. As the computation from $F_i$ to $G_i$ is expensive in general, we instead resort to backward generation (i.e., a solution-to-problem process); that is, we generate a Gröbner basis $G_i$ randomly and transform it into a non-Gröbner set $F_i$. Problems 2.1 and 2.2 require the collections $\mathcal{G}, \mathcal{F}$ to contain diverse polynomial sets. Thus, the algorithms for these problems should not be deterministic but should have some controllable randomness.

What makes the learning of Gröbner basis computation hard is that, to our knowledge, neither (i) random generation of Gröbner bases nor (ii) the backward transform from a Gröbner basis to a non-Gröbner set has been considered in computational algebra. The primary interest of the field has instead been in Gröbner basis computation (i.e., forward generation), and nothing motivated the random generation of Gröbner bases or the backward transform. Interestingly, machine learning now sheds light on them. Formally, we address the following problems for dataset generation.

In this paper, we tackle these problems in the case of radical 0-dimensional ideals. We first address Prob. 2.1 using the fact that 0-dimensional radical ideals are generally in shape position.

Definition 2.3 (Shape position). An ideal $I \subset k[x_1, \ldots, x_n]$ is said to be in shape position if some univariate polynomials $h, g_1, \ldots, g_{n-1} \in k[x_n]$ form the reduced $\prec_{\mathrm{lex}}$-Gröbner basis of $I$ as follows.

$$G = \{h,\; x_1 - g_1,\; \ldots,\; x_{n-1} - g_{n-1}\}. \tag{2.1}$$
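As a small illustration of Definition 2.3, the following worked example (our own, not taken from the experiments) uses $n = 2$, $k = \mathbb{Q}$, and assumes the lexicographic order with $x_1 \succ x_2$.

```latex
\begin{align*}
F &= \{\, x_1^2 + x_2^2 - 1,\; x_1 - x_2 \,\} \subset \mathbb{Q}[x_1, x_2], \\
x_1^2 + x_2^2 - 1 &= (x_1 + x_2)(x_1 - x_2) + (2 x_2^2 - 1), \\
G &= \{\, h,\; x_1 - g_1 \,\}, \qquad h = x_2^2 - \tfrac{1}{2}, \quad g_1 = x_2 .
\end{align*}
```

The second line shows that $2x_2^2 - 1 \in \langle F \rangle$ and, conversely, that $x_1^2 + x_2^2 - 1 \in \langle G \rangle$, so $\langle F \rangle = \langle G \rangle$. The leading terms $x_2^2$ and $x_1$ of $G$ are coprime, hence $G$ is a Gröbner basis by Buchberger's criterion, and it has the form (2.1) with $\deg h = 2 > \deg g_1 = 1$.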

Particularly, 0-dimensional radical ideals are almost always in shape position if $k$ is an infinite field or a finite field of large order [19,20]. With this fact, an efficient sampling of Gröbner bases of 0-dimensional radical ideals can be realized by sampling $n$ polynomials in $k[x_n]$, i.e., $h, g_1, \ldots, g_{n-1}$ with $h \neq 0$. We have to make sure that the degree of $h$ is always greater than those of $g_1, \ldots, g_{n-1}$, which is necessary and sufficient for $G$ to be a reduced Gröbner basis. This approach is both efficient and randomized, and thus resolves Prob. 2.1.

To address Prob. 2.2, we consider Prob. 2.4. A similar question was studied without the Gröbner condition in [21,22]. They provided an algebraic necessary and sufficient condition for the polynomial system of $F$ to have a solution outside the variety defined by $G$. This condition is expressed explicitly by multivariate resultants. However, strong additional assumptions are required: $A, F, G$ are homogeneous, $G$ is a regular sequence, and in the end, $\langle F \rangle = \langle G \rangle$ is only satisfied up to saturation. Thus, they are not compatible with our setting and method for Prob. 2.1. Our analysis gives the following results for the design of $A$ to achieve $\langle F \rangle = \langle G \rangle$ in the 0-dimensional case.

Theorem 2.5. Let $G = (g_1, \ldots, g_t)^\top$ be a Gröbner basis of a 0-dimensional ideal in $k[x_1, \ldots, x_n]$. Let $F = (f_1, \ldots, f_s)^\top = AG$ with $A \in k[x_1, \ldots, x_n]^{s \times t}$.
1. If $\langle F \rangle = \langle G \rangle$, then $s \ge n$.
2. If $A$ has a left-inverse in $k[x_1, \ldots, x_n]^{t \times s}$, then $\langle F \rangle = \langle G \rangle$ holds.

The equality $\langle F \rangle = \langle G \rangle$ holds if and only if there exists a matrix $B \in k[x_1, \ldots, x_n]^{t \times s}$ such that each row of $BA - E_t$ is a syzygy of $G$, where $E_t$ is the identity matrix of size $t$.
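The sufficiency parts of these statements follow from a short computation, sketched here as a reading aid rather than as the full proof (the necessity directions are not covered). Recall that a row $r$ being a syzygy of $G$ means $rG = 0$.

```latex
\begin{align*}
F = AG
  &\;\Longrightarrow\; f_i = \sum_{j=1}^{t} a_{ij}\, g_j \in \langle G \rangle
  \;\Longrightarrow\; \langle F \rangle \subseteq \langle G \rangle, \\
BA = E_t
  &\;\Longrightarrow\; G = E_t G = (BA)G = B(AG) = BF
  \;\Longrightarrow\; \langle G \rangle \subseteq \langle F \rangle, \\
(BA - E_t)\,G = 0
  &\;\Longrightarrow\; BF = BAG = G
  \;\Longrightarrow\; \langle G \rangle \subseteq \langle F \rangle .
\end{align*}
```

The first line holds for any $A$; the second is the left-inverse case, and the third weakens it to the syzygy condition on the rows of $BA - E_t$.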

We now assume $\prec = \prec_{\mathrm{lex}}$ and 0-dimensional ideals in shape position. Then, $G$ has exactly $n$ generators. When $s = n$, we have the following.

Proposition 2.6. For any $A \in k[x_1, \ldots, x_n]^{n \times n}$ with $\det(A) \in k \setminus \{0\}$, we have $\langle F \rangle = \langle G \rangle$.
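A minimal worked example of Proposition 2.6, chosen by us for illustration with $n = 2$ and the lexicographic order $x_1 \succ x_2$, also shows what can go wrong when $\det(A)$ is not a nonzero constant.

```latex
\begin{align*}
G &= (h,\; x_1 - g_1)^\top = (x_2,\; x_1)^\top, \qquad h = x_2, \quad g_1 = 0, \\
A &= \begin{pmatrix} 1 & x_1 \\ 0 & 1 \end{pmatrix}, \quad \det(A) = 1, \qquad
F = AG = (x_2 + x_1^2,\; x_1)^\top, \\
A' &= \begin{pmatrix} x_1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \det(A') = x_1 \notin k, \qquad
F' = A'G = (x_1 x_2,\; x_1)^\top .
\end{align*}
```

Here $x_2 = f_1 - x_1 f_2$, so $\langle F \rangle = \langle x_1, x_2 \rangle = \langle G \rangle$, while $F$ is not a Gröbner basis: its leading terms are $x_1^2$ and $x_1$, and neither divides $x_2 \in \langle G \rangle$. In contrast, $\langle F' \rangle = \langle x_1 \rangle \subsetneq \langle G \rangle$.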

As non-zero constant scaling does not change the ideal, we focus on $A$ with $\det(A) = \pm 1$ without loss of generality. Such $A$ can be constructed using the Bruhat decomposition $A = U_1 P U_2$, where $U_1, U_2 \in \mathrm{ST}(n, k[x_1, \ldots, x_n])$ are upper-triangular matrices with all-one diagonal entries (i.e., unimodular upper-triangular matrices) and $P \in \{0, 1\}^{n \times n}$ denotes a permutation matrix. Noting that $A^{-1}$ satisfies $A^{-1} A = E_n$, we have $\langle AG \rangle = \langle G \rangle$ from Thm. 2.5. Therefore, random sampling of $(U_1, U_2, P)$, i.e., of unimodular upper-triangular matrices $U_1, U_2$ and a permutation matrix $P$, resolves the backward Gröbner problem for $s = n$. We extend this idea to the case of $s > n$ using a rectangular unimodular upper-triangular matrix

$$U_2 = \begin{pmatrix} U_2' \\ O_{s-n,n} \end{pmatrix} \in k[x_1, \ldots, x_n]^{s \times n}.$$

Experiments

We present the efficiency of our dataset generation method and the learnability of Gröbner basis computation. The experiments were conducted with 48-core CPUs, 768GB RAM, and NVIDIA RTX A6000ada GPUs. Due to space limitations, we cannot present the full experimental setup; see the full version [17].

¹ We overload notation: saying that the vector $G$ is a $\prec$-Gröbner basis means that the set $\{g_1, \ldots, g_t\}$ defined by $G$ is a $\prec$-Gröbner basis.

Table 1: Runtime comparison (in seconds) of forward generation (F.) and backward generation (B.) of dataset $\mathcal{D}_n(\mathbb{F}_7)$ of size 1,000. The forward generation used one of the three algorithms provided in SageMath with the libSingular backend. We set a timeout limit of five seconds (added to the total runtime at every occurrence) for each Gröbner basis computation. The numbers with † and ‡ include timeouts for more than 13 % and 24 % of the runs, respectively.

The sampled entries of $U_1$ and $U_2'$ had degree at most 3 and were restricted to monomials and binomials. For $\mathbb{Q}$, coefficients of all sampled polynomials were bounded as $a/b$ with $a, b \in \{-5, \ldots, 5\}$, and we only accepted $F$ with coefficients such that $a, b \in \{-100, \ldots, 100\}$. This restriction is required by our machine learning model and learning framework. For forward generation, we adopted three algorithms given by SageMath [23] with the libSingular backend. For a fair comparison, forward generation computed Gröbner bases of the non-Gröbner sets given by the backward generation, leading to the identical dataset. As Tab. 1 shows, our backward generation is at least three orders of magnitude faster than the forward generation. A sharp runtime growth is observed in the forward generation as the number of variables increases. Note that these numbers only show the runtime on 1,000 samples, while training typically requires millions of samples. Therefore, the forward generation is almost infeasible, and the proposed method resolves a bottleneck in the learning of Gröbner basis computation.

Learning results. We used a standard Transformer (e.g., 6 encoder/decoder layers and 8 attention heads) and a standard training setup. The batch size was set to 16, and models were trained for 8 epochs. Each polynomial set in the datasets is converted into a sequence using the prefix representation and separator tokens. To make the input sequence length manageable for vanilla Transformers, we used simpler datasets $\mathcal{D}_n^{-}(k)$ using $U_1, U_2'$ of a moderate density $\sigma \in (0, 1]$. This keeps the maximum sequence length below 5,000. Specifically, we used $\sigma = 1.0, 0.6, 0.3, 0.2$ for $n = 2, 3, 4, 5$, respectively.
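To make the sequence conversion concrete, here is a minimal sketch of one possible prefix (Polish) encoding with separator tokens. The token names (`C5`, `E2`, `[SEP]`) and the exact nesting are assumptions made for illustration; the tokenization actually used in the experiments is described in the full version [17] and may differ.

```python
def fold_prefix(op, operands):
    """Combine token lists with a binary operator written in prefix order."""
    if len(operands) == 1:
        return list(operands[0])
    return [op] + list(operands[0]) + fold_prefix(op, operands[1:])

def term_to_prefix(coeff, exponents):
    """Tokens for coeff * x1^e1 * ... * xn^en (variables with e = 0 are skipped)."""
    factors = [[f"C{coeff}"]]
    for i, e in enumerate(exponents, start=1):
        if e > 0:
            factors.append(["^", f"x{i}", f"E{e}"])
    return fold_prefix("*", factors)

def polynomial_to_prefix(terms):
    """Tokens for a nonzero polynomial given as a list of (coeff, exponents)."""
    return fold_prefix("+", [term_to_prefix(c, e) for c, e in terms])

def system_to_sequence(polys, sep="[SEP]"):
    """Concatenate the polynomials of a set, separated by a separator token."""
    tokens = []
    for k, terms in enumerate(polys):
        if k > 0:
            tokens.append(sep)
        tokens.extend(polynomial_to_prefix(terms))
    return tokens

# x_1 + 5 x_2 + 6 and x_2^2 + 3, written as (coefficient, exponent tuple) lists.
example = [
    [(1, (1, 0)), (5, (0, 1)), (6, (0, 0))],
    [(1, (0, 2)), (3, (0, 0))],
]
sequence = system_to_sequence(example)
```

Under such an encoding every term costs several tokens, which is why denser matrices $U_1, U_2'$ quickly inflate the sequence length.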

The training set has one million samples, and the test set has one thousand samples. Table 2 shows that trained Transformers successfully compute Gröbner bases with moderate/high accuracy. Although not shown here, we found several examples in the datasets for which Transformers compute Gröbner bases significantly faster than mathematical algorithms. The accuracy shows that the learning is more successful on infinite field coefficients $k \in \{\mathbb{Q}, \mathbb{R}\}$ than on finite field ones $k = \mathbb{F}_p$. This may be a counter-intuitive observation because there are more possible coefficients in $G$ and $F$ for $\mathbb{Q}$ than for $\mathbb{F}_p$. Specifically, for $G$, the coefficient $a/b \in \mathbb{Q}$ is restricted to those with $a, b \in \{-5, \ldots, 5\}$ (i.e., roughly 50 choices), and to $a, b \in \{-100, \ldots, 100\}$ (i.e., roughly 20,000 choices) for $F$. In contrast, there are only $p$ choices for $\mathbb{F}_p$. The performance even degrades for the larger order $p = 31$. Interestingly, the support accuracy shows that the terms forming each polynomial (i.e., the support of the polynomial) are identified well. Thus, Transformers have difficulty determining the coefficients in finite fields. Several studies have also reported that learning to solve a problem involving modular arithmetic may encounter some difficulties [24,25,26].
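The two metrics can be sketched as follows. This is a minimal illustration, assuming that each polynomial is stored as a dictionary from exponent tuples to nonzero coefficients and that predicted and reference bases list their polynomials in the same order; both conventions, and all function names, are ours rather than the paper's.

```python
def support(poly):
    """Set of terms (exponent tuples) appearing with a nonzero coefficient."""
    return frozenset(mono for mono, coeff in poly.items() if coeff != 0)

def is_exact_match(pred, target):
    """True if every predicted polynomial equals its reference, coefficients included."""
    return len(pred) == len(target) and all(p == t for p, t in zip(pred, target))

def is_support_match(pred, target):
    """True if every predicted polynomial has the same support as its reference."""
    return len(pred) == len(target) and all(
        support(p) == support(t) for p, t in zip(pred, target)
    )

def accuracies(preds, targets):
    """Return (accuracy, support accuracy) in percent over a test set."""
    total = len(targets)
    exact = sum(is_exact_match(p, t) for p, t in zip(preds, targets))
    supp = sum(is_support_match(p, t) for p, t in zip(preds, targets))
    return 100.0 * exact / total, 100.0 * supp / total

# Reference basis {x_2^2 + 3, x_1 + 5 x_2 + 6} over F_7, used for two test samples.
reference = [{(0, 2): 1, (0, 0): 3}, {(1, 0): 1, (0, 1): 5, (0, 0): 6}]
wrong_coeff = [{(0, 2): 1, (0, 0): 4}, {(1, 0): 1, (0, 1): 5, (0, 0): 6}]
acc, supp_acc = accuracies([reference, wrong_coeff], [reference, reference])
```

In this toy case the second prediction has the right terms but one wrong coefficient, so it counts toward the support accuracy only, which is the kind of error the gap between the two metrics captures.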

Conclusion

This study proposed the first learning approach to a fundamental algebraic task, Gröbner basis computation. While various recent studies have reported the learnability of mathematical problems by Transformers, we addressed the first problem in which the dataset generation itself is nontrivial. Ultimately, the learning approach may be useful for large-scale problems that cannot be approached by Gröbner basis computation algorithms because of their computational complexity. Transformers can output predictions in moderate runtime. The outputs may be incorrect, but there is a chance of obtaining a hint of a solution, as shown in our experiments. We believe that our study reveals many interesting open questions toward learning Gröbner basis computation.

Problem 2.1 (Random generation of Gröbner bases). Find a collection $\mathcal{G} = \{G_i\}_{i=1}^{m}$ with the reduced Gröbner basis $G_i \subset k[x_1, \ldots, x_n]$ of $\langle G_i \rangle$, $i = 1, \ldots, m$. The collection should contain diverse bases, and we need an efficient algorithm for constructing them.

Problem 2.2 (Backward Gröbner problem). Given a Gröbner basis $G \subset k[x_1, \ldots, x_n]$, find a collection $\mathcal{F} = \{F_i\}_{i=1}^{\mu}$ of polynomial sets that are not Gröbner bases but satisfy $\langle F_i \rangle = \langle G \rangle$ for $i = 1, \ldots, \mu$. The collection should contain diverse sets, and we need an efficient algorithm for constructing them.

Problem 2.4. Let $I \subset k[x_1, \ldots, x_n]$ be a 0-dimensional ideal, and let $G = (g_1, \ldots, g_t)^\top \in k[x_1, \ldots, x_n]^t$ be its $\prec$-Gröbner basis with respect to term order $\prec$.¹ Find a polynomial matrix $A \in k[x_1, \ldots, x_n]^{s \times t}$ giving a non-Gröbner set $F = (f_1, \ldots, f_s)^\top = AG$ such that $\langle F \rangle = \langle G \rangle$.

Namely, we generate a set of polynomials $F = (f_1, \ldots, f_s)^\top$ from $G = (g_1, \ldots, g_t)^\top$ by $f_i = \sum_{j=1}^{t} a_{ij} g_j$ for $i = 1, \ldots, s$, where $a_{ij} \in k[x_1, \ldots, x_n]$ denotes the $(i, j)$-th entry of $A$. Note that $\langle F \rangle$ and $\langle G \rangle$ are generally not identical, and designing $A$ such that $\langle F \rangle = \langle G \rangle$ is our question.

For the rectangular matrix $U_2$ introduced for the case $s > n$, $U_2' \in k[x_1, \ldots, x_n]^{n \times n}$ is a unimodular upper-triangular matrix and $O_{s-n,n} \in k[x_1, \ldots, x_n]^{(s-n) \times n}$ is the zero matrix. The permutation matrix is now $P \in \{0, 1\}^{s \times s}$. Our strategy is to compute $F = U_1 P U_2 G$, which only requires sampling $\mathcal{O}(s^2)$ polynomials in $k[x_1, \ldots, x_n]$ and $\mathcal{O}(n^2 + s^2)$ polynomial multiplications.
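The strategy $F = U_1 P U_2 G$ can be sketched in plain Python over $\mathbb{F}_7$. This is an illustrative sketch, not the authors' implementation: the dictionary representation of polynomials, all function names, and the way random entries are drawn are assumptions. The `recover` function undoes the transform using ring operations only, which confirms $\langle G \rangle \subseteq \langle F \rangle$; the sketch does not check that $F$ is actually a non-Gröbner set.

```python
import random

P_MOD = 7  # field order p; F_7 is one of the fields used in the experiments

def add(a, b):
    """Sum of two polynomials stored as {exponent tuple: coefficient mod p}."""
    out = dict(a)
    for mono, c in b.items():
        out[mono] = (out.get(mono, 0) + c) % P_MOD
    return {m: c for m, c in out.items() if c}

def mul(a, b):
    """Product of two polynomials over F_p."""
    out = {}
    for ma, ca in a.items():
        for mb, cb in b.items():
            mono = tuple(i + j for i, j in zip(ma, mb))
            out[mono] = (out.get(mono, 0) + ca * cb) % P_MOD
    return {m: c for m, c in out.items() if c}

def neg(a):
    """Additive inverse of a polynomial over F_p."""
    return {m: (-c) % P_MOD for m, c in a.items()}

def random_entry(n_vars, rng, max_degree=3):
    """Random polynomial with at most two terms, each of total degree <= max_degree."""
    poly = {}
    for _ in range(rng.randint(1, 2)):
        exps = [0] * n_vars
        for _ in range(rng.randint(0, max_degree)):
            exps[rng.randrange(n_vars)] += 1
        poly = add(poly, {tuple(exps): rng.randrange(1, P_MOD)})
    return poly

def unit_upper_triangular(size, n_vars, rng):
    """Upper-triangular matrix with ones on the diagonal and random entries above it."""
    one = {(0,) * n_vars: 1}
    return [[one if i == j else random_entry(n_vars, rng) if j > i else {}
             for j in range(size)] for i in range(size)]

def matvec(M, v):
    """Matrix-vector product over F_p[x_1, ..., x_n]."""
    out = []
    for row in M:
        acc = {}
        for entry, comp in zip(row, v):
            acc = add(acc, mul(entry, comp))
        out.append(acc)
    return out

def solve_unit_upper(U, y):
    """Back-substitution for U x = y; no division is needed because diag(U) = 1."""
    x = [None] * len(y)
    for i in reversed(range(len(y))):
        acc = y[i]
        for j in range(i + 1, len(y)):
            acc = add(acc, neg(mul(U[i][j], x[j])))
        x[i] = acc
    return x

def backward_transform(G, n_vars, s, rng):
    """Return F = U1 P U2 G (length s >= len(G)) and the sampled factors."""
    t = len(G)
    U2p = unit_upper_triangular(t, n_vars, rng)       # U_2'
    y = matvec(U2p, G) + [{} for _ in range(s - t)]   # U_2 G with U_2 = (U_2'; O)
    perm = list(range(s))
    rng.shuffle(perm)
    z = [y[perm[i]] for i in range(s)]                # P (U_2 G)
    U1 = unit_upper_triangular(s, n_vars, rng)
    return matvec(U1, z), (U1, perm, U2p)

def recover(F, U1, perm, U2p):
    """Undo the transform with polynomial operations only."""
    z = solve_unit_upper(U1, F)
    y = [None] * len(z)
    for i, src in enumerate(perm):
        y[src] = z[i]
    return solve_unit_upper(U2p, y[:len(U2p)])

# G = {h, x_1 - g_1} over F_7 with h = x_2^2 + 3 and g_1 = 2 x_2 + 1.
G = [{(0, 2): 1, (0, 0): 3}, {(1, 0): 1, (0, 1): 5, (0, 0): 6}]
F, (U1, perm, U2p) = backward_transform(G, n_vars=2, s=3, rng=random.Random(0))
```

Calling `recover(F, U1, perm, U2p)` returns `G` again. Only additions and multiplications of polynomials are involved in generating `F`, which is what makes this direction cheap compared with running a Gröbner basis algorithm.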

Dataset generation. We constructed 12 datasets $\mathcal{D}_n(k)$ for $n \in \{2, 3, 4, 5\}$ and $k \in \{\mathbb{F}_7, \mathbb{F}_{31}, \mathbb{Q}\}$ and measured the runtime of our backward generation and naive forward generation (i.e., Gröbner basis computation). In the backward generation, we sampled Gröbner bases of ideals in shape position. In this step, univariate polynomials were generically sampled from $k[x_1, \ldots, x_n]_{\le 5}$. Next, Gröbner bases were transformed into non-Gröbner sets based on Thm. 2.5. The random polynomials in the Bruhat decomposition (i.e., the entries of $U_1$ and $U_2'$) were sampled from $k[x_1, \ldots, x_n]$.
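The first step, sampling Gröbner bases of ideals in shape position, can be sketched over $\mathbb{F}_p$ as follows. This is a minimal illustration under our own assumptions: polynomials in $k[x_n]$ are coefficient lists, $h$ is forced to be monic, and the function names are hypothetical. The sketch enforces $\deg h > \deg g_i$ but does not check radicality.

```python
import random

def degree(coeffs):
    """Degree of c0 + c1*x + ... given as [c0, c1, ...]; -1 for the zero polynomial."""
    for i in reversed(range(len(coeffs))):
        if coeffs[i] != 0:
            return i
    return -1

def sample_shape_position_basis(n, p, max_degree, rng):
    """Sample (h, [g_1, ..., g_{n-1}]) encoding G = {h, x_1 - g_1, ..., x_{n-1} - g_{n-1}}."""
    d = rng.randint(1, max_degree)
    # h has degree exactly d and leading coefficient 1 (monic is our assumption).
    h = [rng.randrange(p) for _ in range(d)] + [1]
    # Each g_i has d coefficients, so its degree is strictly below deg(h) = d.
    gs = [[rng.randrange(p) for _ in range(d)] for _ in range(n - 1)]
    return h, gs

def sample_dataset(m, n, p, max_degree=5, seed=0):
    """Sample m bases in n variables over F_p."""
    rng = random.Random(seed)
    return [sample_shape_position_basis(n, p, max_degree, rng) for _ in range(m)]

bases = sample_dataset(m=1000, n=3, p=7)
```

Each sample costs only $n$ draws of at most `max_degree + 1` coefficients, so this step is negligible compared with any Gröbner basis computation.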

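| Method | $n = 2$ | $n = 3$ | $n = 4$ | $n = 5$ |
|---|---|---|---|---|
| F. (std) | 4.65 | 129 | 873 † | 1354 ‡ |
| F. (slimgb) | 4.67 | 149 | 712 † | 1259 ‡ |
| F. (stdfglm) | 5.78 | 12.6 | 44.2 | 360 |
| B. (ours) | .003 | .005 | .009 | .014 |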
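The size of the gap in Table 1 can be checked with a few lines of arithmetic. The sketch below copies the runtimes from the table and compares the backward generation against the fastest forward algorithm for each $n$; the variable and function names are ours.

```python
import math

# Runtimes in seconds for 1,000 samples over F_7, copied from Table 1.
# Entries marked with a dagger in the table include timeouts and are lower bounds.
FORWARD = {
    "std": [4.65, 129.0, 873.0, 1354.0],
    "slimgb": [4.67, 149.0, 712.0, 1259.0],
    "stdfglm": [5.78, 12.6, 44.2, 360.0],
}
BACKWARD = [0.003, 0.005, 0.009, 0.014]

def speedup_orders(forward, backward):
    """log10 of (fastest forward runtime / backward runtime) for each n."""
    orders = []
    for i, b in enumerate(backward):
        fastest = min(times[i] for times in forward.values())
        orders.append(math.log10(fastest / b))
    return orders

orders = speedup_orders(FORWARD, BACKWARD)  # one value per n = 2, 3, 4, 5
```

Every value lies between 3 and 5, in line with the three or four orders of magnitude stated in the Introduction, and the gap widens as $n$ grows.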

Table 2: Accuracy [%] / support accuracy [%] of Gröbner basis computation by Transformer on $\mathcal{D}_n^{-}(k)$. In the support accuracy, two polynomials are considered identical if they consist of an identical set of terms (i.e., identical support). Note that the datasets for $n = 3, 4, 5$ are here constructed using $U_1, U_2'$ with density $\sigma = 0.6, 0.3, 0.2$, respectively.

| Ring | $n = 2$, $\sigma = 1$ | $n = 3$, $\sigma = 0.6$ | $n = 4$, $\sigma = 0.3$ | $n = 5$, $\sigma = 0.2$ |
|---|---|---|---|---|
| $\mathbb{Q}[x_1, \ldots, x_n]$ | 94.6 / 97.9 | 96.1 / 98.6 | 96.2 / 98.6 | 91.8 / 97.9 |
| $\mathbb{F}_7[x_1, \ldots, x_n]$ | 66.6 / 76.6 | 78.8 / 87.6 | 80.9 / 91.1 | 83.2 / 91.4 |
| $\mathbb{F}_{31}[x_1, \ldots, x_n]$ | 44.7 / 82.7 | 58.5 / 89.3 | 73.9 / 93.9 | 80.0 / 93.4 |

Acknowledgments

This research was supported by JST ACT-X Grant Number JPMJAX23C8 and JSPS KAKENHI Grant Number JP22K13901. Yuta Kambe (Mitsubishi Electric Information Technology R&D Center) is not included in the authors due to a technical reason at submission.

URLs: https://hkera.wordpress.com (H. Kera); https://researchmap.jp/yishihara (Y. Ishihara); https://www.unilim.fr/pages_perso/tristan.vaccon/ (T. Vaccon)

References

[1] G. V. Bard, Algorithms for Solving Polynomial Systems, Springer US, 2009.

[2] T. Yasuda, X. Dahan, Y.-J. Huang, T. Takagi, K. Sakurai, MQ challenge: hardness evaluation of solving multivariate quadratic problems, Cryptology ePrint Archive, 2015.

[3] H. Park, G. Regensburger, Gröbner Bases in Control Theory and Signal Processing, De Gruyter, 2007.

[4] P. Diaconis, B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, The Annals of Statistics 26 (1998).

[5] T. Hibi, Gröbner bases. Statistics and software systems, Springer, Tokyo, 2014.

[6] H. Stewenius, Gröbner Basis Methods for Minimal Problems in Computer Vision, Ph.D. thesis, Mathematics (Faculty of Engineering), 2005.

[7] R. Laubenbacher, B. Sturmfels, Computer algebra in systems biology, American Mathematical Monthly 116 (2009).

[8] B. Buchberger, Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal (An Algorithm for Finding the Basis Elements in the Residue Class Ring Modulo a Zero Dimensional Polynomial Ideal), Ph.D. thesis, Mathematical Institute, University of Innsbruck, 1965. English translation in J. of Symbolic Computation, Special Issue on Logic, Mathematics, and Computer Science: Interactions 41 (3-4) (2006).

[9] E. W. Mayr, A. R. Meyer, The complexity of the word problems for commutative semigroups and polynomial ideals, Advances in Mathematics 46 (1982).

[10] T. W. Dubé, The structure of polynomial ideals and Gröbner bases, SIAM Journal on Computing 19 (1990).

[11] J.-C. Faugère, A new efficient algorithm for computing Gröbner bases (F4), Journal of Pure and Applied Algebra 139 (1999).

[12] J.-C. Faugère, A new efficient algorithm for computing Gröbner bases without reduction to zero (F5), in: Proceedings of the 2002 International Symposium on Symbolic and Algebraic Computation, ISSAC '02, Association for Computing Machinery, New York, NY, USA, 2002.

[13] R. H. Makarim, M. Stevens, M4GB: An efficient Gröbner-basis algorithm, in: Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, Association for Computing Machinery, New York, NY, USA, 2017.

[14] G. Lample, F. Charton, Deep learning for symbolic mathematics, in: International Conference on Learning Representations, 2020.

[15] L. Biggio, T. Bendinelli, A. Neitz, A. Lucchi, G. Parascandolo, Neural symbolic regression that scales, in: Proceedings of the 38th International Conference on Machine Learning, volume 139, 2021.

[16] F. Charton, Linear algebra with transformers, Transactions on Machine Learning Research (2022).

[17] H. Kera, Y. Ishihara, Y. Kambe, T. Vaccon, K. Yokoyama, Learning to compute Gröbner bases, arXiv:2311.12904, 2024.

[18] D. A. Cox, J. Little, D. O'Shea, Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra, Undergraduate Texts in Mathematics, Springer International Publishing, 2015.

[19] P. Gianni, T. Mora, Algebraic solution of systems of polynomial equations using Groebner bases, in: Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Springer Berlin Heidelberg, Berlin, Heidelberg, 1989.

[20] M. Noro, K. Yokoyama, A modular method to compute the rational univariate representation of zero-dimensional ideals, Journal of Symbolic Computation 28 (1999).

[21] L. Busé, M. Elkadi, B. Mourrain, Resultant over the residual of a complete intersection, Journal of Pure and Applied Algebra 164 (2001).

[22] L. Busé, Étude du résultant sur une variété algébrique, Theses, Université Nice Sophia Antipolis, 2001.

[23] The Sage Developers, SageMath, the Sage Mathematics Software System (Version 10), 2023.

[24] A. Power, Y. Burda, H. Edwards, I. Babuschkin, V. Misra, Grokking: Generalization beyond overfitting on small algorithmic datasets, arXiv abs/2201.02177, 2022.

[25] F. Charton, Can transformers learn the greatest common divisor?, arXiv abs/2308.15594, 2023.

[26] A. Gromov, Grokking modular arithmetic, arXiv abs/2301.02679, 2023.