<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rules in High-Dimensional Small Tabular Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erkan Karabulut</string-name>
          <email>e.karabulut@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Daza</string-name>
          <email>d.f.dazacruz@amsterdamumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <email>p.t.groth@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Degeler</string-name>
          <email>v.o.degeler@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Laboratory Medicine</institution>
          ,
          <addr-line>De Boelelaan 1117, Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Science Park 900, 1098 XH Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Association Rule Mining (ARM) aims to discover patterns between features in datasets in the form of propositional rules, supporting both knowledge discovery and interpretable machine learning in high-stakes decision-making. However, in high-dimensional settings, rule explosion and computational overhead render popular algorithmic approaches impractical without effective search space reduction, and these challenges propagate to downstream tasks. Neurosymbolic methods, such as Aerial+, have recently been proposed to address the rule explosion in ARM. While they tackle the high dimensionality of the data, they also inherit limitations of neural networks, particularly reduced performance in low-data regimes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Association Rule Mining (ARM) is the task of
discovering patterns among the features of a dataset in the form
of logical implications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], also known as if-then rules.
ARM has been applied in a myriad of domains for
knowledge discovery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as well as for high-stakes
decision-making as part of interpretable machine learning
models [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. High-dimensional datasets, e.g., with thousands
of columns, often lead to rule explosion and prolonged
execution times [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Common solutions to rule explosion in
ARM include constraining data features (i.e., ARM with item
constraints [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]), mining top-k high-quality rules [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ],
and closed itemset mining [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, these methods
mainly focus on reducing the search space for knowledge
discovery, rather than directly addressing the computational
burden of rule mining.
      </p>
      <p>
        Neurosymbolic methods for ARM, such as Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
have recently been proposed to address the rule explosion
problem on tabular data. Despite its effectiveness in
addressing rule explosion on generic tabular data,
Aerial+ has not yet been evaluated for scalability on
high-dimensional datasets. Moreover, neurosymbolic methods
for ARM inherit the limitations of neural networks,
such as reduced performance in low-data (small n) regimes [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        As is the case for several data-driven methods, Aerial+
relies on statistical patterns present in the dataset. In small
datasets, such patterns may be hard to extract, which in turn
may lead to reduced predictive performance, and in the case
of Aerial+, to rules that do not accurately capture the true
underlying patterns. Recent works on models for tabular
data have addressed this issue by introducing foundation
models [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref20">16, 17, 18, 19, 20</xref>
        ], which are pre-trained on large
      </p>
      <p>
        Table 1: Sample d ≫ n dataset and association rules. Gene expression
datasets in tabular form often consist of 10K+ columns and a
limited number of rows. This is a sample of gene expression level
data from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], partially pre-processed by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and put in discrete
form by applying z-score binning. The listed association rules are
learned using Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with item constraints on low and high
gene expression levels.
      </p>
      <sec id="sec-1-2">
        <title>Gene2 (high) ∧ Gene29 (high) → Gene14 (low)</title>
        <sec id="sec-1-2-2">
          <title>Gene3 (high) ∧ Gene45 (high) → Gene84 (high)</title>
          <p>datasets and transferred to small datasets without additional
training, thereby providing strong inductive biases and
generalizable representations that compensate for the limited
data instances.</p>
          <p>
            In this paper, we make three key contributions to ARM
research on categorical tabular data. First, we evaluate the
scalability of both commonly used algorithmic ARM approaches
and the recent neurosymbolic methods on
high-dimensional datasets. Second, to the best of our knowledge,
we introduce the problem of ARM on d ≫ n datasets for
the first time, which are common in the biomedicine
domain, such as gene expression datasets [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] (see Table 1),
and evaluate the recent neurosymbolic methods on such
datasets in terms of statistical rule quality and execution
time. Third, we propose two fine-tuning methods for
neurosymbolic ARM methods that rely on tabular foundation
models for addressing the low-data regime.
          </p>
          <p>Our empirical results show that: i) Aerial+ scales one to
two orders of magnitude faster on high-dimensional datasets
compared to state-of-the-art ARM methods (Section 3), ii)
neurosymbolic methods need longer training to find
high-quality association rules on d ≫ n datasets (Section 4), and iii)
our two proposed fine-tuning methods allow Aerial+ to
learn significantly higher-quality rules in small datasets
(Section 4). The results indicate that neurosymbolic
methods, especially when supported with tabular foundation
models, can enable scalable and high-quality knowledge
discovery in high-dimensional tabular data with few instances
(Section 5).</p>
          <p>CEUR Workshop Proceedings, ISSN 1613-0073.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section presents a formal definition of ARM on
categorical tabular data, the problem of high-dimensional data with
few instances, neurosymbolic ARM methods, and tabular
foundation models.</p>
      <p>
        Association rule mining. Following the original
definition of ARM in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], let I = {i₁, i₂, ..., iₘ} be a set of m items,
and let T = {t₁, t₂, ..., tₙ} be a set of n transactions where
∀t ∈ T, t ⊆ I, meaning each transaction t consists of a set of
items in I. An association rule is of the form X → Y, where
X, Y ⊆ I. It is a first-order Horn clause with at most one
positive literal, |Y| = 1 and |X| ≥ 1, in its Conjunctive Normal
Form (CNF) (¬X ∨ Y), and X ∩ Y = ∅. Note that X → Y ∧ Z
can be rewritten as X → Y and X → Z (i.e., X, Y, Z ⊆ I). X is
often referred to as the antecedent while Y is the consequent
side of the association rule. Example association rules are
given in Table 1. A rule X → Y is said to have support
s% if s% of t ∈ T contain X ∪ Y, while the confidence
of a rule is defined as support(X ∪ Y) / support(X). ARM was initially
defined as the problem of finding rules whose support and
confidence exceed given user-defined minimum
thresholds. The state-of-the-art ARM literature
covers a plethora of sub-problems and solutions, which can be
found in [
        <xref ref-type="bibr" rid="ref2 ref21">2, 21</xref>
        ]. Categorical tabular data is often converted
to a set of transactions via one-hot encoding, where each
encoded value represents the presence (1) or absence (0) of
a column-value pair, corresponding to items in I, and each
row corresponds to a transaction in T.
      </p>
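      <p>To make the definitions above concrete, the following minimal Python sketch (ours, not the paper's code; the column=value item names are hypothetical) computes support and confidence over transactions built from one-hot encoded rows:</p>

```python
# Illustrative sketch: support and confidence for a rule X -> Y over
# transactions, where each transaction is the set of column=value
# items present in one table row (the one-hot encoding described above).

def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

# Toy table: each row becomes a transaction of column=value items.
transactions = [
    {"Gene_1=high", "Gene_2=high", "Gene_3=low"},
    {"Gene_1=high", "Gene_2=high", "Gene_3=normal"},
    {"Gene_1=low",  "Gene_2=normal", "Gene_3=normal"},
    {"Gene_1=high", "Gene_2=normal", "Gene_3=low"},
]

print(support(transactions, {"Gene_1=high"}))                      # 0.75
print(confidence(transactions, {"Gene_1=high"}, {"Gene_2=high"}))  # ≈ 0.667
```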
      <p>
        ARM for high-dimensional small data. Having
high-dimensional data with a limited number of samples is
common in domains such as biomedicine, as in gene
expression datasets [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where there are 10K+ columns (different
genes) and fewer than 100 rows (samples, e.g., patients).
High dimensionality of data has many solutions in the ARM
literature, as it leads to rule explosion and, therefore, prolonged
execution times. Existing methods include: i) mining rules
for items of interest rather than all items, known as ARM
with item constraints [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], ii) mining top-k high-quality
rules based on a given rule quality criterion [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] and, iii)
reducing rule redundancy by identifying only frequent
itemsets without frequent supersets of equal support, known
as closed itemset mining [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (and the
earlier version Aerial [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]), is a neurosymbolic method that
is orthogonal to many of the existing solutions and
leverages neural networks to learn a concise set of high-quality
rules with full data coverage. Despite showing promising
results on generic tabular datasets, it has not yet been
evaluated on high-dimensional data. Furthermore, we argue
that using neural networks for ARM inherits neural
network–specific issues, most notably reduced performance
in low-data regimes [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To our knowledge, low-data
scenarios in ARM have not yet been addressed, as employing
neural networks for ARM represents a recent paradigm shift.
      </p>
      <p>
        Neurosymbolic methods for ARM. Neural networks
have been used to mine association rules directly from
tabular data in the past few years. Patel and Yadav [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
proposed the first approach that identifies frequent itemsets
before constructing rules, but the work lacks an explicit
algorithm or source code. Berteloot et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] introduced
ARM-AE, an autoencoder-based [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] method to mine
association rules directly. Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] tackles the rule explosion
problem in ARM by using an under-complete denoising
autoencoder [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] to learn a compact data representation,
and by introducing a more scalable extraction method than
ARM-AE (Figure 1). This results in a smaller set of
high-quality rules with full coverage over the data. Both ARM-AE and
Aerial+ combine neural models with symbolic rule extraction (Figure 2). However,
Aerial+ has not yet been evaluated on high-dimensional
datasets, which we address in this work.
      </p>
      <p>
        Tabular foundation models are large neural
networks pre-trained on vast collections of tabular data to
capture table semantics and support diverse downstream
tasks [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Among them, Tabular Prior-data Fitted
Network (TabPFN) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is trained on millions of synthetic
tables generated via structural causal models [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], and
supports classification and regression. Other recent tabular
foundation models include CARTE [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], which leverages
graph-based representations trained on real-world
knowledge graphs [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]; TabICL [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which frames tabular
learning as in-context learning; Tabbie [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which uses masked
token modeling for pretraining; and TableGPT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which
adopts large language models for table understanding.
Crucially, TabPFN is the only continuously maintained model
that explicitly exposes an interface to extract table
embeddings, which we utilize to develop fine-tuning strategies for
Aerial+’s autoencoder architecture to learn higher-quality
association rules in tables with a low number of rows (we rely
on the implementation available at https://github.com/PriorLabs/TabPFN).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. ARM on high-dimensional tabular datasets</title>
      <p>Given the focus on low-dimensional datasets in prior work
on ARM, we begin with an empirical evaluation of the
scalability of the state-of-the-art algorithmic and
neurosymbolic ARM methods on high-dimensional categorical tabular
datasets with few instances. Specifically, we aim to answer:
how does the runtime cost of current ARM methods scale in
the case of high-dimensional datasets?</p>
      <p>Open-source. All the source code and datasets used in the
experiments can be found at
https://github.com/DiTEC-project/rule_learning_high_dimensional_small_tabular_data.</p>
      <p>Hardware. All experiments are run on a 12th Gen Intel®
Core™ i5-1240P × 16 CPU, with 16 GiB memory, and 512 GB
disk space. No GPUs were used, and no parallel execution
was conducted.</p>
      <p>
        Datasets. We use five d ≫ n gene expression datasets from
[
        <xref ref-type="bibr" rid="ref14 ref32 ref33 ref34">32, 33, 34, 14</xref>
        ] (listed in Table 2), which are pre-processed
according to the procedure described in [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The
pre-processing consists of trimmed mean of M-values
normalization, a log transformation (i.e., log(x + 1)), and
scaling the expression values to zero mean and unit
standard deviation. Furthermore, to enable ARM on gene
expression datasets, we applied z-score binning with one
standard deviation as the cutoff to discretize values into
high, low, and medium levels, as exemplified in Table 1.
      </p>
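      <p>The binning step can be sketched as follows (an illustrative Python implementation, not the authors' pre-processing code; the middle bin is labeled "normal" as in Table 1):</p>

```python
# Illustrative z-score binning: values within one standard deviation
# of the mean become "normal"; values above/below the cutoff become
# "high"/"low".
from statistics import mean, stdev

def zscore_bin(values, cutoff=1.0):
    """Discretize a column of expression values into low/normal/high."""
    mu, sigma = mean(values), stdev(values)
    labels = []
    for v in values:
        z = (v - mu) / sigma
        labels.append("high" if z > cutoff else "low" if z < -cutoff else "normal")
    return labels

expr = [0.1, 0.0, -0.2, 2.5, 0.05, -2.4]  # toy expression values for one gene
print(zscore_bin(expr))
```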
      <p>
        Algorithms. We run the state-of-the-art neurosymbolic
ARM method Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], commonly used algorithmic
methods, ECLAT [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and FP-Growth [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], as well as
ARM-AE [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] on all the datasets given in Table 2. FP-Growth
remains one of the most widely used ARM algorithms due
to its efficiency and adaptability. Numerous variations of
FP-Growth have been proposed to mitigate rule explosion
and improve scalability, including Guided FP-Growth [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]
for item-constrained mining, parallel FP-Growth [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], and
GPU-accelerated versions [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] for faster execution.
      </p>
      <sec id="sec-3-1">
        <title>Table 2: High-dimensional tabular gene expression datasets with few instances, used in all experiments [32, 33, 34, 14].</title>
        <p>Chondrosarcoma: 18006 columns, 6 rows; SmallCellLungCarcinoma:
18237 columns, 60 rows; NonSmallCellLungCarcinoma: 18108 columns,
86 rows; BreastCarcinoma: 18061 columns, 51 rows; Melanoma:
17902 columns, 55 rows.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Experimental comparison</title>
        <p>Note that some of these FP-Growth variants rely on parallel or
GPU executions. However, we only compare the basic
version of each algorithm.</p>
        <p>
          Experimental setup and hyperparameters. To
ensure a fair comparison, we set the hyperparameters of each
method (shown in Table 3) as follows: i) the number of
antecedents is set to 2 for all methods, ii) Aerial+’s antecedent
similarity threshold and ARM-AE’s likeness are set
to 0.5, iii) Aerial+’s consequent similarity threshold and the
minimum confidence of the algorithmic methods are set to
0.8, iv) the minimum support threshold of the algorithmic
methods is set to half the average support of the rules learned
by Aerial+, to ensure comparable average support values,
v) ARM-AE’s number of rules per consequent is set to
Aerial+’s rule count divided by the number of columns to
ensure comparable rule counts, and vi) both Aerial+ and
ARM-AE are trained for 10 epochs with a batch size of 2.
Aerial+ is implemented using the pyAerial [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] library
(https://github.com/DiTEC-project/pyaerial),
FP-Growth is implemented using MLxtend [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], ECLAT is
implemented using pyECLAT
(https://github.com/jefrichardchemistry/pyECLAT), and ARM-AE is implemented
using its original repository
(https://github.com/TheophileBERTELOOT/ARM-AE/tree/master). The goal of this experimental
setup is to test the scalability of the algorithms, not to
perform a rule quality comparison, which has already been
done in earlier work [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Results. Figure 3 shows the execution time of each
method, in seconds on a logarithmic scale, on the 5 datasets
as the number of columns increases. Execution times
include both the training and the rule extraction time for the
neurosymbolic methods. The results show that Aerial+ runs
one to two orders of magnitude faster than
the other methods, and the gap widens as
the number of columns increases. We also see that the
algorithmic method FP-Growth runs faster when the number
of columns is smaller than 30; Aerial+’s
training overhead only pays off when tables have more than
30 columns. Note that Aerial+ has linear time complexity
over the number of rows during training and polynomial time
over the number of (one-hot encoded) columns during rule extraction.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Neurosymbolic ARM in low-data regime</title>
      <p>
        Experiments in Section 3 showed that the fastest
algorithmic solution, FP-Growth, takes ~10³ seconds on tables with
only 150 columns and 2 antecedents, while a neurosymbolic
method, Aerial+, runs one to two orders of magnitude faster.
This empirically validates the scalability of neurosymbolic
approaches to ARM. However, we argue that Aerial+ also
inherits the known issues in neural networks, particularly
the decline in performance in a low-data regime [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Concretely, Aerial+ relies on training a deep autoencoder on
the tabular data with a reconstruction objective.
Following results from statistical learning theory [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] and
empirical observations in neural networks [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ], this implies that
Aerial+’s performance is bounded by the number of training
samples, and with small data it may yield rules that do not
accurately capture ground-truth associations.
      </p>
      <p>
        An efective approach for addressing data scarcity is
transfer learning [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ], which involves training a neural network,
or vector representations (i.e., embeddings), on a large dataset
that can then be transferred to a downstream task on a small
dataset. This provides a starting point that can improve
performance compared to learning from scratch on a small
dataset.
      </p>
      <p>
        In this work, we propose two fine-tuning strategies for
Aerial+ using TabPFN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], a foundation model for tabular
data that has been pre-trained over millions of tables, which
we use to generate embeddings for the small datasets in our
experiments.
      </p>
      <p>4.1. Fine-Tuning with Pre-trained Weight Initialization.
Figure 4 illustrates the fine-tuning strategy introduced in
this section (Aerial+WI). At a high level, table embeddings
from a tabular foundation model are utilized to initialize
the weights of Aerial+’s under-complete denoising
autoencoder, providing a semantically meaningful starting point
for learning compact data representations.</p>
      <p>Let X ∈ ℝ^(n×m) denote the tabular dataset and y ∈ ℝ^n the
corresponding labels. We first compute fixed-length
embeddings for each row in X using a pretrained TabPFNClassifier.
These embeddings, denoted as E ∈ ℝ^(n×d_e), where d_e is the
embedding dimension, are generated via a 10-fold
TabPFN-based meta-learning scheme:</p>
      <p>E = TabPFNClassifier(X, y)</p>
      <p>We then one-hot encode X into X̂ ∈ ℝ^(n×m′) following the
original Aerial+ pipeline, where m′ is the total number of
binary features after encoding categorical attributes. A
two-layer projection encoder g_φ : ℝ^(m′) → ℝ^(d_e) is trained to map X̂
to the TabPFN embedding space. The encoder architecture
is as follows:</p>
      <p>g_φ(x̂) = W₂ · Dropout(σ(LayerNorm(W₁x̂ + b₁))) + b₂</p>
      <p>where σ is the activation function. The weight W₁ and bias b₁ from
the first layer of g_φ are used to initialize
the corresponding parameters in the first layer of Aerial+’s
encoder:</p>
      <p>Aerial+_enc^(1) ← (W₁, b₁)</p>
      <p>This initialization provides a strong inductive prior for
Aerial+, guiding its encoder to start from a semantically
meaningful representation space derived from TabPFN’s
meta-learned embeddings.</p>
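      <p>The weight-initialization transfer can be illustrated with a small NumPy sketch (ours, with toy dimensions; the ReLU nonlinearity and the untrained random parameters stand in for the trained projection encoder and are our assumptions for illustration):</p>

```python
# Minimal sketch of Aerial+WI: a two-layer projection encoder g maps a
# one-hot row to the embedding space; its first-layer parameters
# (W1, b1) then seed the first layer of Aerial+'s autoencoder encoder.
import numpy as np

rng = np.random.default_rng(0)
m_prime, hidden, d_e = 12, 8, 8  # toy sizes: one-hot width, hidden, embedding dim

# Projection-encoder parameters (in practice these come from training g).
W1, b1 = rng.normal(size=(m_prime, hidden)) * 0.1, np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, d_e)) * 0.1, np.zeros(d_e)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def g(x_hat):
    """g(x̂) = W2 · act(LayerNorm(W1 x̂ + b1)) + b2, dropout disabled."""
    h = np.maximum(layer_norm(x_hat @ W1 + b1), 0.0)  # ReLU assumed
    return h @ W2 + b2

# Transfer step: copy (W1, b1) into the first layer of Aerial+'s encoder.
aerial_encoder_layer1 = {"weight": W1.copy(), "bias": b1.copy()}

x_hat = (rng.random(m_prime) < 0.3).astype(float)  # toy one-hot-style row
emb = g(x_hat)
print(emb.shape)  # (8,)
```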
      <p>Note that the gene expression datasets contain no
predefined class labels. Therefore, a random column is selected as
the target variable to enable TabPFN embedding generation.</p>
      <p>4.2. Projection-Guided Fine-Tuning via Double Loss.
This second strategy (Aerial+DL) enforces semantic alignment of the
autoencoder reconstruction process to the table embeddings
from a tabular foundation model, jointly optimizing
reconstruction and alignment losses for semantic consistency.</p>
      <p>Building on the projection encoder g_φ described in Section
4.1, this second fine-tuning strategy aligns Aerial+’s
autoencoder reconstructions with TabPFN embeddings using
a double loss function.</p>
      <p>
        Unlike the first strategy, where g_φ was trained directly on
raw one-hot inputs, here we first pass a corrupted version of
the one-hot input X̂ through Aerial+’s initial autoencoder f_θ
and train g_φ on its outputs. Specifically, we generate noisy
inputs (following the same strategy as Aerial+):
x̃ = clip(x̂ + ε), ε ∼ N(0, σ²),
where σ = 0.5 and values are clipped to [0, 1]. We then
compute reconstructions x̂′ = f_θ(x̃). The projection encoder is
trained to map these reconstructions to their corresponding
TabPFN embeddings E ∈ ℝ^(n×d_e)
by minimizing the cosine distance loss:
ℒ_proj(φ) = 1 − (1/n) ∑ᵢ₌₁ⁿ cos(g_φ(x̂′ᵢ), Eᵢ)
      </p>
      <p>
        After this pretraining phase, g_φ is frozen, and Aerial+’s
autoencoder is fine-tuned using a double loss objective:
ℒ(θ) = ℒ_recon(f_θ(x̃), x̂) + ℒ_proj(g_φ(f_θ(x̃)), E)
where ℒ_recon is a binary cross-entropy loss applied per
one-hot encoded column value as in Aerial+, generating
probability distributions per column. The double loss
strategy encourages Aerial+’s autoencoder to not only
reconstruct the original data, but also to produce
representations that are semantically consistent with TabPFN’s
meta-learned embedding space.
      </p>
      <p>
        4.3. Experimental Results.
Setup and hyperparameters. We run Aerial+ and the
two fine-tuned versions, with pre-trained weight
initialization (Aerial+WI) and double loss (Aerial+DL), on 5 d ≫ n
datasets with 100 columns and compare their rule quality.
The default Aerial+ uses Xavier [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] weight initialization
as in the original work. All the approaches are run with 2
antecedents, for 25 epochs with a batch size of 2. Aerial+’s
autoencoder for both the default and the fine-tuned versions
consists of 2 layers per encoder and decoder, with
dimensions m′ → 50 → 10 for the encoder and the mirrored
version for the decoder. We run each method 50 times and
present the average rule quality results for robustness.
      </p>
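      <p>The double-loss objective of Section 4.2 can be sketched numerically as follows (a NumPy illustration with toy shapes, ours rather than the authors' code; the frozen projection encoder is stubbed as a fixed random matrix instead of a trained network):</p>

```python
# Hedged sketch of the Aerial+DL double loss: per-value binary
# cross-entropy reconstruction plus a cosine-distance alignment of the
# projected reconstructions to the (stubbed) TabPFN embeddings.
import numpy as np

rng = np.random.default_rng(1)
n, m_prime, d_e = 4, 6, 3
x_hat = (rng.random((n, m_prime)) < 0.5).astype(float)  # one-hot-style input
x_rec = np.clip(x_hat + rng.normal(0, 0.1, x_hat.shape), 1e-6, 1 - 1e-6)  # toy reconstruction
E = rng.normal(size=(n, d_e))                           # TabPFN embeddings (stub)
P = rng.normal(size=(m_prime, d_e))                     # frozen projection encoder (stub)

def bce(pred, target):
    """Binary cross-entropy averaged over all one-hot values."""
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def cos_align(pred, emb):
    """1 - mean cosine similarity between projected rows and embeddings."""
    z = pred @ P
    cos = (z * emb).sum(-1) / (np.linalg.norm(z, axis=-1) * np.linalg.norm(emb, axis=-1))
    return 1.0 - cos.mean()

loss = bce(x_rec, x_hat) + cos_align(x_rec, E)  # L = L_recon + L_proj
print(float(loss))
```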
      <p>
        Evaluation criteria. The standard rule quality metrics
from the ARM literature are used as the evaluation
criteria [
        <xref ref-type="bibr" rid="ref2 ref21">2, 21</xref>
        ]. Let T be the set of transactions introduced in
Section 2, and R = {r₁, r₂, ..., rₖ} be the rule set learned by
each approach, where ∀rᵢ ∈ R, rᵢ = (Xᵢ → Yᵢ):
• Number of rules. Total number of rules learned: |R|.
• Average rule coverage. Average number of
transactions where the rule antecedent appears:
AvgCov = (1/|R|) ∑ᵢ |{t ∈ T ∣ Xᵢ ⊆ t}|
• Execution time. Sum of model training time,
fine-tuning (when applicable), and rule extraction time
in seconds.
      </p>
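      <p>The coverage metric can be computed as in the following sketch (ours; the transactions and rules are toy examples with hypothetical item names):</p>

```python
# Illustrative average rule coverage: the mean, over the rule set R, of
# the number of transactions whose items include the rule's antecedent.

def avg_coverage(transactions, rules):
    """AvgCov = (1/|R|) * sum_i |{t in T : X_i ⊆ t}|."""
    return sum(
        sum(set(ant) <= t for t in transactions) for ant, _cons in rules
    ) / len(rules)

T = [
    {"Gene_2=high", "Gene_29=high", "Gene_14=low"},
    {"Gene_2=high", "Gene_29=normal", "Gene_14=normal"},
    {"Gene_2=low", "Gene_29=high", "Gene_14=normal"},
]
R = [
    ({"Gene_2=high", "Gene_29=high"}, {"Gene_14=low"}),  # covers 1 transaction
    ({"Gene_29=high"}, {"Gene_14=low"}),                 # covers 2 transactions
]

print(avg_coverage(T, R))  # (1 + 2) / 2 = 1.5
```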
      <p>Results. Table 4 shows the rule quality evaluation results
of Aerial+ and the two fine-tuned versions, Aerial+WI and
Aerial+DL, on 5 datasets. The results show that Aerial+WI
outperforms Aerial+ in terms of rule confidence and
association strength (Zhang’s metric) on all 5 datasets. Aerial+DL’s
confidence and association strength also exceed Aerial+’s on
4 out of 5 datasets, the exception being the NonSmallCellLungCarcinoma
dataset. Both fine-tuning methods resulted in a smaller
number of rules on all datasets, with smaller data
coverage on 3 out of the 5. This is expected, as the fine-tuned
versions capture rules with higher association strength on
average, meaning the less obvious rules are eliminated
during the rule extraction process and, therefore, the final data
coverage is lower. The fine-tuned methods have higher
support values on 4 out of 5 datasets. However, we do not
take the high support values as inherently positive, as their
value depends on the application: high-support rules are good
at explaining trends in the data, while low-support rules
can be better at explaining anomalies. Lastly, fine-tuning
added only a few seconds to the
execution time, which is negligible in the low-data regime. Note
that the costliest operation in Aerial+ is the rule extraction
process rather than the training (or pre-training), and extraction is not
significantly affected by the fine-tuning methods.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This section discusses the experimental results and the role of
neurosymbolic methods and tabular foundation models in
ARM.</p>
      <p>Neurosymbolic methods scale better on
high-dimensional data. Experiments in Section 3 show that
Aerial+, a neurosymbolic method for ARM, executes
one to two orders of magnitude faster than the
algorithmic ARM approaches. We argue this is because
Aerial+ leverages neural networks’ ability to handle
high-dimensional data: it has linear complexity over the number
of rows in training, and polynomial time complexity over
the number of (one-hot encoded) columns during the rule
extraction stage. Algorithmic methods, on the other hand,
rely on counting the co-occurrences of itemsets in the data,
which is a costlier operation.</p>
      <p>Aerial+ inherits neural network-specific issues into
ARM. The scalability of Aerial+ on high-dimensional data
comes at a cost, most notably reduced performance in
the low-data regime. The original Aerial+ paper
trains for only 2 epochs on generic tabular datasets and was
able to obtain high-quality rules. In the low-data regime,
however, we were able to obtain high-quality rules consistently
in each execution only after training for 25 epochs. This
shows that while neurosymbolic methods can help with
scalability, they also introduce a new research problem into
the ARM literature, namely, rule mining in the low-data
regime.</p>
      <p>Fine-tuning Aerial+ for better knowledge discovery.
Experiments in Section 4 showed that our two proposed
fine-tuning methods using the tabular foundation model
TabPFN resulted in significantly higher-quality rules in
comparison to the default version of Aerial+ on 5 real-world
high-dimensional tabular datasets with few instances. Many
of the other tabular foundation models that we investigated,
including Tabbie, CARTE, TableGPT, and TabICL, do not
provide an interface to obtain table embeddings; therefore,
we were not able to use them in our experiments. Since
TabPFN is trained to perform classification and regression
tasks over tabular data, we expect that models explicitly
trained to learn column embeddings and associations could
potentially result in better rule quality.</p>
      <p>Neurosymbolic methods start a paradigm shift in
ARM. We show that neurosymbolic ARM methods can
be supported by prior-data fitted networks, such as TabPFN,
to learn higher-quality rules. This raises the research
question of what other types of prior data or background
knowledge can be utilized as part of ARM. We invite
researchers to further investigate neurosymbolic methods
for ARM, as neurosymbolic integration brings
immense potential for both knowledge discovery and fully
interpretable inference across a plethora of domains.</p>
      <p>Further validation of our approach and limitations.
The execution time of algorithmic methods depends strongly
on the distribution of the data, as denser datasets, in which
many frequent itemsets of high support are present, prolong
the execution time. Aerial+, in contrast, applies the exact same
polynomial-time rule extraction process regardless of the
density of the data, and therefore depends less on the
dataset's attributes.
However, we will still test our fine-tuning approaches on
more datasets from diverse domains to further validate our
approach in future work. Furthermore, we will evaluate our
approach on generic tabular data with higher numbers of
instances, i.e., with far more instances than features, to see
whether this leads to early convergence or higher-quality rules.
Our proposed fine-tuning
strategies are currently limited to the only available tabular
foundation model with an explicit table embedding interface,
TabPFN. Since TabPFN is specifically trained for
classification and regression, this limitation may restrict performance
improvements, and a future foundation model trained to
capture column associations explicitly could significantly
improve rule discovery.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This paper highlights the potential of neurosymbolic
methods in the domain of association rule mining (ARM),
especially under the high-dimensional, low-sample (features ≫
instances) settings common in domains such as biomedicine. We
have empirically shown that Aerial+, a neurosymbolic approach,
offers substantial scalability improvements over state-of-the-art
neurosymbolic and algorithmic ARM techniques, running one to
two orders of magnitude faster.
However, neurosymbolic ARM also inherits the known
issues of neural networks into ARM literature, specifically
the reduced performance in low-data regimes, which we
addressed through two targeted fine-tuning strategies.</p>
      <p>Our fine-tuning methods use table embeddings from
TabPFN, a tabular foundation model, to i) initialize the
weights of Aerial+ (Aerial+WI), and ii) semantically align
Aerial+ autoencoder training with a given tabular dataset
(Aerial+DL). The results show that both Aerial+WI
and Aerial+DL methods significantly improved rule
quality in low-data settings. This demonstrates the promising
role of pretrained tabular models in enhancing knowledge
discovery over tabular datasets, beyond the classification and
regression tasks commonly tackled in the tabular data
domain.</p>
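A minimal sketch of what such an embedding-aligned training objective could look like, assuming a mean-squared alignment term and a weighting hyperparameter `alpha` (both assumptions; the paper's exact Aerial+DL loss may differ):

```python
import numpy as np

def aligned_autoencoder_loss(x, x_hat, z, z_target, alpha=0.1):
    """Reconstruction error plus a distance term pulling the latent
    codes toward embedding-derived targets, in the spirit of
    Aerial+DL (the paper's exact loss may differ; alpha is an
    assumed weighting hyperparameter)."""
    reconstruction = np.mean((x - x_hat) ** 2)   # usual autoencoder term
    alignment = np.mean((z - z_target) ** 2)     # pull latents toward targets
    return reconstruction + alpha * alignment

# Toy usage: perfect reconstruction, latents 0.5 away from their targets.
x = np.ones((5, 3))
z = np.zeros((5, 2))
loss = aligned_autoencoder_loss(x, x, z, z + 0.5)
print(round(loss, 3))  # 0.025
```

With `alpha = 0`, the objective reduces to a plain reconstruction loss; increasing `alpha` trades reconstruction fidelity for agreement with the foundation-model embeddings.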
      <p>Looking forward, we see this as the beginning of a broader
paradigm shift in ARM, where background knowledge and
pretrained models can be explicitly leveraged to guide rule
extraction. We invite the community to explore what other
forms of prior knowledge, architectures, or foundation
models can be integrated into neurosymbolic ARM. Future work
will also validate our methods across a wider range of
datasets and evaluate their effectiveness in high-instance
scenarios (instances ≫ features), with the aim of achieving
both scalability and high interpretability in real-world data
mining applications.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has received support from the Dutch Research
Council (NWO), in the scope of the Digital Twin for
Evolutionary Changes in water networks (DiTEC) project, file
number 19454.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
ChatGPT (GPT-4.1) for paraphrasing and rewording. After using
this tool, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          , et al.,
          <article-title>Fast algorithms for mining association rules</article-title>
          ,
          <source>in: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB</source>
          , volume
          <volume>1215</volume>
          ,
          <year>1994</year>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Luna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ventura</surname>
          </string-name>
          ,
          <article-title>Frequent itemset mining: A 25 years review</article-title>
          ,
          <source>WIREs Data Mining and Knowledge Discovery</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <article-title>e1329</article-title>
          . doi:10.1002/widm.1329.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Angelino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Larus-Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seltzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Learning certifiably optimal rule lists for categorical data</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Aksehirli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <article-title>Frequent itemset mining for big data</article-title>
          ,
          <source>in: 2013 IEEE international conference on big data, IEEE</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Mining association rules with item constraints</article-title>
          ,
          <source>in: KDD</source>
          , volume
          <volume>97</volume>
          ,
          <year>1997</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Baralis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cerquitelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Garza</surname>
          </string-name>
          ,
          <article-title>Generalized association rule mining with constraints</article-title>
          ,
          <source>Information Sciences</source>
          <volume>194</volume>
          (
          <year>2012</year>
          )
          <fpage>68</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yin</surname>
          </string-name>
          , W. Gan, G. Huang,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <article-title>Constraint-based sequential rule mining</article-title>
          ,
          <source>in: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <article-title>Mining top-k association rules</article-title>
          ,
          <source>in: Advances in Artificial Intelligence: 25th Canadian Conference on Artificial Intelligence</source>
          ,
          <source>Canadian AI</source>
          <year>2012</year>
          , Toronto, ON, Canada, May 28-30,
          <year>2012</year>
          . Proceedings 25, Springer,
          <year>2012</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Selamat</surname>
          </string-name>
          ,
          <article-title>ETARM: an efficient top-k association rule mining algorithm</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>1148</fpage>
          -
          <lpage>1160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsiao</surname>
          </string-name>
          ,
          <article-title>CHARM: An efficient algorithm for closed itemset mining</article-title>
          ,
          <source>in: Proceedings of the 2002 SIAM international conference on data mining, SIAM</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>Neurosymbolic association rule mining from tabular data</article-title>
          ,
          <source>in: Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning</source>
          , volume
          <volume>284</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>588</lpage>
          . URL: https://proceedings.mlr.press/v284/karabulut25a.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Deep neural networks for high dimension, low sample size data</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>2017</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>2287</fpage>
          -
          <lpage>2293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Korn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Monahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. Schnell,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>21</volume>
          (
          <year>2015</year>
          )
          <fpage>1318</fpage>
          -
          <lpage>1325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>High dimensional, tabular deep learning with an auxiliary knowledge graph</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>26348</fpage>
          -
          <lpage>26371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hollmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Purucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krishnakumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Körfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Hoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Schirrmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Accurate predictions on small data with a tabular foundation model</article-title>
          ,
          <source>Nature</source>
          <volume>637</volume>
          (
          <year>2025</year>
          )
          <fpage>319</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qu</surname>
          </string-name>
          , D. Holzmüller, G. Varoquaux,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Morvan</surname>
          </string-name>
          ,
          <article-title>TabICL: A tabular foundation model for in-context learning on large data</article-title>
          ,
          <source>arXiv preprint arXiv:2502.05564</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Iida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Manjunatha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <article-title>TABBIE: Pretrained representations of tabular data</article-title>
          ,
          <source>arXiv preprint arXiv:2105.02584</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          , G. Neubig, W.-t. Yih,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>TaBERT: Pretraining for joint understanding of textual and tabular data</article-title>
          ,
          <source>arXiv preprint arXiv:2005.08314</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , G. Zhang, G. Chen, G. Zhu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>TableGPT2: A large multimodal model with tabular data integration</article-title>
          ,
          <source>arXiv preprint arXiv:2411.02059</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fister Jr.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Draheim</surname>
          </string-name>
          ,
          <article-title>Numerical association rule mining: a systematic literature review</article-title>
          ,
          <source>arXiv preprint arXiv:2307.00662</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>Learning semantic association rules from internet of things data</article-title>
          ,
          <source>Neurosymbolic Artificial Intelligence</source>
          <volume>1</volume>
          (
          <year>2025</year>
          ). doi:10.1177/29498732251377518.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>van Bekkum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer-Vitali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>Modular design patterns for hybrid learning and reasoning systems: a taxonomy, patterns and use cases</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>51</volume>
          (
          <year>2021</year>
          )
          <fpage>6528</fpage>
          -
          <lpage>6546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <article-title>An innovative approach for association rule mining in grocery dataset based on non-negative matrix factorization and autoencoder</article-title>
          ,
          <source>Journal of Algebraic Statistics</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>2898</fpage>
          -
          <lpage>2905</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berteloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khoury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          ,
          <article-title>Association rules mining with auto-encoders</article-title>
          ,
          <source>in: International Conference on Intelligent Data Engineering and Automated Learning</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koenigstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Giryes</surname>
          </string-name>
          ,
          <article-title>Autoencoders</article-title>
          ,
          <source>Machine learning for data science handbook: data mining and knowledge discovery handbook</source>
          (
          <year>2023</year>
          )
          <fpage>353</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Manzagol</surname>
          </string-name>
          ,
          <article-title>Extracting and composing robust features with denoising autoencoders</article-title>
          ,
          <source>in: Proceedings of the 25th international conference on Machine learning</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1096</fpage>
          -
          <lpage>1103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          , J. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Language modeling on tabular data: A survey of foundations, techniques and evolution</article-title>
          ,
          <source>arXiv preprint arXiv:2408.10548</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          , Causality, Cambridge university press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Grinsztajn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <article-title>CARTE: pretraining and transfer for tabular learning</article-title>
          ,
          <source>arXiv preprint arXiv:2402.16785</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Greninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Edelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Forbes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bindal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. R.</given-names>
            <surname>Thompson</surname>
          </string-name>
          , et al.,
          <article-title>Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>41</volume>
          (
          <year>2012</year>
          )
          <fpage>D955</fpage>
          -
          <lpage>D961</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>F.</given-names>
            <surname>Iorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Vis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Bignell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Menden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barthorpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          , et al.,
          <article-title>A landscape of pharmacogenomic interactions in cancer</article-title>
          ,
          <source>Cell</source>
          <volume>166</volume>
          (
          <year>2016</year>
          )
          <fpage>740</fpage>
          -
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Garnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Edelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Heidorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Greenman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dastur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Greninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. R.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soares</surname>
          </string-name>
          , et al.,
          <article-title>Systematic identification of genomic markers of drug sensitivity in cancer cells</article-title>
          ,
          <source>Nature</source>
          <volume>483</volume>
          (
          <year>2012</year>
          )
          <fpage>570</fpage>
          -
          <lpage>575</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Mourragui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Loog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Vis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Manjon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>van de Wiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Reinders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Wessels</surname>
          </string-name>
          ,
          <article-title>Predicting patient response with models trained on cell lines and patient-derived xenografts by nonlinear transfer learning</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>118</volume>
          (
          <year>2021</year>
          )
          <fpage>e2106682118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ogihara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>New algorithms for fast discovery of association rules</article-title>
          ,
          <source>in: KDD</source>
          , volume
          <volume>97</volume>
          ,
          <year>1997</year>
          , pp.
          <fpage>283</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Mining frequent patterns without candidate generation</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>29</volume>
          (
          <year>2000</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shabtay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yaari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dattner</surname>
          </string-name>
          ,
          <article-title>A guided fp-growth algorithm for mining multitude-targeted item-sets and class association rules in imbalanced data</article-title>
          ,
          <source>Information Sciences</source>
          <volume>553</volume>
          (
          <year>2021</year>
          )
          <fpage>353</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Pfp: parallel fp-growth for query recommendation</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM conference on Recommender systems</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A parallel fp-growth algorithm based on gpu</article-title>
          ,
          <source>in: 2017 IEEE 14th International Conference on e-Business Engineering (ICEBE)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>PyAerial: Scalable association rule mining from tabular data</article-title>
          ,
          <source>SoftwareX</source>
          <volume>31</volume>
          (
          <year>2025</year>
          )
          <fpage>102341</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.softx.2025.102341</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>S.</given-names>
            <surname>Raschka</surname>
          </string-name>
          ,
          <article-title>Mlxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack</article-title>
          ,
          <source>The Journal of Open Source Software</source>
          <volume>3</volume>
          (
          <year>2018</year>
          ). doi:
          <pub-id pub-id-type="doi">10.21105/joss.00638</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Statistical learning theory</article-title>
          , Wiley,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Recht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Understanding deep learning (still) requires rethinking generalization</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>107</fpage>
          -
          <lpage>115</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3446776</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of transfer learning</article-title>
          ,
          <source>J. Big Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>9</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          ,
          <source>in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Confidence metrics for association rule mining</article-title>
          ,
          <source>Applied Artificial Intelligence</source>
          <volume>23</volume>
          (
          <year>2009</year>
          )
          <fpage>713</fpage>
          -
          <lpage>737</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>