<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rules in High-Dimensional Small Tabular Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Erkan Karabulut</string-name>
          <email>e.karabulut@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Daza</string-name>
          <email>d.f.dazacruz@amsterdamumc.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Groth</string-name>
          <email>p.t.groth@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Degeler</string-name>
          <email>v.o.degeler@uva.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Laboratory Medicine</institution>
          ,
          <addr-line>De Boelelaan 1117, Amsterdam</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Amsterdam</institution>
          ,
          <addr-line>Science Park 900, 1098 XH Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Association Rule Mining (ARM) aims to discover patterns between features in datasets in the form of propositional rules, supporting both knowledge discovery and interpretable machine learning in high-stakes decision-making. However, in high-dimensional settings, rule explosion and computational overhead render popular algorithmic approaches impractical without effective search space reduction, and these challenges propagate to downstream tasks. Neurosymbolic methods, such as Aerial+, have recently been proposed to address the rule explosion in ARM. While they tackle the high dimensionality of the data, they also inherit limitations of neural networks, particularly reduced performance in low-data regimes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Association Rule Mining (ARM) is the task of
discovering patterns among the features of a dataset in the form
of logical implications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], also known as if-then rules.
ARM has been applied in a myriad of domains for
knowledge discovery [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as well as for high-stakes
decision-making as part of interpretable machine learning
models [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. High-dimensional datasets, e.g., with thousands
of columns, often lead to rule explosion and prolonged
execution times [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Common solutions to rule explosion in
ARM include constraining data features (i.e., ARM with item
constraints [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]), mining top-k high-quality rules [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ],
and closed itemset mining [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, these methods
mainly focus on reducing the search space for knowledge
discovery, rather than directly addressing the computational
burden of rule mining.
      </p>
      <p>
        Neurosymbolic methods for ARM, such as Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
have recently been proposed to address the rule explosion
problem on tabular data. Despite its effectiveness in
addressing rule explosion on generic tabular data,
Aerial+ has not yet been evaluated for scalability on
high-dimensional datasets. Moreover, neurosymbolic methods
for ARM inherit the limitations of neural networks,
such as reduced performance in low-data (small n) regimes [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        As is the case for several data-driven methods, Aerial+
relies on statistical patterns present in the dataset. In small
datasets, such patterns may be hard to extract, which in turn
may lead to reduced predictive performance, and in the case
of Aerial+, to rules that do not accurately capture the true
underlying patterns. Recent works on models for tabular
data have addressed this issue by introducing foundation
models [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19 ref20">16, 17, 18, 19, 20</xref>
        ], which are pre-trained on large
      </p>
      <p>
        Table 1: Sample d ≫ n dataset and association rules. Gene expression
datasets in tabular form often consist of 10K+ columns and a
limited number of rows. This is a sample of gene expression level
data from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], partially pre-processed by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and put in discrete
form by applying z-score binning. The listed association rules are
learned using Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with item constraints on low and high
gene expression levels.
      </p>
      <sec id="sec-1-2">
        <title>Gene2 (high) ∧ Gene29 (high) → Gene14 (low)</title>
        <sec id="sec-1-2-2">
          <title>Gene3 (high) ∧ Gene45 (high) → Gene84 (high)</title>
          <p>datasets and transferred to small datasets without additional
training, thereby providing strong inductive biases and
generalizable representations that compensate for the limited
data instances.</p>
          <p>
            In this paper, we make three key contributions to ARM
research on categorical tabular data. First, we evaluate the
scalability of both commonly used algorithmic ARM approaches
and the recent neurosymbolic methods on
high-dimensional datasets. Second, to the best of our knowledge,
we introduce the problem of ARM on d ≫ n datasets for
the first time, which are common in the biomedicine
domain, such as gene expression datasets [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] (see Table 1),
and evaluate the recent neurosymbolic methods on such
datasets in terms of statistical rule quality and execution
time. Third, we propose two fine-tuning methods for
neurosymbolic ARM methods that rely on tabular foundation
models for addressing the low-data regime.
          </p>
          <p>Our empirical results show that: i) Aerial+ scales one to
two orders of magnitude faster on high-dimensional datasets
compared to state-of-the-art ARM methods (Section 3), ii)
neurosymbolic methods need longer training to find
high-quality association rules on d ≫ n datasets (Section 4), and iii)
our two proposed fine-tuning methods allow Aerial+ to
learn significantly higher-quality rules in small datasets
(Section 4). The results indicate that neurosymbolic
methods, especially when supported with tabular foundation
models, can enable scalable and high-quality knowledge
discovery in high-dimensional tabular data with few instances
(Section 5).</p>
          <p>CEUR Workshop Proceedings, ISSN 1613-0073.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section presents a formal definition of ARM on
categorical tabular data, the problem of high-dimensional data with
few instances, neurosymbolic ARM methods, and tabular
foundation models.</p>
      <p>
        Association rule mining. Following the original
definition of ARM in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], let I = {i₁, i₂, ..., iₘ} be a set of m items,
and let T = {t₁, t₂, ..., tₙ} be a set of n transactions where
∀t ∈ T, t ⊆ I, meaning each transaction t consists of a set of
items in I. An association rule is of the form X → Y, where
X, Y ⊆ I. It is a first-order Horn clause with at most one
positive literal, |Y| = 1 and |X| ≥ 1, in its Conjunctive Normal
Form (CNF) (¬X ∨ Y), and X ∩ Y = ∅. Note that X → Y ∧ Z
can be rewritten as X → Y and X → Z (i.e., X, Y, Z ⊆ I). X is
often referred to as the antecedent while Y is the consequent
side of the association rule. Example association rules are
given in Table 1. A rule X → Y is said to have support
s% if s% of t ∈ T contain X ∪ Y, while the confidence
of a rule is defined as support(X ∪ Y) / support(X). ARM was initially
defined as the problem of finding rules whose support and
confidence exceed given user-defined minimum
thresholds. The state-of-the-art ARM literature
covers a plethora of sub-problems and solutions, which can be
found in [
        <xref ref-type="bibr" rid="ref2 ref21">2, 21</xref>
        ]. Categorical tabular data is often converted
to a set of transactions via one-hot encoding, where each
encoded value represents the presence (1) or absence (0) of
a column-value pair, corresponding to items in I, and each
row corresponds to a transaction in T.
      </p>
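      <p>To make the definitions above concrete, the following minimal Python sketch (ours, not the paper's code; the column=value item names are hypothetical) computes support and confidence over transactions built from one-hot encoded rows:</p>

```python
# Illustrative sketch: support and confidence for a rule X -> Y over
# transactions, where each transaction is the set of column=value
# items present in one table row (the one-hot encoding described above).

def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(X ∪ Y) / support(X) for the rule X -> Y."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

# Toy table: each row becomes a transaction of column=value items.
transactions = [
    {"Gene_1=high", "Gene_2=high", "Gene_3=low"},
    {"Gene_1=high", "Gene_2=high", "Gene_3=normal"},
    {"Gene_1=low",  "Gene_2=normal", "Gene_3=normal"},
    {"Gene_1=high", "Gene_2=normal", "Gene_3=low"},
]

print(support(transactions, {"Gene_1=high"}))                      # 0.75
print(confidence(transactions, {"Gene_1=high"}, {"Gene_2=high"}))  # ≈ 0.667
```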
      <p>
        ARM for high-dimensional small data. Having
high-dimensional data with a limited number of samples is
common in domains such as biomedicine, as in gene
expression datasets [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where there are 10K+ columns (different
genes) and fewer than 100 rows (samples, e.g., patients).
High dimensionality of data has many solutions in the ARM
literature, as it leads to rule explosion and, therefore, prolonged
execution times. Existing methods include: i) mining rules
for items of interest rather than all items, known as ARM
with item constraints [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ], ii) mining top-k high-quality
rules based on a given rule quality criterion [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ] and, iii)
reducing rule redundancy by identifying only frequent
itemsets without frequent supersets of equal support, known
as closed itemset mining [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (and the
earlier version Aerial [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]), is a neurosymbolic method that
is orthogonal to many of the existing solutions and
leverages neural networks to learn a concise set of high-quality
rules with full data coverage. Despite showing promising
results on generic tabular datasets, it has not yet been
evaluated on high-dimensional data. Furthermore, we argue
that using neural networks for ARM inherits neural
network–specific issues, most notably reduced performance
in low-data regimes [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To our knowledge, low-data
scenarios in ARM have not yet been addressed, as employing
neural networks for ARM represents a recent paradigm shift.
      </p>
      <p>
        Neurosymbolic methods for ARM. Neural networks
have been used to mine association rules directly from
tabular data in the past few years. Patel and Yadav [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
proposed the first approach that identifies frequent itemsets
before constructing rules, but the work lacks an explicit
algorithm or source code. Berteloot et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] introduced
ARM-AE, an autoencoder-based [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] method to mine
association rules directly. Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] tackles the rule explosion
problem in ARM by using an under-complete denoising
autoencoder [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] to learn a compact data representation,
and by introducing a more scalable extraction method than
ARM-AE (Figure 1). This results in a smaller set of
high-quality rules with full coverage over the data. Both ARM-AE and
Aerial+ combine neural models with symbolic rule extraction (Figure 2). However,
Aerial+ has not yet been evaluated on high-dimensional
datasets, which we address in this work.
      </p>
      <p>
        Tabular foundation models are large neural
networks pre-trained on vast collections of tabular data to
capture table semantics and support diverse downstream
tasks [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Among them, Tabular Prior-data Fitted
Network (TabPFN) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is trained on millions of synthetic
tables generated via structural causal models [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], and
supports classification and regression. Other recent tabular
foundation models include CARTE [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], which leverages
graph-based representations trained on real-world
knowledge graphs [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]; TabICL [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], which frames tabular
learning as in-context learning; Tabbie [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], which uses masked
token modeling for pretraining; and TableGPT [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which
adopts large language models for table understanding.
Crucially, TabPFN is the only continuously maintained model
that explicitly exposes an interface to extract table
embeddings, which we utilize to develop fine-tuning strategies for
Aerial+’s autoencoder architecture to learn higher-quality
association rules in tables with a low number of rows (we rely
on the implementation available at https://github.com/PriorLabs/TabPFN).
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. ARM on high-dimensional tabular datasets</title>
      <p>Given the focus on low-dimensional datasets in prior work
on ARM, we begin with an empirical evaluation of the
scalability of the state-of-the-art algorithmic and
neurosymbolic ARM methods on high-dimensional categorical tabular
datasets with few instances. Specifically, we aim to answer:
how does the runtime cost of current ARM methods scale in
the case of high-dimensional datasets?</p>
      <p>Open-source. All the source code and datasets used in the
experiments can be found at
https://github.com/DiTEC-project/rule_learning_high_dimensional_small_tabular_data.</p>
      <p>Hardware. All experiments are run on a 12th Gen Intel®
Core™ i5-1240P × 16 CPU, with 16 GiB memory, and 512 GB
disk space. No GPUs were used, and no parallel execution
was conducted.</p>
      <p>
        Datasets. We use five d ≫ n gene expression datasets from
[
        <xref ref-type="bibr" rid="ref14 ref32 ref33 ref34">32, 33, 34, 14</xref>
        ] (listed in Table 2), which are pre-processed
according to the procedure described in [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] by [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The
pre-processing consists of trimmed mean of M-values
normalization, a log transformation (i.e., log(x + 1)), and
scaling the expression values to zero mean and unit
standard deviation. Furthermore, to enable ARM on gene
expression datasets, we applied z-score binning with one
standard deviation as the cutoff to discretize values into
high, low, and medium levels, as exemplified in Table 1.
      </p>
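      <p>The binning step can be sketched as follows (an illustrative Python implementation, not the authors' pre-processing code; the middle bin is labeled "normal" as in Table 1):</p>

```python
# Illustrative z-score binning: values within one standard deviation
# of the mean become "normal"; values above/below the cutoff become
# "high"/"low".
from statistics import mean, stdev

def zscore_bin(values, cutoff=1.0):
    """Discretize a column of expression values into low/normal/high."""
    mu, sigma = mean(values), stdev(values)
    labels = []
    for v in values:
        z = (v - mu) / sigma
        labels.append("high" if z > cutoff else "low" if z < -cutoff else "normal")
    return labels

expr = [0.1, 0.0, -0.2, 2.5, 0.05, -2.4]  # toy expression values for one gene
print(zscore_bin(expr))
```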
      <p>
        Algorithms. We run the state-of-the-art neurosymbolic
ARM method Aerial+ [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], commonly used algorithmic
methods, ECLAT [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] and FP-Growth [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], as well as
ARM-AE [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] on all the datasets given in Table 2. FP-Growth
remains one of the most widely used ARM algorithms due
to its efficiency and adaptability. Numerous variations of
FP-Growth have been proposed to mitigate rule explosion
and improve scalability, including Guided FP-Growth [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]
for item-constrained mining, parallel FP-Growth [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], and
GPU-accelerated versions [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] for faster execution.
      </p>
      <sec id="sec-3-1">
        <title>Table 2: High-dimensional tabular gene expression datasets with few instances, used in all experiments [32, 33, 34, 14].</title>
        <p>Chondrosarcoma: 18006 columns, 6 rows; SmallCellLungCarcinoma:
18237 columns, 60 rows; NonSmallCellLungCarcinoma: 18108 columns,
86 rows; BreastCarcinoma: 18061 columns, 51 rows; Melanoma:
17902 columns, 55 rows.</p>
      </sec>
      <sec id="sec-3-5">
        <title>Experimental comparison</title>
        <p>Note that some of these FP-Growth variants rely on parallel or
GPU executions. However, we only compare the basic
version of each algorithm.</p>
        <p>
          Experimental setup and hyperparameters. To
ensure a fair comparison, we set the hyperparameters of each
method (shown in Table 3) as follows: i) the number of
antecedents is set to 2 for all methods, ii) Aerial+’s antecedent
similarity threshold and ARM-AE’s likeness are set
to 0.5, iii) Aerial+’s consequent similarity threshold and the
minimum confidence of the algorithmic methods are set to
0.8, iv) the minimum support threshold of the algorithmic
methods is set to half the average support of the rules learned
by Aerial+, to ensure comparable average support values,
v) ARM-AE’s number of rules per consequent is set to
Aerial+’s rule count divided by the number of columns to
ensure comparable rule counts, and vi) both Aerial+ and
ARM-AE are trained for 10 epochs with a batch size of 2.
Aerial+ is implemented using the pyAerial [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ] library
(https://github.com/DiTEC-project/pyaerial),
FP-Growth is implemented using MLxtend [
          <xref ref-type="bibr" rid="ref42">42</xref>
          ], ECLAT is
implemented using pyECLAT
(https://github.com/jefrichardchemistry/pyECLAT), and ARM-AE is implemented
using its original repository
(https://github.com/TheophileBERTELOOT/ARM-AE/tree/master). The goal of this experimental
setup is to test the scalability of the algorithms, not to
perform a rule quality comparison, which has already been
done in earlier work [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Results. Figure 3 shows the execution time of each
method, in seconds on a logarithmic scale, on the 5 datasets
as the number of columns increases. Execution times
include both the training and the rule extraction time for the
neurosymbolic methods. The results show that Aerial+ runs
one to two orders of magnitude faster than
the other methods, and the gap widens as
the number of columns increases. We also see that the
algorithmic method FP-Growth runs faster when the number
of columns is smaller than 30; Aerial+’s
training overhead only pays off when tables have more than
30 columns. Note that Aerial+ has linear time complexity
over the number of rows during training and polynomial time
over the number of (one-hot encoded) columns during rule extraction.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Neurosymbolic ARM in low-data regime</title>
      <p>
        Experiments in Section 3 showed that the fastest
algorithmic solution, FP-Growth, takes ~10³ seconds on tables with
only 150 columns and 2 antecedents, while a neurosymbolic
method, Aerial+, runs one to two orders of magnitude faster.
This empirically validates the scalability of neurosymbolic
approaches to ARM. However, we argue that Aerial+ also
inherits the known issues in neural networks, particularly
the decline in performance in a low-data regime [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Concretely, Aerial+ relies on training a deep autoencoder on
the tabular data with a reconstruction objective.
Following results from statistical learning theory [
        <xref ref-type="bibr" rid="ref43">43</xref>
        ] and
empirical observations in neural networks [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ], this implies that
Aerial+’s performance is bounded by the number of training
samples, and with small data it may yield rules that do not
accurately capture ground-truth associations.
      </p>
      <p>
        An efective approach for addressing data scarcity is
transfer learning [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ], which involves training a neural network,
or vector representations (i.e., embeddings), on a large dataset
that can then be transferred to a downstream task on a small
dataset. This provides a starting point that can improve
performance compared to learning from scratch on a small
dataset.
      </p>
      <p>
        In this work, we propose two fine-tuning strategies for
Aerial+ using TabPFN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], a foundation model for tabular
data that has been pre-trained over millions of tables, which
we use to generate embeddings for the small datasets in our
experiments.
      </p>
      <p>4.1. Fine-Tuning with Pre-trained Weight Initialization.
Figure 4 illustrates the fine-tuning strategy introduced in
this section (Aerial+WI). At a high level, table embeddings
from a tabular foundation model are utilized to initialize
the weights of Aerial+’s under-complete denoising
autoencoder, providing a semantically meaningful starting point
for learning compact data representations.</p>
      <p>Let X ∈ ℝ^(n×m) denote the tabular dataset and y ∈ ℝ^n the
corresponding labels. We first compute fixed-length
embeddings for each row in X using a pretrained TabPFNClassifier.
These embeddings, denoted as E ∈ ℝ^(n×d_e), where d_e is the
embedding dimension, are generated via a 10-fold
TabPFN-based meta-learning scheme:</p>
      <p>E = TabPFNClassifier(X, y)</p>
      <p>We then one-hot encode X into X̂ ∈ ℝ^(n×m′) following the
original Aerial+ pipeline, where m′ is the total number of
binary features after encoding categorical attributes. A
two-layer projection encoder g_φ : ℝ^(m′) → ℝ^(d_e) is trained to map X̂
to the TabPFN embedding space. The encoder architecture
is as follows:</p>
      <p>g_φ(x̂) = W₂ · Dropout(σ(LayerNorm(W₁x̂ + b₁))) + b₂</p>
      <p>where σ is the activation function. The weight W₁ and bias b₁ from
the first layer of g_φ are used to initialize
the corresponding parameters in the first layer of Aerial+’s
encoder:</p>
      <p>Aerial+_enc^(1) ← (W₁, b₁)</p>
      <p>This initialization provides a strong inductive prior for
Aerial+, guiding its encoder to start from a semantically
meaningful representation space derived from TabPFN’s
meta-learned embeddings.</p>
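      <p>The weight-initialization transfer can be illustrated with a small NumPy sketch (ours, with toy dimensions; the ReLU nonlinearity and the untrained random parameters stand in for the trained projection encoder and are our assumptions for illustration):</p>

```python
# Minimal sketch of Aerial+WI: a two-layer projection encoder g maps a
# one-hot row to the embedding space; its first-layer parameters
# (W1, b1) then seed the first layer of Aerial+'s autoencoder encoder.
import numpy as np

rng = np.random.default_rng(0)
m_prime, hidden, d_e = 12, 8, 8  # toy sizes: one-hot width, hidden, embedding dim

# Projection-encoder parameters (in practice these come from training g).
W1, b1 = rng.normal(size=(m_prime, hidden)) * 0.1, np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, d_e)) * 0.1, np.zeros(d_e)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def g(x_hat):
    """g(x̂) = W2 · act(LayerNorm(W1 x̂ + b1)) + b2, dropout disabled."""
    h = np.maximum(layer_norm(x_hat @ W1 + b1), 0.0)  # ReLU assumed
    return h @ W2 + b2

# Transfer step: copy (W1, b1) into the first layer of Aerial+'s encoder.
aerial_encoder_layer1 = {"weight": W1.copy(), "bias": b1.copy()}

x_hat = (rng.random(m_prime) < 0.3).astype(float)  # toy one-hot-style row
emb = g(x_hat)
print(emb.shape)  # (8,)
```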
      <p>Note that the gene expression datasets contain no
predefined class labels. Therefore, a random column is selected as
the target variable to enable TabPFN embedding generation.</p>
      <p>4.2. Projection-Guided Fine-Tuning via Double Loss.
This second strategy (Aerial+DL) enforces semantic alignment of the
autoencoder reconstruction process to the table embeddings
from a tabular foundation model, jointly optimizing
reconstruction and alignment losses for semantic consistency.</p>
      <p>Building on the projection encoder g_φ described in Section
4.1, this second fine-tuning strategy aligns Aerial+’s
autoencoder reconstructions with TabPFN embeddings using
a double loss function.</p>
      <p>
        Unlike the first strategy, where g_φ was trained directly on
raw one-hot inputs, here we first pass a corrupted version of
the one-hot input X̂ through Aerial+’s initial autoencoder f_θ
and train g_φ on its outputs. Specifically, we generate noisy
inputs (following the same strategy as Aerial+):
x̃ = clip(x̂ + ε), ε ∼ N(0, σ²),
where σ = 0.5 and values are clipped to [0, 1]. We then
compute reconstructions x̂′ = f_θ(x̃). The projection encoder is
trained to map these reconstructions to their corresponding
TabPFN embeddings E ∈ ℝ^(n×d_e)
by minimizing the cosine distance loss:
ℒ_proj(φ) = 1 − (1/n) ∑ᵢ₌₁ⁿ cos(g_φ(x̂′ᵢ), Eᵢ)
      </p>
      <p>
        After this pretraining phase, g_φ is frozen, and Aerial+’s
autoencoder is fine-tuned using a double loss objective:
ℒ(θ) = ℒ_recon(f_θ(x̃), x̂) + ℒ_proj(g_φ(f_θ(x̃)), E)
where ℒ_recon is a binary cross-entropy loss applied per
one-hot encoded column value as in Aerial+, generating
probability distributions per column. The double loss
strategy encourages Aerial+’s autoencoder to not only
reconstruct the original data, but also to produce
representations that are semantically consistent with TabPFN’s
meta-learned embedding space.
      </p>
      <p>
        4.3. Experimental Results.
Setup and hyperparameters. We run Aerial+ and the
two fine-tuned versions, with pre-trained weight
initialization (Aerial+WI) and double loss (Aerial+DL), on 5 d ≫ n
datasets with 100 columns and compare their rule quality.
The default Aerial+ uses Xavier [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] weight initialization
as in the original work. All the approaches are run with 2
antecedents, for 25 epochs with a batch size of 2. Aerial+’s
autoencoder for both the default and the fine-tuned versions
consists of 2 layers per encoder and decoder, with
dimensions m′ → 50 → 10 for the encoder and the mirrored
version for the decoder. We run each method 50 times and
present the average rule quality results for robustness.
      </p>
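      <p>The double-loss objective of Section 4.2 can be sketched numerically as follows (a NumPy illustration with toy shapes, ours rather than the authors' code; the frozen projection encoder is stubbed as a fixed random matrix instead of a trained network):</p>

```python
# Hedged sketch of the Aerial+DL double loss: per-value binary
# cross-entropy reconstruction plus a cosine-distance alignment of the
# projected reconstructions to the (stubbed) TabPFN embeddings.
import numpy as np

rng = np.random.default_rng(1)
n, m_prime, d_e = 4, 6, 3
x_hat = (rng.random((n, m_prime)) < 0.5).astype(float)  # one-hot-style input
x_rec = np.clip(x_hat + rng.normal(0, 0.1, x_hat.shape), 1e-6, 1 - 1e-6)  # toy reconstruction
E = rng.normal(size=(n, d_e))                           # TabPFN embeddings (stub)
P = rng.normal(size=(m_prime, d_e))                     # frozen projection encoder (stub)

def bce(pred, target):
    """Binary cross-entropy averaged over all one-hot values."""
    return -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()

def cos_align(pred, emb):
    """1 - mean cosine similarity between projected rows and embeddings."""
    z = pred @ P
    cos = (z * emb).sum(-1) / (np.linalg.norm(z, axis=-1) * np.linalg.norm(emb, axis=-1))
    return 1.0 - cos.mean()

loss = bce(x_rec, x_hat) + cos_align(x_rec, E)  # L = L_recon + L_proj
print(float(loss))
```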
      <p>
        Evaluation criteria. The standard rule quality metrics
from the ARM literature are used as the evaluation
criteria [
        <xref ref-type="bibr" rid="ref2 ref21">2, 21</xref>
        ]. Let T be the set of transactions introduced in
Section 2, and R = {r₁, r₂, ..., rₖ} be the rule set learned by
each approach, where ∀rᵢ ∈ R, rᵢ = (Xᵢ → Yᵢ):
• Number of rules. Total number of rules learned: |R|.
• Average rule coverage. Average number of
transactions where the rule antecedent appears:
AvgCov = (1/|R|) ∑ᵢ |{t ∈ T ∣ Xᵢ ⊆ t}|
• Execution time. Sum of model training time,
fine-tuning (when applicable), and rule extraction time
in seconds.
      </p>
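      <p>The coverage metric can be computed as in the following sketch (ours; the transactions and rules are toy examples with hypothetical item names):</p>

```python
# Illustrative average rule coverage: the mean, over the rule set R, of
# the number of transactions whose items include the rule's antecedent.

def avg_coverage(transactions, rules):
    """AvgCov = (1/|R|) * sum_i |{t in T : X_i ⊆ t}|."""
    return sum(
        sum(set(ant) <= t for t in transactions) for ant, _cons in rules
    ) / len(rules)

T = [
    {"Gene_2=high", "Gene_29=high", "Gene_14=low"},
    {"Gene_2=high", "Gene_29=normal", "Gene_14=normal"},
    {"Gene_2=low", "Gene_29=high", "Gene_14=normal"},
]
R = [
    ({"Gene_2=high", "Gene_29=high"}, {"Gene_14=low"}),  # covers 1 transaction
    ({"Gene_29=high"}, {"Gene_14=low"}),                 # covers 2 transactions
]

print(avg_coverage(T, R))  # (1 + 2) / 2 = 1.5
```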
      <p>Results. Table 4 shows the rule quality evaluation results
of Aerial+ and the two fine-tuned versions, Aerial+WI and
Aerial+DL, on 5 datasets. The results show that Aerial+WI
outperforms Aerial+ in terms of rule confidence and
association strength (Zhang’s metric) on all 5 datasets. Aerial+DL’s
confidence and association strength also exceed Aerial+’s on
4 out of 5 datasets, the exception being the NonSmallCellLungCarcinoma
dataset. Both fine-tuning methods resulted in a smaller
number of rules on all datasets, with smaller data
coverage on 3 out of the 5. This is expected, as the fine-tuned
versions capture rules with higher association strength on
average, meaning the less obvious rules are eliminated
during the rule extraction process and, therefore, the final data
coverage is lower. The fine-tuned methods have higher
support values on 4 out of 5 datasets. However, we do not
take the high support values as inherently positive, as their
value depends on the application: high-support rules are good
at explaining trends in the data, while low-support rules
can be better at explaining anomalies. Lastly, fine-tuning
added only a few seconds to the
execution time, which is negligible in the low-data regime. Note
that the costliest operation in Aerial+ is the rule extraction
process rather than the training (or pre-training), and extraction is not
significantly affected by the fine-tuning methods.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This section discusses the experimental results and the role of
neurosymbolic methods and tabular foundation models in
ARM.</p>
      <p>Neurosymbolic methods scale better on
high-dimensional data. Experiments in Section 3 show that
Aerial+, a neurosymbolic method for ARM, executes
one to two orders of magnitude faster than the
algorithmic ARM approaches. We argue this is because
Aerial+ leverages neural networks’ ability to handle
high-dimensional data: it has linear complexity over the number
of rows in training, and polynomial time complexity over
the number of (one-hot encoded) columns during the rule
extraction stage. Algorithmic methods, on the other hand,
rely on counting the co-occurrences of itemsets in the data,
which is a costlier operation.</p>
      <p>Aerial+ inherits neural network-specific issues into
ARM. The scalability of Aerial+ on high-dimensional data
comes at a cost, most notably reduced performance in
the low-data regime. The original Aerial+ paper
trains for only 2 epochs on generic tabular datasets and was
able to obtain high-quality rules. In the low-data regime,
however, we were able to obtain high-quality rules consistently
in each execution only after training for 25 epochs. This
shows that while neurosymbolic methods can help with
scalability, they also introduce a new research problem into
the ARM literature, namely, rule mining in the low-data
regime.</p>
      <p>Fine-tuning Aerial+ for better knowledge discovery.
Experiments in Section 4 showed that our two proposed
fine-tuning methods using the tabular foundation model
TabPFN resulted in significantly higher-quality rules in
comparison to the default version of Aerial+ on 5 real-world
high-dimensional tabular datasets with few instances. Many
of the other tabular foundation models that we investigated,
including Tabbie, CARTE, TableGPT, and TabICL, do not
provide an interface to obtain table embeddings; therefore,
we were not able to use them in our experiments. Since
TabPFN is trained to perform classification and regression
tasks over tabular data, we expect that models explicitly
trained to learn column embeddings and associations could
potentially result in better rule quality.</p>
      <p>Neurosymbolic methods start a paradigm shift in
ARM. We show that neurosymbolic ARM methods can
be supported by prior-data fitted networks, such as TabPFN,
to learn higher-quality rules. This raises the research
question of what other types of prior data or background
knowledge can be utilized as part of ARM. We invite
researchers to further investigate neurosymbolic methods
for ARM, as neurosymbolic integration brings
immense potential for both knowledge discovery and fully
interpretable inference across a plethora of domains.</p>
      <p>Further validation of our approach and limitations.
The execution time of algorithmic methods depends strongly
on the distribution of the data, as denser datasets, in which
many frequent itemsets of high support are present, prolong
the execution time. Aerial+, in contrast, applies the exact same
polynomial-time rule extraction process regardless of the
density of the data, and therefore depends less on the
dataset's attributes.
However, we will still test our fine-tuning approaches on
more datasets from diverse domains to further validate our
approach in future work. Furthermore, we will evaluate our
approach on generic tabular data with higher numbers of
instances, i.e., with far more instances than features, to see
whether this leads to early convergence or higher-quality rules.
Our proposed fine-tuning
strategies are currently limited to the only available tabular
foundation model with an explicit table embedding interface,
TabPFN. Since TabPFN is specifically trained for
classification and regression, this limitation may restrict performance
improvements, and a future foundation model trained to
capture column associations explicitly could significantly
improve rule discovery.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This paper highlights the potential of neurosymbolic
methods in the domain of association rule mining (ARM),
especially under the high-dimensional, low-sample (features ≫
instances) settings common in domains such as biomedicine. We
have empirically shown that Aerial+, a neurosymbolic approach,
offers substantial scalability improvements over state-of-the-art
neurosymbolic and algorithmic ARM techniques, running one to
two orders of magnitude faster.
However, neurosymbolic ARM also inherits the known
issues of neural networks into ARM literature, specifically
the reduced performance in low-data regimes, which we
addressed through two targeted fine-tuning strategies.</p>
      <p>Our fine-tuning methods use table embeddings from
TabPFN, a tabular foundation model, to i) initialize the
weights of Aerial+ (Aerial+WI), and ii) semantically align
Aerial+ autoencoder training with a given tabular dataset
(Aerial+DL). The results show that both Aerial+WI
and Aerial+DL methods significantly improved rule
quality in low-data settings. This demonstrates the promising
role of pretrained tabular models in enhancing knowledge
discovery over tabular datasets, beyond the classification and
regression tasks commonly tackled in the tabular data
domain.</p>
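A minimal sketch of what such an embedding-aligned training objective could look like, assuming a mean-squared alignment term and a weighting hyperparameter `alpha` (both assumptions; the paper's exact Aerial+DL loss may differ):

```python
import numpy as np

def aligned_autoencoder_loss(x, x_hat, z, z_target, alpha=0.1):
    """Reconstruction error plus a distance term pulling the latent
    codes toward embedding-derived targets, in the spirit of
    Aerial+DL (the paper's exact loss may differ; alpha is an
    assumed weighting hyperparameter)."""
    reconstruction = np.mean((x - x_hat) ** 2)   # usual autoencoder term
    alignment = np.mean((z - z_target) ** 2)     # pull latents toward targets
    return reconstruction + alpha * alignment

# Toy usage: perfect reconstruction, latents 0.5 away from their targets.
x = np.ones((5, 3))
z = np.zeros((5, 2))
loss = aligned_autoencoder_loss(x, x, z, z + 0.5)
print(round(loss, 3))  # 0.025
```

With `alpha = 0`, the objective reduces to a plain reconstruction loss; increasing `alpha` trades reconstruction fidelity for agreement with the foundation-model embeddings.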
      <p>Looking forward, we see this as the beginning of a broader
paradigm shift in ARM, where background knowledge and
pretrained models can be explicitly leveraged to guide rule
extraction. We invite the community to explore what other
forms of prior knowledge, architectures, or foundation
models can be integrated into neurosymbolic ARM. Future work
will also validate our methods across a wider range of
datasets and evaluate their effectiveness in high-instance
scenarios (instances ≫ features), with the aim of achieving
both scalability and high interpretability in real-world data
mining applications.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has received support from the Dutch Research
Council (NWO), in the scope of the Digital Twin for
Evolutionary Changes in water networks (DiTEC) project, file
number 19454.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
ChatGPT (GPT-4.1) for paraphrasing and rewording. After using
this tool, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s
content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          , et al.,
          <article-title>Fast algorithms for mining association rules</article-title>
          ,
          <source>in: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB</source>
          , volume
          <volume>1215</volume>
          ,
          <year>1994</year>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Luna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ventura</surname>
          </string-name>
          ,
          <article-title>Frequent itemset mining: A 25 years review</article-title>
          ,
          <source>WIREs Data Mining and Knowledge Discovery</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <article-title>e1329</article-title>
          . doi:10.1002/widm.1329.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Angelino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Larus-Stone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Seltzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>Learning certifiably optimal rule lists for categorical data</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>78</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Moens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Aksehirli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Goethals</surname>
          </string-name>
          ,
          <article-title>Frequent itemset mining for big data</article-title>
          ,
          <source>in: 2013 IEEE international conference on big data, IEEE</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <article-title>Mining association rules with item constraints</article-title>
          ,
          <source>in: KDD</source>
          , volume
          <volume>97</volume>
          ,
          <year>1997</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Baralis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cagliero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cerquitelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Garza</surname>
          </string-name>
          ,
          <article-title>Generalized association rule mining with constraints</article-title>
          ,
          <source>Information Sciences</source>
          <volume>194</volume>
          (
          <year>2012</year>
          )
          <fpage>68</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yin</surname>
          </string-name>
          , W. Gan, G. Huang,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <article-title>Constraint-based sequential rule mining</article-title>
          ,
          <source>in: 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <article-title>Mining top-k association rules</article-title>
          ,
          <source>in: Advances in Artificial Intelligence: 25th Canadian Conference on Artificial Intelligence</source>
          ,
          <source>Canadian AI</source>
          <year>2012</year>
          , Toronto, ON, Canada, May 28-30,
          <year>2012</year>
          . Proceedings 25, Springer,
          <year>2012</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Selamat</surname>
          </string-name>
          ,
          <article-title>ETARM: an efficient top-k association rule mining algorithm</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>48</volume>
          (
          <year>2018</year>
          )
          <fpage>1148</fpage>
          -
          <lpage>1160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-J.</given-names>
            <surname>Hsiao</surname>
          </string-name>
          ,
          <article-title>CHARM: An efficient algorithm for closed itemset mining</article-title>
          ,
          <source>in: Proceedings of the 2002 SIAM international conference on data mining, SIAM</source>
          ,
          <year>2002</year>
          , pp.
          <fpage>457</fpage>
          -
          <lpage>473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>Neurosymbolic association rule mining from tabular data</article-title>
          ,
          <source>in: Proceedings of The 19th International Conference on Neurosymbolic Learning and Reasoning</source>
          , volume
          <volume>284</volume>
          <source>of Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>588</lpage>
          . URL: https://proceedings.mlr.press/v284/karabulut25a.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Deep neural networks for high dimension, low sample size data</article-title>
          ,
          <source>in: IJCAI</source>
          , volume
          <volume>2017</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>2287</fpage>
          -
          <lpage>2293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Korn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Monahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , C. Schnell,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>High-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>21</volume>
          (
          <year>2015</year>
          )
          <fpage>1318</fpage>
          -
          <lpage>1325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>High dimensional, tabular deep learning with an auxiliary knowledge graph</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>26348</fpage>
          -
          <lpage>26371</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hollmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Purucker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krishnakumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Körfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Hoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Schirrmeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Accurate predictions on small data with a tabular foundation model</article-title>
          ,
          <source>Nature</source>
          <volume>637</volume>
          (
          <year>2025</year>
          )
          <fpage>319</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Qu</surname>
          </string-name>
          , D. Holzmüller, G. Varoquaux,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Morvan</surname>
          </string-name>
          ,
          <article-title>TabICL: A tabular foundation model for in-context learning on large data</article-title>
          ,
          <source>arXiv preprint arXiv:2502.05564</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Iida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Thai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Manjunatha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iyyer</surname>
          </string-name>
          ,
          <article-title>TABBIE: Pretrained representations of tabular data</article-title>
          ,
          <source>arXiv preprint arXiv:2105.02584</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Yin</surname>
          </string-name>
          , G. Neubig, W.-t. Yih,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
          <article-title>TaBERT: Pretraining for joint understanding of textual and tabular data</article-title>
          ,
          <source>arXiv preprint arXiv:2005.08314</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , G. Zhang, G. Chen, G. Zhu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>TableGPT2: A large multimodal model with tabular data integration</article-title>
          ,
          <source>arXiv preprint arXiv:2411.02059</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaushik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fister Jr.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Draheim</surname>
          </string-name>
          ,
          <article-title>Numerical association rule mining: a systematic literature review</article-title>
          ,
          <source>arXiv preprint arXiv:2307.00662</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>Learning semantic association rules from internet of things data</article-title>
          ,
          <source>Neurosymbolic Artificial Intelligence</source>
          <volume>1</volume>
          (
          <year>2025</year>
          ). doi:10.1177/29498732251377518.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>van Bekkum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer-Vitali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>Modular design patterns for hybrid learning and reasoning systems: a taxonomy, patterns and use cases</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>51</volume>
          (
          <year>2021</year>
          )
          <fpage>6528</fpage>
          -
          <lpage>6546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yadav</surname>
          </string-name>
          ,
          <article-title>An innovative approach for association rule mining in grocery dataset based on non-negative matrix factorization and autoencoder</article-title>
          ,
          <source>Journal of Algebraic Statistics</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>2898</fpage>
          -
          <lpage>2905</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Berteloot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khoury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durand</surname>
          </string-name>
          ,
          <article-title>Association rules mining with auto-encoders</article-title>
          ,
          <source>in: International Conference on Intelligent Data Engineering and Automated Learning</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koenigstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Giryes</surname>
          </string-name>
          ,
          <article-title>Autoencoders</article-title>
          ,
          <source>Machine learning for data science handbook: data mining and knowledge discovery handbook</source>
          (
          <year>2023</year>
          )
          <fpage>353</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Manzagol</surname>
          </string-name>
          ,
          <article-title>Extracting and composing robust features with denoising autoencoders</article-title>
          ,
          <source>in: Proceedings of the 25th international conference on Machine learning</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1096</fpage>
          -
          <lpage>1103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          , J. Ma,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Language modeling on tabular data: A survey of foundations, techniques and evolution</article-title>
          ,
          <source>arXiv preprint arXiv:2408.10548</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          , Causality, Cambridge university press,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Grinsztajn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <article-title>CARTE: pretraining and transfer for tabular learning</article-title>
          ,
          <source>arXiv preprint arXiv:2402.16785</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , E. Blomqvist,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          , C. d'Amato,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Melo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirrane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neumaier</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>37</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Greninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Edelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Forbes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bindal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. R.</given-names>
            <surname>Thompson</surname>
          </string-name>
          , et al.,
          <article-title>Genomics of drug sensitivity in cancer (gdsc): a resource for therapeutic biomarker discovery in cancer cells</article-title>
          ,
          <source>Nucleic Acids Research</source>
          <volume>41</volume>
          (
          <year>2012</year>
          )
          <fpage>D955</fpage>
          -
          <lpage>D961</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>F.</given-names>
            <surname>Iorio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Knijnenburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Vis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Bignell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Menden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barthorpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lightfoot</surname>
          </string-name>
          , et al.,
          <article-title>A landscape of pharmacogenomic interactions in cancer</article-title>
          ,
          <source>Cell</source>
          <volume>166</volume>
          (
          <year>2016</year>
          )
          <fpage>740</fpage>
          -
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Garnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Edelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Heidorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Greenman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dastur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Greninger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. R.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soares</surname>
          </string-name>
          , et al.,
          <article-title>Systematic identification of genomic markers of drug sensitivity in cancer cells</article-title>
          ,
          <source>Nature</source>
          <volume>483</volume>
          (
          <year>2012</year>
          )
          <fpage>570</fpage>
          -
          <lpage>575</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Mourragui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Loog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Vis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Manjon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>van de Wiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Reinders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Wessels</surname>
          </string-name>
          ,
          <article-title>Predicting patient response with models trained on cell lines and patient-derived xenografts by nonlinear transfer learning</article-title>
          ,
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>118</volume>
          (
          <year>2021</year>
          )
          <fpage>e2106682118</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Parthasarathy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ogihara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>New algorithms for fast discovery of association rules</article-title>
          ,
          <source>in: KDD</source>
          , volume
          <volume>97</volume>
          ,
          <year>1997</year>
          , pp.
          <fpage>283</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <article-title>Mining frequent patterns without candidate generation</article-title>
          ,
          <source>ACM SIGMOD Record</source>
          <volume>29</volume>
          (
          <year>2000</year>
          )
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>L.</given-names>
            <surname>Shabtay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fournier-Viger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yaari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dattner</surname>
          </string-name>
          ,
          <article-title>A guided fp-growth algorithm for mining multitude-targeted item-sets and class association rules in imbalanced data</article-title>
          ,
          <source>Information Sciences</source>
          <volume>553</volume>
          (
          <year>2021</year>
          )
          <fpage>353</fpage>
          -
          <lpage>375</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Pfp: parallel fp-growth for query recommendation</article-title>
          ,
          <source>in: Proceedings of the 2008 ACM conference on Recommender systems</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>A parallel fp-growth algorithm based on gpu</article-title>
          ,
          <source>in: 2017 IEEE 14th International Conference on e-Business Engineering (ICEBE)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>E.</given-names>
            <surname>Karabulut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Degeler</surname>
          </string-name>
          ,
          <article-title>PyAerial: Scalable association rule mining from tabular data</article-title>
          ,
          <source>SoftwareX</source>
          <volume>31</volume>
          (
          <year>2025</year>
          )
          <fpage>102341</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.softx.2025.102341</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>S.</given-names>
            <surname>Raschka</surname>
          </string-name>
          ,
          <article-title>Mlxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack</article-title>
          ,
          <source>The Journal of Open Source Software</source>
          <volume>3</volume>
          (
          <year>2018</year>
          ). doi:
          <pub-id pub-id-type="doi">10.21105/joss.00638</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Statistical learning theory</article-title>
          , Wiley,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Recht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Understanding deep learning (still) requires rethinking generalization</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>107</fpage>
          -
          <lpage>115</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/3446776</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Khoshgoftaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey of transfer learning</article-title>
          ,
          <source>J. Big Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>9</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>X.</given-names>
            <surname>Glorot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Understanding the difficulty of training deep feedforward neural networks</article-title>
          ,
          <source>in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>X.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Confidence metrics for association rule mining</article-title>
          ,
          <source>Applied Artificial Intelligence</source>
          <volume>23</volume>
          (
          <year>2009</year>
          )
          <fpage>713</fpage>
          -
          <lpage>737</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>