<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Domain-Specific Language for NeSy Focussing on Symbolic Knowledge Injection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattia Matteini</string-name>
          <email>mattia.matteini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Ciatto</string-name>
          <email>giovanni.ciatto@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Omicini</string-name>
          <email>andrea.omicini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Magnini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Bologna</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In neuro-symbolic AI (NeSy), integrating symbolic languages – typically subsets of first-order logic (FOL) – with neural networks (NNs) serves goals like enhancing symbolic processing, extending reasoning with pattern recognition, and guiding neural learning with symbolic knowledge—a.k.a. symbolic knowledge injection (SKI). Despite its utility, FOL's expressiveness poses challenges to SKI algorithms, and its general-purpose nature complicates use for non-experts. We propose SKI-lang, a domain-specific language for SKI that balances practicality, clear semantics, and expressiveness–tractability trade-offs. SKI-lang simplifies symbolic specification, serves as a unified interface for diverse SKI approaches, and allows for automating benchmarks from the NeSy literature. We discuss the design choices behind SKI-lang and its implementation, and demonstrate its effectiveness and versatility through a few case studies.</p>
      </abstract>
      <kwd-group>
        <kwd>symbolic knowledge injection</kwd>
        <kwd>SKI-lang</kwd>
        <kwd>NeSy</kwd>
        <kwd>language</kwd>
        <kwd>Python</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the context of neuro-symbolic AI (NeSy) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], many
approaches have been proposed to integrate (some sort of)
symbolic language – most commonly, a subset of first-order
logic (FOL) – with neural networks (NNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], to pursue
disparate goals, including, but not limited to: (i) speeding up
symbolic processing via neural computation, (ii) extending
symbolic reasoning with pattern-recognition capabilities, or
(iii) controlling the learning process of NNs with symbolic
knowledge.
      </p>
      <p>
        The latter goal in particular is also known in the
literature as symbolic knowledge injection (SKI) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. SKI has been
addressed by several works, proposing as many algorithms,
each one focussing on a different subset of FOL, ranging
from propositional logic to Datalog [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and beyond—up to
the full power of FOL itself.
      </p>
      <p>
        However, the very choice of the FOL syntax (and its
subsets) as the target symbolic language has never been
questioned, despite posing several challenges to SKI algorithms,
because of its expressiveness. To complicate the matter,
we observe that writing a symbolic specification to be
injected via general-purpose and expressive languages like
FOL, Prolog [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or Datalog, may often be cumbersome for
non-experts in symbolic reasoning. In fact, a modelling
effort is required to translate the domain knowledge into
the target symbolic language, and we argue that such
modelling effort could be reduced by using a domain-specific
language (DSL).
      </p>
      <p>
        Accordingly, in this paper, we propose SKI-lang, a
domain-specific language for SKI, which aims at being practical for
non-experts in symbolic reasoning, while still retaining a
clear semantics and a good expressiveness–tractability
trade-off [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In particular, we discuss the engineering choices
behind the design of SKI-lang, and we show how it can
act as a common interface for different benchmarks from
the NeSy literature. Most notably, the main goal of this
paper is to motivate the need for an ad-hoc language for
SKI – complementary to FOL – and to propose one
particular syntactical reification of such a language, tailored to
the current state of practice in machine learning (ML) and
NeSy. The implementation is preliminary, and we do not
claim completeness or generality. Instead, the paper reports
on our proof-of-concept implementation of SKI-lang, and
provides a roadmap for future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>NeSy has emerged as a significant area of research within
artificial intelligence (AI), aiming to integrate symbolic
reasoning with NN learning to leverage the strengths of both
symbolic and connectionist approaches. Typically,
symbolic – most commonly, logic – languages are integrated
to enhance NNs by enabling structured reasoning,
interpretability, and explicit knowledge representation.</p>
      <p>
        Information representation within these systems
often combines localist (symbol-based) and distributed
(subsymbolic/neural-based) approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], providing flexible
and effective knowledge representation and data
processing capabilities. There, training processes integrate
neural-based inductive learning from data with symbolic reasoning
approaches, allowing systems to learn effectively even from
smaller datasets due to explicit symbolic knowledge
representation.
      </p>
      <p>
        From symbolic AI, NeSy approaches may inherit various
reasoning capabilities, such as: deductive, inductive,
abductive, common-sense, and combinatorial reasoning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Yet,
decision-making processes in these systems combine
intuitive, heuristic neural processing with deliberate symbolic
reasoning, closely mimicking human cognitive patterns.
      </p>
      <p>Lastly, logic utilised in NeSy ranges from propositional to
higher-order logic, offering extensive capabilities for
knowledge representation and reasoning by embedding logical
frameworks directly within neural architectures, thus
enabling complex logical inferences.</p>
      <sec id="sec-2-1">
        <title>2.1. Symbolic knowledge injection (SKI)</title>
        <p>
          SKI refers to the injection of symbolic knowledge –
expressed in formal logic – into sub-symbolic predictors like
NNs. As defined by [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], SKI is “any algorithmic procedure
affecting how sub-symbolic predictors draw their inferences
in such a way that predictions are either computed as a
function of, or made consistent with, some given symbolic
knowledge”.
        </p>
        <p>
          SKI aims to improve model interpretability, robustness,
and controllability by incorporating structured,
human-intelligible prior knowledge into the learning process. Three
major SKI strategies are recognised in the literature:
structuring, guided learning, and embedding [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Structuring:
the architecture of the predictor is built or modified to
reflect the symbolic knowledge structure, e.g., via encoding
rules directly as modules in a NN. Guided learning (a.k.a.
Constraining): symbolic constraints are added to the loss
function, guiding learning via soft penalties or hard
constraints. Embedding: symbolic knowledge is converted into
continuous representations that are fed into the predictor
as part of the input.
        </p>
        <p>
          SKI methods vary widely with respect to the kind of logic
language they support. These range from propositional
logic, to FOL (used in logic tensor networks (LTN) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
logic neural networks (LNN) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]), to probabilistic or Horn
logics (used in DeepProbLog [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], NTP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]), to Datalog (as
in Scallop [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]).
        </p>
        <p>
          Related works. Neural theorem proving (NTP) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
implements a differentiable version of backward chaining
using Horn logic. Variables are grounded via soft unification
in embedding space, blending guided learning with
embedding strategies. DeepProbLog [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] extends probabilistic
Prolog with neural predicates and performs SKI via
structuring. The logic program controls the inference pipeline, and
variables are grounded via probabilistic backward chaining
over dataset constants. Neuro-symbolic forward chaining
(NSFR) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] performs forward-chaining symbolic reasoning
over probabilistic ground atoms. It structures symbolic
inference within neural computation, and performs grounding by
enumerating atoms and chaining over them. LTN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] uses
fuzzy FOL and injects rules via differentiable real-valued
semantics that act as soft constraints in the loss function of
NNs which are structured to reflect the symbolic knowledge.
Therefore, this method combines the ‘constraining’ and
‘structuring’ strategies. Knowledge-enhanced neural
networks (KENN) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] works in propositional logic and injects
knowledge through a knowledge enhancement layer added
to a neural classifier. Rules are translated into differentiable
adjustments applied post-hoc to network outputs, realising
guided learning. LNN [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] encodes weighted FOL directly in
network structure. Symbolic formulas are embedded into
the network via confidence scores and differentiable logic
gates. This combines structuring with constrained learning.
Hierarchical rule induction (HRI) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] learns logic programs
from data using meta-rules, combining inductive learning
with differentiable logic. Grounding is achieved via
substitution over datasets and similarity in neural embedding space.
Knowledge Injection via Lambda Layer (KILL) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
regularises NN training with symbolic knowledge in stratified
Datalog with negation. Knowledge Injection via Network
Structuring (KINS) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] injects logic formulas, expressed
in stratified Datalog with negation, into NN by
structuring additional layers that mimic the symbolic knowledge.
Scallop [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] supports Datalog-style programs over tensors,
offering differentiable symbolic reasoning. Grounding is
done via batched comprehension over input data, and
symbolic rules are enforced structurally. DeepLogic [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] uses
structured neural logic operators over tree-based FOL
expressions. It applies structuring and guided learning to learn
logical forms jointly with perception.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. About SKI Languages</title>
        <p>In the realm of NeSy, the interplay between symbolic logic
languages and NN architectures introduces crucial
considerations around expressivity, computational tractability, and
usability—particularly within ML workflows.</p>
        <p>This discussion aims to reflect on these issues,
emphasising the challenges posed by different logic languages to SKI,
the various solutions adopted by existing methods, and
ultimately addressing the question of how these complexities
relate practically to the everyday tasks of data scientists and
ML practitioners.</p>
        <sec id="sec-2-2-1">
          <title>Expressivity vs. Tractability: A Spectrum</title>
          <p>Symbolic languages vary significantly in expressivity and
computational complexity, fundamentally influencing their usability
and suitability in ML contexts.</p>
          <p>At the lower end of the expressivity spectrum lies
propositional logic, which provides simplicity and computational
tractability. It allows for straightforward
symbolic-to-subsymbolic translation and integration into ML workflows, as
seen in early and simpler approaches. However,
propositional logic is limited to representing flat, atomic statements
about domain entities and lacks the capability to generalise
across instances using variables or quantifiers, limiting its
practical utility in more complex ML applications.</p>
          <p>Conversely, FOL stands at the upper end of the
expressivity spectrum, empowered by variables, quantifiers,
unification, and logical inference mechanisms such as resolution.
This language allows for compact, intensional, and relational
representations of knowledge, which are powerful within
symbolic reasoning frameworks. Nonetheless, when
integrating FOL into neural models – where datasets form strict
subsets of the Herbrand universe – complexities emerge
due to grounding (instantiation of variables), interpretation,
and computational overhead.</p>
          <p>
            In particular, FOL’s expressive nature demands significant
computational resources and sophisticated tricks to
effectively use it for SKI. Common methods include: soft
grounding, as utilised in differentiable neural logic frameworks
(e.g., LTN [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]); probabilistic backward chaining, exemplified
by systems like DeepProbLog [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]; and embedding-based
grounding, such as the soft unification in NTP [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. These
methods essentially mitigate the complexity by
translating symbolic structures into computationally manageable,
diferentiable forms.
          </p>
          <p>
            Intermediate languages such as Horn clauses and
Datalog – including their stratified and negation-free variants –,
strike a middle ground [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. They simplify symbolic
formulations through syntactical constraints, enhancing
computational tractability. However, they still present challenges,
particularly when recursive definitions are involved, as
recursion may lead to non-terminating grounding processes—
unless managed through smart techniques like: batched
grounding, as in Scallop’s (cf. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]) implementation for
Datalog programs; or forward chaining with probabilistic atoms,
such as in NSFR [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. These “tricks” prevent infinite
computations by limiting recursive expansions or bounding
inference procedures through practical heuristics.
          </p>
          <p>Practical Usability. From a purely symbolic
perspective, these technical challenges offer deep theoretical
interest, particularly around the foundational study of
neuro-symbolic integration. Nevertheless, the critical question
remains: are such complexities genuinely necessary for
practical ML tasks?</p>
          <p>Practically speaking, data scientists typically resort to
SKI when raw data alone is insufficient for training robust
ML models. This insufficiency arises due to data scarcity,
uneven data distribution, or dataset bias—scenarios common
in real-world applications such as healthcare diagnostics,
fairness-aware AI, or structured data interpretation.</p>
          <p>In these cases, symbolic knowledge becomes a
complementary resource, enhancing model performance through
structured constraints and prior domain knowledge.
Specifically, symbols in these contexts are not referencing abstract
entities but rather direct representations of data instances,
their features, and relational knowledge explicitly linked to
dataset columns and rows.</p>
          <p>Along this line, SKI practitioners may just need a
language that allows them to express symbolic constraints or
relations over the domain of the dataset at hand, simplifying
the expression of declarative statements which involve the
dataset’s instances (and their components), and features—
rather than arbitrary Herbrand terms.</p>
          <p>Moreover, as (i) the specification of the knowledge to be
injected, (ii) the hyperparameters of the learning process, and
(iii) those of the injection process are deeply intertwined, with a lot
of back-and-forth among them, it is crucial to provide
SKI practitioners with a unified language to express both
symbolic knowledge and SKI/ML workflows. This would
represent a significant advantage in terms of usability and
experimental setup time.</p>
          <p>To address these concerns, in the next sections, we
present SKI-lang and its design rationale, and we attempt
to demonstrate its effectiveness and usability in ordinary
ML tasks where SKI may apply.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Running Examples and Benchmarks</title>
        <p>In the remainder of this paper, we rely on running examples
taken from three distinct SKI benchmarks, covering as
many application domains, namely: handwritten digit
recognition, fairness-aware income prediction, and SKI-enhanced
Poker hand classification. Each benchmark demonstrates
a diferent way to exploit SKI and lets us showcase some
feature of SKI-lang in real-world scenarios.</p>
        <p>The benchmarks difer in data format, learning
objective, and symbolic knowledge to be injected. We present
them here as they will be referenced multiple times in the
following section.</p>
        <p>Sum of MNIST Digits. This benchmark is based on the
well-known MNIST dataset of handwritten digits. The
dataset consists of 70,000 gray-scale 28 × 28 pixel images
labelled with one of 10 digit classes. The goal is to train a
neural classifier that predicts the digit class of an image, but
with an additional symbolic constraint: when images are
grouped in pairs, the sum of the true classes of each pair
must equal the sum of their predicted classes.</p>
        <p>The symbolic constraint thus involves a global
consistency condition across two independent input instances.
This setting highlights a key foundational challenge for SKI,
namely how to enforce relational constraints across multiple
inputs, especially when symbolic information is not local to
a single instance.</p>
        <p>Fair Income Prediction. Based on the Adult (a.k.a.
Census Income) dataset (https://doi.org/10.24432/C5XW20), this benchmark addresses the problem
of learning a binary income predictor (above or below $50k)
from demographic and employment data. The dataset
contains 48,842 tabular records with 14 features including age,
education, occupation, and race. The learning task is
binary classification over the income field.</p>
        <p>
          The symbolic knowledge to be injected encodes a fairness
constraint: the predicted income should be independent of
the sensitive attribute race. This is commonly formalised
as a statistical parity requirement (a.k.a., demographic
parity, cf., [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]). Briefly, statistical parity is a fairness metric
that measures the difference between the probabilities that
individuals from privileged and unprivileged groups receive
a favorable outcome. The closer the value is to zero,
the fairer the predictor is considered.
        </p>
        <p>From a foundational SKI perspective, this example
highlights the challenge of injecting distributional constraints—
not over individual predictions but over group-level
statistics. The benchmark is also interesting because it combines
numerical and categorical features, making it a test case for
symbolic reasoning over mixed-type structured data.</p>
        <p>SKI-enhanced Poker-Hand Classification. This
benchmark involves training a sub-symbolic classifier over a
dataset of poker hands (https://doi.org/10.24432/C5KW38). Each data instance encodes 5
playing cards through 10 attributes (5 suits and 5 ranks),
and is assigned one of 10 class labels corresponding to the
type of hand (e.g., pair, flush, full house, etc.). The dataset
is highly imbalanced because some poker hands are much
rarer than others.</p>
        <p>The available symbolic knowledge consists of a rich set
of crisp logic rules that fully characterise each class. This
makes the benchmark suitable for stress-testing SKI under
conditions where symbolic information is both precise and
essential, due to the extreme class imbalance and low data
coverage. From a foundational standpoint, this example
poses challenges in terms of combining rule-based logic
(e.g., multiple conditions with dependencies and precedence)
with neural learning, and allows research on prioritised rule
injection and expressiveness–tractability trade-offs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. A Practical Language for SKI</title>
      <p>Here we introduce SKI-lang, a DSL for NeSy that is
specifically designed to make SKI practical for data scientists.
SKI-lang is a declarative language that allows users to
express symbolic knowledge in a way that is both intuitive and
concise, tailoring that knowledge to the data-related task
at hand – for which a dataset is supposed to be available –,
and training ML predictors accordingly—in such a way that,
at the end of training, they comply with the aforementioned
symbolic knowledge expressed in SKI-lang.</p>
      <p>Accordingly, in this section, we first discuss the abstract
design criteria that guided our design of SKI-lang, and then
we provide a brief overview of its concrete syntax and the
intended semantics.</p>
      <sec id="sec-3-1">
        <title>3.1. Design Criteria</title>
        <p>SKI-lang is designed to serve the purposes of a data
scientist working on some supervised ML task of interest, for
which a dataset is available via a clear schema, as well as
some background knowledge that may be worth taking into
account when training ML predictors for the task.</p>
        <p>For the sake of simplicity – yet without loss of generality
–, we describe the dataset as a table-like structure, where
each row represents an instance of the domain at hand,
and each column represents a feature of the dataset. We
assume that features come with mnemonic names, whereas
instances are represented by their row number. Finally, we
assume that one feature is marked as the target feature w.r.t.
the supervised ML task at hand.</p>
        <p>Under these assumptions, the core goals of SKI-lang are
to provide the data scientist with (G1) a convenient,
concise, and expressive syntax for expressing their background
knowledge; as well as (G2) a declarative syntax for
specifying the ML workflow – including the dataset schema, the
predictors to be trained, and their hyperparameters –, in
such a way that SKI-lang is the only entry point for any
SKI-enhanced ML pipeline.</p>
        <p>To address these goals, we design SKI-lang to satisfy the
following requirements, enumerated by Ri. The discussion
is deliberately abstract: please refer to section 3.2 for
concrete syntactical examples.</p>
        <p>Expressing the knowledge (G1). Firstly, and most
importantly, (R1) SKI-lang should allow expressing
knowledge about the dataset in symbolic form. More specifically,
it should be possible to express declarative statements
involving: (i) references to one or more instances from the
dataset, (ii) references to one or more features from the
dataset, (iii) references to one or more features of the same
instance; (iv) arbitrary constants; (v) named logic predicate
definitions over the items above; (vi) any algebraic or logical
combination of the items above. Such statements constitute
the symbolic knowledge specification to be injected.</p>
        <p>Furthermore, to simplify the specification of common
logic statements, (R2) SKI-lang should support the import
and usage of built-in functions and predicates, aimed at
keeping the symbolic specification concise and declarative.</p>
        <p>Finally, (R3) the language should be agnostic w.r.t. the
particular sort of SKI approach and algorithm being used.
In other words, SKI-lang should be able to express symbolic
knowledge in a way that is independent of how it will be
injected. In practice, this means that SKI-lang should allow
for (i) structuring the architecture of the predictor out of the
symbolic knowledge, (ii) constraining the loss function of
the predictor with symbolic knowledge, (iii) embedding the
symbolic knowledge into vectors that are fed into the
predictor, or (iv) any combination of the above; while (v) requiring
minimal or no changes to the specification.</p>
        <p>Declaring the workflow (G2). To account for the
declaration of end-to-end SKI workflows where all relevant
aspects of the process are specified in a single place,
SKI-lang should also support (R4) the declaration or import
of the ML predictor(s) subject to SKI, and (R5) the
dataset and data-schema to be used for training and testing
the predictors. Similarly, it should support (R6) the selection
of the SKI algorithm to be used, and (R7) the customisation
of any aspect related to the SKI-aware ML pipeline.</p>
        <p>More precisely, requirement R4 prescribes that the
predictor undergoing SKI – be it a predictor to be loaded from
a file, or a new one to be trained from scratch –, should be
declared in SKI-lang. Declarations should specify any
modelling aspect, including: the predictor family of choice
(e.g., NNs, decision trees, etc.), its actual hyperparameters
(e.g., the number of layers, the number of neurons per layer,
the activation functions, etc.), and its mapping to symbolic
predicates. The latter, in particular, aims at declaring the
interface (name + arity) the ML predictor offers to the
symbolic realm. This allows SKI-lang users to reference
the predictor in the symbolic knowledge specification,
as if it were a logic symbol (for instance, a binary classifier
may be mapped onto a unary logic predicate whose name is
that of the predicted class).</p>
        <p>Similarly, requirement R5 prescribes that the dataset(s)
being used for training and testing the predictors – as well
as the schema of the data therein contained –, should be
declared in SKI-lang too. This includes the dataset’s name,
the names of the features, the classification of features as
target or non-target, and the data type of each feature—aside
from the actual URLs or paths to the dataset(s) files.</p>
        <p>Finally, requirement R6 prescribes that SKI-lang should
let the user select the SKI algorithm to be used for
injecting the symbolic knowledge into the ML predictors. This
implies that a few more facilities should be available to
SKI-lang users, namely: (i) some syntactical construct to select
the SKI algorithm to adopt, and (ii) multiple, ad-hoc parsers
for adapting SKI-lang’s syntax to as many SKI algorithms.
Requirement R7 complements such customisability by
allowing the user to customize details such as: the fuzzification
and grounding strategies, the learning rate, the number
of epochs, the random seeds, etc.—possibly including safe
defaults.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Syntax By Examples</title>
        <p>SKI-lang adopts a YAML-like syntax as its foundational
design choice. YAML (https://yaml.org) is a popular and intuitive configuration
language, widely adopted in the data science and software
engineering communities for its clean readability and
shallow nesting. Its ability to support explicit sectioning,
hierarchical definitions, as well as anchors and references makes
it ideal for the kind of structured yet flexible specification
required in neuro-symbolic workflows. Moreover, the
existence of robust and mature parsing libraries across several
programming languages ensures seamless integration of
SKI-lang into modern ML programming frameworks.</p>
        <p>Each SKI-lang script is a YAML file composed of five
primary sections: data, optimization, learnables,
knowledge, and constraints. The data section declares
the dataset(s) to be used and defines their schema, thereby
addressing requirement R5. The optimization section
specifies all tunable hyperparameters and SKI-related
configuration options, addressing requirements R6 and R7. The
learnables section declares the structure, I/O types, and
hyperparameters of the learnable sub-symbolic predictors
to be trained, covering requirements R4 and R3 (as far as
structuring is concerned). The constraints section
encodes declarative statements that represent symbolic
knowledge to be injected as constraints—hence addressing
requirements R1, R2, and R3 (as far as constraining is concerned).
Finally, the knowledge section allows users to declare
auxiliary symbolic definitions, reusable logic predicates, and
domain knowledge to be referenced in the aforementioned
sections—thus supporting requirements R1, R2, and R3 (as
far as embedding is concerned).</p>
        <p>[Listing 1: SKI-lang example for the Sum of MNIST Digits benchmark]</p>
        <p>Below, we explain the intended purpose of each section,
as well as the key features of SKI-lang’s syntax, via a few
incremental examples tailored on the benchmarks from
section 2.3.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. MNIST Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘sum of MNIST digits’ benchmark, where a single rule
involving pairs of MNIST digits is injected via constraining.
Refer to listing 1 for the features described here.</p>
          <p>Pythonic formulas. Symbolic formulas appearing in the
constraints, knowledge, and learnables sections are
expressed using a compact, Python-like syntax. These
formulas consist of algebraic and logical expressions over
variables and constants, with symbols either implicitly or
explicitly declared in the data section to ensure consistency
with the dataset schema. In this way, SKI-lang allows logic
constraints to refer directly to instance-level values,
features, or model predictions, using intuitive dot-notation and
functional application.</p>
          <p>For example, to express the constraint that, for any
pair of MNIST digits x1 and x2, the sum of the
predicted classes must equal the sum of the
ground-truth classes, one can write a formula as simple as
digit(x1) + digit(x2) == x1.value + x2.value.
Unpacking the minimal example from listing 1, we can observe
several key elements of SKI-lang in action, and understand
how the formula above is interpreted.</p>
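          <p>Since listing 1 is not reproduced here, the following sketch illustrates what such a script could look like; the section names match the text, but the nested keys (e.g., features, layers, neurons, activation) are illustrative assumptions rather than SKI-lang’s actual reference syntax.</p>
          <preformat># Hypothetical sketch of listing 1 (nested keys are illustrative)
data:
  mnist:
    instances: [x1, x2]        # two independent draws per training step
    features:
      image: tensor[28, 28]    # gray-scale picture of a handwritten digit
    targets:
      value: int               # digit class, 0..9

learnables:
  digit:
    inputs: mnist.features
    outputs: mnist.targets
    layers:
      - {neurons: 128, activation: relu}
      - {neurons: 10, activation: softmax}

constraints:
  - always: digit(x1) + digit(x2) == x1.value + x2.value</preformat>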
          <p>The constraints section can be filled with a list of
formulas, each one expressing a symbolic constraint to be
injected. Each constraint is expressed in a natural and
concise manner, with a Pythonic syntax which is familiar to
most data scientists. Constraints should be prefixed by a
keyword specifying their applicability scope (e.g., always)
over the declared instance variables (e.g., x1 and x2), which
are introduced in the data section as independent draws
from the dataset(s) therein declared. In the particular case of
listing 1, the always keyword indicates that the constraint
should be re-evaluated for every pair of instances x1 and
x2 drawn from the dataset.</p>
          <p>The rest of the data section declares the structure of the
dataset at hand—i.e., what features and targets it contains,
and of what types. In the MNIST case, each instance includes
an image feature (represented as a 28×28 tensor, i.e., a
grayscale picture depicting a handwritten digit), and a target
feature value ranging from 0 to 9 (representing the digit
class). Hence, expressions like x1.value and x2.value
are evaluated as the ground-truth class labels of x1 and x2.</p>
          <p>Learnables as the link between realms. The
learnables section hosts named declarations for
trainable ML predictors, and their hyperparameters, possibly
expressed in terms of the dataset schema. The MNIST
example from listing 1 introduces a model named digit,
i.e., a neural classifier aimed at predicting the class of a digit,
given its image. The declaration includes the model’s
name, along with an architectural specification (e.g., two
layers with ReLU activations and a softmax output layer).
Notice that the input layer is not explicitly declared, as it is
automatically inferred from the dataset schema, while the
output layer must be explicitly declared with an activation
function which is adequate for the task at hand (here:
softmax for multi-class classification).</p>
          <p>Importantly, in SKI-lang, once a predictor is named (here:
digit), it becomes callable as a logic function in symbolic
expressions. Thus, digit(x) represents the predicted class
of instance x1, and the whole constraint can be interpreted
as a symbolic equality between predicted and ground-truth
sums.</p>
          <p>Speaking of learnables, it is worth focussing on the
inputs (resp. outputs) subsection: this is where the
model’s input and output types are declared, hence allowing
for computing the shapes of its input and output layers. Such
declarations may reference the dataset’s attribute names,
as defined in the data section, using the dataset’s name
as a global variable, and feature names as attributes (e.g.,
MNIST.features). If transformations are needed – such
as one-hot encoding (OHE) for categorical features – these
should be declared here too, as they must be taken into
account when shaping the model’s structure, yet they should
be transparent to the symbolic knowledge specification—
meaning that symbolic formulas keep referring to
the original features instead of the encoded ones.</p>
          <p>Multiple instances at a time. Most notably, SKI-lang
assumes that the specific variable names being used to refer
to instances of a dataset are declared in the data section
too, explicitly, under the instances sub-sub-section (cf. x1
and x2 in listing 1). This is not just a readability feature, but
rather a crucial design choice that allows SKI-lang to declare
when a SKI process considers multiple instances at a
time.</p>
          <p>In fact, there could be scenarios where the symbolic
knowledge to be injected involves multiple instances at
once. The simplest example is the ‘sum of MNIST digits’,
where the formula to inject considers two digits at a time,
and it subtends a universal quantification over all pairs of
instances. Generalizing on this point, SKI-lang allows for
the declaration of multiple instance variables, hence
allowing for the injection of rules that involve (at maximum) all
of them at once.</p>
          <p>Declaring the multiple instances explicitly, in turn,
enables SKI-lang parsers to configure data-loaders and
batching strategies accordingly, ensuring that the target number
of instances is loaded altogether during training, injection,
and inference.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Census Income Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘fair income prediction’ benchmark, where a
column-wise fairness constraint is injected via constraining into an
ordinary binary classifier trained via supervised learning.
We focus only on new syntactical aspects which are not
already covered by the MNIST example. Refer to listing 2
for the features described here.</p>
          <p>Column-wise expressions. Let us consider the case
where fairness is computed by means of the statistical parity
criterion, which requires that the predicted income is
independent of some sensitive attribute (say, race). To compute
statistical parity, one needs to compare the distribution of
the predicted income across different values of the race
column—a dataset-wise operation, be it the training- or
test-set, or just a batch.</p>
          <p>In SKI-lang, column-wise expressions are supported by
the &lt;dataset&gt;.&lt;column&gt; syntax, where &lt;dataset&gt; is
the name of the dataset declared in the data section, and
&lt;column&gt; is the name of some column as declared in the
same section. Expressions of this form are evaluated as
column tensors, allowing the application of tensor
operations across the entire column. Hence, assuming that a
built-in function SP is available to compute statistical
parity among two column-tensors, the fairness constraint can
be expressed as in listing 2. Another hidden assumption
therein is that applying the learnable function over50k
to a multidimensional tensor containing the training
instances’ input features (e.g., adult.features) would yield
a column-tensor containing the predicted income for each
instance.</p>
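          <p>Listing 2 itself is not reproduced here either; a plausible sketch of the sections at play, where the 0.05 threshold and the key names are illustrative assumptions, might read:</p>
          <preformat># Hypothetical sketch of listing 2 (threshold and keys are illustrative)
data:
  adult:
    instances: [x]

optimization:
  batch_size: 256    # wide batches suit column-wise constraints

learnables:
  over50k:
    inputs: adult.features
    outputs: adult.income

constraints:
  - always: over50k(x) == x.income                            # instance-wise supervision
  - always: SP(over50k(adult.features), adult.race) &lt;= 0.05   # column-wise fairness</preformat>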
          <p>In this example, there are then two constraints being
declared for injection: an ordinary supervision constraint
(i.e., the predicted income should match the expected one)
to be computed instance-wise, and a fairness constraint (i.e.,
the statistical parity between the predicted income across
races should be below a threshold) to be computed
column-wise—hence, globally, i.e., once per dataset.</p>
          <p>Optimization parameters. A common trick to
implement column-wise constraints during gradient-descent-based
training processes is to compute those constraints
over the entire batch—making wider batches
preferable. To account for this and other similar cases, SKI-lang
supports the specification of custom optimization parameters
in the outer optimization section of the YAML
configuration. In listing 2, for instance, the batch_size parameter
is set to 256.</p>
          <p>In general, other optimization-related parameters must
be specified here, such as: (i) the number of training epochs,
(ii) the learning rate, (iii) the random seed, (iv) the optimizer
and (v) the injection algorithm to be used, etc.</p>
          <p>Built-in functions. To simplify the specification of
common symbolic constraints, SKI-lang supports referencing
built-in functions. This is the case, for instance, of the SP
function in listing 2, which computes the statistical parity
between two column tensors. More generally, these are
ordinary Python functions involving tensors as arguments,
and returning tensors as results. These can be provided
as built-in symbols upon calling the SKI-lang parser, and
allow for a more concise specification of common symbolic
constraints.</p>
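          <p>As an illustration, a minimal PyTorch sketch of such a built-in (assuming, for simplicity, binary predictions and a binary encoding of the sensitive attribute) could look as follows:</p>
          <preformat>import torch

def SP(predictions: torch.Tensor, groups: torch.Tensor) -&gt; torch.Tensor:
    # Statistical parity: absolute difference between the favourable-outcome
    # rates of the two groups. Being a composition of tensor operations,
    # it remains differentiable w.r.t. the predictions.
    privileged = predictions[groups == 1]
    unprivileged = predictions[groups == 0]
    return torch.abs(privileged.mean() - unprivileged.mean())</preformat>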
          <p>Despite their simplicity, the possibility to plug in additional
functions to simplify the expression of symbolic logic is a
key engineering feature of our approach. This is where
SKI-lang becomes the substratum upon which SKI algorithm
engineers can build reusable functions for expressing
symbolic knowledge.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Poker Hand Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘poker hand classification’ benchmark, where the
learning process may greatly benefit from symbolic knowledge,
which is in turn quite complex to express. We focus only
on new syntactical aspects which are not already covered
by previous examples. Refer to listing 3 for the features
described here.</p>
          <p>Test-set separation. SKI-lang naturally supports the
separation of training- and test-sets, by allowing the user to
declare them in separate subsections of the data section.
As exemplified in listing 3, the train (resp. test)
subsection declares the training-set (resp., test-set), and they both
allow for the indication of a file – possibly remote,
possibly to be unpacked from an archive – and the data format
(e.g., CSV) of the file contents. They also allow for selecting
diferent samples from the same dataset file, indicating the
split percentage.</p>
          <p>Handy features from YAML. Being YAML-based,
SKI-lang naturally supports the use of anchors (&amp;name), aliases
(*name), and merge keys (&lt;&lt;: *name) to avoid code
duplication and promote reusability (cf. https://archive.ph/PobLI). In this way, repetitive
information can be declared once and reused across the
script. Choosing meaningful anchor names may help in
retaining the declarativeness of the code.</p>
          <p>In listing 3, for instance, this feature is used to avoid
repeating details between the training- and test-set
declarations, as well as to shorten the dataset features’ declaration
considerably (extensive lists of values for categorical/ordinal
features must be written only once).</p>
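          <p>Under the same caveat that listing 3 is not reproduced, a sketch of a data section combining both features, where the file, format, and split keys are illustrative assumptions, might read:</p>
          <preformat># Hypothetical sketch: train/test separation plus YAML anchors and merge keys
data:
  poker:
    defaults: &amp;poker_file
      file: https://doi.org/10.24432/C5KW38
      format: csv
    train:
      &lt;&lt;: *poker_file
      split: 0.8      # 80% of the samples
    test:
      &lt;&lt;: *poker_file
      split: 0.2</preformat>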
        </sec>
        <sec id="sec-3-2-4">
          <title>Background Knowledge</title>
          <p>The knowledge section allows
for the declaration of reusable logic predicates, which
can be referenced in the constraints and learnables
sections, in order to keep the constraining or structuring
specifications concise and declarative, and to avoid code
duplication.</p>
          <p>
            As exemplified in listing 3, the knowledge section
declares a set of logic predicates, indexed by their name (to
avoid name clashes). Each predicate comes with a list of
formal argument names (args) – which can be considered
either as logic variables or as references to unknown
tensors, depending on the mindset – and a clause, which is
a Pythonic formula that defines when the predicate holds
as a function of its arguments. Technically speaking, the
body of the clause is a Python expression which should
return a boolean value – when interpreted as a logic formula
– or a scalar tensor in the range [0, 1]—when interpreted
numerically.
          </p>
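          <p>For concreteness, a sketch of such a declaration, with hypothetical predicate and attribute names, could be:</p>
          <preformat># Hypothetical sketch of a knowledge-section predicate
knowledge:
  Pair:
    args: [hand]
    clause: any(a == b for a, b in combs(hand.ranks, 2))</preformat>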
        </sec>
        <sec id="sec-3-2-5">
          <title>Handy features from Python</title>
          <p>Being Pythonic, SKI-lang
also supports the use of comprehensions to enumerate over
multiple items at once. This is particularly useful to make
complex rules more concise and declarative.</p>
          <p>Consider for instance the case of the TwoPairs rule in
listing 3. This rule states how to compute whether a numeric
tensor representing a poker hand – namely, a vector of the
form [suit1, rank1, . . . , suit5, rank5] – is a
two-pairs hand. Computationally, the rule considers only
the rank-related features of the input tensor (i.e., [rank1,
. . . , rank5]), and all possible 2-sized combinations of
them (via an ad-hoc built-in function combs), counting how
many combinations are composed of equal ranks. If the
count is at least 2, then the hand is classified as a
two-pairs hand. Thanks to generator comprehensions, the rule
specification is concise and readable, and it matches the
Python implementation directly.</p>
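          <p>A minimal Python sketch of the computation the rule denotes, where combs is the assumed built-in (emulated here via itertools), is:</p>
          <preformat>from itertools import combinations

def combs(xs, k):
    # Emulation of the assumed combs built-in: all k-sized combinations.
    return list(combinations(xs, k))

def two_pairs(hand):
    # hand = [suit1, rank1, ..., suit5, rank5]: ranks sit at odd indices.
    ranks = hand[1::2]
    # Count equal-rank pairs: two distinct pairs yield exactly 2 matches.
    equal_pairs = sum(1 for a, b in combs(ranks, 2) if a == b)
    return equal_pairs &gt;= 2  # guards elsewhere disambiguate rarer hands</preformat>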
          <p>Weighted &amp; Guarded Constraints. Finally, SKI-lang
allows for the specification of weighted constraints,
possibly marked by a guard condition which describes when the
constraint should be enforced. These features are
particularly useful when rules in the constraints section are
not mutually exclusive, like in the case of the Poker hand
classification benchmark. For instance, in that benchmark,
the TwoPairs rule is not mutually exclusive with the Pair
one: when the first is satisfied, the second is certainly
satisfied too. When this is the case, SKI-lang allows for (cf.
listing 3) the specification of (i) a guard – prefixed by the if
keyword – which describes when (i.e., for which instances
in the dataset) the rule prefixed by then should be enforced,
and, optionally, (ii) a weight value, which is a scalar
in R≥0 defining the relative importance of the
constraint w.r.t. other constraints in the same section.</p>
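          <p>A sketch of guarded, weighted constraints, where the class indices and weights are illustrative assumptions, might look like:</p>
          <preformat># Hypothetical sketch of guarded, weighted constraints
constraints:
  - if: TwoPairs(x)
    then: hand(x) == 2    # class index of 'two pairs'
    weight: 2.0
  - if: Pair(x) and not TwoPairs(x)
    then: hand(x) == 1    # class index of 'one pair'
    weight: 1.0</preformat>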
          <p>Of course, both guards and constraints can refer to
predicates defined in the knowledge section, and can be
combined with other constraints via logical operators.
Furthermore, while the particular interpretation of weights is up
to the SKI algorithm being used, they are guaranteed to
be normalized w.r.t. the total sum of all weights, and they
are commonly implemented as multiplicative factors when
constraints are turned into penalties in loss functions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation Status and Roadmap</title>
      <p>
        Technologically speaking, SKI-lang is a working prototype,
implemented in Python 3.10, and built on top of well-known
ML libraries such as PyTorch [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], SciKit-Learn [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
and Pandas [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The source code is available on
Anonymous4Science (https://anonymous.4open.science/r/skilang-68AC), for public inspectability and reproducibility.
      </p>
      <p>At the current stage of development, the implementation
acts as a parser for SKI-lang scripts, whose content is then
exploited to automate: (i) the loading of the training and
test datasets as Pandas data frames, (ii) their preprocessing
(e.g., normalization, encoding, etc.), via SciKit-Learn’s
application programming interfaces (APIs), (iii) the
instantiation of PyTorch modules to represent the learnable
predictors, (iv) the instantiation of PyTorch data-loaders to load
the datasets in batches, (v) the configuration of PyTorch
optimizers to train the aforementioned predictors, (vi) the
fuzzification of symbolic knowledge into the predictors’ loss
functions via PyTorch’s API, and, finally, (vii) the training
of the predictors, again via PyTorch.</p>
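      <p>As a rough illustration of step (vi), and not of the actual implementation (whose single fuzzification strategy is not detailed here), an equality constraint can be relaxed into a differentiable penalty and folded into the loss:</p>
      <preformat>import torch

def fuzzify_equality(lhs: torch.Tensor, rhs: torch.Tensor) -&gt; torch.Tensor:
    # Relax the crisp constraint `lhs == rhs` into a penalty that is
    # zero when satisfied and grows with the magnitude of the violation.
    return torch.abs(lhs - rhs).mean()

def loss_with_constraints(task_loss, penalties, weights):
    # Weighted sum of the task loss and the constraint penalties,
    # evaluated per batch (hence the importance of the batch size).
    total = task_loss
    for penalty, weight in zip(penalties, weights):
        total = total + weight * penalty
    return total</preformat>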
      <p>It is worth mentioning that the current implementation
assumes that training is performed via stochastic gradient
descent (SGD) optimizers, in turn relying on batches of
instances being loaded from the training set. The batching
here is particularly important because logic constraints are
evaluated over batches, rather than over the entire training
set, which is a common practice in the SKI literature, making
the “batch size” a crucial hyperparameter to be tuned.</p>
      <p>Limitations and Future Interventions. While the
current architecture is stable, the implementation is still a work
in progress: some features are still under development,
while others are already usable. In particular,
requirements R1 to R7 are already supported, albeit with
some minor limitations. Details about the current
limitations and our plans to address them follow.</p>
      <p>While R1 is fully satisfied at the syntactical level, meaning
that all sorts of expressions prescribed by the requirement
can be expressed in SKI-lang, the implementation currently
lacks support for injecting expressions involving two or
more instances at once. In fact, expressions of such sorts
would require custom data-loaders to be implemented,
sampling the Cartesian power of training sets—and a general
solution (supporting n instances at once, with parametric n)
is still under development.</p>
      <p>
        Requirement R3 is partially satisfied, as the current
implementation only supports the injection of symbolic
knowledge as constraints, while the structuring of predictors from
symbolic expressions is ignored by the parser. Again, filling
this gap is a work in progress, requiring ad-hoc converters
from Pythonic formulae to neural structures to be
implemented – similarly, to what happens in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] – on top of
PyTorch’s API, with minimal or no changes to the
SKI-lang syntax.
      </p>
      <p>Finally, despite allowing for the customisation of
learning parameters such as the learning rate, the number of
epochs, etc., requirements R6 and R7 are still mostly
unsupported, as the current implementation relies on a single SKI
structuring algorithm and a single fuzzification strategy. In
fact, although the framework is designed to let developers
plug in new SKI algorithms and fuzzification strategies –
by providing abstract APIs that implementers can extend
and override – the implementation currently ships no
alternative algorithms or strategies. This is a deliberate
implementation choice: we wanted to stabilise the syntax and
the software architecture first, while leaving the door open
for future contributions – from either the community or
ourselves – to implement additional algorithms and strategies.
The rationale here is straightforward: widening the
coverage of SKI-lang’s supported algorithms and strategies takes
time, effort, and care, hence this goal should be pursued
incrementally.</p>
      <p>Demonstrative Experiments. To demonstrate the
functionality of SKI-lang in its current implementation, we
report experiments on the Fair Income Prediction benchmark,
as described in sections 2.3 and 3.2.2. The results show that
SKI-lang is effective in injecting symbolic knowledge into the
predictors, leading to significant improvements in the
fairness of the predictions at the expense of a slight decrease in
predictive performance (both accuracy and F1-score). This
phenomenon is expected, and it is known in the literature
as the accuracy–fairness trade-off.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>In this paper, we introduced SKI-lang, a DSL designed to
make SKI practical and accessible for data scientists. We
discussed its design rationale, provided concrete syntax
examples, and demonstrated its applicability to well-known
neuro-symbolic benchmarks. Our preliminary
implementation shows that SKI-lang can efectively streamline SKI
workflows and facilitate the integration of symbolic
knowledge into ML pipelines.</p>
      <p>Future work will focus on extending algorithm support,
improving customisability, and further evaluating SKI-lang
across diverse application domains.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by PNRR – M4C2
– Investment 1.3, Partenariato Esteso PE00000013 –
“FAIR—Future Artificial Intelligence Research” – Spoke 8
“Pervasive AI”.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
ChatGPT and GitHub Copilot for the sake of grammar and
spelling check. After using these tools/services, the authors
reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. P. Bhuyan, A. Ramdane-Cherif, R. Tomar, T. P. Singh, Neuro-symbolic artificial intelligence: a survey, Neural Computing and Applications 36 (2024) 12809–12844. doi:10.1007/s00521-024-09960-z.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] G. Ciatto, F. Sabbatini, A. Agiollo, M. Magnini, A. Omicini, Symbolic knowledge extraction and injection with sub-symbolic predictors: A systematic literature review, ACM Computing Surveys 56 (2024) 161:1–161:35. doi:10.1145/3645103.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Agiollo, A. Rafanelli, M. Magnini, G. Ciatto, A. Omicini, Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments, Autonomous Agents and Multi-Agent Systems 37 (2023). doi:10.1007/s10458-023-09609-6.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Ajtai, Y. Gurevich, Datalog vs first-order logic, Journal of Computer and System Sciences 49 (1994) 562–588. doi:10.1016/s0022-0000(05)80071-6.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] P. Körner, M. Leuschel, J. Barbosa, et al., Fifty years of Prolog and beyond, Theory and Practice of Logic Programming 22 (2022) 776–858. doi:10.1017/s1471068422000102.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Levesque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Brachman</surname>
          </string-name>
          ,
          <article-title>Expressiveness and tractability in knowledge representation and reasoning</article-title>
          ,
          <source>Computational Intelligence</source>
          <volume>3</volume>
          (
          <year>1987</year>
          )
          <fpage>78</fpage>
          -
          <lpage>93</lpage>
. doi:10.1111/j.1467-8640.1987.tb00176.x.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
<string-name>
  <given-names>T.</given-names>
  <surname>van Gelder</surname>
</string-name>
          ,
          <source>Why Distributed Representation is Inherently Non-Symbolic</source>
          , Springer Berlin Heidelberg,
          <year>1990</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>66</lpage>
. doi:10.1007/978-3-642-76070-9_6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Badreddine</surname>
          </string-name>
,
<string-name>
  <given-names>A. S.</given-names>
  <surname>d'Avila Garcez</surname>
</string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Serafini</surname>
</string-name>
,
<string-name>
  <given-names>M.</given-names>
  <surname>Spranger</surname>
</string-name>
          ,
          <article-title>Logic tensor networks</article-title>
          ,
<source>Artif. Intell.</source>
<volume>303</volume>
(
<year>2022</year>
)
<fpage>103649</fpage>
. doi:10.1016/J.ARTINT.2021.103649.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
<string-name>
  <given-names>B. W. S. R.</given-names>
  <surname>de Carvalho</surname>
</string-name>
,
          <string-name>
            <given-names>R.</given-names>
            <surname>Riegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic inductive logic programming with logical neural networks</article-title>
, in:
<source>Thirty-Sixth Conference on Artificial Intelligence, AAAI, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI, Virtual Event, February 22 - March 1</source>
, AAAI Press,
          <year>2022</year>
          , pp.
          <fpage>8212</fpage>
          -
          <lpage>8219</lpage>
. doi:10.1609/AAAI.V36I8.20795.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumancic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
<article-title>Neural probabilistic logic programming in DeepProbLog</article-title>
,
<source>Artif. Intell.</source>
          <volume>298</volume>
          (
          <year>2021</year>
          )
<fpage>103504</fpage>
. doi:10.1016/J.ARTINT.2021.103504.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
<article-title>End-to-end differentiable proving</article-title>
          , in: I. Guyon, U. von Luxburg, S. Bengio,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V. N.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
<source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA</source>
,
<year>2017</year>
, pp.
          , pp.
          <fpage>3788</fpage>
          -
          <lpage>3800</lpage>
. URL: https://proceedings.neurips.cc/paper/2017/hash/b2ab001909a8a6f04b51920306046ce5-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <article-title>Scallop: A language for neurosymbolic programming</article-title>
          ,
<source>Proc. ACM Program. Lang.</source>
<volume>7</volume>
(
<year>2023</year>
)
<fpage>1463</fpage>
-
<lpage>1487</lpage>
. doi:10.1145/3591280.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shindo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Dhami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic forward reasoning</article-title>
          ,
<source>CoRR abs/2110.09383</source>
(
<year>2021</year>
). arXiv:2110.09383.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Daniele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>Knowledge enhanced neural networks for relational domains</article-title>
, in:
<string-name>
  <given-names>A.</given-names>
  <surname>Dovier</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Montanari</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Orlandini</surname>
</string-name>
(Eds.),
          <source>AIxIA - Advances in Artificial Intelligence - XXIst International Conference of the Italian Association for Artificial Intelligence</source>
, AIxIA, Udine, Italy, November 28 - December 2, Proceedings, volume
          <volume>13796</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>109</lpage>
. doi:10.1007/978-3-031-27181-6_7.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Glanois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zimmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic hierarchical rule induction</article-title>
          , in: K. Chaudhuri,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szepesvári</surname>
          </string-name>
          , G. Niu, S. Sabato (Eds.),
<source>International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA</source>
, volume
<volume>162</volume>
of Proceedings of Machine Learning Research, PMLR,
<year>2022</year>
          , pp.
          <fpage>7583</fpage>
          -
          <lpage>7615</lpage>
. URL: https://proceedings.mlr.press/v162/glanois22a.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>A view to a KILL: Knowledge injection via lambda layer</article-title>
, in:
<string-name>
  <given-names>A.</given-names>
  <surname>Ferrando</surname>
</string-name>
,
<string-name>
  <given-names>V.</given-names>
  <surname>Mascardi</surname>
</string-name>
(Eds.),
<source>WOA 2022 - 23rd Workshop “From Objects to Agents”</source>
, volume
<volume>3261</volume>
of CEUR Workshop Proceedings, CEUR-WS.org, Genova, Italy,
          <year>2022</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>76</lpage>
. URL: http://ceur-ws.org/Vol-3261/paper5.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
<article-title>KINS: Knowledge injection via network structuring</article-title>
, in:
<string-name>
  <given-names>R.</given-names>
  <surname>Calegari</surname>
</string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Ciatto</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Omicini</surname>
</string-name>
(Eds.),
          <source>CILC 2022 - Italian Conference on Computational Logic</source>
          , volume
          <volume>3204</volume>
of CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy,
          <year>2022</year>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>267</lpage>
. URL: http://ceur-ws.org/Vol-3204/paper_25.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhu</surname>
          </string-name>
,
<article-title>DeepLogic: Joint learning of neural perception and logical reasoning</article-title>
          ,
<source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
<volume>45</volume>
(
<year>2023</year>
)
<fpage>4321</fpage>
-
<lpage>4334</lpage>
. doi:10.1109/TPAMI.2022.3191093.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>On the design of PSyKI: A platform for symbolic knowledge injection into sub-symbolic predictors</article-title>
, in:
<string-name>
  <given-names>D.</given-names>
  <surname>Calvaresi</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Najjar</surname>
</string-name>
,
<string-name>
  <given-names>M.</given-names>
  <surname>Winikoff</surname>
</string-name>
,
<string-name>
  <given-names>K.</given-names>
  <surname>Främling</surname>
</string-name>
(Eds.),
<source>Explainable and Transparent AI and Multi-Agent Systems - 4th International Workshop, EXTRAAMAS 2022, Virtual Event, May 9-10, 2022, Revised Selected Papers</source>
, volume
          <volume>13283</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>108</lpage>
. doi:10.1007/978-3-031-15565-9_6.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pitassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <article-title>Fairness through awareness</article-title>
          , in: S. Goldwasser (Ed.),
<source>Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012</source>
, ACM,
<year>2012</year>
, pp.
          <fpage>214</fpage>
          -
          <lpage>226</lpage>
. doi:10.1145/2090236.2090255.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          , et al.,
          <article-title>Pytorch: An imperative style, high-performance deep learning library</article-title>
, in:
<string-name>
  <given-names>H. M.</given-names>
  <surname>Wallach</surname>
</string-name>
,
<string-name>
  <given-names>H.</given-names>
  <surname>Larochelle</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Beygelzimer</surname>
</string-name>
,
<string-name>
  <given-names>F.</given-names>
  <surname>d'Alché-Buc</surname>
</string-name>
,
<string-name>
  <given-names>E. B.</given-names>
  <surname>Fox</surname>
</string-name>
,
<string-name>
  <given-names>R.</given-names>
  <surname>Garnett</surname>
</string-name>
(Eds.),
<source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada</source>
,
<year>2019</year>
, pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
. URL: https://proceedings.neurips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
<string-name>
  <given-names>J.</given-names>
  <surname>VanderPlas</surname>
</string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
,
<string-name>
  <given-names>E.</given-names>
  <surname>Duchesnay</surname>
</string-name>
,
<article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
<source>J. Mach. Learn. Res.</source>
<volume>12</volume>
(
<year>2011</year>
)
<fpage>2825</fpage>
-
<lpage>2830</lpage>
. doi:10.5555/1953048.2078195.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
<article-title>Data structures for statistical computing in Python</article-title>
          , in: S. van der Walt, J. Millman (Eds.),
<source>Proceedings of the 9th Python in Science Conference 2010 (SciPy 2010)</source>
, Austin, Texas, June 28 - July 3, 2010, scipy.org,
<year>2010</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
. doi:10.25080/MAJORA-92BF1922-00A.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>