<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Domain-Specific Language for NeSy Focussing on Symbolic Knowledge Injection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattia Matteini</string-name>
          <email>mattia.matteini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Ciatto</string-name>
          <email>giovanni.ciatto@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Omicini</string-name>
          <email>andrea.omicini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Magnini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Bologna</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In neuro-symbolic AI (NeSy), integrating symbolic languages – typically subsets of first-order logic (FOL) – with neural networks (NNs) serves goals like enhancing symbolic processing, extending reasoning with pattern recognition, and guiding neural learning with symbolic knowledge—a.k.a. symbolic knowledge injection (SKI). Despite its utility, FOL's expressiveness poses challenges to SKI algorithms, and its general-purpose nature complicates use for non-experts. We propose SKI-lang, a domain-specific language for SKI that balances practicality, clear semantics, and expressiveness–tractability trade-offs. SKI-lang simplifies symbolic specification, serves as a unified interface for diverse SKI approaches, and allows for automating benchmarks from the NeSy literature. We discuss the design choices behind SKI-lang and its implementation, and demonstrate its effectiveness and versatility through a few case studies.</p>
      </abstract>
      <kwd-group>
        <kwd>symbolic knowledge injection</kwd>
        <kwd>SKI-lang</kwd>
        <kwd>NeSy</kwd>
        <kwd>language</kwd>
        <kwd>Python</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In the context of neuro-symbolic AI (NeSy) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], many
approaches have been proposed to integrate (some sort of)
symbolic language – most commonly, a subset of first-order
logic (FOL) – with neural networks (NNs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], to pursue
disparate goals, including, but not limited to: (i) speeding up
symbolic processing via neural computation, (ii) extending
symbolic reasoning with pattern-recognition capabilities, or
(iii) controlling the learning process of NNs with symbolic
knowledge.
      </p>
      <p>
        The latter goal in particular is also known in the
literature as symbolic knowledge injection (SKI) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. SKI has been
addressed by several works, proposing as many algorithms,
each one focussing on a different subset of FOL, ranging
from propositional logic to Datalog [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and beyond—up to
the full power of FOL itself.
      </p>
      <p>
        However, the very choice of the FOL syntax (and its
subsets) as the target symbolic language has never been
questioned, despite posing several challenges to SKI algorithms,
because of its expressiveness. To complicate the matter,
we observe that writing a symbolic specification to be
injected via general-purpose and expressive languages like
FOL, Prolog [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or Datalog, may often be cumbersome for
non-experts in symbolic reasoning. In fact, a modelling
effort is required to translate the domain knowledge into
the target symbolic language, and we argue that such
modelling effort could be reduced by using a domain-specific
language (DSL).
      </p>
      <p>
        Accordingly, in this paper, we propose SKI-lang, a
domain-specific language for SKI, which aims at being practical for
non-experts in symbolic reasoning, while still retaining a
clear semantics and a good expressiveness–tractability
trade-off [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In particular, we discuss the engineering choices
behind the design of SKI-lang, and we show how it can
act as a common interface for different benchmarks from
the NeSy literature. Most notably, the main goal of this
paper is to motivate the need for an ad-hoc language for
SKI – complementary to FOL – and to propose one
particular syntactical reification of such a language, tailored to
the current state of practice in machine learning (ML) and
NeSy. The implementation is preliminary, and we do not
claim completeness or generality. Instead, the paper reports
on our proof-of-concept implementation of SKI-lang, and
provides a roadmap for future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>NeSy has emerged as a significant area of research within
artificial intelligence (AI), aiming to integrate symbolic
reasoning with NN learning to leverage the strengths of both
symbolic and connectionist approaches. Typically,
symbolic – most commonly, logic – languages are integrated
to enhance NNs by enabling structured reasoning,
interpretability, and explicit knowledge representation.</p>
      <p>
        Information representation within these systems
often combines localist (symbol-based) and distributed
(subsymbolic/neural-based) approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], providing flexible
and effective knowledge representation and data
processing capabilities. There, training processes integrate
neural-based inductive learning from data with symbolic reasoning
approaches, allowing systems to learn effectively even from
smaller datasets due to explicit symbolic knowledge
representation.
      </p>
      <p>
        From symbolic AI, NeSy approaches may inherit various
reasoning capabilities, such as: deductive, inductive,
abductive, common-sense, and combinatorial reasoning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Yet,
decision-making processes in these systems combine
intuitive, heuristic neural processing with deliberate symbolic
reasoning, closely mimicking human cognitive patterns.
      </p>
      <p>Lastly, logic utilised in NeSy ranges from propositional to
higher-order logic, offering extensive capabilities for
knowledge representation and reasoning by embedding logical
frameworks directly within neural architectures, thus
enabling complex logical inferences.</p>
      <sec id="sec-2-1">
        <title>2.1. Symbolic knowledge injection (SKI)</title>
        <p>
          SKI refers to the injection of symbolic knowledge –
expressed in formal logic – into sub-symbolic predictors like
NNs. As defined by [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], SKI is “any algorithmic procedure
affecting how sub-symbolic predictors draw their inferences
in such a way that predictions are either computed as a
function of, or made consistent with, some given symbolic
knowledge”.
        </p>
        <p>
          SKI aims to improve model interpretability, robustness,
and controllability by incorporating structured,
human-intelligible prior knowledge into the learning process. Three
major SKI strategies are recognised in the literature:
structuring, guided learning, and embedding [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Structuring:
the architecture of the predictor is built or modified to
reflect the symbolic knowledge structure, e.g., via encoding
rules directly as modules in a NN. Guided learning (a.k.a.
Constraining): symbolic constraints are added to the loss
function, guiding learning via soft penalties or hard
constraints. Embedding: symbolic knowledge is converted into
continuous representations that are fed into the predictor
as part of the input.
        </p>
        <p>
          SKI methods vary widely with respect to the kind of logic
language they support. These range from propositional
logic, to FOL (used in logic tensor networks (LTN) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
logic neural networks (LNN) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]), to probabilistic or Horn
logics (used in DeepProbLog [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], NTP [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]), to Datalog (as
in Scallop [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]).
        </p>
        <p>
          Related works. Neural theorem proving (NTP) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
implements a differentiable version of backward chaining
using Horn logic. Variables are grounded via soft unification
in embedding space, blending guided learning with
embedding strategies. DeepProbLog [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] extends probabilistic
Prolog with neural predicates and performs SKI via
structuring. The logic program controls the inference pipeline, and
variables are grounded via probabilistic backward chaining
over dataset constants. Neuro-symbolic forward chaining
(NSFR) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] performs forward-chaining symbolic reasoning
over probabilistic ground atoms. It structures symbolic
inference within neural computation, and performs grounding by
enumerating atoms and chaining over them. LTN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] uses
fuzzy FOL and injects rules via differentiable real-valued
semantics that act as soft constraints in the loss function of
NNs which are structured to reflect the symbolic knowledge.
Therefore, this method combines the ‘constraining’ and
‘structuring’ strategies. Knowledge-enhanced neural
networks (KENN) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] works in propositional logic and injects
knowledge through a knowledge enhancement layer added
to a neural classifier. Rules are translated into differentiable
adjustments applied post-hoc to network outputs, realising
guided learning. LNN [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] encodes weighted FOL directly in
network structure. Symbolic formulas are embedded into
the network via confidence scores and differentiable logic
gates. This combines structuring with constrained learning.
Hierarchical rule induction (HRI) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] learns logic programs
from data using meta-rules, combining inductive learning
with differentiable logic. Grounding is achieved via
substitution over datasets and similarity in neural embedding space.
Knowledge Injection via Lambda Layer (KILL) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]
regularises NN training with symbolic knowledge in stratified
Datalog with negation. Knowledge Injection via Network
Structuring (KINS) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] injects logic formulas, expressed
in stratified Datalog with negation, into NN by
structuring additional layers that mimic the symbolic knowledge.
Scallop [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] supports Datalog-style programs over tensors,
offering differentiable symbolic reasoning. Grounding is
done via batched comprehension over input data, and
symbolic rules are enforced structurally. DeepLogic [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] uses
structured neural logic operators over tree-based FOL
expressions. It applies structuring and guided learning to learn
logical forms jointly with perception.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. About SKI Languages</title>
        <p>In the realm of NeSy, the interplay between symbolic logic
languages and NN architectures introduces crucial
considerations around expressivity, computational tractability, and
usability—particularly within ML workflows.</p>
        <p>This discussion aims to reflect on these issues,
emphasising the challenges posed by different logic languages to SKI,
the various solutions adopted by existing methods, and
ultimately addressing the question of how these complexities
relate practically to the everyday tasks of data scientists and
ML practitioners.</p>
        <sec id="sec-2-2-1">
          <title>Expressivity vs. Tractability: A Spectrum</title>
          <p>Symbolic languages vary significantly in expressivity and
computational complexity, fundamentally influencing their usability
and suitability in ML contexts.</p>
          <p>At the lower end of the expressivity spectrum lies
propositional logic, which provides simplicity and computational
tractability. It allows for straightforward
symbolic-to-subsymbolic translation and integration into ML workflows, as
seen in early and simpler approaches. However,
propositional logic is limited to representing flat, atomic statements
about domain entities and lacks the capability to generalise
across instances using variables or quantifiers, limiting its
practical utility in more complex ML applications.</p>
          <p>Conversely, FOL stands at the upper end of the
expressivity spectrum, empowered by variables, quantifiers,
unification, and logical inference mechanisms such as resolution.
This language allows for compact, intensional, and relational
representations of knowledge, which are powerful within
symbolic reasoning frameworks. Nonetheless, when
integrating FOL into neural models – where datasets form strict
subsets of the Herbrand universe – complexities emerge
due to grounding (instantiation of variables), interpretation,
and computational overhead.</p>
          <p>
            In particular, FOL’s expressive nature demands significant
computational resources and sophisticated tricks to
effectively use it for SKI. Common methods include: soft
grounding, as utilised in differentiable neural logic frameworks
(e.g., LTN [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]); probabilistic backward chaining, exemplified
by systems like DeepProbLog [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]; and embedding-based
grounding, such as the soft unification in NTP [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. These
methods essentially mitigate the complexity by
translating symbolic structures into computationally manageable,
diferentiable forms.
          </p>
          <p>
            Intermediate languages such as Horn clauses and
Datalog – including their stratified and negation-free variants –,
strike a middle ground [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. They simplify symbolic
formulations through syntactical constraints, enhancing
computational tractability. However, they still present challenges,
particularly when recursive definitions are involved, as
recursion may lead to non-terminating grounding processes—
unless managed through smart techniques like: batched
grounding, as in Scallop’s (cf. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]) implementation for
Datalog programs; or forward chaining with probabilistic atoms,
such as in NSFR [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. These “tricks” prevent infinite
computations by limiting recursive expansions or bounding
inference procedures through practical heuristics.
          </p>
          <p>Practical Usability. From a purely symbolic
perspective, these technical challenges offer deep theoretical
interest, particularly around the foundational study of
neuro-symbolic integration. Nevertheless, the critical question
remains: are such complexities genuinely necessary for
practical ML tasks?</p>
          <p>Practically speaking, data scientists typically resort to
SKI when raw data alone is insufficient for training robust
ML models. This insufficiency arises due to data scarcity,
uneven data distribution, or dataset bias—scenarios common
in real-world applications such as healthcare diagnostics,
fairness-aware AI, or structured data interpretation.</p>
          <p>In these cases, symbolic knowledge becomes a
complementary resource, enhancing model performance through
structured constraints and prior domain knowledge.
Specifically, symbols in these contexts are not referencing abstract
entities but rather direct representations of data instances,
their features, and relational knowledge explicitly linked to
dataset columns and rows.</p>
          <p>Along this line, SKI practitioners may just need a
language that allows them to express symbolic constraints or
relations over the domain of the dataset at hand, simplifying
the expression of declarative statements which involve the
dataset’s instances (and their components), and features—
rather than arbitrary Herbrand terms.</p>
          <p>Moreover, as (i) the specification of the knowledge to be
injected, (ii) the hyperparameters of the learning process, and
(iii) those of the injection process are deeply intertwined, with a lot
of back-and-forth among them, it is crucial to provide
SKI practitioners with a unified language to express both
symbolic knowledge and SKI/ML workflows. This would
represent a significant advantage in terms of usability and
experimental setup time.</p>
          <p>To address these concerns, in the next sections, we
present SKI-lang and its design rationale, and we attempt
to demonstrate its effectiveness and usability in ordinary
ML tasks where SKI may apply.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Running Examples and Benchmarks</title>
        <p>In the remainder of this paper, we rely on running examples
taken from three distinct SKI benchmarks, covering as
many application domains, namely: handwritten digit
recognition, fairness-aware income prediction, and SKI-enhanced
Poker hand classification. Each benchmark demonstrates
a diferent way to exploit SKI and lets us showcase some
feature of SKI-lang in real-world scenarios.</p>
        <p>The benchmarks difer in data format, learning
objective, and symbolic knowledge to be injected. We present
them here as they will be referenced multiple times in the
following section.</p>
        <p>Sum of MNIST Digits. This benchmark is based on the
well-known MNIST dataset of handwritten digits. The
dataset consists of 70,000 gray-scale 28 × 28 pixel images
labelled with one of 10 digit classes. The goal is to train a
neural classifier that predicts the digit class of an image, but
with an additional symbolic constraint: when images are
grouped in pairs, the sum of the true classes of each pair
must equal the sum of their predicted classes.</p>
        <p>The symbolic constraint thus involves a global
consistency condition across two independent input instances.
This setting highlights a key foundational challenge for SKI,
namely how to enforce relational constraints across multiple
inputs, especially when symbolic information is not local to
a single instance.</p>
        <p>Fair Income Prediction. Based on the Adult (a.k.a.
Census Income) dataset (https://doi.org/10.24432/C5XW20), this benchmark addresses the problem
of learning a binary income predictor (above or below $50k)
from demographic and employment data. The dataset
contains 48,842 tabular records with 14 features including age,
education, occupation, and race. The learning task is
binary classification over the income field.</p>
        <p>
          The symbolic knowledge to be injected encodes a fairness
constraint: the predicted income should be independent of
the sensitive attribute race. This is commonly formalised
as a statistical parity requirement (a.k.a., demographic
parity, cf., [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]). Briefly, statistical parity is a fairness metric
that measures the difference between the probabilities that
individuals from privileged and unprivileged groups receive
a favorable outcome. The closer the value is to zero,
the fairer the predictor is considered.
        </p>
        <p>From a foundational SKI perspective, this example
highlights the challenge of injecting distributional constraints—
not over individual predictions but over group-level
statistics. The benchmark is also interesting because it combines
numerical and categorical features, making it a test case for
symbolic reasoning over mixed-type structured data.</p>
        <p>SKI-enhanced Poker-Hand Classification. This
benchmark involves training a sub-symbolic classifier over a
dataset of poker hands (https://doi.org/10.24432/C5KW38). Each data instance encodes 5
playing cards through 10 attributes (5 suits and 5 ranks),
and is assigned one of 10 class labels corresponding to the
type of hand (e.g., pair, flush, full house, etc.). The dataset
is highly imbalanced because some poker hands are much
rarer than others.</p>
        <p>The available symbolic knowledge consists of a rich set
of crisp logic rules that fully characterise each class. This
makes the benchmark suitable for stress-testing SKI under
conditions where symbolic information is both precise and
essential, due to the extreme class imbalance and low data
coverage. From a foundational standpoint, this example
poses challenges in terms of combining rule-based logic
(e.g., multiple conditions with dependencies and precedence)
with neural learning, and allows research on prioritised rule
injection and expressiveness–tractability trade-offs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. A Practical Language for SKI</title>
      <p>Here we introduce SKI-lang, a DSL for NeSy that is
specifically designed to make SKI practical for data scientists.
SKI-lang is a declarative language that allows users to
express symbolic knowledge in a way that is both intuitive and
concise, tailoring that knowledge to the data-related task
at hand – for which a dataset is supposed to be available –,
and training ML predictors accordingly—in such a way that,
at the end of training, they comply with the aforementioned
symbolic knowledge expressed in SKI-lang.</p>
      <p>Accordingly, in this section, we first discuss the abstract
design criteria that guided our design of SKI-lang, and then
we provide a brief overview of its concrete syntax and the
intended semantics.</p>
      <sec id="sec-3-1">
        <title>3.1. Design Criteria</title>
        <p>SKI-lang is designed to serve the purposes of a data
scientist working on some supervised ML task of interest, for
which a dataset is available via a clear schema, as well as
some background knowledge that may be worth taking into
account when training ML predictors for the task.</p>
        <p>For the sake of simplicity – yet without loss of generality
–, we describe the dataset as a table-like structure, where
each row represents an instance of the domain at hand,
and each column represents a feature of the dataset. We
assume that features come with mnemonic names, whereas
instances are represented by their row number. Finally, we
assume that one feature is marked as the target feature w.r.t.
the supervised ML task at hand.</p>
        <p>Under these assumptions, the core goals of SKI-lang are
to provide the data scientist with (G1) a convenient,
concise, and expressive syntax for expressing their background
knowledge; as well as (G2) a declarative syntax for
specifying the ML workflow – including the dataset schema, the
predictors to be trained, and their hyperparameters –, in
such a way that SKI-lang is the only entry point for any
SKI-enhanced ML pipeline.</p>
        <p>To address these goals, we design SKI-lang to satisfy the
following requirements, enumerated by Ri. The discussion
is deliberately abstract: please refer to section 3.2 for
concrete syntactical examples.</p>
        <p>Expressing the knowledge (G1). Firstly, and most
importantly, (R1) SKI-lang should allow expressing
knowledge about the dataset in symbolic form. More specifically,
it should be possible to express declarative statements
involving: (i) references to one or more instances from the
dataset, (ii) references to one or more features from the
dataset, (iii) references to one or more features of the same
instance; (iv) arbitrary constants; (v) named logic predicate
definitions over the items above; (vi) any algebraic or logical
combination of the items above. Such statements constitute
the symbolic knowledge specification to be injected.</p>
        <p>Furthermore, to simplify the specification of common
logic statements, (R2) SKI-lang should support the import
and usage of built-in functions and predicates, aimed at
keeping the symbolic specification concise and declarative.</p>
        <p>Finally, (R3) the language should be agnostic w.r.t. the
particular sort of SKI approach and algorithm being used.
In other words, SKI-lang should be able to express symbolic
knowledge in a way that is independent of how it will be
injected. In practice, this means that SKI-lang should allow
for (i) structuring the architecture of the predictor out of the
symbolic knowledge, (ii) constraining the loss function of
the predictor with symbolic knowledge, (iii) embedding the
symbolic knowledge into vectors that are fed into the
predictor, or (iv) any combination of the above; while (v) requiring
minimal or no changes to the specification.</p>
        <p>Declaring the workflow (G2). To account for the
declaration of end-to-end SKI workflows where all relevant
aspects of the process are specified in a single place,
SKI-lang should also support (R4) the declaration or import
of the ML predictor(s) subject to SKI, and (R5) the
dataset and data-schema to be used for training and testing
the predictors. Similarly, it should support (R6) the selection
of the SKI algorithm to be used, and (R7) the customisation
of any aspect related to the SKI-aware ML pipeline.</p>
        <p>More precisely, requirement R4 prescribes that the
predictor undergoing SKI – be it a predictor to be loaded from
a file, or a new one to be trained from scratch –, should be
declared in SKI-lang. Declarations should specify any
modelling aspect, including: the predictor family of choice
(e.g., NNs, decision trees, etc.), its actual hyperparameters
(e.g., the number of layers, the number of neurons per layer,
the activation functions, etc.), and its mapping to symbolic
predicates. The latter, in particular, aims at declaring the
interface (name + arity) the ML predictor offers to the
symbolic realm. This allows SKI-lang users to reference
the predictor in the symbolic knowledge specification,
as if it were a logic symbol (for instance, a binary classifier
may be mapped onto a unary logic predicate whose name is
that of the predicted class).</p>
        <p>Similarly, requirement R5 prescribes that the dataset(s)
being used for training and testing the predictors – as well
as the schema of the data therein contained –, should be
declared in SKI-lang too. This includes the dataset’s name,
the names of the features, the classification of features as
target or non-target, and the data type of each feature—aside
from the actual URLs or paths to the dataset(s) files.</p>
        <p>Finally, requirement R6 prescribes that SKI-lang should
let the user select the SKI algorithm to be used for
injecting the symbolic knowledge into the ML predictors. This
implies that a few more facilities should be available to
SKI-lang users, namely: (i) some syntactical construct to select
the SKI algorithm to adopt, and (ii) multiple, ad-hoc parsers
for adapting SKI-lang’s syntax to as many SKI algorithms.
Requirement R7 complements such customisability by
allowing the user to customize details such as: the fuzzification
and grounding strategies, the learning rate, the number
of epochs, the random seeds, etc.—possibly including safe
defaults.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Syntax By Examples</title>
        <p>SKI-lang adopts a YAML-like syntax as its foundational
design choice. YAML (https://yaml.org) is a popular and intuitive configuration
language, widely adopted in the data science and software
engineering communities for its clean readability and
shallow nesting. Its ability to support explicit sectioning,
hierarchical definitions, as well as anchors and references makes
it ideal for the kind of structured yet flexible specification
required in neuro-symbolic workflows. Moreover, the
existence of robust and mature parsing libraries across several
programming languages ensures seamless integration of
SKI-lang into modern ML programming frameworks.</p>
        <p>Each SKI-lang script is a YAML file composed of five
primary sections: data, optimization, learnables,
knowledge, and constraints. The data section declares
the dataset(s) to be used and defines their schema, thereby
addressing requirement R5. The optimization section
specifies all tunable hyperparameters and SKI-related
configuration options, addressing requirements R6 and R7. The
learnables section declares the structure, I/O types, and
hyperparameters of the learnable sub-symbolic predictors
to be trained, covering requirements R4 and R3 (as far as
structuring is concerned). The constraints section
encodes declarative statements that represent symbolic
knowledge to be injected as constraints—hence addressing
requirements R1, R2, and R3 (as far as constraining is concerned).
Finally, the knowledge section allows users to declare
auxiliary symbolic definitions, reusable logic predicates, and
domain knowledge to be referenced in the aforementioned
sections—thus supporting requirements R1, R2, and R3 (as
far as embedding is concerned).</p>
        <p>[Listing 1: SKI-lang example for the Sum of MNIST Digits benchmark]</p>
        <p>Below, we explain the intended purpose of each section,
as well as the key features of SKI-lang’s syntax, via a few
incremental examples tailored on the benchmarks from
section 2.3.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. MNIST Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘sum of MNIST digits’ benchmark, where a single rule
involving pairs of MNIST digits is injected via constraining.
Refer to listing 1 for the features described here.</p>
          <p>Pythonic formulas. Symbolic formulas appearing in the
constraints, knowledge, and learnables sections are
expressed using a compact, Python-like syntax. These
formulas consist of algebraic and logical expressions over
variables and constants, with symbols either implicitly or
explicitly declared in the data section to ensure consistency
with the dataset schema. In this way, SKI-lang allows logic
constraints to refer directly to instance-level values,
features, or model predictions, using intuitive dot-notation and
functional application.</p>
          <p>For example, to express the constraint that, for any
pair of MNIST digits x1 and x2, the sum of the
predicted classes must equal the sum of the
ground-truth classes, one can write a formula as simple as
digit(x1) + digit(x2) == x1.value + x2.value.
Unpacking the minimal example from listing 1, we can observe
several key elements of SKI-lang in action, and understand
how the formula above is interpreted.</p>
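          <p>Since listing 1 is not reproduced here, the following sketch illustrates what such a script could look like; the section names match the text, but the nested keys (e.g., features, layers, neurons, activation) are illustrative assumptions rather than SKI-lang’s actual reference syntax.</p>
          <preformat># Hypothetical sketch of listing 1 (nested keys are illustrative)
data:
  mnist:
    instances: [x1, x2]        # two independent draws per training step
    features:
      image: tensor[28, 28]    # gray-scale picture of a handwritten digit
    targets:
      value: int               # digit class, 0..9

learnables:
  digit:
    inputs: mnist.features
    outputs: mnist.targets
    layers:
      - {neurons: 128, activation: relu}
      - {neurons: 10, activation: softmax}

constraints:
  - always: digit(x1) + digit(x2) == x1.value + x2.value</preformat>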
          <p>The constraints section can be filled with a list of
formulas, each one expressing a symbolic constraint to be
injected. Each constraint is expressed in a natural and
concise manner, with a Pythonic syntax which is familiar to
most data scientists. Constraints should be prefixed by a
keyword specifying their applicability scope (e.g., always)
over the declared instance variables (e.g., x1 and x2), which
are introduced in the data section as independent draws
from the dataset(s) therein declared. In the particular case of
listing 1, the always keyword indicates that the constraint
should be re-evaluated for every pair of instances x1 and
x2 drawn from the dataset.</p>
          <p>The rest of the data section declares the structure of the
dataset at hand—i.e., what features and targets it contains,
and of what types. In the MNIST case, each instance includes
an image feature (represented as a 28×28 tensor, i.e., a
grayscale picture depicting a handwritten digit), and a target
feature value ranging from 0 to 9 (representing the digit
class). Hence, expressions like x1.value and x2.value
are evaluated as the ground-truth class labels of x1 and x2.</p>
          <p>Learnables as the link between realms. The
learnables section hosts named declarations for
trainable ML predictors, and their hyperparameters, possibly
expressed in terms of the dataset schema. The MNIST
example from listing 1 introduces a model named digit,
i.e., a neural classifier aimed at predicting the class of a digit,
given its image. The declaration includes the model’s
name, along with an architectural specification (e.g., two
layers with ReLU activations and a softmax output layer).
Notice that the input layer is not explicitly declared, as it is
automatically inferred from the dataset schema, while the
output layer must be explicitly declared with an activation
function which is adequate for the task at hand (here:
softmax for multi-class classification).</p>
          <p>Importantly, in SKI-lang, once a predictor is named (here:
digit), it becomes callable as a logic function in symbolic
expressions. Thus, digit(x) represents the predicted class
of instance x1, and the whole constraint can be interpreted
as a symbolic equality between predicted and ground-truth
sums.</p>
          <p>Speaking of learnables, it is worth focussing on the
inputs (resp. outputs) subsection: this is where the
model’s input and output types are declared, hence allowing
for computing the shapes of its input and output layers. Such
declarations may reference the dataset’s attribute names,
as defined in the data section, using the dataset’s name
as a global variable, and feature names as attributes (e.g.,
MNIST.features). If transformations are needed – such
as one-hot encoding (OHE) for categorical features – these
should be declared here too, as they must be taken into
account when shaping the model’s structure, yet they should
be transparent to the symbolic knowledge specification—
meaning that symbolic formulas keep referring to
the original features instead of the encoded ones.</p>
          <p>Multiple instances at a time. Most notably, SKI-lang
assumes that the specific variable names being used to refer
to instances of a dataset are declared in the data section
too, explicitly, under the instances sub-sub-section (cf. x1
and x2 in listing 1). This is not just a readability feature, but
rather a crucial design choice that allows SKI-lang to declare
when a SKI process considers multiple instances at a
time.</p>
          <p>In fact, there could be scenarios where the symbolic
knowledge to be injected involves multiple instances at
once. The simplest example is the ‘sum of MNIST digits’,
where the formula to inject considers two digits at a time,
and it subtends a universal quantification over all pairs of
instances. Generalizing on this point, SKI-lang allows for
the declaration of multiple instance variables, hence
allowing for the injection of rules that involve (at maximum) all
of them at once.</p>
          <p>Declaring the multiple instances explicitly, in turn,
enables SKI-lang parsers to configure data-loaders and
batching strategies accordingly, ensuring that the target number
of instances is loaded altogether during training, injection,
and inference.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Census Income Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘fair income prediction’ benchmark, where a
column-wise fairness constraint is injected via constraining into an
ordinary binary classifier trained via supervised learning.
We focus only on new syntactical aspects which are not
already covered by the MNIST example. Refer to listing 2
for the features described here.</p>
          <p>Column-wise expressions. Let us consider the case
where fairness is computed by means of the statistical parity
criterion, which requires that the predicted income is
independent of some sensitive attribute (say, race). To compute
statistical parity, one needs to compare the distribution of
the predicted income across different values of the race
column—a dataset-wise operation, be it the training- or
test-set, or just a batch.</p>
          <p>In SKI-lang, column-wise expressions are supported by
the &lt;dataset&gt;.&lt;column&gt; syntax, where &lt;dataset&gt; is
the name of the dataset declared in the data section, and
&lt;column&gt; is the name of some column as declared in the
same section. Expressions of this form are evaluated as
column tensors, allowing the application of tensor
operations across the entire column. Hence, assuming that a
built-in function SP is available to compute statistical
parity among two column-tensors, the fairness constraint can
be expressed as in listing 2. Another hidden assumption
therein is that applying the learnable function over50k
to a multidimensional tensor containing the training
instances’ input features (e.g., adult.features) would yield
a column-tensor containing the predicted income for each
instance.</p>
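          <p>Listing 2 itself is not reproduced here either; a plausible sketch of the sections at play, where the 0.05 threshold and the key names are illustrative assumptions, might read:</p>
          <preformat># Hypothetical sketch of listing 2 (threshold and keys are illustrative)
data:
  adult:
    instances: [x]

optimization:
  batch_size: 256    # wide batches suit column-wise constraints

learnables:
  over50k:
    inputs: adult.features
    outputs: adult.income

constraints:
  - always: over50k(x) == x.income                            # instance-wise supervision
  - always: SP(over50k(adult.features), adult.race) &lt;= 0.05   # column-wise fairness</preformat>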
          <p>In this example, there are then two constraints being
declared for injection: an ordinary supervision constraint
(i.e., the predicted income should match the expected one)
to be computed instance-wise, and a fairness constraint (i.e.,
the statistical parity between the predicted income across
races should be below a threshold) to be computed
column-wise—hence, globally, i.e., once per dataset.</p>
          <p>Optimization parameters. A common trick to
implement column-wise constraints during gradient-descent-based
training processes is to compute those constraints
over the entire batch—making wider batches
preferable. To account for this and other similar cases, SKI-lang
supports the specification of custom optimization parameters
in the outer optimization section of the YAML
configuration. In listing 2, for instance, the batch_size parameter
is set to 256.</p>
          <p>In general, other optimization-related parameters must
be specified here, such as: (i) the number of training epochs,
(ii) the learning rate, (iii) the random seed, (iv) the optimizer
and (v) the injection algorithm to be used, etc.</p>
          <p>Built-in functions. To simplify the specification of
common symbolic constraints, SKI-lang supports referencing
built-in functions. This is the case, for instance, of the SP
function in listing 2, which computes the statistical parity
between two column tensors. More generally, these are
ordinary Python functions involving tensors as arguments,
and returning tensors as results. These can be provided
as built-in symbols upon calling the SKI-lang parser, and
allow for a more concise specification of common symbolic
constraints.</p>
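          <p>As an illustration, a minimal PyTorch sketch of such a built-in (assuming, for simplicity, binary predictions and a binary encoding of the sensitive attribute) could look as follows:</p>
          <preformat>import torch

def SP(predictions: torch.Tensor, groups: torch.Tensor) -&gt; torch.Tensor:
    # Statistical parity: absolute difference between the favourable-outcome
    # rates of the two groups. Being a composition of tensor operations,
    # it remains differentiable w.r.t. the predictions.
    privileged = predictions[groups == 1]
    unprivileged = predictions[groups == 0]
    return torch.abs(privileged.mean() - unprivileged.mean())</preformat>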
          <p>Despite their simplicity, the possibility to plug in additional
functions to simplify the expression of symbolic logic is a
key engineering feature of our approach. This is where
SKI-lang becomes the substratum upon which SKI algorithm
engineers can build reusable functions for expressing
symbolic knowledge.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Poker Hand Example</title>
          <p>Here we present a minimal example of SKI-lang applied to
the ‘poker hand classification’ benchmark, where the
learning process may greatly benefit from symbolic knowledge,
which is in turn quite complex to express. We focus only
on new syntactical aspects which are not already covered
by previous examples. Refer to listing 3 for the features
described here.</p>
          <p>Test-set separation. SKI-lang naturally supports the
separation of training- and test-sets, by allowing the user to
declare them in separate subsections of the data section.
As exemplified in listing 3, the train (resp. test)
subsection declares the training-set (resp., test-set), and they both
allow for the indication of a file – possibly remote,
possibly to be unpacked from an archive – and the data format
(e.g., CSV) of the file contents. They also allow for selecting
diferent samples from the same dataset file, indicating the
split percentage.</p>
          <p>Handy features from YAML. Being YAML-based,
SKI-lang naturally supports the use of anchors (&amp;name), aliases
(*name), and merge keys (&lt;&lt;: *name) to avoid code
duplication and promote reusability (cf. https://archive.ph/PobLI). In this way, repetitive
information can be declared once and reused across the
script. Choosing meaningful anchor names may help in
retaining the declarativeness of the code.</p>
          <p>In listing 3, for instance, this feature is used to avoid
repeating details between the training- and test-set
declarations, as well as to shorten the dataset features’ declaration
considerably (extensive lists of values for categorical/ordinal
features must be written only once).</p>
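          <p>Under the same caveat that listing 3 is not reproduced, a sketch of a data section combining both features, where the file, format, and split keys are illustrative assumptions, might read:</p>
          <preformat># Hypothetical sketch: train/test separation plus YAML anchors and merge keys
data:
  poker:
    defaults: &amp;poker_file
      file: https://doi.org/10.24432/C5KW38
      format: csv
    train:
      &lt;&lt;: *poker_file
      split: 0.8      # 80% of the samples
    test:
      &lt;&lt;: *poker_file
      split: 0.2</preformat>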
        </sec>
        <sec id="sec-3-2-4">
          <title>Background Knowledge</title>
          <p>The knowledge section allows
for the declaration of reusable logic predicates, which
can be referenced in the constraints and learnables
sections, in order to keep the constraining or structuring
specifications concise and declarative, and to avoid code
duplication.</p>
          <p>
            As exemplified in listing 3, the knowledge section
declares a set of logic predicates, indexed by their name (to
avoid name clashes). Each predicate comes with a list of
formal argument names (args) – which can be considered
either as logic variables or as references to unknown
tensors, depending on the mindset – and a clause, which is
a Pythonic formula that defines when the predicate holds
as a function of its arguments. Technically speaking, the
body of the clause is a Python expression which should
return a boolean value – when interpreted as a logic formula
– or a scalar tensor in the range [0, 1]—when interpreted
numerically.
          </p>
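          <p>For concreteness, a sketch of such a declaration, with hypothetical predicate and attribute names, could be:</p>
          <preformat># Hypothetical sketch of a knowledge-section predicate
knowledge:
  Pair:
    args: [hand]
    clause: any(a == b for a, b in combs(hand.ranks, 2))</preformat>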
        </sec>
        <sec id="sec-3-2-5">
          <title>Handy features from Python</title>
          <p>Being Pythonic, SKI-lang
also supports the use of comprehensions to enumerate over
multiple items at once. This is particularly useful to make
complex rules more concise and declarative.</p>
          <p>Consider for instance the case of the TwoPairs rule in
listing 3. This rule states how to compute whether a numeric
tensor representing a poker hand – namely, a vector of the
form [suit1, rank1, . . . , suit5, rank5] – is a
two-pairs hand. Computationally, the rule considers only
the rank-related features of the input tensor (i.e., [rank1,
. . . , rank5]), and all possible 2-sized combinations of
them (via an ad-hoc built-in function combs), counting how
many combinations are composed of equal ranks. If the
count is at least 2, then the hand is classified as a
two-pairs hand. Thanks to generator comprehensions, the rule
specification is concise and readable, and it matches the
Python implementation directly.</p>
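          <p>A minimal Python sketch of the computation the rule denotes, where combs is the assumed built-in (emulated here via itertools), is:</p>
          <preformat>from itertools import combinations

def combs(xs, k):
    # Emulation of the assumed combs built-in: all k-sized combinations.
    return list(combinations(xs, k))

def two_pairs(hand):
    # hand = [suit1, rank1, ..., suit5, rank5]: ranks sit at odd indices.
    ranks = hand[1::2]
    # Count equal-rank pairs: two distinct pairs yield exactly 2 matches.
    equal_pairs = sum(1 for a, b in combs(ranks, 2) if a == b)
    return equal_pairs &gt;= 2  # guards elsewhere disambiguate rarer hands</preformat>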
          <p>Weighted &amp; Guarded Constraints. Finally, SKI-lang
allows for the specification of weighted constraints,
possibly marked by a guard condition which describes when the
constraint should be enforced. These features are
particularly useful when rules in the constraints section are
not mutually exclusive, like in the case of the Poker hand
classification benchmark. For instance, in that benchmark,
the TwoPairs rule is not mutually exclusive with the Pair
one: when the first is satisfied, the second is certainly
satisfied too. When this is the case, SKI-lang allows for (cf.
listing 3) the specification of (i) a guard – prefixed by the if
keyword – which describes when (i.e., for which instances
in the dataset) the rule prefixed by then should be enforced,
and, optionally, (ii) a weight value, which is a scalar
in R≥0 defining the relative importance of the
constraint w.r.t. other constraints in the same section.</p>
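          <p>A sketch of guarded, weighted constraints, where the class indices and weights are illustrative assumptions, might look like:</p>
          <preformat># Hypothetical sketch of guarded, weighted constraints
constraints:
  - if: TwoPairs(x)
    then: hand(x) == 2    # class index of 'two pairs'
    weight: 2.0
  - if: Pair(x) and not TwoPairs(x)
    then: hand(x) == 1    # class index of 'one pair'
    weight: 1.0</preformat>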
          <p>Of course, both guards and constraints can refer to
predicates defined in the knowledge section, and can be
combined with other constraints via logical operators.
Furthermore, while the particular interpretation of weights is up
to the SKI algorithm being used, they are guaranteed to
be normalized w.r.t. the total sum of all weights, and they
are commonly implemented as multiplicative factors when
constraints are turned into penalties in loss functions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation Status and Roadmap</title>
      <p>
        Technologically speaking, SKI-lang is a working prototype,
implemented in Python 3.10, and built on top of well-known
ML libraries such as PyTorch [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], SciKit-Learn [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
and Pandas [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The source code is available on
Anonymous4Science (https://anonymous.4open.science/r/skilang-68AC), for public inspectability and reproducibility.
      </p>
      <p>At the current stage of development, the implementation
acts as a parser for SKI-lang scripts, whose content is then
exploited to automate: (i) the loading of the training and
test datasets as Pandas data frames, (ii) their preprocessing
(e.g., normalization, encoding, etc.), via SciKit-Learn’s
application programming interfaces (APIs), (iii) the
instantiation of PyTorch modules to represent the learnable
predictors, (iv) the instantiation of PyTorch data-loaders to load
the datasets in batches, (v) the configuration of PyTorch
optimizers to train the aforementioned predictors, (vi) the
fuzzification of symbolic knowledge into the predictors’ loss
functions via PyTorch’s API, and, finally, (vii) the training
of the predictors, again via PyTorch.</p>
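      <p>As a rough illustration of step (vi), and not of the actual implementation (whose single fuzzification strategy is not detailed here), an equality constraint can be relaxed into a differentiable penalty and folded into the loss:</p>
      <preformat>import torch

def fuzzify_equality(lhs: torch.Tensor, rhs: torch.Tensor) -&gt; torch.Tensor:
    # Relax the crisp constraint `lhs == rhs` into a penalty that is
    # zero when satisfied and grows with the magnitude of the violation.
    return torch.abs(lhs - rhs).mean()

def loss_with_constraints(task_loss, penalties, weights):
    # Weighted sum of the task loss and the constraint penalties,
    # evaluated per batch (hence the importance of the batch size).
    total = task_loss
    for penalty, weight in zip(penalties, weights):
        total = total + weight * penalty
    return total</preformat>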
      <p>It is worth mentioning that the current implementation
assumes that training is performed via stochastic gradient
descent (SGD) optimizers, in turn relying on batches of
instances being loaded from the training set. The batching
here is particularly important because logic constraints are
evaluated over batches, rather than over the entire training
set, which is a common practice in the SKI literature, making
the “batch size” a crucial hyperparameter to be tuned.</p>
      <p>Limitations and Future Interventions. While the
current architecture is stable, the implementation is still a work
in progress: some features are still under development,
while others are already usable. In particular,
requirements R1 to R7 are already supported, albeit with
some minor limitations. Details about the current
limitations and our plans to address them follow.</p>
      <p>While R1 is fully satisfied at the syntactical level, meaning
that all sorts of expressions prescribed by the requirement
can be expressed in SKI-lang, the implementation currently
lacks support for injecting expressions involving two or
more instances at once. In fact, expressions of such sorts
would require custom data-loaders to be implemented,
sampling the Cartesian power of training sets—and a general
solution (supporting n instances at once, with parametric n)
is still under development.</p>
      <p>
        Requirement R3 is partially satisfied, as the current
implementation only supports the injection of symbolic
knowledge as constraints, while the structuring of predictors from
symbolic expressions is ignored by the parser. Again, filling
this gap is a work in progress, requiring ad-hoc converters
from Pythonic formulae to neural structures to be
implemented – similarly, to what happens in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] – on top of
PyTorch’s API, with minimal or no changes to the
SKI-lang syntax.
      </p>
      <p>Finally, despite allowing for the customisation of
learning parameters such as the learning rate, the number of
epochs, etc., requirements R6 and R7 are still mostly
unsupported, as the current implementation relies on a single SKI
structuring algorithm and a single fuzzification strategy. In
fact, although the framework is designed to let developers
plug in new SKI algorithms and fuzzification strategies –
by providing abstract APIs that implementers can extend
and override – the implementation currently ships no
alternative algorithms or strategies. This is a deliberate
implementation choice: we wanted to stabilise the syntax and
the software architecture first, while leaving the door open
for future contributions – from either the community or
ourselves – to implement additional algorithms and strategies.
The rationale here is straightforward: widening the
coverage of SKI-lang’s supported algorithms and strategies takes
time, effort, and care, hence this goal should be pursued
incrementally.</p>
      <p>Demonstrative Experiments. To demonstrate the
functionality of SKI-lang in its current implementation, we
report experiments on the Fair Income Prediction benchmark,
as described in sections 2.3 and 3.2.2. The results show that
SKI-lang is effective in injecting symbolic knowledge into the
predictors, leading to significant improvements in the
fairness of the predictions at the expense of a slight decrease in
predictive performance (both accuracy and F1-score). This
phenomenon is expected, and it is known in the literature
as the accuracy–fairness trade-off.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>In this paper, we introduced SKI-lang, a DSL designed to
make SKI practical and accessible for data scientists. We
discussed its design rationale, provided concrete syntax
examples, and demonstrated its applicability to well-known
neuro-symbolic benchmarks. Our preliminary
implementation shows that SKI-lang can efectively streamline SKI
workflows and facilitate the integration of symbolic
knowledge into ML pipelines.</p>
      <p>Future work will focus on extending algorithm support,
improving customisability, and further evaluating SKI-lang
across diverse application domains.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was partially supported by PNRR – M4C2
– Investment 1.3, Partenariato Esteso PE00000013 –
“FAIR—Future Artificial Intelligence Research” – Spoke 8
“Pervasive AI”.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used
ChatGPT and GitHub Copilot for the sake of grammar and
spelling check. After using these tools/services, the authors
reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. P. Bhuyan, A. Ramdane-Cherif, R. Tomar, T. P. Singh, Neuro-symbolic artificial intelligence: a survey, Neural Computing and Applications 36 (2024) 12809–12844. doi:10.1007/s00521-024-09960-z.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] G. Ciatto, F. Sabbatini, A. Agiollo, M. Magnini, A. Omicini, Symbolic knowledge extraction and injection with sub-symbolic predictors: A systematic literature review, ACM Computing Surveys 56 (2024) 161:1–161:35. doi:10.1145/3645103.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Agiollo, A. Rafanelli, M. Magnini, G. Ciatto, A. Omicini, Symbolic knowledge injection meets intelligent agents: QoS metrics and experiments, Autonomous Agents and Multi-Agent Systems 37 (2023). doi:10.1007/s10458-023-09609-6.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Ajtai, Y. Gurevich, Datalog vs first-order logic, Journal of Computer and System Sciences 49 (1994) 562–588. doi:10.1016/s0022-0000(05)80071-6.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] P. Körner, M. Leuschel, J. Barbosa, et al., Fifty years of Prolog and beyond, Theory and Practice of Logic Programming 22 (2022) 776–858. doi:10.1017/s1471068422000102.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Levesque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Brachman</surname>
          </string-name>
          ,
          <article-title>Expressiveness and tractability in knowledge representation and reasoning</article-title>
          ,
          <source>Computational Intelligence</source>
          <volume>3</volume>
          (
          <year>1987</year>
          )
          <fpage>78</fpage>
          -
          <lpage>93</lpage>
. doi:10.1111/j.1467-8640.1987.tb00176.x.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
<string-name>
  <given-names>T.</given-names>
  <surname>van Gelder</surname>
</string-name>
          ,
          <source>Why Distributed Representation is Inherently Non-Symbolic</source>
          , Springer Berlin Heidelberg,
          <year>1990</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>66</lpage>
. doi:10.1007/978-3-642-76070-9_6.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Badreddine</surname>
          </string-name>
,
<string-name>
  <given-names>A. S.</given-names>
  <surname>d'Avila Garcez</surname>
</string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Serafini</surname>
</string-name>
,
<string-name>
  <given-names>M.</given-names>
  <surname>Spranger</surname>
</string-name>
          ,
          <article-title>Logic tensor networks</article-title>
          ,
<source>Artif. Intell.</source>
<volume>303</volume>
(
<year>2022</year>
)
<fpage>103649</fpage>
. doi:10.1016/J.ARTINT.2021.103649.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
<string-name>
  <given-names>B. W. S. R.</given-names>
  <surname>de Carvalho</surname>
</string-name>
,
          <string-name>
            <given-names>R.</given-names>
            <surname>Riegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic inductive logic programming with logical neural networks</article-title>
, in:
<source>Thirty-Sixth Conference on Artificial Intelligence, AAAI, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI, Virtual Event, February 22 - March 1</source>
, AAAI Press,
          <year>2022</year>
          , pp.
          <fpage>8212</fpage>
          -
          <lpage>8219</lpage>
. doi:10.1609/AAAI.V36I8.20795.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumancic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Raedt</surname>
          </string-name>
          ,
<article-title>Neural probabilistic logic programming in DeepProbLog</article-title>
,
<source>Artif. Intell.</source>
          <volume>298</volume>
          (
          <year>2021</year>
          )
<fpage>103504</fpage>
. doi:10.1016/J.ARTINT.2021.103504.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rocktäschel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          ,
<article-title>End-to-end differentiable proving</article-title>
          , in: I. Guyon, U. von Luxburg, S. Bengio,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V. N.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , R. Garnett (Eds.),
<source>Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA</source>
,
<year>2017</year>
, pp.
          , pp.
          <fpage>3788</fpage>
          -
          <lpage>3800</lpage>
. URL: https://proceedings.neurips.cc/paper/2017/hash/b2ab001909a8a6f04b51920306046ce5-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <article-title>Scallop: A language for neurosymbolic programming</article-title>
          ,
<source>Proc. ACM Program. Lang.</source>
<volume>7</volume>
(
<year>2023</year>
)
<fpage>1463</fpage>
-
<lpage>1487</lpage>
. doi:10.1145/3591280.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shindo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Dhami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kersting</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic forward reasoning</article-title>
          ,
<source>CoRR abs/2110.09383</source>
(
<year>2021</year>
). arXiv:2110.09383.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Daniele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>Knowledge enhanced neural networks for relational domains</article-title>
, in:
<string-name>
  <given-names>A.</given-names>
  <surname>Dovier</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Montanari</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Orlandini</surname>
</string-name>
(Eds.),
          <source>AIxIA - Advances in Artificial Intelligence - XXIst International Conference of the Italian Association for Artificial Intelligence</source>
, AIxIA, Udine, Italy, November 28 - December 2, Proceedings, volume
          <volume>13796</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>109</lpage>
. doi:10.1007/978-3-031-27181-6_7.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Glanois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zimmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic hierarchical rule induction</article-title>
          , in: K. Chaudhuri,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Szepesvári</surname>
          </string-name>
          , G. Niu, S. Sabato (Eds.),
<source>International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA</source>
, volume
<volume>162</volume>
of Proceedings of Machine Learning Research, PMLR,
<year>2022</year>
          , pp.
          <fpage>7583</fpage>
          -
          <lpage>7615</lpage>
. URL: https://proceedings.mlr.press/v162/glanois22a.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>A view to a KILL: Knowledge injection via lambda layer</article-title>
, in:
<string-name>
  <given-names>A.</given-names>
  <surname>Ferrando</surname>
</string-name>
,
<string-name>
  <given-names>V.</given-names>
  <surname>Mascardi</surname>
</string-name>
(Eds.),
<source>WOA 2022 - 23rd Workshop “From Objects to Agents”</source>
, volume
<volume>3261</volume>
of CEUR Workshop Proceedings, CEUR-WS.org, Genova, Italy,
          <year>2022</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>76</lpage>
. URL: http://ceur-ws.org/Vol-3261/paper5.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
<article-title>KINS: Knowledge injection via network structuring</article-title>
, in:
<string-name>
  <given-names>R.</given-names>
  <surname>Calegari</surname>
</string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Ciatto</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Omicini</surname>
</string-name>
(Eds.),
          <source>CILC 2022 - Italian Conference on Computational Logic</source>
          , volume
          <volume>3204</volume>
of CEUR Workshop Proceedings, CEUR-WS.org, Bologna, Italy,
          <year>2022</year>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>267</lpage>
. URL: http://ceur-ws.org/Vol-3204/paper_25.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>X.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhu</surname>
          </string-name>
,
<article-title>DeepLogic: Joint learning of neural perception and logical reasoning</article-title>
          ,
<source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
<volume>45</volume>
(
<year>2023</year>
)
<fpage>4321</fpage>
-
<lpage>4334</lpage>
. doi:10.1109/TPAMI.2022.3191093.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>On the design of PSyKI: A platform for symbolic knowledge injection into sub-symbolic predictors</article-title>
, in:
<string-name>
  <given-names>D.</given-names>
  <surname>Calvaresi</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Najjar</surname>
</string-name>
,
<string-name>
  <given-names>M.</given-names>
  <surname>Winikoff</surname>
</string-name>
,
<string-name>
  <given-names>K.</given-names>
  <surname>Främling</surname>
</string-name>
(Eds.),
<source>Explainable and Transparent AI and Multi-Agent Systems - 4th International Workshop, EXTRAAMAS 2022, Virtual Event, May 9-10, 2022, Revised Selected Papers</source>
, volume
          <volume>13283</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>108</lpage>
. doi:10.1007/978-3-031-15565-9_6.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pitassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Reingold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <article-title>Fairness through awareness</article-title>
          , in: S. Goldwasser (Ed.),
<source>Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012</source>
, ACM,
<year>2012</year>
, pp.
          <fpage>214</fpage>
          -
          <lpage>226</lpage>
. doi:10.1145/2090236.2090255.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          , et al.,
          <article-title>Pytorch: An imperative style, high-performance deep learning library</article-title>
, in:
<string-name>
  <given-names>H. M.</given-names>
  <surname>Wallach</surname>
</string-name>
,
<string-name>
  <given-names>H.</given-names>
  <surname>Larochelle</surname>
</string-name>
,
<string-name>
  <given-names>A.</given-names>
  <surname>Beygelzimer</surname>
</string-name>
,
<string-name>
  <given-names>F.</given-names>
  <surname>d'Alché-Buc</surname>
</string-name>
,
<string-name>
  <given-names>E. B.</given-names>
  <surname>Fox</surname>
</string-name>
,
<string-name>
  <given-names>R.</given-names>
  <surname>Garnett</surname>
</string-name>
(Eds.),
<source>Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada</source>
,
<year>2019</year>
, pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
. URL: https://proceedings.neurips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
<string-name>
  <given-names>J.</given-names>
  <surname>VanderPlas</surname>
</string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
,
<string-name>
  <given-names>E.</given-names>
  <surname>Duchesnay</surname>
</string-name>
,
<article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
<source>J. Mach. Learn. Res.</source>
<volume>12</volume>
(
<year>2011</year>
)
<fpage>2825</fpage>
-
<lpage>2830</lpage>
. doi:10.5555/1953048.2078195.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
<article-title>Data structures for statistical computing in Python</article-title>
          , in: S. van der Walt, J. Millman (Eds.),
<source>Proceedings of the 9th Python in Science Conference 2010 (SciPy 2010)</source>
, Austin, Texas, June 28 - July 3, 2010, scipy.org,
<year>2010</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
. doi:10.25080/MAJORA-92BF1922-00A.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>