<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ANSyA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Neuro-Argumentative Learning with Legal Text</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petros Vasileiadis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele De Angelis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Proietti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Toni</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IASI-CNR</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Imperial College London</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>We introduce a neuro-argumentative pipeline for legal outcome classification on plain-text legal case descriptions, which integrates the flexibility of neural machine learning methods with the inherent explainability and reasoning capabilities of Assumption-Based Argumentation (ABA). This approach addresses the limitations of opaque machine learning models by producing interpretable logical rules from case data, which can be used to justify the predicted outcome. The pipeline consists of a neural BERT-based feature extractor, which processes the case description to generate logical facts, and a symbolic component, which applies ABALearn, a form of symbolic learning, to these facts to derive an ABA framework that captures domain-specific rules. The trained feature extractor can then be used alongside the learned framework to predict the outcome of new legal cases. The learned rules serve as explicit justifications for each prediction, resulting in an inherently explainable decision-making process. When evaluated using a synthetically generated legal dataset, our proposed pipeline achieves performance comparable to state-of-the-art models in terms of F1 score and other standard classification metrics, while also introducing a transparent, symbolic reasoning layer.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal reasoning</kwd>
        <kwd>Neuro-symbolic machine learning</kwd>
        <kwd>Assumption-Based Argumentation</kwd>
        <kwd>Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Legal decision-making often involves binary or categorical
outcomes, such as guilty or not guilty, or application granted
or denied, based on case facts, applicable laws, and
supporting arguments. This process requires careful manual
analysis by legal professionals, who must identify relevant facts,
interpret complex statutory language, and reason about
their implications. The repetitive and time-consuming
nature of this work has motivated research into AI-assisted
legal decision-making systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Large language models (LLMs) such as BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
GPT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have demonstrated impressive natural language
understanding capabilities and show promise for legal tasks
such as case classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, their black-box
nature raises serious concerns, especially since these models
can hallucinate, generating plausible-sounding but incorrect
outputs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and often fail to perform the kind of structured,
statutory reasoning required in law [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the legal field,
where decisions carry significant consequences and require
justification, explainability is essential. Systems that provide
decisions without transparent reasoning undermine trust,
accountability, and fairness. Therefore, any system used
for legal decision-making must prioritise explainability and
transparency.
      </p>
      <p>
        We propose a neuro-argumentative pipeline (see Figure 1)
that enables Assumption-Based Argumentation (ABA) [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]
on natural language legal case descriptions. A neural
feature extractor identifies relevant facts from case
descriptions, which are translated into a symbolic form suitable for
ABALearn [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is a form of symbolic learning which
obtains ABA frameworks from background knowledge and
positive/negative examples, to then make case-based
inferences given the facts of new cases.
      </p>
      <p>One of the key benefits of symbolic learning is its
inherent explainability. Each derived rule explicitly shows how
particular facts lead to a legal decision or classification
outcome. This allows users to trace which rules apply to a given
case and see the justification for the prediction output.</p>
      <p>
        We evaluate our pipeline on two datasets [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], extended
with synthetic case descriptions. One is based on tort law,
involving liability for damages [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and the other concerns
eligibility for a welfare benefit based on criteria like age,
gender, and distance [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>We compare our pipeline’s performance to other
state-of-the-art classification systems, using standard classification
metrics such as accuracy and F1 score. The final
performance of the pipeline is comparable to that of the
state-of-the-art, while also introducing explainability and reasoning.</p>
      <p>Overall, this paper contributes to the growing field of
neuro-symbolic and explainable AI by demonstrating how
neural feature extraction and argumentation-based
reasoning can be effectively combined in the legal domain, offering
both performance and explainability.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <sec id="sec-2-1">
        <title>2.1. Assumption-Based Argumentation</title>
        <p>
          Assumption-Based Argumentation is a structured
argumentation formalism that generalises various non-monotonic
reasoning approaches. In ABA, arguments and attacks are
defined through rules, assumptions, and their contraries. It
provides a formal basis for reasoning, allowing us to
construct arguments that attack or defend assumptions in order
to draw acceptable conclusions [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>
          An ABA framework (as originally proposed in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], but
presented here following [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) is a tuple ⟨ℒ, ℛ, 𝒜, ‾⟩ such that:
• ⟨ℒ, ℛ⟩ is a deductive system, where ℒ is a language
and ℛ is a set of (inference) rules of the form s0 ←
s1, . . . , sn (n ≥ 0, si ∈ ℒ, for 1 ≤ i ≤ n);
• 𝒜 ⊆ ℒ is a (non-empty) set of assumptions;1
• ‾ is a total mapping from 𝒜 into ℒ, where a‾ is the
contrary of a, for a ∈ 𝒜.
1The non-emptiness requirement can always be satisfied by including
in 𝒜 a bogus assumption, with its own contrary, neither occurring
elsewhere.
        </p>
        <p>Given a rule s0 ← s1, . . . , sn, s0 is the head and
s1, . . . , sn is the body; if n = 0 then the body is said
to be empty (represented as s0 ← or s0 ← true) and the
rule is called a fact. In this paper we focus on flat ABA
frameworks, where assumptions are not heads of rules.
Elements of ℒ can be any sentences, but in this paper we
focus on ABA frameworks where ℒ is a finite set of ground
atoms. However, we will use schemata for rules,
assumptions and contraries, using variables to represent compactly
all instances over some underlying universe.</p>
        <p>Example 1. A simple ABA framework ⟨ℒ, ℛ, 𝒜, ‾⟩ may
be as follows, for X, Y ∈ {a, b, c, h, f}:
ℒ = {p(X), c(X), alpha_0(X), q(X),
r(X, Y), s(X)}
ℛ = {p(X) ← q(X), alpha_0(X),
c(X) ← s(X),
r(a, f) ←,
r(h, c) ←,
q(a) ←, s(f) ←,
q(b) ←, q(c) ←,
q(f) ←, q(h) ←}
𝒜 = {alpha_0(X)} with alpha_0(X)‾ = c(X)</p>
        <p>Without loss of generality, we will leave the language
component of ABA frameworks implicit, and use, e.g.,
⟨ℛ, 𝒜, ‾⟩ to stand for ⟨ℒ, ℛ, 𝒜, ‾⟩, where ℒ is the set
of all sentences in ℛ, 𝒜 and in the range of ‾. We will also
write facts as rules with equalities in the body, e.g. we may
write q(a) ← as q(X) ← X = a.</p>
        <p>Arguments are deductions of claims using rules and
supported by assumptions, and attacks between arguments
target assumptions in their support, as follows.</p>
        <p>• An argument for (claim) s ∈ ℒ supported by A ⊆
𝒜 and R ⊆ ℛ (denoted A ⊢R s) is a finite tree
with nodes labelled by sentences in ℒ or by true,
where: (i) the root is labelled by s, (ii) leaves are
either true or assumptions in A, (iii) non-leaves, s′,
have as children the elements of the body of some
rule in R with head s′.
• Argument A1 ⊢R1 s1 attacks argument A2 ⊢R2 s2
if and only if s1 = a‾ for some a ∈ A2.</p>
        <p>
          In ABA, conclusions are drawn by determining the
acceptability of sets of arguments (or extensions [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]). In this paper,
we use the following notion of extension:
• A set ∆ of arguments is a stable extension if (i) no
argument in ∆ attacks any argument in ∆ (i.e. ∆
is conflict-free) and (ii) every argument not in ∆ is
attacked by an argument in ∆.
        </p>
        <p>Example 2. The following are some of the arguments that
can be constructed in the ABA framework F of Example 1 (for
simplicity, we omit to indicate the set of rules that have been
used):
A1: {alpha_0(a)} ⊢ p(a)
A2: {alpha_0(f)} ⊢ p(f)
A3: {} ⊢ c(f)
Argument A1 is not attacked by any other argument, as
no argument can be constructed for the claim c(a).
Instead, A2 is attacked by A3, whose claim is the
contrary of the assumption that supports A2. A3
cannot be attacked, as no assumption supports it. Thus, A1
and A3 are accepted in the stable extension of F, while
A2 is not. It can be seen that F has a unique stable
extension (because the attack relation is acyclic). In
particular, this stable extension includes accepting arguments for
p(a), p(b), p(c), and
p(h), while it does not include an accepting
argument for p(f).</p>
        <p>We say that an ABA framework F is satisfiable if it admits
at least one stable extension, and unsatisfiable otherwise.
We also say that a sentence s is accepted, or covered, in a
stable extension ∆ of F, written F |=∆ s, if it is the claim
of an argument in ∆.</p>
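        <p>To make these notions concrete, the following is a minimal Python sketch that enumerates the arguments of a tiny flat ABA framework by backward chaining and finds its stable extensions by brute force. The framework and all predicate names in it are illustrative stand-ins, not taken from this paper.</p>
        <p>
```python
from itertools import combinations

# A tiny flat ABA framework (illustrative names):
# rules are (head, body) pairs; one assumption with its contrary.
rules = [("p", ["q", "alpha"]), ("c", ["s"]), ("q", []), ("s", [])]
assumptions = {"alpha"}
contrary = {"alpha": "c"}

def arguments():
    """Enumerate (claim, assumption-support) pairs by backward chaining."""
    def supports(atom):
        if atom in assumptions:
            return [frozenset([atom])]
        out = []
        for head, body in rules:
            if head != atom:
                continue
            combos = [frozenset()]
            for b in body:           # combine supports of all body atoms
                combos = [s | t for s in combos for t in supports(b)]
            out.extend(combos)
        return out
    atoms = {h for h, _ in rules} | assumptions
    return [(a, s) for a in atoms for s in supports(a)]

def stable_extensions(args):
    """Brute force: conflict-free sets that attack every outside argument."""
    def attacks(a1, a2):
        return a1[0] in {contrary[x] for x in a2[1]}
    exts = []
    for r in range(len(args) + 1):
        for cand in combinations(args, r):
            conflict_free = not any(attacks(x, y) for x in cand for y in cand)
            covers_rest = all(any(attacks(x, y) for x in cand)
                              for y in args if y not in cand)
            if conflict_free and covers_rest:
                exts.append(set(cand))
    return exts

exts = stable_extensions(arguments())
```
        </p>
        <p>Here the unique stable extension accepts the arguments for q, s and c, while the argument for p is rejected: it is supported by alpha, whose contrary c is derivable from the fact s.</p>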
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Learning ABA Frameworks</title>
        <p>
          ABA frameworks can be automatically learned from a given
background knowledge, in the form of an ABA framework, a
set ℰ+ of positive examples, and a set ℰ− of negative
examples. To this end, we follow the ABALearn method [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
        </p>
        <p>• Given a satisfiable background knowledge
⟨ℛ, 𝒜, ‾⟩, positive examples ℰ+ ⊆ ℒ and negative
examples ℰ− ⊆ ℒ, with ℰ+ ∩ ℰ− = ∅, a solution
of ABA learning is a new ABA framework,
F′ = ⟨ℛ′, 𝒜′, ‾′⟩, with ℛ ⊆ ℛ′, 𝒜 ⊆ 𝒜′, and, for
all a ∈ 𝒜, a‾′ = a‾, such that:
(Existence) F′ admits at least one stable extension ∆,
(Completeness) for all e ∈ ℰ+, F′ |=∆ e, and
(Consistency) for all e ∈ ℰ−, F′ ̸|=∆ e.</p>
        <p>The ABALearn method is based on the use of
transformation rules, which can be used for deriving new ABA
frameworks. These transformation rules include: (1) rote learning,
which adds a new rule p(X) ← X = a; (2) folding, which,
given rules H ← B1, B2 and K ← B2, replaces H ← B1, B2
by the new rule H ← B1, K; (3) assumption introduction,
which, given rule H ← B, introduces an assumption α,
with contrary c_α, and adds the new rule H ← B, α; and
(4) fact subsumption, which deletes any fact of the form
p(a) ← if there is an accepted argument with claim p(a) in
the ABA framework ⟨ℒ, ℛ ∖ {p(a) ←}, 𝒜, ‾⟩.</p>
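        <p>As an illustration, the first two transformation rules can be sketched as operations on (head, body) pairs. The representation and helper names below are ours, not ABALearn's actual implementation, and the sketch ignores the variable-matching conditions a real implementation must check (the atoms dut, dmg, vst come from the tort law dataset used later; the rest are illustrative).</p>
        <p>
```python
# Rules are (head, body) pairs with atoms as plain strings.

def rote_learn(program, pred, const):
    """(1) Rote learning: add a new equality fact pred(X) <- X = const."""
    return program + [(f"{pred}(X)", [f"X = {const}"])]

def fold(rule, other):
    """(2) Folding: given H <- B1, B2 and K <- B2, produce H <- B1, K."""
    head, body = rule
    k_head, k_body = other
    assert all(b in body for b in k_body), "K's body must occur in the rule"
    b1 = [b for b in body if b not in k_body]
    return (head, b1 + [k_head])

prog = [("dut(A)", ["dmg(A)", "vst(A)"])]
prog = rote_learn(prog, "c_alpha_1", "42")   # adds c_alpha_1(X) <- X = 42
folded = fold(("c(X)", ["r(X,Y)", "q(Y)"]), ("t(Y)", ["q(Y)"]))
```
        </p>
        <p>Folding here rewrites c(X) ← r(X,Y), q(Y) into c(X) ← r(X,Y), t(Y), replacing the folded body atoms by the head of the second rule.</p>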
        <p>The ABALearn algorithm iterates four steps (according
to the pattern: (1); (2; 3; 4)*):
1. Generating initial rules. This step applies rote
learning to cover positive examples and avoid
covering negative examples.
2. Generalising facts. This step selects a fact obtained
by rote learning and applies fact subsumption. If the
fact is not subsumed, it applies folding with the goal
of generating a new, more general, rule that makes
no explicit references to the constants occurring in
the ABA framework.
3. Introducing new assumptions. This step applies
assumption introduction to any rule obtained by step
(2) if it supports an argument for a negative
example.
4. Learning facts for contraries. This step applies
rote learning to generate facts for the contrary of the
new assumption introduced by step (3).</p>
        <p>Example 3. Let us consider the background knowledge
consisting in the ABA framework F of Example 1, and the
following sets of positive and negative examples:
ℰ+ = {p(a)}
ℰ− = {p(h)}
The positive example p(a) is already covered in
the unique stable extension of F. However, also the negative
example p(h) is covered. Thus, by rote learning,
ABALearn introduces the new rule:
R1. c(X) ← X = h
which avoids covering p(h). By folding twice,
rule R1 is replaced by:
R2. c(X) ← r(X, Y), q(Y)
Now, the positive example p(a) is no longer
covered, because c(a) is covered. By assumption
introduction, we introduce a new assumption alpha_1(Y) with contrary
c_alpha_1(Y) and, by rote learning, a new fact:
R3. c(X) ← r(X, Y), q(Y), alpha_1(Y)
R4. c_alpha_1(Y) ← Y = f
Finally, by folding, rule R4 is replaced by:
R5. c_alpha_1(Y) ← s(Y)
The ABA framework with ℛ′ = ℛ ∪ {R3, R5} is a solution
of ABA learning, as it covers all positive examples and no
negative example.</p>
        <p>
          ABALearn is implemented in ASP-ABAlearn [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
(available at https://github.com/ABALearn/aba_asp). The
implementation takes advantage of the existence of a mapping
between ABA frameworks under stable extensions and
Answer Set Programming (ASP) [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and in particular, it uses
SWI-Prolog [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and the Clingo [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] ASP solver. The
mapping of ABA frameworks to ASP is exploited, among other
things, for determining the facts to be learned at Steps 1 and
4 of the ABALearn algorithm.
        </p>
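        <p>To give a flavour of this mapping, the sketch below generates an ASP program from a flat ABA framework, following the standard reading of an assumption as holding exactly when its contrary is not derived. This is our own minimal generator with illustrative atoms, not the actual ASP-ABAlearn encoding.</p>
        <p>
```python
def aba_to_asp(rules, assumptions, contrary):
    """Encode a flat ABA framework as ASP: keep each rule as-is and,
    for each assumption a, emit 'a :- not c_a.' so that answer sets
    correspond to stable extensions."""
    lines = [f"{h} :- {', '.join(b)}." if b else f"{h}." for h, b in rules]
    lines += [f"{a} :- not {contrary[a]}." for a in sorted(assumptions)]
    return "\n".join(lines)

asp = aba_to_asp(
    rules=[("dut(42)", ["alpha_1(42)", "dmg(42)"]), ("dmg(42)", [])],
    assumptions={"alpha_1(42)"},
    contrary={"alpha_1(42)": "c_alpha_1(42)"},
)
```
        </p>
        <p>The resulting text can be handed to an ASP solver such as Clingo, which is how facts to be learned can be determined at Steps 1 and 4.</p>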
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Datasets</title>
        <p>
          We will use two datasets [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The first is based on tort
law [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and consists of cases represented by ten boolean
features:
1. dmg: Someone has suffered damages by someone
else’s act.
2. cau: The act caused the suffered damages.
3. vrt: The act is a violation of someone’s right.
4. vst: The act is a violation of a statutory duty.
5. vun: The act is a violation of unwritten law against
proper social conduct.
6. jus: There exist grounds of justification.
7. ift: The act is imputable to someone because of the
person’s fault.
8. ila: The act is imputable to someone because of law.
9. ico: The act is imputable to someone because of
common opinion.
10. prp: The violated statutory duty does not have the
purpose to prevent the damages.
        </p>
        <p>In addition, each case has a label dut of the verdict on
whether there is a duty to repair damages, that is, the final
case outcome.</p>
        <p>Hence, the dataset consists of tuples of the following form,
where column dut represents the outcome of the case.</p>
        <p>
          The second dataset is based on welfare benefit
applications [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], with cases described in terms of features about
applicants’ eligibility:
• Age: Integer. The applicant’s age.
• Gender: Categorical. The applicant’s gender.
• 1,...,5: Boolean. Indicates which years the
applicant has paid contributions.
• Spouse: Boolean. Indicates whether the applicant
is the spouse of the patient.
• Absent: Boolean. Indicates whether the applicant
is absent from the UK.
• Resources: Integer. The applicant’s amount of
capital resources.
• Patient Type: Categorical. Indicates whether the
patient is an in-patient or an out-patient.
• Distance (to the hospital): Integer. The distance
to the hospital in km.
        </p>
        <sec id="sec-2-3-1">
          <title>Synthetic Data Generation</title>
          <p>We have implemented a synthetic data generator to obtain plain text descriptions of
cases based on their features and outcomes in the datasets.
This is based on the Mistral-Small-24B-Instruct-2501 model2.
For instance, the generated description of the first case in
the tort law dataset above is as follows:</p>
          <p>“During the proceedings, the Plaintiff, Ms. Emily Harris,
alleged that on January 5, 2022, she suffered severe burns
while using a defective electric kettle manufactured by the
Defendant, KettleTech Inc. The Plaintiff testified that the kettle,
purchased new from a local retailer, malfunctioned when she
turned it on, causing hot water to spray onto her. . . ”</p>
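          <p>A sketch of how such a generator can turn a feature vector into a generation prompt is given below. The template and feature glosses are hypothetical: the paper does not give the actual prompts used with the Mistral model, and real generation would additionally call the model with the returned prompt.</p>
          <p>
```python
# Hypothetical glosses for a few tort-law features (illustrative only).
FEATURE_TEXT = {
    "dmg": "someone suffered damages by someone else's act",
    "cau": "the act caused the suffered damages",
    "vst": "the act violated a statutory duty",
}

def build_prompt(features, outcome):
    """Compose a natural-language generation prompt from feature values
    and the case outcome (dut)."""
    present = [FEATURE_TEXT[f] for f, v in features.items()
               if v and f in FEATURE_TEXT]
    facts = "; ".join(present) or "none of the listed facts hold"
    verdict = ("a duty to repair damages" if outcome
               else "no duty to repair damages")
    return ("Write a realistic plain-text legal case description in which "
            f"{facts}, and the court finds {verdict}. "
            "Do not mention the feature names explicitly.")

prompt = build_prompt({"dmg": 1, "cau": 1, "vst": 0}, outcome=1)
```
          </p>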
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Neuro-argumentative pipeline</title>
      <sec id="sec-3-1">
        <title>3.1. Pipeline overview</title>
        <p>As shown in Figure 1, the neuro-argumentative pipeline
consists of two main components:
1. a neural feature extractor, responsible for
extracting a background knowledge, along with positive
and negative examples, from labeled legal cases,
described as unstructured text, and
2. a symbolic learning module, which learns an ABA
framework from the background knowledge and the
examples using the ABALearn method, and employs
the learnt ABA framework to predict the outcome
of new cases.</p>
        <p>Training Process The dataset, which contains plain text
case descriptions labelled with the ground truth feature
values and the final case outcome, is split into two parts:
• One subset (3000 cases) is used to train the neural
feature extractor, mapping text to logical features.
• The other subset (1000 cases) is used by ABALearn
to learn new rules, using the features extracted by
the trained extractor alongside the known case
outcomes.
2https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501</p>
        <p>[Figure 1: the pipeline. A dataset of case descriptions is fed to the feature extractor, which produces extracted case facts (dmg, cau, . . . , dut); ABALearn then learns an ABA framework from these facts.]</p>
        <p>Training the neuro-argumentative pipeline is a two-stage
process, training each component separately. In the first
stage, we train the neural feature extractor to predict the
logical features present, using the plain-text case descriptions
as input.</p>
        <p>Then we can use the trained feature extractor to extract
the logical facts from the case descriptions of the second
training subset. These predicted facts, along with their
associated case outcomes, are used as input to ABALearn
to learn new rules and construct an ABA framework.</p>
        <p>The output of the training process is the trained feature
extractor model and the learned ABA framework, which
can be used to make predictions on the outcome of unseen
cases using argumentation-based inference.</p>
        <p>Inference At inference time, the neuro-argumentative
pipeline operates in a fully end-to-end manner. The
process is composed of two sequential steps: neural feature
extraction and symbolic inference.</p>
        <p>1. Feature Extraction: Given a plain-text case
description, the trained neural feature extractor is used to
predict the logical facts relevant to the case.
2. Symbolic Inference: The extracted logical facts
are added to the learned ABA framework, which
uses argumentation-based reasoning to determine
the legal outcome for the new case.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature Extraction</title>
        <p>This section presents the neural architectures explored for
extracting symbolic features from plain-text legal case
descriptions.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. BERT-based feature extractor</title>
          <p>In our baseline architecture, we utilise the pre-trained
Legal-BERT [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] encoder to extract features from plain-text legal
case descriptions. This implementation was developed to
handle only datasets with exclusively boolean features.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>Architecture Overview</title>
          <p>The feature extractor consists of:
1. A shared Legal-BERT encoder, which processes the
input case description and produces contextual
embeddings.
2. A separate independent multilayer perceptron (MLP)
head for each target feature, which predicts binary
labels from the [CLS] token embedding produced
by Legal-BERT, indicating the presence of the target
feature.</p>
          <p>Training Procedure During training, the Legal-BERT
encoder parameters are frozen to preserve the encoder’s
pre-trained representations and reduce computational
overhead. Only the individual MLP heads for each feature are
trained, using a sigmoid output activation and binary
cross-entropy loss.</p>
          <p>Each feature head is trained independently. This means
that in every training step, each feature’s MLP head is
updated using the loss computed for that feature only.</p>
          <p>Benefits and Limitations This architecture provides a
simple and computationally efficient way to extract multiple
symbolic features using a shared transformer-based encoder
and avoids interactions between heads during
backpropagation. It benefits from the generalisation capabilities of a
large pre-trained model while keeping the parameter count
and training time low.</p>
          <p>However, since the encoder is not updated during
training, it cannot fully adapt to the specific legal domain, which
can limit the feature extractor’s accuracy, especially for
more abstract or context-dependent features.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.2. Fine-tuning the BERT encoder</title>
          <p>Architecture Overview - Updates Building upon the
feature extractor described above, this approach retains the
same architecture; however, the encoder parameters are no
longer frozen. During training, the gradients from the loss
function are propagated all the way through the encoder,
enabling it to adapt to the specific legal domain, whether
that is tort law or welfare benefit applications.</p>
          <p>Training Procedure All MLP heads and the Legal-BERT
encoder are trained together. Given an input case x with k
binary features, the model outputs ŷ1, ŷ2, . . . , ŷk, and the
total loss is calculated as the sum of binary cross-entropy
losses across all heads:</p>
          <p>ℒ = ∑ᵢ₌₁ᵏ BCE(yᵢ, ŷᵢ)</p>
          <p>where yᵢ is the ground truth for feature i. Gradients from
this loss flow through the MLP heads and into the encoder,
allowing the model to learn domain-specific contextual
embeddings.</p>
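          <p>In pure Python, this objective can be sketched as follows. This is a toy, scalar version of the loss only (the actual implementation operates on batched tensors inside the training loop); the values fed to it below are made up.</p>
          <p>
```python
import math

def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one feature head."""
    y_hat = min(max(y_hat, eps), 1 - eps)  # clamp for numerical safety
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def total_loss(y, y_hat):
    """Sum of per-head BCE losses over all k feature heads."""
    return sum(bce(t, p) for t, p in zip(y, y_hat))

# Illustrative targets and predicted probabilities for three heads:
loss = total_loss([1, 0, 1], [0.9, 0.2, 0.5])
```
          </p>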
          <p>Benefits and Limitations Fine-tuning the Legal-BERT
model allows the encoder to adapt to the legal domain and
learn better contextual representations, leading to improved
classification performance.</p>
          <p>However, this approach significantly increases the
number of trainable parameters, leading to higher computational
and memory requirements during training. It also
introduces a greater risk of overfitting, especially when working
with small or imbalanced datasets, as the model may learn
to memorise patterns that do not generalise.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.3. Multi-Task Feature Extractor</title>
          <p>The previous architectures rely on a single shared encoder,
which can limit task-specific learning and make it difficult
to handle heterogeneous feature types. To address these
challenges, we propose a multi-task feature extractor
architecture where each feature is handled by a dedicated
model.</p>
          <p>Architecture Overview The proposed architecture
consists of multiple independent BERT encoders, one for each
target feature. Each encoder processes the same input case
description but independently learns to represent it in a way
that is most useful for predicting its assigned feature. The
encoder output is then passed through the corresponding
feature-specific MLP head that produces the prediction.
Training Procedure Our implementation supports three
types of target features:
1. Binary: The MLP head outputs a single neuron with
a sigmoid activation function and is optimised using
the binary cross-entropy loss.
2. Categorical: The MLP head outputs a probability
vector over possible classes using softmax activation
and categorical cross-entropy loss.
3. Numerical: The MLP head outputs a scalar value
without activation and is trained using mean squared
error (MSE) loss.</p>
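          <p>The three head types can be summarised schematically as follows, with plain-Python stand-ins for the activations and losses (the real heads are MLPs over BERT embeddings; function names here are ours).</p>
          <p>
```python
import math

def sigmoid(z):                    # binary head activation
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):                   # categorical head activation
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def binary_loss(y, logit):         # binary cross-entropy on one logit
    p = sigmoid(logit)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def categorical_loss(class_idx, logits):   # categorical cross-entropy
    return -math.log(softmax(logits)[class_idx])

def mse_loss(y, y_hat):            # numerical head: no activation
    return (y - y_hat) ** 2

probs = softmax([2.0, 1.0, 0.1])   # a probability vector over 3 classes
```
          </p>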
          <p>Each encoder-head pair is trained independently using
only the label corresponding to its target feature. The
input case description is shared across all feature models, but
there is no parameter sharing between the encoders. At
each training step, we calculate the appropriate loss
specific to that feature, and use backpropagation only on the
corresponding encoder and head.</p>
        </sec>
        <sec id="sec-3-2-5">
          <title>Benefits and Limitations</title>
          <p>The main advantage of this architecture is improved performance through task
specialisation. This independent training ensures that the learning
for one feature does not conflict with another, preserving
task-specific representations and gradients, and allowing
the encoder to capture subtle patterns relevant to its target.</p>
          <p>However, this comes at a cost, as the number of
parameters increases linearly with the number of features, resulting
in higher memory requirements.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Symbolic learning</title>
        <p>In this section, we show how ABALearn is used as the
symbolic component of the pipeline shown in Figure 1. In
particular, we show how the features of the cases3 are encoded
to be given as input to ABALearn. Then, we present an
excerpt of the learned ABA framework. Finally, we describe
the inference process for predicting case outcomes and
discuss how the learned ABA framework, in conjunction with
the case-specific facts, can be used to provide justifications
for each prediction.
3In the pipeline, these are the features extracted from textual
descriptions of the cases. We focus here, for illustration, on the features in the
original tort law dataset.</p>
        <p>The encoding process is shown in Figure 2. Given any
case i and feature f in {dmg, cau, . . . , prp}, if f(i) is 1,
then ‘f(i)’ is added as a fact to the background
knowledge of ABALearn. In addition to background knowledge,
ABALearn takes as input a set of positive and negative
examples, denoted by the pair ⟨ℰ+, ℰ−⟩. If dut(i) is 1, then
‘dut(i)’ is added to ℰ+; otherwise, it is added to ℰ−.</p>
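        <p>This encoding step can be sketched as follows. The helper and the two feature vectors are illustrative, not the pipeline's actual code; the feature names are those of the tort law dataset.</p>
        <p>
```python
FEATURES = ["dmg", "cau", "vrt", "vst", "vun",
            "jus", "ift", "ila", "ico", "prp"]

def encode_case(case_id, values):
    """Translate one case's boolean feature vector into facts f(i) for
    every feature with value 1, plus the outcome atom dut(i)."""
    facts = [f"{f}({case_id})." for f, v in zip(FEATURES, values) if v == 1]
    return facts, f"dut({case_id})"

background, e_pos, e_neg = [], [], []
cases = [([1, 0, 0, 1, 0, 0, 0, 0, 1, 1], 1),   # illustrative case 0, dut = 1
         ([1, 1, 0, 0, 0, 1, 0, 0, 0, 0], 0)]   # illustrative case 1, dut = 0
for cid, (vals, dut) in enumerate(cases):
    facts, example = encode_case(cid, vals)
    background.extend(facts)
    (e_pos if dut == 1 else e_neg).append(example)
```
        </p>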
        <p>The new case is translated into facts using the same
process described to encode the background knowledge given
as input to ABALearn.</p>
        <p>The new facts are then added to the learned ABA
framework, thereby getting the encoding shown below.
% Learnt rules
dut(A) :- alpha_1(A), dmg(A), vst(A), prp(A).
c_alpha_1(A) :- dmg(A), cau(A), vst(A), prp(A).
% Facts of the new case with id 42
dmg(42). cau(42). vst(42). ico(42). prp(42).</p>
        <p>We can predict whether there is a duty to repair
damages by checking if the claim dut(42) belongs to the
stable extension of the ABA framework augmented with
the new facts. The new facts enable the deduction of
c_alpha_1(42), which attacks the argument supporting
the deduction of the claim dut(42). Hence, the ABA
framework predicts that there is no duty to repair damages.</p>
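        <p>For this particular framework the check can be reproduced in a few lines of Python by reading each assumption as holding unless its contrary is derivable. This shortcut happens to match the stable extension here, because the program is stratified; it is not a general ABA solver. The rules and facts are exactly those shown above.</p>
        <p>
```python
facts = {"dmg(42)", "cau(42)", "vst(42)", "ico(42)", "prp(42)"}

def holds(atom):
    if atom in facts:
        return True
    if atom == "c_alpha_1(42)":  # c_alpha_1(A) :- dmg(A), cau(A), vst(A), prp(A).
        return all(holds(a) for a in ["dmg(42)", "cau(42)", "vst(42)", "prp(42)"])
    if atom == "alpha_1(42)":    # assumption: holds unless its contrary is derived
        return not holds("c_alpha_1(42)")
    if atom == "dut(42)":        # dut(A) :- alpha_1(A), dmg(A), vst(A), prp(A).
        return all(holds(a) for a in ["alpha_1(42)", "dmg(42)", "vst(42)", "prp(42)"])
    return False

prediction = holds("dut(42)")    # False: no duty to repair damages
```
        </p>
        <p>As in the argumentation-based reading, c_alpha_1(42) is derivable from the case facts, so the assumption alpha_1(42) is defeated and dut(42) is not accepted.</p>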
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1. Feature Extraction</title>
        <p>We begin our evaluation by comparing the performance of
the feature extraction architectures discussed above. We
focus on the tort law dataset, the only one lending itself
to all feature extractor architectures, as it contains only
binary features.</p>
        <p>Figure 3 compares the average F1 scores of all
architectures on the tort law dataset. The baseline Legal-BERT
model achieves an F1 score of 0.66. Fine-tuning the encoder
yields a substantial performance gain of 22.8%, while the
multi-task architecture adds a modest further improvement
of 2.3%.</p>
        <p>While the small gain from using multiple encoders may
suggest that a single Legal-BERT encoder already produces
sufficiently rich embeddings for most features, a closer look
at per-feature performance reveals a different picture. The
multi-task model exhibits more consistent F1 scores across
features, whereas the single-encoder model performs
well on some features while underperforming on
others.</p>
        <sec id="sec-4-1-1">
          <title>Downstream Task Performance</title>
          <p>We also assess the feature extractor architectures based on their utility in
downstream symbolic learning and classification tasks. Figure 4
presents the pipeline classification F1 scores for predicting
the dut (duty) label for each feature extractor architecture.
The baseline architecture achieves an F1 score of 0.59.
Fine-tuning the encoder improves this by 33%, while the
multi-task architecture yields an additional 3.5% improvement.</p>
          <p>The larger improvements in downstream classification,
relative to standalone feature extraction, show the
importance of feature quality in symbolic learning. The symbolic
module depends on consistent features to learn meaningful
rules. Poor-quality features can lead to learning meaningless
or misleading rules, exemplifying the “garbage in, garbage
out” principle. Even small gains in feature extraction can
yield significant benefits in downstream reasoning.</p>
          <p>In the pipeline evaluation we adopt the multi-task
architecture as it demonstrates more consistent performance
across features.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Symbolic Learning (ABALearn)</title>
        <p>We evaluate ABALearn by training it on fragments of the
datasets of varying sizes to examine how training data
volume affects symbolic learning. We use ground truth features
to isolate the symbolic module’s performance from that of
the feature extractor. As a baseline, we compare ABALearn
to a decision tree classifier, using F1 score on case outcome
predictions.</p>
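<p>As an illustration of the decision tree baseline, the sketch below trains a scikit-learn tree on synthetic binary case features; the feature count and the hypothetical target rule are invented for illustration and are not taken from our datasets.</p>

```python
# Minimal sketch of the decision-tree baseline on synthetic binary
# features. The target rule below is hypothetical.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 6))                  # six binary case features
y = np.maximum(X[:, 0] * X[:, 1], X[:, 2] * X[:, 3])    # invented ground-truth rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
score = f1_score(y_te, tree.predict(X_te))              # near-perfect on noiseless data
print(round(score, 3))
```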
        <p>Figure 5 shows F1 scores across different training set
sizes for both datasets. On the tort law dataset, ABALearn
closely tracks the performance of the decision tree and
begins to outperform it with training sets of 1000 examples or
more. In contrast, ABALearn initially underperforms on the
welfare benefit dataset when trained on smaller sets, but
matches the decision tree’s performance once the training
size exceeds 500 examples. This slower convergence is likely
due to the added complexity of handling numerical features,
which significantly expands the logical space that the ABA
framework must represent.</p>
        <p>The strong F1 scores across both datasets suggest that
ABALearn can learn meaningful rules for case outcome
classification with relatively limited training examples. This
efficiency allows us to allocate more examples to feature
extraction. For our pipeline experiments, we fix the symbolic
training set size at 1000 examples, which is the training set
size where ABALearn surpasses the decision tree on the tort
law dataset and levels off on the welfare benefit dataset.
[Figure 5: (a) ABA vs. decision tree for the tort law dataset; (b) ABA vs. decision tree for the welfare benefit dataset.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Neuro-argumentative Pipeline</title>
        <p>Building on the strong individual performance of the feature
extractor and symbolic learning modules, we now evaluate
the end-to-end neuro-argumentative pipeline, comparing
its performance against several BERT-based classifier
approaches for our synthetic datasets. Figures 6 and 7 show
the case outcome classification F1 scores across the
different classifier architectures. For both datasets, the
neuro-argumentative pipeline outperforms both the baseline
BERT-base and Legal-BERT classifiers, but fails to surpass the
performance of their fine-tuned counterparts.</p>
        <p>While the 10% relative F1 improvement over baseline
classifiers on the tort law dataset is notable, the
consistently strong performance across all models suggests this
domain may be too simple to pose a meaningful challenge.
In contrast, the welfare benefit dataset is more challenging,
causing baseline classifiers to struggle. Here, our
neuro-argumentative pipeline achieves a 39.5% relative F1
improvement over the domain-specific Legal-BERT classifier and
performs within 5% of the fine-tuned model.</p>
        <p>It is worth noting that the pipeline’s F1 score is
significantly lower than that of the standalone symbolic module
when it is given clean, ground truth features, which
indicates that the main performance bottleneck lies in feature
extraction. This means that improving the accuracy of the
extracted features could lead to substantial gains in symbolic
reasoning and overall pipeline performance.</p>
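<p>The bottleneck can be illustrated with a small sketch: a decision tree trained on clean binary features (a stand-in for the symbolic module, with an invented target rule) is evaluated on inputs whose features are flipped with increasing probability, mimicking feature-extraction errors.</p>

```python
# Sketch of the feature-quality bottleneck: a classifier fitted on
# clean features is scored on noisy copies of those features.
# The rule and noise levels are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(4000, 6))
y = np.maximum(X[:, 0] * X[:, 1], X[:, 2] * X[:, 3])
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

scores = []
for p in (0.0, 0.1, 0.3):
    flips = rng.binomial(1, p, size=X.shape)   # flip each feature bit w.p. p
    noisy = X ^ flips
    scores.append(round(f1_score(y, tree.predict(noisy)), 3))
print(scores)   # F1 degrades as feature noise grows
```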
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Explainability</title>
        <p>One of the main objectives of this paper is to introduce
an explainable system for legal decision-making. While
Assumption-Based Argumentation is, in principle, an
inherently transparent and explainable argumentation
framework, the symbolic rules learned by ABALearn in practice
may fall short of this ideal.</p>
        <p>By using ABA, users can trace which rules and
assumptions lead to a particular conclusion, providing a transparent
decision-making process. We show below some of the rules
learned by the neuro-argumentative pipeline.
% Learned rules
dut(A) :- alpha_1(A), dmg(A), ico(A), vun(A).
dut(A) :- alpha_2(A), cau(A), dmg(A), ila(A).
...
c_alpha_29(A) :- dmg(A), ico(A), ift(A),
                 ila(A), jus(A), prp(A),
                 vst(A), vun(A).</p>
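<p>Under a simplified grounded reading, such rules can be applied to a set of extracted facts as follows: a rule for dut fires when its body facts hold and its assumption alpha_i survives, and an assumption survives unless the body of its contrary c_alpha_i is satisfied. The sketch below implements this reading; the contrary bodies for alpha_1 and alpha_2 are invented for illustration, since only c_alpha_29 appears in the excerpt above.</p>

```python
# Simplified, hand-grounded reading of two learned dut rules.
DUT_RULES = [
    ("alpha_1", {"dmg", "ico", "vun"}),   # dut(A) :- alpha_1(A), dmg(A), ico(A), vun(A).
    ("alpha_2", {"cau", "dmg", "ila"}),   # dut(A) :- alpha_2(A), cau(A), dmg(A), ila(A).
]

# Bodies of the contrary rules c_alpha_i. These two are invented for
# illustration; only c_alpha_29 is shown in the excerpt above.
CONTRARY_BODIES = {
    "alpha_1": {"jus"},
    "alpha_2": {"ift", "prp"},
}

def assumption_holds(alpha, facts):
    # An assumption survives unless its contrary is derivable from the facts.
    body = CONTRARY_BODIES.get(alpha)
    return body is None or not body.issubset(facts)

def predict_dut(facts):
    # dut holds if some rule body is satisfied and its assumption survives.
    return any(body.issubset(facts) and assumption_holds(alpha, facts)
               for alpha, body in DUT_RULES)

print(predict_dut({"dmg", "ico", "vun"}))          # rule 1 fires, alpha_1 undefeated
print(predict_dut({"dmg", "ico", "vun", "jus"}))   # contrary of alpha_1 now derivable
```

<p>This also shows why the prediction is traceable: the firing rule and the surviving assumption together form the justification for the outcome.</p>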
        <p>We can see that the learned ABA frameworks can
sometimes include a large number of assumptions and contraries,
and rules with long and complex bodies, which complicates
the overall structure of the framework. The reasoning
behind a prediction can become difficult to follow, and the
large number of rules makes it challenging to isolate those
that are relevant to a specific case outcome. These issues
appear even when learning the framework using clean, ground
truth feature values and are even worse when using features
extracted from the case descriptions, as noise and variability
in the input data lead ABALearn to introduce workarounds
during learning, ultimately resulting in rules that are less
coherent and more difficult to interpret.</p>
        <p>Although the foundation for explainability is present in
the form of the learned rules, further work is needed to
improve their interpretability and make them more
accessible to human users.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This work presents a neuro-argumentative pipeline for legal
decision-making that combines the flexibility of neural
feature extraction with the transparency of assumption-based
argumentation. The pipeline demonstrates good
classification performance, with particularly strong results in the
symbolic learning component, which achieves high
theoretical accuracy when provided with clean feature inputs.</p>
      <p><italic>Future Work.</italic> As demonstrated in our experiments, the
quality of feature extraction significantly affects both
symbolic learning and overall pipeline performance. While our
proposed architecture performs well on many binary
features, it struggles with more nuanced features, leading to
suboptimal performance across certain feature types.
Exploring new feature extraction architectures to improve
upon the performance of the proposed ones could lead to
large overall improvements in the pipeline’s classification
performance.</p>
      <p>Additionally, even though the rules learned by ABALearn
can be inspected to trace the reasoning behind a prediction,
their interpretability remains limited. The learned
frameworks are often large and complex, even when derived from
a relatively small number of features. Future work could
focus on developing automated simplification tools to reduce
redundancy and improve clarity, making the learned rules
more accessible to users. For example, merging logically
equivalent rules or pruning unnecessary literals could yield
more concise representations.</p>
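<p>As a toy example of such simplification, the sketch below drops any rule whose body strictly contains another rule's body for the same head (subsumption pruning): the larger rule can never fire without the smaller one firing too. The rule bodies are invented.</p>

```python
def prune_subsumed(bodies):
    # Keep only rules not strictly subsumed by a more general rule
    # (same head assumed): a body that is a proper superset of another
    # body is redundant and is dropped.
    return [b for b in bodies
            if not any(o.issubset(b) and o != b for o in bodies)]

# Invented rule bodies: the second is subsumed by the first.
bodies = [{"dmg", "ico"}, {"dmg", "ico", "vun"}, {"cau", "ila"}]
print(prune_subsumed(bodies))   # the superset body is removed
```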
      <p>Finally, while the current pipeline has demonstrated
strong performance on synthetic datasets, further work is
required to apply it to real-world datasets that are often
characterised by incomplete, noisy, or inconsistently structured
inputs.</p>
      <p>This paper demonstrates the promise of combining neural
learning with symbolic argumentation for legal
classification tasks. Notably, the results suggest that explainability
and performance do not need to be mutually exclusive. This
work lays a foundation for further exploration of
neuro-symbolic systems in the legal domain, where transparency
in the decision-making process is paramount.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We acknowledge support from the Royal Society, UK
(IEC\R2\222045). Toni was partially funded by the
ERC (grant agreement No. 101020934) and by J.P. Morgan
and the RAEng, UK, under the Research Chairs Fellowships
scheme (RCSRF2021\11\45). De Angelis and Proietti were
supported by the MUR PRIN 2022 Project DOMAIN
funded by the EU – NextGenerationEU (2022TSYYKJ, CUP
B53D23013220006, PNRR, M4.C2.1.1), by the PNRR MUR
project PE0000013-FAIR (CUP B53C22003630006), and by
the INdAM - GNCS Project Argomentazione Computazionale
per apprendimento automatico e modellazione di sistemi
intelligenti (CUP E53C24001950001). De Angelis and
Proietti are members of the INdAM-GNCS research group.
Finally, we would like to thank the anonymous reviewers
for their constructive remarks.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] J. Lai, W. Gan, J. Wu, Z. Qi, P. S. Yu, <article-title>Large language models in law: A survey</article-title>, <year>2023</year>. arXiv:2312.03718.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>, <year>2019</year>. arXiv:1810.04805.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, <article-title>Language models are unsupervised multitask learners</article-title>, <year>2019</year>. URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] S. Vatsal, A. Meyers, J. E. Ortega, <article-title>Classification of US Supreme Court cases using BERT-based techniques</article-title>, <year>2023</year>. arXiv:2304.08649.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. Dahl, V. Magesh, M. Suzgun, D. E. Ho, <article-title>Large legal fictions: Profiling legal hallucinations in large language models</article-title>, <source>Journal of Legal Analysis</source> <volume>16</volume> (<year>2024</year>) 64-93. doi:10.1093/jla/laae003.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Blair-Stanek, N. Holzenberger, B. Van Durme, <article-title>Can GPT-3 perform statutory reasoning?</article-title>, <source>in: Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law</source>, ICAIL '23, Association for Computing Machinery, New York, NY, USA, <year>2023</year>, pp. 22-31. doi:10.1145/3594536.3595163.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Bondarenko, P. Dung, R. Kowalski, F. Toni, <article-title>An abstract, argumentation-theoretic approach to default reasoning</article-title>, <source>Artificial Intelligence</source> <volume>93</volume> (<year>1997</year>) 63-101. doi:10.1016/S0004-3702(97)00015-5.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] K. Cyras, X. Fan, C. Schulz, F. Toni, <article-title>Assumption-based argumentation: Disputes, explanations, preferences</article-title>, <source>FLAP</source> <volume>4</volume> (<year>2017</year>). URL: https://www.collegepublications.co.uk/downloads/ifcolog00017.pdf.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Proietti, F. Toni, <article-title>Learning assumption-based argumentation frameworks</article-title>, <year>2023</year>. arXiv:2305.15921.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] C. Steging, S. Renooij, B. Verheij, T. Bench-Capon, <article-title>Arguments, rules and cases in law: Resources for aligning learning and reasoning in structured domains</article-title>, <source>Argument &amp; Computation</source> <volume>14</volume> (<year>2023</year>) 235-243.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] B. Verheij, <article-title>Formalizing arguments, rules and cases</article-title>, <source>in: Proceedings of the 16th Edition of the International Conference on Artificial Intelligence and Law</source>, ICAIL '17, Association for Computing Machinery, New York, NY, USA, <year>2017</year>, pp. 199-208. doi:10.1145/3086512.3086533.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] T. Bench-Capon, <article-title>Neural networks and open texture</article-title>, <source>in: Proceedings of the 4th International Conference on Artificial Intelligence and Law</source>, ICAIL '93, Association for Computing Machinery, New York, NY, USA, <year>1993</year>, pp. 292-297. doi:10.1145/158976.159012.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] E. De Angelis, M. Proietti, F. Toni, <article-title>Learning brave assumption-based argumentation frameworks via ASP</article-title>, <source>in: Proceedings of ECAI 2024</source>, volume <volume>392</volume> of FAIA, IOS Press, <year>2024</year>, pp. 3445-3452. doi:10.3233/FAIA240896.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Gelfond, V. Lifschitz, <article-title>The stable model semantics for logic programming</article-title>, <source>in: Proceedings IJCSLP</source>, MIT Press, <year>1988</year>, pp. 1070-1080. URL: http://www.cs.utexas.edu/users/ai-lab?gel88.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] J. Wielemaker, T. Schrijvers, M. Triska, T. Lager, <article-title>SWI-Prolog</article-title>, <source>Theory and Practice of Logic Programming</source> <volume>12</volume> (<year>2012</year>) 67-96. doi:10.1017/S1471068411000494.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] M. Gebser, R. Kaminski, B. Kaufmann, T. Schaub, <article-title>Clingo = ASP + Control: Preliminary report</article-title>, <year>2014</year>. arXiv:1405.3694.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, <article-title>LEGAL-BERT: The Muppets straight out of law school</article-title>, <year>2020</year>. arXiv:2010.02559.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>