<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Experimental Analysis of the Legal Coding Process</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matteo Cristani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Governatori</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Olivieri</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Monica Palmirani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriele Buriola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Verona</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Legal Studies, Alma Mater Studiorum University of Bologna</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Independent researcher</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>School of Engineering and Technology, Central Queensland University</institution>
          ,
          <addr-line>Rockhampton</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents a new methodology for legal coding using Deontic Defeasible Logic. Starting from normative text fragments, the approach maps them into rules and tests their accuracy through example scenarios. We illustrate the method with a sample text and report on experiments in which participants encoded various legal fragments, measuring the effort based on specific features. The Houdini tool was used for the process, and we introduce a model to predict the coding time based on legal knowledge, process familiarity, text length, and reference depth.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal coding</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Deontic Defeasible Logic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Legal coding [1, 2, 3] is the process of translating laws into a formal language, a task complicated
by factors such as the complexity of legal language, specialized reasoning, and conflicting logical
constraints. Many scholars [4] suggest that the RegTech approach could be an effective strategy.
This approach proposes that we can trust decisions made by rule-based intelligent systems
provided that we understand the reasoning system and ensure the rules correctly represent the
domain being coded. It relies on the following conceptual steps:
– Starting from the legal text, encode it in a standard coding system accepted by a
large community of practice in the domain;</p>
      <p>The purpose of this contribution is two-fold. On the one hand, the paper outlines a novel
methodology for legal coding. On the other hand, our aim is to accurately estimate the time
required to carry out a coding task. To achieve these goals, the following tools are needed:
1. A logical apparatus consistent with human legal reasoning. In this paper, we adopt a
commonly employed formalism known as Defeasible Deontic Logic as defined in a number
of studies [5, 6]. A framework-independent language, LegalRuleML [7, 8, 9], that maps
Deontic Defeasible Logic and other formalisms, is employed to provide a processable
source for the devised rules.
2. A technology efficiently implementing the logical apparatus. Here, we resort to Houdini,
a deontic defeasible reasoner implemented in Java [10, 11, 12].
3. A coding methodology. In this paper, a simplified methodology employed for the sole
purposes of the experiment is proposed.
4. An experimental apparatus that guarantees the correct measurement of the coded
fragments. We adopt the paradigm of empirical software engineering, which has been
conceived for general measurements [13].</p>
      <p>To ease the presentation, we describe the translation process targeting only Deontic Defeasible
Logic, and not LegalRuleML. The steps towards such a translation are provided in [14].</p>
      <p>The rest of the paper is organised as follows: we briefly define the framework of Deontic
Defeasible Logic in Section 2; Section 3 presents an example of translation, showing how the
translation process can be performed; Section 4 introduces some guidelines recommended
for use during the translation process; in Section 5 we describe the experiment, its design, and
purpose; Section 6 discusses the results of the experiment; and Section 7 draws
some brief conclusions and sketches further investigations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Introduction to Deontic Defeasible Logic</title>
      <p>The logical apparatus applied for our investigation is a deontic extension of Defeasible Logic
(DL) [15]. We start by defining the language.</p>
      <p>Let PROP be the set of propositional atoms, then the set of (plain) literals is defined as
PLit := PROP ∪ {¬p | p ∈ PROP}. The complement of a literal l is denoted by ∼l: if l is
a positive literal p then ∼l is ¬p, if l is a negative literal ¬p then ∼l is p. Literals are denoted
by lower-case Roman letters. Let Lab be a set of labels to represent the names of rules, which
will be denoted as lower-case Greek letters.</p>
      <p>A defeasible theory D is a tuple (F, R, &gt;), where F is the set of facts (indisputable statements),
R is the rule set, and &gt; is a binary relation over R. R is partitioned into three distinct sets of
rules, with different meanings, used to draw different “types of conclusions”. Strict rules are rules in
the classical fashion: whenever the premises are the case, so is the conclusion. Defeasible rules
represent (along with defeaters) the non-monotonic part of the logic: if the premises are the
case then typically the conclusion holds, unless we have contrary evidence that prevents us from
drawing such a conclusion. Lastly, defeaters are special rules whose purpose is to prevent contrary
evidence from being the case. It follows that in DL, through defeasible rules and defeaters, we can
represent in a natural way exceptions (and exceptions to exceptions, and so forth).</p>
      <p>Finally, the superiority relation &gt;, a binary relation between rules, is the
mechanism to solve conflicts. Given two rules α and β, if we have (α, β) ∈ &gt; (or simply α &gt; β), then
in the scenario where both rules may fire (can be activated), α’s conclusion will be preferred to
β’s.</p>
      <p>A rule α ∈ R has the form α: A(α) ↪ C(α), where: (i) α ∈ Lab is the unique name of the
rule, (ii) A(α) ⊆ PLit is α’s (set of) antecedents, (iii) C(α) = l ∈ PLit is its conclusion, and
(iv) ↪ ∈ {→, ⇒, ⇝} defines the type of rule, where: → is for strict rules, ⇒ is for defeasible
rules, and ⇝ is for defeaters.</p>
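      <p>The rule format and the superiority relation described above can be pictured as a small data structure. The following Python fragment is only an illustrative sketch under our own assumptions: the names Rule, complement and prevails, the string encoding of literals (a leading ¬ for negation) and the two-rule theory are hypothetical, and this is not the representation used by Houdini.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# Arrow printed for each rule type.
ARROWS = {"strict": "→", "defeasible": "⇒", "defeater": "⇝"}


def complement(literal: str) -> str:
    """Return the complement of a plain literal written as 'p' or '¬p'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


@dataclass(frozen=True)
class Rule:
    label: str                    # unique rule name, an element of Lab
    antecedents: FrozenSet[str]   # the antecedent set, made of plain literals
    conclusion: str               # the conclusion, a single plain literal
    kind: str                     # "strict", "defeasible" or "defeater"

    def __str__(self) -> str:
        body = ", ".join(sorted(self.antecedents))
        return f"{self.label}: {body} {ARROWS[self.kind]} {self.conclusion}"


def prevails(first: Rule, second: Rule,
             superiority: Set[Tuple[str, str]]) -> bool:
    """True when the rules conflict and (first, second) is in the superiority relation."""
    in_conflict = first.conclusion == complement(second.conclusion)
    return in_conflict and (first.label, second.label) in superiority


# Hypothetical theory: r1 supports p, r2 supports ¬p, and r2 is superior to r1.
r1 = Rule("r1", frozenset({"a"}), "p", "defeasible")
r2 = Rule("r2", frozenset({"b"}), "¬p", "defeasible")
superiority = {("r2", "r1")}
```
]]></preformat>
      <p>In this sketch, prevails(r2, r1, superiority) holds, so when both rules fire the conclusion ¬p of r2 is preferred to the conclusion p of r1.</p>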
      <p>We adopt some standard abbreviations. The set of strict rules in R is denoted by R<sub>s</sub>, and the set of
strict and defeasible rules by R<sub>sd</sub>; R[l] denotes the set of all rules whose conclusion is l. Other
abbreviations will be adopted after the introduction of modal operators.</p>
      <p>A conclusion of D is a tagged literal with one of the following forms:
±∆l means that l is definitely proved (resp. strictly refuted) in D, i.e., there is a definite proof
for l in D (resp. a definite proof does not exist).
±∂l means that l is defeasibly proved (resp. defeasibly refuted) in D, i.e., there is a defeasible
proof for l in D (resp. a defeasible proof does not exist).</p>
      <p>The definition of proof is also the standard one in DL. Given a defeasible theory D, a proof P of
length n in D is a finite sequence P(1), P(2), . . . , P(n) of tagged formulas of the type +∆l,
−∆l, +∂l, −∂l, where the proof conditions defined in the rest of this section hold. P(1..n)
denotes the first n steps of P.</p>
      <p>Next, we introduce the notion of extension of a defeasible theory; informally, an extension is
everything that is derived and disproved.</p>
      <p>Definition 1 (Theory Extension). Given a defeasible theory D, we define the set of positive
and negative conclusions of D as its extension:</p>
      <p>E(D) = (+∆, −∆, +∂, −∂),
where ±# = {l | l appears in D and D ⊢ ±#l}, # ∈ {∆, ∂}.</p>
      <p>The deontic part of the logic encompasses obligations, permissions and prohibitions. A
prescriptive behaviour like “At traffic lights it is forbidden to perform a U-turn unless there is a
‘U-turn Permitted’ sign” can be formalised via the general obligation rule</p>
      <p>AtTrafficLight ⇒<sub>O</sub> ¬UTurn</p>
      <p>and the exception through the permissive rule</p>
      <p>UTurnSign ⇒<sub>P</sub> UTurn.</p>
      <p>An alternative equivalent notation for permissions and obligations is to move the obligation or
permission tag into the formula, as in</p>
      <p>AtTrafficLight ⇒ O¬UTurn</p>
      <p>and</p>
      <sec id="sec-2-4">
        <title>UTurnSign ⇒ PUTurn .</title>
        <p>The obligation to a negated literal establishes the prohibition of the literal; therefore, the above
determines the prohibition of performing a U-turn at a traffic light.</p>
        <p>While [16] discusses how to integrate strong and weak permission in Deontic Defeasible
Logic, in this paper, we restrict our attention to the notion of strong permission, namely, when
permissions are explicitly stated using permissive rules, i.e., rules whose conclusion is to be
asserted as a permission.</p>
        <p>Following the ideas of [17], obligation rules gain more expressiveness with the compensation
operator ⊗, which is used to model reparative chains of obligations. Roughly, a ⊗ b means that
a is the primary obligation, but if we fail to obtain (or to comply with) a (by either not being
able to prove a, or by proving ∼a), then b becomes the new obligation in force. This operator is
used to build chains of preferences (or repair chains), called ⊗-expressions, that are formed as
follows: (i) every plain literal is an ⊗-expression, (ii) if A is an ⊗-expression and b is a plain
literal, then A ⊗ b is an ⊗-expression [16]. In this paper, repair chains appear only as the conclusion
of an obligation rule. Moreover, if the conclusion C(α) of an obligation rule is an ⊗-expression,
then (with a slight abuse of notation) we also denote by C(α) the set of literals appearing
in that ⊗-expression.</p>
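        <p>The reading of a repair chain can be illustrated with a short Python sketch. It rests on a simplifying assumption of ours: an element of the chain counts as violated exactly when its complement is known to hold, which covers only the "proving the complement" case mentioned above. The function name obligation_in_force and the literal names are hypothetical.</p>
        <preformat><![CDATA[
```python
from typing import Optional, Sequence, Set


def complement(literal: str) -> str:
    """Return the complement of a plain literal written as 'p' or '¬p'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def obligation_in_force(chain: Sequence[str], holds: Set[str]) -> Optional[str]:
    """Return the first element of the chain that has not been violated.

    Assumption of this sketch: an element counts as violated exactly when
    its complement is in `holds`. Returns None if every element is violated.
    """
    for element in chain:
        if complement(element) not in holds:
            return element
    return None


# The chain "¬death ⊗ basic_punishment" written as a list, left to right.
chain = ["¬death", "basic_punishment"]
```
]]></preformat>
        <p>With nothing known to hold, the obligation in force is the primary one, ¬death; once death holds, the sketch moves to the compensation basic_punishment.</p>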
        <p>We summarise proof conditions, starting with applicability and discardability, and following
the structure of proofs for constitutive and deontic formulas. In the following, R<sup>C</sup> denotes the
set of constitutive rules (i.e. without modal operators), R<sup>P</sup> the set of permission rules, and R<sup>O</sup>
the set of obligation rules. We keep the notation R[l] adopted before; moreover, we extend
it to include ⊗-expressions, namely R<sup>O</sup>[l, i] denotes the set of obligation rules with literal l in
position i of the reparative chain.</p>
        <p>Definition 2 (Applicability). Let D = (F, R, &gt;) be a deontic defeasible theory. We say that
rule α ∈ R<sup>C</sup> ∪ R<sup>P</sup> is applicable at P(n + 1) if for all a ∈ A(α):
1. if a ∈ PLit, then +∂<sub>C</sub>a ∈ P(1..n),
2. if a = □q, then +∂<sub>□</sub>q ∈ P(1..n), with □ ∈ {O, P}, and
3. if a = ¬□q, then −∂<sub>□</sub>q ∈ P(1..n), with □ ∈ {O, P}.</p>
        <p>We say that rule α ∈ R<sup>O</sup> is applicable at index i and P(n + 1) if Conditions 1–3 above hold and
4. ∀c<sub>j</sub> ∈ C(α), if j &lt; i, then +∂<sub>O</sub>c<sub>j</sub> ∈ P(1..n) and +∂<sub>C</sub>∼c<sub>j</sub> ∈ P(1..n).<sup>1</sup></p>
        <p><sup>1</sup> As discussed above, we are allowed to move to the next element of an ⊗-expression when the current element is
violated; to have a violation, we need (i) the obligation to be in force, and (ii) that its content does not hold. See [18]
for a deeper discussion.</p>
        <p>Definition 3 (Discardability). Assume a deontic defeasible theory D, with D = (F, R, &gt;). We
say that rule α ∈ R<sup>C</sup> ∪ R<sup>P</sup> is discarded at P(n + 1) if there exists a ∈ A(α) such that
1. if a ∈ PLit, then −∂<sub>C</sub>a ∈ P(1..n), or
2. if a = □q, then −∂<sub>□</sub>q ∈ P(1..n), with □ ∈ {O, P}, or
3. if a = ¬□q, then +∂<sub>□</sub>q ∈ P(1..n), with □ ∈ {O, P}.</p>
        <p>We say that rule α ∈ R<sup>O</sup> is discarded at index i and P(n + 1) if either at least one of
Conditions 1–3 above holds, or
4. ∃c<sub>j</sub> ∈ C(α), j &lt; i, such that −∂<sub>O</sub>c<sub>j</sub> ∈ P(1..n), or −∂<sub>C</sub>∼c<sub>j</sub> ∈ P(1..n).</p>
        <p>Note that discardability is obtained by applying the principle of strong negation to the definition
of applicability. The strong negation principle simplifies a formula by moving all negations to
an innermost position in the resulting formula, replacing the positive tags with the respective
negative tags, and vice versa; see [19]. Positive proof tags ensure that there are
effective decidable procedures to build proofs; the strong negation principle guarantees that the
negative conditions provide a constructive and exhaustive method to verify that a derivation of
the given conclusion is not possible. Accordingly, condition 3 of Definition 2 allows us to state
that ¬□q holds when we have a (constructive) failure to prove q with mode □ (for obligation
or permission), thus it corresponds to a constructive version of negation as failure.</p>
        <sec id="sec-2-4-1">
          <title>Definition 4 (Constitutive Proof Conditions).</title>
          <p>Given a DDL theory (F, R, &gt;):</p>
          <p>+∂<sub>C</sub>l: If P(n + 1) = +∂<sub>C</sub>l then
(1) l ∈ F, or
(2) (2.1) ∼l ∉ F, and
(2.2) ∃β ∈ R<sup>C</sup>[l] s.t. β is appl., and
(2.3) ∀γ ∈ R<sup>C</sup>[∼l] either
(2.3.1) γ is disc., or
(2.3.2) ∃ζ ∈ R<sup>C</sup>[l] s.t.
(2.3.2.1) ζ is appl. and
(2.3.2.2) ζ &gt; γ.</p>
          <p>−∂<sub>C</sub>l: If P(n + 1) = −∂<sub>C</sub>l then
(1) l ∉ F and either
(2) (2.1) ∼l ∈ F, or
(2.2) ∀β ∈ R<sup>C</sup>[l], β is disc., or
(2.3) ∃γ ∈ R<sup>C</sup>[∼l] such that
(2.3.1) γ is appl., and
(2.3.2) ∀ζ ∈ R<sup>C</sup>[l], either
(2.3.2.1) ζ is disc., or
(2.3.2.2) ζ ≯ γ.</p>
          <p>A literal is defeasibly proved if: it is a fact, or there exists an applicable, defeasible rule
supporting it (such a rule cannot be a defeater), and all opposite rules are either discarded or
defeated. To prove a conclusion, not all the work has to be done by a stand-alone (applicable)
rule (the rule witnessing condition (2.2) in +∂<sub>C</sub>): all the applicable rules for the same conclusion
(may) contribute to defeating applicable rules for the opposite conclusion.</p>
          <p>Example 1. Let D = (F = {a, b, c, d, e}, R, &gt; = {(α, ζ), (β, η)}) be a theory such that
R = {α: a ⇒<sub>C</sub> f, β: b ⇒<sub>C</sub> f, γ: c ⇒<sub>C</sub> f,
ζ: d ⇒<sub>C</sub> ¬f, η: e ⇒<sub>C</sub> ¬f, θ: g ⇒<sub>C</sub> ¬f}.</p>
          <p>Here, D ⊢ +∂<sub>C</sub>a<sub>i</sub> for each a<sub>i</sub> ∈ F, by Condition (1) of +∂<sub>C</sub>. Therefore, all rules except θ
(which is discarded) are applicable: θ is indeed discarded since no rule has g as consequent
nor is g a fact. The team defeat supporting f is made by α, β and γ, whereas the team defeat
supporting ¬f is made by ζ and η. Given that α defeats ζ and β defeats η, we conclude that
D ⊢ +∂<sub>C</sub>f. Note that, despite being applicable, γ does not effectively contribute to proving
+∂<sub>C</sub>f, i.e., D without γ would still prove +∂<sub>C</sub>f.</p>
          <p>Suppose we change D so that both α and β are defeaters. Even if γ defeats neither ζ nor η,
γ is now needed to prove +∂<sub>C</sub>f, as Condition (2.2) requires that at least one applicable rule must
be a defeasible rule. Below we present the proof conditions for obligations.</p>
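          <p>The team-defeat reading of the positive constitutive proof condition can be checked mechanically on a theory shaped like the one in Example 1: five facts, three rules for a literal f, three rules for ¬f (one of them with an antecedent that is not a fact), and two superiority pairs. The Python sketch below is a simplification under our own assumptions: a rule is applicable when all its antecedents are facts and discarded otherwise, so there is no chaining through derived literals, and the ASCII rule names (alpha, beta, and so on) are illustrative.</p>
          <preformat><![CDATA[
```python
from typing import FrozenSet, List, Set, Tuple

# A rule is (label, antecedents, conclusion, kind);
# kind is "defeasible" or "defeater" in this sketch.
Rule = Tuple[str, FrozenSet[str], str, str]


def complement(literal: str) -> str:
    """Return the complement of a plain literal written as 'p' or '¬p'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def defeasibly_proved(literal: str, facts: Set[str], rules: List[Rule],
                      superiority: Set[Tuple[str, str]]) -> bool:
    """Simplified check of the positive constitutive proof condition.

    Assumption: a rule is applicable when all its antecedents are facts
    and discarded otherwise (no chaining through derived literals).
    """
    if literal in facts:                      # the literal is a fact
        return True
    if complement(literal) in facts:          # the complement is a fact
        return False
    applicable = [r for r in rules if r[1].issubset(facts)]
    team = [r for r in applicable if r[2] == literal]
    attackers = [r for r in applicable if r[2] == complement(literal)]
    # At least one applicable supporting rule must be defeasible.
    if not any(r[3] == "defeasible" for r in team):
        return False
    # Team defeat: every applicable attacker is beaten by some team member.
    return all(any((t[0], a[0]) in superiority for t in team)
               for a in attackers)


def rule(label: str, antecedent: str, conclusion: str,
         kind: str = "defeasible") -> Rule:
    """Build a rule with a single antecedent."""
    return (label, frozenset({antecedent}), conclusion, kind)


facts = {"a", "b", "c", "d", "e"}
superiority = {("alpha", "zeta"), ("beta", "eta")}
rules = [
    rule("alpha", "a", "f"), rule("beta", "b", "f"), rule("gamma", "c", "f"),
    rule("zeta", "d", "¬f"), rule("eta", "e", "¬f"), rule("theta", "g", "¬f"),
]
```
]]></preformat>
          <p>In this sketch f is proved and ¬f is not. Dropping gamma leaves the result unchanged, whereas turning alpha and beta into defeaters makes gamma necessary, because the supporting team would otherwise contain no defeasible rule.</p>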
        </sec>
        <sec id="sec-2-4-2">
          <title>Definition 5 (Obligation Proof Conditions).</title>
          <p>+∂<sub>O</sub>l: If P(n + 1) = +∂<sub>O</sub>l then
∃β ∈ R<sup>O</sup>[l, i] s.t.
(1) β is applicable at index i and
(2) ∀γ ∈ R<sup>O</sup>[∼l, j] ∪ R<sup>P</sup>[∼l] either
(2.1) γ is discarded (at index j), or
(2.2) ∃ζ ∈ R<sup>O</sup>[l, k] s.t.
(2.2.1) ζ is applicable at index k and
(2.2.2) ζ &gt; γ.</p>
          <p>−∂<sub>O</sub>l: If P(n + 1) = −∂<sub>O</sub>l then
∀β ∈ R<sup>O</sup>[l, i] either
(1) β is discarded at index i, or
(2) ∃γ ∈ R<sup>O</sup>[∼l, j] ∪ R<sup>P</sup>[∼l] s.t.
(2.1) γ is applicable (at index j), and
(2.2) ∀ζ ∈ R<sup>O</sup>[l, k] either
(2.2.1) ζ is discarded at index k, or
(2.2.2) ζ ≯ γ.</p>
          <p>Note that: (i) in Condition (2) γ can be a permission rule, as explicit, opposite permissions
represent exceptions to obligations, whereas ζ (Condition 2.2) must be an obligation rule, as a
permission rule cannot reinstate an obligation, and that (ii) l may appear at different positions
(indices i, j, and k) within the three ⊗-chains. The example below supports the intuition behind
the restriction to obligation rules in Condition (2.2). Since they are very similar to the obligation
ones, the proof conditions for permission are omitted.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. An example of translation</title>
      <p>To illustrate DDL’s practical use, we present two coding examples outside the experimental set.
Chosen to reflect a broad range of cases within time constraints, these simple yet meaningful
examples highlight the inherent diversity in legal text length and complexity within the selected
normative framework.</p>
      <p>The first example features a legal excerpt from Italian Criminal Law, used as the translation
source. The full text provided for translation to the experimenters is available upon request. We
present the translation process for the main article and the relevant exception only. This text
appears as entry 1 in Table 1.</p>
      <p>Art. 575 (Italian Criminal Code) - Homicide
Whoever causes the death of a person is punished
with imprisonment of no less than twenty-one years.</p>
      <p>Exceptions - Aggravating Circumstances:
The penalty of life imprisonment applies when the crime is committed:
– Using poisonous substances or other insidious means;
– With premeditation.</p>
      <p>The translation process carried out by the experimenters generates a Deontic Defeasible Logic
expression, adhering to the guidelines given in Section 4.</p>
      <p>Article 575 can be translated as follows (the reported one is the actual translation by one of
the experimenters).</p>
      <p>⇒ O ∼ death ⊗ basic_punishment
basic_punishment ⇒ imprisonment := 21years</p>
      <p>For the exceptions, we adopt the principle that the translation should proceed in two steps. First,
we map the deontic rule as if it were a primary rule rather than an exception. In the
above case we obtain the translations below.</p>
      <p>poisonous_means ⇒ O ∼ death ⊗ life_imprisonment
premeditation ⇒ O ∼ death ⊗ life_imprisonment</p>
      <p>In addition to the legal background, we included case descriptions illustrating situations
where the codes might apply. Here, we present a single case related to homicide to demonstrate
the specific procedure.</p>
      <p>Brothers Alberto and Mario lived together since their parents’ death, never marrying or having
stable relationships. Alberto mistreated Mario for years, forcing him to do chores and mocking him.
After Alberto became wheelchair-bound due to illness, the abuse worsened with frequent rage and
insults. Unable to endure it any longer, Mario poisoned Alberto with arsenic, leading to his death
within three days.</p>
      <p>When asked for a judgment, the human experimenters translated this situation into the
applicability of Article 575 and of the aggravating circumstances of use of poisonous means and
premeditation. In practice, the translation is a set of facts and propositional rules:</p>
      <sec id="sec-3-1">
        <title>FACTS</title>
        <p>Alberto
Mario
living_together
Alberto_mistreats_Mario
ill_Alberto
wheelchair_Alberto
rage_Alberto_Mario
Mario_buys_Arsenic
Mario_poisons_Alberto</p>
      </sec>
      <sec id="sec-3-2">
        <title>RULES</title>
        <p>Alberto, Mario, Mario_buys_Arsenic,
Mario_poisons_Alberto ⇒ Death
Alberto, Mario, Mario_buys_Arsenic,
Mario_poisons_Alberto ⇒ Poisonous_means
Alberto, Mario, Mario_buys_Arsenic,
Mario_poisons_Alberto ⇒ Premeditation</p>
        <p>Based on the facts and rules, we compute the extension of the propositional theory. Applying
the deontic rules for homicide and aggravating circumstances, we logically conclude that Mario
must be sentenced to life imprisonment.</p>
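        <p>The computation for the homicide scenario can be approximated in a few lines of Python. This is a hedged sketch and not the Houdini encoding: literal names are normalised to lower case so that the conclusions of the scenario rules match the antecedents of the deontic rules, each deontic rule is reduced to a triple (antecedents, primary obligation, compensation), and the OVERRIDES table, stating that the aggravated sanction replaces the basic one, is an assumption of ours that stands in for the treatment of exceptions.</p>
        <preformat><![CDATA[
```python
from typing import FrozenSet, List, Set, Tuple


def complement(literal: str) -> str:
    """Return the complement of a plain literal written as 'p' or '¬p'."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def closure(facts: Set[str],
            rules: List[Tuple[FrozenSet[str], str]]) -> Set[str]:
    """Forward-chain propositional rules until no new literal is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if antecedents.issubset(derived) and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


FACTS = {
    "Alberto", "Mario", "living_together", "Alberto_mistreats_Mario",
    "ill_Alberto", "wheelchair_Alberto", "rage_Alberto_Mario",
    "Mario_buys_Arsenic", "Mario_poisons_Alberto",
}
BODY = frozenset({"Alberto", "Mario", "Mario_buys_Arsenic",
                  "Mario_poisons_Alberto"})
SCENARIO_RULES = [(BODY, "death"), (BODY, "poisonous_means"),
                  (BODY, "premeditation")]

# Deontic rules as (antecedents, primary obligation, compensation).
DEONTIC_RULES = [
    (frozenset(), "¬death", "basic_punishment"),
    (frozenset({"poisonous_means"}), "¬death", "life_imprisonment"),
    (frozenset({"premeditation"}), "¬death", "life_imprisonment"),
]
# Assumption of this sketch: an aggravated sanction replaces the basic one.
OVERRIDES = {"life_imprisonment": "basic_punishment"}


def sanctions(derived: Set[str]) -> Set[str]:
    """Compensations of applicable rules whose primary obligation is violated."""
    active = {
        compensation
        for antecedents, primary, compensation in DEONTIC_RULES
        if antecedents.issubset(derived) and complement(primary) in derived
    }
    overridden = {weak for strong, weak in OVERRIDES.items() if strong in active}
    return active - overridden
```
]]></preformat>
        <p>Calling sanctions(closure(FACTS, SCENARIO_RULES)) yields only life_imprisonment in this sketch, in line with the conclusion reached above; with death alone derived, the result is basic_punishment.</p>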
        <p>In private law, we created many definitions beyond direct deontic rules with punishments,
such as rules for coding contracts. One of the norms we coded, taken from the Italian Private
Law Code, is shown below.</p>
        <p>Art. 1470 Italian private law code
The sale is the contract that aims to transfer the ownership of a thing or the
transfer of another right in exchange for the payment of a price.</p>
        <p>Again, we have the case:</p>
        <p>Celeste saw a TV infomercial for Marvellous Blade knives, claimed to cut cobblestones and
drainpipes without damage. Skeptical but curious, she called to buy only after testing them herself.
The company, confident and eager for sales, agreed and sold her the set.</p>
        <p>All experimenters classified the verbal agreement reached by Celeste and Marvellous
Blade as a sale.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Coding guidelines</title>
      <p>The experimental pipeline is described in Section 5 and, in detail, in Figure 1. To frame our
investigation, we focused on accurately measuring the time needed for manual coding. The
experimental setup assessed both translation quality—using standard prediction performance
indicators—and coding duration. A preliminary workshop was held to train participants before
providing data and conducting the coding. We then tested both the normative background and
scenarios using an automated tool, resulting in a reliable estimate of coding time.</p>
      <p>Details on the organisation of the workshop are provided, along with the organisation of
the experiments, in Section 5. The encoding and testing methodology we introduce here
consists of three phases:
1. The normative background encoding phase. In this phase, a coder transforms a legal text
into a coded version using Deontic Defeasible Logic (DDL). While LegalRuleML, supported
by existing editors with annotation tools, might simplify implementation, we use DDL
exclusively in this work. Despite its broader compatibility, LegalRuleML has drawbacks
such as lower transparency and more verbose syntax, making DDL more suitable for our
current purposes. The implemented rules shall be of four types:
– Strict rules, namely definitions, which are called constitutive rules in the terminology
of LegalRuleML.
– Propositional defeasible rules, aiming at capturing concrete aspects of the behaviour
of reality, such as physics and biology but also very elementary social rules, without
obligations and permissions.
– Superiorities, namely preferences between two propositional or deontic defeasible
rules, indicating that when both are applicable (i.e., their antecedents are satisfied), the
preferred rule takes precedence.
– Deontic defeasible rules, representing deontic operators.
2. The scenario encoding phase, in which a coder maps scenarios fitting a specific legal
background into LegalRuleML (using Deontic Defeasible Logic). There are two types of
scenario coding: Private Law, which requires matching meta-rules [20, 21, 22, 23], and
Criminal Law, which uses flat Deontic Defeasible Logic with numeric constraints and
assignments to model repair chains like imprisonment time or fines. This matter has been
discussed in [10].
3. The evaluation phase, which involves testing the coded examples against the normative
background. The coder compares their decision on the case with the decision made by a
machine, specifically the reasoner processing both the background and scenario. If the
decisions differ, it signals a problem that requires further evaluation.</p>
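      <p>The evaluation phase amounts to comparing, scenario by scenario, the coder's expected decision with the decision computed by the reasoner. The Python sketch below illustrates that comparison only: toy_reasoner is a hypothetical stand-in for a real reasoner run over the coded background and scenario, and the literal names and the two scenarios are invented for illustration.</p>
      <preformat><![CDATA[
```python
from typing import Callable, Dict, Set, Tuple


def evaluate(scenarios: Dict[str, Tuple[Set[str], str]],
             reasoner: Callable[[Set[str]], str]) -> Dict[str, str]:
    """Mark each scenario 'pass' when coder and reasoner agree, else 'review'."""
    report = {}
    for name, (facts, expected) in scenarios.items():
        report[name] = "pass" if reasoner(facts) == expected else "review"
    return report


def toy_reasoner(facts: Set[str]) -> str:
    """Hypothetical stand-in for a reasoner processing background and scenario."""
    needed = {"transfer_of_ownership", "price"}
    return "sale" if needed.issubset(facts) else "no_sale"


# Each scenario pairs its coded facts with the decision expected by the coder.
scenarios = {
    "knives": ({"transfer_of_ownership", "price"}, "sale"),
    "gift": ({"transfer_of_ownership"}, "sale"),  # coder and machine disagree
}
report = evaluate(scenarios, toy_reasoner)
```
]]></preformat>
      <p>Here the first scenario is marked pass and the second review, which is the signal that either the background or the scenario coding needs further evaluation.</p>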
      <p>At the end of the methodology, we produced a baseline implementation of the specification,
including test cases with expected outcomes. We encoded, tested, and evaluated a normative
background, with results marked as pass or review based on errors identified in specific scenarios.</p>
      <p>Inevitably, the methodology has some critical aspects that we need to address from the very
beginning:
– Differences in language. Encoding a large set of norms would be impractical for a single
person, so a team approach is necessary. However, this could lead to inconsistencies,
as different coders might use varying terms to represent the same concepts. Even a
single coder might overlook the need to standardize terms for synonyms or antonyms. To
address this, three tools are recommended: (1) a glossary with a thesaurus and archetypical
terms for synonyms and antonyms, (2) a set of guidelines to minimize variations, and (3)
a cyclic methodology where these steps are repeated until errors fall below an acceptable
threshold.
– Implicitness of defeasible rules. Coders may assume certain conditions are obvious, for
example, that the rule “running is forbidden in quiet areas” does not apply to individuals
unable to run. However, if such exceptions (e.g., wheelchair users) are not explicitly
encoded, they will not be considered. To address this, more time should be devoted to
encoding based on empirical observations, possibly by assigning this task to specific team
members.</p>
      <p>The recommendations we provided during the instruction workshop can be summarised in
the following points:
a) When translating a noun phrase, any adjective that specializes the noun should be
combined with it into a single positive literal. For example, divorced spouse becomes
divorced_spouse.
b) When a copulative verb (e.g., to be, to become) appears, the subject is defined by the
nominal part using a strict rule. For example, A sale is a contract is translated as sale →
contract.
c) When a verb other than a copulative one occurs in the sentence, it has to be attached
to subject and complements to form a single literal. For instance, Mario buys Arsenic
translates into Mario_buys_Arsenic.
d) When a modal of obligation or permission appears, it should stay attached to the noun or
noun phrase as generated. Similarly, if a punishment follows specific behavior, a repair
chain is introduced with the punishment as a literal. This punishment literal heads a
propositional rule assigning a value for its duration. For general obligations or permissions
expressed generically, we recommend using a deontic rule without a tail. For example,
Whoever causes the death of a person is punished with no less than twenty-one years is
translated as in the basic example of Section 3.
e) Any conditional sentence—explicit or implied by verbs like "causes" or "generates"—should
be translated into a defeasible rule, possibly with deontic labels and repair chains if
punishments are involved.</p>
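      <p>Recommendations a) to c) are mechanical enough to be shown as string operations. The Python sketch below is not a translation tool: it assumes that a human coder has already identified the noun phrase, the subject, the verb and the complements, and it only builds the resulting literal or strict rule. The function names literal and copula_rule are hypothetical.</p>
      <preformat><![CDATA[
```python
def literal(*words: str) -> str:
    """Join pre-segmented words into one literal (recommendations a and c)."""
    return "_".join(part for word in words for part in word.split())


def copula_rule(subject: str, nominal: str) -> str:
    """Recommendation b: 'subject is nominal' becomes a strict rule."""
    return f"{literal(subject)} → {literal(nominal)}"
```
]]></preformat>
      <p>For instance, literal("divorced spouse") gives divorced_spouse, literal("Mario", "buys", "Arsenic") gives Mario_buys_Arsenic, and copula_rule("sale", "contract") gives sale → contract.</p>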
      <p>These guidelines are not automation rules, as automated legal translation has so far yielded
disappointing results. In practice, despite the guidelines, individual variations may still occur.</p>
    </sec>
    <sec id="sec-5">
      <title>5. An experiment in legal coding</title>
      <p>The coding example in Sec. 3 was carried out to provide a code for a legal text along with its
translation into Deontic Defeasible Logic. The translation team was intentionally composed of
non-experts in both DDL and legal matters to ensure a neutral perspective. Qualitatively, the
search for explanations proved important, but more significantly, it shaped a coding process
focused on measurable performance.</p>
      <p>This coding activity reflects observable human behavior, making it measurable with software
engineering metrics. While measuring parameters like coding time is straightforward, it is more
challenging to establish correlations with factors related to the humans involved in the process.
To investigate these aspects, we devised the following experiment, whose structure is reported in
Figure 1.</p>
      <p>[Figure 1: Structure of the experiment. Seven phases (Phase 1 to Phase 7) carried out by experimenters, committee and subjects: text analysis (selection of subset, selection of subjects, experiment setting), selection of texts (attribution of depth, measure of length), scenarios (creation of scenarios), first reading (attribution of expertise), coding (coding of texts, coding of scenarios), test (test scenarios, results annotation), and raw data.]</p>
      <p>
We selected eleven legal texts from the Italian penal and private codes, along with their related
legal references. Three expert jurists conducted this work to identify relevant variables for
measurement. The jurists rated each text for depth: depth 1 is for constitutional norms or
related implementation laws, depth 2 for general norms that depend on the constitution, such as
fundamental codes, depth 3 for general laws, depth 4 for decrees and regulations, and depth 5
for technical annexes; the jurists agreed on a maximum depth of 5 for the relevant norms. Texts
from international treaties or conventions (e.g., the Universal Declaration of Human Rights) are
considered depth 0. We selected seven of the eleven texts to balance the growth in length while
ensuring a good sample distribution across depths, as shown in Table 1. Length was measured as
the number of characters, an objective metric unaffected by format or structure.</p>
      <p>After selecting the legal texts, we asked the experimenters to add six scenarios where the
normative background applies and to identify the resulting legal consequences. These six were
selected from a larger set of 17 scenarios previously prepared.</p>
      <p>The legal coding</p>
      <p>We selected 30 participants (23 jurists and 7 non-jurists) for the legal coding experiments. After a
workshop on the basics of Deontic Defeasible Logic, they produced trial codes so that we could assess
their coding ability and estimate the time required. While some performed well, only 14 (a mix of expert
jurists, less experienced jurists, and knowledgeable non-jurists) completed the coding accurately.
Those who succeeded were then asked to code scenarios and, upon passing, were enrolled in the
second phase.</p>
      <p>Participants first rated their expertise on the texts on a decimal scale from 0 to 1. Since
most Italian jurists specialize in either penal or private law, we included a mix of both. Some
jurists rated themselves differently across topics, complicating performance analysis. To balance
this, we also selected coding experts without legal backgrounds. The self-assessments are shown
in Table 2.</p>
      <p>The fourteen selected subjects individually coded all seven texts, with access to the normative
background and Deontic Defeasible Logic rules. Their work was measured by the length of
the coded output and the time taken. After coding, a verification phase allowed participants to
exchange codes, receive feedback, and make corrections. The combined time for this exchange
and re-coding was minimal and consistent across subjects, so it was included with the initial
coding time. Given its small impact and the likelihood of such consultation in practice, these
phases were not separated in performance measurements.</p>
      <p>After coding the texts, subjects were assigned to code scenarios, with evaluators randomly
selected to avoid coding scenarios for their own texts. Coders had access to both the original and
coded texts. Initially, we underestimated the time needed for scenario coding, assuming it was
negligible. However, scenario coding time depends on the complexity of the legal background
and is roughly proportional to the time required per character for encoding the background.
We adjusted the experiment accordingly to measure this time accurately.</p>
      <p>Tests on the Scenarios</p>
      <p>The testing phase was conducted by a second group of five technology experts who used the
coded texts to test the scenarios. This phase required negligible time. Although some minor
errors in the submitted files needed correction, the extra time was minimal and expected to
decrease with improvements in the Houdini interface [10, 11, 12, 24]. Overall, this added time
amounts to about a 20% increase over the total coding time for backgrounds and scenarios.</p>
      <p>We assessed the accuracy of scenario analysis by relating the number of errors to each coded norm.
Subjects then received feedback and corrected the errors, with most fixes done almost
immediately. The additional time needed was roughly proportional to the initial coding time. On
average, the error rate was 6%, and recoding required about 5.8% of the initial coding time. These
figures are influenced by the limitations of the procedure.</p>
    </sec>
    <sec id="sec-6">
      <title>6. The results of the experiment: a general analysis</title>
      <p>The experiment shows that legal coding is a reliable process, with a low error rate and
straightforward, quick corrections compared to the overall coding time.</p>
      <p>Regarding length, the coded texts in formal language (Deontic Defeasible Logic) are on
average 19% longer than the original texts (in number of characters). The standard deviation,
however, is a considerable 9%, with values ranging from 2% to 33%. Excluding the two extreme
values as outliers gives an average of 20% with a standard deviation of only 3%.
The sample is therefore too small to draw a reasonable conclusion about the actual
distribution, and we accept this value only as a preliminary estimate.</p>
      <p>Analyzing performance, we focused on coding time relative to text length, measured by the
number of characters, an objective metric unaffected by format or structure. Two key metrics
emerged: mean and median coding time per character. Coding times ranged from 1.8 to 5.91
seconds per character, averaging 3.99 seconds with a standard deviation of 0.94. The median
time was 4.06 seconds, close to the average.</p>
      <p>Applying the Shapiro-Wilk test to the population [25], we find that the departure from
normality is below 0.01 at the 0.05 significance level. Beyond validating the experimental
setup, the normal distribution, which is expected for continuous variables, shows a relatively narrow
spread around the mean. This indicates that the average coding time is a reliable predictor, with
some adjustments to be discussed later. Since the average page contains 1800 characters, we
can estimate that the time required to code a page is roughly two hours (ranging from a
minimum of 54 minutes to a maximum of 2h57’).</p>
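      <p>The per-page estimate follows directly from the per-character figures above. The short Python sketch below reproduces the arithmetic; the function and constant names are illustrative assumptions, and rounding to the nearest minute is our choice rather than something stated in the experiment.</p>
      <preformat>
```python
# Per-character coding times reported in the text (seconds per character).
MIN_S_PER_CHAR = 1.8
MEAN_S_PER_CHAR = 3.99
MAX_S_PER_CHAR = 5.91
CHARS_PER_PAGE = 1800  # average page length assumed in the text


def page_time_minutes(seconds_per_char, chars_per_page=CHARS_PER_PAGE):
    """Return the estimated time to code one page, in minutes."""
    return seconds_per_char * chars_per_page / 60


def as_hours_minutes(minutes):
    """Split a duration in minutes into (hours, minutes), rounded to the minute."""
    return divmod(round(minutes), 60)


rates = [("min", MIN_S_PER_CHAR), ("mean", MEAN_S_PER_CHAR), ("max", MAX_S_PER_CHAR)]
for label, rate in rates:
    hours, minutes = as_hours_minutes(page_time_minutes(rate))
    print(f"{label}: {hours}h{minutes:02d}'")
```
      </preformat>
      <p>With these inputs the mean comes to about two hours per page, the minimum to 54 minutes, and the maximum to just under three hours.</p>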
      <p>As discussed in Section 5, we collected data on expertise and depth, so we can now
look at these values to understand whether expertise has an influence on the
delivery time.</p>
      <p>If we measure the correlation coefficient between the time per character and the self-assessed
expertise, we obtain a value of -28.67%, compatible with the hypothesis of a weak inverse
linear correlation with expertise; namely, experts code slightly faster than non-experts.
Partitioning the subjects into classes by tenths of expertise shows the same trend, as reported in Figure 2,
which plots the average coding time per character for each class of expertise from 0 to 1.</p>
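      <p>For readers who wish to repeat this analysis on their own data, the following Python sketch computes a Pearson correlation coefficient and the average time per character for each expertise class. We assume here that the correlation index is the Pearson coefficient. The sample values are hypothetical and are not the experimental data; the function names are likewise illustrative.</p>
      <preformat>
```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_time_by_class(expertise, times):
    """Average time per character for each expertise class (tenths from 0 to 1)."""
    groups = defaultdict(list)
    for level, seconds in zip(expertise, times):
        groups[round(level, 1)].append(seconds)
    return {level: sum(v) / len(v) for level, v in sorted(groups.items())}


# Hypothetical sample for illustration only, NOT the experimental data.
expertise = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 1.0]
seconds_per_char = [4.9, 4.4, 4.1, 3.8, 3.2, 4.3, 3.9]

r = pearson(expertise, seconds_per_char)
print(f"correlation: {r:.2%}")
print(mean_time_by_class(expertise, seconds_per_char))
```
      </preformat>
      <p>A negative coefficient indicates that higher self-assessed expertise goes with a shorter coding time per character; the per-class averages correspond to the kind of plot reported in Figure 2.</p>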
      <p>A qualitative analysis of this graphical representation gives two hints:
– Up to an expertise of 0.8 there is a rather evident negative correlation: the time required
to code decreases moderately as expertise grows;
– Beyond 0.8, performance worsens, although a negative correlation
remains within that range. As often seen in self-evaluations, less experienced individuals tend to rate
themselves higher, a tendency known as the Dunning-Kruger effect. This may have led
the most expert participants to underrate themselves, influencing the observed results.</p>
      <p>Regarding the relation between depth and time to code, there is again a
very similar negative correlation, of -28.71%, which in turn can be explained by the higher difficulty
of a text with higher depth.</p>
      <p>Regarding bias analysis, we have to consider the following aspects:
1. Regarding sample size, a long-term study tracking years of practical coding would provide
stronger validation. However, expecting hundreds of subjects and thousands of hours
of observation to eliminate bias is unrealistic. The value of this investigation lies in the
method itself, and we believe we have a solid foundation.
2. Self-estimating expertise is challenging and prone to bias. We believe that pursuing complex,
double-blind evaluations is not worthwhile, as assessing expertise is highly subjective and
such methods would likely hinder success.
3. Language and normative background inevitably influence the test. It is nearly impossible
to design a test, such as one measuring coding time, that is entirely independent of
these factors. However, the robustness of the chosen approach allows it to be adapted to
different languages and normative contexts.
4. Texts on taxes, pensions, or economic topics often involve non-trivial computational
content. However, if this content is limited to a few pages, its impact on coding time
is minimal. If it is spread throughout the text, its effect is evenly distributed and can be
captured by a suitable coefficient.</p>
      <p>Overall, we propose that, with few exceptions, coding time scales linearly with text length. For
sufficiently long texts, the time required depends mainly on length, through a coefficient linked
to the text’s nature rather than its specific content. Further empirical validation is needed to
support this hypothesis.</p>
      <p>A further analysis regards the variability factors of the translation operations. Although we
provided, during the preparatory workshop mentioned above, a general methodology with
some guidelines as described in Section 4, there are factors inevitably generating variability:
1. Natural language expresses modality in many varied and hard-to-classify ways, including
modal verbs and expressions like "is entitled to," "has the duty of," or "is responsible for."
These forms can be interpreted differently depending on context.
2. Extending a text with related material, as in the first step of the methodology
described above, is a further source of variability.
3. In some cases, multiple adjectives applied to a noun can lead to ambiguity: either they
combine into a single noun phrase or they form two distinct descriptive phrases.</p>
      <p>The error rate is relatively low compared to similar experimental settings. This is partly due
to a bias in phases 6 and 7, where the same participants who coded the norms also coded the
scenarios for synchronization, reducing potential inconsistencies. If different individuals coded
the scenarios and the normative background separately, even with prior synchronization, the
error rate would likely increase. However, with clear guidelines, a long-term convergence
process could eventually ensure consistent performance.</p>
      <p>6.1. A scenario with the Italian Criminal Code</p>
      <p>Based on the estimate of 4 seconds per character, we can roughly gauge the coding effort to
translate the Italian Criminal Code. Although not the actual elapsed time, it serves as a useful
preliminary measure.</p>
      <p>Let us start with some data that shall help us in both of the activities defined above. In Table
3 we report raw data on the Italian Criminal Code.</p>
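      <table-wrap>
        <label>Table 3</label>
        <table>
          <tbody>
            <tr><td>Length</td><td>435,939 characters</td></tr>
            <tr><td>Number of sections (libri)</td><td>3</td></tr>
            <tr><td>Number of subsections (titoli)</td><td>8, 14, 4</td></tr>
            <tr><td>Number of articles</td><td>832</td></tr>
            <tr><td>Typical depth</td><td>1, 2</td></tr>
            <tr><td>Interference with other codes</td><td>Criminal procedure code, Italian Constitution</td></tr>
          </tbody>
        </table>
      </table-wrap>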
      <p>Using the methodology, we estimate a total effort of 485 hours to encode the background alone.
Expert opinions from the workshop suggest that scenarios are at least twice as long as the
background. There is also a large collection of such cases available for reference (for instance, the UK
National Archives have a section dedicated to this topic:
https://www.nationalarchives.gov.uk/help-with-your-research/research-guides/criminal-court-cases-an-overview/), so the retrieval effort would
consist of finding the right cases, which can be estimated at a further 100% of the background effort. Overall, we
can make the following estimate:
– Time to code: 485 hours;
– Time to retrieve scenarios: 485 hours;
– Time to code scenarios: 485*3=1455 hours;
– Time to test: (485+1455)*0.2=388 hours.</p>
      <p>The total time required is 2813 hours. If one person handles both coding and testing, this
amounts to roughly 23 person-months. The highest feasible parallelization is by subsections, of which
there are 26 in total. Assuming equal length, this could reduce the elapsed time to about one month. However, true
parallelization is unlikely for the following reasons:
– We need to forecast the time to resolve discrepancies. This is likely to be a large overhead,
and it is very difficult to determine a priori without further data.
– Better insight into internal dependencies is needed to split the work, as this division
might unpredictably increase workload in other phases.
– The testing phase will be more complicated if conducted by separate teams in complex
cases.
– Some additional time could be required to properly connect the formal translation to
other IT tools.</p>
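      <p>The estimate above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the names are illustrative, the background effort is rounded up to the next full hour (which matches the 485 hours used above), and the multiplier of 3 for scenario coding and the 20% testing overhead are taken from the list of estimates.</p>
      <preformat>
```python
from math import ceil

SECONDS_PER_CHAR = 4                  # rounded mean coding time from the experiment
CRIMINAL_CODE_CHARS = 435_939         # length reported in Table 3
SCENARIO_FACTOR = 3                   # multiplier used in the text for scenario coding
TEST_OVERHEAD = 0.2                   # testing adds about 20% of the coding time
SUBSECTIONS_PER_SECTION = (8, 14, 4)  # titoli in each of the three libri


def estimate_hours(chars, seconds_per_char=SECONDS_PER_CHAR):
    """Return the effort in hours for each phase and in total."""
    # Assumption: round the background effort up to the next full hour.
    background = ceil(chars * seconds_per_char / 3600)
    retrieval = background  # a further 100% of the background effort
    scenarios = background * SCENARIO_FACTOR
    testing = round((background + scenarios) * TEST_OVERHEAD)
    total = background + retrieval + scenarios + testing
    return {
        "background": background,
        "retrieval": retrieval,
        "scenarios": scenarios,
        "testing": testing,
        "total": total,
    }


estimate = estimate_hours(CRIMINAL_CODE_CHARS)
for phase, hours in estimate.items():
    print(f"{phase}: {hours} hours")
print(f"parallel units (subsections): {sum(SUBSECTIONS_PER_SECTION)}")
```
      </preformat>
      <p>Changing the per-character rate or the multipliers shows how sensitive the total is to each assumption, which is useful given the uncertainty discussed in the list above.</p>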
      <p>Many criminal case descriptions are not schematic, which lengthens coding and makes test
outcomes harder to predict. Complex cases involving multiple behaviors or subjects further increase coding
difficulty and scenario length.</p>
      <p>Despite these challenges, formalizing the entire Criminal Code has notable benefits: 1) its
polynomial computational cost allows for full translation without the need to extract specific
subparts, and 2) if extraction is needed, the atom dependency graph makes isolating the required
rules easier.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions and further developments</title>
      <p>The experiment has given important hints for any future process of translating the
law that is based on scientific ground and solid engineering methodologies.</p>
      <p>First, it is clear that we by no means have a technique that can be used to predict the
required coding time for single pieces of the law. The ability to measure coding time relies on
large amounts of text and numerous translators, since the individual variance is very significant,
and the mean time per character computed in this investigation only makes sense on
a large base. Secondly, the large variance referred to above is mainly individual, depending on the
expertise of the translator more than on the depth (in some sense, a measure of the complexity
of references) of the text.</p>
      <p>We are integrating the results of this paper with a further study using segregation methods,
particularly unsupervised clustering, to divide a legal text corpus into subcorpora. Each
subcorpus can be coded by a subset of the encoding team, minimizing overlap and allowing for
better consensus on the language token set. This reduces individual variance and makes the
translation effort feasible by involving a larger number of people.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This project is co-funded by the European Union - NextGenerationEU under the National
Recovery and Resilience Plan (PNRR) - Mission 4 Education and research - Component 2 From
research to business - Investment 1.1 Notice PRIN 2022 (DD N. 104 del 02/02/2022), title Smart
Legal Order in DigiTal Society (SLOTS), proposal code 2022LRL2C2 - CUP J53D23005610006.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT in order to: Paraphrase and
reword, Grammar and spelling check. After using this tool, the authors reviewed and edited the
content as needed and take full responsibility for the publication’s content.</p>
      <p>[2] A. Huggins, M. Burdon, A. Witt, N. Suzor, Digitising legislation: connecting regulatory mind-sets and constitutional values, Law, Innovation and Technology 14 (2022) 325–354. doi:10.1080/17579961.2022.2113670.</p>
      <p>[3] A. Witt, A. Huggins, G. Governatori, J. Buckley, Encoding legislation: a methodology for enhancing technical validation, legal alignment and interdisciplinarity, Artificial Intelligence and Law 32 (2024) 293–324. doi:10.1007/s10506-023-09350-1.</p>
      <p>[4] S. Batsakis, G. Baryannis, G. Governatori, I. Tachmazidis, G. Antoniou, Legal representation and reasoning in practice: A critical comparison, Frontiers in Artificial Intelligence and Applications 313 (2018) 31–40. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85059635121&amp;doi=10.3233%2f978-1-61499-935-5-31&amp;partnerID=40&amp;md5=f035ae481093fb6db46378d00c74c70e. doi:10.3233/978-1-61499-935-5-31.</p>
      <p>[5] G. Governatori, Practical normative reasoning with defeasible deontic logic, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11078 LNCS (2018) 1–25. doi:10.1007/978-3-030-00338-8_1.</p>
      <p>[6] G. Governatori, A. Rotolo, E. Calardo, Possible world semantics for defeasible deontic logic, LNCS (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7393 LNAI (2012) 46–60. doi:10.1007/978-3-642-31570-1_4.</p>
      <p>[7] T. Athan, H. Boley, G. Governatori, M. Palmirani, A. Paschke, A. Wyner, Oasis legalruleml, 2013, pp. 3–12. doi:10.1145/2514601.2514603.</p>
      <p>[8] T. Athan, G. Governatori, M. Palmirani, A. Paschke, A. Wyner, Legalruleml: Design principles and foundations, LNCS (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9203 (2015) 151–188. doi:10.1007/978-3-319-21768-0_6.</p>
      <p>[9] M. Palmirani, G. Governatori, A. Rotolo, S. Tabet, H. Boley, A. Paschke, Legalruleml: Xml-based rules and norms, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7018 LNCS (2011) 298–312. doi:10.1007/978-3-642-24908-2_30.</p>
      <p>[10] M. Cristani, G. Governatori, F. Olivieri, L. Pasetto, F. Tubini, C. Veronese, A. Villa, E. Zorzi, Houdini (unchained): an effective reasoner for defeasible logic, in: CEUR Workshop Proceedings, volume 3354, 2022.</p>
      <p>[11] M. Cristani, G. Governatori, F. Olivieri, L. Pasetto, F. Tubini, C. Veronese, A. Villa, E. Zorzi, The architecture of a reasoning system for defeasible deontic logic, in: Procedia Computer Science, 2023, pp. 4214–4224. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85183536153&amp;doi=10.1016%2fj.procs.2023.10.418&amp;partnerID=40&amp;md5=a2cfea0700fbe01540e05c2b80d4215b. doi:10.1016/j.procs.2023.10.418, vol. 225.</p>
      <p>[12] M. Cristani, F. Olivieri, G. Governatori, G. Buriola, Simulating the law in a multi-agent system, in: CEUR Workshop Proc., 2024, pp. 217–232. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85200109297&amp;partnerID=40&amp;md5=6443fb0e73d56e6f0b6530e318cc1daf, vol. 3735.</p>
      <p>[13] T. Moher, G. Schneider, Methodology and experimental research in software engineering, International Journal of Man-Machine Studies 16 (1982) 65–87. URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-0020085901&amp;doi=10.1016%2fS0020-7373%2882%2980072-2&amp;partnerID=40&amp;md5=f1160d8d72c6a3b9bfeb18a7f0f3e554. doi:10.1016/S0020-7373(82)80072-2.</p>
      <p>[14] M. Palmirani, M. Martoni, A. Rossi, C. Bartolini, L. Robaldo, Legal ontology for modelling gdpr concepts and norms, Frontiers in Artificial Intelligence and Applications 313 (2018) 91–100. doi:10.3233/978-1-61499-935-5-91.</p>
      <p>[15] G. Antoniou, D. Billington, G. Governatori, M. Maher, Representation results for defeasible logic, ACM Tran. on Comp. Logic 2 (2001) 255–287. doi:10.1145/371316.371517.</p>
      <p>[16] G. Governatori, F. Olivieri, A. Rotolo, S. Scannapieco, Computing strong and weak permissions in defeasible logic, J. Philos. Log. 42 (2013) 799–829. URL: https://doi.org/10.1007/s10992-013-9295-1. doi:10.1007/s10992-013-9295-1.</p>
      <p>[17] G. Governatori, A. Rotolo, Logic of violations: A gentzen system for reasoning with contrary-to-duty obligations, Australasian Journal of Logic 4 (2006) 193–215. URL: http://ojs.victoria.ac.nz/ajl/article/view/1780.</p>
      <p>[18] G. Governatori, Burden of compliance and burden of violations, in: A. Rotolo (Ed.), 28th JURIX on Legal Knowledge and Information Systems, Frontiers in Artificial Intelligence and Applications, IOS Press, Amsterdam, 2015, pp. 31–40.</p>
      <p>[19] G. Governatori, V. Padmanabhan, A. Rotolo, A. Sattar, A defeasible logic for modelling policy-based intentions and motivational attitudes, Log. J. IGPL 17 (2009) 227–265. doi:10.1093/jigpal/jzp006.</p>
      <p>[20] G. Governatori, F. Olivieri, A. Rotolo, A. Sattar, M. Cristani, Computing private international law, Frontiers in Artificial Intelligence and App. (2021). doi:10.3233/FAIA210334.</p>
      <p>[21] A. Malerba, A. Rotolo, G. Governatori, A logic for the interpretation of private international law, Logic, Argumentation and Reasoning (2022). doi:10.1007/978-3-030-70084-3_7.</p>
      <p>[22] F. Olivieri, G. Governatori, M. Cristani, A. Rotolo, A. Sattar, Deontic meta-rules, Journal of Logic and Computation 34 (2024) 261–314. doi:10.1093/logcom/exac081.</p>
      <p>[23] A. Rotolo, G. Sartor, Logical models for private international law, 2024. doi:10.1093/oso/9780192858771.003.0006.</p>
      <p>[24] L. Pasetto, M. Cristani, G. Governatori, F. Olivieri, E. Zorzi, Extraction of defeasible proofs as explanations, in: CEUR Workshop Proceedings, 2023. Vol. 3546.</p>
      <p>[25] S. Shapiro, M. Wilk, H. Chen, A comparative study of various tests for normality, J. of the American Statistical Association 63 (1968) 1343–1372. doi:10.1080/01621459.1968.10480932.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Burdon</surname>
          </string-name>
          ,
          <article-title>Fidelity in legal coding: applying legal translation frameworks to address interpretive challenges</article-title>
          ,
          <source>Information and Communications Technology Law</source>
          <volume>33</volume>
          (
          <year>2024</year>
          )
          <fpage>153</fpage>
          -
          <lpage>176</lpage>
          . URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85185504076&amp;doi=10.1080%2f13600834.2024.2312620&amp;partnerID=40&amp;md5=efe26ddc13ea30459027c056eab70cb8. doi:10.1080/13600834.2024.2312620.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>