Explainability of Formal Models of Argumentation Applied to Legal Domain

Michał ARASZKIEWICZ a,1 and Grzegorz J. NALEPA b
a Department of Legal Theory, Jagiellonian University in Kraków, Poland
b Jagiellonian University, AGH University, Kraków, Poland

1 Corresponding Author. The writing of this paper was supported by the project K/DSC/004874.

Abstract. This paper deals with the problems of explainability of argumentation models applied to legal domains. Systems based on the theory of Argumentation Frameworks may in principle be fruitfully applied as explanation tools for AI systems enhanced with Machine Learning mechanisms. However, Argumentation Frameworks, as a quickly evolving and diversified field, should themselves be tested with regard to their explainability. In this paper we provide a set of criteria and outline a testing procedure towards this goal.

Keywords. Argumentation frameworks, explainability, hybrid systems, legal reasoning.

1. Introduction

Explainability has recently become an important feature of AI systems. While black-box models are nothing new in AI, they have been rapidly growing in popularity, largely owing to new methods of building them from big data in many domains. Today, however, these models play an important role in some very sensitive applications of AI systems. This includes medical applications, but also, and mainly, legal ones. In fact, it could be argued that almost any use of AI could eventually have certain legal implications. This concerns in particular the impact on subjects' privacy [21]. Therefore, while discussing explainability, we should consider not only the use of AI in the legal domain as such, but also the legal dimension of the use of AI in general.

Besides their technical merits, AI systems face various limitations regarding their societal acceptance, mainly due to the limited trust people have in their operation. Legal analysis of the requirements imposed on AI systems can contribute to building this trust. In this paper, we do not discuss the legal framework of the application of AI systems. Instead, we investigate the possibility of increasing the explainability of AI-based decisions via computational argumentation. By definition, argumentation is a rational process of posing reasons for and against a given position in order to choose the conclusion that is best justified in the light of the available reasons [for a comprehensive elaboration of the state of the art see 22]. Argumentation frameworks are a fine example of explainable AI techniques.

However, they might play an even greater role in the future in explanation generation in hybrid AI systems. Such systems combine black-box models (such as artificial neural networks) with additional explanation facilities (e.g. decision trees). This approach is promising because it may perform a double role: not only contributing to the explanation of the system's decision on the merits, but also analysing its legal implications by providing reasoning about the decision's acceptability with regard to the appropriate legal regulation. In specific cases the two functions may coincide, in particular where an AI system reasons precisely about the scope of the right to privacy or the right to explanation [4]. The debate on the explainability of AI should therefore take place within AI and Law research, also because of the growing significance of text analytics and machine learning approaches in this field [6].
In particular, machine learning mechanisms are developed to predict the outcome of a legal case (construed broadly: a case in this context means not only a case heard before a court or jury, but any task that involves legal reasoning, such as the assessment of a given contractual provision). However, in the legal context we are particularly interested in receiving a sound justification of the solution to a given problem. A result produced by a learning algorithm may be assessed as correct or accurate, yet such a result may be reached by accident. There is also a great degree of bias-related risk in legal reasoning systems. Therefore it is necessary to relate the quantitative and the argumentation-based research on legal argumentation in order to enhance the transparency of the former. Recent contributions to the domain of transparency of recommender systems [29] provide a basis for analogous investigations in the field of law.

Argumentation systems are broadly perceived as a natural candidate for building explanation models for AI. However, before argumentation formalisms are applied to explain the results provided by ML algorithms in the field of law, they should themselves be evaluated with regard to their explainability.

The structure of our investigation is as follows. In Section 2 we provide a brief review of the current state of the art in the field of computational argumentation and the application of its models to the sphere of law. In Section 3 we develop a scheme for the assessment of the transparency of argumentation-based models, thus proposing an operationalization of the notion of explainability in this context. Section 4 concludes.

2. Formal and Computational Models of Argumentation and their Applications in Legal Domain

Computational models of argumentation have been intensively investigated since the early 1990s, although important earlier work was done in the 1980s in connection with the rapid growth of interest in the modelling of nonmonotonic reasoning. Perhaps the dominant paradigm in the field was initiated by Dung, who argued for a theory of Abstract Argumentation Frameworks (AAFs) [16]. The idea behind this formalism is simple: certain pieces of information (referred to as arguments) are related by means of a binary attack relation. This simple set of primitive concepts enabled Dung and the scholars developing this approach to define a set of methods (called semantics) that produce certain sets of arguments (called extensions), representing those subsets of the initially given information that are "justified" in the light of the total set of available information. The formal properties of argumentation frameworks, including their complexity features, have been intensively investigated. A characteristic feature of this approach is the existence of different semantics that in certain cases reflect clear intuitions (as in the case of grounded or preferred semantics, which correspond respectively to the attitude of a sceptical or a credulous reasoner).

The methodological status of modelling reasoning by means of argumentation frameworks is debatable. In particular, it is claimed that argumentation frameworks simulate the human ability to solve problems in an intelligent manner [13]. We are of the opinion that this feature strongly depends on the resemblance between the reasoning structures in a given argumentation framework and those found in reasoning expressed in natural language; the degree of this resemblance depends both on the modelled problem and on the conceptual richness of the given argumentation framework.
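To make the basic machinery concrete, the following minimal Python sketch (our own illustration, not part of any of the cited systems; the argument names are invented) encodes a small AAF and computes its grounded extension as the least fixpoint of Dung's characteristic function:

```python
# A minimal sketch of a Dungean AAF [16] and of the grounded extension,
# computed as the least fixpoint of the characteristic function.
# The encoding and the argument names are our own illustration.

def attackers(arg, attacks):
    """Return all arguments that attack `arg`."""
    return {a for (a, b) in attacks if b == arg}

def defended(arg, s, attacks):
    """`arg` is defended by the set `s` iff every attacker of `arg`
    is itself attacked by some member of `s`."""
    return all(attackers(att, attacks) & s for att in attackers(arg, attacks))

def grounded_extension(args, attacks):
    """Iterate F(S) = {a in args : a is defended by S} from the empty
    set until a fixpoint is reached (the unique, sceptical extension)."""
    s = set()
    while True:
        nxt = {a for a in args if defended(a, s, attacks)}
        if nxt == s:
            return s
        s = nxt

# Example: A attacks B, B attacks C; the unattacked A reinstates C.
args = {"A", "B", "C"}
attacks = {("A", "B"), ("B", "C")}
print(grounded_extension(args, attacks))  # {'A', 'C'} (set order may vary)
```

In this example the unattacked argument A defeats B and thereby reinstates C, so the grounded extension is {A, C}; preferred semantics coincides with this result here, but the two can diverge, for instance on frameworks containing even attack cycles.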
Simple, early AAFs do not meet this resemblance criterion for many categories of problems. This is one of the factors that led to numerous developments in the field (while preserving the abstract character of the notion of argument):

• the introduction of a second type of relation between arguments, the relation of support, leading to the development of bipolar argumentation frameworks [3, 14]; recently, a third type of relation (neutralisation) was proposed in the development of tripolar AFs [29];
• the addition of the element of values to AAFs, yielding preference-based and value-based argumentation frameworks [7, 8];
• the introduction of acceptance conditions attached to the elements of reasoning (Abstract Dialectical Frameworks) [12];
• the generalization of approaches referring to the strength of attacks into the notion of Weighted Argumentation Frameworks [17];
• the allowance of group attack relations: joint attacks of certain arguments on other arguments [25];
• the development of a system of multi-level attacks, allowing attacks not only on arguments but also on attack relations [24];
• the inclusion of intuitions concerning the gradual acceptability of an argument in the framework [15];
• the formalization of algebraic operators enhancing reasoning with labels on arguments [13].

Apparently, the ongoing tendency consists, first, in extending argumentation frameworks to encompass, as part of the defined vocabulary, certain elements that are explicitly present in argumentation expressed in a given language; second, in removing constraints on the possible relations between elements and the types of those elements; and third, in introducing complexity into the method of acceptance of given sets of arguments, taking into account intuitions following from real-life reasoning.

However, an inherent limitation of AAFs is that they do not make it possible to account for the structure of arguments and its role in the process of argumentation. This facet is perhaps the most counterintuitive element of AAF theory, for argumentation is naturally understood as the posing of reasons (premises) to support or attack a given conclusion. These elements are represented in models of structured argumentation such as ASPIC+ [27] or Carneades [18]. An important feature of these formalisms is the possibility of representing different types of attack on an argument, namely:

• undermining: an attack on a premise of an argument,
• undercutting: an attack on the relation between the premise(s) and the conclusion of an argument, and
• rebuttal: an attack on the conclusion of an argument.

This feature of structured argumentation models undoubtedly makes them more similar to argumentation expressed in natural language. The two mentioned models make use of different formalisms, but it has been shown that it is possible to translate Carneades into ASPIC+ [23]. However, the formal translatability of certain parts of given formalisms does not mean that they are similarly explainable.
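To illustrate, the following schematic sketch (our own simplification, not ASPIC+ or Carneades proper; the string-based encoding of negation, the rule names and the contract example are invented for illustration) shows how arguments with explicit premises, an inference step and a conclusion make the three attack types mechanically distinguishable:

```python
# A schematic sketch of structured arguments and the three attack types
# (not ASPIC+ or Carneades proper; the "-" encoding of negation, the
# rule names and the example contents are our own assumptions).
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    name: str
    premises: frozenset   # propositions the argument relies on
    rule: str             # name of the inference step used
    conclusion: str

def attack_type(attacker, target):
    """Classify the attack of `attacker` on `target`: undermining hits
    a premise, undercutting hits the inference step, rebuttal hits the
    conclusion; returns None if there is no attack."""
    c = attacker.conclusion
    if c in {"-" + p for p in target.premises}:
        return "undermining"
    if c == "-applicable(" + target.rule + ")":
        return "undercutting"
    if c == "-" + target.conclusion:
        return "rebuttal"
    return None

a = Argument("A", frozenset({"signed(contract)"}), "r1", "valid(contract)")
b = Argument("B", frozenset({"forged(signature)"}), "r2", "-signed(contract)")
c = Argument("C", frozenset({"duress"}), "r3", "-applicable(r1)")
d = Argument("D", frozenset({"illegal(object)"}), "r4", "-valid(contract)")

for atk in (b, c, d):
    print(atk.name, "attacks A by", attack_type(atk, a))
# B attacks A by undermining, C by undercutting, D by rebuttal
```

In a purely abstract AAF all three of these attacks would collapse into the same primitive attack arrow, which is precisely the loss of structure discussed above.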
As a matter of course, the above examples of argumentation formalisms do not exhaust the catalogue of existing approaches. We concentrate on the mainstream, Dungean approach and related research because of its wide acceptance and extensive elaboration in the argumentation community.

So far, no comprehensive, systematic study has been conducted into testing the different formal models of argumentation on the basis of instances of legal reasoning. The evaluation of the existing approaches is carried out in a distributed manner: the authors of a particular model discuss its features on the basis of analysed cases. For instance, one can enumerate the following contributions to the state of the art:

• the application of Abstract Dialectical Frameworks to represent case law [1];
• the application of ASPIC+ to model reasoning with legal cases [11];
• the application of a value-based argumentation model to case-based reasoning [9];
• a special issue of the Artificial Intelligence and Law journal devoted to modeling one case, Popov v. Hayashi, by means of four different formalisms, including a structured argumentation framework with Dungean semantics [10, 19, 28, 31].

However, even in the latter case the formalisms were not compared and evaluated with regard to their transparency, with the intention of increasing the transparency of a machine learning model. This is not necessarily an objection, because the issue of explainability has only recently become a grand topic in AI, and the related research in the legal context is at a preliminary stage. However, the existing gap concerning the assessment of the explainability of these models should be filled.

3. Towards Systematic Research on Explainability of Argumentation Formalisms Applied to Legal Domain

The notion of explainability is vague and multi-faceted; therefore the typical approach to this problem consists in the adoption of certain measurable criteria (both on the side of the user and on the side of the system) that may be verified in experimental research. It is first necessary to delineate the class of legal problems against which the explainability of the models should be tested. In our opinion, the most important context is that of the judicial application of law, not only because it constitutes the most investigated sphere of legal argumentation, but also because of the potential use of argumentation formalisms to explain the functioning of machine-learning-enhanced predictive models.

The judicial application of statutory law consists of five interrelated phases: (1) determination of the validity of a norm that is potentially applicable to the current state of affairs; (2) legal fact-finding: establishing the facts of the case in the process of proof; (3) solving the problems of interpretation of a legal rule; (4) subsumption: determining whether the current fact situation qualifies as an instance of the rule's condition; and (5) determination of the legal consequences of the rule's application [30]. In legal practice these phases are mutually interrelated, but for the sake of model development it is convenient to consider them separately. The tasks solved at each of these stages are naturally modelled as argumentative problems.

A similar point should be made with regard to the application of precedents in Anglo-American law. The process of application of case law may be subdivided into the following stages: (1) initial characterization of the current state of affairs with regard to its legally relevant features; (2) retrieval of precedents that match the characterization of the case at bar; (3) assessment of the similarities and dissimilarities between the current case and the retrieved cases; (4) application of distinguishing argumentation and assessment of counterexamples; and (5) determination of the outcome in the case at bar. In knowledge-based AI and Law research these tasks are best captured with arguments based on knowledge representation structures such as scalable dimensions [5] or binary factors [2], recently extended by the concept of magnitudes [20]. In both legal cultures, argumentation at each stage may involve reasoning with values.
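To illustrate the factor-based layer of this kind of reasoning, the following toy sketch (a deliberate simplification inspired by binary-factor models such as [2]; the factor names and the simple on-point test are our own assumptions) shows how the retrieval and distinguishing stages can be grounded in set comparison:

```python
# A toy sketch of case comparison with binary factors in the spirit
# of [2] (a deliberate simplification; the factor names and the
# on-point test used here are our own assumptions).

def on_point(case_at_bar, precedent):
    """A precedent is (weakly) on point if it shares at least one
    factor with the case at bar."""
    return bool(case_at_bar & precedent)

def distinctions(case_at_bar, precedent):
    """Factors present in the precedent but absent from the case at
    bar: the classic raw material for distinguishing argumentation."""
    return precedent - case_at_bar

case_at_bar = frozenset({"F1_disclosure_in_negotiations",
                         "F6_security_measures"})
precedent = frozenset({"F1_disclosure_in_negotiations",
                       "F15_restricted_materials"})

print(on_point(case_at_bar, precedent))      # True
print(distinctions(case_at_bar, precedent))  # frozenset({'F15_restricted_materials'})
```

Real factor-based systems additionally record the polarity of each factor (pro-plaintiff or pro-defendant) and use a more refined notion of being on point; the sketch only fixes the intuition that comparison and distinguishing are set-theoretic operations over shared and unshared features.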
The reasoning operations at each stage of any model of the application of law may be accounted for as:

• resolving a classification problem (e.g. determining whether a given state of affairs falls under the conditions of a rule or within the scope of a precedent);
• comparing certain objects with regard to certain parameters (as in any case of value balancing, or in the comparison of cases with the use of factors);
• assigning consequences to the result of a classification or comparison (e.g. by means of deductive reasoning, defeasible reasoning, analogical reasoning, etc.).

Therefore, a good explainable model of legal reasoning should be able to provide the user with justification regarding the following issues:

• whether a certain object is subsumed under a certain category, and why (e.g. on the basis of semantic considerations, prior labeling, etc.);
• what scale (metric) is applied to characterize the objects that are subject to comparison, what the values of the parameters of each object on this scale are, and why;
• how consequences are assigned to a certain classificatory decision or result of comparison, and why; it should be noted that each classificatory decision or result of comparison may have default consequences, which may eventually be trumped by other considerations, for instance those following from value-based reasoning.

Let us now consider how argumentation formalisms should be investigated and assessed with regard to the realization of the explainability of reasoning and results in the legal domain. It should be stressed that this assessment may be performed at three levels of generality at least:

• the level of the argumentation framework as a whole; here, in particular, we may investigate whether the basic conceptual scheme of a given system is sufficient to capture the relevant elements of reasoning;
• the level of the application of a certain argumentation semantics; even if we agree that the basic conceptual scheme is appropriate for modeling legal reasoning, we may discuss whether the application of a given semantics is appropriate, for instance with regard to the correctness of the results;
• the level of the modeling of a concrete piece of reasoning; at this level we may, for instance, investigate whether the elements of reasoning expressed in natural language are properly transposed into the elements of the framework.

The problem of criteria of explainability of AI systems has already attracted broad attention in the community. The discussion of these criteria takes place first and foremost in the context of the evaluation of systems based on statistical methods and enhanced with learning mechanisms. From the point of view of those criteria, Argumentation Frameworks are explainable by definition, because they make use of explicit knowledge and reasoning patterns: there is no need to transpose quantitative reasoning into qualitative argumentation, because we already begin with the latter. However, the current developments in the theory of AFs, while enhancing their expressive power, at the same time consist in the introduction of more and more complicated logical and mathematical tools, thus decreasing the transparency of the elements of knowledge bases and reasoning patterns.
Thus it is worthwhile to recall part of the classic set of requirements that were discussed in earlier AI and Law work in connection with the representation of rules and exceptions in defeasible logic systems [26], applied accordingly to the problem of modeling legal argumentation:

• structural resemblance: preserving the structure of knowledge units and argumentation with regard to natural language expressions;
• modularity: formalizing parts of the domain without taking into account the whole domain at the same time (practically important for validation and maintenance, but also increasing explainability because of the limited capacity of an actual user to handle too much information at one time);
• expressiveness: the formalization should be able to capture all distinctions that are important in natural language reasoning.

We think that the above criteria may be fruitfully applied as criteria of explainability of models of legal reasoning based on Argumentation Frameworks. However, we think it is necessary to add another important criterion:

• substantial resemblance: the reasons that justify certain conclusions should be identical, or at least significantly similar, to those accepted by an experienced expert. In other words, not only the structure of reasoning but also its merit (content) should stand in a certain similarity relation between natural language reasoning and reasoning represented in an AF.

The substantial resemblance requirement is more important for explainability than the standard criterion of accuracy of result broadly adopted in ML-enhanced systems. As discussed above, a correct answer may be yielded by a system by accident, through flawed, fallacious reasoning, or on the basis of distorted or false data. An explainable system should be ready to answer why-questions in a manner similar to an expert user.

Taking into account this set of criteria, we may outline the process of testing the explainability of argumentation models for the legal domain:

• the choice of a use case (UC);
• formalization of the knowledge elements present in the UC with the tested argumentation formalisms;
• enabling the systems to generate the conclusion;
• comparison of the conclusions generated by the systems to the ones adopted by expert users;
• careful investigation of each step of the performance of the system with regard to the adopted criteria, with an applied scale (such as a Likert scale or another);
• evaluation of the results and development of sets of postulates with regard to: (1) the applicability of a given Argumentation Framework to modeling legal reasoning; (2) the applicability of a given semantics; and (3) the formalization of a concrete case.
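As a minimal sketch of how such assessments could be recorded (the Likert-based scoring scheme and the data layout are our own assumptions; the criteria and levels are those introduced above):

```python
# A minimal sketch of recording the assessments produced by the testing
# procedure: Likert ratings (1-5) per criterion and per evaluation
# level, averaged over experts. The criteria and levels follow the
# text; this concrete scoring scheme is our own assumption.
from statistics import mean

CRITERIA = ("structural resemblance", "modularity",
            "expressiveness", "substantial resemblance")
LEVELS = ("framework", "semantics", "concrete modeling")

# ratings[formalism][level][criterion] -> Likert scores given by experts
ratings = {
    "AAF":    {"framework": {"expressiveness": [2, 3, 2]}},
    "ASPIC+": {"framework": {"expressiveness": [4, 5, 4]}},
}

def criterion_score(formalism, level, criterion):
    """Average expert rating for one formalism/level/criterion cell."""
    assert level in LEVELS and criterion in CRITERIA
    return mean(ratings[formalism][level][criterion])

print(criterion_score("ASPIC+", "framework", "expressiveness"))  # ~4.33
```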
Several a priori hypotheses can be made at this point, with the reservation that they may be falsified in the course of experimental work. First, structured argumentation systems should be assessed more highly than abstract argumentation frameworks, because legal reasoning essentially involves the analysis of the relation between the premises and the conclusion of an argument, and not only the analysis of conflicts between arguments. It should be noted, however, that AAFs make it possible to map the notion of "argument" onto an element of a natural language argument (and not onto a given argument taken as a whole), which may decrease the importance of this drawback. Second, the formalism should enable the representation not only of attack relations but also of support relations; this hypothesis favors bipolar over classical argumentation frameworks. Third, taking into account expressiveness, structural resemblance and substantial resemblance as criteria on the one hand, and the role of value judgments in legal reasoning on the other, the testing should presumably favor formalisms that use the notion of values explicitly. Fourth, because in legal reasoning we use elements that stem from different sources, and because the "pedigree" of elements is an important factor, the argumentation framework should enable some labeling to express this aspect. Fifth, as the comparison of objects (in particular the weighing of values) involves scalable reasoning, frameworks that enable weighted relations or gradual attacks will be preferred by default. However, the choice of the proper scale and metrics is a complicated issue: too fine-grained a scale may decrease explainability.

4. Conclusions

The current developments of AI systems have more and more influence on the life of individuals and societies. The problem of the explainability of decisions made with the support of those systems, as well as of the procedures they are based on, has become an important social and legal issue. Argumentation Frameworks may in principle be fruitfully used as tools for explanation of the operation of AI systems. However, AFs themselves, as models developed in a complex, diversified and quickly evolving field of research, should be tested with regard to their transparency and explainability. In this contribution we have outlined a general procedure for such testing, subject to future development.

5. References

[1] Latifa Al-Abdulkarim, Katie Atkinson, Trevor J. M. Bench-Capon: A methodology for designing systems to reason with legal cases using Abstract Dialectical Frameworks. Artif. Intell. Law 24(1): 1-49 (2016)
[2] Vincent Aleven: Teaching Case-Based Argumentation Through a Model and Examples. PhD Dissertation, Intelligent Systems Program, University of Pittsburgh (1997)
[3] Leila Amgoud, Claudette Cayrol, Marie-Christine Lagasquie-Schiex: On the bipolarity in argumentation frameworks. NMR 2004: 1-9
[4] Michał Araszkiewicz: Sztuczna inteligencja i prawo do wyjaśnienia (Artificial Intelligence and the Right to Explanation). Trzeci Sektor IV/2018, forthcoming
[5] Kevin Ashley: Modeling Legal Argument. Reasoning with Cases and Hypotheticals. MIT Press, Cambridge, MA (1990)
[6] Kevin Ashley: Artificial Intelligence and Legal Analytics. New Tools for Law Practice in the Digital Age. Cambridge University Press, Cambridge (2017)
[7] Trevor J. M. Bench-Capon: Value-based argumentation frameworks. NMR 2002: 443-454
[8] Trevor J. M. Bench-Capon: Persuasion in Practical Argument Using Value-based Argumentation Frameworks. J. Log. Comput. 13(3): 429-448 (2003)
[9] Trevor J. M. Bench-Capon, Katie Atkinson, Alison Chorley: Persuasion and Value in Legal Argument. J. Log. Comput. 15(6): 1075-1097 (2005)
[10] Trevor J. M. Bench-Capon: Representing Popov v Hayashi with dimensions and factors. Artif. Intell. Law 20(1): 15-35 (2012)
[11] Trevor J. M. Bench-Capon, Henry Prakken, Adam Z. Wyner, Katie Atkinson: Argument schemes for reasoning with legal cases using values. ICAIL 2013: 13-22
[12] Gerhard Brewka, Stefan Woltran: Abstract Dialectical Frameworks. KR 2010
[13] Maximiliano Celmo Budán, Gerardo I. Simari, Ignacio Darío Viglizzo, Guillermo Ricardo Simari: An approach to characterize graded entailment of arguments through a label-based framework. Int. J. Approx. Reasoning 82: 242-269 (2017)
[14] Claudette Cayrol, Marie-Christine Lagasquie-Schiex: On the Acceptability of Arguments in Bipolar Argumentation Frameworks. ECSQARU 2005: 378-389
[15] Claudette Cayrol, Marie-Christine Lagasquie-Schiex: Graduality in Argumentation. J. Artif. Intell. Res. 23: 245-297 (2005)
[16] Phan Minh Dung: On the Acceptability of Arguments and its Fundamental Role in Nonmonotonic Reasoning, Logic Programming and n-Person Games. Artif. Intell. 77(2): 321-358 (1995)
[17] Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, Michael Wooldridge: Weighted argument systems: Basic definitions, algorithms, and complexity results. Artif. Intell. 175(2): 457-486 (2011)
[18] Thomas F. Gordon, Douglas Walton: The Carneades Argumentation Framework - Using Presumptions and Exceptions to Model Critical Questions. COMMA 2006: 195-207
[19] Thomas F. Gordon, Douglas Walton: A Carneades reconstruction of Popov v Hayashi. Artif. Intell. Law 20(1): 37-56 (2012)
[20] John F. Horty: Reasoning with dimensions and magnitudes. ICAIL 2017: 109-118
[21] Lilian Edwards, Michael Veale: Slave to the Algorithm? Why a 'Right to an Explanation' is Probably not the Remedy You Were Looking For. Duke Law and Technology Review 16: 18-84 (2017)
[22] Frans H. van Eemeren, Bart Garssen, Erik C. W. Krabbe, A. Francisca Snoeck Henkemans, Bart Verheij, Jean H. M. Wagemans: Handbook of Argumentation Theory. Springer Science+Business Media, Dordrecht (2014)
[23] Bas van Gijzel, Henry Prakken: Relating Carneades with Abstract Argumentation. IJCAI 2011: 1113-1119
[24] Sanjay Modgil, Trevor J. M. Bench-Capon: Metalevel argumentation. J. Log. Comput. 21(6): 959-1003 (2011)
[25] Søren Holbech Nielsen, Simon Parsons: A Generalization of Dung's Abstract Framework for Argumentation: Arguing with Sets of Attacking Arguments. ArgMAS 2006: 54-73
[26] Henry Prakken: Logical Tools for Modelling Legal Argument. A Study of Defeasible Reasoning in Law. Springer, Dordrecht (1997)
[27] Henry Prakken: An abstract framework for argumentation with structured arguments. Argument & Computation 1(2): 93-124 (2010)
[28] Henry Prakken: Reconstructing Popov v. Hayashi in a framework for argumentation with structured arguments and Dungean semantics. Artif. Intell. Law 20(1): 57-82 (2012)
[29] Antonio Rago, Oana Cocarascu, Francesca Toni: Argumentation-Based Recommendations: Fantastic Explanations and How to Find Them. IJCAI 2018: 1949-1955
[30] Jerzy Wróblewski: The Judicial Application of Law. Springer, Dordrecht (1992)
[31] Adam Z. Wyner, Rinke Hoekstra: A legal case OWL ontology with an instantiation of Popov v. Hayashi. Artif. Intell. Law 20(1): 83-107 (2012)