Formalization and Algebraic Verification of Legal Requirements Alexander Godlevskyi1, Aleksander Letichevskyi1 , Vladimir Peschanenko2, Oleksandr Letychevskyi1, Maryna Morokhovets1, Volodymyr Skobelev1, Maksym Poltorackiy2 {a.godl49, aaletichevsky78, vpeschanenko, marina.morokhovets, mpoltorackiy}@gmail.com, lit@iss.org.ua, skobelevvg@mail.ru, 1 Glushkov Institute of Cybernetics, National Academy of Science of Ukraine, Kiev, Ukraine 2 Kherson State University, Ukraine Abstract. This paper outlines the technology of the formalization and verifica- tion of legal text documents. A formal model of a legal document is presented as an ordered set of rules. Every rule is presented as a triple containing pre- and post-conditions, as well as process parts. All rules are ordered in Use Case Maps (UCM) diagrams. Such a formal model can be verified for inconsistency and incompleteness, which are the causes of gaps and ambiguities in legal re- quirements. Verification is implemented by means of algebraic computation and symbolic modelling. The formalization of legal requirements is performed with the use of an intermediate language. An example of the procedure of regis- tration from the Tax Code of Ukraine is given for the illustration of the algebra- ic method implemented in the Insertion Modeling System (IMS). Keywords: Legal Requirements, Algebraic Computation, Use Case Maps, Basic Protocols, Symbolic Modeling 1 Introduction We examine the problem of the formalization and processing of legal require- ments in the context of Ukrainian law, especially the Tax Code. The adopted laws and other artifacts of law activity often contravene the existing ones, and the causes of inconsistency could be deeply hidden in non-visible dependencies of rules and no- tions. These causes could result from such gaps as incompleteness or when precedent belongs to a situation that is not described in law. The other gap is ambiguities in notions or inconsistency when different consequences are applied to the same circum- stances. The formalization of laws and the use of formal verification methods in the detection of such issues are among the ways to solve this problem. On the other hand, the existence of a formal model of laws allows an analysis of different precedents in terms of compliance with laws. Different interpretations of the precedent often lead to a wrong verdict, and formal proving could be a serious argumentation to cast away or to confirm such a situation. To create a formal model, the existing formal languages used for formalization in different subject domains could be applied. Thus, a formal model of law could be created based on the technology of creating a formal model of requirements for soft- ware represented in natural language (NL). For the last 15 years, significant efforts have been directed toward developing models and methods intended to retrieve and analyze information from text sources [1]. However, the possibility of using this arse- nal in legal document processing is essentially limited to the following circumstances. It is well known that conflict detection plays an important role in the process of the analysis of legal texts. Existing approaches to this research problem for stable and sufficiently well-studied American and European legislation, in fact, are carried out under the following scheme [2].The source text, originally presented in an NL, is converted into some particular controlled natural language (CNL) [3]. The obtained text in the CNL with the use of some grammatical frame (GF) [14] is converted auto- matically into a text in some formal language CL [5,6]. The resulting text in the for- mal language CL is analyzed by various tools of CLAN [7], which can build counter- examples if conflicts are detected. It should be noted that any used language CL must have the potential to represent such concepts used in legal texts as obligations, permissions, prohibitions, etc. It is also known that the inclusion of the time of the events in the analysis of information in the legal text documents is essential to the conclusions in the context of decision- making. Some attempts to implement this aspect were taken in [8-10] by marking time expressions with special labels. Experiments on the analysis of contracts, agreements, and other small-scale legal documents showed that during processing, a number of significant difficulties, mainly in the implementation chain NL→CN→CL arise. In particular, substantial difficulties arise in the processing of synonyms for relevant legal concepts. A much more com- plex situation occurs in the analysis of a country’s legislation. It is well known that legal statements are interrelated and cannot be considered in isolation from each other. At the same time, the relationship between the statements in the legal text are presented by means of cross-references, which significantly affect the processes of structuring the organization, automatic generation of annotations for legal texts, and the extraction of the basic concepts (subjects, rights, duties, etc.). Thus, the detection and resolution of cross-references in legal documents is one of the basic problems. Different approaches to the study of this problem for stable and sufficiently well-studied American and European legislation [11-15] show that alge- braic methods, based on the development and analysis of some calculus [16-20], play a significant role in legal document processing. 2 Technology Overview In the last decade the research in the field of legal document processing has also been conducted in Eastern Europe. It should be noted some attempts to present an XML model of a normative act (law) and to apply the XQuery language to perform applications in the processing of law were undertaken in Bulgaria [21]. Alongside the formal model proposed in [22], these attempts were intended to build a structured document (basic operations include inserting some new element, deleting, updating, and searching of the item). [23], demonstrating the possibility to apply the basic pro- cedures used for the English-speaking legal texts to the analysis of the Polish legal acts. US scientists have attempted to storage and processing of legal information in the format of GJXDM. The main problem was to create a set of specifications that would clearly define the structure of legal information data. The architecture of the regulato- ry legal act was represented in terms of xml, is that in the data dictionary used strictly defined vocabulary of names collected in an object-oriented data model (GJXDM). One of the first expert systems was a system of "JUDITH" which was developed in 1975 in collaboration with experts from Heidelberg and Darmshtadsk universities. This expert system allows the lawyers to obtain expert conclusions in the field of civil law. Data of knowledge system consists of preconditions and executive files which indicate on the result that exist between sets of premises. Expert system "DSCAS" helps to analyze the law aspects of claims for additional costs compensation con- cerned with the difference in physical conditions of the place for building of the con- tract specified. Expert System provides the state employee with knowledge for law- suit decision. Expert System DSCAS helps to analyze legal aspects of claims for addi- tional costs related to the physical conditions award in case of construction specified in the contract[29] Legal analysis system helps the lawyers to conduct any legal analysis to deliberate crime cases. The legal knowledge theory, the facts of case are presented in the form of semantic networks. The system gives its user some findings, including its logic to the conclusion on which they are based. It substantiates the conclusion by links to judg- ments and supports legal documents. Quite powerful expert system is SARA, which helps lawyers to analyze decisions based on discretionary rules. The user tells the system some factors and their meanings. The system assigns weights of all the factors referring to regulation acts. Factors and solutions are presented in the form of frames. An iterative correlation method is used to calculate these weights [30-31]. One of the most powerful systems of such class is TAXAMAN-I and TAXAMAN-II systems. These systems specialize in tax law of England; the software tool architecture is implemented on the directory base. Designed intelligent system contains the output mechanism that is built on the model of logic programming. The output mechanism is built on the model of logic programming. The system operation begins from an attempt to do the regulation of a high level by checking whether it was true premises. The system tests the truthfulness of such premises examining whether these backgrounds are true prerequisites. The process returns if it comes to a "deadlock". This process is called also as one that works on "inverse chain considerations" base, according to which the hypothesis trying to find the original data has to be proved. Inference mechanism can work on the base of "direct chain of reasoning" in which arguments are conducted on the output data to the hypothesis. Subsystem of logical mechanism allows to process the information from its user and compare it with knowledge that is in the Base of knowledge. The user has to choose only one of the options, comparing the circumstances of the crime filed with components that law- makers put in the law. All expert systems are based on general law positions that in some way were structured and formalized, and also based on the personal perception of law, the legal system or legal norms, logic argumentation. In perspective, the expert legal systems can be used effectively in practice and systematization of legislation to solve the fol- lowing important tasks: 1. Identifying of conflicting legal norms by expert interpretation. 2. To identify and eliminate legal gaps using similar rights or similar law; 3. Informal interpretation of unclear defined regulations, rules, concepts, principles. The difference between EU "Byzantine law and acts" will certainly be to their own combination of expert opinions and estimates presented in audio. Combining different in its orientation methods and techniques of implementing information approach to learning objects mark the way to the discovery of new knowledge, previously inac- cessible researcher in the empirical research in Byzantine and Slavic studies. Accord- ing to the planned development of software support, which will support the storage and use of expert character data presented in both text and audio and video forms. In this regard, the main thing - the possibility of information-analytical finding and sys- tematization of current scientific knowledge the information panel, which would meet modern standards of classification of scientific and technical staff of critical editions of historical sources [31]. Reference legal system "Garant" was developed in 1990. The system is provided with several exclusive tools. Among them: The system contains the documents of the Russian legislation in the English ver- sion; It helps to find the old version of the document and to refer to other documents of the time with the help of a click on the desired date in the comment; Function "Related Documents" helps to place the hyperlinks to the documents of interest to you, as well as to examine documents that are similar to yours; Chatting "online meeting" can correspond with colleagues and send links in real time; Legislation in the schemes "section - a" skeleton "of the Russian legislation; In the "Home Legal Encyclopedia you can find answers to common questions "Guarantor" of the system is designed to work with the information base. This is the work with class information, search for documents, database updates. This box contains information of interest primarily economists and accountants. This block includes "tax calendar", informing about payment terms, the number of working days in the period information, and so on. Etc. Search unit, this block contains a variety of search tools in the document infor- mation database; update block, in this block, the user can obtain statistical infor- mation about the information database and used to update a database, if neces- sary[33]. Bank "Expert application" includes special documents of the President, the Gov- ernment, the Federal Assembly, in particular on: the state property and privatization, funding, government, legislative activities on international relations, staffing and other issues. Included narrow instruments of government, presented documents of Russian ministries and departments, including the Ministries of Finance, Central Bank, State Customs Committee, State Tax Service, the State Property Committee. The base system - documents relating to specific industries, specific areas and organi- zations, regulatory, technical and organizational regulations[30-31]. The first electronic filing cabinet for computer search of legal information was Credoc Belgian system, which made it impossible to read the full text of documents, but only with details (name, number, date of publication and so on). It consists of a bank of electronic documents [29]. One of the most famous full-text reference legal systems of the United States - LEXIS. Since 1980, the system became available to UK users, and from 1985 - Aus- tralia. It is responsible for more than 20 thousand requests daily, basis of information gathered regulations, state and federal acts, including the full text of the US Constitu- tion and all legal precedents USA [30]. Specifically focused on identifying and modeling relationships in a criminal group it allows you to record information that reflects their quality characteristics (corrup- tion, family ties, etc.). Based on this analysis it cannot find strings that come from the immediate perpetrators to the organizers and sponsors of the crime [30-31]. A much more difficult situation occurs while processing legal documents in Ukraine, where legislation is only now being set and is subject to frequent changes. In contrast to the United States at this point, in Ukraine, there is no document like the Bluebook [24] or the US Association of Legal Writing Directors’ (ALWD) Citation Manual [25], and the variety of cross-references (including the implicit ones) in Ukrainian law requires a thorough analysis. Moreover, the implementation of the chain NL→CN→CL in addition to the abovementioned challenges, requires resolution of a variety of difficulties associated with the peculiarities in Ukrainian language grammar. This paper attempts to apply the technique of insertion modeling [26] and an algebraic approach, previously used for the verification of telecommunication models, to the analysis of legal texts, in this case to the Tax Code of Ukraine. 3 Formalization and Creation of Algebraic Model of Tax Code of Ukraine Insertion modeling considers the interaction of agents and the environment. Agents are characterized by types, each of which having its own set of local attributes. The environment also contains global attributes. An interacting agent or agent inserted into the environment has a behavior that is described by the insertion function. This function is given by the rules of changing the states of agents and the environment. These rules describe local interactions and transitions that are called basic protocols [27]. Every basic protocol has a form x ( U(x)
V(x)), where x is a list of pa-
rameters, U(x) – a precondition of basic protocol that defines the condition when the
protocol can be applied, V(x) – a postcondition that defines transition of a system to a
new state, and P – a process that illustrates the given transition. U(x) and V(x) are
formulae of first-order logic constructing from predicates over integer, real, enumer-
ated, and symbolic types, while P is a sequence of local actions and message passing.
The state of a system is represented by means of simple attributes and attributes with
parameters (functional attributes).
The basic concepts of insertion modeling are suitable to represent the notions and
ideas used in the formulation of Tax Code articles. Laws are considered as require-
ments for the functioning of public institutions. These institutions constitute environ-
ment for taxpayers. There are executive and legislative bodies functioning within the
environment that impose requirements on the operation of the agents: the various
registries, supported by authorities, are considered as global attributes of the environ-
ment.
Authorities, firms, and businessmen (legal entities and persons) are acting as
agents, interacting with each other and with the environment. However, the formaliza-
tion of legal texts for further processing based on insertion modeling techniques has
its own specifics. For example, the type of agent representing a taxpayer has the fol-
lowing attributes: name, set of activities, tax payment mode, status of payer, and oth-
ers. Attribute values are changed by basic protocols that simulate the articles and
clauses of laws regulating the legal actions of legal entities and authorities.
Document circulation and payments are simulated through a message passing
mechanism between corresponding agents, where the types of messages are state-
ments, requests, different templates.
Modeling of processes in time is carried out by two known mechanisms: timers
and a global clock. Timers are used for the simulation of those regulations of the Tax
Code, which are governed by restrictions similar to “answer must be given no later
than 5 days after.” The global clock is to simulate restrictions associated with the
implementation of actions, such as the timely payment of taxes.
Among the attributes of the agent, we could distinguish the agent status attributes
of an enumerated type, specifying the variety of current agent modalities (obliga-
tions/restrictions/permissions).
For example, when an agent made another deal and increased its income, it may
exceed a certain limit that obliges it to execute an additional registration.
Having applied for registration, it goes to an intermediate state, where the obliga-
tion is removed, but must wait for receipt of a positive response from the supervisory
authority; then, the agent state will change to the “registered“ state that allows/obliges
it to perform certain actions.
4 Example of Registration Procedure from Tax Code
As an illustration, below is considered a formal model of the procedure of the regis-
tration of a person as a payer of Value Added Tax (VAT), defined in articles 180-183
of the Tax Code of Ukraine. This model is a fragment of the general model that is
being developed for the Tax Code.
The source for model development is the texts of the Tax Code articles in natural
language. It anticipates the use of an intermediate language close to the natural. In this
language, the legal items are reformulated in the form close to “if ... then ...” that is
convenient for the creation of the next formal presentation as basic protocols.
Accordingly, in the development of the Tax Code model, one can identify the
stages when a list of global attributes, types of agents, and their local attributes is
formed. It is implemented with an exact reference to the Tax Code items or other laws
referred therein. Of note is that the elements of environments and agents could be
refined at later stages due to a better understanding of the essence of items. The next
stage is the constructing of basic protocols that describe admissible action of agents.
Below, we present as example some items of article 183 in an intermediate lan-
guage that will be used afterwards to create a formal model defining the explicit set of
attributes. These items specify the actions of a person that should be registered as a
taxpayer and the actions of local authorities.
We consider the interaction between agent Person with agent State Authority.
Rule 1. If Person (1.1) is not a single taxpayer (1.2) whose total income during
last 12 months is more than 1,000,000 UAH, then Person (1.3) must be registered as a
VAT payer.
Rule 2. . If Person (2.1) to be registered as a VAT payer (2.2) did not submit the
application to State Authority before time T_CRITICAL, then Person (2.3) is respon-
sible for a tax payment violation.
Rule 3. If Person (3.1) to be registered as VAT payer during (3.2) the period be-
fore T_CRITICAL is not completed, then Person (3.3) must submit the application to
State Authority before T_CRITICAL.
Rule 4. If Person (4.1) must submit the application to State Authority, then Person
(4.2) submits an application to State Authority.
Rule 5. If State Authority (5.1) has the application from Person, (5.2) there are no
obstacles to refuse in registration; then, State Authority (5.3) registers Person as VAT
payer.
5 Formal Model of Registration Procedure and its Verification
We consider Person and State Authority in a given procedure as types of interacting
agents. Type of agent Person has attributes:
SingleTaxPayer – attribute of Boolean type that is equal to TRUE when it is a sin-
gle tax payer;
MustRegisteredVAT – attribute of Boolean type that presents the status of Person
when Person should be registered as VAT payer;
TotalIncome(t1,t2) – total income obtained by Person from time t1 to t2. It is a
functional attribute, which maps (t1,t2) to an integer;
SubmitApplication – attribute of Boolean type that presents the status of Person
when the application is submitted;
T_CRITICAL – time when the application shall be submitted. It is an integer-type
attribute;
Responsibility – attribute of Boolean type that signalized responsibility of Person
for wrong tax payment;
ShouldSubmitApplication – attribute of Boolean type that signalizes the necessity
of application submitting;
NoObstacles – attribute of Boolean type that signalizes the absence of obstacles
for registration as VAT payer.
The initial state of all agents could be presented as the below formula:
~SingleTaxPayer ~MustRegistredVAT ~SubmitApplication
~Responsibility ~ShouldSubmitApplication
We abstracted from other values of attributes considering it arbitrary. Time is con-
sidered as some integer attribute t changing during Person activity. The changing of
time is fixed in the post-condition as the assignment statement t := t + a, where a is a
parameter of the basic protocol. We consider days as the unit of time.
The order of events of the registration procedure could be presented as UCM nota-
tion, which is a subset of User Requirements Notation (URN) language, standardized
by ITU (Z.151). The UCM diagram is an oriented graph containing constructs, such
as its nodes with “responsibilities,” branches OR-fork and OR-join, references to
other diagrams, and synchronization constructs. Basic protocols could also be pre-
sented without order and correspondingly without UCM. Below is the procedure of
registration in UCM notation (Fig. 1).
Fig. 1. Example of Registration Procedure presented as UCM-diagram
In the nodes of “responsibility” constructs, there are transitions associated with
basic protocols. The list of basic protocols for agent P of type Person and agent A of
type Authority is given below:
BP1 (1.1, 1.2, 1.3). a, (a > 0) ~SingleTaxPayer (TotalIncome(t - 365,t) >
1000000) (t > 365)