Towards Formalizing Statute Law as Default Logic through Automatic Semantic Parsing

Marcos Pertierra, MIT CSAIL, marcosp@mit.edu
Sarah Lawsky, Northwestern Pritzker School of Law, sarah.lawsky@law.northwestern.edu
Erik Hemberg, MIT CSAIL, hembergerik@csail.mit.edu
Una-May O'Reilly, MIT CSAIL, unamay@csail.mit.edu

In: Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2017), June 16, 2017, London, UK. Copyright © 2017 held by the authors. Copying permitted for private and academic purposes. Published at http://ceur-ws.org

ABSTRACT
Tax regulations and statutes are long, complex, and difficult to understand, and thus present the opportunity for undetectable legal avoidance. Our project goal is to facilitate a new approach to statute composition wherein a logic representation of existing law would be extended and checked before its translation to natural language. We envision a software pipeline that would automatically parse a requested section of the Internal Revenue Code (IRC) and accurately express it with a default logic representation. Herein, we evaluate the effectiveness of an end-to-end assembly of existing software tools. This pipeline uses regular expression search on the Code's common structural text patterns and conducts semantic parsing with various open-source natural language parsers. Using IRC Section 163(h), which we have manually expressed in default logic, we evaluate the resulting intermediate logic representations. We observe that the semantic complexity of tax regulations overwhelms the parsers' capabilities. Their shortcomings will have to be addressed as a prerequisite to a component that will, starting from the intermediate logic, automatically express the default logic.

KEYWORDS
tax, semantics, parsing, default logic

1 INTRODUCTION
United States tax law, as represented in the Internal Revenue Code (IRC) and its accompanying regulations, is notoriously complicated. This complexity increases the cost of tax compliance. Even more alarming, both individuals and corporations take advantage of the law's complexity to reduce their taxes by engaging in legal avoidance. Legal avoidance describes actions that are technically legal but that do not fall within legislators' intentions. Fundamentally, we are interested in the way in which tax law fails to prevent legal avoidance. Our central research question is how formulations of statutes and regulations can be improved to reduce legal avoidance.

Tax avoidance occurs because taxpayers are able to exploit ambiguities within the law or take advantage of disparate legal treatment of similar concepts (i.e., engage in regulatory arbitrage) to reduce their tax owed in a way that is legal but is unintended by the law. Because it is almost impossible to foresee avoidance by manually examining tax regulations and statutes, multiple law and technology projects have modeled the law and used artificial intelligence to enlist computers for automatic reasoning, see [1, 17, 19, 24, 25]. Approaches vary in whether they manually or automatically interpret the meaning of the IRC text, and whether they resort to an ad hoc representation of it or to different logical formalisms. Modeling tax law is challenging because tax law is represented in the IRC with natural language. Natural language is obviously useful for humans to read and interpret. However, it is very difficult for computers. Thus, a critical step is to convert the law from its natural language representation to some formal representation that a computer can read and use for inference, i.e., to semantically parse it.

Our project proposes to: (1) Employ a version of default logic (DL) to represent a portion of tax law relevant to the study of avoidance. This encompasses the work of Lawsky [14], which advocates formalizing tax law, and philosophical logic [7] presenting a version of DL that may be particularly well suited to formalizing the tax law. (2) Provide a software system that collaboratively helps a legal expert to reason about legal logic in this new formalism. We envision designing software that translates and represents relevant sections of the IRC or proposed new statutes as DL to whatever extent is feasible. In infeasible circumstances, the system teams up with its expert to accomplish tasks more efficiently and with improved ease. The expert is expected to query the logic system they have set up to determine whether avoidance is possible under its meaning.

We defer our rationale and description of our DL to prior contributions [14, 15]. Here, we focus on automation technology. To date we have conceptually designed an end-to-end IRC-to-default-logic¹ pipeline that translates IRC code in XML into supernormal DL, see Figure 1. We have interfacing components which allow us to focus on two stages independently. The goal of the first stage is to automatically parse the text of the relevant regulations into an intermediate logic representation. In this contribution we report our progress in implementing the first stage. The aim of the second stage is to accurately transform the intermediate logic representation into the background theory and ordered default rules that are supported by a theorem prover, allowing queries and propositions to be tested around taxation concepts such as deductibility. To date, we have assembled a solver and set it up to handle simple examples of our DL, but have deferred the most challenging task of accurate transformation.

1 https://github.com/mpertierra/irc_to_default_logic
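As a toy illustration of the pipeline's starting point — isolating a single level from an XML rendering of the Code — consider the sketch below. The element names, attributes, and document structure here are invented for the example and do not match the actual IRC XML schema or the project's irc_crawler.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML rendering of part of Section 163.
# Tag and attribute names are assumptions made for this sketch.
XML = """
<section num="163">
  <subsection num="a">
    <heading>General rule</heading>
    <content>There shall be allowed as a deduction all interest paid or
      accrued within the taxable year on indebtedness.</content>
  </subsection>
  <subsection num="h">
    <heading>Disallowance of deduction for personal interest</heading>
  </subsection>
</section>
"""

def find_level(root, level, num):
    """Return the first element of the given level type with a matching num."""
    for el in root.iter(level):
        if el.get("num") == num:
            return el
    return None

root = ET.fromstring(XML)
sub_a = find_level(root, "subsection", "a")
print(sub_a.find("heading").text)  # -> General rule
```

A real crawler would additionally track the hierarchical nesting (section, subsection, paragraph, and so on) so that a rule's chapeau, conditions, and continuation can be recovered.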
Figure 1: Project pipeline, script: pipeline.py.

For the first stage, parsing the regulations, we have taken two specific strategies: (1) We exploit the style guidelines for drafting legislation [18] to extract definitions and rules from the IRC text by pattern matching with regular expressions, a relatively simple and semantically superficial approach. (2) We leverage existing natural language processing semantic parsers to extract formal representations from text. These currently expect relatively simple input sentences. As a result, we must evaluate their capacity. Our evaluation is on the definitions, rules, and DL representation of elements of Section 163(a), previously identified by an expert [15].

The contributions of this paper are: (1) The introduction of our project with its goals and approach, encompassing a version of DL supporting the expression of tax statutes relevant to our focus on legal avoidance. (2) The introduction of IRC-to-default-logic, open source software that automates our approach. (3) A demonstration and evaluation of IRC-to-default-logic. This has revealed the merits and open issues arising from stepping off with existing software tools.

In § 2, we discuss related work. In § 3, we present our method. In § 4, we present the results. In § 5, we conclude.

2 RELATED WORK
To express natural language completely and precisely in a formal representation upon which a computer can meaningfully act, we need to capture its semantics. This can be partially accomplished by pattern matching and by semantic parsing. We also face the question of what formalism is best suited as a target output representation. We review: § 2.1, text extraction and ad hoc representations; § 2.2, formalisms for representing law, as well as the version of DL we use; § 2.3, relevant existing semantic parsers.

2.1 Extraction & Ad Hoc Representation
Previous work has focused on pattern-based rule extraction from law, see e.g. [25]. These efforts have typically focused on extracting higher level elements from text, such as exception phrases, rather than translating to a formal representation. A tax avoidance project named Stealth does manual formalization and translation [11] and uses an ad hoc rule-based representation that supports tax calculations. The Tax Knowledge Adventure [1] ontology reuses the WordNet and LKIF-Core ontologies for a set of terms extracted from the "open text" of the IRC and tax resources. IRC sections 301, 302, and 317 are represented as concepts in the ontology. Rules are "too complicated" for OWL assertions; instead, rules are class member functions in an object oriented programming language.

2.2 Formal Representation of Statute Law
Standard formal logic is not the best representation to accommodate statutory reasoning. One better choice is defeasible reasoning, i.e., reasoning that may result in conclusions that can be defeated by subsequent information [14]. This reasoning is modeled by DL in [12], a non-monotonic logic. A metarule is required in the DL system to indicate how to reason about apparently conflicting statutory rules. The DL formalization fits with the IRC structure, making it easier to accurately express the statutory meaning. Much of this meaning can be found by paying attention to the level-based style or structure of the IRC, e.g. general rules are followed by exceptions.² There may be a variety of different interpretations of law, depending on the precise question one is asking. DL's formalization provides appropriately different answers depending on the priority the formalizer gives to the various rules.

There have been various attempts to formalize legal text, whether via some programmable representation, an ontological representation, or some other semi-formal representation that is not tied to any implementation. For example, Sergot et al. [19] translated the British Nationality Act into Prolog [8]. This entailed manually extracting the meaning from the Act and then programming the Prolog rules. Using the logic of Prolog presented difficulty because the British Nationality Act expresses non-monotonic logic [19]. Other work has explored the use of non-monotonic logics, e.g. Defeasible Logic [24], to express the law; this is tested on a selection of Section 8.2 of Australia's Telecommunications Consumer Protections Code (2012) on complaint management.

2.3 Semantic Parsers
Another body of work has focused on automatic translation of tax law to formal representation using semantic parsers. A semantic parser takes text as input and outputs a formal representation, e.g. first-order logic. Semantic parsers are usually initialized with machine learning. They are trained on pairs of sentences and their corresponding logical representations. Much work focuses on training models for specific domains. Others are trained on a variety of corpora to achieve wide coverage. Semantic parsers include:
1. C&C/Boxer: combining the C&C tools [6]³ and Boxer [5]⁴.
2. JAMR: a graph-based parser [9] for Abstract Meaning Representation (AMR) [4].
3. Cornell AMR: a CCG-based parser for AMR [3].
4. Cornell SPF: a semantic parsing framework that uses CCG to implement various algorithms [2].
5. CAMR: a transition-based AMR parser [22].
6. NL2KR: a platform with a CCG-based parser [21].

The lack of a large machine learning data set available for law texts (training pairs of input text and output formal representation) makes it difficult to train a semantic parser specific to legal text. In [17], McCarty proposes a semi-supervised learning approach based on word embeddings computed from legal texts that could potentially be used to overcome this problem; however, this is still theoretical and has yet to be implemented. Some research has taken a different approach by experimenting with wide-coverage semantic parsers. Work by Wyner et al. [23] shows that this is not entirely unreasonable for short, simple sentences from legal texts, using the C&C/Boxer parser. Gaur et al. [10] attempt the same task with their own semantic parser, NL2KR. Liang and Potts [16] seek to show that the distinction between logical and statistical approaches is being closed with the development of models that can learn the conventional aspects of natural language meaning from corpora and databases.

2 The IRC does not necessarily follow the recommended structure.
3 Consists of a POS-tagger, Named-Entity Recognition, and a CCG [20] parser.
4 Maps CCG derivation output to Discourse Representation Structure [13].
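To make the prioritized default reasoning of [7] concrete, here is a minimal propositional sketch. Literals are strings, "-p" negates p, and the greedy skip-on-inconsistency construction and the toy tax literals are illustrative assumptions of this sketch, not the project's implementation or the full formalism of [7].

```python
# Minimal propositional sketch of prioritized supernormal default logic.
# A supernormal default ( : p / p ) simply proposes its conclusion p;
# rules are tried in priority order and kept only if consistent so far.

def negate(lit):
    """Return the complementary literal of lit."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def extension(background, ordered_defaults):
    """Extend the background theory with each default's conclusion,
    highest priority first, skipping conclusions whose complement
    is already in the theory (a toy stand-in for a consistency check)."""
    theory = set(background)
    for conclusion in ordered_defaults:
        if negate(conclusion) not in theory:
            theory.add(conclusion)
    return theory

# Toy Section 163 flavor: the higher-priority exception (personal
# interest is not deductible) blocks the general rule (deductible).
background = {"interest", "personal_interest"}
defaults = ["-deductible", "deductible"]  # exception ranked above general rule
print(sorted(extension(background, defaults)))  # -> ['-deductible', 'interest', 'personal_interest']
```

A real system replaces the literal-complement test with a theorem prover over first-order formulas, which is the role Mace4 and NLTK's inference package play later in this paper.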
3 METHOD
We have designed IRC-to-default-logic, an open source software pipeline, see Figure 1. The top level code, pipeline.py, takes as input the number of a specific section (or lower level unit) of the IRC to be parsed and a desired final representation. The pipeline consists of 4 functional modules:
1. Crawl: irc_crawler.py. Input: XML formatted IRC code. Output: a pipeline level data structure that mimics the level structure of the IRC.
2. Extract: definition_extractor.py & rule_extractor.py. Output: first-order logic of definitions and text segments of rules.
3. Parse: candc_boxer_api.py and parse_amr.py. Input: a "sentence", i.e. a text segment. Output: an intermediate formal representation.
4. Order: Output: DL.

The modules in IRC-to-default-logic are:

1. Crawl. This module references the IRC in its XML format and isolates an element at any specified level. We represent the IRC in terms of abstract elements called "levels". Each level can be one of the following: section, subsection, paragraph, subparagraph, clause, sub-clause, item, sub-item, sub-sub-item. These are in hierarchical order; each level can be nested in levels that precede it. Each level can optionally contain any of the following: heading, chapeau, content, sub-levels, continuations. The heading indicates that the paragraph states a general rule. The chapeau sets up the beginning of the rule. The sub-levels, in this case sub-paragraphs, provide conditions on the contract mentioned in the chapeau, and the continuation states the conclusion of that rule.

2. Extract. This module pattern matches and extracts (1) defined terms, using a single regular expression; (2) definitions, given the retrieved terms, using a single regular expression; and (3) "general rules", "exceptions", and "special rules", by searching for levels with headers matching those terms. It outputs defined terms and first-order logic expressions representing ontological relations between defined terms. It outputs text for extracted rules.

The regular expressions are based upon the style guidelines for drafting legislation in a manual entitled "House Legislative Counsel's Manual on Drafting Style" [18]: (1) General rule – state the main message. (2) Exceptions – state the persons or things to which the main message does not apply. (3) Special rules – describe the persons or things – (a) to which the main message applies in a different way; or (b) for which there is a different message.

The manual lists three phrases that are generally used to "lead in" a definition: "For purposes of this [provision]", "In this [provision]", and "As used in this [provision]", where [provision] is a placeholder for the level type, such as "paragraph". The manual also indicates that drafters should begin a definition (after the lead-in) with the phrase "the term". The word or phrase following "the term" will specify the type of definition. There is no explicit guideline in the manual for what this should be. However, through using regular expressions, we have observed that the vast majority of definitions use the word "means" after the defined term. Other less common phrases used instead are "has the meaning", "includes", "does not include", "shall include", and "shall not include".

3. Parse. This component parses initially with C&C/Boxer. It displays the result in both Discourse Representation Structure (DRS) and first-order logic. The parser will fail on sentences that are too long (of which there are a few in the IRC). CAMR is called if C&C/Boxer fails, and outputs the result in AMR.

4. Order. We use NLTK's logic package to parse and represent the first-order logic expressions that make up the background theory and default rules of our supernormal DL. We use NLTK's inference package to access the library's theorem prover, as well as to interface with the Mace4 model builder, which we use to process default rules and query the supernormal DL.

After the pipeline completes (1–4), it is possible to query and prove the resulting DL with default_logic.py. This will reference the background theory and default rules. The background theory is a set of first-order logic expressions that express any information that is already established. The default rules are a list of rules, expressed using first-order logic, that are ordered by priority. These default rules are processed, starting with the highest-priority rule, and we extend our theory using these rules. We stop processing default rules once a default rule being processed is inconsistent with our current theory [7]. The formal representation we use is supernormal DL, a variant of DL particularly suited for statutory law, as detailed in [15].

4 EXPERIMENTS
Our experimental starting point is a set of elements in Section 163 that have been represented as DL in [15]. We worked first to achieve as complete and accurate extraction and parsing as possible of this set, using the expert translation as ground truth. We then applied the regular expression(s) we used in extraction to the rest of the IRC. We are interested in how many relevant rules and definitions can be parsed and represented in the entire IRC. More broadly, we are looking for hints as to how much knowledge beyond the IRC we may need. For example, "interest" is not defined in the IRC but its definition is important. For that extra knowledge, we are interested in learning how much may need to be elicited from experts versus from another digital resource. We present our experiments following the IRC-to-default-logic pipeline.

4.1 Definition & Rule Extraction
The pattern-based approach has shown some promising results. Although the lead-ins listed in Section 3 are used in many sections of the IRC, we find that the majority of definitions do not follow the style manual. The pattern

the term (?:("[^"]+")|('[^']+'))⁵

retrieves 4971 matches in the entire IRC (not including notes, repealed sections, and omitted sections). Some of these matches are not found in definitions and are instead just references to a defined term. When we refine the pattern to

the term (?:("[^"]+")|('[^']+')) (?:means|includes|does not include|has the meaning|shall include|shall not include)

we retrieve 4710 matches in all of the IRC, about 94.75% of all occurrences of the first pattern.

5 See https://docs.python.org/2/library/re.html
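The two extraction patterns above can be exercised directly with Python's re module. The sample sentence below is invented for illustration, in the drafting style the manual describes; the patterns themselves follow the paper, with \u2018/\u2019 standing for the curly quotes the Code uses alongside straight quotes.

```python
import re

# First pattern: find a defined term in straight or curly quotes.
TERM = re.compile(r'the term (?:("[^"]+")|(\u2018[^\u2019]+\u2019))')

# Refined pattern: require one of the definitional lead-out phrases,
# filtering out mere references to a defined term.
LEADIN = re.compile(
    r'the term (?:("[^"]+")|(\u2018[^\u2019]+\u2019)) '
    r'(?:means|includes|does not include|has the meaning|'
    r'shall include|shall not include)')

# Invented example sentence for illustration.
text = ('For purposes of this subsection, the term "personal interest" '
        'means any interest allowable as a deduction...')
m = LEADIN.search(text)
print(m.group(1))  # -> "personal interest"
```

Note that the refined pattern accepts a match like `the term "X" means ...` but rejects a bare cross-reference such as `the term "X" shall apply`, which is exactly the 4971-to-4710 filtering effect reported above.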
Because the number of tokens in a sentence is a limit for the parsers we use, we also count the number of tokens we retrieve for exceptions, special rules, and general rules. Figures 2a and 2b show histograms of the number of tokens found in each rule (Figure 2a) and definition (Figure 2b), as well as some statistics of token counts for the extracted rules and definitions⁶.

Figure 2: Output from our analysis scripts for IRC rules and definitions. (a) Frequency of tokens in IRC rules: three rule categories with, respectively, 423 sentences (min 4, median 48, mean 64, std 54, max 509; outlier cutoff 300, 3 outliers, 91 sentences crash); 904 sentences (min 3, median 45, mean 55, std 46, max 882; 2 outliers, 135 crash); and 1136 sentences (min 2, median 51, mean 65, std 55, max 471; 13 outliers, 286 crash). (b) Frequency of tokens in IRC definitions: 4701 sentences (min 6, median 50, mean 63, std 59, max 1863; outlier cutoff 300, 28 outliers, 1117 crash).

4.2 Semantic Parsing
Subsection 163(a) is a general rule, as indicated by its header "GENERAL RULE". It states: "There shall be allowed as a deduction all interest paid or accrued within the taxable year on indebtedness." This sentence is one of the shortest sentences in the entire section, and C&C/Boxer and CAMR were unable to correctly parse it. They generated different outputs. The DRS output generated by C&C/Boxer is shown in Figure 3a and the converted first-order logic in Figure 3b. From Figure 3a, we see that the operands of the "or" were wrongly parsed into "There shall be allowed as a deduction all paid" and "accrued within the taxable year on indebtedness", as indicated by the two boxes separated by the | symbol. However, the two innermost boxes separated by the -> symbol bear some resemblance to a simpler statement of the rule that if x is an interest then x is deductible. Figure 3c shows CAMR's output in AMR representation and Figure 3d the converted first-order logic. From Figure 3c, we see that CAMR's output is an invalid AMR graph, as the x4 node contains two edges with the same relation ARG1. It also misrepresents the "or" operation, as its operands have concepts "all interest", "pay", and "accrue". The "or" operation should have operands "pay" and "accrue", and should be modifying "all interest". However, it did capture the fact that a deduction should be allowed, and the representation is less convoluted than that of C&C/Boxer.

Figure 3: Parser outputs for section 163, subsection a. (a) C&C/Boxer DRS output; (b) C&C/Boxer FOL output; (c) CAMR AMR output; (d) CAMR FOL output.

4.3 Default Logic Representation
Figure 4 shows the DL representation for Section 163(h), used for determining whether interest, such as personal interest or qualified residence interest, is deductible. The background theory includes relations between defined terms, extracted by definition_extractor.py. Not all of these expressions in the background theory are necessary for determining whether personal interest and qualified residence interest are deductible. Also, the expression personal_SPACE_interest(y) is one that we hand-coded, as it represents the query that one might inject into the background theory to ask the DL theory whether or not "personal interest" is deductible. However, the expression all x.(personal_SPACE_interest(x) -> interest(x)) also had to be hand-coded; the term "interest" is not a defined term, and so this relation was not extracted by definition_extractor.py. The default rules shown were also hand-coded, as our semantic parser approach was not able to parse the sentences corresponding to these rules.

6 Note that crashes in IRC-to-default-logic come from C&C/Boxer calls.
Figure 4: DL representation's background and default rules, from pipeline.py, for Section 163, subsection (h).

5 CONCLUSIONS & FUTURE WORK
In this paper we have presented software tools that make use of existing work in the field of semantic parsing, and have shown how we exploit the drafting style of the law to make use of regular expressions for extracting rules and definitions. We take our first step towards automatically converting the law from its natural language representation to a DL that a computer can easily read and use for inference.

We extract formal representations from definitions and rules by exploiting common structural patterns in the IRC. We are able to extract some simple formal representations from definitions in certain sections. However, extracting rules is not straightforward; most rules do not have an easily exploitable pattern and require deeper analysis.

We make use of existing natural language parsing algorithms that are capable of extracting formal representations from text. These parsers will only work adequately when the input text consists of short, simple sentences. Unfortunately, tax law in its textual representation consists of long, complex sentences that are difficult even for humans to understand. As a result, this approach has been only marginally successful.

As future work, we will investigate whether software could handle a large number of cases while flagging ambiguities it encounters. These ambiguities could be passed to an expert for assistance. This task could be sourced to law students or a pool of experts.

REFERENCES
[1] Yoo Jung An and Ned Wilson. 2016. Tax Knowledge Adventure: Ontologies that Analyze Corporate Tax Transactions. In Proceedings of the 17th International Digital Government Research Conference on Digital Government Research. ACM, 303–311.
[2] Yoav Artzi. 2016. Cornell SPF: Cornell Semantic Parsing Framework. arXiv:1311.3011.
[3] Yoav Artzi, Kenton Lee, and Luke Zettlemoyer. 2015. Broad-coverage CCG Semantic Parsing with AMR. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Lisbon, Portugal, 1699–1710. http://aclweb.org/anthology/D15-1198
[4] Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2012. Abstract meaning representation (AMR) 1.0 specification.
[5] Johan Bos. 2008. Wide-coverage semantic analysis with Boxer. In Proceedings of the 2008 Conference on Semantics in Text Processing. Association for Computational Linguistics, 277–286.
[6] Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a CCG parser. In Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics, 1240.
[7] Gerhard Brewka and Thomas Eiter. 2000. Prioritizing default logic. In Intellectics and Computational Logic. Springer, 27–45.
[8] William Clocksin and Christopher S. Mellish. 2003. Programming in PROLOG. Springer Science & Business Media.
[9] Jeffrey Flanigan, Sam Thomson, Jaime G. Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the abstract meaning representation.
[10] Shruti Gaur, Nguyen H. Vo, Kazuaki Kashihara, and Chitta Baral. 2014. Translating simple legal text to formal representations. In JSAI International Symposium on Artificial Intelligence. Springer, 259–273.
[11] Erik Hemberg, Jacob Rosen, Geoffrey Warner, Sanith Wijesinghe, and Una-May O'Reilly. 2015. Tax Non-Compliance Detection Using Co-Evolution of Tax Evasion Risk and Audit Likelihood. In ICAIL.
[12] John F. Horty. 2012. Reasons as Defaults. Oxford University Press.
[13] Hans Kamp, Josef Van Genabith, and Uwe Reyle. 2011. Discourse representation theory. In Handbook of Philosophical Logic. Springer, 125–394.
[14] Sarah Lawsky. 2017. Formalizing the Code. Tax Law Review (2017).
[15] Sarah Lawsky. Forthcoming 2017. A Logic for Statutes. 21 Florida Tax Review.
[16] Percy Liang and Christopher Potts. 2015. Bringing machine learning and compositional semantics together. Annu. Rev. Linguist. 1, 1 (2015), 355–376.
[17] L. Thorne McCarty. 2016. Discussion Paper: On Semi-Supervised Learning of Legal Semantics. https://www.researchgate.net/profile/L_Thorne_Mccarty2/publication/304742441_Discussion_Paper_On_Semi-Supervised_Learning_of_Legal_Semantics/links/5778bfc108ae4645d61182cf.pdf
[18] US House of Representatives. 1995. House Legislative Counsel's Manual on Drafting Style.
[19] Marek J. Sergot, Fariba Sadri, Robert A. Kowalski, Frank Kriwaczek, Peter Hammond, and H. Terese Cory. 1986. The British Nationality Act as a logic program. Commun. ACM 29, 5 (1986), 370–386.
[20] Mark Steedman and Jason Baldridge. 2011. Combinatory categorial grammar. In Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell.
[21] Nguyen Ha Vo, Arindam Mitra, and Chitta Baral. 2015. The NL2KR Platform for building Natural Language Translation Systems. In ACL (1). 899–908.
[22] Chuan Wang, Nianwen Xue, and Sameer Pradhan. 2015. A Transition-based Algorithm for AMR Parsing. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Denver, Colorado, 366–375. http://www.aclweb.org/anthology/N15-1040
[23] Adam Wyner, Johan Bos, Valerio Basile, and Paulo Quaresma. 2012. An empirical approach to the semantic representation of laws. In The 25th International Conference on Legal Knowledge and Information Systems.
[24] Adam Wyner and Guido Governatori. 2013. A Study on Translating Regulatory Rules from Natural Language to Defeasible Logics. In RuleML (2). Citeseer.
[25] Adam Wyner and Wim Peters. 2011. On Rule Extraction from Regulations. In JURIX, Vol. 11. 113–122.